Exports - API Quickstart

In this tutorial we’ll cover the basics of creating an earned media export using the API.

Note that you need access to the earned media exports feature in your API package to use these endpoints.

Before you start

Take a look at the Platform Overview guide to understand the key concepts of the Meltwater platform.

To run through this tutorial, you will need:

  • Your Meltwater API token
  • A Saved Search in your Meltwater App account

Types of export

There are two types of exports:

  • One-time exports - these exports run once; the data is not refreshed automatically.
  • Recurring exports - these exports run on a schedule you specify. Each time the export runs, the data is overwritten.

Authentication

You need to provide your API token when you call any Meltwater API endpoint.

You provide the API token as a header called apikey, for example:

curl -X GET \
  --url "https://api.meltwater.com/v3/searches" \
  --header "Accept: application/json" \
  --header "apikey: **********"

For full details take a look at the Authentication page.

Exports are created from an existing Meltwater search.

In this tutorial we’ll use an existing search, but you can also create and edit searches using the API. See the Managing Saved Searches guide for more details.

To create an export you need to provide the id of the required search. You can use the GET /v3/searches endpoint to list your current searches:

curl -X GET \
  --url "https://api.meltwater.com/v3/searches" \
  --header "Accept: application/json" \
  --header "apikey: **********" 

Example response:

{
  "searches": [
    {
      "updated": "2023-01-10T14:42:10.000Z",
      "name": "Elon Musk",
      "id": 2382415
    }
  ]
}
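
If you have many searches, a quick way to find the id you need is to filter the response on the command line. The following is a minimal sketch that assumes you have the jq tool installed; it prints the id and name of each saved search:

# List the id and name of each saved search (assumes jq is installed)
curl -s -X GET \
  --url "https://api.meltwater.com/v3/searches" \
  --header "Accept: application/json" \
  --header "apikey: **********" \
  | jq -r '.searches[] | "\(.id)\t\(.name)"'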

Understanding common export options

The following options are available for both one-time and recurring exports.

Choosing an output template

The API allows you to choose from a selection of templates for your export, which you specify using the template field in your export request. The available templates are documented on the Export & Streaming Output Templates page.

We recommend using the general purpose “API JSON” template for most integrations, as this includes all the data fields most customers need.

To use the “API JSON” template you would use the following as part of your export request:

"template": {
  "name": "api.json"
}

If you do not specify a template in your request the API will use the legacy output template as documented here.

Controlling data volumes using sampling

When creating an export, you can provide optional sampling parameters to set a maximum document count and/or percentage sample.

This feature allows you to control the number of documents you export, helping you stay within your export limits and retrieve representative data for high-volume topics. Sampling is supported for both one-time and recurring exports, and returns a random sample across the matching documents that would be in a full export.

There are two parameters that control sampling of results:

  • count - the maximum number of documents you’d like to retrieve. Defaults to 2,000,000, which is also the maximum value.
  • percentage - the percentage of results you’d like to retrieve. Defaults to 100.

Please note that the sampling process is approximate: the number of results returned will be within ±10% of your parameters.

Example 1 - return a 1% sample of matching documents:

"sample": {
  "percentage": 1.0
}

Example 2 - return up to 50,000 documents:

"sample": {
  "count": 50000
}

Example 3 - return a 10% sample of documents, but if this results in more than 1,000 documents, reduce the sample rate to limit the total results to 1,000 documents:

"sample": {
  "count": 1000,
  "percentage": 10.0
}

By default, if you do not specify these parameters, your export will contain up to 2 million documents as a 100% sample. If your export request matches more than 2 million documents, it will be sampled down to 2 million results.

Creating a one-time export

One-time exports are created using the POST /v3/exports/one-time endpoint. You need to provide a start_date, an end_date and one or more search_ids to create an export.

Note that start_date and end_date are provided in the UTC timezone; this is required for one-time exports.

curl --location 'https://api.meltwater.com/v3/exports/one-time' \
--header 'Content-Type: application/json' \
--header 'Accept: application/json' \
--header 'apikey: **********' \
--data '{
    "onetime_export": {
            "search_ids": [<search_id>],
            "start_date": "2024-01-01T00:00:00Z",
            "end_date": "2024-02-01T00:00:00Z",
            "sample": {
                    "count": 1000,
                    "percentage": 10.0
            },
            "template": {
                    "name": "api.json"
            }
    }
}'

Example response:

{
  "onetime_export": {
    "updated_at": "2024-04-02T11:23:40.000000",
    "tags": [],
    "status_reason": "Export run has not completed yet",
    "status": "PENDING",
    "start_date": "2024-01-01T00:00:00.000000Z",
    "searches": [
      {
        "id": <search_id>,
        "name": <search_name>
      }
    ],
    "sample": {
      "count": 1000,
      "percentage": 10
    },
    "inserted_at": "2024-04-02T11:23:40.000000",
    "id": <export_id>,
    "end_date": "2024-02-01T00:00:00.000000Z",
    "data_url": <data_url>,
    "template": {
      "name": "api.json"
    }
  }
}

Fetching one-time export results

You can check the status of a one-time export using the GET /v3/exports/one-time/<export_id> endpoint.

curl -X GET \
  --url "https://api.meltwater.com/v3/exports/one-time/<export_id>" \
  --header "Accept: application/json" \
  --header "apikey: **********"

Example response:

{
    "onetime_export": {
        "updated_at": "2024-04-02T11:24:32.000000",
        "tags": [],
        "status_reason": "",
        "status": "FINISHED",
        "start_date": "2024-01-01T00:00:00.000000Z",
        "searches": [
          {
            "id": <search_id>,
            "name": <search_name>
          }
        ],
        "sample": {
            "count": 1000,
            "percentage": 10
        },
        "inserted_at": "2024-04-02T11:23:40.000000",
        "id": <export_id>,
        "end_date": "2024-02-01T00:00:00.000000Z",
        "data_url": <data_url>,
        "template": {
            "name": "api.json"
        }
    }
}

Once the status is FINISHED, the results are available in JSON format at the data_url. If the status is still PENDING, the data_url will return a 403 status code.
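
As a minimal sketch of retrieving the results, assuming the data_url from the status response can be fetched with a plain GET request, you can download the file directly:

# Download the export results once the status is FINISHED
# (replace <data_url> with the data_url value from the status response)
curl -o export_results.json "<data_url>"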

Export execution time
The size of an export depends on how many results the search generates and how large a time window it covers. The larger the export, the longer it takes to generate; this can vary from around a minute for small exports up to an hour for very large exports.
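
Because execution time varies, a common pattern is to poll the status endpoint until the export is no longer PENDING. The following is a rough sketch that assumes jq is installed; the 60 second interval and <export_id> placeholder are illustrative:

# Poll the one-time export status every 60 seconds until it leaves the PENDING state
while true; do
  status=$(curl -s -X GET \
    --url "https://api.meltwater.com/v3/exports/one-time/<export_id>" \
    --header "Accept: application/json" \
    --header "apikey: **********" \
    | jq -r '.onetime_export.status')
  echo "Current status: $status"
  [ "$status" != "PENDING" ] && break
  sleep 60
done
# Once the status is FINISHED, download the results from the data_url as shown above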

Understanding export results

When you specify a CSV template for your output, the data_url will point to a CSV file for you to access, with the CSV containing columns as specified on the Export & Streaming Output Templates page.

For JSON templates, the structure of the result is as follows:

{
   "request": {
        "company_id": <the id of the account that owns the inputs used>,
        "count": <number of results>,
        "export_id": <the id of the export in the Meltwater system>,
        "inputs": [<the inputs used for the export]>,
        "period": {
            "start": <start of the export period requested>,
            "end": <end of the export period requested>
        },
        "status": <status of the export>
    },
    "docs": [
      <an object for each document in the export, according to the chosen template>
    ]
}
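
For example, once you have downloaded a JSON export you can inspect it on the command line. This sketch assumes jq is installed and the export was saved as export_results.json (a placeholder filename):

# Show the result count and status reported in the request metadata
jq '.request.count, .request.status' export_results.json
# Count the number of documents in the docs array
jq '.docs | length' export_results.json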

Note that prior to the templates feature being introduced, exports used a legacy output format as described here.

Creating a recurring export

Recurring exports run on a schedule you specify. Each time the schedule runs it overwrites the data provided at the data_url.

Specifying a time window and schedule

When you create a recurring export you have a number of parameters you can use to control the schedule and the period of data each run should include.

The window_time_unit parameter sets the frequency of the schedule. You can choose:

  • DAY - exports are run daily
  • WEEK - exports are run weekly
  • MONTH - exports are run monthly

The window_size allows you to specify the period of data to include in each export. Think of this value as a multiple of the window_time_unit you chose. For example, setting window_time_unit to DAY and window_size to 2 creates a recurring export that runs every day and includes the last 2 days of data in each run.

You can use the following parameters to specify precisely the window of data included for each run:

  • For daily exports you can specify the daily start time with window_time
  • For weekly exports you can specify the day of the week with window_weekday and start time with window_time
  • For monthly exports you can specify the day of the month with window_monthday and start time with window_time

Note that you can use the timezone parameter to specify the timezone for your window_time. The timezone must be a valid zone as detailed in the IANA database.

As a full example, the following parameters create a recurring export that will run every day at 09:00 UTC including the last 7 days of data.

{
    "window_time_unit": "DAY",
    "window_size": 7,
    "window_time": "09:00:00",
    "timezone": "Etc/UTC"
}

Default values for recurring export attributes are as follows:

  • window_time: "00:00:00"
  • window_weekday: 1 (Monday)
  • window_monthday: 1
  • timezone: Etc/UTC
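
As another sketch, the following parameters would create a weekly export that runs every Monday at 06:00 New York time and includes the previous week of data (the values shown are illustrative):

{
    "window_time_unit": "WEEK",
    "window_size": 1,
    "window_weekday": 1,
    "window_time": "06:00:00",
    "timezone": "America/New_York"
}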

Scheduling of recurring exports
Note that exports are executed by the platform 30 minutes after the end time of the required export period. This is to allow data to be ingested by the platform from providers.

Creating a recurring export

Recurring exports are created using the POST /v3/exports/recurring endpoint.

curl --location 'https://api.meltwater.com/v3/exports/recurring' \
--header 'Content-Type: application/json' \
--header 'Accept: application/json' \
--header 'apikey: **********' \
--data '{
  "recurring_export": {
    "search_ids": [<search_id>],
    "window_time_unit": "DAY",
    "window_time": "09:00:00",
    "window_size": 7,
    "timezone": "Etc/UTC",
    "sample": {
      "count": 1000,
      "percentage": 10.0
    },
    "template": {
      "name": "api.json"
    }
  }
}'

Example response:

{
  "recurring_export": {
    "updated_at": "2024-04-02T11:34:14.000000",
    "tags": [],
    "timezone": "Etc/UTC",
    "status_reason": "Export run has not completed yet",
    "status": "PENDING",
    "next_run_date": "2024-04-03T09:30:00Z",
    "searches": [
      {
        "id": <search_id>,
        "name": <search_name>
      }
    ],
    "sample": {
      "count": 1000,
      "percentage": 10
    },
    "inserted_at": "2024-04-02T11:34:14.000000",
    "id": <export_id>,
    "data_url": <data_url>,
    "window_time_unit": "DAY",
    "window_size": 7,
    "window_time": "09:00:00",
    "template": {
      "name": "api.json"
    }
  }
}

Fetching recurring export results

You can check the status of a recurring export using the GET /v3/exports/recurring/<export_id> endpoint.

curl -X GET \
  --url "https://api.meltwater.com/v3/exports/recurring/<export_id>" \
  --header "Accept: application/json" \
  --header "apikey: **********"

Example response:

{
  "recurring_export": {
    "updated_at": "2024-04-02T11:34:14.000000",
    "tags": [],
    "timezone": "Etc/UTC",
    "status_reason": "Export run has not completed yet",
    "status": "PENDING",
    "next_run_date": "2024-04-03T09:30:00Z",
    "searches": [
      {
        "id": <search_id>,
        "name": <search_name>
      }
    ],
    "sample": {
      "count": 1000,
      "percentage": 10
    },
    "inserted_at": "2024-04-02T11:34:14.000000",
    "id": <export_id>,
    "data_url": <data_url>,
    "window_time_unit": "DAY",
    "window_size": 7,
    "window_time": "09:00:00",
    "template": {
      "name": "api.json"
    }
  }
}

Once the status is ACTIVE, there will be results in JSON format at the data_url. If the status is still PENDING, the data_url will still be available, but it will contain an empty list of documents.

The first time a recurring export runs, the data is available at the data_url. Each subsequent run overwrites the previous results at the same data_url.
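
Since each run refreshes the same data_url, a simple way to consume a recurring export is to download the file on a matching schedule. The following is a sketch of a crontab entry that assumes a daily export, a server clock set to UTC, and that some time is allowed for the run to complete; the path and <data_url> are placeholders:

# Fetch the latest recurring export results every day at 10:00 UTC
0 10 * * * curl -s -o /data/meltwater_export.json "<data_url>"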