• This blog post was originally published on 7/07/2017 and was updated on 1/30/2019 and 11/5/2019 to include information about our new and improved Events API

We love it when developers use SparkPost webhooks to build awesome responsive services. Webhooks are great when you need real-time feedback on what your customers are doing with their messages. They work on a “push” model – you create a microservice to handle the event stream.

Did you know that SparkPost also supports a “pull” model Events API that enables you to download your event data for up to ten days afterwards? This can be particularly useful in situations such as:

  • You’re finding it difficult to create and maintain a production-ready micro-service. For example, your corporate IT policy might make it difficult for you to have open ports permanently listening;
  • You’re familiar with batch type operations and running periodic workloads, so you don’t need real-time events;
  • You’re a convinced webhooks fan, but you’re investigating issues with your almost-working webhooks receiver micro-service, and want a reference copy of those events to compare.

If this sounds like your situation, you’re in the right place! Now let’s walk through setting up a really simple tool to get those events.

Design goals

Let’s start by setting out the requirements for this project, then translate them into design goals for the tool:

  • You want it to be easy to customize without programming.
  • SparkPost events are a rich source of data, but some event-types and event properties might not be relevant to you. Being selective gives smaller output file sizes, which is a good thing, right?
  • Speaking of output files, you want event data in the commonly-used CSV file format. While programmers love JSON, CSV is easier for non-technical users (and results in smaller files).
  • You want to set up your SparkPost account credentials and other basic information once and once only, without having to redo them each time it’s used. Having to remember that stuff is boring.
  • You need flexibility on the event date/time ranges of interest.
  • You want to set up your local time-zone once, and then work in that zone, not converting values manually to UTC time. Of course, if you really want to work in UTC, because your other server logs are all UTC, then “make it so.”
  • You want meaningful comfort reporting on your screen. Extracting millions of events can take some time to run, and you want to know it's working.

Events, dear programmer, events …

Firstly, you'll need python3, pip and git installed on your system. The easiest way to install the tool needs just these commands:

git clone https://github.com/tuck1s/sparkyEvents.git
cd sparkyEvents
pip install pipenv
pipenv install
pipenv shell

For other platforms, this is a good starting point to get the latest Python download; there are many good tutorials out there on how to install.

We’re the knights who say “.ini”

Set up a sparkpost.ini  file as per the example in the Github README file here.

Replace <YOUR API KEY> with ~~a shrubbery~~ your specific, private API key.

Host is only needed for SparkPost Enterprise service usage; you can omit it for sparkpost.com.

Events is a list, as per SparkPost Event Types; omit the line, or assign it blank, to select all event types.

Properties can be any of the SparkPost Event Properties. Definitions can split over lines using indentation, as per Python .ini file structure, which is handy as there are over seventy different properties. You can select just those properties you want, rather than everything; this keeps the output file to just the information you want.
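Putting the notes above together, a sparkpost.ini along these lines should work. Key names follow the README example; the API key, host, event list and property list here are placeholders to adapt, and the exact option names are best checked against the README:

```ini
[SparkPost]
Authorization = <YOUR API KEY>
# Host is only needed for SparkPost Enterprise; omit for sparkpost.com
# Host = demo.sparkpostelite.com
Events = delivery,injection,bounce,open,click
Properties = timestamp,type,
    subaccount_id,friendly_from,
    raw_rcpt_to,subject
```

Note how the Properties value continues over several lines simply by indenting the continuation lines.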

If you run the tool without any command-line parameters, it prints a usage summary:

usage: sparkyEvents.py [-h] outfile.csv from_time to_time
sparkyEvents.py: error: the following arguments are required: outfile.csv, from_time, to_time

Use the -h option to get more help:
./sparkyEvents.py -h
usage: sparkyEvents.py [-h] outfile.csv from_time to_time

Simple command-line tool to retrieve SparkPost message events into a .CSV

positional arguments:
outfile.csv output filename (CSV format), must be writeable.
from_time Datetime in format of YYYY-MM-DDTHH:MM:ssZ, inclusive.
to_time Datetime in format of YYYY-MM-DDTHH:MM:ssZ, exclusive.

optional arguments:
-h, --help show this help message and exit

SparkPost API key, host, record event type(s) and properties are specified in the sparkpost.ini file.

Here’s a typical run of the tool.
./sparkyEvents.py out3.csv 2019-11-05T08:00:00+05:00 2019-11-05T09:00:00+05:00
Time ranges to search are in timezone UTC+05:00
SparkPost events from 2019-11-05 08:00:00+05:00 to 2019-11-05 09:00:00+05:00, writing to out3.csv
Events:      <all>
Properties:  ['timestamp', 'type', 'subaccount_id', 'friendly_from', 'raw_rcpt_to', 'subject']
Total events to fetch:  119306
Page      1: got  10000 events in 1.219 seconds
Page      2: got  10000 events in 1.105 seconds
Page      3: got  10000 events in 0.977 seconds
Page      4: got  10000 events in 0.967 seconds
Page      5: got  10000 events in 1.320 seconds
Page      6: got  10000 events in 1.014 seconds
Page      7: got  10000 events in 1.168 seconds

That’s it! You’re ready to use the tool now. Want to take a peek inside the code? Keep reading!

Inside the code

Getting events via the SparkPost API

The SparkPost Python library doesn't yet have built-in support for the events endpoint. In practice, the Python requests library is all we need. It provides built-in abstractions for handling JSON data, response status codes, etc., and is generally a thing of beauty.

One thing we need to take care of here is that the events endpoint is rate-limited. If we make too many requests, SparkPost replies with a 429 response code. We play nicely using the following function, which sleeps for a set time, then retries:

import time
import requests

T = 60                                          # HTTP request timeout in seconds

def getMessageEvents(url, apiKey, params):
    h = {'Authorization': apiKey, 'Accept': 'application/json'}
    try:
        while True:
            response = requests.get(url, timeout=T, headers=h, params=params)

            # Handle possible 'too many requests' error inside this module
            if response.status_code == 200:
                return response.json()
            elif response.status_code == 429 and \
                    response.json()['errors'][0]['message'] == 'Too many requests':
                snooze = 120
                print('.. pausing', snooze, 'seconds for rate-limiting')
                time.sleep(snooze)
                continue                        # try again
            else:
                print('Error:', response.status_code, ':', response.text)
                return None

    except requests.exceptions.ConnectionError as err:
        print('Connection error:', err)
        return None

Practically, when using event batches of 10,000 I didn't experience any rate-limiting responses, even on a fast client; I had to deliberately set smaller batch sizes during testing. So you may not see rate-limiting occur in practice.

Selecting the Event Properties

SparkPost’s events have nearly sixty possible properties. Users may not want all of them, so let’s select those via the sparkpost.ini file. As with other Python projects, the excellent ConfigParser library does most of the work here. It supports a nice multi-line feature:

“Values can also span multiple lines, as long as they are indented deeper than the first line of the value.”

We can read the properties (applying a sensible default if it’s absent), remove any newline or carriage-return characters, and convert to a Python list in just three lines:

# If the fields are not specified, default to a basic few.
# cfg is a ConfigParser section proxy, so .get() takes a fallback default
properties = cfg.get('Properties', 'timestamp,type')
properties = properties.replace('\r', '').replace('\n', '')  # Strip newline and CR
fList = properties.split(',')
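To see those three lines in action, here's a runnable snippet with an inline ini fragment (the section name and property values are illustrative):

```python
import configparser

# An inline stand-in for the sparkpost.ini file, with a multi-line value
ini_text = """
[SparkPost]
Properties = timestamp,type,
    subaccount_id,friendly_from
"""
config = configparser.ConfigParser()
config.read_string(ini_text)
cfg = config['SparkPost']           # section proxy, so .get() takes a default

# The same three lines as the tool uses
properties = cfg.get('Properties', 'timestamp,type')
properties = properties.replace('\r', '').replace('\n', '')
fList = properties.split(',')
print(fList)
```

ConfigParser joins the indented continuation line onto the value with a newline character, which is exactly what the replace() call strips out before splitting.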

Writing to file

The Python csv library enables us to create the output file, complete with the required header row field names, based on the fList we’ve just read:

fh = csv.DictWriter(outfile, fieldnames=fList, restval='', extrasaction='ignore')
fh.writeheader()

Using the DictWriter class, data is automatically matched to the field names in the output file, and written in the expected order on each line. restval='' ensures we emit blanks for absent data, since not all events have every property. extrasaction='ignore' ensures that we skip extra data we don't want.

for i in res['results']:
    fh.writerow(i)                          # Write out results as CSV rows in the output file
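Here's a small, self-contained illustration of those DictWriter options; the field names and event rows are made up for demonstration, with an in-memory buffer standing in for the real output file:

```python
import csv
import io

fList = ['timestamp', 'type', 'subject']
results = [
    # Extra keys are dropped thanks to extrasaction='ignore'
    {'timestamp': '2019-11-05T08:00:00Z', 'type': 'delivery',
     'subject': 'Hello', 'rcpt_to': 'a@example.com'},
    # Missing 'subject' becomes a blank field thanks to restval=''
    {'timestamp': '2019-11-05T08:00:01Z', 'type': 'bounce'},
]

outfile = io.StringIO()                 # stands in for the real output file
fh = csv.DictWriter(outfile, fieldnames=fList, restval='', extrasaction='ignore')
fh.writeheader()
for i in results:
    fh.writerow(i)

print(outfile.getvalue())
```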

That’s pretty much everything of note. The tool is less than 150 lines of actual code.

Moving from message-events to new events API

In January 2019, I updated this tool to use the new Events API, following these migration guidelines. The new endpoint has more powerful search capabilities, but for now we’ve kept this tool functionally identical. Here are some notes on what I needed to change:

  • The events API paging mechanism is different. A cursor parameter is used, with value “initial” for the first page.
  • Instead of incrementing a “page” value, we get the cursor value for the next page from the response JSON data.

Take a look at the changes color-coded side-by-side here (thanks to GitHub). The details:

  • Change the API path from /api/v1/message-events  to /api/v1/events/message .
  • Pass a complete URL into the getMessageEvents  function, rather than building the parameters internally. This is because we get everything we need from the previous response data, making the code shorter (a good thing, right?).
  • Construct the initial URL parameters outside this function, and do it once only, for the first call made.
  • Simplify the link walking code in main (outermost scope). We don’t need to look for a links.rel.next  object any more – it’s in links.next .
  • We still increment event_page each time around, but it’s just used for human-readable comfort reporting.
  • After the first page, we set the passed-in params p to None, because everything needed for the next call is already fully-formed in the url. The underlying requests library is happy with either form.
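The list above boils down to a loop like the following sketch. fakeGetMessageEvents is a stand-in invented here so the sketch runs standalone; the real tool uses the getMessageEvents function shown earlier, and the page contents are made up:

```python
def fakeGetMessageEvents(url, params):
    # Stand-in for the real API call: pretends there are two pages of events
    if params and params.get('cursor') == 'initial':
        return {'results': [1, 2],
                'links': {'next': '/api/v1/events/message?cursor=abc'}}
    return {'results': [3], 'links': {}}

baseUrl = 'https://api.sparkpost.com'
url = baseUrl + '/api/v1/events/message'
p = {'cursor': 'initial', 'per_page': 10000}    # built once, for the first call
events = []
event_page = 0
while True:
    res = fakeGetMessageEvents(url, p)
    events.extend(res['results'])
    event_page += 1                 # only used for human-readable reporting
    nextLink = res.get('links', {}).get('next')
    if not nextLink:
        break                       # no links.next means we're done
    url = baseUrl + nextLink        # fully-formed next-page URL
    p = None                        # everything needed is already in the url
print(event_page, 'pages,', len(events), 'events')
```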

It took me around an hour to make those changes and another half an hour to test them. The updated code is shorter by five lines. I then made some general project improvements:

  • Install is now much easier, using pipenv
  • The project repo now uses Travis CI automated tests

Timezone handling, seconds resolution, and inclusive/exclusive range (Nov 2019 update)

Timezone is specified in the new API as part of the from  time and to  time parameters, whereas on the old API it was a separate textual parameter.

The time resolution in the new API is seconds, whereas the old API was minute resolution.

The time range is defined with more clarity in our API docs. The from  time is inclusive and the to  time is exclusive. This is generally good practice, allowing you to naturally search for a whole 24 hour time range “from midnight to midnight” UTC like this:

./sparkyEvents.py out4.csv 2019-11-05T00:00:00+0000 2019-11-06T00:00:00+0000

If you want US Eastern Standard Time, just use suffix -0500, for example. You can write offsets with or without a :  separator (-08:00  for Pacific Standard Time, for instance).

You can also write Z as shorthand for +0000, i.e. UTC.
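If you're scripting around the tool, Python's own strptime %z directive handles all of these offset spellings; this snippet is illustrative and not part of sparkyEvents:

```python
from datetime import datetime

fmt = '%Y-%m-%dT%H:%M:%S%z'

# %z accepts +0000, an offset with a colon, and (Python 3.7+) a bare Z
for s in ['2019-11-05T00:00:00+0000',
          '2019-11-05T00:00:00-05:00',
          '2019-11-05T00:00:00Z']:
    t = datetime.strptime(s, fmt)
    print(t.isoformat())
```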

You need to consider whether you want to change your offset according to Daylight Savings in your locale; this is fully under your control (see xkcd!). For batch purposes it's often better to work in UTC, or at least a fixed offset.

The sparkyEvents tool now makes use of the above improvements. I’ve also noticed the new API is significantly faster than the old.

You’re the Master of Events!

So that’s it! You can now download squillions of events from SparkPost, and can customize the output files you’re getting. You’re now the master of events!

—Steve Tuck, Senior Messaging Engineer

ps: If you’re looking for more resources on APIs, check out the SparkPost Academy.
