When the Gmail Thing happened in December 2020, our customers wanted to be able to quickly identify which recipients of their email had been affected. A hastily modified version of an existing tool helped them do that.

That got me thinking: the Google outage felt pretty earth-shaking for everyone involved with the email ecosystem. What about the next time something big like this happens? After all, geologists have to think like this. A little preparation can’t hurt.

We can’t foresee exactly what we might need to do the next time the email ecosystem has a major hiccup. What is clear is that it would be good to have a flexible tool that anyone can use to pull message events from SparkPost in bulk.

The original sparkyEvents tool dates from 2017 and only supported basic searching by date/time range. A modest update a couple of years later switched to the new, improved SparkPost Events API. The aim was to show how to migrate from the old endpoint to the new one with minimal code change; it did not attempt to support all the rich query options of the new API.

The user could enter dates/times with a local timezone offset, which the tool converted to UTC before querying the API. As you probably know, timezones are hard, with extra shenanigans when your start and end times straddle a daylight-saving boundary.
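To make the offset-to-UTC step concrete, here is a minimal sketch using only the Python standard library. It is not the old tool’s actual code; the function name to_utc_string is made up for illustration.

```python
from datetime import datetime, timezone


def to_utc_string(local_time: str) -> str:
    """Convert an ISO 8601 time carrying a UTC offset into the UTC 'Z' form."""
    # fromisoformat() understands offsets written as +HH:MM or -HH:MM
    t = datetime.fromisoformat(local_time)
    if t.tzinfo is None:
        raise ValueError("time must include a UTC offset, e.g. -05:00")
    return t.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


print(to_utc_string("2020-12-14T17:00:00-05:00"))  # 2020-12-14T22:00:00Z
```

A fixed offset like this knows nothing about daylight saving, so a search that straddles a clock change needs a different offset on the start time than on the end time. That is exactly the kind of surprise that plain UTC input avoids.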

Seeing this tool get used in real customer situations strengthened my desire to follow the principle of least astonishment. In terms of inputs, the new tool version should follow exactly what the API offers. This leads to the following requirements:

Requirements

  • Support the full set of query options provided by the SparkPost Events API. No more, no less. Keep it simple.
  • Search times should be exactly as per the API: UTC. (If you really miss the timezone offsets, let me know.)
  • Names of search parameters should be exactly as per the API documentation. That way, the API documentation is the tool documentation.
  • Keep the tool easy to update, as and when future Events API features come along.
  • This tool is mostly for ad-hoc searches, so provide the inputs directly via command-line arguments.
  • The INI file should provide the seldom-changed attributes such as API key, SparkPost host name, and the wanted output event properties (which would otherwise be too long and unwieldy to write on the command line).
  • Allow direct output to the console. This is much better for quick ad-hoc work than always dumping to an output file.
  • Use stderr for any additional “comfort reporting” output while the tool runs, so that simple output redirection to a file can also be used to capture the output, without the comfort reporting getting interleaved in.
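The last two requirements, keeping the data on stdout and the “comfort reporting” on stderr, can be sketched as follows. This is an illustration rather than the tool’s own code; write_events and its parameters are hypothetical names.

```python
import csv
import sys


def write_events(events, properties, out=sys.stdout, log=sys.stderr):
    """Write the chosen event properties as CSV to `out`; report progress on `log`."""
    print("Properties: ", properties, file=log)
    # extrasaction="ignore" drops event fields we did not ask for;
    # restval="" leaves an empty cell where an event lacks a property
    writer = csv.DictWriter(out, fieldnames=properties,
                            extrasaction="ignore", restval="")
    writer.writeheader()
    for event in events:
        writer.writerow(event)
    print("Wrote", len(events), "events", file=log)
```

Because only the CSV rows go to stdout, redirecting stdout to a file captures clean data while the progress messages stay visible on the console.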

And a little extra:

  • SparkPost occasionally gets new event types (such as amp_click) and properties (such as ab_test_id). The SparkPost API can provide the list of current event types and properties, so let’s add tool features to make that easy.
  • Some parameters support keyword searching, and these can include multiple comma-separated items, as well as whitespace. We should provide a natural way to express these. For example this will search for emails with subject lines matching “cool cats” or “hot dogs”:
--subjects "cool cats, hot dogs"
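One way to handle such a value is to tidy the whitespace around each comma while leaving whitespace inside a keyword alone. The sketch below assumes the API wants the items joined by plain commas; comma_list is a hypothetical helper name, not necessarily what the tool uses.

```python
def comma_list(value: str) -> str:
    """Normalize 'cool cats, hot dogs' to 'cool cats,hot dogs'."""
    # Strip whitespace around each item, but keep spaces inside an item
    items = [item.strip() for item in value.split(",")]
    # Drop empty items caused by stray or trailing commas
    return ",".join(item for item in items if item)


print(comma_list("cool cats, hot dogs"))  # cool cats,hot dogs
```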

New: as of March 2021, SparkPost supports keyword searching by mailbox_providers and mailbox_provider_regions – the tool will support that too.

That’s the requirements set. To get started, follow the installation instructions given in the project README.

The new sparkyEvents in use

Once you have it installed, you’ll find there are now over 20 different command-line arguments. You can see these by running

./sparkyEvents.py -h

The tool writes directly to the console stdout if you don’t specify an output file. Here’s a simple example with no specific output properties set in the sparkpost.ini file. You’ll get “timestamp” and “type” properties by default.

./sparkyEvents.py
Writing to <stdout>
Properties:  ['timestamp', 'type']
timestamp,type
2021-03-12T11:20:36.000Z,open
2021-03-12T11:20:36.000Z,open
:
:

The following special options do not actually fetch events; they show what’s available from your SparkPost service. You’ll see there are a lot of event properties available.

./sparkyEvents.py --show_properties
ab_test_id,ab_test_version,amp_enabled,bounce_class,campaign_id,click_tracking,customer_id,delv_method,device_token,display_name,dr_latency,error_code,event_description,event_id,fbtype,friendly_from,geo_ip,initial_pixel,injection_time,ip_address,ip_pool,mailbox_provider,mailbox_provider_region,mailfrom,message_id,msg_from,msg_size,num_retries,open_tracking,outbound_tls,queue_time,raw_rcpt_to,raw_reason,rcpt_hash,rcpt_meta,rcpt_subs,rcpt_tags,rcpt_to,rcpt_type,reason,recipient_domain,recv_method,remote_addr,report_by,report_to,routing_domain,scheduled_time,sending_domain,sending_ip,sms_coding,sms_dst,sms_dst_npi,sms_dst_ton,sms_remoteids,sms_segments,sms_src,sms_src_npi,sms_src_ton,sms_text,stat_state,stat_type,subaccount_id,subject,target_link_name,target_link_url,template_id,template_version,timestamp,transactional,transmission_id,type,user_agent

Note that the SMS-related properties (those starting with sms_) apply to on-premises deployments and won’t show in your cloud email event streams.

Edit your sparkpost.ini file and define what properties you want in your output.
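As an illustration of how such a setting could be read, here is a sketch using Python’s configparser. The section name SparkPost and the key names shown are assumptions made for this example; check the project README for the names the tool really expects.

```python
import configparser

# Hypothetical file contents: section and key names are assumptions
SAMPLE_INI = """
[SparkPost]
Authorization = <YOUR API KEY>
Host = api.sparkpost.com
Properties = timestamp,type,raw_rcpt_to,subaccount_id
"""


def load_properties(ini_text, default=("timestamp", "type")):
    """Return the list of wanted output properties, or the defaults if none are set."""
    config = configparser.ConfigParser()
    config.read_string(ini_text)
    # fallback covers both a missing section and a missing key
    raw = config.get("SparkPost", "Properties", fallback="")
    props = [p.strip() for p in raw.split(",") if p.strip()]
    return props or list(default)


print(load_properties(SAMPLE_INI))
```

With no Properties line present, the sketch falls back to “timestamp” and “type”, matching the default behavior described earlier.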

./sparkyEvents.py --show_types
amp_click,amp_initial_open,amp_open,bounce,click,delay,delivery,generation_failure,generation_rejection,initial_open,injection,link_unsubscribe,list_unsubscribe,open,out_of_band,policy_rejection,sms_status,spam_complaint

Use any of these as a filter with the --events option. For example, to get events relating to positive user engagement:

./sparkyEvents.py --events open,click,initial_open

Negative user engagement:

./sparkyEvents.py --events link_unsubscribe,list_unsubscribe,spam_complaint

Here’s an example of fetching the results from the Gmail December 2020 outage:

./sparkyEvents.py -o out6.csv --from 2020-12-14T22:00:00Z --to 2020-12-16T00:00:00Z --events bounce,out_of_band --bounce_classes 10 --reasons gsmtp
Writing to out6.csv
from                     2020-12-14T22:00:00Z
to                       2020-12-16T00:00:00Z
events                   bounce
bounce_classes           10
reasons                  gsmtp
Properties:  ['timestamp', 'raw_rcpt_to', 'subaccount_id']
Total events to fetch:  824
Page      1: got    824 events in 1.290 seconds

Mailbox providers and Mailbox provider regions

As of March 2021, the SparkPost Events API supports searching by mailbox_providers and mailbox_provider_regions. Keyword searching is supported, and names are case-insensitive, as you’d expect. So, for example, to get events for all Outlook mailboxes on European domains:

./sparkyEvents.py -o out7.csv --mailbox_providers outlook --mailbox_provider_regions "Europe"                             
Writing to out7.csv
mailbox_providers        outlook                 
mailbox_provider_regions Europe                  
Properties:  ['timestamp', 'type', 'raw_rcpt_to', 'subaccount_id', 'dr_latency', 'mailbox_provider', 'mailbox_provider_region', 'report_by', 'report_to']
Total events to fetch:  948
Page      1: got    948 events in 1.752 seconds

Note that “outlook” includes other Microsoft domains such as Hotmail, Live, and so on. You’ll see results reported such as:

2021-03-25T13:58:15.000Z,open,fred@live.co.uk,1,,Hotmail / Outlook,Europe - UK,,

Let’s get even more specific, for example just the domains for France and Germany:

./sparkyEvents.py -o out8.csv --mailbox_providers outlook --mailbox_provider_regions "France, Germany"
Writing to out8.csv
mailbox_providers        outlook                 
mailbox_provider_regions France, Germany         
Properties:  ['timestamp', 'type', 'raw_rcpt_to', 'subaccount_id', 'dr_latency', 'mailbox_provider', 'mailbox_provider_region', 'report_by', 'report_to']
Total events to fetch:  256
Page      1: got    256 events in 0.966 seconds

How do you find the keywords to use in these searches? Log in to your SparkPost account, view the Summary Report, then use “Break Down By ..” to see them.

Note that “Gsuite” and “Office 365” are broken out separately from the regular consumer domains for “Gmail” and “Hotmail / Outlook”, so you have fine-grained information available.

Further processing

The tool output is a plain-text CSV file that you can incorporate into your processes directly. If the output is not huge, you can also process it further using Excel, LibreOffice Calc, or Google Sheets. For large files, or if you need to do some more filtering, the excellent free csvkit command-line tools can help.

Internal coding notes

These notes are provided just for info – you don’t need this to run the tool!

The Python argparse library allows grouping of arguments that belong together, for nicer presentation in the help text. You can mark arguments as having user-defined types where appropriate. This is a great way to make parameter checks strict, rather than doing it later on, as argparse reports errors back to the user for you.
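Here is a cut-down sketch of both ideas, argument groups and a user-defined type. The argument names follow the examples in this post, but the group titles, the utc_time checker, and the exact time format it enforces are assumptions for illustration, not the tool’s real definitions.

```python
import argparse
from datetime import datetime


def utc_time(value):
    """User-defined argparse type: accept only times like 2020-12-14T22:00:00Z."""
    try:
        datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")
    except ValueError:
        # argparse turns this into a usage error message for the user
        raise argparse.ArgumentTypeError(
            f"{value!r} is not a UTC time in the form 2020-12-14T22:00:00Z")
    return value  # pass the original string through unchanged


def build_parser():
    parser = argparse.ArgumentParser(description="Fetch message events (sketch)")
    parser.add_argument("-o", "--outfile", help="output file (default: stdout)")

    # Groups only affect how the help text is laid out
    times = parser.add_argument_group("time range")
    times.add_argument("--from", type=utc_time, help="start time, UTC")
    times.add_argument("--to", type=utc_time, help="end time, UTC")

    filters = parser.add_argument_group("filters")
    filters.add_argument("--events", help="comma-separated event types")
    filters.add_argument("--reasons", help="comma-separated keywords")
    return parser
```

With this in place, a malformed --from value is rejected by argparse before any API call is made. Since “from” is a Python keyword, the parsed value is read with vars(args)["from"] rather than as an attribute.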

After parsing, the arguments are converted into a dict() type, giving the API query parameters in the right format.
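A minimal sketch of that conversion, assuming (as the requirements state) that argument names match the API parameter names. The query_params function and its skip list are hypothetical names for this example.

```python
import argparse


def query_params(args: argparse.Namespace, skip=("outfile",)) -> dict:
    """Keep only the arguments the user actually set, as API query parameters."""
    return {name: value
            for name, value in vars(args).items()
            if value is not None and name not in skip}


# Namespace built by hand here; normally it comes from parser.parse_args()
example = argparse.Namespace(outfile="out6.csv", events="bounce,out_of_band",
                             bounce_classes="10", reasons=None)
print(query_params(example))
```

Because the names already match the API documentation, no renaming table is needed; arguments that are for the tool only, such as the output file, are simply left out.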

The show_properties and show_types options are handled at this stage. If the user did not specify --events directly, the tool falls back to the legacy behavior of reading event types from the config file.

The rest of the code (fetching events from the API, following the pagination “next” links, handling any rate-limiting or error responses seen) is mostly unchanged from the previous version; it’s straightforward API wrangling via the requests library.
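For readers who have not seen the previous version, the fetch loop can be sketched roughly like this. It is not the tool’s actual code: it assumes the response JSON carries a “results” list and a “links” object with an optional “next” path, and that rate limiting shows up as HTTP 429. The fetch_all function and its parameters are hypothetical names.

```python
import time
from urllib.parse import urljoin


def fetch_all(session, host, params, max_retries=5, sleep=time.sleep):
    """Yield events one by one, following 'next' links until there are none."""
    url = urljoin(host, "/api/v1/events/message")
    retries = 0
    while url:
        response = session.get(url, params=params)
        if response.status_code == 429 and retries < max_retries:
            retries += 1
            sleep(retries)  # simple linear back-off, then retry the same page
            continue
        response.raise_for_status()  # give up on any other error response
        retries = 0
        body = response.json()
        yield from body.get("results", [])
        # Assumed layout: {"links": {"next": "/api/v1/events/message?cursor=..."}}
        next_link = body.get("links", {}).get("next")
        url = urljoin(host, next_link) if next_link else None
        params = None  # the next link already carries the query string
```

In real use, session would be a requests.Session() with the API key set in its Authorization header, and host something like https://api.sparkpost.com. Passing the session in also makes the loop easy to try out with a fake object.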

In total, the tool is around 170 lines of code, which shows how concise and expressive Python can be.

Summary

I hope you find this tool useful. Please share any feedback you have via @SparkPost, or open a GitHub issue.

~ Steve