Create production-ready webhook consumers quickly and avoid pitfalls along the way.

Webhooks: Beyond the Basics

One of the most useful elements of SparkPost is our webhooks. Webhooks are the tool of choice when your apps need real-time info on who received your email and who didn’t, who opened messages and clicked on your links, who unsubscribed and a host of other useful “what happened” data.

In order to deliver that live stream of events, webhooks work in reverse compared to most APIs. SparkPost’s webhook feature is a push service: it makes HTTP requests into your apps. That means webhooks are a bit like backwards API calls, so they require some thought to use to greatest effect.

In this article, we explore a few important considerations for consuming SparkPost webhook events. To be clear, this is not a foundation-level walkthrough of SparkPost’s webhook mechanism. You can get that from this excellent introductory blog post, the webhooks API endpoint docs and the event structure reference. Our goal here is to help you create production-ready webhook consumers quickly and to avoid some pitfalls along the way.

Event Batches: Just Lists Of Events

Let’s assume you already have a SparkPost account, you can send email, and you can register your own HTTP services with SparkPost as webhook endpoints. In short, we’re ready to start receiving and processing those tasty tracking events. What should your shiny new endpoint expect?

At the most basic level, when you use SparkPost to send and track email, it emits events so your apps can follow the progress of your mail and your recipients’ responses to it.

To achieve this, SparkPost periodically sends your webhook endpoint POST requests containing JSON-formatted arrays of events. A batch of events has the following general shape (full documentation here); the placeholder values below are illustrative:
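```json
[
  {
    "msys": {
      "<event-class>": {
        "type": "<event-type>",
        "timestamp": "...",
        "rcpt_to": "...",
        "...": "other event fields"
      }
    }
  },
  {
    "msys": {
      "<event-class>": {
        "type": "<event-type>",
        "...": "..."
      }
    }
  }
]
```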

where:

  • event-class describes the class or group this event belongs to (e.g. message_event, track_event, …)
  • event-type describes an exact event type within a class (e.g. delivery, click, link_unsubscribe, …)

Interpreting all the rich detail in there is the meat of your task as a webhook consumer. Your ultimate use of these events will vary heavily by use case, but there are some important commonalities we should each be aware of. Let’s move on from the basics to cover a few expectations and best practices.

Webhook Pings: Be Prepared

The moment you register your endpoint, SparkPost will send it a little HTTP request to verify reachability. This is our first interesting point: these little webhook ‘pings’ are not quite the same as the real event batches you’ll receive later. Instead, they look like this:
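```json
{
  "msys": {}
}
```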

This ‘null batch’ structure actually makes sense: imagine SparkPost instead sent a few fake events to your production webhook endpoint. That might trigger untold knock-on effects as your endpoint attempts to interpret unexpected and faked-up event data. Safer, then, to send a minimal payload.

Still, it’s important to be aware of this since your endpoint might choke on this degenerate payload if it’s only expecting fully-formed events. Of course, that’s not a problem for you because you properly validate input before consuming it, right? 😉
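For illustration, here’s a minimal sketch of that validation step in Python using Flask; the route name and behaviour are our own assumptions, not anything SparkPost prescribes:

```python
# Sketch: tolerate SparkPost's 'null batch' ping alongside real batches.
from flask import Flask, request

app = Flask(__name__)

@app.route("/sparkpost-events", methods=["POST"])  # hypothetical route
def receive_events():
    payload = request.get_json(silent=True)
    # Real batches arrive as JSON arrays of events; the registration
    # ping (and anything else malformed) does not.
    if not isinstance(payload, list):
        return "", 200  # acknowledge the ping; nothing to process
    ...  # handle a real batch -- see the next sections
    return "", 200
```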

Note: If you like to test against live APIs, you can trigger this case by programmatically registering a webhook endpoint using the webhooks API, which prompts the ‘ping’. You can always delete it afterwards.

Receiving Real Batches: Retries And Retention

So much for pings. Let’s get on to the real stuff. When SparkPost sends a batch of events, it expects a 200 HTTP response from your endpoint as acknowledgement of receipt. Any other response is interpreted as failure, causing SparkPost to try again later. SparkPost will attempt re-delivery of a failed batch for 8 hours before discarding it. That gives you a useful design parameter when building your endpoint. It also offers a hint about best practice for when we run into a problem consuming a batch: webhook endpoints should return a non-200 HTTP response if and only if they run into trouble taking ownership of a given batch. It would also be prudent to trigger an alert so you can investigate the issue before it becomes serious.

Transactional Safety: Acknowledging Receipt == Taking Ownership

This next point seems obvious but it’s worth making explicit: once you tell SparkPost “200 OK” on receipt of a batch, you own that batch. You’re solely responsible for its care and feeding from that point on.

Out of this comes another design requirement: stash each batch in durable storage before you acknowledge receipt. SparkPost will wait on the line for 10 seconds during batch delivery to allow you to consume a batch: that should be ample time to store it.

You might also be tempted to interpret each event in a given batch inline while SparkPost waits or, worse, to acknowledge first and then consume your events.

Remember, though, that once we ack, we can’t go back. There’s no getting that batch back if you choke on it, if a downstream service fails or if lightning strikes. The failure modes that result in data loss here are numerous, to say the least: it’s risk central. Clearly, we should store, ack and consume, in that order.

There is an important scaling consideration here too as your email and therefore event volumes grow. Attempting to both receive and process incoming event batches in a single synchronous step will hurt the responsiveness of your endpoint as more, larger and occasionally parallel batches are delivered to it. Here then is our next design requirement:

Design your SparkPost webhook endpoint to receive and store batches, then process them asynchronously to stay responsive as you scale.
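Putting the last few points together, here’s a minimal sketch of that store-ack-consume split, again in Python with Flask; the on-disk spool directory is an illustrative stand-in for whatever durable store or queue you actually use (Redis, SQS, a database, …):

```python
# Sketch: store each batch durably, acknowledge, and process asynchronously.
import json
import os
import time
import uuid

from flask import Flask, request

app = Flask(__name__)
BATCH_DIR = "incoming_batches"  # hypothetical spool directory
os.makedirs(BATCH_DIR, exist_ok=True)

@app.route("/sparkpost-events", methods=["POST"])  # hypothetical route
def receive_batch():
    payload = request.get_json(silent=True)
    if not isinstance(payload, list):  # registration ping or junk
        return "", 200
    try:
        # Store first: write the raw batch to durable storage...
        path = os.path.join(BATCH_DIR, f"{time.time():.0f}-{uuid.uuid4()}.json")
        with open(path, "w") as f:
            json.dump(payload, f)
    except OSError:
        # ...and only ack if that worked; a non-200 asks SparkPost to retry.
        return "", 500
    return "", 200  # ack: we now own this batch

# A separate worker process or thread consumes stored batches at its own
# pace, keeping the endpoint responsive however large the batches get.
```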

Event Consumption: Make It Easy On Yourself

So we can handle pings, and we can receive, store and acknowledge event batches. Can we start consuming these things yet? Indeed yes, and once you do, you’ll find another interesting commonality. Recall the general shape of each event (placeholders as before):
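```json
{
  "msys": {
    "<event-class>": {
      "type": "<event-type>",
      "...": "event fields"
    }
  }
}
```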

All the fun stuff is wrapped inside msys.whatever_event. So as you begin writing code to filter, extract, manipulate and consume events, you might find yourself typing a whole lot of references to msys.message_event.field_name this and msys.track_event.field_name that.

Here’s an observation: the type field is included in all events and contains all the information required to discriminate between events. That outer ‘event class’ wrapper is therefore useful but not essential.

Might your fingers (and your colleagues’ eyeballs) tire more slowly if you strip out the first two layers of each event up front and then just rely on the type field? It certainly seems that way, but there is a trade-off (isn’t there always?) against the system resource cost of that de-nesting step in our chosen software environment.
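Here’s one way that de-nesting step might look in Python; flatten_event is a hypothetical helper and assumes real (non-ping) events with exactly one event class each:

```python
# Sketch: strip the outer msys.<event-class> wrapper so consumers can
# dispatch on the "type" field alone.
def flatten_event(raw_event: dict) -> dict:
    """Return the inner event dict from {"msys": {"<event-class>": {...}}}."""
    (inner,) = raw_event["msys"].values()  # exactly one event class per event
    return inner

batch = [
    {"msys": {"message_event": {"type": "delivery", "rcpt_to": "a@example.com"}}},
    {"msys": {"track_event": {"type": "click", "rcpt_to": "b@example.com"}}},
]

for event in map(flatten_event, batch):
    if event["type"] == "delivery":
        print("delivered to", event["rcpt_to"])
    elif event["type"] == "click":
        print("clicked by", event["rcpt_to"])
```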

Cooking Up Test Batches

Another common task in webhook event consumption is harvesting sample batches to test against. The ‘live API’ option is to pull samples from SparkPost’s webhook API samples endpoint directly and forward them to your endpoint, possibly even using the webhook validate endpoint to do the forwarding. For reference, this is how the ‘test webhook’ feature in the SparkPost UI works.
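As a sketch of that sample-and-forward approach in Python, assuming an API key in the SPARKPOST_API_KEY environment variable and using the samples endpoint path from the API docs (the target URL and event list are illustrative):

```python
# Sketch: fetch sample events from SparkPost's samples endpoint and
# forward them to your own endpoint as a fake batch.
import os

import requests

API_KEY = os.environ["SPARKPOST_API_KEY"]

resp = requests.get(
    "https://api.sparkpost.com/api/v1/webhooks/events/samples",
    params={"events": "delivery,click"},  # the event types you care about
    headers={"Authorization": API_KEY},
)
resp.raise_for_status()
samples = resp.json()["results"]

# POST the samples to your endpoint, just as SparkPost would.
requests.post("https://example.com/sparkpost-events", json=samples)
```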

This sample-and-forward plan works well if you don’t care much about the content of the events themselves since you are consuming pre-generated samples. For specific messaging scenarios, a better strategy might be to send some test transmissions and capture the resulting real events for later testing.

A hybrid approach could also be helpful once you have a feel for events generated by your use case. You can use sample events to produce an event of each type you care about, then edit and replicate them to fake up a particular scenario. This approach can also work well for volume and throughput testing.
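For instance, a hypothetical helper that clones one edited sample event into a large batch for volume testing might look like this:

```python
# Sketch: replicate an edited sample event to fake a high-volume scenario.
import copy
import json

def make_fake_batch(sample_event: dict, recipients: list[str]) -> list[dict]:
    batch = []
    for rcpt in recipients:
        event = copy.deepcopy(sample_event)
        # Edit whichever fields your scenario cares about.
        (inner,) = event["msys"].values()
        inner["rcpt_to"] = rcpt
        batch.append(event)
    return batch

sample = {"msys": {"message_event": {"type": "delivery", "rcpt_to": "x@example.com"}}}
fake = make_fake_batch(sample, [f"user{i}@example.com" for i in range(1000)])
print(json.dumps(fake[:2], indent=2))  # peek at the first two clones
```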

Summary

Knowing what to expect functionally beyond the raw API spec is half the battle when consuming new data sources like SparkPost webhooks. We hope this small set of starting observations helps you on your way to productivity and we look forward to seeing all the unexpected, unique, innovative and colorful things you end up building with them.