Many companies in regulated industries like the financial, healthcare, or utility sectors must prove that certain emails were sent, or at least that a send was attempted. If you’re the person in charge of tracking email activity to your customers, you know how challenging that can be. Most Email Service Providers (ESPs) share this information in two ways: RESTful API queries and webhook deliveries. This post reviews both and describes how to build a system that gives you, the email sender, the best chance of getting all of that information into your backend systems for long-term archival.
The RESTful API Approach
Most ESPs keep detailed information on every action an email takes in their system. In fact, SparkPost tracks 22 different events that an email may go through. As you can imagine, a large ESP can hold billions of these events for some customers, so they tend to limit how long they keep this level of detail; that makes it important for you to retrieve the information and store it on your own systems in a timely fashion. When pulling via RESTful APIs, you typically query the ESP for a specific time frame, a specific event, or a combination of the two. The challenge is that this can produce a very large payload, so most RESTful APIs limit how much you can pull at any given time. A typical RESTful API limits a response to 1,000 records, forcing you to page through the data in order to pull it all down. To illustrate, I have one customer that sends approximately 11,000 emails a second; that generates over 70 million records in under an hour (counting both injection and delivery SparkPost events). Pulling that data through RESTful API calls alone becomes very challenging.
While that example may be extreme, even a medium-sized company that sends 2-3 million emails a day will create 5-7 million event records; retrieving them can mean thousands of coordinated calls between your system and the ESP. This can definitely be done, but paging through the data 1,000 records at a time can easily turn into a networking nightmare.
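To make the paging concrete, here is a minimal sketch of the loop involved. The page-fetching function is injected as a parameter so the logic can be shown without a real endpoint; in practice it would wrap an authenticated GET against your ESP's events API, and the 1,000-record page size mirrors the typical limit described above.

```python
def fetch_all_events(fetch_page, per_page=1000):
    """Collect every event by requesting successive pages until a short page.

    fetch_page(page, per_page) -> list of event records for that page.
    """
    events = []
    page = 1
    while True:
        results = fetch_page(page, per_page)
        events.extend(results)
        if len(results) < per_page:
            break  # last (partial or empty) page reached
        page += 1
    return events
```

Even this simple loop shows the weakness: for millions of records it serializes thousands of round trips, and any transient network failure mid-run means restarting or checkpointing the paging state.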
The Webhook Approach
To alleviate this challenge, many ESPs offer the ability to push message event data to your data warehouse using webhooks. Depending on the ESP, you may register one or more applications to receive the data; you do this by giving the ESP the URL where your data collection application can be reached. The ESP then sends you the requested data as it becomes available. All you need is an application waiting for an HTTP POST of data (a package). Once you receive a package, you parse and store the data. In fact, many databases have plugins that accept webhook connections directly and store the data as it arrives.
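A receiving application can be very small. The sketch below, using only the Python standard library, accepts a POST and writes the raw body straight to a spool directory; the directory name and filename scheme are illustrative choices, not anything an ESP mandates.

```python
import os
import time
import uuid
from http.server import BaseHTTPRequestHandler, HTTPServer

def save_package(body: bytes, spool_dir: str) -> str:
    """Write one raw webhook package to the spool directory, untouched."""
    os.makedirs(spool_dir, exist_ok=True)
    # Timestamp plus random suffix keeps concurrent writes from colliding.
    path = os.path.join(spool_dir, "%d-%s.json" % (time.time_ns(), uuid.uuid4().hex))
    with open(path, "wb") as f:
        f.write(body)
    return path

class CollectorHandler(BaseHTTPRequestHandler):
    """Accept the POST, spool the body, and return 200 as fast as possible."""
    spool_dir = "spool"

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        save_package(self.rfile.read(length), self.spool_dir)
        self.send_response(200)
        self.end_headers()

# To run a collector:
#   HTTPServer(("", 8080), CollectorHandler).serve_forever()
```

Note that the handler does nothing but spool and acknowledge; the reasons for that come up below.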
The challenge here is: what if the endpoint cannot be contacted? Maybe the receiving application stopped, the server was down, or there were internet issues. A good ESP will hold on to that data and keep retrying the package until it is received or a retry limit is reached. Most ESPs have two webhook restrictions: a limit on how long they will retry sending a specific payload, and/or a limit on how much data they hold while attempting to contact the collecting application. SparkPost will hold all message event data and keep trying to contact your application(s) for up to 8 hours! After that, if you need the missing data, you have to figure out what it was and pull it down via the RESTful API.
Now the problem is: how do you know what you didn’t receive? It’s one thing to keep a list of each email you sent and check for matching creation or sending events; it’s another to know whether you are missing an open, click, bounce, or unsubscribe event. With this issue in mind, I designed a fairly simple solution to the problem:
Send copies of the same data to multiple locations but keep track of all data in one location.
The number of Data Collectors is up to you and depends on the level of redundancy you want or require. Each Data Collector should run on a system disparate from the others; in fact, I would use a different data center for each Data Collector!
Using SparkPost data as an example, I would create a simple database with six fields:
- Primary key, text field named UID. A combination of three fields: the Transmission ID, the Message ID, and the Event ID.
- Indexed text field named TransmissionID. The transmission ID by itself. In SparkPost, all emails sent together share the same transmission ID across all of their events.
- Indexed text field named MessageID. The message ID by itself. Each email within a SparkPost transmission gets its own Message ID, and all events for that same email share it.
- Indexed text field named EventID. Each event has a unique ID.
- Indexed text field named email. The target email address.
- Text field named Raw. Holds the entire message event. This field needs to be fairly flexible and able to hold thousands of characters; most events are under a couple of thousand characters, but once in a while you may get an error event that carries your substitution data, which can be much larger.
By storing the data this way, I can search at multiple levels to quickly obtain email details. Optionally, you may find the need to index other fields as well, such as Campaign or Subject.
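The six fields above map directly onto a relational table. Here is one possible rendering, using SQLite purely for illustration; any relational store with a unique primary key and secondary indexes works the same way.

```python
import sqlite3

# Hypothetical schema for the six fields described above.
SCHEMA = """
CREATE TABLE IF NOT EXISTS message_events (
    UID            TEXT PRIMARY KEY,  -- TransmissionID:MessageID:EventID
    TransmissionID TEXT,
    MessageID      TEXT,
    EventID        TEXT,
    email          TEXT,
    Raw            TEXT               -- full event payload; can be large
);
CREATE INDEX IF NOT EXISTS idx_transmission ON message_events (TransmissionID);
CREATE INDEX IF NOT EXISTS idx_message      ON message_events (MessageID);
CREATE INDEX IF NOT EXISTS idx_event        ON message_events (EventID);
CREATE INDEX IF NOT EXISTS idx_email        ON message_events (email);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
```

The composite UID as primary key is what makes deduplication across multiple Data Collectors trivial, as described next.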
An important note: it’s a best practice for Data Collectors to do no processing on the received data; just get it and store it in a directory so another application can process it. The webhook servers may send new data immediately afterward and need the Data Collector free to receive the next package. Once the data is safely spooled, a separate application can parse and store it. Since you have more than one Data Collector, there is a high probability that the data will already exist in your data warehouse. If it’s already there, great; another Data Collector stored it. Just move on to the next record. By having multiple systems, you protect yourself from any one Data Collector being down and from having to figure out what data might have been missed.
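The "move on to the next record" behavior can be pushed down into the database by keying on the composite UID. A sketch of that processing step, assuming the six-field table described earlier; the event field names (transmission_id, message_id, event_id, rcpt_to) follow the general shape of SparkPost events but are illustrative here.

```python
import json
import sqlite3

def store_event(conn, event):
    """Insert one parsed webhook event; silently skip duplicates already
    stored by another Data Collector (INSERT OR IGNORE on the UID key)."""
    uid = "{}:{}:{}".format(
        event["transmission_id"], event["message_id"], event["event_id"])
    conn.execute(
        "INSERT OR IGNORE INTO message_events VALUES (?, ?, ?, ?, ?, ?)",
        (uid, event["transmission_id"], event["message_id"],
         event["event_id"], event.get("rcpt_to", ""), json.dumps(event)),
    )

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE message_events (UID TEXT PRIMARY KEY, TransmissionID TEXT, "
    "MessageID TEXT, EventID TEXT, email TEXT, Raw TEXT)")
```

Because the duplicate check rides on the primary key, two collectors feeding the same warehouse never need to coordinate with each other.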
Without going too deep, here are some details of the SparkPost webhook services:
- You can have as many Data Collectors as you want
- You can send any combination of the 22 different events to as many collectors as you want.
- If you have two Data Collectors requesting the same information, don’t expect them to receive identical packages of data. Both collectors will get all of the data, but the webhook services batch the data for each target endpoint independently, so the packages themselves will differ. It’s just how the system works.
- Expect your Data Collector to be called many times a second when you are sending emails. In one test, I saw a single collector called over 200 times in one minute while sending 5k emails.
Depending on your needs, if you keep a log of all transmission IDs for emails submitted into SparkPost, you can cross-check injection, delivery, and/or bounce events to make sure you have complete tracking data. If you are missing delivery data, you can selectively pull it with an API call, keeping your payload targeted and small. Since email delivery can sometimes be delayed, I would wait for a period (a couple of hours, perhaps) before cross-referencing entries and then pulling the missing data via API calls.
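The cross-check itself is a set difference. A minimal sketch, assuming you log the transmission IDs you injected and can scan the events received so far; the "type" field and the "delivery"/"bounce" values mirror SparkPost event types but are assumptions for this illustration.

```python
def missing_deliveries(sent_ids, received_events):
    """Return transmission IDs that have no delivery or bounce event yet.

    sent_ids: iterable of transmission IDs you injected.
    received_events: iterable of parsed event dicts from your warehouse.
    """
    resolved = {e["transmission_id"] for e in received_events
                if e.get("type") in ("delivery", "bounce")}
    return sorted(set(sent_ids) - resolved)
```

The returned IDs are exactly the ones worth pulling through a targeted RESTful API query, keeping that payload small.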
That’s it. In short, by accepting data from the ESP into multiple locations, you can create a robust system that collects all the data you require for auditing purposes. And since you are already gathering this information for auditing, with some slight changes you may be able to use it for great insights into your email program, helping you send better, more engaging emails at the times your users are most likely to read them; the insights are infinite!