Remember in Part 2 where we said that Health Score and Engagement Recency didn’t do much on PowerMTA alone, because it doesn’t have open/click tracking built-in? We address that in this part.
What’s the starting point?
If you’re an established PowerMTA sender, you may already have a setup similar to this:
If you’re a new PowerMTA customer and are contemplating building this, then read this first. Let’s explore challenges and tradeoffs you already have (or will face) with a homegrown solution, and show which parts you can simplify with Signals.
Some challenges if you build your own analytics stack
Your email message generation will wrap links and add open-tracking pixels, prior to injecting the message into PowerMTA for delivery. Not too difficult, but you need to consider performance – the whole email body (or at least the html part) passes through that code, and each recipient needs to have uniquely identifiable links.
Next, when a recipient engages with a message, an HTTP(S) request goes from the client to your open/click tracking service. This is essentially a special-purpose web-server:
- It registers an open by returning a transparent 1×1 pixel that’s essentially invisible in the rendered mail
- It handles a “click” by redirecting the client device to their final destination using an HTTP “302 redirect”
So far, so good – every ESP (Email Service Provider) does this. There are some established technology choices in this space such as NGINX, Apache, node.js and so on.
You need to consider peak time demand and engineer the web service accordingly. You also need to consider security, resilience, and availability. You should think about what happens if a bad actor tries to hit your endpoint with a flood of traffic, or if your links could easily be “spoofed” by someone wanting to mess with your statistics or just probe weaknesses in your infrastructure.
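One common defence against spoofed links is to sign each tracking path with a keyed hash, so the tracking service can reject anything it did not generate. Here is a minimal sketch, assuming HMAC-SHA256 and a made-up "payload.tag" path layout. The names signPath and verifyPath and the example key are hypothetical, and this is not necessarily how any particular tracker does it.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/base64"
	"fmt"
	"strings"
)

// signPath appends an HMAC-SHA256 tag to a tracking payload.
// The "payload.tag" layout is an assumption made for this sketch.
func signPath(key []byte, payload string) string {
	mac := hmac.New(sha256.New, key)
	mac.Write([]byte(payload))
	tag := base64.RawURLEncoding.EncodeToString(mac.Sum(nil))
	return payload + "." + tag
}

// verifyPath returns the payload only if the tag matches.
// hmac.Equal compares in constant time.
func verifyPath(key []byte, signed string) (string, bool) {
	i := strings.LastIndex(signed, ".")
	if i < 0 {
		return "", false
	}
	payload, tag := signed[:i], signed[i+1:]
	got, err := base64.RawURLEncoding.DecodeString(tag)
	if err != nil {
		return "", false
	}
	mac := hmac.New(sha256.New, key)
	mac.Write([]byte(payload))
	if !hmac.Equal(got, mac.Sum(nil)) {
		return "", false
	}
	return payload, true
}

func main() {
	key := []byte("example-secret") // hypothetical key; load from secure config in practice
	signed := signPath(key, "click/abc123")
	_, ok := verifyPath(key, signed)
	fmt.Println("genuine link accepted:", ok)
	_, ok = verifyPath(key, "click/zzz999"+signed[len("click/abc123"):])
	fmt.Println("tampered link accepted:", ok)
}
```

Signing does not help with traffic floods; that still needs rate limiting or similar protection in front of the service.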
Assuming you contain those issues, the fun really starts. You have to aggregate all those deliveries, bounces, opens and clicks from log files (which can get large) into databases (which can get HUGE). You have to decide for how long you’re going to keep this information. You have to host the database somewhere. You may use the word “Petabyte” more frequently in conversations at the water cooler!
To provide reasonable response times for your users, you’ll invest time tuning that database, choosing which fields to index for the best space/time tradeoff, experimenting with ways to speed up common queries, toying with various exotic technologies, and likely switching the underlying database platform a few times. The old adage “fast, cheap, good: pick two” applies.
You need to back it up and provide geo-redundancy in case of node or site failures, as well as plan how you will maintain your stack as technologies evolve in future.
Next, you need to build a User Interface to allow marketing people to easily view their campaign data. Again, not simple: front-end UI design is a fast-changing field, and it’s easy to end up with something that is “almost OK” but still annoying to use. There’s a reason why users hate in-house-developed applications – there’s never enough “customer” pressure to make it really good, while also keeping it simple.
Your developers also hate it, as they have an ever-lengthening queue of tickets waiting to be worked on (to “just add another minor feature”) – while having to put out fires elsewhere. You could buy an existing analytics suite such as Loggly, Splunk, Tableau, ELK and so on. These are powerful, general-purpose tools, designed for a wide range of IT operations, but are not specifically oriented towards the email/deliverability world. Once you have them running, you still have to build your own custom reports to see anything useful. Also, these tools ain’t cheap, particularly for large data volumes.
You also have to worry about keeping recipient details in that database. You need to care about privacy – not just GDPR, but an alphabet soup of new regulation coming along such as CCPA. How long do you keep recipient details? Do you anonymize after a time? Are your processes secure? Will you pass an on-site audit? How do you respect a recipient’s right to be forgotten from your systems?
Finally, your database only has your campaign delivery stats in it, so it has limited value as a benchmark. You can tell how well you’re doing compared to last week, last month and so on, but you don’t have a wide range of data to compare trends against. In contrast, SparkPost Signals analytics is based on the world’s largest email data footprint; your Health Score dashboard uses machine learning that’s trained on a huge data-set across thousands of senders, with a wide range of real-world deliverability scenarios.
Right, that’s probably enough to show we’re solving a hard problem. Let’s explore what we can do to find a simpler solution.
A simple engagement tracking / SparkPost Signals integration
Perfection is achieved, not when there is nothing more to add, but when there is nothing left to take away.
Antoine de Saint-Exupéry
We’ve seen in Part 1 and Part 2 how SparkPost Signals integrates easily with PowerMTA to provide email analytics, based on delivery, bounce, spam complaint and out-of-band bounce events. Unlike Momentum – covered in Part 3 of this series – PowerMTA does not have its own engagement tracking built in. The Signals Health Score requires at least open events to be meaningful.
PowerMTA users have their own message generation already and might have open/click tracking services. If you don’t, then read on – we have free open-source code for you!
Here’s the high-level architecture of what we are going to build:
Let’s set out some project goals:
- A “batteries included” implementation that covers everything needed for Engagement Tracking with PowerMTA and SparkPost Signals.
- You can pick the parts you need (i.e. highly modular). Maybe you just want the “Signals ingest” part, for example.
- Platform-independent (Linux, Windows, etc).
- Use high-performance, reliable, scalable technology, so that it could be used for millions of emails per hour, which suggests:
  - Choose a strongly typed, compiled language, with good multi-threading support.
  - Pick a database technology that’s lean and fast (while enabling other choices).
- Store a minimum of customer-specific data, and easily permit that data to expire and be deleted after a known time.
- Use PowerMTA and SparkPost Signals features to reduce the new code burden.
Here’s the next level of detail, expanding that blue box into separate processes.
Woah, I thought this would be simple, I hear you say. Well, each active part has just one job, and it’s really not that complex.
Message generation (colored green) is the same as before; it can be thought of as outside this project. We’re just using it to show where the messages are coming from. In fact, let’s assume we have really basic message generation that is not even capable of wrapping your html links or inserting open pixels. The new “wrapper” process will do that for you.
If you wish, your generator can add identifying headers to your messages, to leverage “reporting facets” in Signals:
- x-job (aka “campaign ID”) and
As we saw in Part 2, you can use these to provide more granular reporting of your message streams.
The “wrapper” is a simple “SMTP in, SMTP out” process that wraps your HTML links, adds tracking pixels to any text/html MIME parts present, and adds a unique x-sp-message-id header that will later tie the opens and clicks back to the specific email.
This acts as an SMTP reverse proxy that sits in the message flow (with TLS support both upstream and downstream), so it can be completely independent of your message generator. Alternatively you could call the wrapper code from your generator if you prefer.
Tracking AMP HTML MIME parts is a possible future extension to this project.
The acct_etl (extract, transform, load) process takes message delivery records from PowerMTA and stores them as a key-value pair in a database, for fast lookup by the feeder process. It uses the accounting pipe feature of PowerMTA. Each time a message is delivered, PowerMTA sends a text record to our program containing fields we specify in PowerMTA config. In our case, we want the message_id, recipient, and (if present) the SparkPost subaccount ID.
Each record will be quite small – around 100 bytes – and will be given a “time to live” before automatic deletion, to safeguard PII (Personally Identifiable Information) and minimize storage space. These records enrich the engagement-tracking, but are not essential.
The tracker is a web service that decodes and acts upon client email opens and clicks:
- Open pixel requests are served a transparent tracking pixel.
- Clicked link requests are served a 302 redirect, causing the user’s email client or web browser to go to the desired landing-page destination.
The server responds quickly because minimal processing is done before giving the HTTP response. The opens and clicks are pushed into a Redis queue to the feeder task. Which brings us to…
The feeder process takes the opens and clicks from the Redis queue and feeds them to the SparkPost Ingest API.
If you are keeping your existing open/click tracking but wish to upload events to SparkPost Signals, this may be the only piece of code you need (with some code adaptation to suit your own data sources). At least this can give you an example of how to format the ingest stream.
All these modules are in this GitHub project, with installation and configuration instructions.
Let’s start with an in-depth look at the feeder process. We’ll cover the other processes in forthcoming blog articles.
More on the feeder process
The command-line for feeder is very simple – just give it an optional log filename to write to.
$ ./feeder -h
Takes the opens and clicks from the Redis queue and feeds them to the SparkPost Ingest API
Requires environment variable SPARKPOST_API_KEY_INGEST and optionally SPARKPOST_HOST_INGEST
Usage of ./feeder:
  -logfile string
        File written with message logs
If you omit -logfile, output will go to the console (stdout).
The SparkPost ingest API key (and optionally, the host base URL) is passed in environment variables:
export SPARKPOST_API_KEY_INGEST=###your API key here##
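For readers adapting this to their own code, here is a minimal sketch of how a Go program might pick up those two variables. The names loadIngestConfig and ingestConfig are hypothetical, and the fallback host https://api.sparkpost.com is an assumption for this example; check the project README for the default the feeder really uses.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"strings"
)

// defaultHost is an assumed fallback for this sketch, not taken from the project.
const defaultHost = "https://api.sparkpost.com"

// ingestConfig holds the settings read from the environment.
type ingestConfig struct {
	APIKey string
	Host   string
}

// loadIngestConfig reads the required API key and the optional host.
// The getenv parameter makes the function easy to exercise without
// touching the real environment.
func loadIngestConfig(getenv func(string) string) (ingestConfig, error) {
	key := getenv("SPARKPOST_API_KEY_INGEST")
	if key == "" {
		return ingestConfig{}, errors.New("SPARKPOST_API_KEY_INGEST is not set")
	}
	host := getenv("SPARKPOST_HOST_INGEST")
	if host == "" {
		host = defaultHost
	}
	return ingestConfig{APIKey: key, Host: strings.TrimRight(host, "/")}, nil
}

func main() {
	cfg, err := loadIngestConfig(os.Getenv)
	if err != nil {
		fmt.Println("configuration error:", err)
		return
	}
	fmt.Println("ingest host:", cfg.Host)
}
```

Failing fast on a missing key is usually kinder than discovering it at the first upload attempt.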
You’ll typically want to run this as a background process on startup – see the project README, cronfile and start.sh for examples of how to do that.
Here’s a typical log content as it runs:
2019/12/13 16:00:39 Uploaded 625498 bytes raw, 32628 bytes gzipped. SparkPost Ingest response: 200 OK,
2019/12/13 16:10:40 Uploaded 445221 bytes raw, 23477 bytes gzipped. SparkPost Ingest response: 200 OK,
2019/12/13 16:20:40 Uploaded 579650 bytes raw, 29571 bytes gzipped. SparkPost Ingest response: 200 OK,
feeder – code internals
The main package is in cmd/feeder/feeder.go, and makes use of functions in the sparkypmtatracking package, including feed.go and others.
The function main() gathers the logfile, SparkPost API, and Redis resources needed; then calls feedForever(), which waits for events to arrive in the Redis queue.
Open, initial open, and click events are formatted by sparkPostEventNDJSON() and makeSparkPostEvent(). Together these unmarshal the event from the internal compact format (the TrackEvent type), augment it with message-ID keyed delivery info from Redis, and produce a SparkPostEvent struct (see file eventdefs.go).
SparkPost events require some attributes to be present in a specific format:
- delv_method is set to the constant string esmtp
- event_id is set to a string, carrying a unique decimal value in the range 0 .. (2^63-1)
The message_id attribute is a string, unique per message, of length 20 characters in a specific hex format (which is added by the wrapper process). More on that later.
The timestamp attribute is in Unix epoch format.
Here’s an example event:
"rcpt_to": "[email protected]",
"user_agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.130 Safari/537.36",
Batches of events are collected by feedForever() in a byte buffer, until either the max batch size (currently 5MB) is reached, or the buffer content has matured for a set time (currently 30 seconds). Then sparkPostIngest() uploads that batch, using the Gzip encoding required by the /ingest API endpoint.
An idea for future work: the code could be extended to populate geo_ip information (with a service such as MaxMind), similar to what the SparkPost cloud delivery service does.
Hints and tips on tooling
If you’re working with Go, I highly recommend the free VS Code editor, with the Go ‘delve’ debugger plugin.
In this article, we’ve looked at:
- What you might already have, as a PowerMTA user, and why running this is harder than it looks
- A simple engagement tracking / SparkPost Signals integration
- A detailed look at the “feeder” process for uploading opens and clicks, including an example JSON format event.
In the next article, we’ll continue a walk-through of the other parts, including tracker, acct_etl and wrapper.