• This blog post was originally published on March 30, 2020 and was updated on May 7, 2020. PowerMTA now has built-in Engagement Tracking that can be used instead of this code, see here for details.

Introduction

 We’ve reached Part 5 in this trilogy, and are in for a gripping finale. The story so far:

  • Part 1 – Introducing SparkPost Signals for on-premises deployments
  • Part 2 – Setting up PowerMTA step-by-step
  • Part 3 – Setting up Momentum
  • Part 4 – PowerMTA with Signals Engagement tracking, which we could paraphrase as “Marvin said your diodes would hurt, if you try to build your own analytics”.

First, a recap of the architecture we’re building:

[Figure: Full architecture of the PowerMTA tracking solution]
The analytics heavy-lifting is offloaded to the SparkPost cloud – making your long-term storage, querying performance, resilience and privacy issues mostly Somebody Else’s Problem.

We covered the feeder process in Part 4. If you have your own engagement tracking stack already, this may be all you need – which is why we covered it first. In this article, we’ll work through the tracker, acct_etl and wrapper processes. Don’t Panic if this looks complicated – it really isn’t that bad.

Before we go into detail, a word of encouragement. The code is quite easy to install and build from source – step-by-step instructions here. Go takes care of the library dependencies for you.

If you just want to run the code as-is, the project has pre-built binaries (executables) for Linux, Mac OSX, Windows and FreeBSD, in 32-bit and 64-bit versions for download here, saving you from mucking around with building from source.

Either way, the run-time prerequisites are Redis and (optional, but recommended) NGINX; there are installation tips in the README.

Tracker

This is a web service that decodes and acts upon client email opens and clicks:

  • Open pixel requests are served a transparent tracking pixel.
  • Clicked link requests are served a 302 redirect, causing the user’s email client or web browser to go to the desired landing-page destination.

If you omit -logfile, output will go to the console (stdout) – in fact that behavior is the same for all the programs in this project.

The logfile records the action (open/click), target URL, datetime, user_agent, and remote (client) IP address.

Inside Tracker

Go makes it easy to create robust, highly scalable web services that can support thousands of simultaneous sessions, with just a few lines of code.

The main program function can be found in cmd/tracker/tracker.go; it just reads command-line arguments and starts the web service, which is in track_srv.go.

Function TrackingServer does the actual work of reading the incoming URL path, which carries minified JSON that has been Zlib-compressed and then base64-encoded (URL-safe).
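To make that layering concrete, here is a minimal round-trip sketch using only Go's standard library. The linkPayload struct, its JSON field names, and the encodePath / decodePath helpers are assumptions made for illustration; the project's own types and field names may well differ.

```go
package main

import (
	"bytes"
	"compress/zlib"
	"encoding/base64"
	"encoding/json"
	"fmt"
	"io"
)

// linkPayload is a hypothetical shape for the JSON carried in a tracked URL.
// The real project's field names may differ.
type linkPayload struct {
	Action    string `json:"act"`
	TargetURL string `json:"t_url,omitempty"`
	MessageID string `json:"msg_id"`
}

// encodePath marshals to minified JSON, zlib-compresses it, then applies
// URL-safe base64 so the result can travel in a URL path.
func encodePath(p linkPayload) (string, error) {
	js, err := json.Marshal(p)
	if err != nil {
		return "", err
	}
	var buf bytes.Buffer
	zw := zlib.NewWriter(&buf)
	if _, err := zw.Write(js); err != nil {
		return "", err
	}
	if err := zw.Close(); err != nil {
		return "", err
	}
	return base64.URLEncoding.EncodeToString(buf.Bytes()), nil
}

// decodePath reverses encodePath: base64 decode, zlib inflate, JSON unmarshal.
func decodePath(s string) (linkPayload, error) {
	var p linkPayload
	raw, err := base64.URLEncoding.DecodeString(s)
	if err != nil {
		return p, err
	}
	zr, err := zlib.NewReader(bytes.NewReader(raw))
	if err != nil {
		return p, err
	}
	defer zr.Close()
	js, err := io.ReadAll(zr)
	if err != nil {
		return p, err
	}
	err = json.Unmarshal(js, &p)
	return p, err
}

func main() {
	in := linkPayload{Action: "click", TargetURL: "https://example.com/", MessageID: "0000123456789abcdef0"}
	enc, err := encodePath(in)
	if err != nil {
		panic(err)
	}
	out, err := decodePath(enc)
	if err != nil {
		panic(err)
	}
	fmt.Println("encoded path:", enc)
	fmt.Printf("decoded: %+v\n", out)
}
```

Compressing before encoding keeps the path short for typical JSON, and the URL-safe base64 alphabet avoids the "+" and "/" characters that would otherwise need escaping in a path.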

Each event is augmented with:

  • The event type (open, initial_open, click)
  • The user agent (which identifies the client’s browser type and version)
  • The timestamp (time of opening / clicking)
  • The client IP address

The event is then sent to the Redis queue for the feeder task (using RPUSH).

Finally, if the request is valid, TrackingServer returns an HTTP response containing a GIF tracking pixel (for “open” and “initial_open” actions) or a 302 redirect (for “click” actions).

Automated unit tests for this module are in file track_test.go. You can run these standalone (using go test -cover, for example). They use the httptest library to mock incoming requests. Tests run automatically on Travis CI each time a new version of code is checked in, and code coverage is measured with Coveralls.

Using NGINX proxy

Public web service endpoints like this are usually protected with a proxy such as NGINX.  NGINX can act as the TLS termination proxy, so tracked links and opens can be served via HTTPS.

The project includes an example NGINX config, for you to review and adapt to your own setup. Once running, it’s a good idea to test your endpoint with an external tool such as SSL Labs. If all is well, you will achieve an “A” rating.


The mail recipient’s client IP address is useful information for your email analytics. SparkPost will display the client IP for you in open and click event searches, webhook feeds and so on. Unfortunately, a proxy would make this invisible to our tracker app! Fortunately, NGINX has an option for this:

This tells NGINX to forward the client IP address to TrackingServer via the HTTP header X-Real-IP. If TrackingServer cannot find this header, it will use the regular client IP address it sees, which will be “localhost” 127.0.0.1 as our NGINX is on the same host.

Responses back to the client also have the outgoing Server header set in our example NGINX config to value msys-http. This overrides what the app sends, and you can change this to your own value. This uses an optional NGINX feature called headers-more, so depending on your platform you may need specific installation steps – see here.

Acct_etl

The acct_etl (extract, transform, load) process takes message delivery records from PowerMTA and stores them as key-value pairs in a database, for fast lookup by the feeder process. It uses the accounting pipe feature of PowerMTA. Unlike the others, this program does not run continuously. Instead, PowerMTA runs it when needed. The start.sh program sets this up for you, by copying (and changing ownership of) the binary. The program location needs to match the PowerMTA config (example here).

Each time a message is delivered, PowerMTA sends a text record to this program containing the fields we specify in the PowerMTA config. The program gets that record on stdin (hence the name “accounting pipe”). In our case, we want the message_id, recipient, and (if present) the subaccount ID.

Each record will be quite small – around 100 bytes. Each is given a time to live in Redis, to safeguard your recipient PII (personally identifiable information) and minimize storage space. Redis automatically deletes a record for us as its time-to-live expires. In the meantime, your engagement-tracking data will be augmented with the stored information and you’ll really know where your towel is.

The acct_etl code follows the same pattern as before – a small main program function in cmd/acct_etl/acct_etl.go. This takes an optional input file for testing, and a logfile option.

The real work is done in etl.go. Function AccountETL scans the input records and decides if this is a header row or a data row. The header row can vary, depending on your PowerMTA configuration, but must provide a minimum set of fields – “type” and “header_x-sp-message-id”. 

The “rcpt” and “header_x-sp-subaccount-id” fields are optional, but provide additional info to augment the events with. Because each program invocation from PowerMTA is a separate process, Redis is used to persist the header field names and positions. You can snoop on the current value that acct_etl is using, with:

Finally, automated unit tests for this module are in file etl_test.go, which feeds in various canned .CSV inputs to exercise the different error paths.

Wrapper

This is a fairly large program – it has a lot to do:

  • Run an SMTP reverse proxy that sits in your message flow (effectively a client-server sandwich)
  • Provide TLS support both upstream (to the MTA) and downstream (to the client), using a certificate/key pair that you provide
  • Pass SMTP commands, authentication, DATA and responses through transparently
  • Pick apart the MIME parts in each email message payload, looking for text/html parts
  • Wrap the email html links, add tracking pixels, and add a unique x-sp-message-id header that will later tie the opens and clicks back to the specific email

Wrapper provides control of this process via many command-line flags, which are described in the README.

The core SMTP protocol handling is done in a standalone Go package, go-smtpproxy. This is based on a nice existing SMTP project, with changes for command-response transparency and with the internal authentication mechanisms removed, so that we rely on the upstream MTA instead.

The Go language provides interfaces. The go-smtpproxy package exposes a few that wrapper uses:

  • Backend: Method Init is called by the proxy at the start of an attempted connection. Our app uses this to hold specifics such as upstream port number, logging options, html wrapping settings etc.
  • Session: A session is created by the proxy once an incoming HELO/EHLO is received. Our app uses this to hold specifics such as its Backend (for logging), and its associated upstream Client connection. While the proxy takes care of the SMTP protocol, message sequence and responses, the app provides functions to respond to each incoming SMTP command/phase. These include Greet, StartTLS, Auth, Mail, Rcpt, DataCommand, Data, Reset, Quit. Each of these functions results in communication upstream through its Client.
  • Client: go-smtpproxy provides functions that enable the app to drive the upstream SMTP conversation with the MTA. Many of these are essentially “passthru”, but Hello, StartTLS, Data, Close have a bit more work to do.

This separation keeps the wrapper specifics short and sweet, given the complexity of the task. wrap_smtp.go is mostly Backend and Session structures, connection upgrade to TLS, human-readable logging, command passthru, and the Data phase.

The Data phase is where the interesting things happen. The function MailCopy provides an important part of this. Similar to the classic Go library io.Copy pattern, it streams content from an io.Reader to an io.Writer, returning any error found along the way. When wrapping is inactive, it actually just does an io.Copy and returns the result.

When wrapping is active, a series of functions decompose and reassemble the message body layers – following the email message syntax (RFC 2822, now RFC 5322) and MIME syntax (RFC 2045, with multipart bodies defined in RFC 2046). The code calling tree follows this structure:

TrackHTML is in wrap_html.go. At this point, we already know we’re dealing with a text/html MIME part. The html.Tokenizer from Go’s golang.org/x/net/html package is used to whizz through the message, looking for:

  • <A HREF>: a link. The raw URL is replaced with a tracked URL.
  • <BODY>: a top “initial” open-tracking HTML fragment is inserted, containing an open pixel with a tracked URL.
  • </BODY>: the bottom open-tracking HTML fragment is inserted, containing an open pixel with a tracked URL.


The tokenizer loop calls supporting functions WrapURL, InitialOpenPixel and OpenPixel. Also here: EncodeLink and DecodeLink functions, used by a small command-line program linktool, described below.

The unit test code in wrap_html_test.go essentially passes a variety of different HTML samples through the wrapping function, checking they come out as expected.

The wrap_smtp_test.go code does something a bit more ambitious. It creates a client / proxy / mock upstream server “sandwich” comprising three goroutines, and passes a variety of whole emails through, including exercising the upstream/downstream TLS negotiation with self-signed certs.

Phew! Wrapper is the most complex part – it felt a bit like the Long Dark Tea-Time of the Soul. We’ll go more Gently on the next one.

linktool

This one’s nice and simple – it’s a command-line tool for encoding and decoding wrapped links – useful during testing, not part of the main project run-time code. For example, you can make a wrapped link using

Which will give you the following cryptic-looking output, resembling Vogon Poetry:

Then you can decode this back again:

This shows you the raw JSON inside the link, and the equivalent encode flags:

Summary

In this article, we’ve looked at the remaining pieces of the engagement-tracking project, including tracker, acct_etl, and (the big one) wrapper, with a side order of linktool.

Thanks for hitching a ride through this series! You have reached the Engagement Tracker at the End of the Universe. It feels like we’ve journeyed through Life, the Universe and Everything in between. I wish you “So Long, and Thanks for All the Fish”.

If you use this code, let us know! You should find it to be Mostly Harmless, and perhaps useful. We love feedback – you can get in touch via GitHub (open an Issue), Community Slack, or Twitter (@SparkPost, @tuck1s).

~ Steve