• This blog post was originally published on March 30, 2020 and was updated on May 7, 2020. PowerMTA now has built-in Engagement Tracking that can be used instead of this code, see here for details.

Introduction

 We’ve reached Part 5 in this trilogy, and are in for a gripping finale. The story so far:

  • Part 1 – Introducing SparkPost Signals for on-premises deployments
  • Part 2 – Setting up PowerMTA step-by-step
  • Part 3 – Setting up Momentum
  • Part 4 – PowerMTA with Signals Engagement tracking, which we could paraphrase as “Marvin said your diodes would hurt, if you try to build your own analytics”.

First, a recap of the architecture we’re building:

Full architecture of PowerMTA tracking solution
The analytics heavy-lifting is offloaded to the SparkPost cloud – making your long-term storage, querying performance, resilience and privacy issues mostly Somebody Else’s Problem.

We covered the feeder process in Part 4. If you have your own engagement tracking stack already, this may be all you need – which is why we covered it first. In this article, we’ll work through the tracker, acct_etl and wrapper processes. Don’t Panic if this looks complicated – it really isn’t that bad.

Before we go into detail, a word of encouragement. The code is quite easy to install and build from source – step-by-step instructions here. Go takes care of the library dependencies for you.

If you just want to run the code as-is, the project has pre-built binaries (executables) for Linux, Mac OSX, Windows and FreeBSD, in 32-bit and 64-bit versions for download here, saving you from mucking around with building from source.

Either way, the run-time prerequisites are Redis and (optional, but recommended) NGINX; there are installation tips in the README.

Tracker

This is a web service that decodes and acts upon client email opens and clicks:

  • Open pixel requests are served a transparent tracking pixel.
  • Clicked link requests are served a 302 redirect, causing the user’s email client or web browser to go to the desired landing-page destination.

./tracker -h
Web service that decodes client email opens and clicks
Runs in plain mode, it should proxied (e.g. by nginx) to provide https and protection.
Usage of ./tracker:
  -in_hostport string
        host:port to serve incoming HTTP requests (default ":8888")
  -logfile string
        File written with message logs

If you omit -logfile, output will go to the console (stdout) – in fact that behavior is the same for all the programs in this project.

The logfile records the action (open/click), target URL, datetime, user_agent, and remote (client) IP address:

2020/01/09 15:40:27 Timestamp 1578584427, IPAddress 127.0.0.1, UserAgent Mozilla/5.0 (Linux; Android 4.4.2; XMP-6250 Build/HAWK) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/30.0.0.0 Safari/537.36 ADAPI/2.0 (UUID:9e7df0ed-2a5c-4a19-bec7-2cc54800f99d) RK3188-ADAPI/1.2.84.533 (MODEL:XMP-6250), Action c, URL http://example.com/index.html, MsgID 00006449175e39c767c2
2020/01/09 15:40:27 Timestamp 1578584427, IPAddress 127.0.0.1, UserAgent Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.157 Safari/537.36, Action o, URL , MsgID 00006449175eea2bd529

Inside Tracker

Go makes it easy to create robust, highly scalable web services that can support thousands of simultaneous sessions, with just a few lines of code.

The main program function can be found in cmd/tracker/tracker.go; it just reads command-line arguments and starts the web service, which is in track_srv.go.

Function TrackingServer  does the actual work of reading the incoming URL path, which carries base64-encoded (URL safe), Zlib-compressed, minified JSON.
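The encoding layers described above can be sketched as a round trip. This is a minimal illustration rather than the project's actual code: the JSON field names are taken from the linktool output shown later in this article, while the type and function names (trackedLink, encodePath, decodePath) are made up for the example.

```go
package main

import (
	"bytes"
	"compress/zlib"
	"encoding/base64"
	"encoding/json"
	"fmt"
	"io"
)

// trackedLink mirrors the JSON fields printed by linktool.
// The Go type itself is hypothetical.
type trackedLink struct {
	Action    string `json:"act"`
	TargetURL string `json:"t_url"`
	MessageID string `json:"msg_id"`
	RcptTo    string `json:"rcpt"`
}

// encodePath marshals the link to minified JSON, zlib-compresses it,
// and base64-encodes the result using the URL-safe alphabet.
func encodePath(l trackedLink) (string, error) {
	js, err := json.Marshal(l)
	if err != nil {
		return "", err
	}
	var buf bytes.Buffer
	zw := zlib.NewWriter(&buf)
	if _, err := zw.Write(js); err != nil {
		return "", err
	}
	if err := zw.Close(); err != nil {
		return "", err
	}
	return base64.URLEncoding.EncodeToString(buf.Bytes()), nil
}

// decodePath reverses encodePath: base64 decode, zlib inflate, JSON unmarshal.
func decodePath(p string) (trackedLink, error) {
	var l trackedLink
	raw, err := base64.URLEncoding.DecodeString(p)
	if err != nil {
		return l, err
	}
	zr, err := zlib.NewReader(bytes.NewReader(raw))
	if err != nil {
		return l, err
	}
	defer zr.Close()
	js, err := io.ReadAll(zr)
	if err != nil {
		return l, err
	}
	err = json.Unmarshal(js, &l)
	return l, err
}

func main() {
	link := trackedLink{
		Action:    "c",
		TargetURL: "https://thetucks.com",
		MessageID: "00000deadbeeff00d1337",
		RcptTo:    "fred@thetucks.com",
	}
	p, err := encodePath(link)
	if err != nil {
		fmt.Println("encode error:", err)
		return
	}
	fmt.Println("path:", p)
	back, err := decodePath(p)
	fmt.Println("decoded:", back, "error:", err)
}
```

The exact bytes produced depend on the compression settings, so the path printed here need not match the output of linktool character for character.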

Each event is augmented with:

  • The event type (open, initial_open, click)
  • The user agent (which identifies the client’s browser type and version)
  • The timestamp (time of opening or clicking)
  • The client IP address

The augmented event is then sent to the Redis queue for the feeder task (using `RPUSH`).

Finally, if the request is valid, TrackingServer returns an HTTP response containing a GIF tracking pixel (for “open” and “initial_open” actions) or a 302 redirect (for “click” actions).
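As a rough sketch of that response logic, the handler below picks a reply based on the decoded action. The action codes "o" and "c" appear in the log lines above; using "i" for initial_open is an assumption, and the function names (respond, probe) are hypothetical. It uses httptest to drive the handler without opening a socket.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// transparentGIF is a commonly used 43-byte, 1x1 transparent GIF image.
var transparentGIF = []byte{
	0x47, 0x49, 0x46, 0x38, 0x39, 0x61, 0x01, 0x00, 0x01, 0x00, 0x80, 0x00,
	0x00, 0x00, 0x00, 0x00, 0xff, 0xff, 0xff, 0x21, 0xf9, 0x04, 0x01, 0x00,
	0x00, 0x00, 0x00, 0x2c, 0x00, 0x00, 0x00, 0x00, 0x01, 0x00, 0x01, 0x00,
	0x00, 0x02, 0x02, 0x44, 0x01, 0x00, 0x3b,
}

// respond writes the reply for an already-decoded tracking event:
// a pixel for opens, a 302 redirect for clicks, 400 for anything else.
func respond(w http.ResponseWriter, r *http.Request, action, targetURL string) {
	switch action {
	case "o", "i": // "i" for initial_open is an assumed code
		w.Header().Set("Content-Type", "image/gif")
		w.Write(transparentGIF)
	case "c":
		http.Redirect(w, r, targetURL, http.StatusFound)
	default:
		http.Error(w, "bad request", http.StatusBadRequest)
	}
}

// probe runs respond against an in-memory recorder and returns the result.
func probe(action, targetURL string) *httptest.ResponseRecorder {
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodGet, "/anything", nil)
	respond(rec, req, action, targetURL)
	return rec
}

func main() {
	open := probe("o", "")
	fmt.Println(open.Code, open.Header().Get("Content-Type"))

	click := probe("c", "http://example.com/index.html")
	fmt.Println(click.Code, click.Header().Get("Location"))
}
```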

Automated unit tests for this module are in file track_test.go. You can run these standalone (using go test -cover, for example). They use the httptest library to mock incoming requests. Tests run automatically on Travis CI each time a new version of code is checked in, and code coverage is measured with Coveralls.

Using NGINX proxy

Public web service endpoints like this are usually protected with a proxy such as NGINX.  NGINX can act as the TLS termination proxy, so tracked links and opens can be served via HTTPS.

The project includes an example NGINX config, for you to review and adapt to your own setup. Once running, it’s a good idea to test your endpoint with an external tool such as SSL Labs. If all is well, you will achieve an “A” rating.

The mail recipient’s client IP address is useful information for your email analytics. SparkPost will display the client IP for you in open and click event searches, webhook feeds and so on. Unfortunately, a proxy would hide this address from our tracker app. Fortunately, NGINX has an option for this:

proxy_set_header X-Real-IP $remote_addr;

This tells NGINX to forward the client IP address to TrackingServer via the HTTP header X-Real-IP. If TrackingServer cannot find this header, it will use the regular client IP address it sees, which will be “localhost” 127.0.0.1 when NGINX is on the same host.
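The header-then-fallback lookup could look something like the sketch below. The function names (clientIP, requestFrom) are hypothetical, and the project's own implementation may differ in detail.

```go
package main

import (
	"fmt"
	"net"
	"net/http"
	"net/http/httptest"
)

// clientIP prefers the X-Real-IP header set by the proxy and falls back
// to the address of the directly connected peer.
func clientIP(r *http.Request) string {
	if ip := r.Header.Get("X-Real-IP"); ip != "" {
		return ip
	}
	host, _, err := net.SplitHostPort(r.RemoteAddr)
	if err != nil {
		return r.RemoteAddr // not in host:port form, return as-is
	}
	return host
}

// requestFrom builds a test request with the given peer address and,
// optionally, an X-Real-IP header.
func requestFrom(remoteAddr, realIP string) *http.Request {
	req := httptest.NewRequest(http.MethodGet, "/", nil)
	req.RemoteAddr = remoteAddr
	if realIP != "" {
		req.Header.Set("X-Real-IP", realIP)
	}
	return req
}

func main() {
	fmt.Println(clientIP(requestFrom("127.0.0.1:54321", "203.0.113.7")))
	fmt.Println(clientIP(requestFrom("127.0.0.1:54321", "")))
}
```

A header like this is only trustworthy when the service is reachable solely through the proxy, since any client can send its own X-Real-IP value to a directly exposed port.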

Responses back to the client also have the outgoing Server header set in our example NGINX config to value msys-http. This overrides what the app sends, and you can change this to your own value. This uses an optional NGINX feature called headers-more, so depending on your platform you may need specific installation steps – see here.

Acct_etl

The acct_etl  (extract, transform, load) process takes message delivery records from PowerMTA and stores them as a key-value pair in a database, for fast lookup by the feeder process. It uses the accounting pipe feature of PowerMTA. Unlike the others, this program does not run continuously. Instead, it’s run by PowerMTA when needed. The start.sh program sets this up for you, by copying (and changing ownership of) the binary. The program location needs to match the PowerMTA config (example here).

Each time a message is delivered, PowerMTA sends a text record to this program containing fields we specify in PowerMTA config. The program gets that record on stdin  (hence the name “accounting pipe”). In our case, we want the message_id, recipient, and (if present) the subaccount ID.

Each record will be quite small – around 100 bytes. Each is given a time to live in Redis, to safeguard your recipient PII (personally identifiable information) and minimize storage space. Redis automatically deletes a record for us as its time-to-live expires. In the meantime, your engagement-tracking data will be augmented with the stored information and you’ll really know where your towel is.

The acct_etl  code follows the same pattern as before – a small main program function in cmd/acct_etl/acct_etl.go. This takes an optional input file for testing, and a logfile option. 

The real work is done in etl.go. Function AccountETL scans the input records and decides if this is a header row or a data row. The header row can vary, depending on your PowerMTA configuration, but must provide a minimum set of fields – “type” and “header_x-sp-message-id”. 

The “rcpt” and “header_x-sp-subaccount-id” fields are optional, but provide additional info to augment the events with. Because each program invocation from PowerMTA is a separate process, Redis is used to persist the header field names and positions. You can snoop on the current value that acct_etl  is using, with:

redis-cli get acct_headers

"{\"header_x-sp-message-id\":2,\"header_x-sp-subaccount-id\":3,\"rcpt\":1,\"type\":0}"

Finally, automated unit tests for this module are in file etl_test.go  which feeds in various canned .CSV inputs to exercise different error paths etc. 
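To make the header-row handling more concrete, here is a small sketch that maps field names to column positions and checks for the two required fields. The function names (headerPositions, headerJSON) are hypothetical, and the example assumes the header row arrives as a single comma-separated line.

```go
package main

import (
	"encoding/csv"
	"encoding/json"
	"errors"
	"fmt"
	"strings"
)

// requiredFields are the minimum fields named in the text above.
var requiredFields = []string{"type", "header_x-sp-message-id"}

// headerPositions parses one header row and returns field name -> column index.
func headerPositions(row string) (map[string]int, error) {
	fields, err := csv.NewReader(strings.NewReader(row)).Read()
	if err != nil {
		return nil, err
	}
	pos := make(map[string]int, len(fields))
	for i, name := range fields {
		pos[name] = i
	}
	for _, name := range requiredFields {
		if _, ok := pos[name]; !ok {
			return nil, errors.New("missing required field " + name)
		}
	}
	return pos, nil
}

// headerJSON returns the positions as JSON, a form that could be stored
// under a single Redis key between program invocations.
func headerJSON(row string) (string, error) {
	pos, err := headerPositions(row)
	if err != nil {
		return "", err
	}
	js, err := json.Marshal(pos)
	return string(js), err
}

func main() {
	js, err := headerJSON("type,rcpt,header_x-sp-message-id,header_x-sp-subaccount-id")
	fmt.Println(js, err)

	_, err = headerJSON("rcpt,bounce_cat")
	fmt.Println(err)
}
```

Go's encoding/json writes map keys in sorted order, so the first line printed has the same shape as the acct_headers value shown by redis-cli above.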

Wrapper

This is a fairly large program – it has a lot to do:

  • Run an SMTP reverse proxy that sits in your message flow (effectively a client-server sandwich)
  • Provide TLS support both upstream (to the MTA) and downstream (to the client), using a certificate/key pair that you provide
  • Pass SMTP commands, authentication, DATA and responses through transparently
  • Pick apart the MIME parts in each email message payload, looking for text/html parts
  • Wrap the email html links, add tracking pixels, and add a unique x-sp-message-id header that will later tie the opens and clicks back to the specific email

Wrapper provides control of this process via many command-line flags, which are described in the README.

SMTP proxy that accepts incoming messages from your downstream client, applies engagement-tracking
(wrapping links and adding open tracking pixels) and relays on to an upstream server.
Usage of ./wrapper:
  -certfile string
    	Certificate file for this server
  -downstream_debug string
    	File to write downstream server SMTP conversation for debugging
  -in_hostport string
    	Port number to serve incoming SMTP requests (default "localhost:587")
  -insecure_skip_verify
    	Skip check of peer cert on upstream side
  -logfile string
    	File written with message logs (also to stdout)
  -out_hostport string
    	host:port for onward routing of SMTP requests (default "smtp.sparkpostmail.com:587")
  -privkeyfile string
    	Private key file for this server
  -track_click
    	Wrap links in HTML mail, to track clicks
  -track_initial_open
    	Insert an initial_open tracking pixel at top of HTML mail
  -track_open
    	Insert an open tracking pixel at bottom of HTML mail (default true)
  -tracking_url string
    	URL of your tracking service endpoint (default "http://localhost:8888")
  -upstream_data_debug string
    	File to write upstream DATA for debugging
  -verbose
    	print out lots of messages

The core SMTP protocol handling is done in a standalone Go package go-smtpproxy. This is based on a nice existing SMTP project, but with changes for command-response transparency, removing internal authentication mechanisms so we rely on the upstream MTA etc.

The Go language provides interfaces. The go-smtpproxy package exposes a few that wrapper uses:

  • Backend: Method Init is called by the proxy at the start of an attempted connection. Our app uses this to hold specifics such as upstream port number, logging options, html wrapping settings etc.
  • Session: A session is created by the proxy once an incoming HELO/EHLO is received. Our app uses this to hold specifics such as its Backend (for logging), and its associated upstream Client connection. While the proxy takes care of the SMTP protocol, message sequence and responses, the app provides functions to respond to each incoming SMTP command/phase. These include Greet, StartTLS, Auth, Mail, Rcpt, DataCommand, Data, Reset, Quit. Each of these functions results in communication upstream through its Client.
  • Client: go-smtpproxy provides functions that enable the app to drive the upstream SMTP conversation with the MTA. Many of these are essentially “passthru”, but Hello, StartTLS, Data, Close have a bit more work to do.

This separation keeps the wrapper specifics short and sweet, given the complexity of the task. wrap_smtp.go  is mostly Backend and Session structures, connection upgrade to TLS, human-readable logging, command passthru, and the Data phase.

The Data phase is where the interesting things happen. The function MailCopy provides an important part of this. Similar to the classic Go library io.Copy pattern, it streams content from an io.Reader to an io.Writer, returning any error found along the way. When wrapping is inactive, it actually just does an io.Copy and returns the result.
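The shape of that pattern can be shown with a deliberately simplified stand-in. This is not the project's MailCopy: when wrapping is on, the sketch only prepends an x-sp-message-id header line and leaves the body untouched, whereas the real code also rewrites the HTML parts. The names copyMessage and run are hypothetical.

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

// copyMessage streams src to dst and returns the bytes written, like io.Copy.
// With wrap false it is a plain io.Copy. With wrap true it first writes a
// tracking header line, then copies the original message after it.
func copyMessage(dst io.Writer, src io.Reader, wrap bool, msgID string) (int64, error) {
	if !wrap {
		return io.Copy(dst, src)
	}
	n, err := io.WriteString(dst, "x-sp-message-id: "+msgID+"\r\n")
	if err != nil {
		return int64(n), err
	}
	m, err := io.Copy(dst, src)
	return int64(n) + m, err
}

// run feeds a message held in a string through copyMessage.
func run(msg string, wrap bool, msgID string) (string, int64, error) {
	var out strings.Builder
	n, err := copyMessage(&out, strings.NewReader(msg), wrap, msgID)
	return out.String(), n, err
}

func main() {
	msg := "Subject: hi\r\n\r\nbody\r\n"
	out, n, err := run(msg, true, "00000deadbeeff00d1337")
	fmt.Printf("%q %d %v\n", out, n, err)
}
```

Because both sides are plain io.Reader and io.Writer values, the same function works on network connections, files and in-memory buffers, which is what makes this pattern convenient to test.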

When wrapping is active, a series of functions decompose and reassemble the message body layers – following the email message body syntax (RFC2822, now RFC5322) and MIME multipart syntax (RFC2045). The code calling tree follows this structure:

MailCopy
-   ProcessMessageHeaders
	-   UniqMessageID
	-   SetMessageInfo
-   writeMessageHeaders
-   HandleMessagePart
	-   TrackHTML
	-   handleMultiPart
	-   handlePlainPart
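The recursive descent through multipart containers can be sketched with Go's mime and mime/multipart packages. The example below only collects the text/html parts rather than rewriting them, ignores transfer encodings such as base64, and uses hypothetical names (collectHTML, htmlIn, sample), so treat it as an outline of the idea rather than the project's code.

```go
package main

import (
	"fmt"
	"io"
	"mime"
	"mime/multipart"
	"strings"
)

// collectHTML walks a MIME body and returns the content of every
// text/html part, descending into nested multipart containers.
func collectHTML(contentType string, body io.Reader) ([]string, error) {
	mediaType, params, err := mime.ParseMediaType(contentType)
	if err != nil {
		return nil, err
	}
	switch {
	case strings.HasPrefix(mediaType, "multipart/"):
		var found []string
		mr := multipart.NewReader(body, params["boundary"])
		for {
			part, err := mr.NextPart()
			if err == io.EOF {
				return found, nil
			}
			if err != nil {
				return nil, err
			}
			ct := part.Header.Get("Content-Type")
			if ct == "" {
				ct = "text/plain" // MIME default when no type is given
			}
			sub, err := collectHTML(ct, part)
			if err != nil {
				return nil, err
			}
			found = append(found, sub...)
		}
	case mediaType == "text/html":
		b, err := io.ReadAll(body)
		if err != nil {
			return nil, err
		}
		return []string{string(b)}, nil
	default:
		return nil, nil // other parts would pass through unchanged
	}
}

// sample is a multipart/mixed body containing a nested multipart/alternative.
const sample = "--outer\r\n" +
	"Content-Type: multipart/alternative; boundary=inner\r\n\r\n" +
	"--inner\r\n" +
	"Content-Type: text/plain\r\n\r\n" +
	"Hello\r\n" +
	"--inner\r\n" +
	"Content-Type: text/html\r\n\r\n" +
	"<p>Hello</p>\r\n" +
	"--inner--\r\n" +
	"--outer--\r\n"

// htmlIn runs collectHTML over a message body held in a string.
func htmlIn(contentType, msg string) ([]string, error) {
	return collectHTML(contentType, strings.NewReader(msg))
}

func main() {
	parts, err := htmlIn("multipart/mixed; boundary=outer", sample)
	fmt.Printf("%q %v\n", parts, err)
}
```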

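TrackHTML is in wrap_html.go. At this point, we already know we’re dealing with a text/html MIME part. The html.Tokenizer from Go’s golang.org/x/net/html package (a Go project supplementary package rather than part of the standard library) is used to whizz through the message, looking for: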

  • <A HREF> link: replace the raw URL with a tracked URL.
  • <BODY>: insert a top “initial” open-tracking HTML fragment, containing an open pixel, with tracked URL.
  • </BODY>: insert the bottom open-tracking HTML fragment, containing an open pixel, with tracked URL.


The tokenizer loop calls supporting functions WrapURL, InitialOpenPixel and OpenPixel. Also here: EncodeLink and DecodeLink functions, used by a small command-line program linktool, described below.

The unit test code in wrap_html_test.go essentially passes a variety of different HTML samples through the wrapping function, checking they come out as expected.

The wrap_smtp_test.go code does something a bit more ambitious. It creates a client / proxy / mock upstream server “sandwich” comprising three goroutines, and passes a variety of whole emails through, including exercising the upstream/downstream TLS negotiation with self-signed certs.

Phew! Wrapper is the most complex part – it felt a bit like the Long Dark Tea-Time of the Soul. We’ll go more Gently on the next one.

linktool

This one’s nice and simple – it’s a command-line tool for encoding and decoding wrapped links – useful during testing, not part of the main project run-time code. For example, you can make a wrapped link using

./linktool encode -tracking_url https://my-tracking-domain.com -rcpt_to fred@thetucks.com -action click -target_link_url https://thetucks.com -message_id 00000deadbeeff00d1337

Which will give you the following cryptic-looking output, resembling Vogon Poetry:
https://my-tracking-domain.com/eJxUzLEOQiEMRuF3-WciGAaTTr4JwbaIUSKBMhnf_Ybxnv18P2Q2EBgOltb4gFDN-iTvraotfs8Lfxsc2nyml4AQdqJZHqqlhCDXGG9wGNw3VYbK_fT-jwAAAP__f2Mg1g==

Then you can decode this back again:
./linktool decode https://my-tracking-domain.com/eJxUzLEOQiEMRuF3-WciGAaTTr4JwbaIUSKBMhnf_Ybxnv18P2Q2EBgOltb4gFDN-iTvraotfs8Lfxsc2nyml4AQdqJZHqqlhCDXGG9wGNw3VYbK_fT-jwAAAP__f2Mg1g==

This shows you the raw JSON inside the link, and the equivalent encode flags:
JSON:
{"act":"c","t_url":"https://thetucks.com","msg_id":"00000deadbeeff00d1337","rcpt":"fred@thetucks.com"}

Equivalent to encode -tracking_url https://my-tracking-domain.com -rcpt_to fred@thetucks.com -action click -target_link_url https://thetucks.com -message_id 00000deadbeeff00d1337

Summary

In this article, we’ve looked at the remaining pieces of the engagement-tracking project, including tracker , acct_etl, and (the big one) wrapper, with a side order of linktool.

Thanks for hitching a ride through this series! You have reached the Engagement Tracker at the End of the Universe. It feels like we’ve journeyed through Life, the Universe and Everything in between. I wish you “So Long, and Thanks for All the Fish”.

If you use this code, let us know! You should find it to be Mostly Harmless, and perhaps useful. We love feedback – you can get in touch via GitHub (open an Issue), Community Slack, or Twitter (@SparkPost, @tuck1s).

~ Steve