What’s In Our Email Infrastructure Stack
Openness is extremely important to us at SparkPost. As engineers, we value transparency, and we bet you do too. We’re also deeply geeky about email infrastructure, so this post is a peek into our production software stack. On our tour, we’ll proceed “front to back,” as it were, following the path your API requests take from the public internet through our service. We’ll also touch briefly on the technologies that make up our email API and expose a few of the key technology decisions we made along the way. If you’d like a list of the tools we use in production, operationally and internally, our StackShare page has exactly that.
0: Network Edge
Our tour starts when a request arrives at our network from the public internet. Whether you’re delivering messages through our Transmission endpoint, checking engagement metrics or pulling fine-grained Message Events into your app, your request hits our load balancer first. It isn’t particularly complex; it maintains health checks on our internal hosts and distributes inbound traffic evenly across the SparkPost application.
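To make the Transmission endpoint concrete, here's a minimal sketch of the kind of request body it accepts. The field shape follows SparkPost's public Transmissions API; the helper function, addresses, and key handling below are illustrative, not part of our stack.

```javascript
// Build a Transmissions API request body (illustrative helper).
function buildTransmission({ from, to, subject, html }) {
  return {
    recipients: to.map((email) => ({ address: { email } })),
    content: { from: { email: from }, subject, html },
  };
}

const body = buildTransmission({
  from: 'you@your-sending-domain.example',
  to: ['friend@example.com'],
  subject: 'Hello from the stack tour',
  html: '<p>Hi!</p>',
});

// In production you'd POST this to the API, roughly:
// fetch('https://api.sparkpost.com/api/v1/transmissions', {
//   method: 'POST',
//   headers: { Authorization: API_KEY, 'Content-Type': 'application/json' },
//   body: JSON.stringify(body),
// });
```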
If you’re paying attention, you might wonder how we handle your SMTP traffic. Well, SMTP comes in through the front door too, but it follows a different path than HTTP. We’ll explore SMTP in a moment. The next stop for HTTP requests is the application boundary: nginx.
As an aside: This section is labelled “0” not for any geeky array-indexing reason but because our network edge is really just outside of our application, so it’s sort of a “layer 0.”
1: The Application Boundary
The outermost layer in our application is built on the venerable nginx proxy. In fact, we use OpenResty, which bundles nginx with a load of very useful modules. For each inbound request, nginx first calls our internal authentication service to identify and authenticate the caller. After tagging the request with authentication details, nginx then forwards it to the correct component for fulfillment.
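This authenticate-then-forward pattern can be sketched with nginx's stock `auth_request` module. This is an illustrative fragment, not our production config: the location names and upstreams are hypothetical.

```nginx
# Illustrative only: authenticate each request via an internal
# service, then proxy it onward, tagged with auth details.
location /api/ {
    auth_request /auth;                       # subrequest to the auth service
    auth_request_set $auth_user $upstream_http_x_auth_user;
    proxy_set_header X-Auth-User $auth_user;  # tag the request for upstreams
    proxy_pass http://api_backend;            # clustered app services
}

location = /auth {
    internal;
    proxy_pass http://auth_service;
    proxy_pass_request_body off;
    proxy_set_header Content-Length "";
}
```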
nginx is also one of our most useful control points: it fans out requests to each group of clustered services as we scale and maintain our infrastructure.
From this point, a request can take one of three main paths: some requests go directly to our message management platform, others are handled by our application service layer, and nginx also serves our web UI. We'll look at each of the three in turn.
2: Message Management
SparkPost’s Message Management layer is built around the Momentum MTA platform. Momentum has been a core element of many high-performance email stacks for over a decade and it really deserves an article of its own, so we’ll just cover the basics here. This is where Transmission requests turn into message streams. Momentum accepts Transmission requests and generates fully-formed messages. It manages queuing and IP pool assignment, then performs coordinated massively-parallel delivery across all customer traffic. Just from that list, it’s clear that Momentum is at the core of SparkPost’s email infrastructure.
Momentum is a modern, standards-compliant, and extensible message management platform. It's written in C for performance and has both native SMP and clustering capabilities to let us scale up and out; we use both dimensions in the SparkPost stack. Those are all table-stakes features, though. Momentum's other hugely valuable trait is its flexibility. Embedded within Momentum is a Lua-based policy engine that lets us make fine-grained messaging decisions live, as messages pass through the platform. Better yet, it lets us change those decisions with minimal friction and without sacrificing performance.
So it’s clear that SparkPost isn’t all API calls and UI. We also support SMTP delivery through our SMTP API. Interestingly, Momentum has its roots in bidirectional SMTP traffic management, so it both delivers outbound SMTP traffic and accepts inbound SMTP for our relay webhooks service.
SMTP’s path through our infrastructure is necessarily different from HTTP’s. Of course, we accept SMTP through a load balancer at the boundary, just like HTTP. From there, Momentum’s policy engine lets us handle authentication, configuration, filtering, accounting, CSS inlining, engagement tracking (and the rest) in-process, in parallel, and spread across our cluster of Momentum nodes.
2.1: Adaptive Email Network
We use Momentum for one other very important job: reputation protection by live traffic shaping. In essence, Momentum uses traffic shaping rules collected through our Adaptive Email Network to adhere to each receiving ISP’s acceptance policies. Our AEN rules also tie into system monitoring, alerting and automatic corrective capabilities – but that’s a whole other article.
For both SMTP and API-driven traffic, Momentum’s final task is to generate a stream of events that informs the rest of the application on everything it’s doing, from generation and delivery status to engagement tracking and accounting details. There’s more on this in the IPC and Storage sections below. First though, a very quick glance at the other parts of the SparkPost API.
3: Application Services
Any request not serviced by our Message Management layer is handled by one of our other API services. Amongst other things, these Node.js-based API endpoints provide our metrics, webhooks, and account configuration capabilities, as well as fulfilling numerous internal services. Did you just have your daily limit bumped by our Support team? That was an internal service call. Switched from the Pro to the SuperPro plan? Service call. Our services expose controls and config for our users and our staff. They’re small, fast, clustered, and they all rely heavily on our IPC and Storage layers, coming up next!
4: Inter-process Communication
Email infrastructure is one part execution (y’know: delivery), one part analytics, and real-time email analytics needs lots of event data. As our Message Management layer generates, delivers and tracks user message streams, it pushes a detailed stream of tracking events through our internal event hose for the rest of the app to consume. Delivery phases, template composition results, bounces, FBLs, engagement events – all these (and the rest) are stuffed into our event hose for processing in our IPC layer.
The front end of our IPC layer is based on RabbitMQ, which distributes incoming events across durable queues to the rest of the application. On the consumption side, we use a suite of Node.js services that are responsible for extracting, transforming, and loading events into final storage. These ETLs support the metrics, webhooks, and message events services as well as various internal utilities, but they couldn’t exist without a resting place for all that data.
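Here's a sketch of what one such ETL consumer might look like, assuming the popular amqplib client for Node.js. The queue name, event fields, and loader are illustrative; SparkPost's actual schema isn't shown.

```javascript
// Transform a raw tracking event into a row for final storage
// (field names are made up for the example).
function transformEvent(raw) {
  return {
    eventType: raw.type,                       // e.g. 'delivery', 'bounce'
    messageId: raw.message_id,
    occurredAt: new Date(raw.timestamp * 1000).toISOString(),
  };
}

// Wire the transform to a durable queue. Calling this requires a
// running broker, so it is defined here but not invoked.
async function runConsumer(amqpUrl) {
  const amqp = require('amqplib'); // assumed dependency
  const conn = await amqp.connect(amqpUrl);
  const ch = await conn.createChannel();
  await ch.assertQueue('tracking-events', { durable: true });
  await ch.consume('tracking-events', (msg) => {
    const row = transformEvent(JSON.parse(msg.content.toString()));
    // loadIntoStorage(row) would write the row out here.
    ch.ack(msg); // acknowledge only after a successful load
  });
}
```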
5: Storage
The SparkPost application uses two main forms of storage. We use Cassandra for operational data: account information, integration and configuration details. Our analytics capability is a little different, though. Email analytics demands high-volume, low-latency summaries of fine-grained event data. To provide this type of live rollup query with user-controllable granularity, we spent a lot of time evaluating options and settled on the HP Vertica distributed analytics platform, which is serving us (and you!) extremely well. Both are distributed services, as you might expect, but they’re quite different in their capabilities and platform requirements. Together, though, they provide a solid data storage and retrieval substrate, and in some ways they underpin everything we do at SparkPost.
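For a flavor of the "live rollup" queries this layer serves, here's an illustrative Vertica-style query using its `TIME_SLICE` bucketing function. The table and column names are made up for the example, not our real schema.

```sql
-- Hourly event counts over the last week, bucketed with TIME_SLICE.
SELECT
  TIME_SLICE(event_ts, 1, 'HOUR') AS hour_bucket,
  event_type,
  COUNT(*)                        AS events
FROM tracking_events
WHERE event_ts >= NOW() - INTERVAL '7 days'
GROUP BY 1, 2
ORDER BY 1;
```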
Alright, here ends the tour. Modern email infrastructure is a broad and deep topic and we barely scratched the surface here, but hopefully you learned a little about our innards. To learn more about using the SparkPost service, check out our DevHub and if you’d like to chat or have an email-driven project in mind, come join us on Slack!