SparkPost today is synonymous with the concept of a cloud MTA. But you might not know how deep our expertise with MTAs runs. For more than a decade, the SparkPost team has been building the technology that powers some of the most demanding deployments of enterprise MTAs in the world. In fact, more than 25% of the world’s non-spam mail is sent using our MTAs every day.

Those are impressive figures, to be sure. So when we say we’re proud that SparkPost has become the world’s fastest-growing email delivery service, we know that one reason for the trust given to us is the credibility that comes from having installations of our Momentum and PowerMTA software deployed in the data centers of the largest Email Service Providers (ESPs) and other high-volume senders such as LinkedIn and Twitter.

As CTO of SparkPost, I have also faced, together with my team, the sizable but rewarding challenge of migrating complex, highly optimized software like MTAs to a modern cloud architecture. Our team’s experience developing and managing high-performance email infrastructure has been a major part of why SparkPost has been successful with that transformation, but so too has our vision of what a “cloud native” service really entails.

A few years ago, our team and many of our customers recognized that the cloud promised the ability to deliver the performance of our best-in-class messaging with dramatically improved economics and business flexibility. We understood that not only would it be more cost-effective for our customers to get started, but that it also would reduce the ongoing burden on their resources in areas like server maintenance, software maintenance, and deliverability analysis and resolution.

To get there, we knew we needed to do it the right way. Standing up servers in a data center wasn’t an option, because traditional data center models would limit our scalability, reliability, and operational flexibility in all the same ways our customers were trying to avoid!

That’s a big part of why we selected Amazon Web Services (AWS) to provide SparkPost’s underlying infrastructure. Platforms such as AWS, Microsoft Azure, Heroku, and others have many great qualities, but building a cloud-native messaging solution is conceptually a lot more than taking an MTA and installing it on a virtual machine in the sky.

There are times when architecting for the cloud necessarily embodies contradictory requirements. Just consider these architectural challenges of bringing something like an MTA into the cloud, for example:

  • Scaling Stateful Systems in the Cloud. One of the primary lures of deploying within a cloud provider is the ability to take advantage of push-button server deployments and auto-scaling. For the majority of AWS customers this is very straightforward; most of them deploy web-based applications of some form, following well-established patterns for creating a stateful application using stateless web servers. A mail server, however, is inherently stateful; it implements a store-and-forward messaging protocol delivering to tens of thousands of unique endpoints. In practice, some messages may need to be queued for extended periods (minutes, hours, or days) during normal operation. Thus a mail server, like a database, is significantly harder to scale in the cloud, since typical load-driven scale-up/scale-down logic can’t be applied.
  • Limitless Limitations. Cloud infrastructure like AWS doesn’t magically change the laws of physics—even if it does make them a lot easier to manage. Still, every service has a limit, whether published or not. These limits not only affect what instance types you deploy on, but how you have to architect your solution to ensure that it scales in every direction. From published limits on how many IPs per instance you can allocate for sending, to unpublished DNS limitations, every AWS limit needs to be reviewed and planned for (and you have to be ready for the unexpected through monitoring and fault-tolerant architecture).
  • IP Reputation Management. A further complication in cloud email deployments generally, and in auto-scaling especially, is managing the dynamic allocation of sending resources without having to warm up new IPs. You need the ability to dynamically coordinate message routing across all your MTAs and to decouple the MTA processing a message from the IP assignment/management logic.
  • It Takes a Village. Moving to the cloud is not just a technology hurdle—it took the right people to make sure our customers were successful. We had to bring in expertise in engineering, security, operations, deliverability, and customer care to ensure the success of our customers in a scalable cloud-driven environment.
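
To make the stateful scaling point concrete, here is a minimal sketch in Python contrasting a typical load-driven termination rule with a queue-aware one. It is an illustration only, not a description of how SparkPost implements scaling: the `MtaNode` fields, the function names, and the idle threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MtaNode:
    """Hypothetical view of one MTA instance; every field is illustrative."""
    name: str
    cpu_utilization: float      # 0.0 to 1.0
    queued_messages: int        # messages still awaiting delivery or retry
    accepting_mail: bool = True


def stateless_can_terminate(node: MtaNode, idle_threshold: float = 0.2) -> bool:
    """Typical load-driven rule for stateless servers: idle means removable."""
    return node.cpu_utilization < idle_threshold


def stateful_scale_down_step(node: MtaNode, idle_threshold: float = 0.2) -> str:
    """Queue-aware rule: stop intake first, terminate only once the queue is empty.

    Returns the action taken for this evaluation:
    "keep", "drain", "wait", or "terminate".
    """
    if node.cpu_utilization >= idle_threshold:
        return "keep"
    if node.accepting_mail:
        # Assumed first step: stop routing new mail to the node.
        node.accepting_mail = False
        return "drain"
    if node.queued_messages > 0:
        # Queued mail may legitimately sit for hours or days.
        return "wait"
    return "terminate"
```

An idle node holding queued mail passes the stateless check immediately, while the queue-aware rule keeps it alive until its queue has emptied. That is the sense in which load alone is not a sufficient signal for a store-and-forward system.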

As I noted earlier, building and deploying a true cloud MTA is a lot more complex than putting our software up on a virtual server. But the end results show why services like SparkPost are so important to how businesses consume technology today.

The cloud can make even the most complex systems feel deceptively simple, which allows the technical and business benefits to be front and center. But if you’re a software engineer or architect building for the cloud, you understand how important solving these complex needs really is to achieving that.

So, if you’re building services like ours, I’m interested in hearing about your experiences and what you’ve run into as you’ve developed for the cloud. Ping me on Twitter, or leave a comment below.

—George Schlossnagle

Some two years ago, a small team met in a conference room to discuss building a self-service offering on top of Momentum, the world’s best email delivery platform. Since then, SparkPost has gone from an idea to a developer-focused service with an automated release cycle built on a culture of testing and constant iteration. So we figured it was time to share what we’ve learned and how we handle continuous integration and deployment.

Why We’ve Embraced Continuous Integration and Deployment

Before we dive into the how, you need to know why we’ve embraced continuous integration and deployment. We have 20 components that make up our service and we routinely deploy 15-20 times a week. In fact, deploying frequently allows us to focus on creating a better experience for our users iteratively. Since we can deploy small changes to any component of our service independently, we can respond quickly based on what we learn from our community. We’ve found that releasing discrete pieces of functionality for specific components lowers the risk of deployments because we can quickly verify the work and move on.

Testing is at the core of being able to continuously deploy features. The testing culture at SparkPost gives us the confidence to deploy at will. We don’t have an enforced or preferred method of testing like BDD or TDD. We have a simple rule – write tests to cover the functionality you are building. Every engineer writes unit, functional, and smoke tests using mocha, nock, sinon, and protractor. Each type of test is critical to the deployment pipeline.

Our deployment pipeline orchestration is done using Atlassian Bamboo for our private projects. We have three types of plans chained together: test, package, and deploy. During the test plan, we clone both our automation scripts repository, where we house all our continuous integration bash scripts, and the component we’re working on (e.g. our metrics API). Bamboo then runs the unit and functional tests for that component, generating test results and coverage reports. Upon a successful build, the packaging plan is triggered, generating any necessary RPM packages and uploading them to a yum repo. Once the packaging is complete, it triggers the deployment of the package. Deploy plans are responsible for installing/upgrading the component and any related configuration using Ansible, running smoke tests using protractor, and, if necessary, rolling back to a previous version.
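
The chaining of plans described above can be sketched in a few lines of Python. This is a simplified illustration, not our Bamboo configuration: the function name `run_pipeline` and its callable parameters are hypothetical stand-ins for the test, package, and deploy plans.

```python
from typing import Callable, List


def run_pipeline(
    run_tests: Callable[[], bool],
    build_package: Callable[[], str],
    install: Callable[[str], None],
    smoke_test: Callable[[], bool],
    previous_version: str,
) -> List[str]:
    """Run test -> package -> deploy and return the steps taken, in order."""
    steps = ["test"]
    if not run_tests():
        # A failed test plan never triggers packaging.
        return steps + ["stopped"]

    steps.append("package")
    version = build_package()   # e.g. build an RPM and publish it

    steps.append("deploy")
    install(version)
    if not smoke_test():
        # Smoke tests failed after install: go back to the last known-good version.
        install(previous_version)
        steps.append("rollback")
    return steps
```

Each stage runs only if the one before it succeeded, and a smoke test failure after installation leads to reinstalling the previous version rather than leaving a broken release in place.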

Open source work, like our client libraries, Slack bots, and API documentation, is run through Travis CI. Check out the .travis.yml files for our Python library, PHP library, API docs, and developer hub to see what they do.

Slack and Additional Ways We Use Automation

You most likely know about our obsession with Slack by now. We use it for both manual and automated notifications related to deploying features. Before we merge/deploy code, we announce the component and the environment it will be going to. Merges to develop branches trigger deployments to UAT. Merges to master (hotfixes or develop branch promotions) trigger deployments to staging. Deployments to production are push-button to allow for proper communication and timed releases of features. Once code is merged, the merge triggers the deployment pipeline outlined above. Bamboo sends email notifications upon successful plan builds, the start of a deployment, and the success or failure of a deployment. These emails are sent to an internal address, where a process consumes them and posts a message in Slack.
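
The branch-to-environment rules above amount to a small lookup. Here is a minimal sketch in Python, assuming the branches are literally named `develop` and `master`; the function names and the lowercase environment labels are hypothetical.

```python
from typing import Optional

# Mapping taken from the rules described above; the labels are illustrative.
MERGE_TARGETS = {
    "develop": "uat",
    "master": "staging",
}


def deployment_target(merged_into: str) -> Optional[str]:
    """Return the environment a merge deploys to automatically, if any."""
    return MERGE_TARGETS.get(merged_into)


def is_push_button(environment: str) -> bool:
    """Production is never reached by a merge alone; someone triggers it."""
    return environment == "production"
```

Under these assumptions, `deployment_target("develop")` returns `"uat"` and a merge to any other branch, such as a feature branch, returns `None`, so nothing is deployed automatically.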

Some additional ways we use automation include:

  • Deploying the Web UI
  • Deploying Momentum, our core platform written using C and Lua
  • Testing and upgrading Node.js
  • Making nginx configuration changes
  • Deploying one of our 18 APIs
  • Pushing customer acquisition and community data into dashboards
  • Deploying cron jobs that run cleanup tasks and reports
  • Deploying Fauxmentum, our internal tool for generating test data against our various environments

Continuous integration and deployment are vital parts of SparkPost’s ability to listen and respond to what our community members ask for. To sum up, we hope that sharing some of our experience and process has given you insight that will help you improve your own ability to build, test, and deliver features. If you’d like to see some of our pipeline in action, you can sign up for an account here. Also, feel free to join our community Slack channel and chat with us about your experiences with SparkPost. We’d love to hear from you!

—Rich Leland, Director of Growth Engineering