SparkPost today is synonymous with the concept of a cloud MTA. But you might not know how deep our expertise with MTAs runs. For more than a decade, the SparkPost team has been building the technology that powers some of the most demanding deployments of enterprise MTAs in the world. In fact, more than 25% of the world’s non-spam mail is sent using our MTAs every day.
Those are impressive figures to be sure. So when we say we’re proud that SparkPost has become the world’s fastest-growing email delivery service, we know that one reason for the trust given to us is the credibility that comes from having installations of our Momentum and PowerMTA software deployed in the data centers of the largest Email Service Providers (ESPs) and other high-volume senders such as LinkedIn and Twitter.
As CTO of SparkPost, my team and I also have faced the sizable challenge—albeit a rewarding one—of migrating complex, highly optimized software like MTAs to a modern cloud architecture. Our team’s experience developing and managing high-performance email infrastructure has been a major part of why SparkPost has been successful with that transformation, but so too has been our vision of what a “cloud native” service really entails.
A few years ago, our team and many of our customers recognized that the cloud promised the ability to deliver the performance of our best-in-class messaging with dramatically improved economics and business flexibility. We understood that not only would it be more cost-effective for our customers to get started, but that it also would reduce the ongoing burden on their resources in areas like server maintenance, software maintenance, and deliverability analysis and resolution.
To get there, we knew we needed to do it the right way. Standing up servers in a data center wasn’t an option—because traditional data center models would limit our scalability, reliability, and operational flexibility in all the same ways our customers were trying to avoid!
That’s a big part of why we selected Amazon Web Services (AWS) to provide SparkPost’s underlying infrastructure. Platforms such as AWS, Microsoft Azure, Heroku, and others have many great qualities, but building a cloud-native messaging solution is conceptually a lot more than taking an MTA and installing it on a virtual machine in the sky.
There are times when architecting for the cloud necessarily embodies contradictory requirements. Just consider these architectural challenges of bringing something like an MTA into the cloud, for example:
- Scaling Stateful Systems in the Cloud. One of the primary lures of deploying within a cloud provider is the ability to take advantage of push-button server deployments and auto-scaling. For the majority of AWS customers this is very straightforward; most of them deploy web-based applications of some form, following well established patterns for creating a stateful application using stateless web servers. A mail server, however, is inherently stateful; it implements a store-and-forward messaging protocol delivering to tens of thousands of unique endpoints. In practice some messages may need to be queued for extended periods of time (minutes/hours/days) during normal operation. Thus, like a database, it is significantly harder to handle scaling in the cloud, since typical load-driven scale-up/scale-down logic can’t be applied.
- Limitless Limitations. Cloud infrastructure like AWS doesn’t magically change the laws of physics—even if it does make them a lot easier to manage. Still, every service has a limit, whether published or not. These limits not only affect what instance types you deploy on, but how you have to architect your solution to ensure that it scales in every direction. From published limits on how many IPs per instance you can allocate for sending, to unpublished DNS limitations, every AWS limit needs to be reviewed and planned for (and you have to be ready for the unexpected through monitoring and fault-tolerant architecture).
- IP Reputation Management. A further complication both in general cloud email deployments, but especially in auto-scaling, is managing the dynamic allocation of sending resources without having to warm up new IPs. You need the ability to dynamically coordinate message routing across all your MTAs and to decouple the MTA processing a message from the IP assignment/management logic.
- It Takes a Village. Moving to the cloud is not just a technology hurdle—it took the right people to make sure our customers were successful. We had to bring in expertise in engineering, security, operations, deliverability, and customer care to ensure the success of our customers in a scalable cloud-driven environment.
As I noted earlier, building and deploying a true cloud MTA is a lot more complex than putting our software up on a virtual server. But the end results show why services like SparkPost are so important to how businesses consume technology today.
The cloud can make even the most complex systems feel deceptively simple—which allows the technical and business benefits to be front and center. But if you’re a software engineer or architect building for the cloud, you understand how important solving these complex needs really are to achieve that.
So, if you’re building services like ours, I’m interested in hearing about your experiences and what you’ve run into as you’ve developed for the cloud. Ping me on Twitter, or leave a comment below.