George Schlossnagle is a pretty smart guy, though he’s too self-deprecating to really want me to say so. But there’s no question that he’s been a major force in the way SparkPost—in fact, the entire industry of high-performance email delivery—has developed over the years. He co-founded Message Systems in the late 1990s to develop a remarkably scalable email infrastructure that has grown and evolved to serve as a key element of SparkPost’s stack today. Not only is the Momentum MTA core to the SparkPost service, but it’s also essential to the in-house infrastructure at the largest mailers like social networks, ISPs, financial services, and publishers. Momentum and SparkPost together carry more than 25% of the world’s non-spam email. So, yeah, George knows something about architecting software and services that scale really well.
What does it mean to design and architect software services at scale in the cloud? I recently asked George to share some of his perspective about the role of a software architect and how it compares to other aspects of coding and building software. Here’s what he said.
How do you define your mission as a cloud software architect? What are you trying to accomplish? What motivates you in this role?
In the simplest view, I look at my job as ensuring we design and build software to not only service the requirements we have today, but that puts us into a good place to execute on the requirements that might come tomorrow. The forward-looking part of the role is probably the most difficult—ensuring that the technology and design choices we make are not only good for the problem they’re aimed at but that they interoperate with all the other portions of the architecture and are likely to remain good choices as the architecture evolves.
What motivates me is a desire to build functional software that allows us to adapt to changes in the market and changes in how our users want to use it. I’ve been building parts of this software stack for almost 20 years, and being able to gracefully evolve to handle new requirements is the key to long-term success. Many (most?) individual technology choices end up getting reevaluated every 3–5 years, and it’s setting yourself up to manage those inevitable transitions whilst still providing a stable and reliable service that’s the exciting (and necessary) challenge.
You also happen to be a co-founder of our company, so you might have a particular understanding of how technology and business goals go together. How do you see the relationship between technical goals and business goals?
I think they’re intrinsically linked. The business being successful is what allows us to continue to innovate, and innovation is what allows the business to be successful. I can’t imagine how we could be successful as a business if our technology goals didn’t directly support our business goals.
“Ahead of its time” and “vaporware” are two great monikers for failures driven by those goals being out of sync.
Does having a software architect’s mindset affect how you approach non-technical challenges, whether in the business or elsewhere?
To the extent that an architect’s job is to look at the big picture and plan for that, yes.
Have you always been a technologist in your career? Did you begin work as a programmer?
I studied to be a mathematician and began my professional career as a systems administrator. That led me into web and database performance work, and I became heavily involved in the PHP internals community, helping build an open source compiler cache for the language. From there I helped found a scalability-focused consulting company, OmniTI, which ultimately led to where I am today.
That path gave me a real appreciation for how software runs in the real world, the impact of design choices on operability and reliability, and the importance of getting actionable telemetry out of your products.
What’s the difference between being a programmer and a software architect? (Or, for that matter, an engineering manager and an architect?)
Being an architect is about setting a technical strategy while a programmer is fundamentally a tactical role. As an architect, you need to ensure that all of your choices and designs come together in a cohesive way that will grow over time and adapt to future challenges. A good programmer has their focus on making the individual parts as solid as possible. While there’s often some overlap, these are independent roles and both challenging in their own right.
Engineering management is distinct from both of them. Management is the art of achieving through others; of getting a group of people to all align their activities towards a common goal. It’s faddish to diminish the contribution of managers, but a good manager can really drive the productivity of teams and it is a tremendously difficult skill. People are always the hardest part of any business—navigating personalities, egos, strengths, and weaknesses.
Looking back, what’s the moment in your career you realized this was the right role for you?
Throughout my career I’ve always been interested in how things come together to form a cohesive whole. Seeing how all of the technologies fit together to solve our business issues has always been really exciting to me.
What are you working on right now?
We are moving to the next generation of our cloud architecture now, and that’s an effort that’s spanned most of the development group here. I’ve also been active in driving a lot of our compliance technology (trying to keep malicious and unwanted users from exploiting our platform).
What’s your most important tool?
My most important asset is a phenomenal team of great people who have a great understanding of all the technologies we have. Being able to leverage that group to make informed decisions is awesome.
You’ve designed on-premises and cloud technology products. What’s been the most startling difference between architecting for the cloud instead of a more traditional, packaged software product?
I don’t know that there’s anything particularly startling. The biggest non-startling difference when architecting for the cloud is that it flips many of the traditional challenges of releasing software on their heads. In an on-premises world, you have to be extremely conservative about quality control, because when you release a version into the wild it becomes its own entity and you cannot force your customers to upgrade. When you control the entire deployment cycle yourself, you can iterate much faster because you can instantly (and globally) roll back from any issues. This allows for a feature velocity that is tremendously higher than with on-premises software.
What’s a misunderstanding about software architecture you wish you could dispel?
I think the (false) notion of hierarchy is a common misunderstanding. It’s important to recognize that the best architect may not be the best programmer (I’m certainly not), and that great programmers may not be great architects. They’re different skill sets, and they reinforce rather than eclipse each other.
What should I have asked you that I didn’t?
You were pretty thorough. 🙂
I hope you enjoyed this window into how one of the principal architects of SparkPost sees his profession. The thoughtfulness with which George describes his point of view as a software architect and builder doesn’t surprise me at all—in my experience, it’s a hallmark of how George approaches most aspects of work and communication.
tl;dr – It’s easy to build systems for when things go right, it’s hard to build them for when things go wrong.
Building software to a spec is easy. Building software to perform in the situations you can anticipate is only a tad bit harder. It’s the engineering involved to build a system that is instrumented for observability and constructed with tools to operate under conditions you didn’t foresee that is the true feat. Software operating time and a heterogeneity of customer needs are the two reagents necessary to build truly kick-ass products. Message Systems brings almost two decades of software development and a customer base extending to the world’s largest and most challenging environments (spanning huge ESPs, telcos, social networks and e-businesses).
Port25 adds to the equation a similar depth of operational experience (actually slightly longer) and a broad deployment base across ESPs large and small, as well as a huge swath of e-businesses. Simply put, together we bring a depth of email experience unmatched in the market, and the experience of delivering high-quality software and service in the world’s most challenging environments. I could not be more excited about the combination, not only for how it expands the markets we can serve, but because bringing together such engineering talent and experience will inevitably make both sets of products stronger.
I won’t speak to the Port25 origin story, but I’m happy to share a bit about Message Systems. Back in the late 90s, my brother Theo and I were running a small scalability consultancy. We mainly helped clients who had built Internet sites in the mid 90s grow, and operationally manage those sites as traffic exploded through the back half of the decade and into the first Internet bubble. We were both active in and around the Apache project and did quite a bit of work helping people engineer for scale.
Some of our clients also engaged in very large-scale B2C email communication, including an online gaming company that sent in the 20 million mails per day range (that’s medium-sized peanuts today, but in 1998 that made them one of the largest volume emailers in existence). As we helped them scale their operations, we were constantly plagued with the lack of at-scale operability in the mail solutions available at the time. Our software was born of many sleepless nights staring at malfunctioning software solutions without the ability to easily understand the precipitating conditions, observe the current state, or manipulate that state quickly.
The solutions available at the time all suffered from a number of different flaws:
- Poor scalability and resiliency, especially under adverse conditions (large back queues, things like that)
- A lack of good administrative tools that give you fast, non-resource-intensive queue insight and online management, so that when you have operational issues you can quickly diagnose, isolate and remediate them. Anyone who has ever managed mail knows both how painful and necessary this is.
- An inability to set differing delivery policies, limits and tunings for different domains and service providers; and an inability to adjust or tune delivery parameters based on disposition and handling policies at receiving ISPs.
- A lack of extensibility, meaning that if you wanted to alter or augment the way the MTA operated, you needed to fork a source code base and maintain your changes as patches, creating a maintainability nightmare (milter didn’t exist back then, and even today doesn’t cover things like the desire for custom IO management, alternate delivery protocols, integration with external data for delivery policy and custom logging).
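To make the third point above concrete, here is a minimal sketch of what per-domain delivery policy can look like. It is an illustration only, not how any particular MTA implements it: the `DeliveryPolicy` fields, the `POLICIES` table, the numbers in it, and the `policy_for` helper are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeliveryPolicy:
    # All three knobs are hypothetical examples of per-destination tunings.
    max_connections: int
    max_messages_per_connection: int
    retry_interval_seconds: int


# Fallback used for any destination without an explicit entry.
DEFAULT_POLICY = DeliveryPolicy(20, 100, 900)

# Explicit overrides for specific receiving domains (made-up values).
POLICIES = {
    "example.com": DeliveryPolicy(5, 10, 1800),
    "mail.example.net": DeliveryPolicy(50, 500, 300),
}


def policy_for(recipient: str) -> DeliveryPolicy:
    """Return the policy for the recipient's domain, or the default."""
    domain = recipient.rsplit("@", 1)[-1].lower()
    return POLICIES.get(domain, DEFAULT_POLICY)


if __name__ == "__main__":
    print(policy_for("User@Example.COM"))
    print(policy_for("someone@unlisted.test"))
```

The point of the sketch is only that limits are looked up per destination rather than set once for the whole server, which is the capability the solutions of the time lacked.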
With these issues out there (and causing us real – and personal – operational pain at customers), we decided we should build our own MTA product that provided solutions for these issues.
Some of these problems, particularly the lack of a singular in-process view of the runtime queue and the scalability issues, were just fundamentally difficult to solve within the architectures of the open source solutions available at that time (which were all more-or-less direct inheritors of the original Sendmail architecture). So instead of forking one of the major open source products and building on top of it, we decided we were much better off writing a product from scratch, where we could design optimal solutions from the ground up.
We opted for a large monolithic process where we could keep a single in-memory index of the queue and all message metadata, with isolated queues to help isolate deliverability issues. We built powerful operational tools as a first class citizen in the product and took what was already a number of years of operational email experience into account in all of the feature definitions. We added a modular extensibility framework that allowed for adding functionality to the server through drop-in DSOs, and built various interpreter layers (first Java and Perl, both now dead) and embedded scripting languages (our own Sieve variant and later Lua) on top of it to make it easier to consume. We built that subsystem to be used by ourselves as well as customers, ensuring it was always a first-class system. We built in the ability to have messages queued and manageable multi-dimensionally across both destination provider and local tenant, giving the ability to set different policies and have completely separable manageability controls across different senders on the same platform (hugely important to our service provider clients, but also to many larger/complex non-service providers as well).
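As a rough illustration of the idea of one in-memory index with isolated queues that can be managed across both destination provider and local tenant, consider the sketch below. It is a toy model under stated assumptions, not the product's actual design: the `Message` and `QueueIndex` classes, their method names, and the suspension mechanism are all hypothetical.

```python
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass
class Message:
    msg_id: str
    tenant: str  # local sender on the shared platform (hypothetical field)
    domain: str  # destination provider's domain (hypothetical field)


class QueueIndex:
    """One in-process index holding an isolated FIFO per (tenant, domain)."""

    def __init__(self):
        self._queues = defaultdict(deque)
        # Entries are (tenant, domain) pairs; None acts as a wildcard.
        self._suspended = set()

    def enqueue(self, msg):
        self._queues[(msg.tenant, msg.domain)].append(msg)

    def suspend(self, tenant=None, domain=None):
        """Pause delivery along either dimension, or for one exact pair."""
        self._suspended.add((tenant, domain))

    def is_suspended(self, tenant, domain):
        candidates = {(tenant, domain), (tenant, None), (None, domain)}
        return bool(candidates & self._suspended)

    def depth(self, tenant=None, domain=None):
        """Count queued messages, optionally filtered by either dimension."""
        return sum(
            len(q)
            for (t, d), q in self._queues.items()
            if tenant in (None, t) and domain in (None, d)
        )

    def pop_deliverable(self):
        """Return the next message from any non-suspended queue, or None."""
        for (t, d), q in self._queues.items():
            if q and not self.is_suspended(t, d):
                return q.popleft()
        return None


if __name__ == "__main__":
    idx = QueueIndex()
    idx.enqueue(Message("1", "acme", "example.com"))
    idx.enqueue(Message("2", "acme", "example.org"))
    idx.enqueue(Message("3", "globex", "example.com"))
    idx.suspend(domain="example.com")  # one troubled destination is isolated
    print(idx.depth(domain="example.com"), idx.pop_deliverable())
```

Because everything lives in one index, answering "how deep is the backlog for this provider?" or "pause this one sender" is a lookup rather than a scan of spool files, and a problem at one destination does not block delivery to the others.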
I’m not a believer that first is necessarily best, but the great benefit of the large customer community that Message Systems and Port25 bring together (which is only growing as we add infrastructure-in-the-cloud services through SparkPost) is that every customer benefits from the demands of the one that came before. That customer community, consisting of folks not only /using/ our product but running it themselves in their own unique ways, helps us avoid the issue of a monoculture. The precipitating conditions and shape of the system 17 years ago were different than they were 10 years ago, which were different than they were a year ago – all good indicators that the challenges in the years to come will be different still. A diversity of operating environments allows us to see that early and keep all of our customers ahead of the curve (as we have consistently in the past).
Anyone can write their own mail transfer agent (the ‘s’ in SMTP stands for simple after all – just from the spec a competent programmer could write a basic MTA from scratch in Python in a day or so). But in an open network environment like email, it’s the multitude of edge cases and obscure interop behaviors that are hard to get right. Complex, custom software built in a monoculture will always be subpar when you are a player in a rapidly changing industry. The things I’m most excited about regarding our recent acquisition of Port25 are that we reaffirm our core mission of providing the absolute best email infrastructure, that we get a whole new technology stack to help advance that mission in places we haven’t been able to serve in the past, and that we get an injection of new engineering DNA that has tackled similar challenges, in a myriad different ways, over as long a period as we have. It puts a truly unmatched set of experiences and talent together and brings great perspectives that will help improve the fantastic products we both deliver.