Complex Software is Hard

Feb. 11, 2015 by George Schlossnagle

tl;dr – It’s easy to build systems for when things go right, it’s hard to build them for when things go wrong.


Building software to a spec is easy. Building software to perform in the situations you can anticipate is only a tad harder. The true feat is the engineering involved in building a system that is instrumented for observability and constructed with tools to operate under conditions you didn’t foresee. Software operating time and a heterogeneity of customer needs are the two reagents necessary to build truly kick-ass products. Message Systems brings almost two decades of software development and a customer base extending to the world’s largest and most challenging environments (spanning huge ESPs, telcos, social networks and e-businesses).

Port25 adds to the equation similar depth of operational experience (actually slightly longer) and a broad deployment base across ESPs large and small, as well as a huge swath of e-businesses. Simply put, together we bring a depth of email experience unmatched in the market, and the experience of delivering high quality software and service in the world’s most challenging environments. I could not be more excited about the combination, not only for how it expands the markets we can serve, but because the bringing together of such engineering talent and experiences will inevitably make both sets of products stronger.

I won’t speak to the Port25 origin story, but I’m happy to share a bit about Message Systems. Back in the late 90s, my brother Theo and I were running a small scalability consultancy. We mainly helped clients who had built Internet sites in the mid 90s grow, and operationally manage those sites as traffic exploded through the back half of the decade and into the first Internet bubble. We were both active in and around the Apache project and did quite a bit of work helping people engineer for scale.

Some of our clients also engaged in very large-scale B2C email communication, including an online gaming company that sent in the range of 20 million mails per day (that’s medium-sized peanuts today, but in 1998 that made them one of the largest volume emailers in existence). As we helped them scale their operations, we were constantly plagued by the lack of at-scale operability in the mail solutions available at the time. Our software was born of many sleepless nights staring at malfunctioning software solutions without the ability to easily understand the precipitating conditions, observe the current state, or manipulate that state quickly.

The solutions available at the time all suffered from a number of different flaws:

  • Poor scalability and resiliency, especially under adverse conditions (large back queues, things like that)
  • A lack of good administrative tools that give you fast, non-resource-intensive queue insight and online management, so that when you have operational issues you can quickly diagnose, isolate and remediate them. Anyone who has ever managed mail knows both how painful and how necessary this is.
  • An inability to set differing delivery policies, limits and tunings for different domains and service providers; and an inability to adjust or tune delivery parameters based on disposition and handling policies at receiving ISPs.
  • A lack of extensibility, meaning that if you wanted to alter or augment the way the MTA operated, you needed to fork a source code base and maintain your changes as patches, creating a maintainability nightmare (milter didn’t exist back then, and even today doesn’t cover things like the desire for custom IO management, alternate delivery protocols, integration with external data for delivery policy and custom logging).

With these issues out there (and causing us real – and personal – operational pain at customers), we decided we should build our own MTA product that provided solutions for these issues.

Some of these problems, particularly the scalability issues and the lack of a singular in-process view of the runtime queue, were fundamentally difficult to solve within the architectures of the open source solutions available at that time (which were all more-or-less direct inheritors of the original Sendmail architecture). So instead of forking one of the major open source products and building off that, we decided we were much better off writing a product from scratch, where we could design optimal solutions from the ground up.

We opted for a large monolithic process where we could keep a single in-memory index of the queue and all message metadata, with separate queues to help isolate deliverability issues. We built powerful operational tools as first-class citizens in the product and took what was already a number of years of operational email experience into account in all of the feature definitions. We added a modular extensibility framework that allowed for adding functionality to the server through drop-in DSOs, and built various interpreter layers (first Java and Perl, both now dead) and embedded scripting languages (our own Sieve variant and later Lua) on top of it to make it easier to consume. We built that subsystem to be used by ourselves as well as customers, ensuring it was always a first-class system. We built in the ability to have messages queued and manageable multi-dimensionally across both destination provider and local tenant, giving the ability to set different policies and have completely separable manageability controls across different senders on the same platform (hugely important to our service provider clients, but also to many larger or more complex non-service providers).
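To make the idea of a single in-memory index with multi-dimensional queues a little more concrete, here is a toy sketch in Python. It is not how our product is implemented; the names (QueueIndex, Policy, max_per_tick, suspended) are invented for illustration, and the "policy" is reduced to a simple per-tick limit and a suspend flag. It only shows the shape of the idea: one index, queues keyed by both local tenant and destination domain, and policies that can be set along either dimension.

```python
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass
class Message:
    msg_id: str
    tenant: str       # local sender on the shared platform (hypothetical field)
    recipient: str


@dataclass
class Policy:
    max_per_tick: int = 2     # illustrative throttle, not a real product setting
    suspended: bool = False   # illustrative "pause this queue" switch


class QueueIndex:
    """One in-process index of every queued message, keyed by (tenant, domain)."""

    def __init__(self):
        self.queues = defaultdict(deque)   # (tenant, domain) -> queued messages
        self.policies = {}                 # (tenant|None, domain|None) -> Policy

    def enqueue(self, msg):
        domain = msg.recipient.rsplit("@", 1)[1].lower()
        self.queues[(msg.tenant, domain)].append(msg)

    def set_policy(self, policy, tenant=None, domain=None):
        # None acts as a wildcard for that dimension.
        self.policies[(tenant, domain)] = policy

    def policy_for(self, tenant, domain):
        # The most specific matching policy wins; fall back to the defaults.
        for key in ((tenant, domain), (None, domain), (tenant, None), (None, None)):
            if key in self.policies:
                return self.policies[key]
        return Policy()

    def depth(self, tenant=None, domain=None):
        # Cheap queue insight along either dimension, with no disk scan.
        return sum(len(q) for (t, d), q in self.queues.items()
                   if tenant in (None, t) and domain in (None, d))

    def next_batch(self, tenant, domain):
        # Hand out at most max_per_tick messages, or nothing if suspended.
        policy = self.policy_for(tenant, domain)
        if policy.suspended:
            return []
        q = self.queues[(tenant, domain)]
        return [q.popleft() for _ in range(min(policy.max_per_tick, len(q)))]
```

With something like this, an operator could ask for the backlog to one provider across all tenants with depth(domain=...), or suspend a single tenant's traffic to a single provider without touching anyone else on the platform.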

I’m not a believer that first is necessarily best, but the great benefit of the large customer community that Message Systems and Port25 bring together (which is only growing as we add infrastructure-in-the-cloud services through SparkPost) is that every customer benefits from the demands of the ones that came before. That customer community, consisting of folks not only /using/ our product but running it themselves in their own unique ways, helps us avoid the issue of a monoculture. The precipitating conditions and shape of the system 17 years ago were different than they were 10 years ago, which were different than they were a year ago – all good indicators that the challenges in the years to come will be different still. A diversity of operating environments allows us to see that early and keep all of our customers ahead of the curve (as we have consistently in the past).

Anyone can write their own mail transfer agent (the ‘S’ in SMTP stands for simple after all – just from the spec a competent programmer could write a basic MTA from scratch in Python in a day or so). But in an open network environment like email, it’s the multitude of edge cases and obscure interop behaviors that are hard to get right. Complex, custom software built in a monoculture will always be subpar when you are a player in a rapidly changing industry. The things I’m most excited about regarding our recent acquisition of Port25 are that we reaffirm our core mission of providing the absolute best email infrastructure, that we get a whole new technology stack to help advance that mission in places we haven’t been able to serve in the past, and that we get an injection of new engineering DNA that has tackled similar challenges, in a myriad of different ways, over as long a period as we have. It puts a truly unmatched set of experiences and talent together and brings great perspectives that will help improve the fantastic products we both deliver.
