Have you ever supported an established monolithic code base, say one that has been around for 5 to 10 years? I think a decent number of us have at some point in our careers. I’m not saying it’s bad, as monoliths have their benefits, but it can be difficult to present new opportunities and learning experiences with such an established code base. It also comes with its own unique challenges, such as slow deployments, a limited testing framework, and other complexities. But it also comes with many benefits, such as performance, established policies, and flexibility. Many of SparkPost’s newer services are built using a microservice architecture with established testing frameworks, quick development and deployment times, and increased reliability. SparkPost has had great success with this architecture model and it is where our team is and wants to be.
For this discussion, I’ll focus mostly on our Transmissions API. This is by far our largest API endpoint in terms of both request volume and complexity. Since the API was written entirely in our monolith, it was the main driver for the project for reasons I’ll touch on below.
I joined my team about a year and a half ago as the Technical Manager. The team was a single, large, multi-manager group working out of one massive repository. Although it was a monolith, the team had been very successful in developing a performant and extensible system for injecting and delivering email.
When I joined, the company was coming to the end of a multi-year re-architecture project encompassing many services and teams. Code deployments to production were going out about twice a week, with the updates all done in-place. This, coupled with a complex and limiting test framework, led to a slow development pipeline. This was hurting our velocity for new features, specifically around our Transmissions API and templating engine.
About 8 months ago, two events coincided that really got the ball rolling in the right direction. The first was an engineering re-organization towards service-based engineering squads. This broke up the one large multi-manager team I mentioned above into two smaller service-based squads that had more control and autonomy over specific services in the monolith. One team took on the services related to the injection of mail, while the other team focused on mail delivery and policy. This led to some issues, as both sets of services were still in the monolith and a single repo, tying us together as far as deployments, bugs, and reliability were concerned. On the other hand, we had more ownership of and responsibility for our specific services. This gave us more power to look at untying ourselves from the monolith and other teams.
The other event was the completion of the large, multi-year re-architecture of our software and environments that I mentioned above. This greatly reduced our operating costs and complexities. We no longer had to support tens of environments, but rather just a handful. The downside, and the real determining factor, was that our monolith was hitting some scaling limits on the Transmissions API. We spent months trying to tweak, tune, and rework our system to increase stability and satisfy customer expectations. Although we did stabilize, we felt it was time to press forward with microservices, and we received strong support from upper management.
How We Are Moving Forward
We’ve been on this journey for over 9 months, and I want to share some of the key steps we took that set us up for success. Our first decision was what tech stack to use. Our company has a strong background in Node.js from other teams already building microservices, including a lot of tooling. On the other hand, we did have some engineers who wanted to look into Golang. We eventually went with what we knew. It may be obvious, but try to stick with a single tech stack across your engineering organization. Having tooling and engineering know-how really lets the team hit the ground running versus stumbling through a whole new stack.
Next, we had a team of engineers that had been working in low-level languages (C/C++) for multiple years, and we wanted them to start building microservices in Node.js. Everyone was very excited to dive into something new, but we needed to establish good habits and best practices out of the gate. To help with this, we seeded the team with two very experienced Node.js developers from other teams (and eventually added two more). They provided training and guidance during the initial development. This was key to getting the team off on the right foot.
Now it was finally time to start breaking the Transmissions API out from the monolith. Instead of diving into our ultimate goal of replacing the complex services of the Transmissions API, we started with one of our simple CRUD API endpoints: Tracking Domains. We knew this could be broken away easily into a microservice, and we needed to get the team trained and comfortable with the new architecture. We rotated different engineers onto the new work while the others supported the existing service, so that everyone had a chance to work on the microservices architecture and gain experience.
Where We Are Now
As of now, we’ve completed and released our new Tracking Domains API endpoint, built multiple new features using microservices, and started a complete architecture overhaul of our Transmissions API. Given the extensive scope of the project, we broke it up into phases. This was essential to adhere to one of our core engineering tenets: always be shipping. We have completed a seamless migration of our company’s largest API to our phase one microservices architecture, and are well underway with phase two. Although our first phase required a full migration of customer traffic, it has put us in a great position to do incremental development of features directly in our new pipeline as we pull functionality away from the monolith.
To address some of the limitations stated earlier, our team is now deploying multiple times a day using AWS CodePipeline, versus our old twice-a-week deployments. All of our new features are in containers, with support functions in AWS Lambda. Testing is seamless and actually somewhat enjoyable compared to our legacy framework, which is one of the highlights for the engineers. The best part, in my opinion, is that team morale is the highest it’s been since I started. Everyone is enjoying the flexibility microservices provide and the chance to finally move our services to a modern architecture. We’ve even had a couple more transfers to the team because more people want to get involved with what we’re doing.
I hope some of these insights into how we handled our transition are helpful. One good takeaway is to always set your team up for success. We ensured that the team had the resources, training, and guidance to make the necessary transition to microservices. As a result, team morale is high, productivity is high, and our new architecture allows for faster product development. For our customers, this translates to a faster cadence of new features and better scalability and reliability, along with lower error rates and less time to fix bugs. Ultimately, this will lead to more growth both for our company and our customers while providing new and exciting work for our engineers.
~ Nate Durant
Technical Manager – Transmissions