Performance Monitoring at SparkPost

Feb. 19, 2016 by Jamison Moody

Performance Monitoring Recommendations for Injector Nodes


As the lead performance engineer at SparkPost, I am often asked to simulate different customer injection profiles to search for possible performance problems with a given approach. While the specifics of each customer can differ massively, the hardware problem areas we need to worry about are relatively few. With that in mind, here are a handful of tips and tricks I use to find performance problems on my injector nodes. "Injector node" is the name we give to any server that performs REST or SMTP injections into a SparkPost environment. The key is to find the performance bottleneck and focus on that problem.

Something like low throughput is just a symptom of a different problem altogether. The key is to find the bottleneck and eliminate it. The first thing to consider is what to monitor; the second is how closely to monitor it. Monitoring has a cost: if you are interested in CPU utilization, the act of monitoring CPU utilization itself burns CPU. If you aren’t careful about how closely you monitor something, you run the risk of the monitoring becoming your bottleneck. Over time, I’ve settled on sampling every 60 seconds during all my performance tests, mostly because that is often enough to produce decent graphs while doing very little damage to the overall performance of the product.

The big areas of concern for generic performance monitoring are CPU, memory, disk I/O, and network. These four areas are the cornerstones of the performance of a given environment during an injection. The tools below are for Linux environments; you will need the sysstat and procps packages installed to use them.
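As a rough sketch of the 60-second sampling described above, the Python below builds one collector command per area and can launch each in the background with its output sent to a log file. The 60-sample count, the log file names, the helper names, and the `-n DEV` option to sar are assumptions made for illustration; the article itself only names the tools and the interval.

```python
import os
import shutil
import subprocess

INTERVAL_SECONDS = 60  # sampling interval suggested in the article
SAMPLE_COUNT = 60      # assumption: 60 samples, i.e. roughly a one-hour run


def build_monitor_commands(interval=INTERVAL_SECONDS, count=SAMPLE_COUNT):
    """Return {area: argv} with one collector command per monitored area."""
    i, c = str(interval), str(count)
    return {
        "cpu": ["mpstat", "-P", "ALL", i, c],
        "memory": ["free", "-m", "-s", i, "-c", c],
        "disk": ["iostat", "-m", "-t", i, c],
        "network": ["sar", "-n", "DEV", i, c],  # -n DEV is an assumption
    }


def start_monitors(log_dir=".", interval=INTERVAL_SECONDS, count=SAMPLE_COUNT):
    """Launch every installed collector in the background, one log per area."""
    procs = {}
    for name, argv in build_monitor_commands(interval, count).items():
        if shutil.which(argv[0]) is None:
            continue  # tool not installed; skip it rather than abort the run
        with open(os.path.join(log_dir, name + ".log"), "w") as log:
            # The child keeps its own copy of the log file descriptor.
            procs[name] = subprocess.Popen(
                argv, stdout=log, stderr=subprocess.STDOUT
            )
    return procs
```

Calling `start_monitors()` just before a test run and waiting on the returned processes afterward leaves four log files to graph. Each collector wakes up only once per interval, which keeps the monitoring overhead small.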

CPU

For CPU, there is a great Linux tool that provides a sufficient amount of information: the mpstat command, which tells you a lot about the CPU’s activity. I recommend the -P ALL option because it lets you see what each individual CPU is doing. On more than one occasion this has revealed to me that a program wasn’t multithreading. In that case you’ll see a single CPU running at near 100% utilization while the rest do basically nothing. A plain mpstat hides this, because the totals are averaged across all the CPUs you have. So when you are looking for the performance bottleneck, make sure that you are maximizing the use of all your CPUs.
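To make the "one busy CPU hidden by the average" case concrete, here is a minimal Python sketch that computes per-CPU utilization from two snapshots of /proc/stat, the kernel counters that per-CPU tools report from. It is not how mpstat is implemented; the function names and the 90%/10% thresholds are hypothetical, and treating idle plus iowait as "not busy" is a simplifying assumption.

```python
def parse_proc_stat(text):
    """Map 'cpuN' -> (busy_ticks, total_ticks) from /proc/stat contents."""
    counters = {}
    for line in text.splitlines():
        fields = line.split()
        # Keep per-CPU lines such as 'cpu0'; skip the aggregate 'cpu' line.
        if not fields or not fields[0].startswith("cpu") or fields[0] == "cpu":
            continue
        ticks = [int(v) for v in fields[1:9]]  # user .. steal
        total = sum(ticks)
        not_busy = sum(ticks[3:5])  # idle + iowait
        counters[fields[0]] = (total - not_busy, total)
    return counters


def per_cpu_utilization(before_text, after_text):
    """Percent busy for each CPU between two /proc/stat snapshots."""
    before, after = parse_proc_stat(before_text), parse_proc_stat(after_text)
    result = {}
    for cpu, (busy, total) in after.items():
        if cpu not in before:
            continue
        d_busy = busy - before[cpu][0]
        d_total = total - before[cpu][1]
        result[cpu] = 100.0 * d_busy / d_total if d_total > 0 else 0.0
    return result


def looks_single_threaded(util, hot=90.0, cold=10.0):
    """True when exactly one CPU is hot and every other CPU is nearly idle."""
    hot_cpus = [c for c, u in util.items() if u >= hot]
    cold_cpus = [c for c, u in util.items() if u <= cold]
    return (
        len(util) > 1
        and len(hot_cpus) == 1
        and len(cold_cpus) == len(util) - 1
    )
```

In use, you would read /proc/stat once, wait for the sampling interval, read it again, and pass both strings to `per_cpu_utilization`. On a two-CPU box with one CPU at 99% and the other at 1%, the average is about 50%, which looks healthy until you see the per-CPU breakdown.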

Memory

Another place to find bottlenecks is memory utilization. Sometimes the injection tools we use make dynamic memory errors, such as leaks, and monitoring memory on your environment helps you detect when something like this has happened. Depending on the language used for your injector, this may or may not be an issue. I use the free command for this kind of check. It’s program independent, so you can easily detect generic leaks in your injector environment with it. free accepts -k, -m, and -g to switch between kilobytes, megabytes, and gigabytes.

The free command gives you a high-level overview of what’s happening with memory on the system. pidstat can also provide process-level monitoring if it isn’t clear which process is using up your memory during an injection. Generally, though, it is easier to run a plain top and look at memory utilization there to find the guilty party in a memory leak investigation. At that point, you’ll want to switch to the profiling tool of your choice rather than depend on external tools, so you can get the specific details of the problem.
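The system-wide leak check described above can be sketched in Python by sampling /proc/meminfo and looking for used memory that only ever grows. This is an illustration under assumptions: it relies on the MemAvailable field (present on reasonably recent kernels), and the function names and the 100 MiB growth threshold are hypothetical.

```python
def used_mib(meminfo_text):
    """Used memory in MiB, computed as MemTotal - MemAvailable."""
    values = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            values[key.strip()] = int(parts[0])  # /proc/meminfo reports kB
    return (values["MemTotal"] - values["MemAvailable"]) / 1024.0


def looks_like_leak(samples_mib, min_growth_mib=100.0):
    """True if usage never drops between samples and grows past a threshold."""
    if len(samples_mib) < 3:
        return False  # too few samples to call it a trend
    never_drops = all(b >= a for a, b in zip(samples_mib, samples_mib[1:]))
    return never_drops and samples_mib[-1] - samples_mib[0] >= min_growth_mib
```

Collecting one `used_mib` value per sampling interval over a test run and passing the list to `looks_like_leak` gives a quick yes-or-no signal. It only says that something on the box is growing, so finding the specific process is still a job for top or pidstat.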

Disk I/O

One area where your injector can suffer performance problems is its disk I/O. If your injector program makes lots of inefficient calls to disk, it can end up slowing itself down without you even realizing it. Database calls can also cause problems on this front, especially if your database is doing garbage collection during your run that you are unaware of. The easiest way to detect this is with the iostat command. I usually use iostat -m -t when I’m monitoring.

The transfers per second (tps) value is actually the elusive IOPS number that you’ve been looking for. It’s a more generic way to talk about how heavily the disk is being utilized. Once again, depending on how your environment is set up, you may be overutilizing a single disk with your injector’s reads and writes. RAID setups spread the work across multiple disks, and RAM drives take it off the disks entirely; both are easy ways to increase your efficiency.
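For a sense of where a tps figure comes from, the Python sketch below derives an approximate transfers-per-second value for each device from two snapshots of /proc/diskstats, using completed reads plus completed writes. It is an approximation rather than a reimplementation of iostat (newer kernels expose additional request types that are ignored here), and the function names are hypothetical.

```python
def parse_diskstats(text):
    """Map device name -> completed reads + completed writes."""
    ios = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 8:
            continue  # not a device line
        # Layout: major minor name reads_completed ... writes_completed ...
        ios[fields[2]] = int(fields[3]) + int(fields[7])
    return ios


def transfers_per_second(before_text, after_text, interval_seconds):
    """Approximate per-device tps between two /proc/diskstats snapshots."""
    before, after = parse_diskstats(before_text), parse_diskstats(after_text)
    return {
        dev: (after[dev] - before[dev]) / interval_seconds
        for dev in after
        if dev in before
    }


def busiest_device(tps):
    """Return (device, share of all transfers) for the most active device."""
    total = sum(tps.values())
    dev = max(tps, key=tps.get)
    return dev, (tps[dev] / total if total > 0 else 0.0)
```

If `busiest_device` reports that one disk carries nearly all of the transfers, that is the single overutilized disk described above.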

Network

The final generic area to monitor during your injections is networking. Sometimes you check everywhere else and everything appears to be working normally, but throughput is still suffering. The tool to use here is sar, which shows the status of the NIC you are using to communicate with the outside world.

Your focus should be the eth interfaces. Often you’ll see one that’s busy while any others are mostly idle. T and R stand for transmitting and receiving, respectively (the tx and rx columns), so the key is to look at those numbers for a bottleneck. For example, the graph below shows the combined total of both values during a network bottleneck.

[Figure: graph of combined transmit and receive network throughput during the test run]

The combined throughput maxed out the capacity of the environment for the entire hour-long test run. By monitoring networking, I was able to see that and point to it as the bottleneck we needed to focus on.
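The combined-throughput check above can be approximated in Python from two snapshots of /proc/net/dev, the per-interface byte counters the kernel exposes. This is a sketch, not the method behind the graph: the function names are hypothetical, and `capacity_mbit` is a figure you supply for your own environment.

```python
def parse_net_dev(text):
    """Map interface name -> (rx_bytes, tx_bytes) from /proc/net/dev."""
    counters = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        fields = rest.split()
        # The two header lines have no 'name:' prefix and are skipped here.
        if not sep or len(fields) < 9 or not fields[0].isdigit():
            continue
        counters[name.strip()] = (int(fields[0]), int(fields[8]))
    return counters


def throughput_report(before_text, after_text, interval_seconds, capacity_mbit):
    """Per interface: (rx Mbit/s, tx Mbit/s, combined % of capacity_mbit)."""
    before, after = parse_net_dev(before_text), parse_net_dev(after_text)
    report = {}
    for name, (rx, tx) in after.items():
        if name not in before:
            continue
        rx_mbit = (rx - before[name][0]) * 8 / 1_000_000 / interval_seconds
        tx_mbit = (tx - before[name][1]) * 8 / 1_000_000 / interval_seconds
        combined_pct = 100.0 * (rx_mbit + tx_mbit) / capacity_mbit
        report[name] = (rx_mbit, tx_mbit, combined_pct)
    return report
```

A combined percentage that sits near 100 for sample after sample is the flat-topped shape to look for. Whether capacity applies to the combined total or to each direction separately depends on the environment, so treat the percentage as a prompt to investigate rather than a verdict.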

Together, the tools above are an easy way to gain insight into what’s happening on your system and what you should focus on as you try to optimize the performance of your injector. Sometimes something as simple as putting your disks in RAID 10, for example, can pay massive dividends in database behavior. This allows you to extend the life of existing hardware or to focus future purchases specifically on the bottleneck your software is facing today.


2 Comments

  • It would be nice to see these performance numbers if you would use a Linux kernel version 4.8+

  • Marcos, thanks for your interest in my article. The examples in the article are just meant to give a basic idea of what the results from the commands look like, rather than any sort of realistic performance data, since every customer will likely develop their own injector optimized to their environment’s needs and requirements. The goal was to provide some basic performance debugging information to help look for problems if they weren’t able to reach their performance targets in a particular environment.


