In the final part of this blog series, I’ll dive into more detail on the second, more realistic implementation scenario I outlined in part 2.

Connecting to the ESP – SMTP and RESTful API

Before getting too deep into the workflow described in part 2, let’s talk about a couple of different approaches to sending emails. There are typically two ways to connect to a modern ESP: through SMTP and through RESTful APIs. When connecting through SMTP, each request contains the full body of the email (this is often referred to as a fully qualified body). When connecting via RESTful API, you often have the option of either sending the email body as part of the API call or referencing a stored email template on the ESP server. Most modern ESPs also allow you to send substitution data in the RESTful API request that will be merged with the template before the email is sent to the recipient. In both cases, this is simply how you interface with the ESP; the email itself is always sent out via SMTP to the targeted email address. How the ‘connector’ sends the email is out of scope for this project and irrelevant to the overall solution, but it is worth mentioning as part of our discussion.
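
To make the two RESTful options concrete, here is a minimal Python sketch (not taken from the PHP project). The field names recipient, subject, body, template_id and substitution_data are hypothetical placeholders rather than any particular ESP’s API, so check your ESP’s documentation for the real request shape.

```python
import json


def build_inline_request(recipient, subject, body):
    """Option 1: the full email body travels with the API call."""
    return {"recipient": recipient, "subject": subject, "body": body}


def build_template_request(recipient, template_id, substitution_data):
    """Option 2: reference a stored template and let the ESP merge the data."""
    return {
        "recipient": recipient,
        "template_id": template_id,
        "substitution_data": substitution_data,
    }


# Hypothetical password-reset email that relies on a stored template.
request = build_template_request(
    "pat@example.com", "password-reset", {"first_name": "Pat"}
)
request_json = json.dumps(request, sort_keys=True)
print(request_json)
```

Either request would still be handed to the connector, which is the only piece that needs to know the ESP’s actual API.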

Stored Templates

The optional Template Library is added into the diagram for completeness, but it has little to no effect on our discussion. For this blog, I’ll assume that the email request uses the newer technique of connecting to the ESP via RESTful APIs and sending customer substitution data that will be merged into the template before the email is sent. But remember, this really has no effect on best practices as they pertain to the warm-up process. With that said, sending emails that contain personal and relevant data may have a huge effect on user interaction and thus on your email reputation and inbox placement.

Build/Send Process Capability

Above, it is assumed that the Build/Send Email process(es) have the ability to not only send single email requests via SMTP or RESTful API, but also to make multiple email requests in a single email API call; so the supporting infrastructure needs to support email requests of any bundle size. Also, while it may not be clear, the controller process is actually a library that will be included in the Build/Send Email process. Taking this approach allows for fast processing and multiple Build/Send processes that can run in a cluster.

Process Flows

There are a few process flows that will run in parallel in order to make this a smooth, fast service.

Build/Send Emails flow

Summary

  1. Retrieve email body/content from Content Management System or stored template name
  2. Optional but highly recommended for almost all emails: get data to personalize the email
    • Either merge the personalized data into the template yourself
    • Or hold the data for transmission to the ESP for their merging process
  3. Call out to controller process to find out what SDIPC to use
  4. Call connector with all the information so it can send the email(s)
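
The four steps above can be sketched roughly as follows. This is a Python illustration under assumptions, not the project’s PHP code: the four callables and the ‘allowed’ key in the controller’s answer are hypothetical names, and the stand-in lambdas exist only so the sketch runs on its own.

```python
def build_and_send(recipients, stream, get_content, get_personal_data,
                   ask_controller, call_connector):
    """Run the four summary steps once; return recipients that still need a route."""
    content = get_content(stream)                                    # step 1
    personal = {r: get_personal_data(r) for r in recipients}         # step 2
    route = ask_controller(stream, len(recipients))                  # step 3
    batch = recipients[: route["allowed"]]
    call_connector(route, content, {r: personal[r] for r in batch})  # step 4
    return recipients[route["allowed"]:]


# Stand-in callables (all hypothetical) so the sketch is runnable.
sent_batches = []
leftover = build_and_send(
    ["a@example.com", "b@example.com", "c@example.com"],
    "invoice",
    get_content=lambda stream: "invoice-template",
    get_personal_data=lambda r: {"email": r},
    ask_controller=lambda stream, count: {"esp": "esp-a", "pool": "pool-1", "allowed": 2},
    call_connector=lambda route, content, data: sent_batches.append(sorted(data)),
)
print(leftover)
```

Returning the leftover recipients keeps the function simple; the caller can ask the controller again for the remainder.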

Detail

For the Build/Send Email process to start, something within the backend systems must trigger the need for one or more emails to be sent. That can be anything from a cron job running at specific times, to an event like an invoice being created, to any one of thousands of actions that may tell a program that email needs to be sent.

The Build/Send process(es) will then gather the content it needs. This may simply be a call to the Content Management system to get a fully built out email, or maybe the email body needs to be merged with personal information like it would be for a password reset link. Under that circumstance, the process will reach out into your data repositories to get the necessary personalization data and either change the template directly or save that data to send during the RESTful API call to the ESP for them to merge the data into the template.

Now that the Build/Send application has the content, it needs to find out which ESP and IP pool to use. The only input the Controller Process needs from the Build/Send process is the number of emails the Build/Send process wants to send and what kind (stream) of email is being sent. The Controller Process will then reply back to the Build/Send process with which ESP and IP Pool to use. Keep in mind that the Email Connector may need more than that information coming back.  In my application, four pieces of information are passed from the Controller Process to the Build/Send Email Process:

  1. The name of the ESP (homegrown) system to use
  2. The name of the IP Pool to use.
  3. The number of emails you can send.  What if a Build/Send Email process wanted to know where to send 500 emails but there are only 250 emails left in the warm-up process for that specific category?  If this was the first couple of days in the warm-up process, sending 250 emails over the planned amount may adversely affect your reputation. So the Controller Process needs a way to say how many emails the requesting app can really send in order to stay within the warm-up plan.
  4. Random text.  In order to future-proof your application against an Email Connector needing some specific information besides ESP and IP Pool, build in the ability to send undefined text from the controller to the Build/Send process that gets passed along to the Email Connector.
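
As an illustration, the four pieces of information could be carried in a small record like the one below. This is a Python sketch with made-up field names (the project itself is PHP and may structure its reply differently); it also shows the 500-requested, 250-remaining case from item 3.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ControllerReply:
    esp_name: str         # 1. name of the ESP system to use
    ip_pool: str          # 2. name of the IP Pool to use
    allowed_count: int    # 3. how many emails may be sent right now
    extra_text: str = ""  # 4. free-form text passed through to the Email Connector


def make_reply(esp_name, ip_pool, requested, remaining, extra_text=""):
    """Cap the grant at whatever is left in the warm-up allotment."""
    return ControllerReply(esp_name, ip_pool, min(requested, remaining), extra_text)


# The Build/Send process asks for 500, but only 250 are left in the plan.
reply = make_reply("esp-a", "warmup-pool", requested=500, remaining=250)
print(reply.allowed_count)
```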

*Note* Notice that there is no mention of the Sending Domain being passed from one process to another. If you look closely, there may be a close correlation between Sending Domain and Stream, but then again there may not be. For example, if you have sending domains like newsletter.company.com or invoice.company.com, those subdomains may be closely tied to streams. But some companies have strict policies on sending domains that preclude a tight connection. For example, they may want all email to go out using the sending domain <company name>.com. With that approach, there is no direct correlation between a Stream and a Sending Domain. For that reason, we are leaving the sending domain name out of the process and using Stream instead. If you want an easy way to pass the sending domain to be used, you can add that information to the Random text field. Otherwise, the Build/Send or Controller process has the task of obtaining the proper sending domain.

With the information from the Controller, the Build/Send process can call the connector with the information needed to send the email(s). On failure to connect to the ESP, the connector will update the ‘down’ file that is used as input into the controller process when determining which ESP to use.

We will discuss the Controller Process in a moment, but it’s important to note that the Controller process must check which ESPs are down so it doesn’t suggest using a pool that the connector can’t reach. In my system, there is a simple text file that denotes which ESPs are down. This file is updated by the connectors after a failure to connect to the ESP is detected.
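
A minimal Python sketch of such a ‘down’ file follows. It assumes one ESP name per line, which is my assumption and not necessarily the format the PHP project uses, and it ignores the small race between two connectors writing at the same moment.

```python
import os
import tempfile


def read_down_esps(down_file):
    """Return the set of ESP names currently listed as down."""
    if not os.path.exists(down_file):
        return set()
    with open(down_file) as fh:
        return {line.strip() for line in fh if line.strip()}


def mark_esp_down(down_file, esp_name):
    """Called by a connector after it fails to reach an ESP."""
    if esp_name not in read_down_esps(down_file):
        with open(down_file, "a") as fh:
            fh.write(esp_name + "\n")


# Demo in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    down_path = os.path.join(tmp, "down")
    mark_esp_down(down_path, "esp-a")
    mark_esp_down(down_path, "esp-a")  # a second failure does not duplicate the entry
    down_now = read_down_esps(down_path)
    with open(down_path) as fh:
        down_lines = fh.read().splitlines()
```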

Controller Process Flow

Summary

  1. Get the list of downed ESPs (and corresponding IP Pools if supported)
  2. Pick a tracking file to check ESP/Pool allotments. In my application, I keep this current state information in 1 to N files which I call tracking files. Each file keeps track of what was sent while using that specific tracking file. There are multiple tracking files in order to protect against file-lock contention on large, fast email systems. More details below.
  3. Read the current state of what has been sent so far into a table; then lock the disk file from changes.
  4. Flag downed ESPs in the table so they won’t be used.
  5. Flag Streams that do not match requested stream in the table so they won’t be used.
  6. Search through the list for an available ESP/Pool combination that hasn’t used its allotment. My approach was to look for the first pool with some allotment left, not for a pool that can service the whole request.
    • If the whole request can be serviced, update the allotted field by adding the requested amount to the amount already sent. Then let the Build/Send service know which ESP/Pool to use.
    • If only partial fulfillment can be made, update the allotment field to reflect that the full ESP/Pool allotment is used. Then tell the Build/Send process which ESP/Pool to use along with how many emails can be sent without going over the allotment. In reality, the allotment numbers are a guide, so if the Connector sent over that allotment there probably won’t be any damage to your reputation unless that overage was a significant amount over the allotment.
  7. If NO pool exists that can service this request, the request is sent off to another service that tracks long-term ESP/Pool combinations.  In my project, I keep a file that has all long-term ESP/Pool combinations that are available for sending in a file named Balancer.csv. When no warm-up pools are available, the warmup controller moves to this long-term service.  Each ESP/Pool is given a percentage of how much email should go through that ESP/Pool combination. The service will use that percentage as a guide to the Build/Send processes.
  8. Write the warmup pool usage results to disk, unlock the file.

Detail

Please keep in mind that while the Controller process and flow is being described separately from the Build/Send process, I decided to create the Controller process as a library to be ‘included’ into the Build/Send application/services.  This allows each Build/Send process to obtain which ESP/Pool to use without worrying about any other Build/Send process.

As described above, the first step is to obtain a list of all downed ESPs.  This will stop the connector from continually trying to send to a downed service.

This next step is a little convoluted. Disk access is slower than memory, and I must lock the tracking file during the decision process so that one Controller process can’t change the data underneath another (remember, Controller processes are libraries running within the Build/Send code). To reduce lock contention, I decided to build out support for multiple files that track how many emails have been sent against any given pool. It’s an option that can be set via an input parameters file. When the Controller process starts, it looks up how many files are being used and randomly picks one of them. Each file can allot the total number of emails allotted to that ESP/Pool divided by the number of files. For example, if we can send 10,000 emails through an ESP/Pool marked for password resets, and we are using 4 files for tracking allotment, each file can allot up to 2,500 emails. Once a tracking file is picked at random, the Controller process locks the file, parses it into memory, then continues through the evaluation process.
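
The arithmetic and the random pick can be sketched in a few lines of Python. The tracking_N.csv naming is invented for this illustration, and the floor division is my choice: when the allotment doesn’t divide evenly, the small remainder simply goes unallotted.

```python
import random


def per_file_allotment(pool_allotment, file_count):
    """Share of a pool's allotment that one tracking file may hand out."""
    return pool_allotment // file_count


def pick_tracking_file(file_count, rng=random):
    """Pick one tracking file at random; the file naming is made up for this sketch."""
    return "tracking_{}.csv".format(rng.randrange(file_count))


# The example from the text: 10,000 emails spread over 4 tracking files.
share = per_file_allotment(10000, 4)
chosen = pick_tracking_file(4)
print(share, chosen)
```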

In my application, I decided to only look at one tracking file per request for an allotment. I randomly pick one of the files, and if an allotment cannot be found, I move on to the long-term ESP/Pools. Taking this approach does have its drawbacks. One file might indicate that the allotment is used up while another tracking file still has some allotment available. If I randomly pick the wrong file, I may miss an opportunity to use up a warm-up pool allotment. This can be fixed by trying another tracking file, but I decided to simply pass that request to the long-term pools for fulfillment. In reality, each allotment will probably get used up by subsequent requests, and any accidental leftover allotment during a given period due to this approach would probably be mouse nuts and not worth worrying about.

Once the tracking file is selected, it will be parsed into memory. Now we can merge the tracking information with the ‘downed’ ESP data. During this process, I go through the tracking table in memory and mark with the word ‘skip’ each row that matches one or more of the following:

  1. The tracking ESP matches a downed ESP
  2. The pool stream does not match what stream of email is being sent
  3. The pool has been created in this system but has a start date greater than today’s date
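
Here is a small Python sketch of that flagging pass. The row keys (esp, stream, start_date) and the boolean skip flag are assumptions made for the example; the project marks rows with the word ‘skip’ instead.

```python
from datetime import date


def flag_rows(rows, down_esps, stream, today):
    """Mark rows the Controller must not use, per the three rules above."""
    for row in rows:
        row["skip"] = (
            row["esp"] in down_esps          # rule 1: the ESP is down
            or row["stream"] != stream       # rule 2: wrong stream
            or row["start_date"] > today     # rule 3: pool has not started yet
        )
    return rows


rows = flag_rows(
    [
        {"esp": "esp-a", "stream": "invoice", "start_date": date(2020, 1, 1)},
        {"esp": "esp-b", "stream": "invoice", "start_date": date(2020, 1, 1)},
        {"esp": "esp-b", "stream": "newsletter", "start_date": date(2020, 1, 1)},
        {"esp": "esp-b", "stream": "invoice", "start_date": date(2020, 3, 1)},
    ],
    down_esps={"esp-a"},
    stream="invoice",
    today=date(2020, 2, 1),
)
skip_flags = [row["skip"] for row in rows]
print(skip_flags)
```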

Now that the tracking table is built, we simply need to read the table until we find a pool that still has some room. Remember, I took the approach of partial fulfillment being OK and letting the Build/Send process worry about re-sending a request for the unfulfilled amount. That simplifies the code for the controller process that needs to be as tight and fast as possible. I could have taken a couple of other approaches:

  1. Keep searching until I found a pool that could fulfill the whole request. I decided not to do that because continuing the search may not result in a hit and would waste time getting back to the Build/Send process.
  2. I could grab allotment from multiple ESP/Pools to fulfill the request. This means that I would have to build an array of possible ESP/Pools. I decided to keep it simple so I’m only sending a simple set of fields back; not an array that needs to be parsed.

I’m not saying that I took the right approach, but I like it for its simplicity and speed.
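
For illustration, the ‘first pool with room, partial fulfillment allowed’ search might look like the Python sketch below. The row layout (esp, pool, allotment, sent, skip) is assumed for the example and is not necessarily how the PHP project stores its table.

```python
def allocate(rows, requested):
    """Return (esp, pool, granted) from the first usable pool, or None."""
    for row in rows:
        if row.get("skip"):
            continue
        remaining = row["allotment"] - row["sent"]
        if remaining <= 0:
            continue
        granted = min(requested, remaining)  # partial fulfillment is OK
        row["sent"] += granted
        return row["esp"], row["pool"], granted
    return None  # the caller falls back to the long-term pools


table = [
    {"esp": "esp-a", "pool": "p1", "allotment": 1000, "sent": 1000},
    {"esp": "esp-b", "pool": "p2", "allotment": 1000, "sent": 750},
]
first = allocate(table, 500)   # only 250 left in p2, so the grant is partial
second = allocate(table, 250)  # nothing left anywhere
print(first, second)
```

The function returns one flat tuple rather than a list of pools, which mirrors the decision to keep the reply simple.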

If no pools are found, then the process is passed over to the long-term ESP/Pool process for fulfillment. The answer from that process is passed all the way back to the Build/Send process for fulfillment. We will dig deeper into the long-term process later.

Once the ESP pool is found and the information passed to the Build/Send process, the tracking file is written back down to disk and unlocked.
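
The lock, read, update, write, unlock cycle can be sketched in Python as shown here. It uses fcntl.flock, which is available on Unix-like systems only, and it assumes a CSV tracking file with invented column names; the PHP project may lock and store its data differently.

```python
import csv
import fcntl
import os
import tempfile


def update_tracking_file(path, update_rows):
    """Lock the file, let update_rows(rows) change the table, write it back, unlock."""
    with open(path, "r+", newline="") as fh:
        fcntl.flock(fh, fcntl.LOCK_EX)  # blocks while another process holds the lock
        try:
            reader = csv.DictReader(fh)
            rows = list(reader)
            result = update_rows(rows)
            fh.seek(0)
            fh.truncate(0)
            writer = csv.DictWriter(fh, fieldnames=reader.fieldnames)
            writer.writeheader()
            writer.writerows(rows)
            fh.flush()  # push the data out before releasing the lock
        finally:
            fcntl.flock(fh, fcntl.LOCK_UN)
    return result


def grant_ten(rows):
    """Example update: record 10 more emails against the first pool."""
    rows[0]["sent"] = str(int(rows[0]["sent"]) + 10)
    return rows[0]["pool"]


with tempfile.TemporaryDirectory() as tmp:
    tracking_path = os.path.join(tmp, "tracking_0.csv")
    with open(tracking_path, "w", newline="") as fh:
        fh.write("esp,pool,allotment,sent\nesp-a,p1,2500,100\n")
    granted_pool = update_tracking_file(tracking_path, grant_ten)
    with open(tracking_path, newline="") as fh:
        after = list(csv.DictReader(fh))
```

Keeping the work inside the lock as short as possible matters here, because every other Controller that picked the same tracking file waits on it.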

Warmup or Warmed IP addresses (Pools)?

Most of this blog series talks about warming up new IPs or IP Pools, but the fact is the warm-up process lasts just a couple of weeks, while your email sending over those IP addresses should last for years. So this system also addresses a strategy for those long-term pools. Working with warmed-up IP pools is a lot simpler than working through the warm-up process for new pools. There is no keeping track of allotments, just what percentage of your sending for a given stream should go through each ESP/Pool. Maybe for business continuity you decided to use both SparkPost US and SparkPost EU for sending invoices, and you decide to send 50% through each service. After the controller decides that there is no warm-up allotment for its request, it will check the warmed-up pools. I call this the Balancer process. The Balancer will use a random number generator to guide which warmed-up ESP/Pool to use. Since there is no writing to files, we don’t need to lock the input Balancer file or keep multiple versions of it. As I said, life is much easier for the warmed-up IP addresses and the Balancer process.

But, the Balancer process does need to check against the ESP down file and get the right stream just like when checking against new IPs.
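
A percentage-guided random pick could look like this Python sketch. The pool records and their keys are invented for the example (the project reads its long-term pools from Balancer.csv, whose exact columns are not shown here), and random.choices is just one way to do a weighted pick.

```python
import random


def pick_long_term_pool(pools, down_esps, stream, rng=random):
    """Weighted random choice among warmed-up pools that are up and match the stream."""
    candidates = [
        p for p in pools
        if p["esp"] not in down_esps and p["stream"] == stream
    ]
    if not candidates:
        return None
    weights = [p["percent"] for p in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]


# The 50/50 invoice split from the text.
pools = [
    {"esp": "SparkPost US", "pool": "invoices", "stream": "invoice", "percent": 50},
    {"esp": "SparkPost EU", "pool": "invoices", "stream": "invoice", "percent": 50},
]
rng = random.Random(42)
picks = [pick_long_term_pool(pools, set(), "invoice", rng)["esp"] for _ in range(1000)]
with_us_down = pick_long_term_pool(pools, {"SparkPost US"}, "invoice", rng)
```

Because the percentages are only weights over the pools that remain eligible, a downed ESP’s share automatically shifts to the pools that are still up.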

Note: After a lot of testing, I found that only very high volume senders with a lot of Build/Send processes will need multiple files. Each file lock only lasts a few milliseconds before the Build/Send process actually sends the email. The call to the MTA to send the email takes much longer than the typical Controller process takes to figure out which IP pool/ESP to use, so the overall process only sees slowdowns due to file locking when there are a lot of parallel Build/Send processes working at the same time. In my testing, I didn’t see any locking issues until I had 8 or 9 Build/Send processes all working at the same time. Given that each system is different, I highly recommend that you test for file locking issues and set the number of tracking files appropriately.

Check ESP Status Flow

This process is fairly easy, so I’m not going to get too deep into it. If you’re going to have a controller like the one described throughout this blog, you need to know if your ESPs are up and running so you don’t pick ESP/Pool combinations that are not able to service your request.

The approach this system takes is that the connector will write to a file called down when it fails to connect to an ESP. The Controller process will then know not to select that ESP for fulfillment. If you have multiple ESPs, it’s likely that one of those ESPs will be up and used to fulfill your request. But something needs to remove the ESP names from the down file when an ESP is back up. That means another process needs to periodically read the list of ESPs from the down file and check if those ESPs are back up; if they are, it removes their names from the file. Easy Peasy.
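
A rough Python sketch of that clean-up process follows. It assumes one ESP name per line in the down file, and the is_up callable is a placeholder for whatever health check you already have; neither detail comes from the project itself.

```python
import os
import tempfile


def refresh_down_file(down_file, is_up):
    """Drop every ESP from the down file that is_up(name) reports as reachable again."""
    if not os.path.exists(down_file):
        return []
    with open(down_file) as fh:
        names = [line.strip() for line in fh if line.strip()]
    still_down = [name for name in names if not is_up(name)]
    with open(down_file, "w") as fh:
        fh.writelines(name + "\n" for name in still_down)
    return still_down


# Demo: esp-a has recovered, esp-b has not.
with tempfile.TemporaryDirectory() as tmp:
    down_path = os.path.join(tmp, "down")
    with open(down_path, "w") as fh:
        fh.write("esp-a\nesp-b\n")
    remaining = refresh_down_file(down_path, is_up=lambda name: name == "esp-a")
    with open(down_path) as fh:
        contents = fh.read()
```

Something like a cron job could call this every minute or so; the interval is a trade-off between probing a struggling ESP too often and leaving a recovered one unused.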

Conclusion

We started out with a fairly easy task, warming up some IP addresses, and found out that in reality it isn’t as easy as it seems. It’s not like writing code that can extrapolate the inverse relationship between snails and the expanding universe, but there are some challenges that need to be thought through. My Hackathon project is a good swipe at a way to do just that. It’s a PHP project that includes most of the code necessary for implementing your own warmup and long-term IP strategies. While there is a sample application that calls the controller process and sends emails, that part of the code is up to you. You probably have that code already and just need a way to use the controller to help guide which ESP and IP Pools to use. As for tracking which ESPs are up and down, again, I leave that to you. You probably already have most of the code for detecting downed ESPs somewhere in the error handling of your own connectors. All you have to do is write the ESP name into the file named down and have another application check the status and update the down file when those ESPs are back up. The full project is sitting in a GitHub repository at:

https://github.com/jeff-goldstein/balancer

Feel free to download the project and make all the changes you need to fit your environment.

Cheers,

Jeff Goldstein