In October of 2019, I embarked on a project to create sample code for an integration between SparkPost and Segment. The impetus for the project was that a large sender wanted to take SparkPost log data, which we call “events,” and get it into their Amplitude account in order to manage communication interactions with their customers. I have to admit that I had no prior experience with either Segment or Amplitude, so I didn’t know how this was going to go, but I was excited to take on this new project and see where it went.

I’m happy to say that the code is done and handed off to the customer for their use. The code itself is extremely short yet flexible enough to limit long-term maintenance costs. Read on for the story of some of the challenges I faced while building this integration.

The biggest challenge was figuring out the approach I needed to take. Since SparkPost shares event data via both a REST API and webhooks, I knew I needed to find a way either to push JSON into Segment via webhooks or to pull it from SparkPost into Segment using the SparkPost APIs. My preference is always webhooks, so I started looking for ways Segment could simply consume webhooks from SparkPost. With so many products like Loggly or LogDNA that simply consume webhooks, I figured that Segment would have a generic webhook reader like the others. So I created a free account and started to play around. I looked at all the connections Segment had, both Sources and Destinations, but I did not find a generic Source that consumed webhook data, so I went to the documentation and searched for the term “Sources.” That took me to the Overview section, which seemed like a good place to start learning about Segment, its terminology, and its components. After a short read, I noticed a navigation menu near the top of the page with a link to Sources, and like an easily distracted squirrel, I had to click on that link.

That link ended up being the golden ticket! On the left-hand side of the page were all types of Sources, including the one I wanted: “Custom.” They also had links to several of my competitors, so I had to look at those. They all seemed to follow a similar pattern of sending webhook data from the email platform to Segment for ingestion. Cool, that’s in line with what I was thinking.

So I backtracked to the “Custom” link, which laid out a very nice step-by-step guide on how to build a custom function to consume webhook data. But first, I would have to request the rights from Segment to do this. Luckily, someone at SparkPost had a connection at Segment, and I was granted access to the “custom function” module. Once I created a project, Segment displayed the endpoint I needed to send SparkPost webhook data to. Next, I was presented with a code editor page where I could build my function to consume and process the SparkPost webhooks. This is actually the one part that caused me to pause.

What language was I supposed to write the code in? I had no idea! Segment does have a set of templates of sample code, but which template was I to use? Since I didn’t know which template was closest to what I needed, I simply used the “Default” template. Embarrassed as I am to admit it, I couldn’t figure out which language the template was written in. I graduated with a computer science degree a long time ago (when the major languages were Fortran, assembly, LISP, and Pascal) and have learned over a dozen languages throughout the years. So many languages look alike that I didn’t know which one Segment was using. Luckily, it looked close enough to JavaScript that I decided to go with that. It’s a good thing, since that was, in fact, the language.

Now that I had a language, I had to figure out how to process each webhook event that SparkPost would send, in JSON batches of anywhere from 1 to 1,000 events at a time. I didn’t know whether Segment’s JavaScript engine supported all JavaScript capabilities, but I gave the forEach method a try on the incoming batch and it worked. The next step was to capture each variable I needed in order to set up the Segment Collections I wanted to build. That meant getting real data into Segment for testing. Segment has a “Capture New Event” button just above the template/test-data pane; that button clears the pane below and displays a message that it is waiting for data. I then went to Postman and sent a few emails through SparkPost, which generated events, since I had already pointed a SparkPost webhook at the endpoint Segment gave me when I created the project. Lo and behold, event data was sent from SparkPost to Segment and captured in the development environment. The JSON was displayed for me so I could see the structure and the field names I needed to store. It took a little playing around, but soon enough I was able to store the necessary data to build each collection I wanted within Segment. SparkPost has over 20 different event types, so I did what I could to create a generic reader that would process any of them.
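To make that concrete, here is a minimal sketch of what the generic reader looked like at this stage. It is not the customer’s actual code: I’m assuming the onRequest entry point that Segment’s source functions use, while the msys, rcpt_to, and type field names come from SparkPost’s published webhook format.

```javascript
// Segment source function: receives one SparkPost webhook batch (1–1,000 events).
async function onRequest(request, settings) {
  const batch = request.json();

  batch.forEach((item) => {
    // Every SparkPost event arrives wrapped in "msys" under a class key such as
    // "message_event", "track_event", or "gen_event". Reading whichever class
    // key is present keeps the reader generic across all 20+ event types.
    const wrapper = item.msys || {};
    const eventClass = Object.keys(wrapper)[0];
    const event = wrapper[eventClass];
    if (!event) return; // skip the empty "ping" SparkPost sends on webhook setup

    const email = event.rcpt_to; // the recipient address
    const type = event.type;     // e.g. "delivery", "bounce", "open", "click"
    // ...hold on to email, type, and the full event body for the storage step
  });
}
```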

For the most part, I stored:

  • The email address
  • The 30+ data fields sent in the webhook body, which I stored together in a single field
  • The event type, to use as the name of the Collection

Once I had those values, it was easy to use Segment’s collection call to store the data.
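For illustration, the storage step looked roughly like this. This is a sketch only: I’m assuming Segment.set, the object call the functions runtime exposes for collections, and a helper that receives one event already unwrapped by the reader above.

```javascript
// Store one parsed SparkPost event as an object in a collection named after
// its event type. "Segment" is the global the functions runtime provides;
// Segment.set is its object/collection call.
function storeAsCollection(event) {
  Segment.set({
    collection: event.type,    // e.g. "delivery", "open", "click"
    id: event.event_id,        // SparkPost's unique per-event ID
    properties: { ...event },  // the 30-odd fields from the webhook body, as-is
  });
}
```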

All in all, I was very happy with the code and ready to show it off. Since Segment and SparkPost shared the customer I was doing the sample work for, I needed to run the code by our contacts at Segment.

Upon doing so, I realized I had missed a couple of fundamental pieces due to my lack of Segment knowledge.

The first issue was that I was using something called “collections.” I mean, that sounded right: I was collecting specific types of events, such as deliveries, bounces, delays, opens, and clicks. OK, that was wrong. What I didn’t know was that the Segment world revolves around a specific user, or more specifically a “userID.” That allows Segment to build a persona for each user; in this case, it would be information about all the emails sent to that person along with how they interacted with those emails. That took me to the second issue: I needed to get SparkPost to send me the Segment “userID” with each email so it could be used to build the user profile. Luckily, SparkPost supports metadata, where any key/value pair can be added to an email; that metadata is then attached to every event SparkPost tracks. Once added, all I had to do was pull that field from the JSON block and use it to set the Segment “userID” field.
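To show what that looks like in practice, here is a hedged example of injecting an email through SparkPost’s Transmissions API with metadata attached. The userId key name and the template ID are my own stand-ins; any key/value pair works, and SparkPost echoes the metadata back in every event under rcpt_meta.

```javascript
// Send an email through SparkPost's Transmissions API with the Segment userID
// attached as recipient metadata. SparkPost returns the whole metadata object
// in every webhook event for that message, under "rcpt_meta".
const response = await fetch('https://api.sparkpost.com/api/v1/transmissions', {
  method: 'POST',
  headers: {
    Authorization: process.env.SPARKPOST_API_KEY, // your SparkPost API key
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    recipients: [
      {
        address: { email: 'customer@example.com' },
        metadata: { userId: 'segment-user-1234' }, // key name is up to you
      },
    ],
    content: { template_id: 'welcome-email' }, // assumes a stored template
  }),
});
```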

That brings me back to the first issue. Now that I had the person’s “userID,” each SparkPost event became a “userID tracking event,” not a collection of data like I had thought. So I rewrote the code to leverage the Segment method called track.
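The rewritten loop body ended up looking roughly like this. Segment.track is the real method; the rcpt_meta.userId key is the stand-in name from the metadata example above.

```javascript
// The rewrite: emit one track call per parsed SparkPost event, keyed to the
// Segment userID that rode along as metadata on the original email.
function trackSparkPostEvent(event) {
  Segment.track({
    event: `Email ${event.type}`,                      // e.g. "Email delivery"
    userId: event.rcpt_meta && event.rcpt_meta.userId, // metadata echoed back by SparkPost
    properties: { ...event },                          // keep every SparkPost field as-is
  });
}
```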

Those two changes were all it took to have Segment consume the SparkPost data. Segment already had a connection to Amplitude, so all I had to do was connect Amplitude as a destination, and the data started to flow when I turned everything on! It was a Friday evening and I was done. Scotch time for Jeff.

Too bad I forgot that I had left the data flowing from SparkPost over the weekend. I came back to two test systems that had shut off my accounts due to overuse, and I have to admit, I went WAY over my limits… sorry, Segment and Amplitude!

Now it was time to show the customer. There was one unknown that might send me back to the coding board. Segment already has several email platform connections, and they all follow a similar approach: they take the JSON data apart and reformat it to a specific standard Segment has instituted. I didn’t do that. Instead, I took all of our fields and sent them as-is. SparkPost carries 2-3x more data in its JSON blocks than our competitors, and I didn’t want to lose any of it. If the customer wanted me to repackage the data, I would do it, but I really didn’t feel that would be in the best interest of the customer or SparkPost. If the data were, in fact, going to be used to build the persona inside Segment, the transformation would matter more; but since the real task was to get the data to Amplitude, I wasn’t sure it was worth the transformation step.

Luckily, the customer agreed. They figured they could simply aggregate all the data within Amplitude and use what they wanted without losing any of the rich information SparkPost provided.

Now that the customer had okayed my sample code, it still wasn’t time to celebrate with some Scotch. Yes, there is a theme to my celebrations.

So that concluded the project. In retrospect, I think it was fairly easy. The code itself was only 45 or so lines, and flexible enough to consume any new event types SparkPost creates. If I had to transform the data for Segment, the code would be a tad longer but still fairly short and easy to maintain. That speaks well of what Segment has created and how simple bringing a new source to the platform can be.

Now that this blog is done, you know what time it is… Scotch time.

Happy sending,

Jeff 

If you want to see the code, please reach out to me directly at [email protected] and I’ll be happy to share it with you.