SparkPost has just launched Recipient Validation, a new paid product feature that checks email addresses for you, prior to sending to them. It provides a way to check for:
- Email address syntax errors and typos
- Invalid domains/domains unable to receive email
- Throw-away addresses on disposable domains
- Non-existent mailboxes
It also tells you whether the address:
- is served by a free email provider (e.g. …@gmail.com)
- is role-based (e.g. sales@…)
- looks like a typo (e.g. ask about firstname.lastname@example.org and we will tell you “did you mean email@example.com”)
SparkPost has the broadest visibility into bounces and invalid addresses on the market because we send 37% of the world’s business email.
If you need to validate a lot of email addresses, the most efficient way is to use the page in the SparkPost app described here. This can deal with millions of addresses at a time and is the fastest form of bulk validation.
The API provides a way to validate email addresses one at a time. This could be built, for example, into your website sign-up process, and is an ideal way to ensure your contact database is as clean as possible.
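As a rough illustration of a single-address lookup, here is a minimal server-side sketch. It uses only the Python standard library (the tool itself uses Requests), and the endpoint path, the `results` wrapper in the response, and the function names `validation_url` and `validate_address` are my assumptions for illustration; check the SparkPost API reference before relying on them.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint path for single-address validation (verify against the API docs)
VALIDATION_PATH = '/api/v1/recipient-validation/single/'


def validation_url(host, address):
    """Build the request URL for one address; the address is percent-encoded."""
    return host.rstrip('/') + VALIDATION_PATH + urllib.parse.quote(address, safe='@')


def validate_address(host, api_key, address, timeout=10):
    """Call the API for one address and return the assumed 'results' object as a dict."""
    req = urllib.request.Request(
        validation_url(host, address),
        # The API key travels in the Authorization header: keep this code on your server
        headers={'Authorization': api_key, 'Accept': 'application/json'},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp).get('results', {})
```

You would call it with something like `validate_address('https://api.sparkpost.com', os.environ['SPARKPOST_API_KEY'], address)`, reading the key from the environment rather than hard-coding it.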
You should call the SparkPost API from your server, not from the client (browser) side – don’t expose your API key, otherwise, bad things will happen. To demonstrate how to call the API, I’ve built a simple command-line tool that you can play with. First, let’s set out some goals for this tool:
- Provide a simple, short but working example of how to use the new API
- Works with SparkPost.com in the US, the EU, and SparkPost Enterprise
- Easy to configure and get started with
- “Scriptable” into your own processes and workflows easily
- Takes either a single address or many addresses
- Works with the same file formats (.csv) as our efficient bulk-load web app
I found the SparkPost API response time is pretty fast, but the round-trip time between your servers and ours is always a factor worth considering – more on that later.
There are just a few steps to follow – see the GitHub repo README for installation steps. You can quickly be up and running once you’ve set your API key, with the required permission. It’s good practice to grant keys with just the least privilege needed.
export SPARKPOST_API_KEY=<YOUR KEY HERE>
./sparkyRecipValidate.py --email firstname.lastname@example.org
You will see:
Scanned input from command line, contains 1 syntactically OK and 0 bad addresses. Validating with SparkPost..
email,valid,result,reason,is_role,is_disposable,is_free,did_you_mean
email@example.com,False,undeliverable,Invalid Recipient,False,False,True,
Done
There’s a useful subtlety going on here. The text line “Scanned input from …” goes via stderr, which means it appears on your screen even if you redirect the tool output to a file. The --email parameter accepts more than one email address, comma-separated; you can also use the short form -e. This costs us nothing in code complexity, thanks to the beautiful argparse standard Python library. The command
./sparkyRecipValidate.py -e firstname.lastname@example.org,email@example.com > out.csv
writes the results to out.csv and leaves only the status messages on screen:
Scanned input from command line, contains 2 syntactically OK and 0 bad addresses. Validating with SparkPost.. Done
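The stderr/stdout split can be sketched in a few lines. The name `eprint` matches the helper imported later in this post, but its body here, and the `report` function, are my own illustrative assumptions rather than the tool’s actual code.

```python
import sys


def eprint(*args, **kwargs):
    """Print status messages to stderr so they never mix with the CSV on stdout."""
    print(*args, file=sys.stderr, **kwargs)


def report(rows):
    """Hypothetical driver: status text goes to stderr, result rows go to stdout."""
    eprint('Validating with SparkPost..')
    for row in rows:
        print(','.join(row))    # captured by "> out.csv"
    eprint('Done')              # still visible on screen when stdout is redirected
```

Because only stdout is redirected by `>`, the output file contains clean CSV and nothing else.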
Reading and writing files
You can also give the tool an input file to process in plain-text .CSV format, the same format the SparkPost UI accepts. Thanks to Python argparse, we can easily handle named input files, or input from stdin, so that the tool acts as a “filter”. All the following are valid ways to provide an input file:
./sparkyRecipValidate.py -i valtest.csv
./sparkyRecipValidate.py --infile valtest.csv
./sparkyRecipValidate.py <valtest.csv
cat valtest.csv | ./sparkyRecipValidate.py
The tool can, therefore, act as a Unix-style filter, so it can be easily plugged into your own workflows. The first three forms allow our code to “rewind” (seek) the input file stream to re-read it. The tool can check, and report on the number of email addresses in the file before starting the actual validation, like this:
Scanned input valtest.csv, contains 15 syntactically OK and 0 bad addresses. Validating with SparkPost..
The “pipe” form does not allow seeking, so you’ll see:
Skipping input file syntax pre-check. Validating with SparkPost..
You’ll see no actual code in this project for deciding whether to read from stdin or a file, and similarly for whether to write to stdout or a file. It’s all taken care of elegantly and automatically by the argparse.FileType parameter. Yay! I love Python for things like this. Another sweet trick I should explain is how the --email option works. I made the file-input version first, and thought I’d have to refactor everything to handle command-line address inputs. But wait! Python lets you do this:
import io
# Turn the comma-separated --email value into a file-like object, one address per line
cmdInfile = io.StringIO(args.email.replace(',', '\n'))
cmdInfile.name = 'from command line'
Oh yeah! That takes the --email argument payload (comma-separated), makes a “file” with newline-separated input, and gives it a filename so the comfort reporting doesn’t look silly. Python makes lazy programmers look like heroes, definitely the bright side of life.
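To show how the pieces fit together, here is a minimal sketch of an argument parser built around argparse.FileType. The -i/--infile and -e/--email options come from the examples in this post; the -o/--outfile name and the `parse_args` wrapper are assumptions of mine, not necessarily what the tool uses.

```python
import argparse
import io
import sys


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description='Validate email addresses')
    # FileType opens named files for us; with no option given we fall back to stdin/stdout
    parser.add_argument('-i', '--infile', type=argparse.FileType('r'), default=sys.stdin)
    parser.add_argument('-o', '--outfile', type=argparse.FileType('w'), default=sys.stdout)
    parser.add_argument('-e', '--email', help='comma-separated address(es)')
    args = parser.parse_args(argv)
    if args.email:
        # Present the command-line addresses as a newline-separated "file"
        args.infile = io.StringIO(args.email.replace(',', '\n'))
        args.infile.name = 'from command line'
    return args
```

The rest of the program can then iterate over `args.infile` without caring whether the addresses came from a file, a pipe, or the command line.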
Bonus feature: syntax pre-check
The tool counts and reports the number of addresses before starting the validation, if it can, so you know whether to go for a coffee, for lunch, or a short vacation while it completes. Rather than just counting lines in the file, it uses the excellent email_validator library to do a proper email syntax check and report addresses as OK/bad before we start the actual validation. As long as we disable the library’s own deliverability checks, it is fast.
Every address in the file is still submitted to SparkPost; this is just a pre-check before we start. If you wish to disable the pre-check, simply add the --skip_precheck flag.
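The pre-check logic, including the “can we rewind?” test, might look roughly like this. To keep the sketch dependency-free I use a deliberately loose regular expression as a stand-in for the syntax check; the real tool uses email_validator (with its deliverability check switched off), and the names `looks_ok` and `precheck` are hypothetical.

```python
import re

# Stand-in syntax test for illustration only; far looser than a real validator
_SIMPLE_EMAIL = re.compile(r'^[^@\s]+@[^@\s]+\.[^@\s]+$')


def looks_ok(address):
    return bool(_SIMPLE_EMAIL.match(address))


def precheck(infile, check=looks_ok, skip=False):
    """Return (ok, bad) counts, or None if skipped or the stream cannot be re-read."""
    if skip or not infile.seekable():
        return None             # e.g. piped input: go straight to validation
    ok = bad = 0
    for line in infile:
        address = line.strip()
        if not address:
            continue            # ignore blank lines
        if check(address):
            ok += 1
        else:
            bad += 1
    infile.seek(0)              # rewind so the validation pass sees every line again
    return ok, bad
```

Named files and shell redirection give a seekable stream, so the counts are available; a pipe does not, which is why that form skips the pre-check.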
You don’t need Excel to work with .CSV files. csvkit is an awesome, free command-line tool kit that enables you to sort, filter, and pretty-print .CSV files, making them much easier to read. These tools also play nicely as Unix-style filters, for example:
./sparkyRecipValidate.py -i valtest.csv | csvlook
Scanned input valtest.csv, contains 15 syntactically OK and 0 bad addresses. Validating with SparkPost.. Done
| email                          | valid | result        | reason            | is_role | is_disposable | is_free | did_you_mean                   |
| ------------------------------ | ----- | ------------- | ----------------- | ------- | ------------- | ------- | ------------------------------ |
| firstname.lastname@example.org | True  | valid         |                   | True    | False         | True    |                                |
| email@example.com              | True  | valid         |                   | True    | False         | False   |                                |
| firstname.lastname@example.org | True  | valid         |                   | False   | True          | True    |                                |
| email@example.com              | True  | valid         |                   | False   | True          | False   |                                |
| firstname.lastname@example.org | False |               | Invalid Domain    | False   | True          | False   |                                |
| email@example.com              | True  | valid         |                   | False   | True          | False   |                                |
| firstname.lastname@example.org | True  | valid         |                   | False   | True          | False   |                                |
| email@example.com              | True  | valid         |                   | False   | True          | False   |                                |
| firstname.lastname@example.org | True  | valid         |                   | True    | False         | False   |                                |
| email@example.com              | True  | valid         |                   | False   | False         | False   |                                |
| firstname.lastname@example.org | False | undeliverable | Invalid Recipient | False   | False         | True    |                                |
| email@example.com              | True  | valid         |                   | False   | False         | False   | firstname.lastname@example.org |
| email@example.com              | False | undeliverable | Invalid Recipient | False   | False         | True    |                                |
| firstname.lastname@example.org | True  | valid         |                   | False   | False         | True    |                                |
| email@example.com              | False |               | Invalid Domain    | False   | False         | False   |                                |
A few random thoughts. Experienced Pythonistas can skip this section.
I have found myself forever needing environment-variable readers/checkers, URL fixer-uppers, and other helper-type functions. As a beginner Python programmer, I was copy/pasting these between files. No more! I’ve reached the point where I should have a file common.py with this sort of thing, to bring order out of chaos.
Bringing these into the main code scope “as if they were in the same file” is simply a matter of
from common import eprint, getenv_check, getenv, hostCleanup
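For readers wondering what lives in such a file, here is a sketch of two of those helpers. The function names come from the import line above, but the bodies are my guess at reasonable behaviour, not a copy of the real common.py; in particular, `hostCleanup` here assumes its input is either a bare host name or already starts with https://.

```python
import os
import sys


def getenv_check(name):
    """Return a required environment variable, or stop with a clear message."""
    value = os.getenv(name)
    if value is None:
        print(name, 'environment variable not set - stopping.', file=sys.stderr)
        sys.exit(1)
    return value


def hostCleanup(host):
    """Normalise a host name into an https:// base URL with no trailing slash."""
    if not host.startswith('https://'):
        host = 'https://' + host
    return host.rstrip('/')
```

With helpers like these, the main script can fetch its API key with `getenv_check('SPARKPOST_API_KEY')` and accept a service host in whatever form the user typed it.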
To pipenv or not pipenv…that is the question
An experienced programmer has pointed out to me that pipenv is not good for everything. However, for mini-projects like this, it makes the installation easier for you. Because I make only basic use of external libraries, the Pipfile has:
[packages]
requests = "*"
email-validator = "*"
I don’t put Pipfile.lock into the repo. That means your system fetches the current stable version of the above packages when you install it. If this were 24×7 production code, a defined Pipfile.lock (with versions and hashes) would provide safety, i.e. “the version you tested is the version in production”.
I feel that would be overly pedantic for a demo project. The Requests library has had a few recent vulnerabilities found and fixed, and tying you to a specific version seems less good than you getting the latest stable versions.
Travis CI is another shiny thing. I love the way it tests that my code is not broken, and checks compatibility across several Python versions, each time I check changes into GitHub. Of course, it works with many languages, not just Python.
Environment variables vs .ini files
I’ve pretty much switched to using environment variables rather than .ini files now, for the following reasons:
- Security. Having a .ini file lying around with API keys in is not great. You have to remember to chmod 0700 it. If you really want to set environment variables up in a file, just create a .sh script (and chmod 0700 it).
- Heroku. This provides an elegant way to set config via environment variables that you can even edit after deployment on their web UI. I like that.
- Playing nicely with Travis. It took me a while to realize this. You can set up private environment variables that get used for automated tests. That means your test cases can be “real” i.e. do things via a SparkPost account, exercising more of your code and providing better quality tests for little effort. In contrast, checking in a .ini file with credentials to your repo, so Travis can find it, would be a bad idea.
I think this sentiment applies to any language, not just Python – see here.
More on latency/speed
I found that response time from the UK to our EU service in Dublin was around 40 ms, whereas the response time from the UK to the US service was around 200 ms. There’s nothing too surprising about that: the difference is due mostly to the distance involved (nearly 10,000 miles round-trip) and the corresponding router hops. I suggest using the service endpoint nearest your own servers.
On an Amazon EC2 Linux host in US-West2, validating with sparkpost.com, 100 recipients took around seven seconds. As you’d expect, runtime is basically linear for this tool. For large batches, the SparkPost app (web UI) is considerably faster – use it instead of this tool.
| Number of recipients | Demo tool runtime | SparkPost web UI runtime |
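To get a feel for what linear runtime means for big lists, here is a back-of-envelope estimate that simply scales the “100 recipients in about seven seconds” measurement. The function name is hypothetical, and the extrapolation assumes the per-address time stays constant, which real networks will not guarantee.

```python
def estimated_runtime_seconds(n_recipients, sample_count=100, sample_seconds=7.0):
    """Scale a measured sample linearly: one request after another, no parallelism."""
    return n_recipients * sample_seconds / sample_count


for n in (100, 10_000, 1_000_000):
    secs = estimated_runtime_seconds(n)
    print(f'{n:>9,} recipients: about {secs:,.0f} s ({secs / 3600:.1f} h)')
```

On those assumptions a million addresses would take the demo tool roughly 19 hours, which is why the web UI is the better choice for large batches.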
It’s a wrap!
We have taken a stroll through Recipient Validation via a tiny command-line tool. I hope you enjoyed reading this (and using the tool) as much as I enjoyed writing it. That’s it for now! Happy validating.