Email validation has gone through a few different eras
A long time ago in an era far, far away… it started with Syntax Validation
Checking an email address for correct syntax is the simplest form of email validation. The core elements of a valid email address are the local part, the @ symbol, the domain, and finally the extension, also called the top-level domain (.com, .org, etc.). To standardize the various syntaxes, specifications called Requests for Comments (RFCs) were published to define which characters are acceptable in the local and domain parts. These RFCs eventually became quite extensive, which created the need for open-source libraries to help validate email syntax in many languages.
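To make the local part / @ / domain / extension breakdown concrete, here is a minimal Python sketch of a syntax check. It is deliberately simplified and is not a full implementation of the RFCs: quoted local parts, IP-literal domains, and internationalized addresses are all rejected. The function name `split_address` and the pattern itself are assumptions made for illustration.

```python
import re

# Simplified building blocks (an assumption for illustration, not the full
# RFC grammar): dot-separated runs of allowed characters in the local part,
# and dot-separated labels ending in an alphabetic extension in the domain.
_LOCAL = r"[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]+)*"
_LABEL = r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?"
_PATTERN = re.compile(rf"({_LOCAL})@((?:{_LABEL}\.)+[A-Za-z]{{2,63}})")


def split_address(address: str):
    """Return (local_part, domain) if the address passes the simplified
    syntax check, otherwise None."""
    match = _PATTERN.fullmatch(address)
    if match is None:
        return None
    local_part, domain = match.groups()
    # RFC 5321 length limits: 64 octets for the local part, 255 for the domain.
    if len(local_part) > 64 or len(domain) > 255:
        return None
    return local_part, domain


if __name__ == "__main__":
    for candidate in ("jane.doe@example.com", "jane..doe@example.com", "jane.doe@example"):
        print(candidate, "->", split_address(candidate))
```

A check like this only says an address is well formed. It says nothing about whether the mailbox exists, which is the gap the later approaches try to close.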
Validation SMTP Command and The Attack of the Spammers
Recognizing the need for help validating email addresses, Internet Service Providers (ISPs) started to build in email address validation functionality. Thus “VRFY” (short for Verify) was built as an SMTP command that let senders ask a receiving mail server whether an email address was valid. VRFY was meant to bring peace and order to the galactic Internet, but it soon fell into the hands of the dark side: spammers. After wide-scale abuse of this functionality, ISP administrators disabled VRFY, leaving email address validation in disarray.
SMTP Ping (The Spammer Menace)
After the fall of VRFY, senders creatively devised SMTP Ping, a different method for checking whether an email address was valid. SMTP Ping checked against a remote mail server to see if an email address was alive. The sender would connect to the ISP's remote mail server, such as Gmail's, as if actually sending an email, then abruptly cut the connection short before sending any message.
Typically, the conversation held in the connection between the sending mail server and the receiving ISP mail server would look like this:
In some scenarios, the ISP could provide feedback like this instead:
With SMTP Ping, senders could cut the conversation short as soon as they saw the ISP's response to the request to send mail to the specified email address. This became a way to ping an ISP to see whether the receiving mail server considered an email address valid or invalid, with some degree of confidence.
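As a rough illustration of how that response was read, the Python sketch below classifies the server's reply to the recipient (RCPT TO) step of a recorded conversation. The sample dialogue is entirely made up (hostnames, addresses, and reply texts are hypothetical), and the function name `rcpt_verdict` is an assumption. Only the meaning of the reply code classes (2xx accepted, 4xx temporary failure, 5xx permanent rejection) comes from the SMTP standard. The code parses text and never opens a network connection.

```python
# Hypothetical exchange for illustration only: "C:" lines are the sending
# server, "S:" lines are the receiving ISP's mail server.
SAMPLE_DIALOGUE = """\
S: 220 mx.example.net ESMTP ready
C: HELO mail.sender.example
S: 250 mx.example.net
C: MAIL FROM:<probe@sender.example>
S: 250 OK
C: RCPT TO:<jane.doe@example.net>
S: 550 5.1.1 No such user
C: QUIT
S: 221 Bye
"""


def rcpt_verdict(dialogue: str) -> str:
    """Classify the server's reply to the first RCPT TO command."""
    lines = dialogue.splitlines()
    for index, line in enumerate(lines):
        if not line.startswith("C: RCPT TO:"):
            continue
        # The verdict is the first server line after the RCPT TO command.
        for reply in lines[index + 1:]:
            if reply.startswith("S: "):
                code = int(reply[3:6])  # three-digit SMTP reply code
                if 200 <= code < 300:
                    return "accepted"
                if 400 <= code < 500:
                    return "temporary failure"
                if 500 <= code < 600:
                    return "rejected"
                return "unknown"
    return "unknown"


if __name__ == "__main__":
    print(rcpt_verdict(SAMPLE_DIALOGUE))
```

Note that an "accepted" reply is weaker evidence than it looks: some servers accept every recipient at this step and bounce the message later, which is one reason the result only ever carried "some degree of confidence".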
The Dark Side of SMTP Ping
ISPs consider SMTP Ping spammer behavior. They can easily tell that you’re doing it by looking at the conversation patterns: calling in and hanging up repeatedly, with no (or very few) messages actually being sent, ends up in their mail server logs. After the history with SMTP VRFY, this type of behavior is now known to be spammy. ISPs are cracking down on this behavior and cracking down hard. Microsoft, for example, considers this type of practice to be malicious, and Hotmail treats SMTP Ping as evidence of a directory harvest attack. An SMTP Ping attempt in progress will typically trigger a hard block on all connections from the sending IP address. ISPs dislike SMTP Ping, and so do blacklist operators. Keep it up, and you’ll almost surely end up getting blacklisted. Long story short, it’s a really bad practice.
A New Hope: Data-Driven
Rather than rely on SMTP Ping, there’s a different, data-driven approach that does not make enemies of ISPs. Email addresses can be validated by looking them up in a large data set of event data, including hard bounces, deliveries, and engagement, and by combining that lookup with syntax validation, typo detection, DNS queries for valid domains, and quality checks for free, role-based, and disposable email addresses. This method relies heavily on the depth and breadth of the data the email validation tool or service is built upon, instead of depending on the ISP to return a specific response. You may not want to judge Master Yoda based on his size, but you’ll want to judge an email address validation tool by its data size.
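Some of those quality checks can be sketched in a few lines of Python. The example below is an assumption-laden illustration, not how any particular validation service works: the domain and role lists are tiny hypothetical samples, the function name `quality_flags` is invented here, and typo detection is approximated with the standard library's `difflib` fuzzy matching. The event-data lookup and DNS queries are left out because they need real data and network access.

```python
from difflib import get_close_matches

# Tiny hypothetical lists for illustration; a real tool would rely on much
# larger, continuously maintained data sets.
FREE_DOMAINS = {"gmail.com", "yahoo.com", "hotmail.com", "outlook.com"}
DISPOSABLE_DOMAINS = {"mailinator.com"}
ROLE_LOCAL_PARTS = {"admin", "info", "sales", "support", "postmaster", "abuse"}
KNOWN_DOMAINS = sorted(FREE_DOMAINS)


def quality_flags(address: str) -> dict:
    """Return simple quality signals for an address that already passed a
    syntax check."""
    local_part, _, domain = address.lower().rpartition("@")
    suggestion = None
    if domain not in FREE_DOMAINS:
        # Typo detection: look for a well-known domain that is nearly identical.
        close = get_close_matches(domain, KNOWN_DOMAINS, n=1, cutoff=0.8)
        suggestion = close[0] if close else None
    return {
        "is_free": domain in FREE_DOMAINS,
        "is_disposable": domain in DISPOSABLE_DOMAINS,
        "is_role": local_part in ROLE_LOCAL_PARTS,
        "did_you_mean": suggestion,
    }


if __name__ == "__main__":
    print(quality_flags("info@gmial.com"))
```

Signals like these are cheap to compute, which is why they complement rather than replace the event data: they can flag a likely typo or a shared inbox, but only delivery history can say whether mail to the address actually arrives.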
SparkPost’s Recipient Validation is built on top of SparkPost’s large email data footprint, which comes from sending more than 37% of the world’s B2C and B2B email. Our data science team has done a thorough analysis of billions of email bounces and delivery events. Our findings show that a single hard bounce isn’t enough to conclude you shouldn’t send to an address. Using our data footprint, we are constantly updating our list of recipients and our algorithms to capture the true validity of a hard bounce, and analyzing all related email events to best answer the question: Can you deliver to this given email address?
As we continue to build and iterate upon our Recipient Validation, our goal is to make ours the most dependable and fastest validation tool on the market. Rumor has it our Recipient Validation will be able to make the Kessel Run in less than 12 parsecs, or at least something along those lines…
— Isaac Kim, Technical Product Manager