Anti-spam techniques
E-mail has become the subject of much abuse, in the form of both spamming and E-mail worm programs. Both of these flood the in-boxes of E-mail users with junk E-mails, wasting their time and money, and often carrying offensive, fraudulent, or damaging content. This article describes the efforts being made to stop E-mail abuse and ensure that E-mail continues to be usable in the face of these threats.
Spam tips for users
Aside from installing client-side filtering software, end users can protect themselves from the brunt of spam's impact in numerous other ways.
Spam filters
The continuing increase in spam has been matched by growing use of spam filter programs: software designed to examine incoming email and separate spam from the genuine messages intended for the user. A number of commercial spam filtering programs exist, and many freeware and shareware spam filters are also available for easy downloading and installation. Spam filters are now included as standard features in nearly every email client, though these built-in filters can be limited; some users may therefore need to add a more versatile, commercially available filtering program.
Preventing Address Harvesting
One way that spammers obtain email addresses to target is to trawl the Web and Usenet for strings which look like addresses. Contact forms and address munging are good ways to prevent email addresses from appearing on these forums. If the spammers can't find the address, the address won't get spam.
Unfortunately, spammers have other ways to obtain addresses, such as "dictionary attacks", in which the spammer generates a number of likely-to-exist addresses from names and common words. For instance, if there is someone with the address adam@example.com, where 'example.com' is a popular ISP or mail provider, he probably receives a great deal of spam, because "adam" is a common name that such an attack would try at every large provider, whether or not the address has ever been published.
Address munging
Posting anonymously, or with an entirely faked name and address, is one way to avoid this "address harvesting", but users should ensure that the faked address is not valid. Users who want to receive legitimate email regarding their posts or Web sites can alter their addresses in some way that humans can figure out but spammers haven't (yet). For instance, joe@example.net might post as joeNOS@PAM.example.net, or display his email address as an image instead of text. This is called address munging, from the jargon word "mung" meaning to break.
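As an illustration of the munging idea, the sketch below inserts a marker into the domain part of an address and shows the step a human reader performs to undo it. The marker `NOSPAM` and the function names are hypothetical; real munging schemes vary widely, and the point is only that the transformation is obvious to a person but not to a naive harvester.

```python
# Hypothetical marker: any string a human will recognise as "remove me" works.
MARKER = "NOSPAM"

def munge(address: str, marker: str = MARKER) -> str:
    """Insert a marker after the '@' so a harvester collects a non-working address."""
    local, domain = address.split("@", 1)
    return f"{local}@{marker}.{domain}"

def demunge(munged: str, marker: str = MARKER) -> str:
    """What a human reader does by hand: drop the marker to recover the real address."""
    local, domain = munged.split("@", 1)
    prefix = marker + "."
    if domain.startswith(prefix):
        domain = domain[len(prefix):]
    return f"{local}@{domain}"

print(munge("joe@example.net"))           # joe@NOSPAM.example.net
print(demunge("joe@NOSPAM.example.net"))  # joe@example.net
```

A fixed, widely used marker is exactly what harvesters eventually learn to strip, which is why the text notes that spammers "haven't (yet)" figured out a given scheme.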
Contact Forms
Contact forms allow users to send email by filling out forms in a web browser. The web server takes the form data and forwards it to an email address. The user (and therefore the spam harvester) never sees the email address. Contact forms have the drawback that they require a website that supports server-side scripts.
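The server-side half of a contact form can be sketched as a function that turns submitted fields into a message addressed to a recipient who appears only in server code. The addresses below are hypothetical, and the line-break check is an assumption about what such a script should guard against (header injection), not a feature of any particular form package.

```python
from email.message import EmailMessage

# Hypothetical addresses: they exist only in server-side code, never in the
# HTML page a harvester can crawl.
HIDDEN_RECIPIENT = "webmaster@example.org"
FORM_SENDER = "contact-form@example.org"

def _single_line(value: str) -> str:
    """Refuse line breaks so a visitor cannot smuggle in extra headers (e.g. Bcc)."""
    if "\r" in value or "\n" in value:
        raise ValueError("header value may not contain line breaks")
    return value

def build_contact_message(form: dict) -> EmailMessage:
    """Turn submitted form fields into an email for the site owner."""
    msg = EmailMessage()
    msg["To"] = HIDDEN_RECIPIENT
    msg["From"] = FORM_SENDER
    msg["Reply-To"] = _single_line(form["email"])  # lets the owner answer the visitor
    msg["Subject"] = "Contact form: " + _single_line(form.get("subject", "(no subject)"))
    msg.set_content(form["message"])
    return msg
```

On a real server the message would then be handed to a mail transfer agent, for example with `smtplib.SMTP("localhost").send_message(msg)`, assuming one is listening locally.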
Disposable e-mail addresses
Many email users sometimes need to give an address to a site without complete assurance that the site will not spam, or leak the address to spammers. One way to mitigate the risk of spam from such sites is to provide a disposable email address -- a temporary address which forwards email to your real account, but which you can disable or abandon whenever you see fit.
A number of services provide disposable address forwarding. Addresses can be manually disabled, can expire after a given time interval, or can expire after a certain number of messages have been forwarded. Some of these services allow easier creation of disposable addresses via various techniques.
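The three ways a disposable address can stop working (manual disabling, a time limit, a message-count limit) can be modelled as a small forwarding rule. This is a hypothetical sketch of the idea, not the design of any of the services listed below.

```python
import time
from dataclasses import dataclass
from typing import Optional

@dataclass
class DisposableAlias:
    """Hypothetical forwarding alias that stops working when any limit is reached."""
    real_address: str
    expires_at: Optional[float] = None  # Unix time; None means no time limit
    max_messages: Optional[int] = None  # None means no message-count limit
    forwarded: int = 0
    disabled: bool = False              # set by the user to abandon the alias

    def active(self, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        if self.disabled:
            return False
        if self.expires_at is not None and now >= self.expires_at:
            return False
        if self.max_messages is not None and self.forwarded >= self.max_messages:
            return False
        return True

    def route(self, now: Optional[float] = None) -> Optional[str]:
        """Return the real address to forward to, or None to drop the message."""
        if not self.active(now):
            return None
        self.forwarded += 1
        return self.real_address
```

Dropping mail once the alias is dead means a site that leaked the address only ever reaches a dead end; the real account is never exposed.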
Defeating Web bugs and JavaScript
Many modern mail programs incorporate Web browser functionality, such as the display of HTML and images. This can easily expose the user to pornographic or otherwise offensive images in spam. In addition, spam written in HTML can contain JavaScript programs to direct the user's Web browser to an advertised page, or to make the spam message difficult or impossible to close or delete. In some cases, spam messages have contained attacks upon security vulnerabilities in the HTML renderer, using these holes to install spyware. (Some computer viruses are borne by the same mechanisms.) Also, images loaded from a remote server (Web bugs) can tell the spammer when a message has been opened, confirming that the address reaches a live reader.
Users can defend against these methods by using mail clients which do not automatically display HTML, images or attachments, or by configuring their clients not to display these by default.
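What "not displaying HTML by default" amounts to can be sketched with Python's standard `email` package: prefer the plain-text alternative of a message and never hand the HTML part to a renderer, so no script runs and no remote image (Web bug) is fetched. The placeholder text for HTML-only mail is an assumption of this sketch.

```python
from email import policy
from email.parser import BytesParser

def safe_body(raw: bytes) -> str:
    """Return the plain-text body of a message without ever rendering its HTML."""
    msg = BytesParser(policy=policy.default).parsebytes(raw)
    # Only a text/plain part is acceptable; HTML parts are never selected.
    part = msg.get_body(preferencelist=("plain",))
    if part is not None:
        return part.get_content()
    return "[HTML-only message: not displayed]"
```

Because the HTML part is never parsed as HTML, the `<img>` tag of a Web bug remains inert text that the client never requests.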
Avoiding responding to spam
It is well established that some spammers regard responses to their messages -- even responses which say "Don't spam me" -- as confirmation that an email address is valid and read by a person. Likewise, many spam messages contain Web links or addresses which the user is told to follow in order to be removed from the spammer's mailing list. In several cases, spam-fighters have tested these links and addresses and confirmed that they do not lead to the recipient address's removal -- if anything, they lead to more spam.
In Usenet, it is widely considered even more important to avoid responding to spam. Many ISPs have software that seeks out and destroys duplicate messages. Someone may see a spam and respond to it before their server cancels it, which can have the effect of reposting the spammer's spam for them; since the reply is not an exact duplicate, this reposted copy will last longer.
In late 2003, the FTC launched a public relations campaign to encourage email users to simply never respond to a spam email -- ever. This campaign stemmed from the tendency of casual email users to reply to spam, in order to complain about the spam and ask the spammer to stop sending spam. This has the effect of alerting spammers to the existence of a person who actually reads spam email, and it has the effect of increasing spam rather than stopping it.
Reporting spam
The majority of ISPs explicitly forbid their users from spamming, and eject from their service users who are found to have spammed. Tracking down a spammer's ISP and reporting the offense often leads to the spammer's service being terminated. Unfortunately, it can be difficult to track down the spammer -- and while there are some online tools to assist, they are not always accurate.
Two such online tools are SpamCop and Network Abuse Clearinghouse. Both provide automated or semi-automated means to report spam to ISPs. Some spam-fighters regard them as inaccurate compared to what an expert in the email system can do; however, most email users are not experts.
Defense against email worms
In the past several years, scores of worm programs have used email systems as a conduit for infection. The worm program transmits itself in an email message, usually as a MIME attachment. In order to infect a computer, the executable worm attachment must be opened. In almost all cases, this means the user must click on the attachment. The worm also requires a software environment compatible with its programming.
Email users can defend against worms in a number of ways, including:
- Avoiding email client software which supports executable attachments. The most frequently targeted email clients are Microsoft Outlook and Outlook Express, both of which can easily be made to open executable attachments. However, other Windows-based email software is not immune to worms.
- Using an operating system which does not provide an environment compatible with present worms. Essentially all current email worms affect only the Microsoft Windows operating system. They cannot execute on Macintosh, Unix, GNU/Linux, or other operating systems. In some cases, it is conceivable that a worm could be written for one of these systems; however, various security features militate against it.
- Using up-to-date anti-virus software to detect incoming worms and quarantine or delete them before they can take effect.
- Being skeptical of unsolicited email attachments. Since worms and other email-borne malware arrive in this form, some email users simply refuse to open attachments that the sender has not given them advance notice of.
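Several of these defences start with noticing that an attachment is executable. A minimal sketch of that check walks every MIME part and flags filenames with risky extensions; the extension list is a hypothetical, deliberately short example, and a filename check is only a first line of defence, not a substitute for anti-virus scanning.

```python
from email import policy
from email.parser import BytesParser

# Hypothetical, deliberately short list; real filters use much longer ones.
RISKY_EXTENSIONS = (".exe", ".scr", ".pif", ".com", ".bat", ".vbs", ".js")

def risky_attachments(raw: bytes) -> list:
    """List attachment filenames whose extension suggests executable content."""
    msg = BytesParser(policy=policy.default).parsebytes(raw)
    flagged = []
    for part in msg.walk():  # visits nested parts too, not just the top level
        name = part.get_filename()
        # endswith() also catches double extensions such as "invoice.pdf.exe"
        if name and name.lower().endswith(RISKY_EXTENSIONS):
            flagged.append(name)
    return flagged
```

A mail gateway could quarantine any message for which this list is non-empty, which mirrors the habit of refusing unexpected attachments described above.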
Examination of anti-spam protocols
There are a number of services and software systems that mail sites and users can use to reduce the load of spam on their systems and mailboxes. Some of these depend upon rejecting email from Internet sites known or likely to send spam. Others rely on automatically analyzing the content of email messages and weeding out those which resemble spam. These two approaches are sometimes termed blocking and filtering.
Blocking and filtering each have their advocates and advantages. While both reduce the amount of spam delivered to users' mailboxes, blocking does much more to alleviate the bandwidth cost of spam, since spam can be rejected before the message is transmitted to the recipient's mail server. Filtering tends to be more thorough, since it can examine all the details of a message. Many modern spam filtering systems take advantage of machine learning techniques, which vastly improve their accuracy over manual methods. However, some people find filtering intrusive to privacy, and many mail administrators prefer blocking to deny access to their systems from sites tolerant of spammers.
Spam blocking and filtering techniques
DNSBLs
DNS-based Blackhole Lists, or DNSBLs, are a blocking technique, whereby a site publishes lists of IP addresses via the DNS, in such a way that mail servers can easily be set to reject mail from those addresses. There are literally scores of DNSBLs, each of which reflects different policies: some list sites known to emit spam; others list open mail relays or proxies; others, such as SPEWS, list ISPs known to support spam.
For history and details on DNSBLs, see DNSBL.
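The usual DNSBL lookup convention is to reverse the octets of the connecting IPv4 address, append the list's zone, and ask the DNS for an A record: an answer means "listed", a non-existent name means "not listed". The sketch below assumes a hypothetical zone `dnsbl.example.org` and covers IPv4 only.

```python
import ipaddress
import socket

def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the name to look up: reversed IPv4 octets followed by the list's zone."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone

def is_listed(ip: str, zone: str) -> bool:
    """True if the DNSBL publishes an address record for this IP."""
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        # No such name (or the lookup failed): treated as "not listed".
        return False

# e.g. is_listed("192.0.2.99", "dnsbl.example.org") looks up
# 99.2.0.192.dnsbl.example.org
```

Treating lookup failures as "not listed" is a design choice: it fails open, so an unreachable list delays no legitimate mail but also blocks no spam.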
Content-based filtering
Until recently, content filtering techniques relied on mail administrators specifying lists of words or regular expressions disallowed in mail messages. Thus, if a site receives spam advertising "herbal Viagra", the administrator might place these words in the filter configuration. The mail server would then reject any message containing the phrase.
Content-based filtering can also examine content other than the words and phrases that make up the text of the message. Primarily, this means looking at the headers of the email, the part of the message that contains information about the message rather than its text. Spammers will often spoof headers in order to hide their identities, or to make the email look more legitimate than it is; many of these spoofing methods can be detected. Also, spam-sending software often produces headers that violate the RFC 2822 standard on how email headers are supposed to be formed.
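A static filter of the kind described in the last two paragraphs can be sketched as an administrator-maintained list of regular expressions plus one header sanity check: RFC 2822 requires every message to carry a `Date` and a `From` field. The pattern list is hypothetical, and real header checks go far beyond missing fields.

```python
import re
from email import policy
from email.parser import BytesParser

# Hypothetical administrator-maintained phrase list (the "static" part).
BANNED_PATTERNS = [re.compile(r"herbal\s+viagra", re.IGNORECASE)]

# RFC 2822 requires these two header fields in every message.
REQUIRED_HEADERS = ("Date", "From")

def static_filter(raw: bytes) -> list:
    """Return the reasons a message would be rejected; an empty list means accept."""
    msg = BytesParser(policy=policy.default).parsebytes(raw)
    reasons = [f"missing {h} header" for h in REQUIRED_HEADERS if msg[h] is None]
    body = msg.get_body(preferencelist=("plain", "html"))
    text = (msg["Subject"] or "") + "\n" + (body.get_content() if body is not None else "")
    reasons += [f"banned phrase: {p.pattern}" for p in BANNED_PATTERNS if p.search(text)]
    return reasons
```

Every new spam campaign means another hand-written pattern, and every pattern also matches legitimate mail on the same topic, which is the maintenance and false-positive problem described next.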
This static filtering has several disadvantages. First, it is time-consuming to maintain. Second, it is prone to false positives. Third, these false positives are not equally distributed: manual content filtering is prone to reject legitimate messages on topics related to products advertised in spam. A system administrator who attempts to reject spam messages which advertise mortgage refinancing may easily inadvertently block legitimate mail on the same subject.
Finally, spammers can change the phrases and spellings they use, or employ methods to try to trip up phrase detectors. This means more work for the administrator. However, it also has some advantages for the spam fighter. If the spammer starts spelling "Viagra" as "V1agra" (see leet) or "Via_gra", it makes it harder for the spammer's intended audience to read their messages. If they try to trip up the phrase detector by, for example, inserting an HTML comment that is invisible to the user in the middle of a word ("Via<!---->gra"), this sleight of hand is itself easily detectable, and is a good indication that the message is spam. And if they send spam that consists entirely of images, so that anti-spam software can't analyze the words and phrases in the message, the fact that it is image-only can be detected.
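The two evasion tricks that give themselves away can be detected with simple patterns. This is a regex-based sketch under the assumption that the HTML body is already available as a string. Regular expressions are not a real HTML parser, so treat it as an illustration rather than a robust detector.

```python
import re

# An HTML comment wedged between two letters, e.g. "Via<!---->gra".
SPLIT_WORD = re.compile(r"[A-Za-z]<!--.*?-->[A-Za-z]", re.DOTALL)
IMG_TAG = re.compile(r"<img\b", re.IGNORECASE)
ANY_TAG = re.compile(r"<[^>]+>")

def obfuscation_signs(html: str) -> list:
    """Return evasion tricks that are themselves evidence of spam."""
    signs = []
    if SPLIT_WORD.search(html):
        signs.append("HTML comment inside a word")
    visible_text = ANY_TAG.sub("", html).strip()  # crude tag stripping
    if IMG_TAG.search(html) and not visible_text:
        signs.append("image-only message")
    return signs

print(obfuscation_signs("Buy Via<!---->gra today"))  # ['HTML comment inside a word']
print(obfuscation_signs('<img src="offer.gif">'))   # ['image-only message']
```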
Statistical filtering
Statistical filtering was first proposed in 1998 by Mehran Sahami et al., at the AAAI-98 Workshop on Learning for Text Categorization. A statistical filter is a kind of document classification system, and a number of machine learning researchers have turned their attention to the problem. Statistical filtering was popularized by Paul Graham's influential 2002 article, which proposed the use of naive Bayes classifiers to predict whether messages are spam or not – based on collections of spam and nonspam ("ham") email submitted by users. [1] [2]
Statistical filtering, once set up, requires no maintenance per se: instead, users mark messages as spam or nonspam and the filtering software learns from these judgements. Thus, a statistical filter does not reflect its author's or administrator's biases as to content, but it does reflect the user's biases as to content; a biochemist who is researching Viagra won't have messages containing the word "Viagra" flagged as spam, because "Viagra" will show up often in his or her legitimate messages. It can also respond quickly to changes in spam content, without administrative intervention.
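To make the learning step concrete, here is a minimal multinomial naive Bayes classifier with add-one (Laplace) smoothing, trained on messages the user has marked. It is a textbook variant, not Paul Graham's exact formula (his article combines per-word probabilities differently), and the tokenizer is an assumption.

```python
import math
import re
from collections import Counter

def tokens(text: str) -> list:
    """Hypothetical tokenizer: lower-case runs of letters, digits, '$' and apostrophes."""
    return re.findall(r"[a-z0-9$']+", text.lower())

class NaiveBayesFilter:
    """Minimal naive Bayes spam/ham classifier trained from the user's own judgements."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = {"spam": 0, "ham": 0}

    def train(self, text: str, label: str) -> None:
        """Called whenever the user marks a message as "spam" or "ham"."""
        self.word_counts[label].update(tokens(text))
        self.doc_counts[label] += 1

    def spam_probability(self, text: str) -> float:
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_docs = sum(self.doc_counts.values())
        log_score = {}
        for label in ("spam", "ham"):
            # Smoothed class prior, so an empty class never yields log(0).
            score = math.log((self.doc_counts[label] + 1) / (total_docs + 2))
            # One extra vocabulary slot is reserved for words never seen in training.
            denom = sum(self.word_counts[label].values()) + len(vocab) + 1
            for w in tokens(text):
                score += math.log((self.word_counts[label][w] + 1) / denom)
            log_score[label] = score
        # Turn the two log scores into P(spam | text), clamped to avoid overflow.
        diff = max(min(log_score["ham"] - log_score["spam"], 700.0), -700.0)
        return 1.0 / (1.0 + math.exp(diff))
```

Because the counts come only from the user's own markings, a biochemist's legitimate "Viagra" mail pushes that word toward ham for that user, just as the paragraph above describes.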
Spammers have attempted to fight statistical filtering by invisibly inserting many random but valid words into their messages, making it more likely that the filter will classify the message as neutral; they make the words invisible by giving them a very tiny font, by making them the same color as the background, or both. However, these countermeasures seem to have been largely ineffective.
Software programs that implement statistical filtering include Bogofilter, the e-mail programs Mozilla and Mozilla Thunderbird, and later revisions of SpamAssassin. Another interesting project is CRM114, which hashes phrases and performs Bayesian classification on them.
POPFile [3] uses Bayesian filtering to sort mail into as many categories as the user wants (family, friends, co-workers, spam, and so on).
Checksum-based filtering
Checksum-based filtering takes advantage of the fact that, for any individual spammer, all of the messages he or she sends out will be mostly identical, the only differences being web bugs and, when the text contains them, the recipient's name or email address. Checksum-based filters strip out everything that might vary between messages, reduce what remains to a checksum, and compare it against a database that collects the checksums of messages email recipients consider to be spam (some email clients have a button which can be clicked to nominate a message as spam); if the checksum is in the database, the message is likely to be spam.
The advantage of this type of filtering is that it lets ordinary users help identify spam, and not just administrators, thus vastly increasing the pool of spam fighters. The disadvantage is that spammers can insert unique invisible gibberish -- known as hashbusters -- into the middle of each of their messages, thus making each message unique and having a different checksum. This leads to an arms race between the developers of the checksum software and the developers of the spam-generating software.
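The normalise-then-hash idea can be sketched as follows. The normalisation rules (drop URLs, drop anything that looks like an address, collapse whitespace) and the in-memory "database" are assumptions of this sketch. Real checksum networks use shared servers and fuzzier hashing, and this version does not strip recipient names.

```python
import hashlib
import re

def fuzzy_checksum(body: str) -> str:
    """Remove the parts that vary per recipient, then hash what remains."""
    text = body.lower()
    text = re.sub(r"https?://\S+", "", text)  # per-recipient tracking URLs (web bugs)
    text = re.sub(r"\S+@\S+", "", text)       # addresses (crude: also eats adjacent punctuation)
    text = re.sub(r"\s+", " ", text).strip()  # whitespace differences
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

# Hypothetical shared database of checksums that users have reported as spam.
reported_spam = set()

def report_spam(body: str) -> None:
    """What the "this is spam" button would do."""
    reported_spam.add(fuzzy_checksum(body))

def looks_like_reported_spam(body: str) -> bool:
    return fuzzy_checksum(body) in reported_spam
```

A single invisible hashbuster string changes the digest entirely, which is why this simple exact-match design invites the arms race described above.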
Checksum based filtering methods include:
Protocol extensions
A number of proposals and specifications have been written to extend the SMTP protocol to avoid spam, including:
- Sender Policy Framework (SPF, formerly known as Sender Permitted From)
- Trusted Email Open Standard (TEOS)
- Tripoli protocol
- DomainKeys
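To give a flavour of how one of these extensions is used, the sketch below evaluates a tiny subset of an SPF record: only `ip4:` and `all` mechanisms with their `+ - ~ ?` qualifiers, checked against the connecting IPv4 address. Real SPF evaluation also handles `a`, `mx`, `include`, `redirect`, DNS lookup limits and error results, none of which are implemented here.

```python
import ipaddress

QUALIFIERS = {"+": "pass", "-": "fail", "~": "softfail", "?": "neutral"}

def spf_ip4_check(record: str, sender_ip: str) -> str:
    """Evaluate only the ip4 and all mechanisms of an SPF record (a tiny subset)."""
    terms = record.split()
    if not terms or terms[0].lower() != "v=spf1":
        return "none"
    ip = ipaddress.IPv4Address(sender_ip)
    for term in terms[1:]:
        if term[0] in QUALIFIERS:
            qualifier, mechanism = term[0], term[1:]
        else:
            qualifier, mechanism = "+", term  # "+" is the default qualifier
        if mechanism.lower() == "all":
            return QUALIFIERS[qualifier]
        if mechanism.lower().startswith("ip4:"):
            if ip in ipaddress.IPv4Network(mechanism[4:], strict=False):
                return QUALIFIERS[qualifier]
        # a, mx, include, redirect and others are skipped in this sketch
    return "neutral"  # nothing matched

print(spf_ip4_check("v=spf1 ip4:192.0.2.0/24 -all", "192.0.2.10"))    # pass
print(spf_ip4_check("v=spf1 ip4:192.0.2.0/24 -all", "198.51.100.7"))  # fail
```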
Ham passwords
Another approach for countering spam is to use a "ham password". Systems that use ham passwords ask senders (at least, strangers) to include in their email a password that demonstrates that the email message is a "ham" (not spam) message. Typically the email address and ham password would be described on a web page, and the ham password would be included in the "subject" line of the email message. Ham passwords are often combined with filtering systems, to counter the risk that a filtering system will accidentally identify a ham message as a spam message.
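Combining a ham password with a filter amounts to letting the password override the filter's verdict. In the sketch below the password, the whitelist and the `filter_says_spam` flag (standing in for whatever filter is in use) are all hypothetical.

```python
# Hypothetical values: the password is published on the owner's web page,
# and the whitelist holds correspondents who are already trusted.
HAM_PASSWORD = "blue-kettle-42"
KNOWN_SENDERS = {"friend@example.com"}

def classify(sender: str, subject: str, filter_says_spam: bool) -> str:
    """Let a ham password override the filter so it cannot cause a false positive."""
    if sender.lower() in KNOWN_SENDERS:
        return "ham"
    if HAM_PASSWORD in subject:
        return "ham"  # stranger who read the web page and used the password
    return "spam" if filter_says_spam else "ham"
```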
Messages certified as not being spam
There are several third-party organizations which guarantee that certain messages aren't spam, and have the means to prevent spammers from fraudulently using their system, by fining or suing them, for example. Administrators can use this to let through messages that would otherwise be filtered or blocked as spam, thus reducing the false positive rate.
Organizations that implement such systems include:
Hashcash
Hashcash and similar systems require that a sender perform a calculation that the receiver can later verify. Verification must be much faster than performing the calculation, so that the computation slows down a sender but does not significantly impact a receiver. The point is to slow down the machines that send most spam -- often millions and millions of messages. While a user who sends email to a moderate number of recipients suffers just a second's delay per message, sending millions of emails would take an unaffordable amount of time.
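The asymmetry can be sketched with a partial hash collision: the sender searches for a counter that gives a SHA-1 hash with a given number of leading zero bits (about 2^bits attempts on average), while the receiver checks the result with a single hash. The stamp format `resource:counter` is a simplification for illustration and is not the real Hashcash stamp format, which also carries a date and random data.

```python
import hashlib
import itertools

def leading_zero_bits(digest: bytes) -> int:
    """Count the zero bits at the start of a digest."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits

def mint(resource: str, bits: int) -> str:
    """Sender: brute-force a counter; expected cost is about 2**bits hashes."""
    for counter in itertools.count():
        stamp = f"{resource}:{counter}"
        if leading_zero_bits(hashlib.sha1(stamp.encode()).digest()) >= bits:
            return stamp

def verify(stamp: str, resource: str, bits: int) -> bool:
    """Receiver: one hash checks the sender's work; the stamp is bound to the recipient."""
    if not stamp.startswith(resource + ":"):
        return False
    return leading_zero_bits(hashlib.sha1(stamp.encode()).digest()) >= bits
```

Binding the stamp to the recipient address (the `resource`) means a spammer cannot reuse one stamp for millions of recipients; each extra bit of difficulty doubles the sender's cost while the receiver's cost stays at one hash.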
Heuristic filtering
Heuristic filtering, such as is implemented in the program SpamAssassin, uses some or all of the various tests for spam mentioned above, and assigns a numerical score to each test. Each message is scanned for these patterns, and the applicable scores tallied up. If the total is above a fixed value, the message is rejected or flagged as spam. By ensuring that no single spam test by itself can flag a message as spam, the false positive rate can be greatly reduced. [4]
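The scoring scheme can be sketched as a table of (rule, points, test) entries and a threshold. The rules and weights below are hypothetical; SpamAssassin's real rule set is far larger and its scores are tuned on mail corpora, though its default threshold is also 5.0. No single rule here is worth 5 points, so no single test can flag a message on its own.

```python
import re

# Hypothetical rules: (name, points, test on a simple dict representation of a message).
RULES = [
    ("mentions_viagra", 2.5, lambda m: re.search(r"v[i1]agra", m.get("body", ""), re.I) is not None),
    ("all_caps_subject", 1.5, lambda m: m.get("subject", "").isupper()),
    ("missing_date", 1.0, lambda m: not m.get("date")),
    ("image_only", 2.0, lambda m: m.get("image_only", False)),
]
THRESHOLD = 5.0

def score(message: dict) -> tuple:
    """Return (total score, is_spam, names of the rules that fired)."""
    hits = [(name, pts) for name, pts, test in RULES if test(message)]
    total = sum(pts for _, pts in hits)
    return total, total >= THRESHOLD, [name for name, _ in hits]

print(score({"subject": "BUY NOW", "body": "cheap V1agra", "image_only": True}))
# (7.0, True, ['mentions_viagra', 'all_caps_subject', 'missing_date', 'image_only'])
```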
Tarpits and Honeypots
A tarpit is any server software which intentionally responds pathologically slowly to client commands. A honeypot is a server which attempts to attract attacks. Some mail administrators operate tarpits to impede spammers' attempts at sending messages, and honeypots to detect the activity of spammers. By running a tarpit which appears to be an open mail relay, or which treats acceptable mail normally and known spam slowly, a site can slow down the rate at which spammers can inject messages into the mail facility.
One tarpit design is the teergrube, whose name is simply German for "tarpit." This is an ordinary SMTP server which intentionally responds very slowly to commands. Such a system will bog down SMTP client software, as further commands cannot be sent until the server acknowledges the earlier ones. Several SMTP MTAs, including Postfix, have a built-in teergrube capability: when confronted with a client session which causes errors such as spam rejections, they slow down their responses. [5] [6] A similar approach is taken by TarProxy.
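A teergrube-style server can be sketched with `asyncio`: it answers like an SMTP server, refuses every command except `QUIT`, and waits longer before each reply as the client accumulates errors. The delay schedule, port 2525 and host name are hypothetical, and this is not how Postfix or TarProxy implement their delays.

```python
import asyncio

# Hypothetical delay schedule: the wait doubles with each error, up to a cap.
BASE_DELAY = 1.0  # seconds before the first reply
MAX_DELAY = 30.0  # upper bound for any single reply

def delay_for(errors: int) -> float:
    return min(BASE_DELAY * 2 ** min(errors, 16), MAX_DELAY)

async def handle(reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
    errors = 0
    try:
        await asyncio.sleep(delay_for(errors))
        writer.write(b"220 tarpit.example ESMTP\r\n")  # hypothetical host name
        await writer.drain()
        while True:
            line = await reader.readline()
            if not line:
                break  # client gave up
            command = line.decode("ascii", "replace").strip().upper()
            await asyncio.sleep(delay_for(errors))  # the tarpit itself
            if command.startswith("QUIT"):
                writer.write(b"221 Bye\r\n")
                await writer.drain()
                break
            # Every other command is refused, and each refusal makes the
            # next reply slower, tying up the client's connection.
            errors += 1
            writer.write(b"451 Temporary failure, try again later\r\n")
            await writer.drain()
    except ConnectionError:
        pass
    finally:
        writer.close()

async def serve(host: str = "127.0.0.1", port: int = 2525) -> None:
    server = await asyncio.start_server(handle, host, port)
    async with server:
        await server.serve_forever()

# To run: asyncio.run(serve())
```

Because `asyncio` handles each stalled client as a cheap coroutine, the tarpit can hold many spammer connections open at little cost to itself.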
Another design for tarpits directly controls the TCP/IP protocol stack, holding the spammer's network socket open without allowing any traffic over it. By reducing the TCP window size to zero, but continuing to acknowledge packets, the spammer's process may be tied up indefinitely. This design is more difficult to implement than the former. Aside from anti-spam purposes, it has also been used to absorb attacks from network worms. [7]
A third design is simply an imitation MTA which gives the appearance of being an open mail relay. Spammers who probe systems for open relay will find such a host and attempt to send mail through it, wasting their time. Such a system may simply discard the spam attempts, submit them to DNSBLs, or store them for analysis. It may also selectively deliver relay test messages to give a stronger appearance of open relay. SMTP honeypots of this sort have been suggested as a way that end-users can interfere with spammers' activities. [8] [9]
Spammers also abuse open proxies, and open proxy honeypots (proxypots) are also used. [10] Ron Guilmette reported in 2003 that he had succeeded in getting over 100 spammer accounts terminated in under three months, using his network (of unspecified size) of proxypots.
Unlike most other anti-spam techniques, tarpits and honeypots work at the relay (or proxy) level. They target spammer behavior rather than spam content.
Note also that there is some terminological confusion. Some people refer to spamtraps as honeypots. In this context a spamtrap is an email address created specifically to attract spam. These run at the destination level rather than at the relay or proxy level.
Challenge-response systems
Another method which may be used by internet service providers (or by specialized services) to combat spam is to require unknown senders to pass various tests before their messages are delivered. These strategies are termed challenge-response systems or C/R, and are currently controversial among email programmers and system administrators.
One example of a challenge-response system is a "captcha" test, in which a mail sender is required to view an image containing a word or phrase, and respond with that word or phrase in text. The purpose of this is to ensure that automated systems (incapable of reading the image) cannot transmit email.
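The basic C/R flow can be sketched as a whitelist plus a table of held messages keyed by a random token. How the challenge is delivered and answered (a reply email, a captcha on a web page) is abstracted away, and the class and method names are hypothetical. Note that the challenge goes to whatever sender address the message claims, which is the source of the forgery problem raised below.

```python
import secrets
from typing import Optional

class ChallengeResponse:
    """Hypothetical C/R gate: unknown senders are held until they answer a challenge."""

    def __init__(self):
        self.whitelist = set()
        self.pending = {}  # token -> (sender, held message)

    def receive(self, sender: str, message: str) -> tuple:
        """Return ("deliver", message) or ("challenge", token) for an incoming message."""
        if sender in self.whitelist:
            return "deliver", message
        token = secrets.token_hex(8)
        self.pending[token] = (sender, message)
        # A real system would now email `sender` a challenge quoting `token`.
        return "challenge", token

    def respond(self, token: str) -> Optional[str]:
        """Challenge answered: whitelist the sender and release the held message."""
        entry = self.pending.pop(token, None)
        if entry is None:
            return None  # unknown or already-used token
        sender, message = entry
        self.whitelist.add(sender)
        return message
```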
Critics of C/R systems have raised several issues regarding their usefulness as an email defense:
- Some kinds of C/R system, such as captchas, discriminate against the disabled. A blind person can send and receive textual email (using a braille terminal, for instance), but cannot see an image and read text from it. A blurry image intended to defeat optical character recognition software may also be impossible to read for sighted but visually impaired persons.
- Some C/R systems interact badly with mailing list software. If a person subscribed to a mailing list begins to use C/R software, posters to the mailing list may be confronted by large numbers of challenge messages. Many regard these as junk mail equal in annoyance to actual spam. Some C/R systems allow the user to simply "whitelist" mailing lists to which they subscribe -- instructing the C/R software not to challenge their messages.
- C/R systems interact badly with other C/R systems. If two persons both use C/R and one emails the other, the two C/R systems may become trapped in a loop, each challenging the other, neither one willing to deliver the challenge messages -- or the original message. Some C/R systems attempt to avoid this by automatically whitelisting addresses to which mail is sent -- though this may not work when the recipient has their mail forwarded to a different address. Disparate C/R systems can also "cooperate" with one another by marking challenges as "bulk" priority, and not challenging messages sent as "bulk".
- Disseminating an ordinary email address that is protected by a C/R system will result in those who send mail to that address having their messages challenged unless the sender has been previously whitelisted. Many C/R critics consider it rude to give someone your email address, then require them to play along with C/R software before they can send you mail. Some C/R systems allow for the creation of "tagged" addresses which allow messages to be accepted without being challenged as long as the message meets certain requirements. For example, TMDA can create "tagged" addresses that permit mail sent from a particular address, mail that contains a certain "keyword", or mail that is sent within a pre-set length of time, such as a day, a month, or a year. A time-limited tagged address might, for instance, be created to allow correspondence related to an online order, such as shipping notices, but expire after a while so that later marketing e-mail from the store is challenged unless the store responds to the challenges.
- Spammers and viruses send forged messages -- email with other people's addresses in the From headers. Forged mail from valid addresses is becoming more common as Call Back Verification is more widely used to detect spam. A C/R system challenging a forged message will send its challenge to the uninvolved person whose address the spammer put in the spam. This effectively doubles the amount of unwanted email being distributed. Indeed, some argue that using a C/R system means sending unsolicited bulk email (that is, spam) to all those people whose addresses are forged in spam.
Nevertheless, some users report that C/R systems are extremely effective at eliminating spam, while others find that they must still review their held mail for wanted messages whose senders never answered the challenge.
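A time-limited tagged address of the kind described above can be sketched by encoding an expiry time in the address and signing it with an HMAC, so the recipient's system can check it without storing any state and nobody can extend the date by editing the address. The `-dated-` format and the secret key are hypothetical; this is not TMDA's actual address format.

```python
import hashlib
import hmac
import time
from typing import Optional

SECRET = b"hypothetical-secret-key"  # known only to the recipient's mail system

def _tag_mac(user: str, expiry: int) -> str:
    data = f"{user}:{expiry}".encode()
    return hmac.new(SECRET, data, hashlib.sha256).hexdigest()[:10]

def make_dated_address(user: str, domain: str, lifetime_s: int,
                       now: Optional[float] = None) -> str:
    """Create an address that bypasses the challenge until `lifetime_s` seconds from now."""
    now = time.time() if now is None else now
    expiry = int(now) + lifetime_s
    return f"{user}-dated-{expiry}.{_tag_mac(user, expiry)}@{domain}"

def accept_without_challenge(address: str, now: Optional[float] = None) -> bool:
    """True if the address carries an authentic, unexpired date tag."""
    now = time.time() if now is None else now
    local = address.split("@", 1)[0]
    user, sep, tag = local.rpartition("-dated-")
    if not sep:
        return False  # untagged address: challenge as usual
    expiry_str, _, mac = tag.partition(".")
    if not expiry_str.isdigit():
        return False
    expiry = int(expiry_str)
    # The MAC stops anyone from simply editing the expiry date in the address.
    return hmac.compare_digest(mac, _tag_mac(user, expiry)) and now < expiry
```

An online shop given such an address with a lifetime of a few weeks could send shipping notices without being challenged. Its later marketing mail would fall back to the normal C/R path.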
External links
- Coalition Against Unsolicited Commercial Email
- Spamfo.co.uk: news on junk email, scams, fraud and legal aspects, with reviews of software and services that help reduce spam
- California lawyer who sues spammers
- Address Munging FAQ: Spam-Blocking Your Email Address
- Challenge/Response at the SMTP Level: It may be possible to implement a challenge/response spam protection system using custom SMTP Delivery Status Notifications (DSNs) that include an HTML confirmation hyperlink.
- Countering Spam by Using Ham Passwords (Email Passwords)
- "Countering Spam with Ham-Authenticated Email and the Guarded Email Protocol" describes a challenge-response system
- Spam Links
Tools to reduce the impact of spam
- Mozilla and the stand-alone Thunderbird: e-mail programs ("clients") with a Bayesian filter, i.e. a filter that keeps learning and is therefore able to adapt to the constantly changing forms of spam
- Disposable e-mail accounts, various types for registering on web sites etc.
- E4ward.com: aliases under your own domain name or under e4ward.com
- Sneakemail original disposable email address service
- spamgourmet expire after a number of emails, but can be reset or ignored for some senders
- jetable expiring in 1-8 days
- mailinator instant email accounts, self-destructing email after you read it.
- SpamDay: forwarding and webmail addresses valid for 24 hours, with an RSS feed
- Making it harder to harvest e-mail addresses
- hide email addresses on web sites from harvesting tools
- Tools to filter out spam
- SpamPal: free Windows filter with many filtering methods; client- or server-side filtering
- Spamihilator: free anti-spam program with a Bayesian filter and many other filter plugins; works with almost all email programs
- Bogofilter Bayesian filter
- Spambayes Bayesian filter especially designed for use with Microsoft Outlook
- iMailLight smart plugin for Outlook, based on Bayesian filtering
- SpamAssassin heuristic filter
- TMDA, a challenge/response system
- Checksum-based filter:
- Tools to filter out viruses
- Contact forms that hide email addresses
- Contact Form - Open source (GPL) - Requires a webserver, Perl, and Sendmail
- form2mail - Open source (GPL) - Requires a webserver, PHP, MySQL, and Sendmail
- MailWebForm - Open source (GPL) - Requires Java, Java Servlets, and JavaMail
- SCForm - Open source (GPL) - Requires a webserver, PHP and Sendmail
- Services which guarantee messages as not being spam:
- Protocols for reducing spam
- Spam-proofing the mail system; Linux Weekly News; December 17, 2003.