Prevention Is the Best Medicine: Creating an Anti-spam Strategy

by Carl Weinschenk

There's no denying spam is a pervasive and tricky problem that continues to proliferate. From an enterprise perspective, the strongest cure for spam is prevention. This tutorial outlines a variety of spam prevention tactics -- from specific products to general techniques -- that can be taken on the mail server, desktop, ISP, or network edge level.

The tension is growing by the month. Each time one side makes a move, the other counters. These aren't earth-shattering frontal attacks and counterthrusts, however. They are subtle and incremental, based on a wary knowledge of how a clever enemy tries to outsmart its adversary.

No, this isn't Iraq vs. the United Nations' weapons inspectors. It's spam vs. anti-spam forces.

Spam, which has been a background nuisance for years, is exploding. And based on the studies and reports released on a near-daily basis, it's close to impossible to ignore.

Anti-spam software vendor Brightmail is one company that tracks spam. It follows the proliferation of attacks on its "Probe Network," which it says has a statistical reach of 200 million e-mail boxes. According to Brightmail's findings, spam on the network nearly tripled -- from about 1.97 million to 5.5 million spam messages -- between November 2001 and November 2002. This increase is even more pronounced when one considers that as recently as June 2001 the network logged fewer than 1 million spam messages. That's more than a five-fold increase in less than 18 months. Other vendors and analysts provide equally stark statistics.

Unique Spam Attacks, June 2001 to November 2002
(as Measured by Brightmail's Probe Network)


Fighting spam is tricky for a number of reasons. First, spammers are exceedingly clever. Second, and even more critically, spam fighters have a double task: Not only must they identify spam, but they must also do so without harming legitimate mail. "Many products are not truly effective," Daniel Silver, the director of marketing for Lyris Technologies, told ServerWatch. "The other issue is false positives, which is grabbing things that shouldn't be grabbed."

There's no denying that spam is a pervasive and tricky problem, so it's no surprise that spam prevention tactics are as varied as the problem itself.

Spam prevention efforts can be undertaken by an outsourced service or tackled in-house. They can occur at the desktop or be loaded into mail servers or machines linked to mail servers at the gateway or ISP.

To illustrate this, we have highlighted anti-spam solutions from a variety of vendors. The list of companies and solution types noted below is by no means complete. Rather, it represents a sample of companies, and types of companies, involved in the spam wars, as well as the solutions they are currently offering.

Spam War Defense Arsenal Options
(in Alphabetical Order)

Company           Product             Type of Company          Featured Anti-spam Approach
Brightmail        Brightmail 4.0      Anti-spam software       F
Cloudmark         Authority           Anti-spam software       F
CMSConnect        Praetor             E-mail infrastructure    F
Gordano           Anti-Spam           E-mail infrastructure    BL, F, KS
Lotus Notes       Mail Server         Business software        BL
Lyris             MailShield          E-mail infrastructure    F
Microsoft         Exchange            General software         BL
Mirapoint         Mirapoint           E-mail infrastructure    F, H, RBL
Postini           Perimeter manager   E-mail security service  H, WL/BL
Sendmail          Mailstream Manager  E-mail infrastructure    F, H, RBL, WL/BL
Stalker Software  CommuniGate Pro     E-mail infrastructure    F, KS, RBL, WL/BL
Vircom            VOP modusGate       Anti-spam software       F

Key to Anti-spam Approach Abbreviations: BL = Black List, F = Filtering, H = Heuristics, KS = Keyword Search, RBL = Real-Time Blackhole Lists, WL/BL = White List/Black List

The most rudimentary and common approach to spam prevention is the use of black lists and white lists. As the names imply, this involves continuously updating huge lists of disapproved and approved domain names. Analysts say that this approach is labor-intensive and easily evaded: the spammer simply changes the originating domain of the spam.
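The list check itself is simple to picture. The following is a minimal sketch, not any vendor's implementation; the function names and the two example domains are made up for illustration.

```python
def sender_domain(address):
    """Return the lower-cased domain part of an e-mail address."""
    return address.rsplit("@", 1)[-1].strip().lower()


def classify_sender(address, white_list, black_list):
    """Return 'accept', 'reject', or 'unknown' for a sender address."""
    domain = sender_domain(address)
    if domain in white_list:      # approved senders always pass
        return "accept"
    if domain in black_list:      # known spam sources are refused
        return "reject"
    return "unknown"              # on neither list: needs another check


# Hypothetical lists; a real deployment would maintain far larger ones.
WHITE_LIST = {"partner.example.com"}
BLACK_LIST = {"bulk-offers.example.net"}

print(classify_sender("sales@bulk-offers.example.net", WHITE_LIST, BLACK_LIST))
print(classify_sender("sales@new-domain.example.org", WHITE_LIST, BLACK_LIST))
```

The second call shows the evasion analysts describe: the same sender, mailing from a domain that is not yet listed, comes back as "unknown" rather than "reject".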

Microsoft Exchange and IBM's Lotus Notes offer black list/white list features. Free lists are available on the Web at www.mail-abuse.org, www.dsbl.org, and elsewhere. "By including these they are able to provide some level of spam filtering and can check the little box in the checklist saying that they provide spam filtering," says Marten Nelson, a research analyst with Ferris Research.
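Real-time blackhole lists (RBL in the table key) are commonly published over DNS: the mail server reverses the octets of the connecting IP address, appends the list's zone, and looks the name up. The sketch below shows that convention only; the zone "dnsbl.example.org" is a placeholder rather than a real service, and the function names are illustrative.

```python
import ipaddress
import socket


def dnsbl_query_name(ip, zone):
    """Build the DNS name to look up for an IPv4 address in a blackhole list."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def is_listed(ip, zone):
    """Return True if the list answers for this address (needs network access)."""
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True           # any answer means the address is listed
    except socket.gaierror:
        return False          # no such name: not listed (or the lookup failed)


# "dnsbl.example.org" is a placeholder zone, not a real service.
print(dnsbl_query_name("192.0.2.25", "dnsbl.example.org"))
# 25.2.0.192.dnsbl.example.org
```

Only the name-building step runs here; `is_listed` would need a real list zone and network access, and each list publishes its own usage policy.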

This approach, one expert says, catches about 80 percent of spam, which may be enough for most enterprises. The marketplace is still determining whether the radical increase in spam will entice e-mail server vendors and their enterprise and ISP clients to implement advanced solutions. The key question is one of economics: At what point do the network resources and manpower wasted on spam justify the time, money, and "mind share" required to implement the advanced solutions?

Spam-fighting tools are proliferating, offered both by mail server vendors and by companies specializing in anti-spam products. Each technique has its inherent drawbacks, and many vendors feature one approach while incorporating others. Whether they do this for technical or marketing reasons is debatable. "The means to address spam vary, and range all over the map," says Lih-Tah Wong, president of Computer Mail Services, which sells Praetor rules-based spam filtering software.

One intermediate approach is simple keyword searches. Obviously, finding individual words or phrases doesn't determine whether a message is spam or not, so keyword searches must be combined with some other technique to have significant impact.
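A bare keyword search might look like the sketch below. The watch list is invented for illustration, and the second example shows why a hit on its own proves little.

```python
import re

# Hypothetical watch list; real products ship much longer ones.
KEYWORDS = ["free money", "act now", "viagra"]


def keyword_hits(text, keywords=KEYWORDS):
    """Return the watch-list phrases found in text, ignoring case."""
    found = []
    for phrase in keywords:
        # \b anchors the phrase at word boundaries
        if re.search(r"\b" + re.escape(phrase) + r"\b", text, re.IGNORECASE):
            found.append(phrase)
    return found


print(keyword_hits("ACT NOW to claim your free money!"))
print(keyword_hits("The committee will act now on the budget."))
```

The second message is legitimate yet still triggers "act now", which is the false-positive risk that pushes vendors to combine keyword hits with other evidence.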

Spammers often respond to keyword searching with a technique called HTML cloaking. This involves replacing characters with their numeric ASCII codes, written as HTML character references. When the message is displayed, the mail client renders each code as the intended letter. When the message passes through the anti-spam software, however, the full word is not present in the raw text. Consequently, the message isn't deemed objectionable.

Users and managers who don't favor keyword searches argue that HTML cloaking allows spam to pass. Conversely, those backing the approach say that the presence of HTML cloaking in and of itself is a potent sign of spam.
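Both sides of that argument can be seen in a few lines. This sketch assumes the cloaking uses numeric HTML character references (for example `&#70;` for "F"); the function names and the idea of a "cloaking ratio" are illustrative, not taken from any product.

```python
import html
import re

# Matches numeric character references such as &#70; or &#x46;
NUMERIC_REF = re.compile(r"&#(?:[0-9]+|[xX][0-9a-fA-F]+);")


def decode_cloaking(text):
    """Turn character references back into the letters a reader would see."""
    return html.unescape(text)


def cloaking_ratio(text):
    """Fraction of displayed characters that were written as numeric references."""
    decoded = decode_cloaking(text)
    if not decoded:
        return 0.0
    return len(NUMERIC_REF.findall(text)) / len(decoded)


cloaked = "&#70;&#114;&#101;&#101; offer"   # displays as "Free offer"
print("free" in cloaked.lower())                    # False
print("free" in decode_cloaking(cloaked).lower())   # True
print(cloaking_ratio(cloaked))                      # 0.4
```

A raw keyword search misses the word, a search on the decoded text finds it, and a high ratio of encoded characters can itself be treated as a warning sign.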

Keyword searches are generally combined with heuristic approaches, which use various methods to divine the context within which a word or phrase is used, to determine if the message is likely spam or not.

One heuristic approach is sieve filtering. This approach gives system administrators and others the ability to write scripts based on the characteristics of newly arriving spam that filter out subsequent e-mails following the same pattern. The downside to this approach is that it demands human intervention. "Those are the systems in which the feature set -- whether or not the message is spam -- is human chosen," says Bill Yerazunis, author of a spam filter called CRM114.
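The flavor of such hand-written rules can be sketched in Python (this is not the Sieve scripting language itself). Every rule below is a hypothetical example of what an administrator might write after inspecting a new wave of spam.

```python
from email import message_from_string

# Each rule is (description, test). All three are made-up examples.
RULES = [
    ("subject shouts in capitals",
     lambda msg: (msg["Subject"] or "").isupper()),
    ("no Message-ID header",
     lambda msg: msg["Message-ID"] is None),
    ("sent through 'BulkMailer'",
     lambda msg: "bulkmailer" in (msg["X-Mailer"] or "").lower()),
]


def matching_rules(raw_message):
    """Return the descriptions of every rule the message triggers."""
    msg = message_from_string(raw_message)
    return [name for name, test in RULES if test(msg)]


sample = (
    "From: promo@example.net\n"
    "Subject: LAST CHANCE\n"
    "X-Mailer: BulkMailer 2.1\n"
    "\n"
    "Limited offer.\n"
)
print(matching_rules(sample))
```

Each rule encodes a human judgment about one pattern, so the rule set only stays useful as long as someone keeps writing new rules.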

Another heuristic filtering approach is Bayesian analysis. With Bayesian analysis, a large body of spam and an equal number of legitimate e-mail messages undergo statistical analysis. Comparing the results creates a baseline threshold against which newly arriving messages are judged.
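As a rough illustration of the idea, the sketch below uses a textbook naive Bayes calculation with add-one smoothing. It is not the formula of any particular filter; the tiny corpora and the 0.9 threshold are made up.

```python
import math
from collections import Counter


def tokens(text):
    return text.lower().split()


def train(messages):
    """Count how many messages each token appears in."""
    counts = Counter()
    for text in messages:
        counts.update(set(tokens(text)))
    return counts


def spam_probability(text, spam_counts, ham_counts, n_spam, n_ham):
    """Combine per-token evidence into one probability (naive Bayes)."""
    log_odds = math.log(n_spam / n_ham)      # prior from corpus sizes
    for tok in set(tokens(text)):
        # Add-one smoothing so unseen tokens never give a zero probability
        p_tok_spam = (spam_counts[tok] + 1) / (n_spam + 2)
        p_tok_ham = (ham_counts[tok] + 1) / (n_ham + 2)
        log_odds += math.log(p_tok_spam / p_tok_ham)
    return 1 / (1 + math.exp(-log_odds))


# Tiny made-up corpora, equal in size as the text describes.
spam = ["win free money now", "free prize claim now", "cheap loans free quote"]
ham = ["meeting moved to friday", "quarterly report attached", "lunch on friday"]
spam_counts, ham_counts = train(spam), train(ham)

THRESHOLD = 0.9   # hypothetical cut-off
for text in ["claim your free prize", "report for friday meeting"]:
    p = spam_probability(text, spam_counts, ham_counts, len(spam), len(ham))
    print(text, round(p, 3), "spam" if p > THRESHOLD else "ok")
```

Unlike hand-written rules, the features here are learned from the two corpora, so retraining on fresh mail updates the filter without anyone writing a new rule.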

Paul Graham, a programmer influential in this type of Bayesian filtering, thinks it is the answer. He noted that since he published an article on the topic in August, more than 20 open source Bayesian filters have been written. He says CRM114 (named after the radio security device in the bomber in the 1964 movie "Dr. Strangelove") is more than 99.95 percent accurate. "The best and the most efficient is the open source stuff," he says.

A downside of heuristics, proponents of other approaches maintain, is that they can have trouble reacting to subtle changes by spammers.

Another filtering methodology is the checksum approach. The checksum approach focuses on the ASCII values of the characters that comprise a message. Totaling the specific combination of characters in a message is virtually guaranteed to create a unique number. So if two messages have the same total ASCII value -- i.e., the same checksum -- they are almost certain to be identical and, thus, are likely to be spam.

The downside to this approach is that spammers can program subtle changes, for example the insertion of a random character string in each message, that render the checksums different.
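A minimal sketch of the checksum idea, taking the description literally as a sum of character codes, shows both the match and the evasion. The function name and sample messages are illustrative.

```python
def ascii_checksum(message):
    """Sum the character codes of a message, as the text describes."""
    return sum(ord(ch) for ch in message)


original = "Lose weight fast with this one trick"
copy = "Lose weight fast with this one trick"
padded = original + " xq7z"    # spammer appends a random string

print(ascii_checksum(original) == ascii_checksum(copy))     # True
print(ascii_checksum(original) == ascii_checksum(padded))   # False
```

A plain sum also ignores character order (for instance, "ab" and "ba" total the same), so a real system would more plausibly use a stronger digest such as one from Python's `hashlib`; that substitution is an assumption here, not something the vendors quoted in this article describe.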

While it is unclear which approach will dominate, it is clear that none will be effective if not backed by large human or automated networks. Brightmail's Probe Network is one example. Another example is Cloudmark-owned SpamNet (formerly Razor), which is a peer-to-peer network of more than 250,000 participants who identify, collect, and deliver spam to a central location for analysis, says Tricia Fahey, the company's vice president of communications. The analysis includes thousands of characteristics focused on headers, message lines, and message bodies.

Yet another mass approach is one advocated by Vircom. It links almost 60 ISPs, which write the sieve rules they use, with between 200 and 300 other ISPs representing as many as 35 million e-mail boxes, marketing product manager Daniel Roy said.

Before we conclude, we must note that there seems to be a fundamental difference between spammers and "crackers" -- those who write and circulate viruses, Trojans, and other debilitating software. A cracker's primary objective is the act of disrupting electronic communications, with no obvious financial motive, whereas a spammer (from his or her point of view) is an entrepreneur taking advantage of an unbelievably inexpensive and huge distribution network. For IT managers, the key is to make the spammer's return on investment unacceptable.

Unfortunately, that's something not likely to happen soon. "Spam is an ongoing war," says Sendmail spokesman Todd Blaschka. "The latest surge in spam and vendor response represents the latest battle."

This article was originally published on Thursday Jan 9th 2003