The Apache SpamAssassin project has released version 4.0.0 of its renowned open-source anti-spam platform, with numerous tweaks and bug fixes as well as improved classification, performance and handling of text in international languages. The release is an important milestone in the open-source world: over the past two decades, Apache SpamAssassin has become a testament to the security benefits of the open-source development model in combating the universal threat of spam email.

Guardian Digital has used the Apache SpamAssassin framework as a component of its multi-layered business email security solution, EnGarde Cloud Email Security, from the beginning, and will use the updates and improvements in the 4.0.0 release to provide its clients with enhanced email protection. The company supports both the Apache SpamAssassin project’s core values of transparency, collaboration and community involvement and its anti-spam product. Guardian Digital spoke with Chair of the Apache SpamAssassin Project Management Committee Sidney Markowitz and Apache SpamAssassin PMC member Kevin A. McGrail to gain firsthand insight into the significance of this release and the key upgrades and improvements Apache SpamAssassin 4.0.0 offers.

Apache SpamAssassin: A Flexible Open-Source Scoring Framework with Enterprise Functionality

Apache SpamAssassin provides a secure, reliable framework upon which companies can build scalable, adaptive spam filtering and email security solutions. Apache SpamAssassin does not simply block or accept mail; it scrutinizes it using thousands of different characteristics to make a judgment on its potential risk. The program operates on the principle that there is no single definitive mechanism to identify spam. Rather, it has a modular plugin architecture that supports a wide range of independent operations that can be correlated to the spam/ham classification.
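The scoring principle can be illustrated with a minimal Python sketch. This is not SpamAssassin's own code (which is written in Perl); the rule names, patterns, scores and the 5.0 threshold are all hypothetical values chosen for illustration. It only shows the general idea of many independent tests each contributing to a total that is compared against a threshold.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    name: str      # hypothetical rule name
    pattern: str   # regular expression applied to the message text
    score: float   # positive leans spam, negative leans ham


# Hypothetical rules for illustration only; not taken from any real ruleset.
RULES = [
    Rule("DEMO_URGENT_SUBJECT", r"(?i)\burgent\b", 1.5),
    Rule("DEMO_WIRE_TRANSFER", r"(?i)wire transfer", 2.5),
    Rule("DEMO_LOTTERY", r"(?i)\blottery\b", 2.0),
    Rule("DEMO_KNOWN_NEWSLETTER", r"(?i)unsubscribe preferences", -1.0),
]


def classify(text, rules=RULES, threshold=5.0):
    """Sum the scores of all matching rules and compare to a threshold."""
    hits = [r for r in rules if re.search(r.pattern, text)]
    total = sum(r.score for r in hits)
    return total, [r.name for r in hits], total >= threshold


total, hits, is_spam = classify("URGENT: claim your lottery prize via wire transfer")
```

No single rule here is enough to cross the threshold; only the combination of several hits is. Tuning such a system means adjusting individual scores or adding rules, without touching the classification logic.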

ISPs and email security providers worldwide recognize Apache SpamAssassin’s innovation and effectiveness, and have incorporated the framework into their security services and solutions. For medium-sized organizations, ISPs and MSPs, Apache SpamAssassin leverages Machine Learning to tune the scoring of the human-written rules that are used to classify email. Markowitz explains, “This is where Apache SpamAssassin proves its worth. An MSP has free access to the software. Since much of the spam-filtering capability is defined by the rules, the MSP can tune the software to their specific needs by tweaking rule scores or adding their own custom rules without having to touch the underlying code.” The Apache SpamAssassin project accepts anonymized data compiled from spam and ham emails by contributors such as MSPs. This data is used in a Machine Learning algorithm run daily by the project to calculate optimum rule scores. ISPs and MSPs that use SpamAssassin run a daily process to download updated rules and scores optimized for the latest spam and ham submissions. Today, a well-implemented system powered by SpamAssassin can easily achieve categorization results meeting and exceeding industry standards of 99.9% correct classification. 

Guardian Digital CEO Dave Wreski advises, “It is important to note that while Apache SpamAssassin is an excellent piece of security software, defense in depth is critical to any security strategy, and the Apache SpamAssassin framework should be implemented as part of a comprehensive spam filtering and email security solution for maximal spam protection against the serious aggravation, disruption, and security threat that spam email poses to all organizations.” He elaborates, “The use of the SPF, DKIM, and DMARC email authentication protocols are another effective anti-spam method that should be used to the fullest to verify the legitimacy of email communications.”

The History of Apache SpamAssassin: How a Revolutionary Idea Evolved into an Open Source Success Story

Apache SpamAssassin was created by Justin Mason, a software engineer who had maintained a number of patches against an earlier program named filter.plx by Mark Jeftovic. Mason rewrote all of Jeftovic’s code and uploaded the rewritten codebase to SourceForge on April 20, 2001. In the summer of 2004, SpamAssassin became an Apache Software Foundation project and was officially renamed Apache SpamAssassin.

Before Apache SpamAssassin, there were no effective general-purpose tools for fighting spam. ISPs struggled with various methods, such as blocking email from unknown IP addresses from certain countries or filtering mail with certain words in the subject or body, relying on whatever seemed to work. Spammers continually changed their methods to counter the anti-spam measures being used. Every ISP did the best it could, and most found an acceptable balance for their customers. With the birth of Apache SpamAssassin, this was about to change.

Support and critique provided by the open source community fostered rapid innovation and notable improvements during the Apache SpamAssassin project’s early years. While Apache SpamAssassin has evolved significantly over the past two decades, the platform still leverages the scoring and rule framework that has made it successful and future-proof.

The ability to handle the evolution in spam protection over the years without re-engineering is why Apache SpamAssassin has survived this long. Spam protection has evolved into a generic term, blocking everything from unsolicited advertisements to dangerous malware to phishing emails and much more. At its core, SpamAssassin is a robust framework for scoring email so it can be categorized. If someone has a good idea for a test or new threat data that helps classify, it's relatively simple to add it to SpamAssassin and weight the score appropriately. 

Apache SpamAssassin is the epitome of an open source success story: a real-world example of expert engineers and developers volunteering their time to combat the growing email problem. The team has demonstrated innovation, leadership and perseverance in the face of both adversity and success. 

Open-source development has had a significant influence on Apache SpamAssassin’s ability to provide companies with a flexible, scalable and secure framework for filtering spam. Unlike proprietary anti-spam platforms, Apache SpamAssassin’s open-source code is available at no charge. In addition, the scoring framework that Apache SpamAssassin offers is supported by a knowledgeable and passionate community of mail server experts who assist in creating new rules and in developing new ideas for improving the platform. Markowitz summarizes the advantages of Apache SpamAssassin’s use of open-source development for anti-spam solutions: “With Open Source you know exactly what you are getting and have the access to limit risk. Apache SpamAssassin is always available and the source code is there for anyone to modify. With the Apache open source development model, there is a transparent view of the development process and an open process for any qualified interested person to participate.”

What's New & Improved in Apache SpamAssassin 4.0.0?

Apache SpamAssassin 4.0.0 contains numerous tweaks, functional patches and bug fixes relative to past releases. Most notably, it includes major changes that significantly improve classification, performance and the handling of text in international languages, and will help in blocking new spam campaigns.

The community will benefit greatly from this release because a significant amount of new code has been specifically developed to address new spam techniques, and new rules can be easily written to better match new spam types that will arrive in the future. By using external OCR software, the new version of Apache SpamAssassin can detect spam messages that are hidden inside images or PDF attachments, a very common technique used to evade anti-spam services.

The Apache SpamAssassin 4.0.0 release includes a major overhaul of the Apache SpamAssassin code (written in Perl) to make use of the new capability of Perl to process Unicode characters directly. Unicode is a standard for representation of characters that makes it possible to use the thousands of letters and symbols found in all the languages of the world. Before Unicode support was added to Perl 5.12 and finalized in Perl 5.14, Apache SpamAssassin had quite complex, error-prone code and rule definitions to process emails containing internationalized domain names (IDN) and international Unicode characters in their text.

With the release of 4.0.0, the Apache SpamAssassin community can finally make use of the new support for internationalized characters. As one member of the Apache SpamAssassin developer mailing list commented, "It makes writing rules to match properly decoded non-ASCII text (Subject, display name, body, ...) so much easier simply in UTF-8, compared to previous hacking with hex-coded bytes!" 
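The difference the developer describes can be sketched in Python (used here for illustration only; this is not SpamAssassin rule syntax, and the sample subject line is made up). Matching against raw bytes means spelling out UTF-8 sequences in hex, and case-insensitive matching only covers ASCII letters; matching against decoded text lets the pattern be written as ordinary characters.

```python
import re

# Made-up subject line, encoded as it would travel on the wire.
raw_subject = "Visit the CAFÉ today".encode("utf-8")

# Byte-oriented matching: the pattern must spell out UTF-8 bytes in hex,
# and case-insensitivity only applies to ASCII letters, so "É" is missed.
byte_rule = re.compile(rb"(?i)caf\xc3\xa9")
matches_bytes = bool(byte_rule.search(raw_subject))

# Character-oriented matching: decode once, then write the pattern as text.
text_rule = re.compile(r"(?i)café")
matches_text = bool(text_rule.search(raw_subject.decode("utf-8")))
```

Here the byte pattern fails to match the upper-case subject while the text pattern succeeds, which is the kind of pitfall that hex-coded rules invite.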

Markowitz states, “Apache SpamAssassin 4.0.0 has now been thoroughly tested in production systems. We strongly recommend upgrading as soon as possible. We’d also like to thank the committers, contributors, rule testers, mass checkers, and code testers who have made this release possible, as well as cPanel for their continued support of new features.” 

Notable changes and new features introduced in Apache SpamAssassin 4.0.0 include:

New Plugins

There are three new plugins added with this release:

  • Mail::SpamAssassin::Plugin::ExtractText: This plugin uses external tools to extract text from message parts, and then sets the text as the rendered part. All SpamAssassin rules that apply to the rendered part will run on the extracted text as well.
  • Mail::SpamAssassin::Plugin::DMARC: This plugin checks if emails match DMARC policy after parsing DKIM and SPF results.
  • Mail::SpamAssassin::Plugin::DecodeShortURLs: This plugin looks for URLs shortened by a list of URL shortening services. Upon finding a matching URL, the plugin sends an HTTP request to the shortening service and retrieves the Location header, which points to the actual URL that was shortened. It then adds this URL to the list of URIs extracted by SpamAssassin, which can then be accessed by uri rules and plugins such as URIDNSBL.
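The short-URL expansion described in the last item can be sketched as follows. This is a simplified Python illustration, not the plugin's code: the shortener hostnames, the function names and the `fetch_headers` callback are all hypothetical, and the HTTP request is injected as a parameter so the sketch performs no network access itself.

```python
from urllib.parse import urlsplit

# Hypothetical shortener list; a real deployment would configure its own.
SHORTENERS = {"short.example", "tiny.example"}


def expand_short_url(url, fetch_headers, shorteners=SHORTENERS):
    """Return the redirect target of a known short URL, or None.

    `fetch_headers` is a caller-supplied function that performs the HTTP
    request and returns (status_code, headers_dict).
    """
    host = (urlsplit(url).hostname or "").lower()
    if host not in shorteners:
        return None
    status, headers = fetch_headers(url)
    if status in (301, 302, 303, 307, 308):
        return headers.get("Location")
    return None


def collect_uris(uris, fetch_headers):
    """Append expanded targets so later URI checks can see them too."""
    result = list(uris)
    for uri in uris:
        target = expand_short_url(uri, fetch_headers)
        if target and target not in result:
            result.append(target)
    return result


def fake_fetch(url):
    """Stand-in for a real HTTP client, used only for this demonstration."""
    return 301, {"Location": "http://landing.example/offer"}


all_uris = collect_uris(
    ["http://short.example/abc", "http://other.example/"], fake_fetch
)
```

Only the URL on a listed shortener host triggers a request; its redirect target is then added alongside the original URIs so that downstream checks can evaluate the real destination.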

New Configuration Options

  • All rules, functions, command-line options and modules that contain "whitelist" or "blacklist" have been renamed to use the terms "welcomelist" and "blocklist". Old options will continue to work for backwards compatibility until at least the Apache SpamAssassin 4.1.0 release.
  • New tflag "nolog" added to hide info coming from rules in SpamAssassin reports.
  • New dns_options "nov4" and "nov6" added.
  • Razor2 razor_fork option added.
  • Pyzor pyzor_fork option added.
  • urirhsbl and urirhssub rules now support the "notrim" tflag, which forces querying the full hostname instead of the trimmed domain.
  • report_charset now defaults to UTF-8, which may change the rendering of SpamAssassin reports.

Internal Changes

  • Meta rules no longer use priority values; they are evaluated dynamically when the rules they depend on are finished.
  • DNS and other asynchronous lookups like DCC or Razor2 plugins are now launched when priority -100 is reached. This allows short circuiting at lower priority without sending unneeded DNS queries.
  • New internal Mail::SpamAssassin::GeoDB module supporting RelayCountry and URILocalBL plugins provides a unified interface to Geographic IP modules.
  • Bayes and TxRep Message-ID tracking now uses a different hashing method.
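The idea of evaluating meta rules as soon as their dependencies finish, rather than at a fixed priority, can be sketched in Python. This is an assumption-laden illustration of dependency-driven evaluation in general, not a description of how SpamAssassin implements it internally; all rule names and the `run_rules` function are hypothetical.

```python
def run_rules(base_results, meta_rules):
    """Evaluate meta rules as soon as the rules they depend on have finished.

    base_results: iterable of (rule_name, hit) pairs in completion order.
    meta_rules: dict of meta_name -> (dependencies, function over results).
    Returns the final results and the order in which rules finished.
    """
    results = {}
    order = []
    pending = dict(meta_rules)

    def finish(name, hit):
        results[name] = hit
        order.append(name)
        # Any meta whose dependencies are now all known can be evaluated.
        ready = [m for m, (deps, _) in pending.items()
                 if all(d in results for d in deps)]
        for m in ready:
            if m in pending:  # may already have been handled by a nested call
                _, fn = pending.pop(m)
                finish(m, fn(results))

    for name, hit in base_results:
        finish(name, hit)
    return results, order


# Hypothetical rules: one meta depends on two base rules, another on a meta.
META = {
    "DEMO_META_AB": (("DEMO_A", "DEMO_B"),
                     lambda r: r["DEMO_A"] and r["DEMO_B"]),
    "DEMO_META_NESTED": (("DEMO_META_AB", "DEMO_C"),
                         lambda r: r["DEMO_META_AB"] and not r["DEMO_C"]),
}

results, order = run_rules(
    [("DEMO_A", True), ("DEMO_C", False), ("DEMO_B", True)], META
)
```

In this run the first meta rule fires immediately after `DEMO_B` completes, and the nested meta rule follows right behind it, with no priority numbers involved.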

Optimizations

Apache SpamAssassin 4.0.0 features numerous improvements, new rule types, and internal native handling of messages in international languages. These three key optimizations will improve the efficiency of Apache SpamAssassin: 

  • DNS queries are now done asynchronously for overall speed improvements.  
  • DCC checks can now use dccifd asynchronously for improved throughput.
  • With the fork options enabled, Pyzor and Razor checks run asynchronously in separate processes for increased throughput.
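Why asynchronous lookups improve throughput can be shown with a small Python sketch. It uses simulated lookups (a sleep instead of a real DNS or network query), and the host names and verdict logic are invented for the example; it says nothing about SpamAssassin's actual resolver code.

```python
import asyncio
import time


async def fake_lookup(name, delay):
    """Stand-in for a network lookup; sleeps instead of querying anything."""
    await asyncio.sleep(delay)
    return name, "listed" if name.startswith("bad") else "clean"


async def check_all(names, delay=0.2):
    # Launch every lookup at once and wait for all of them together.
    pairs = await asyncio.gather(*(fake_lookup(n, delay) for n in names))
    return dict(pairs)


start = time.perf_counter()
verdicts = asyncio.run(
    check_all(["bad.example", "good.example", "bad2.example"])
)
elapsed = time.perf_counter() - start
```

Because the three simulated lookups wait concurrently, the total elapsed time is close to one lookup's delay rather than the sum of all three.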

Other Changes & Fixes

Support for international text such as UTF-8 rules has been completed and significantly improved to include native UTF-8 processing.

Apache SpamAssassin PMC member Kevin A. McGrail reflects on the significance of the 4.0.0 release, “The Apache SpamAssassin project is active and releases rule updates constantly. However, SpamAssassin 4.0.0 does include major changes to the SpamAssassin core that significantly improves the handling of text in international language. It also includes years of work that improve the categorization and performance of SpamAssassin overall.  And with this release, our entire community can now focus on the 4.0 branch of the project.”

Next Steps

Markowitz reflects on the release of Apache SpamAssassin 4.0.0, “I’m proud that we have a stable and mature project that still helps people every day!” At Guardian Digital, we share this pride. Wreski proclaims, “We are proud to be using Apache SpamAssassin as part of our multi-layered approach to keeping spam out of our clients’ inboxes and safeguarding them against the latest and most dangerous email threats.”  

Are you ready to experience the improved security and performance that Apache SpamAssassin 4.0.0 offers? You can download or upgrade to Apache SpamAssassin 4.0.0 here.

After you've upgraded to Apache SpamAssassin 4.0, make sure you use the free KAM ruleset. It has been published for more than 18 years, and installation is simple. Find out more at The McGrail Foundation website.
