
Better Spam Filtering

I'm continually on the holy quest for zero-spam in my inbox. Alas, spam filtering techniques age and die quickly on this great Internets upon which we surf. Back in the good old days I got by with just Bogofilter's Bayesian spam learning and was amazed at the spam reduction. However, times have changed and spam is getting increasingly difficult to recognize through Bayesian analysis. I've started using a combination of SpamAssassin, Razor, Pyzor, DCC, TextCat, Bogofilter and SPF to get this accomplished.

It's really pretty simple to get up and running on an Ubuntu box. In this tutorial I will assume that you've got some system administration experience under your belt. This is a quick and dirty run-down of how to get these services configured.

User Setup

For each user, we will have to run the following commands to set up the directory tree:

maildirmake /home/USERNAME/.maildir
maildirmake /home/USERNAME/.maildir/.Spam

This will create the root mail directory as well as a subdirectory for Spam storage.
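If you're curious what maildirmake actually does, a Maildir is little more than a directory holding cur, new and tmp subdirectories. Here's a rough Ruby sketch that builds the same tree by hand. The helper name make_maildir is made up for illustration, and this is a stand-in for boxes without maildirmake, not a replacement for it.

```ruby
require 'fileutils'

# Hypothetical helper: create the cur/new/tmp subdirectories of a Maildir
# at +path+, readable only by the owner. Returns the directories created.
def make_maildir(path)
  %w[cur new tmp].map do |sub|
    dir = File.join(path, sub)
    FileUtils.mkdir_p(dir, mode: 0700)
    dir
  end
end

# Example usage (left commented out so nothing is created by accident):
#   root = File.join(Dir.home, ".maildir")
#   make_maildir(root)
#   make_maildir(File.join(root, ".Spam"))
```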

Installing SpamAssassin and Bogofilter

Before we go any further, we should start installing SpamAssassin and Bogofilter. What good is configuring non-existent software?

apt-get install spamassassin bogofilter

Then you'll need all of the other associated joys that go along with these packages:

apt-get install razor pyzor dcc-client libspf2-2

Lo, you have SpamAssassin fully installed. It'll just require a little finagling in the conf files (/etc/spamassassin/) to get an acceptable setup.

In /etc/spamassassin/local.cf I find the default required "spamicity" level (a required_score of 5.0) to be entirely too high. I like a required_score between 2.5 and 4. If you go any higher, too much spam slips through. For TextCat to work nicely, we'll also throw in the following:

ok_languages en
ok_locales en

This penalizes mail written in other languages or character sets, which should keep most of that Russian spam out. Obviously, omit this if you are Russian.
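To get a feel for what a lower required_score would do to your own mail, you can read the score back out of the X-Spam-Status header that SpamAssassin adds. The Ruby sketch below assumes the header looks like "Yes, score=7.3 required=3.0 ..." (it also accepts hits=, which I believe older releases used), and it assumes a message is flagged when its score meets or exceeds the threshold. The function names are my own.

```ruby
# Parse an X-Spam-Status header value into verdict, score and threshold.
# Returns nil when the value doesn't carry both numbers.
def parse_spam_status(value)
  score    = value[/(?:score|hits)=(-?\d+(?:\.\d+)?)/, 1]
  required = value[/required=(-?\d+(?:\.\d+)?)/, 1]
  return nil unless score && required

  {
    spam: value.strip.start_with?("Yes"),
    score: score.to_f,
    required: required.to_f
  }
end

# Would this message have been flagged under a different required_score?
# Assumption: flagged means score >= threshold.
def flagged_at?(status, threshold)
  status[:score] >= threshold
end
```

Running a handful of borderline messages through flagged_at? with 2.5, 4 and 5 is a cheap way to pick a threshold before touching local.cf.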

v310.pre

After making the changes listed above, you'll also have to edit v310.pre to make sure that it uses some of our more advanced filtering techniques. Make sure that the following lines are uncommented in your v310.pre.

loadplugin Mail::SpamAssassin::Plugin::DCC
loadplugin Mail::SpamAssassin::Plugin::Pyzor
loadplugin Mail::SpamAssassin::Plugin::Razor2
loadplugin Mail::SpamAssassin::Plugin::SpamCop
loadplugin Mail::SpamAssassin::Plugin::TextCat
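If you'd rather not eyeball the file, here's a small Ruby sketch that reports which of those plugins are still commented out. It only looks for uncommented loadplugin lines in whatever text you hand it; the names WANTED_PLUGINS and missing_plugins are invented for this example.

```ruby
# The five plugins listed above, by their short names.
WANTED_PLUGINS = %w[DCC Pyzor Razor2 SpamCop TextCat].freeze

# Return the wanted plugins that have no uncommented loadplugin line
# in the given .pre file contents.
def missing_plugins(pre_text, wanted = WANTED_PLUGINS)
  enabled = pre_text.each_line.map do |line|
    line[/^\s*loadplugin\s+Mail::SpamAssassin::Plugin::(\w+)/, 1]
  end.compact
  wanted - enabled
end

# Example usage:
#   puts missing_plugins(File.read("/etc/spamassassin/v310.pre"))
```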

/etc/procmailrc

That'll do it for SpamAssassin setup. Now, we get to move on to installing and configuring procmail to our liking. Here's an example config that I like.

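LOGFILE=/var/log/procmail
MAILDIR=$HOME/.maildir/
DEFAULT=$HOME/.maildir/
SPAMDIR=$MAILDIR/.Spam/new

# Run anything smaller than 256000 bytes through SpamAssassin.
# The apt package installs it in /usr/bin; adjust the path if yours differs.
:0fw
* < 256000
| /usr/bin/spamassassin

# File whatever SpamAssassin flagged as spam.
:0:
* ^X-Spam-Status: Yes
$SPAMDIR

# File anything that scored 15 stars or more.
:0:
* ^X-Spam-Level: \*\*\*\*\*\*\*\*\*\*\*\*\*\*\*
$SPAMDIR

# Pass the survivors through bogofilter, which adds an X-Bogosity header.
:0fw
| bogofilter -e -p

# If bogofilter failed, hand the mail back to the queue to be retried.
:0e
{ EXITCODE=75 HOST }

# File the mail in the Spam folder if bogofilter says it's spam.
# Older bogofilter releases write "Yes" here, newer ones write "Spam".
:0:
* ^X-Bogosity: (Yes|Spam), tests=bogofilter
$SPAMDIR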
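If procmail's syntax makes your eyes glaze over, the filing decision those recipes make boils down to three header checks. Here it is again as a Ruby sketch, handy for sanity-checking a message's headers by hand. The route function and STAR_THRESHOLD are names I made up, and accepting both "Yes" and "Spam" as the X-Bogosity verdict is my assumption to cover different bogofilter versions.

```ruby
# Number of stars in X-Spam-Level that sends a message straight to Spam.
STAR_THRESHOLD = 15

# Decide where a message would be filed, given a hash of header name => value.
# Returns :spam or :inbox.
def route(headers)
  status   = headers["X-Spam-Status"].to_s
  level    = headers["X-Spam-Level"].to_s
  bogosity = headers["X-Bogosity"].to_s

  return :spam if status.start_with?("Yes")
  return :spam if level.count("*") >= STAR_THRESHOLD
  return :spam if bogosity.match?(/\A(Yes|Spam), tests=bogofilter/)

  :inbox
end
```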


~/.fetchmailrc

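Now we move on to the individual users' fetchmailrc files. Save this as .fetchmailrc directly in each user's home directory and edit accordingly. Since it holds a password, chmod it to 600; fetchmail refuses to use a run control file that other users can read.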


set logfile /home/LOCALUSERNAME/.maildir/fetchmail.log
set no bouncemail

poll MAILSERVER
  protocol pop3
  username "USERNAME"
  password "PASSWORD"
  fetchall
  expunge 5
  pass8bits
  is LOCALUSERNAME
  mda "/usr/bin/procmail -d %s"
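If you have more than a couple of users, filling in that template by hand gets old. Below is a rough Ruby sketch that renders the same config from a few values and writes it with owner-only permissions. The function names and every argument (local_user, server and so on) are placeholders of mine, so treat it as a starting point.

```ruby
# Render a per-user fetchmailrc from the template above.
def render_fetchmailrc(local_user:, server:, remote_user:, password:)
  <<~RC
    set logfile /home/#{local_user}/.maildir/fetchmail.log
    set no bouncemail

    poll #{server}
      protocol pop3
      username "#{remote_user}"
      password "#{password}"
      fetchall
      expunge 5
      pass8bits
      is #{local_user}
      mda "/usr/bin/procmail -d %s"
  RC
end

# Write the file so that only its owner can read or change it (mode 0600).
def write_fetchmailrc(path, contents)
  File.open(path, File::WRONLY | File::CREAT | File::TRUNC, 0600) do |f|
    f.write(contents)
  end
  File.chmod(0600, path) # also tightens a file that already existed
  path
end

# Example usage:
#   rc = render_fetchmailrc(local_user: "alice", server: "mail.example.com",
#                           remote_user: "alice", password: "secret")
#   write_fetchmailrc("/home/alice/.fetchmailrc", rc)
```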

learn_spam.rb

So, you've got bogofilter installed from way back at the beginning of this tutorial but you're wondering how to use it. Well, bogofilter has to be trained on existing spam and ham corpora. The way we'll be setting up our mailboxes is to have everything stored in a root maildir. Your spam will be both placed in and learned from a folder called, obviously, Spam. The following script should be run nightly by every user on the system. It's not terribly efficient, since it rebuilds the database from scratch each time, but for small user bases it's perfectly fine. Apologies for the poor formatting; I'll look at this later.


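require 'find'
require 'fileutils'

# Use whichever bogofilter comes first on the PATH.
$bogofilter = "bogofilter"
$username = `whoami`.strip
$maildir = "/home/#{$username}/.maildir"
$spamdir = "#{$maildir}/.Spam"

class BogoFilter
  # Abort early if either mail directory is missing.
  def check_paths
    [$maildir, $spamdir].each do |dir|
      raise "Could not locate #{dir}!" unless File.exist? dir
    end
  end

  # Throw away the old word lists so every run retrains from scratch.
  def clear_database
    FileUtils.rm_rf "/home/#{$username}/.bogofilter"
  end

  # Maildir keeps messages in cur/ and new/; skip logs and index files.
  def message?(path)
    File.file?(path) && %w[cur new].include?(File.basename(File.dirname(path)))
  end

  # Register every message in the Spam folder as spam (-s).
  def learn_spam
    Find.find($spamdir) do |spam|
      next unless message? spam
      system($bogofilter, "-s", in: spam)
    end
  end

  # Register every message outside the Spam folder as ham (-n).
  def learn_ham
    Find.find($maildir) do |ham|
      Find.prune if ham == $spamdir
      next unless message? ham
      system($bogofilter, "-n", in: ham)
    end
  end
end

bf = BogoFilter.new
bf.check_paths
bf.clear_database
puts "Learning spam..."
bf.learn_spam
puts "Learning ham..."
bf.learn_ham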
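Before you let a retrain loose, it can be worth checking how much ham and spam is actually sitting in the maildir, since a filter trained on three spams and ten thousand hams won't do you much good. Here's a small Ruby sketch that counts both. It only counts files under cur/ and new/, which is where Maildir keeps delivered messages, and corpus_sizes is a name I made up.

```ruby
require 'find'

# Count message files below +maildir+, split into spam (inside the .Spam
# folder) and ham (everything else). Only files whose parent directory is
# cur/ or new/ are counted, so logs and index files are ignored.
def corpus_sizes(maildir)
  spamdir = File.join(maildir, ".Spam")
  counts = { spam: 0, ham: 0 }

  Find.find(maildir) do |path|
    next unless File.file?(path)
    next unless %w[cur new].include?(File.basename(File.dirname(path)))

    kind = path.start_with?(spamdir + File::SEPARATOR) ? :spam : :ham
    counts[kind] += 1
  end

  counts
end

# Example usage:
#   p corpus_sizes(File.join(Dir.home, ".maildir"))
```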

Fin.

Well, that should do it for you. I'll update this article with more information as I get time. When testing your setup, be sure to use "fetchmail -k" to force fetchmail to not delete messages after fetching - just to ensure that the messages are delivered correctly and not shoved off into the ether. If you have any suggestions, please leave them in the comments section and I'll be sure to include them here.


Zeldman On HTML Email

So, Mr. Zeldman recently wrote about how HTML e-mail sucks, and I totally agree with him. In my day-to-day work life, it's amazing how many "professionals" think that 19pt Comic Sans colored bright pink is acceptable. My favorite quote?

"Designed" e-mail is just a slightly more polished version of those messages your uncle sends you. Your uncle thinks 18pt bright red Comic Sans looks great, so he sends e-mail messages formatted that way. You cluck your tongue, or sigh, or run de-formatting scripts on every message you receive from him. When your uncle is the "designer," you "get" why styled mail sucks. It sucks just as much when you design it, even if it looks better than your uncle's work in the two e-mail programs that support it correctly.

So perfect. Now, you may view my epileptic poodle of bad design. Bonus points for including it in an HTML signature back to one of these "designers".