You're now viewing all of my posts relating to Email. Enjoy!
Better Spam Filtering
July 23, 2007
I'm continually on the holy quest for zero-spam in my inbox. Alas, spam filtering techniques age and die quickly on this great Internets upon which we surf. Back in the good old days I got by with just Bogofilter's Bayesian spam learning and was amazed at the spam reduction. However, times have changed and spam is getting increasingly difficult to recognize through Bayesian analysis. I've started using a combination of SpamAssassin, Razor, Pyzor, DCC, TextCat, Bogofilter and SPF to get this accomplished.
It's really pretty simple to get up and running on an Ubuntu box. In this tutorial I will assume that you've got some administration history under your belt. This is a quick and dirty run-down of how to get these services configured.
User Setup
For each user's mail directory, we will have to run the following commands to set up the directory tree:
maildirmake /home/USERNAME/.maildir
maildirmake /home/USERNAME/.maildir/.Spam
This will create the root mail directory as well as a subdirectory for Spam storage.
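If maildirmake isn't handy, the same tree can be approximated by hand. Here's a minimal Ruby sketch, assuming the usual Maildir layout of cur, new and tmp subdirectories. The make_maildir helper is my own name for illustration, and the real maildirmake may well do more than this.

```ruby
require 'fileutils'
require 'tmpdir'

# Hypothetical helper: create a Maildir (cur/new/tmp) at the given path.
# Assumes the standard Maildir layout; maildirmake itself may do more.
def make_maildir(path)
  %w[cur new tmp].each do |sub|
    FileUtils.mkdir_p(File.join(path, sub), mode: 0700)
  end
  path
end

# Demo in a throwaway directory rather than a real home directory.
Dir.mktmpdir do |home|
  root = make_maildir(File.join(home, ".maildir"))
  make_maildir(File.join(root, ".Spam"))
  puts Dir.children(root).sort.inspect
end
```

Point it at /home/USERNAME/.maildir instead of the temp directory once you're happy with what it creates.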
Installing SpamAssassin and Bogofilter
Before we go any further, we should start installing SpamAssassin and Bogofilter. What good is configuring non-existent software?
apt-get install spamassassin bogofilter
Then you'll need access to all of the other associated joys for this packaging:
apt-get install razor pyzor dcc-client libspf2-2
Lo, you have SpamAssassin fully installed. It'll just require a little finagling in the conf files (/etc/spamassassin/) to get an acceptable setup.
In /etc/spamassassin/local.cf I find the default required "spamicity" level (a required_score of 5) to be entirely too lenient. I like a required_score between 2.5 and 4. If you go any higher, you'll never get spam filtered well. For TextCat to work nicely, we'll also throw in the following:
ok_languages en
ok_locales en
This will ensure that none of that Russian spam gets through. Obviously omit this if you are Russian.
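To double-check those settings after editing, a few lines of Ruby will do. This is only a sketch: parse_local_cf is a made-up helper that understands plain "key value" lines and nothing else, and the sample text stands in for your real local.cf.

```ruby
# Hypothetical helper: pull simple "key value" settings out of local.cf text.
# Ignores comments and blank lines; it does not understand the full config grammar.
def parse_local_cf(text)
  settings = {}
  text.each_line do |line|
    line = line.sub(/#.*/, "").strip
    next if line.empty?
    key, value = line.split(/\s+/, 2)
    settings[key] = value
  end
  settings
end

# Sample text standing in for /etc/spamassassin/local.cf.
sample = <<~CF
  # values in the range suggested above
  required_score 3.0
  ok_languages en
  ok_locales en
CF

settings = parse_local_cf(sample)
score = settings["required_score"].to_f
puts "required_score #{score} is outside 2.5..4" unless (2.5..4.0).cover?(score)
puts settings.inspect
```

Swap the sample for File.read("/etc/spamassassin/local.cf") to check the real thing.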
v310.pre
After making the changes listed above, you'll also have to edit v310.pre to make sure that it uses some of our more advanced filtering techniques. Make sure that the following lines are uncommented in your v310.pre.
loadplugin Mail::SpamAssassin::Plugin::DCC
loadplugin Mail::SpamAssassin::Plugin::Pyzor
loadplugin Mail::SpamAssassin::Plugin::Razor2
loadplugin Mail::SpamAssassin::Plugin::SpamCop
loadplugin Mail::SpamAssassin::Plugin::TextCat
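It's easy to miss a leading # on one of those lines, so here's a rough Ruby sketch that reports which plugins still lack an active loadplugin line. The missing_plugins helper and the sample text are my own inventions for illustration.

```ruby
WANTED_PLUGINS = %w[DCC Pyzor Razor2 SpamCop TextCat]

# Hypothetical helper: return the wanted plugins that have no active
# (uncommented) loadplugin line in the given .pre file text.
def missing_plugins(pre_text, wanted = WANTED_PLUGINS)
  active = pre_text.each_line.map do |line|
    line[/^\s*loadplugin\s+Mail::SpamAssassin::Plugin::(\w+)/, 1]
  end.compact
  wanted - active
end

# Sample text standing in for v310.pre; Pyzor is still commented out.
sample = <<~PRE
  loadplugin Mail::SpamAssassin::Plugin::DCC
  #loadplugin Mail::SpamAssassin::Plugin::Pyzor
  loadplugin Mail::SpamAssassin::Plugin::Razor2
PRE

puts missing_plugins(sample).inspect
```

An empty list means every plugin named in WANTED_PLUGINS is switched on.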
/etc/procmailrc
That'll do it for SpamAssassin setup. Now, we get to move on to installing and configuring procmail to our liking. Here's an example config that I like.
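LOGFILE=/var/log/procmail
MAILDIR=$HOME/.maildir/
DEFAULT=$HOME/.maildir/
SPAMDIR=$MAILDIR/.Spam/new
# run anything under 256000 bytes through SpamAssassin
# (path as installed by the Ubuntu package; adjust if yours differs)
:0fw
* < 256000
| /usr/bin/spamassassin
:0:
* ^X-Spam-Status: Yes
$SPAMDIR
:0:
* ^X-Spam-Level: \*\*\*\*\*\*\*\*\*\*\*\*\*\*\*
$SPAMDIR
:0fw
| bogofilter -e -p
# if bogofilter failed, hand the mail back with a temporary-failure exit code
:0e
{ EXITCODE=75 HOST }
# file the mail to spam-bogofilter if it's spam.
# (newer bogofilter versions label spam "Spam", older ones "Yes")
:0:
* ^X-Bogosity: (Spam|Yes), tests=bogofilter
$SPAMDIR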
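To reason about which messages end up in the Spam folder, it can help to restate the header checks outside of procmail. The Ruby sketch below is an approximation of those checks, not a replacement for them. The spam_folder? helper is hypothetical, and I'm assuming the bogofilter label can be either "Spam" or "Yes" depending on the version installed.

```ruby
# Header checks modelled on the procmail recipes; procmail matches
# case-insensitively by default, hence the /i flags.
SPAM_HEADER_PATTERNS = [
  /^X-Spam-Status: Yes/i,
  /^X-Spam-Level: \*{15}/i,
  # Assumption: the label is "Spam" or "Yes" depending on bogofilter version.
  /^X-Bogosity: (Spam|Yes), tests=bogofilter/i
]

# Hypothetical helper: true if any of the spam headers is present in the
# header block of an already-filtered message.
def spam_folder?(headers)
  SPAM_HEADER_PATTERNS.any? { |pattern| headers.match?(pattern) }
end

# Made-up headers for illustration.
flagged = "Subject: hello\nX-Spam-Status: Yes, score=7.1 required=3.0\n"
clean   = "Subject: lunch?\nX-Spam-Status: No, score=0.2 required=3.0\n"

puts spam_folder?(flagged)
puts spam_folder?(clean)
```

Feed it the headers of a message that landed in the wrong place and you'll see which rule did (or didn't) fire.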
~/.fetchmailrc
Now we move on to the individual users' fetchmailrc's. Put this directly in their home directory and edit accordingly.
set logfile /home/LOCALUSER/.maildir/fetchmail.log
set no bouncemail
poll MAILSERVER protocol pop3
  username "USERNAME" password "PASSWORD"
  fetchall expunge 5 pass8bits
  is LOCALUSERNAME
  mda "/usr/bin/procmail -d %s"
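If you have more than a couple of users, filling in that template by hand gets old. Here's a small Ruby sketch that renders it from per-user values and writes the file readable by its owner only, since it holds a password (as far as I know, fetchmail complains about loosely permissioned rc files). Both helper names and all the example values are made up.

```ruby
require 'tmpdir'

# Hypothetical helper: render a per-user fetchmailrc from the template.
def fetchmailrc_for(local_user:, server:, remote_user:, password:)
  <<~RC
    set logfile /home/#{local_user}/.maildir/fetchmail.log
    set no bouncemail
    poll #{server} protocol pop3
      username "#{remote_user}" password "#{password}"
      fetchall expunge 5 pass8bits
      is #{local_user}
      mda "/usr/bin/procmail -d %s"
  RC
end

# Hypothetical helper: write the file with owner-only permissions.
def write_fetchmailrc(home, text)
  path = File.join(home, ".fetchmailrc")
  File.open(path, "w", 0600) { |f| f.write(text) }
  File.chmod(0600, path) # in case the file already existed with looser bits
  path
end

# Demo in a throwaway directory; the values below are placeholders.
Dir.mktmpdir do |home|
  rc = fetchmailrc_for(local_user: "alice", server: "pop.example.com",
                       remote_user: "alice@example.com", password: "PASSWORD")
  puts File.read(write_fetchmailrc(home, rc))
end
```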
learn_spam.rb
So, you've got bogofilter installed from way back at the beginning of this tutorial but you're wondering how to use it. Well, bogofilter has to be trained on existing spam and ham corpora. The way we'll be setting up our mailboxes is to have everything stored in a root maildir. Your spam will be both placed in and learned from a folder called, obviously, Spam. The following script should be run nightly by every user on the system. It's not terribly efficient, but for small user bases it's perfectly fine. Apologies for the poor formatting, I'll look at this later.
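require 'find'
require 'fileutils'
# Resolved via PATH; put an absolute path here if bogofilter lives elsewhere.
$bogofilter = "bogofilter"
$username = `whoami`.strip
$maildir = "/home/#{$username}/.maildir"
$spamdir = "#{$maildir}/.Spam"
class BogoFilter
  # Bail out early if the maildir or its Spam folder is missing.
  def check_paths
    [$maildir, $spamdir].each do |dir|
      raise "Could not locate #{dir}!" unless File.exist? dir
    end
  end
  # Throw away the old word lists so every run retrains from scratch.
  def clear_database
    FileUtils.rm_rf "/home/#{$username}/.bogofilter"
  end
  # Register every message under the Spam folder as spam.
  def learn_spam
    Find.find($spamdir) do |spam|
      next unless File.file? spam
      system($bogofilter, "-s", in: spam)
    end
  end
  # Register every message outside the Spam folder as ham.
  def learn_ham
    Find.find($maildir) do |ham|
      next if ham.include? "Spam"
      next unless File.file? ham
      # the fetchmail log lives in the maildir but is not mail
      next if File.basename(ham) == "fetchmail.log"
      system($bogofilter, "-n", in: ham)
    end
  end
end
bf = BogoFilter.new
bf.check_paths
bf.clear_database
puts "Learning spam..."
bf.learn_spam
puts "Learning ham..."
bf.learn_ham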
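Since the script wipes the bogofilter database before retraining, it's worth knowing how much spam and ham it is about to learn from. This is a rough dry-run sketch using the same "anything with Spam in its path is spam" rule. The corpus_counts helper is hypothetical, and the demo runs against a throwaway directory rather than a real maildir.

```ruby
require 'find'
require 'fileutils'
require 'tmpdir'

# Hypothetical helper: count the files a training run would visit,
# split by whether "Spam" appears anywhere in the path.
def corpus_counts(maildir)
  counts = { spam: 0, ham: 0 }
  Find.find(maildir) do |path|
    next unless File.file? path
    kind = path.include?("Spam") ? :spam : :ham
    counts[kind] += 1
  end
  counts
end

# Demo with one made-up ham message and one made-up spam message.
Dir.mktmpdir do |home|
  maildir = File.join(home, ".maildir")
  FileUtils.mkdir_p(File.join(maildir, "cur"))
  FileUtils.mkdir_p(File.join(maildir, ".Spam", "cur"))
  File.write(File.join(maildir, "cur", "msg1"), "Subject: lunch?\n")
  File.write(File.join(maildir, ".Spam", "cur", "msg2"), "Subject: pills\n")
  puts corpus_counts(maildir).inspect
end
```

If either count comes back as zero, retraining from scratch will leave bogofilter with a lopsided database, so you may want to skip that night's run.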
Fin.
Well, that should do it for you. I'll update this article with more information as I get time. When testing your setup, be sure to use "fetchmail -k" to force fetchmail to not delete messages after fetching - just to ensure that the messages are delivered correctly and not shoved off into the ether. If you have any suggestions, please leave them in the comments section and I'll be sure to include them here.
Tagged: Spam, Email, Internet
Zeldman On HTML Email
June 08, 2007
So, Mr. Zeldman recently wrote about how HTML e-mail sucks, and I totally agree with him. In my day-to-day work life, it's amazing how many "professionals" think that 19pt Comic Sans colored bright pink is acceptable. My favorite quote?
"Designed" e-mail is just a slightly more polished version of those messages your uncle sends you. Your uncle thinks 18pt bright red Comic Sans looks great, so he sends e-mail messages formatted that way. You cluck your tongue, or sigh, or run de-formatting scripts on every message you receive from him. When your uncle is the "designer," you "get" why styled mail sucks. It sucks just as much when you design it, even if it looks better than your uncle's work in the two e-mail programs that support it correctly.
So perfect. Now, you may view my epileptic poodle of bad design. Bonus points for including it in an HTML signature back to one of these "designers".
Tagged: HTML, Email, Internet
