jason schaefer . com

"arguing that you don’t care about the right to privacy because you have nothing to hide is no different than saying you don’t care about free speech because you have nothing to say."

Category: Content Control

  • SpamAssassin training and spam cleanup script

    Spam is a constant battle: it is always changing and always creeping into your Inbox. Spam wrangling is only effective with proper training; SpamAssassin does a decent job out of the box but needs users’ input to be truly effective. This script runs SpamAssassin’s built-in sa-learn tool against users’ known spam and known ham.

    With my setup, a message is received (Postfix) and identified as spam (SpamAssassin), and is then moved to the user’s Junk directory by the Sieve (Dovecot) rule at the bottom of this post. If a spam message is received but not identified as spam, it is delivered to the Inbox; the user spots it and moves it to the Junk directory manually. In Thunderbird’s Account Settings I like to enable “Move new junk messages to: ‘Junk’ folder on: …”, so that when a user marks a message as Junk, it is moved automatically.

    When this script runs, it learns messages in the Inbox as ham and messages in the Junk directory as spam. Each per-user run uses --no-sync, and the final “sa-learn --sync” merges the resulting journal into the Bayes database that SpamAssassin consults when scoring each incoming message. The database can optionally be backed up, so you can revert to an earlier version if a mistake is made. The script logs to /var/log/train-mail.log. From the log you can see how much spam and ham is being added, how many messages have been processed in total, and statistics on sa-learn’s built-in automatic token expiry. Additionally, I set up a spam cleanup step using find.
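    To turn that log into totals, a small summary helps. The function below is a hypothetical sketch, not part of the original setup. It assumes sa-learn prints its usual “Learned tokens from N message(s) (M message(s) examined)” line after each run, and that the log contains the bare “spam”, “ham” and “sync” marker lines the script writes before each step.

```shell
#!/bin/bash
# Hypothetical helper: total how many messages sa-learn reported learning as
# spam and as ham in the training log. Assumes the script's marker lines
# ("spam", "ham", "sync") precede each step, and sa-learn lines such as:
#   Learned tokens from 5 message(s) (7 message(s) examined)
summarize_training() {
    local logfile=${1:-/var/log/train-mail.log}
    awk '
        $0 == "spam" || $0 == "ham" { mode = $0; next }  # start of a spam or ham section
        $0 == "sync"                { mode = ""; next }  # stop attributing after the sync step
        /^Learned tokens from/ && mode != "" {
            learned[mode]  += $4                         # messages newly learned
            examined[mode] += substr($6, 2)              # "(7" -> 7
        }
        END {
            printf "spam: learned %d of %d examined\n", learned["spam"], examined["spam"]
            printf "ham: learned %d of %d examined\n",  learned["ham"],  examined["ham"]
        }
    ' "$logfile"
}
```

    Running summarize_training with no argument reads /var/log/train-mail.log. Because the log is appended to on every run, the totals cover every run recorded in the file.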

    If sa-learn learns a message as ham when it is actually spam, or vice versa, simply move the message to the correct directory (Inbox = ham, Junk = spam), and the mistake will be corrected on the next run. sa-learn automatically ‘forgets’ the previous classification before relearning the message.

    It’s important to train with roughly equal amounts of ham and spam, so I run this script daily to capture users’ ham before they delete it or sort it into subfolders. Also be aware that, if possible, you should aim to train with at least 1000 spam messages and 1000 ham messages. More is better, but according to SpamAssassin’s own tests anything over about 5000 messages does not significantly improve accuracy.
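    To check where a database stands against those guidelines, you can read the counts from the “sa-learn --dump magic” output the script already logs. The function below is a hypothetical sketch. It assumes the usual dump layout, where the count is the third column of the “non-token data: nspam” and “non-token data: nham” lines. The 2:1 imbalance threshold is an arbitrary illustration, not a SpamAssassin setting.

```shell
#!/bin/bash
# Hypothetical helper: read `sa-learn --dump magic` output on stdin, print the
# learned spam/ham counts, and warn about the 1000-message guideline or a
# lopsided ratio. Assumes lines shaped like:
#   0.000          0       1234          0  non-token data: nspam
check_bayes_balance() {
    awk '
        /non-token data: nspam/ { nspam = $3 }
        /non-token data: nham/  { nham  = $3 }
        END {
            printf "nspam=%d nham=%d\n", nspam, nham
            if (nspam < 1000 || nham < 1000)
                print "WARNING: fewer than 1000 spam or ham messages learned"
            # arbitrary threshold: more than 2:1 either way counts as lopsided
            if (nspam > 2 * nham || nham > 2 * nspam)
                print "WARNING: ham and spam counts are far from equal"
        }
    '
}
```

    Typical usage is “sa-learn --dump magic | check_bayes_balance”. A single output line with no warnings means both counts are at least 1000 and within 2:1 of each other.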

    Obviously, many of these options are site- or user-specific. This is a good foundation to use as-is or to build from.

    #!/bin/bash
    
    # Specify one or more users, space-separated [user=(user1 user2 user3)], or leave empty [user=()] to include all users. "All users" means every account with 1000 <= uid < 65534.
    user=()
    
    # After how many days should spam be deleted?
    cleanafter=30
    
    # Backup path; comment out to disable backups
    bk=/home/backup/sa-learn_bayes_$(date +%F).backup
    
    log=/var/log/train-mail.log
    #log=/dev/stdout
    
    echo -e "\n$(date +%c)" >> "$log" 2>&1
    
    # Test the element count: [ -z ${user[@]} ] fails with "too many arguments" when several users are listed
    if [ ${#user[@]} -eq 0 ]; then
        echo "user is empty, using all users from system" >> "$log" 2>&1
        mapfile -t user < <(awk -F':' '$3 >= 1000 && $3 < 65534 {print $1}' /etc/passwd)
    fi
    
    for u in "${user[@]}"; do
        if [ ! -d "/home/$u/Maildir" ]; then
            echo "No such Maildir for $u" >> "$log" 2>&1
        else
            echo "Proceeding with ham and spam training on user \"$u\"" >> "$log" 2>&1
            # Change this path to match your spam directory; in this case it's "Junk".
            # Add current and new messages in the Junk directory as spam.
            echo spam >> "$log" 2>&1
            sa-learn --no-sync --spam "/home/$u/Maildir/.Junk/"{cur,new} >> "$log" 2>&1
            # Only add current Inbox mail as ham, not new; this gives the user a chance to move it to the spam directory.
            # (bash does not expand a one-item brace list like {cur}, so the path is written out.)
            echo ham >> "$log" 2>&1
            sa-learn --no-sync --ham "/home/$u/Maildir/cur" >> "$log" 2>&1
        fi
    done
    
    # Sync the journal created above with the database; only clean up spam if that succeeds
    echo sync >> "$log" 2>&1
    if sa-learn --sync >> "$log" 2>&1; then
        for u in "${user[@]}"; do
            [ -d "/home/$u/Maildir/.Junk/cur" ] || continue
            echo "deleting spam for $u older than $cleanafter days" >> "$log" 2>&1
            find "/home/$u/Maildir/.Junk/cur/" -type f -mtime +"$cleanafter" -delete
        done
    else
        echo "sa-learn wasn't able to sync. Something is broken. Skipping spam cleanup" | tee -a "$log"
    fi
    
    echo "Statistics:" >> "$log" 2>&1
    sa-learn --dump magic >> "$log" 2>&1
    echo ============================== >> "$log" 2>&1
    
    # Quote $bk: an unquoted empty variable turns the test into [ -n ], which is always true
    if [ -n "$bk" ]; then
        echo "backup writing to $bk" >> "$log" 2>&1
        sa-learn --backup > "$bk"
    fi
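    Because the cleanup step deletes files, it can be worth previewing what it would remove before enabling it. The function below is a hypothetical dry-run sketch. It assumes the same Maildir/.Junk/cur layout as the script and uses the same find test, where -mtime +N matches files last modified more than N full days ago.

```shell
#!/bin/bash
# Hypothetical helper: list (without deleting) the Junk messages that the
# cleanup step would remove for one Maildir.
preview_spam_cleanup() {
    local maildir=$1 days=${2:-30}
    # -print instead of -delete makes this a dry run
    find "$maildir/.Junk/cur/" -type f -mtime +"$days" -print
}
```

    For example, “preview_spam_cleanup /home/jason/Maildir 30” prints the paths that a run with cleanafter=30 would delete.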
    

    Here is my Sieve rule for moving messages that are marked as spam into my Junk directory. I set up Roundcube so that people can manage their own Sieve filters.

    $ cat /home/jason/sieve/managesieve.sieve
    require ["fileinto"];
    # rule:[SPAM]
    if header :contains "X-Spam-Status" "Yes"
    {
    	fileinto "Junk";
    	stop;
    }
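    To check by hand which stored messages that rule would catch, the header test can be approximated in the shell. The function below is a hypothetical sketch, not how Dovecot evaluates Sieve. It reads only the header section (up to the first blank line) and matches case-insensitively, like Sieve’s default comparator. It ignores folded continuation lines, so a “Yes” that falls on a continuation line is missed.

```shell
#!/bin/bash
# Hypothetical helper: exit 0 if the message file has an X-Spam-Status header
# containing "yes" (case-insensitive), approximating the Sieve rule above.
would_file_into_junk() {
    awk '
        $0 == "" || $0 == "\r" { exit }                      # end of the header section
        tolower($0) ~ /^x-spam-status:.*yes/ { found = 1; exit }
        END { exit !found }                                  # 0 = match, 1 = no match
    ' "$1"
}
```

    For example, “for f in /home/jason/Maildir/cur/*; do would_file_into_junk "$f" && echo "$f"; done” lists Inbox messages that SpamAssassin flagged but that were not filed into Junk.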
    
  • Silverlight (Microsoft) and Moonlight (Novell) Digital Restrictions Management frustrations

    Moonlight is a Free implementation of Microsoft Silverlight. It’s a Mono project: http://www.mono-project.com/Moonlight. I had read rumors that 2.0 would support Netflix, so I installed the 2.0 development release (currently 1.99.5), logged into Netflix, and was denied. I was confused, because Moonlight’s development site says that 1.99.5 is complete except for bug testing and a security audit. After more reading I found that it’s not a limitation of Moonlight at all; it’s the DRM (Digital Restrictions Management) that Netflix licenses from Microsoft, called PlayReady. This is why we can’t watch Netflix with Moonlight. It’s another case where the end user suffers at the hands of DRM. Once again the industry is trying to control the user and hoping no one cares. So what if people save the movie streams from Netflix? You can copy as many DVDs as you want when they show up in the mail! What happened to Fair Use? The move to hosted content is highly desirable for these industries: now they can tell you what, how, and when. Since it’s hosted on their servers, you can’t do a thing about it (other than not use it). Ugghh. I am disappointed…