jason schaefer . com

"arguing that you don’t care about the right to privacy because you have nothing to hide is no different than saying you don’t care about free speech because you have nothing to say."

Category: scripts

  • Apache, MySQL and WordPress install script

    After lots of laborious manual installs and much motivation from my buddy Damian of Mindshare, I decided to write a little script to quickly install and set up a typical environment for WordPress.
    This script does the following:
    – install apache, php and mysql
    – activate typical apache modules
    – create directories
    – download and un-tar WordPress
    – set permissions for wordpress doc root et al
    – create database, user, pass and grant to db.
    – auto setup typical VirtualHost site file in apache for both http and https
    – generate a self signed certificate

    Here is the bash script -> http://jasonschaefer.com/stuff/setupwp.sh.txt
    Download it, rename it to setupwp.sh, chmod 755 it, and run it like so: “./setupwp.sh hostname”
    Be sure to understand what the script is doing before you run it :-)
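
    To give a feel for one of the steps listed above (create database, user, pass and grant), here is a minimal sketch of how it could be done. It is not taken from setupwp.sh: the function name wp_db_sql, the database and user names, and the idea of piping the result into the mysql client are all assumptions for illustration.

```shell
#!/bin/bash
# Hypothetical helper (not from setupwp.sh): print the SQL needed to create
# a WordPress database, a user, and the grant between them.
# Arguments: database name, user name, password (all illustrative).
wp_db_sql() {
    local db="$1" dbuser="$2" dbpass="$3"
    printf 'CREATE DATABASE IF NOT EXISTS `%s`;\n' "$db"
    printf "CREATE USER IF NOT EXISTS '%s'@'localhost' IDENTIFIED BY '%s';\n" "$dbuser" "$dbpass"
    printf "GRANT ALL PRIVILEGES ON \`%s\`.* TO '%s'@'localhost';\n" "$db" "$dbuser"
    printf 'FLUSH PRIVILEGES;\n'
}

# Example: review the statements first, then pipe them to the client.
#   wp_db_sql wordpress wpuser 'change-me'
#   wp_db_sql wordpress wpuser 'change-me' | mysql -u root -p
```

    Printing the statements instead of executing them directly makes it easy to read what will happen before anything touches the server.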

  • SpamAssassin training and spam cleanup script

    Spam is a constant battle: it is ever changing and always creeping into your Inbox. Spam wrangling is only effective with proper training. SpamAssassin does a decent job out of the box but needs user input to be truly effective. This script runs SpamAssassin’s built-in sa-learn tool against each user’s known spam and known ham.

    In my setup, when a message is received (postfix) and identified as spam (spamassassin), it is moved to the user’s Junk directory by the Sieve (dovecot) rule at the bottom of this post. If a spam message is received but not identified as such, it is delivered to the Inbox, where the user identifies it and manually moves it to the Junk directory. I like to configure Thunderbird with “Move new junk messages to: “Junk” folder on: …” in Account Settings, so that when a user marks a message as Junk, it is moved automatically.

    When this script runs it learns messages in the Inbox as ham and messages in the Junk directory as spam. The command “sa-learn --sync” adds these results to the Bayes database that SpamAssassin consults when determining a spam score for each received message. The database is optionally backed up in case a mistake was made and you need to revert to a previous version. The script logs to /var/log/train-mail.log, where you can see how much spam and ham is being added, how many total messages have been processed, and stats on the auto-clean feature built into sa-learn. Additionally, I set up a spam cleanup using find and rm.
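
    Reverting to an earlier backup could look roughly like the sketch below. The helper name latest_bayes_backup is made up for illustration, and it assumes the backups are named sa-learn_bayes_YYYY-MM-DD.backup (the pattern the bk variable in the script below produces). Keep in mind that “sa-learn --restore” replaces the current Bayes data with the contents of the backup.

```shell
#!/bin/bash
# Hypothetical helper: print the newest Bayes backup found in a directory.
# Assumes file names of the form sa-learn_bayes_YYYY-MM-DD.backup, so the
# sorted order of the glob is also the chronological order.
latest_bayes_backup() {
    local dir="$1" f latest=""
    for f in "$dir"/sa-learn_bayes_*.backup; do
        [ -e "$f" ] || continue   # the glob matched nothing
        latest="$f"               # globs expand sorted, so the last one is newest
    done
    # Returns non-zero when no backup was found
    [ -n "$latest" ] && printf '%s\n' "$latest"
}

# Example, run as the user that owns the Bayes database:
#   backup="$(latest_bayes_backup /home/backup)" && sa-learn --restore "$backup"
```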

    If sa-learn learns a message as ham when it is spam, or vice versa, simply move the message to the correct directory (Inbox=ham/Junk=spam), and the mistake will be corrected on the next run. SpamAssassin will automatically ‘forget’ the previous classification.

    It’s important to train with roughly equal parts ham and spam. As a result I run this script daily in an effort to capture users’ ham before they delete it or sort it into sub folders. Another thing to be aware of is that you should typically aim to train with at least 1000 spam messages and 1000 ham messages, if possible. More is better, but anything over about 5000 messages does not improve accuracy significantly in SpamAssassin’s tests.
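
    A quick way to see how balanced the training is so far might be to pull the counts out of “sa-learn --dump magic”. This is only a sketch: the helper name bayes_counts is made up, and it assumes the learned message count is the third column on the nspam and nham lines of that output.

```shell
#!/bin/bash
# Hypothetical helper: read "sa-learn --dump magic" output on stdin and
# report how many spam and ham messages have been learned.
# Assumption: the count is the third column on the nspam/nham lines.
bayes_counts() {
    awk '
        /non-token data: nspam$/ { nspam = $3 }
        /non-token data: nham$/  { nham  = $3 }
        END {
            printf "spam=%d ham=%d\n", nspam, nham
            if (nspam < 1000 || nham < 1000)
                print "below the 1000-message guideline"
        }'
}

# Example:
#   sa-learn --dump magic | bayes_counts
```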

    Obviously a lot of these options are site/user specific. This is a good foundation to use as-is or build from.

    #!/bin/bash
    
    #specify one or more users, space separated [user=(user1 user2 user3)], or leave empty [user=()] to include all users. All users means uid ≥ 1000.
    user=()
    
    #After how many days should Spam be deleted?
    cleanafter=30
    
    #backup path, comment out to disable backups
    bk=/home/backup/sa-learn_bayes_`date +%F`.backup
    
    log=/var/log/train-mail.log
    #log=/dev/stdout
    
    echo -e "\n`date +%c`"  >> $log 2>&1
    
    #test the array length; [ -z ${user[@]} ] breaks once more than one user is listed
    if [ ${#user[@]} -eq 0 ]; then
    echo user is empty, using all users from system
    user=(`awk -F':' '$3 >= 1000 && $3 < 65534' /etc/passwd |awk -F':' '{print $1}'`)
    fi
    
    for u in ${user[@]}; do
    if [ ! -d /home/$u/Maildir ]; then
    echo "No such Maildir for $u" >> $log 2>&1
    else
    echo "Proceeding with ham and spam training on user \"$u\""
    #add all messages in "junk" directory to spamassassin
    echo spam >> $log 2>&1
    #change this path to match your spam directory, in this case its "Junk"
    #add current and new messages in Junk directory as spam
    sa-learn --no-sync --spam /home/$u/Maildir/.Junk/{cur,new} >> $log 2>&1
    echo ham >> $log 2>&1
    #only add current mail to ham, not new. This gives user a chance to move it to spam dir.
    #note: a single item in braces like {cur} is not expanded by bash, so use the plain path
    sa-learn --no-sync --ham /home/$u/Maildir/cur >> $log 2>&1
    fi
    done
    
    #sync the journal created above with the database
    echo sync >> $log 2>&1
    sa-learn --sync >> $log 2>&1
    if [ $? -eq 0 ]; then
    for u in ${user[@]}; do
    echo "deleting spam for $u older than $cleanafter days" >> $log 2>&1
    find /home/$u/Maildir/.Junk/cur/ -type f -mtime +$cleanafter -exec rm {} \;
    done
    else
    echo "sa-learn wasn't able to sync. Something is broken. Skipping spam cleanup"
    fi
    
    echo "Statistics:" >> $log 2>&1
    sa-learn --dump magic >> $log 2>&1
    echo ============================== >> $log 2>&1
    
    #quote $bk, otherwise the test is true even when bk is unset or commented out
    if [ -n "$bk" ]; then
    echo "backup writing to $bk" >> $log 2>&1
    sa-learn --backup > "$bk"
    fi
    

    Here is my sieve rule for moving messages that are marked as spam to my Junk directory. I set up Roundcube so people can manage their own sieve filters.

    $ cat /home/jason/sieve/managesieve.sieve
    require ["fileinto"];
    # rule:[SPAM]
    if header :contains "X-Spam-Status" "Yes"
    {
    	fileinto "Junk";
    	stop;
    }
    
  • Simple tracking of top memory users over time

    I have a Dreamhost VPS account and have been running out of memory and experiencing the dreaded forced reboots that Dreamhost imposes. I found it difficult to identify the offending sites that take up all that memory on my server. Every time I logged in and ran top it was too late, or I would just find a website being crawled by a search bot. How do you find a trend over time without getting too complicated? My solution was to track memory usage with ps and write it to individual files, then sort all those files and derive the top offenders in one list, which is web accessible (or not) for easy viewing later. If my VPS reboots, I can go back to the individual files from before the forced reboot and get details of what’s causing the problem.

    #!/bin/bash
    
    #no trailing slash. Be sure this dir exists.
    path=/home/jason/jasonschaefer.com/memusages
    
    logfile=index.txt
    
    #how many days to keep files, remove after..
    removeafter=5
    
    # ps -[e]everything, [o]format
    # rss is resident set size in kilobytes
    # user:20 username with 20 char space so it won't revert to uid on usernames longer than 8 chars
    # pid comes before cmd because cmd can contain spaces, which would shift any column after it
    # stime=start time of cmd, cmd:40 running command with 40 char column
    # [h]hide headers, --sort=rss sorts on rss column
    # %H is a zero padded hour; %k pads with a space, which breaks the file name before 10:00
    /bin/ps -eo rss,user:20,pid,stime,cmd:40 h --sort=rss > $path/mem`date +"%F_%H-%M"`.txt
    
    # sort unique, numeric on column 3 the pid, so we don't show duplicate processes.
    # then sort numeric, reverse on column 1 the memory usage, write the top 200 lines to our logfile.
    /usr/bin/sort -u -n -k 3,3 $path/mem*.txt | /usr/bin/sort -n -r -k 1 | head -n 200 > $path/$logfile
    
    #find files older than $removeafter days and remove them
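    # +$removeafter means "older than"; a bare number only matches files exactly that many days old
    /usr/bin/find $path -type f -name 'mem*.txt' -mtime +$removeafter -exec rm -f {} \;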
    

    Don’t forget to make it executable
    chmod 755 /home/jason/memusages.sh

    Then set it up in cron. The entry below runs every ten minutes of every hour, every day, every month and every day of the week. Change as needed. I have it running every minute right now. Depending on your setup you may need to run this as root to see all system processes.
    crontab -e
    */10 * * * * /home/jason/memusages.sh
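
    Once a few snapshots exist, it can also help to total the memory per user for a single snapshot, since on a shared VPS each site often runs under its own user. A minimal sketch, assuming the first column of the snapshot is rss in kilobytes and the second is the username; the function name mem_by_user and the file name in the example are made up for illustration.

```shell
#!/bin/bash
# Hypothetical helper: sum resident memory per user from one snapshot file.
# Assumes column 1 is rss in kilobytes and column 2 is the username.
mem_by_user() {
    awk '{ total[$2] += $1 }
         END { for (u in total) printf "%d %s\n", total[u], u }' "$1" |
        sort -n -r    # biggest consumer first
}

# Example (the file name is illustrative):
#   mem_by_user /home/jason/jasonschaefer.com/memusages/mem2024-01-01_09-30.txt
```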