A Venerable Script gets an upgrade

Everyone knows by now that I’m a big believer in the old-school teachings of the great Masters of System Administration, keeping the true UNIX principles as my guidelines. As such, I automate my duties to as great an extent as possible without detriment to my users. As part of this, I write lots and lots of scripts to help me do minor tasks and alert me to changing conditions on the systems I administer. One such script checks for files in the lost+found directories on my filesystems, which indicate that data was recovered from lost chains during an fsck operation. These files require my attention, so the script e-mails me whenever it finds any. I’ve just given that script a makeover:


It all started with this e-mail I got letting me know that /pub/lost+found was not empty on one of my servers:

/pub/lost+found not empty!
 From: Sam Starfall 
 To: root@savagechicken.lakemasoniccenter.org

****************************************
* /pub/lost+found is not empty at Sun Dec  4 04:26:12 CST 2011          *
****************************************

Please take care of these files.
-root@dustpuppy

The problem is that there is no such directory on that server.  What was really happening is that the find command was getting bogus results from asking NFS to do something undefined.  /pub is an NFS mount from another server and SELinux was preventing SavageChicken from doing things that were unreasonable, such as accessing /lost+found.  I needed to modify my script to let the find command know to ignore the /pub directory when looking for /lost+found directories:

#!/bin/bash
#
# checks for files in lost+found directories

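ALERT=$(find / -type d -name 'pub' -prune -o -type d -name 'lost+found' -not -empty -print 2>/dev/null)
# -prune skips any directory named 'pub'; everything else falls through -o to the lost+found test

# $ALERT is empty when nothing was found, so test the string itself
if [ -n "$ALERT" ]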
then
        for dir in $ALERT
        do
                mail -s "$dir not empty!" root <<EOF
****************************************
* $dir is not empty at `date`          *
****************************************

`ls -l $dir`

Please take care of these files.
-root@SavageChicken

EOF

        done
fi

I added the -type d -name 'pub' -prune clause to the arguments to find in line 5. This tells find not to descend into any directory called “pub”, so nothing beneath such a directory can end up in the result list. Note that the match is on the directory name alone, so it applies to every directory named “pub” and not just /pub. I also corrected the name of the server from “dustpuppy” (where this script was originally written and used) to SavageChicken. After a test run, I am satisfied that this eliminated the false positive from the /pub directory.
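To see the effect of -prune without touching a real filesystem, here is a minimal sketch that builds a throwaway directory tree and compares the results with and without the clause. The sandbox layout (home, pub, var) is purely hypothetical and is not taken from any of my servers. It assumes a find that supports -empty and -not, as GNU find does, and it joins the -prune clause to the rest of the expression with -o, which is the usual way to combine the two.

```shell
#!/bin/bash
# Sandbox demonstration of -prune; every path below is a hypothetical test fixture.
sandbox=$(mktemp -d)

# A non-empty lost+found on a "local" filesystem ...
mkdir -p "$sandbox/home/lost+found"
touch "$sandbox/home/lost+found/recovered_file"

# ... and one under a directory named pub, standing in for the NFS mount.
mkdir -p "$sandbox/pub/lost+found"
touch "$sandbox/pub/lost+found/recovered_file"

# An empty lost+found, which should never be reported.
mkdir -p "$sandbox/var/lost+found"

# Without pruning: both non-empty lost+found directories are reported.
unpruned=$(find "$sandbox" -type d -name 'lost+found' -not -empty -print | sort)

# With pruning: find never descends into pub, so only home/lost+found remains.
pruned=$(find "$sandbox" -type d -name 'pub' -prune -o -type d -name 'lost+found' -not -empty -print)

echo "unpruned:"; echo "$unpruned"
echo "pruned:";   echo "$pruned"

# Clean up the throwaway tree.
rm -rf "$sandbox"
```

The -o matters here: find joins adjacent tests with an implicit AND, so the prune test and the lost+found test have to be separated by -o for a directory to satisfy either one.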

  1. #1 by Chadwick on December 4, 2011 - 11:13 AM

    Server automation is ideal. It won’t forget to do things like you may. On the other hand, that assumes it sends you messages like that for you to go “WTF? That’s not right.” at. So keep up the good work?

    • #2 by Joshua on December 4, 2011 - 5:20 PM

      /lost+found is used to hold data recovered by fsck after improper shutdowns or unclean disconnects of devices holding mounted filesystems.

      • #3 by Joshua on December 4, 2011 - 5:22 PM

        So the implication is that files in lost+found on any filesystem are indicative of an abnormal condition on the system. In the normal course of business, there should be no improper shutdowns or unclean removals of storage devices.

  2. #4 by Joshua on December 4, 2011 - 11:48 AM

    Pretty much, yeah. LOL

  3. #5 by Gus Wiening on December 4, 2011 - 1:17 PM

    I’m a fan of not hard-coding values into the actual command string, instead defining these as variables in the script head, even though the value won’t change through the entire operation of the script. For a short script like this it’s not a big deal, but the old monitoring system I used at my previous job had huge scripts to gather system information, and it’s much easier to change the variable value once at the head of the file, instead of making the same change 20 times throughout the body of code.

    • #6 by Joshua on December 4, 2011 - 5:19 PM

      I fully agree. My standards are twofold: How often it changes and how many times the value is used. I consider arguments to find to be part of the coding of the procedure in this case. In other words, the proper functioning of the script hinges on those arguments and the change I applied constituted a bugfix to prevent a false positive.
