Mondoarchive Exclude List Failures

To illustrate config files in sh scripts, I published my mondoarchive script. That script dynamically builds a mondo exclude list from a list of directories inside a file.

Since I published that article, many of you have arrived here after Googling mondoarchive exclude lists. It seems they’re failing you. Fear not, faithful Googlers. Your JoeDog has experienced this pain and he can help.

There are two main problems with mondoarchive exclude lists that cause the program to ignore them. One is documentation and the other is a bug.

Older versions of mondoarchive use space separated exclude lists. You construct them like this:

  -E "/usr/src /data/archive /usr/local/src"

The syntax has since changed for both -E and -I. Whereas older versions used space separated lists of directories, newer versions use pipe separated directories. If you have a newer version, construct your lists like this:

  -E "/usr/src|/data/archive|/usr/local/src"

The other problem I’ve encountered appears to be a bug. The first directory in my exclude list wasn’t being excluded. To fix that problem, I’ve placed /tmp first in all my exclude lists.

  -E "/tmp|/usr/src|/data/archive|/usr/local/src"

Problem “solved.”
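Both fixes are easy to wrap in a helper. What follows is a minimal sketch, not part of the published script: the function name exclude_list is made up for illustration, and the separator is passed in so the same helper serves old (space) and new (pipe) versions.

```shell
# exclude_list SEP DIR...: join DIRs with SEP, placing /tmp first as a
# sacrificial entry in case the first directory is ignored.
exclude_list() {
  local sep=$1 out="/tmp" d
  shift
  for d in "$@"; do
    [ "$d" = "/tmp" ] && continue   # avoid listing /tmp twice
    out="$out$sep$d"
  done
  printf '%s\n' "$out"
}
```

With that in place, the argument becomes -E "$(exclude_list '|' /usr/src /data/archive /usr/local/src)".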

Creating Config Files For sh Scripts

Your JoeDog uses mondorescue for bare-metal Linux restoration. We use mondorestore to recover the OS and Net Backup to recover its content. Since we’re only concerned about archiving the OS for bare-metal recovery, it’s necessary to exclude directories when we run mondoarchive.

My exclude requirement varies from server to server so I wanted to build the list dynamically. As a coder, I have a religious aversion to altering scripts for the purpose of configuring them. If we set config variables inside the script, then we have a different version on every server. That’s a paddlin’.

For my mondoarchive script, I developed a pretty slick way to read a configuration file and build an exclude list. The list is configured in a conf file that ignores comment lines and superfluous white space. A typical configuration looks like this:

# This file is maintained by the Puppet Master 
# This is the exclude list for mondoarchive. Directories inside
# this list will not be archived for bare metal recovery.

My mondoarchive script builds a string of pipe separated directories like this:
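
A minimal sketch of that step, assuming a conf file with one directory per line. The function name build_exclude and the sed/paste pipeline are illustrative choices, not necessarily what the published script does:

```shell
# build_exclude CONF: print the directories in CONF as one pipe-separated
# string, skipping blank lines and lines whose first non-blank char is '#'.
build_exclude() {
  sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' -e '/^#/d' -e '/^$/d' "$1" \
    | paste -sd'|' -
}
```

The result can be handed straight to the exclude option, e.g. -E "$(build_exclude /path/to/conf)", where the path is a placeholder.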


Since very few of you will have a similar use case, I wrote an example that reads the file into a sh array. This version will loop through the array and print each one.

#!/bin/bash
# An example script that reads a list from a config
# file into a sh script array.
CONF="./haha.conf"  # placeholder path (assumption); point it at your conf file

# Read the directory list from $CONF
if [[ -e $CONF ]] ; then
  while read line ; do
    chr=${line:0:1}
    # XXX: Use awk's substr on older systems like
    # HPUX which don't support the above syntax.
    # chr=$(echo $line | awk '{print substr($1,0,1)}')
    case $chr in
      '#')
        # ignore comments
        ;;
      *)
        if [[ ${#line} -gt 2 ]] ; then
          if [[ -z $LIST ]] ; then
            LIST="$line"
          else
            LIST="$LIST $line"
          fi
        fi
        ;;
    esac
  done < "$CONF"
else
  echo "$0: [error] unable to locate $CONF"
  exit 1
fi

let X=1
for I in $LIST ; do
  echo "$X: $I"
  let X=$X+1
done

Let’s run this bad boy and see what happens:

$ sh haha
1: /tmp
2: /etc
3: /usr/local
4: /data/mrepo

If some of the concepts listed don’t make sense, then you might want to see our sh scripting cheat sheet. It will help you understand things like ‘-e $CONF’ and sh script arrays. Happy hacking.

UPDATE: Given the introduction to this post, it’s likely that many of you have arrived here in search of a mondoarchive backup script. Well, we won’t let you leave empty handed. You can grab my archive script here: Mondo Rescue Archive Script

This script builds both NFS recoverable archives and DVD images to an NFS mounted volume. Here’s its usage banner:

Usage: archiver [-c|-n]
Requires either a '-c' or a '-n' argument
  -c      create a CD Rom archive
  -n      create an NFS archive

Is There An AJP Functional Test?

There are plenty of helpful tools to test network services. If you want to check HTTP functionality, you could craft a request with curl, wget or “siege -g” to see if a server is functioning. If you understand the service protocol, you can always telnet to a TCP port and type a transaction.

Unfortunately, there aren’t many tools to help you test the AJP protocol. Sure, you can telnet to the port to ensure it’s running, but how many people know how to craft an AJP transaction? I didn’t.

In order to help you test AJP servers like Apache Tomcat, I wrote ajping. It connects to a user-defined port and conducts a simple transaction. ajping validates the server’s response and clocks the length of the transaction. Over the LAN, you should expect times in the hundredths of a second. This is a command line utility. In order to install it, run the following commands:

 $ wget
 $ mv ajping.txt ajping
 $ chmod +x ajping

You can test a server with it like this:

LT $ ajping
Reply from 7 bytes in 0.019 seconds
Reply from 7 bytes in 0.004 seconds
Reply from 7 bytes in 0.004 seconds
Reply from 7 bytes in 0.011 seconds
Reply from 7 bytes in 0.004 seconds
Reply from 7 bytes in 0.016 seconds
Reply from 7 bytes in 0.009 seconds
Reply from 7 bytes in 0.021 seconds
Reply from 7 bytes in 0.011 seconds
Reply from 7 bytes in 0.025 seconds
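
For a feel of what a simple AJP transaction looks like on the wire, here is a rough bash sketch of the AJP13 CPing/CPong exchange. It is an assumption that this resembles what ajping sends; the function names are made up for illustration, and ajp_ping relies on bash's /dev/tcp and has no timeout, so it can block on a silent server.

```shell
#!/bin/bash
# ajp_cping: write a raw AJP13 CPing packet to stdout.
# Layout: magic 0x12 0x34, payload length 0x00 0x01, message type 0x0A.
ajp_cping() { printf '\022\064\000\001\012'; }

# is_cpong HEX: succeed if HEX is a CPong reply, i.e. magic "AB"
# (0x41 0x42), payload length 0x00 0x01, message type 0x09.
is_cpong() { [ "$1" = "4142000109" ]; }

# ajp_ping HOST PORT: send one CPing over bash's /dev/tcp and check the reply.
ajp_ping() {
  local reply
  exec 3<>"/dev/tcp/$1/$2" || return 1
  ajp_cping >&3
  reply=$(head -c 5 <&3 | od -An -tx1 | tr -d ' \n')
  exec 3<&-
  is_cpong "$reply"
}
```

Something like ajp_ping localhost 8009 && echo alive would exercise it; 8009 is Tomcat's default AJP port and localhost is a placeholder.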

I’ve also incorporated this code into a check_ajp script for Zenoss. Remove the .txt extension and install it on Zenoss as you would any other script.  Happy hacking.

UPDATE: I fixed the links to point to the new download location. H/T paalfe

Use Fido To Process FTP Uploads

Did you ever want to process a file immediately after it was uploaded via FTP? You could have the upload script execute a remote command after the file is uploaded. That requires shell access that you may or may not be able to grant. On the server, you could run a processing script every minute out of cron but that could get messy.

Fido provides an alternative method.

Starting with version 1.0.7, Fido has the ability to monitor a file or directory by its modification date. When the date changes, fido launches a script. We can use this feature to process files that are uploaded via ftp.

In this example, we’ll monitor a directory. In fido.conf, we’ll set up a file block that points to a directory. (For more information about configuring fido, see the user’s manual). This is our configuration:

/home/jdfulmer/incoming {
 rules = modified
 action = /home/jdfulmer/bin/process
 log = /home/jdfulmer/var/log/fido.log
}

With this configuration, fido will continuously watch /home/jdfulmer/incoming for a modification change. When a file is uploaded, the date will change and fido will launch /home/jdfulmer/bin/process. Pretty sweet, huh?

Not quite. The modification date will change the second ftp lays down the first byte. Our script would start to process the file before it’s fully uploaded. How do we get around that? We can make our script smarter.

For the purpose of this exercise, I’m just going to move uploaded files from incoming to my home directory. Here’s a script that will do that:

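PREFIX=/home/jdfulmer/incoming
FILES=$(ls $PREFIX)   # assumption: the original definition is not shown
for F in $FILES ; do
  while [ -n "$(lsof | grep $F)" ] ; do
    sleep 1
  done
  mv $PREFIX/$F /home/jdfulmer
done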

In order to ensure the file is fully uploaded, I check lsof for its name. If there’s an open file handle under that name, then the script will continue to loop until it’s cleared. When the while loop breaks, the script moves the file.

There’s just one more thing to think about. When the script moves the file what happens to the directory fido is watching? Yes. Its modification date changes. In my example, process runs a second time but does nothing since nothing is there. Depending on your situation, you may need to make the script a little smarter.
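
One way to make it smarter is sketched below, under a few assumptions: the function names are hypothetical, lsof is asked about the exact path instead of grepping its full output, and the open-file check is passed as a parameter so it can be swapped out. It quotes file names, skips anything that is not a regular file, and returns quietly when the directory is empty, which covers the second run described above.

```shell
#!/bin/bash
# file_is_open FILE: succeed while some process holds FILE open.
# Passing the path to lsof avoids false matches from grepping names.
file_is_open() { lsof "$1" >/dev/null 2>&1; }

# process_incoming SRC DEST [CHECK]: move every regular file from SRC to
# DEST once CHECK (default: file_is_open) reports it closed.
process_incoming() {
  local src=$1 dest=$2 check=${3:-file_is_open} f
  for f in "$src"/*; do
    [ -f "$f" ] || continue      # empty directory or not a regular file
    while "$check" "$f"; do
      sleep 1
    done
    mv "$f" "$dest"/
  done
}
```

The process script would then boil down to process_incoming /home/jdfulmer/incoming /home/jdfulmer.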

The Free Press

The Washington Examiner blared this headline: “Obama ‘a fan’ of singer accused of homophobia.” The singer is Cee Lo. To prove he was a homophobe, they quoted him. Here’s the self-incriminating money-shot: “I most certainly am not harboring any sort of negative feeling toward the gay community.” I hate the American news media….

Linux Marketshare

I’m responsible for around 150 GNU/Linux servers. Not one of them actually shipped with Linux. They were all bare metal installs at the point of delivery. That’s generally how Linux works. You buy hardware from one vendor and OS entitlements from another. If my experience isn’t unusual, then the latest server track numbers from IDC are quite extraordinary.

IDC tracks servers shipped by OEMs to customers and reports on hardware and OS marketshare. It doesn’t track bare metal installations, hardware re-provisions and VM guest installs. Last quarter, according to IDC, factory revenue for Linux grew while it shrank for Windows and UNIX. On top of the provisioning methods I mentioned above, customers are increasingly asking IBM, HP and Dell to ship servers with Linux installed.

According to IDC, demand was driven by the need for high performance and cloud computing. Linux has also earned a reputation as more reliable and more secure than that other Intel OS. And if you want to ruin hardware performance, just add virus protection, which is a necessity in the Windows operating environment. Given the risk vs. the reward of increased performance, many Linux administrators simply eschew virus protection. That gives Linux a real world performance boost over its rival from Microsoft.

Once Linux conquers the datacenter, it’s only a matter of time until millions of open source developers really start to focus on the desktop. Proprietary software’s best days are behind it.


Concurrency and the Single Siege

We’re frequently asked about concurrency. When a siege is finished, one of its characteristics is “Concurrency” which is described with a decimal number. This stat is known to make brows furrow. People want to know, “What the hell does that mean?”

In computer science, concurrency is a trait of systems that handle two or more simultaneous processes. Those processes may be executed by multiple cores, processors or threads. From siege’s perspective, they may even be handled by separate nodes in a server cluster.

When the run is over, we try to infer how many processes, on average, were executed simultaneously by the web server. The calculation is simple: total transactions divided by elapsed time. If we did 100 transactions in 10 seconds, then our concurrency was 10.00.

Bigger is not always better

Generally, web servers are prized for their ability to handle simultaneous connections. Maybe your benchmark run was 100 transactions in 10 seconds. Then you tuned your server and your final run was 100 transactions in five seconds. That is good. Concurrency rose as the elapsed time fell.

But sometimes high concurrency is a trait of a poorly functioning website. The longer it takes to process a transaction, the more likely they are to queue. When the queue swells, concurrency rises. The reasons for this rise can vary. An obvious cause is load. If a server has more connections than thread handlers, requests are going to queue. Another is competence: poorly written apps can take longer to complete than well-written ones.

We can illustrate this point with an obvious example. I ran siege against a two-node clustered website. My concurrency was 6.97. Then I took a node away and ran the same run against the same page. My concurrency rose to 18.33. At the same time, my elapsed time was extended 65%.

Sweeping conclusions

Concurrency must be evaluated in context. If it rises while the elapsed time falls, then that’s a Good Thing™. But if it rises while the elapsed time increases, then Not So Much™. When you reach the point where concurrency rises and elapsed time is extended, then it might be time to consider more capacity.


HTTP Authentication

Some of you seem to confuse Basic authentication with form-based authentication. They’re not the same and the differences are important. If you don’t configure siege for the appropriate authentication method, it will be on the outside looking in at an HTTP-401.

Basic authentication occurs at the protocol level. It was originally described in HTTP/1.0 and later moved to RFC 2617. Basic authentication is a challenge/response framework. When the server receives a request for a protected resource, it challenges the user to authenticate himself. It will make the item available only after the user is authenticated.

Here’s an example exchange using basic.php from the html directory inside the siege source code:

GET /siege/basic.php HTTP/1.0
Accept: */*
Accept-Encoding: gzip
User-Agent: JoeDog/1.00 [en] (X11; I; Siege 2.71b6)
Connection: close

HTTP/1.1 401 Authorization Required
Date: Thu, 16 Feb 2012 13:09:53 GMT
Server: CERN/1.0A
X-Powered-By: PHP/5.2.5
WWW-Authenticate: Basic realm="siege_basic_auth"
Status: 401 Unauthorized
Content-Length: 178
Connection: close
Content-Type: text/html; charset=WINDOWS-1251

GET /siege/basic.php HTTP/1.0
Authorization: Basic c2llZ2U6aGFoYQ==
Accept: */*
Accept-Encoding: gzip
User-Agent: JoeDog/1.00 [en] (X11; I; Siege 2.71b6)
Connection: close

HTTP/1.1 200 OK
Date: Thu, 16 Feb 2012 13:09:53 GMT
Server: CERN/1.0A
X-Powered-By: PHP/5.2.5
Content-Length: 278
Connection: close
Content-Type: text/html; charset=WINDOWS-1251

See what happened? Siege requested /siege/basic.php and the server was all “Whoa! I don’t know who you are.” It issued an HTTP 401 challenge to siege which responded by sending its username and password Base64-encoded (that’s an encoding, not encryption): c2llZ2U6aGFoYQ==
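
That credential string is nothing more than username:password run through Base64, which is easy to check from a shell. A minimal sketch, assuming the base64 utility from GNU coreutils (or a compatible one) is installed; the function name basic_auth_header is made up for illustration:

```shell
# basic_auth_header USER PASS: print the Authorization header a client
# sends in answer to a Basic challenge.
basic_auth_header() {
  printf 'Authorization: Basic %s\n' "$(printf '%s:%s' "$1" "$2" | base64)"
}
```

Running basic_auth_header siege haha prints Authorization: Basic c2llZ2U6aGFoYQ==, the same header seen in the second request.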

In this example, I emulated HTTP Basic authentication with a php program. Typically, Basic auth is set up at the server level. Here’s an example in Apache:

<Location "/siege">
   AuthType basic
   AuthName "siege_basic_auth"
   AuthBasicProvider file
   AuthUserFile /var/www/etc/passwd
   AuthGroupFile /var/www/etc/group
   Require valid-user
   Require group siege
   Satisfy All
</Location>

To configure siege to use basic authentication, you need to add a login to your .siegerc file. Search the file for WWW-Authenticate. The directive is login and it takes three values separated by colons: username:password:realm. Our basic.php username and password are ‘siege’ and ‘haha’. So our login looks like this:

login = siege:haha:siege_basic_auth

The third argument (realm) is optional. If you don’t specify a realm, siege will send ‘siege:haha’ every time it faces an HTTP basic challenge. By setting a realm, you can configure it to use multiple logins:

login = admin:secret:Administration
login = siege:haha:siege_basic_auth
login = root:d41ly:high_level

Now you can also restrict access programmatically. This is referred to as form-based authentication. In order to configure siege to log in this way, you’ll need to reproduce a browser’s action.

To illustrate this, we’ve included login.php in the html directory of the siege source code. That page accepts both GET and POST requests. It produces an HTML form that looks like this:

<td>Username: </td><td>
<input type='text' name='username' value='' size='30'></td>
<td>Password: </td><td>
<input type='password' name='password' value='' size='30'></td>

To log in through this form, your parameters must match the form’s input names, in this case ‘username’ and ‘password’: POST username=siege&password=haha

If your entire site requires authentication you can add a login URL to your .siegerc file. If this value is set, siege will access that URL before it does anything. Search your .siegerc file for ‘login-url’. Here’s an example using one of the URLs we constructed above:

login-url = POST username=siege&password=haha

After it hits that URL, siege will start running through the list of URLs you created.

Happy hacking.

Garbled Apostrophes And Other Things

Do you have man pages with garbled type? I’m working on a multi-threaded file watcher that searches for patterns in files and executes commands on a match. In order to release it into the wild, I need documentation. That means man pages. So I’m viewing my man pages and I see crap like this: ’-f /path/file’

Those are supposed to be single-quotes, i.e., apostrophes.

For this project, I’m building my man pages from perl PODs with Pod::Man. In case you’d like to do the same, here’s a handy utility for making man pages from perl pods. It converts POD data to *roff.

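#!/usr/bin/perl
# A Pod::Man example script
use strict;
use warnings;
use Pod::Man;

my $VERSION = '1.0';  # placeholder release string (assumption)
my $input  = $ARGV[0] or barf();
my $output = $ARGV[1] or barf();

my $parser = Pod::Man->new (release => $VERSION, section => 8);
$parser->parse_from_file ($input, $output);

# Print a usage message and bail out
sub barf {
  print "usage: $0 <file.pod> <file.1>\n";
  exit 1;
}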

When I saw the garbled text above, I suspected a problem with my method. It turns out that wasn’t the case at all. The culprit was my character set. My language was set to en_US.UTF-8 but my terminal didn’t support that character set. If you’re having a similar problem, you can check your character set with this command:

$ set | grep -i lang

The fix is easy:

export LANG=en_US

Add that to your .profile to make it permanent.
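
If you’d rather test for the condition than eyeball the output of set, a small helper can do it. A minimal sketch; the function name lang_is_utf8 is made up for illustration, and it only inspects the locale name, not what the terminal can actually display:

```shell
# lang_is_utf8 [LOCALE]: succeed when LOCALE (default: $LANG) names a
# UTF-8 locale such as en_US.UTF-8.
lang_is_utf8() {
  case "${1:-${LANG:-}}" in
    *.UTF-8|*.utf8|*.UTF8|*.utf-8) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example: lang_is_utf8 && echo "LANG requests UTF-8; make sure the terminal agrees"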

Counting Downloads With Fido

I wanted to illustrate how to use fido with an example. Today we’re going to use it to count software downloads on this site. Exciting! This will be simple since we only have one data source. A few years ago, I moved my software from an FTP repository onto this web server. To quantify software downloads, we can simply monitor the http access log.

Here’s our fido configuration for the log file:

/var/log/httpd/access_log {
 rules  = downloads.conf
 action = /usr/local/bin/tally
 log    = syslog
}

This tells fido to monitor the access_log in real time. Its pattern match rules are in a file called downloads.conf. When fido finds a match, it will execute a program called tally. Finally, the last directive tells fido to use syslog to log its activity.

In order to understand what we’re looking for, you should take a look at the software repository. It contains multiple versions and helpful links to the latest releases and betas. We want to match them all.

Let’s take a look at our downloads.conf file. Since we didn’t specify a full path to the file, fido knows to look for it under $sysconfdir/fido/rules. If you configured it to use /etc, then the rules are found in /etc/fido/rules/downloads.conf. Here’s the file:

# Track and count downloads from the website
SIEGE:  .*siege-.*tar.gz.*
FIDO:   .*fido-.*(rpm|tar.gz).*
WACKY:  .*wackyd-.*tar.gz.*
DICK:   .*dick.*tar.gz.*
SPROXY: .*sproxy-.*tar.gz.*
CONFIG: .*JoeDog-Config.*
STATS:  .*JoeDog-Stats.*
GETOPT: .*php-getopt.*
WACKY:  .*JoeDog-Wacky.*
PBAR:   .*JoeDog-ProgressBar.*

Each line begins with an optional label. If a label is present, fido will pass it (minus the colon) to the action program. In the example above, if the JoeDog-Config perl module is downloaded, then fido will run /usr/local/bin/tally CONFIG. For more on labels, see the fido user’s manual.
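
To show how an action program can use that label, here is a minimal counter sketch. It is not the tally program used on this site; the function name, the one-file-per-label layout and the default directory /var/tmp/tally are all assumptions made for illustration, and there is no locking, so two simultaneous matches could lose a count.

```shell
# tally LABEL [DIR]: increment the counter kept in DIR/LABEL.
# DIR defaults to /var/tmp/tally (a hypothetical location).
tally() {
  local label=$1 dir=${2:-/var/tmp/tally} count=0
  mkdir -p "$dir" || return 1
  if [ -f "$dir/$label" ]; then
    count=$(cat "$dir/$label")
  fi
  echo $((count + 1)) > "$dir/$label"
}
```

Called as tally CONFIG, it bumps the number stored in the CONFIG file by one each time a matching line shows up in the log.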
