Use Fido To Process FTP Uploads

Did you ever want to process a file immediately after it was uploaded via FTP? You could have the upload script execute a remote command after the file is uploaded. That requires shell access that you may or may not be able to grant. On the server, you could run a processing script every minute out of cron but that could get messy.

Fido provides an alternative method.

Starting with version 1.0.7, Fido has the ability to monitor a file or directory by its modification date. When the date changes, fido launches a script. We can use this feature to process files that are uploaded via ftp.

In this example, we’ll monitor a directory. In fido.conf, we’ll set up a file block that points to a directory. (For more information about configuring fido, see the user’s manual). This is our configuration:

/home/jdfulmer/incoming {
 rules = modified
 action = /home/jdfulmer/bin/process
 log = /home/jdfulmer/var/log/fido.log
}

With this configuration, fido will continuously watch /home/jdfulmer/incoming for a change in its modification date. When a file is uploaded, the date will change and fido will launch /home/jdfulmer/bin/process. Pretty sweet, huh?

Not quite. The modification date will change the second ftp lays down the first byte. Our script would start to process the file before it’s fully uploaded. How do we get around that? We can make our script smarter.

For the purpose of this exercise, I’m just going to move uploaded files from incoming to my home directory. Here’s a script that will do that:

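# Assumes PREFIX (the watched directory) and FILES (its contents) are set earlier
for F in $FILES ; do
  # Wait until no process holds the file open
  while [ -n "$(lsof | grep "$F")" ] ; do
    sleep 1
  done
  mv "$PREFIX/$F" /home/jdfulmer
done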

In order to ensure the file is fully uploaded, I check lsof for its name. If there’s an open file handle under that name, then the script will continue to loop until it’s cleared. When the while loop breaks, the script moves the file.

There’s just one more thing to think about. When the script moves the file, what happens to the directory fido is watching? Yes. Its modification date changes. In my example, process runs a second time but does nothing since nothing is there. Depending on your situation, you may need to make the script a little smarter.
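One way to make the script "a little smarter" is to bail out early when the second trigger finds the directory empty. The sketch below is only an illustration: has_files is a hypothetical helper name, not part of fido or of the original process script.

```shell
# Hypothetical helper: succeeds only if the directory in $1
# holds at least one regular file (dotfiles are ignored).
has_files() {
  for f in "$1"/*; do
    [ -f "$f" ] && return 0
  done
  return 1
}
```

Near the top of the process script, a line such as `has_files /home/jdfulmer/incoming || exit 0` would then turn the run triggered by our own mv into a quick no-op.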

The Free Press

The Washington Examiner blared this headline: “Obama ‘a fan’ of singer accused of homophobia.” The singer is Cee Lo. To prove he was a homophobe, they quoted him. Here’s the self-incriminating money-shot: “I most certainly am not harboring any sort of negative feeling toward the gay community.” I hate the American news media….

Linux Marketshare

I’m responsible for around 150 GNU/Linux servers. Not one of them actually shipped with Linux. They were all bare metal installs at the point of delivery. That’s generally how Linux works. You buy hardware from one vendor and OS entitlements from another. If my experience isn’t unusual, then the latest server tracker numbers from IDC are quite extraordinary.

IDC tracks servers shipped by OEMs to customers and reports on hardware and OS marketshare. It doesn’t track bare metal installations, hardware re-provisions and VM guest installs. Last quarter, according to IDC, factory revenue for Linux grew while it shrank for Windows and UNIX. On top of the provisioning methods I mentioned above, customers are increasingly asking IBM, HP and Dell to ship servers with Linux installed.

According to IDC, demand was driven by the need for high performance and cloud computing. Linux has also earned a reputation as more reliable and more secure than that other Intel OS. And if you want to ruin hardware performance, just add virus protection which is a necessity in the Windows operating environment. Given the risk vs. the reward of increased performance, many Linux administrators simply eschew virus protection. That gives Linux a real world performance boost over its rival from Microsoft.

Once Linux conquers the datacenter, it’s only a matter of time until millions of open source developers really start to focus on the desktop. Proprietary software’s best days are behind it.


Concurrency and the Single Siege

We’re frequently asked about concurrency. When a siege is finished, one of its characteristics is “Concurrency” which is described with a decimal number. This stat is known to make brows furrow. People want to know, “What the hell does that mean?”

In computer science, concurrency is a trait of systems that handle two or more simultaneous processes. Those processes may be executed by multiple cores, processors or threads. From siege’s perspective, they may even be handled by separate nodes in a server cluster.

When the run is over, we try to infer how many processes, on average, were executed simultaneously by the web server. The calculation is simple: total transactions divided by elapsed time. If we did 100 transactions in 10 seconds, then our concurrency was 10.00.

Bigger is not always better

Generally, web servers are prized for their ability to handle simultaneous connections. Maybe your benchmark run was 100 transactions in 10 seconds. Then you tuned your server and your final run was 100 transactions in five seconds. That is good. Concurrency rose as the elapsed time fell.

But sometimes high concurrency is a trait of a poorly functioning website. The longer it takes to process a transaction, the more likely transactions are to queue. When the queue swells, concurrency rises. The reasons for this rise can vary. An obvious cause is load. If a server has more connections than thread handlers, requests are going to queue. Another is competence: poorly written apps can take longer to complete than well-written ones.

We can illustrate this point with an obvious example. I ran siege against a two-node clustered website. My concurrency was 6.97. Then I took a node away and ran the same run against the same page. My concurrency rose to 18.33. At the same time, my elapsed time was extended 65%.

Sweeping conclusions

Concurrency must be evaluated in context. If it rises while the elapsed time falls, then that’s a Good Thing™. But if it rises while the elapsed time increases, then Not So Much™. When you reach the point where concurrency rises and elapsed time is extended, then it might be time to consider more capacity.


HTTP Authentication

Some of you seem to confuse Basic authentication with form-based authentication. They’re not the same and the differences are important. If you don’t configure siege for the appropriate authentication method, it will be on the outside looking in at an HTTP-401.

Basic authentication occurs at the protocol level. It was originally described in HTTP/1.0 and later moved to RFC 2617. Basic authentication is a challenge/response framework. When the server receives a request for a protected resource, it challenges the user to authenticate himself. It will make the item available only after the user is authenticated.

Here’s an example exchange using basic.php from the html directory inside the siege source code:

GET /siege/basic.php HTTP/1.0
Accept: */*
Accept-Encoding: gzip
User-Agent: JoeDog/1.00 [en] (X11; I; Siege 2.71b6)
Connection: close

HTTP/1.1 401 Authorization Required
Date: Thu, 16 Feb 2012 13:09:53 GMT
Server: CERN/1.0A
X-Powered-By: PHP/5.2.5
WWW-Authenticate: Basic realm="siege_basic_auth"
Status: 401 Unauthorized
Content-Length: 178
Connection: close
Content-Type: text/html; charset=WINDOWS-1251

GET /siege/basic.php HTTP/1.0
Authorization: Basic c2llZ2U6aGFoYQ==
Accept: */*
Accept-Encoding: gzip
User-Agent: JoeDog/1.00 [en] (X11; I; Siege 2.71b6)
Connection: close

HTTP/1.1 200 OK
Date: Thu, 16 Feb 2012 13:09:53 GMT
Server: CERN/1.0A
X-Powered-By: PHP/5.2.5
Content-Length: 278
Connection: close
Content-Type: text/html; charset=WINDOWS-1251

See what happened? Siege requested /siege/basic.php and the server was all “Whoa! I don’t know who you are.” It issued an HTTP 401 challenge to siege, which responded by sending its username and password in Base64 encoding: c2llZ2U6aGFoYQ== (Base64 is an encoding, not encryption, so anyone who sees that header can read the credentials.)
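To see where that Authorization value comes from, you can reproduce it by hand. This is just a sketch: basic_credentials is a made-up function name, and it assumes the base64 utility (GNU coreutils or similar) is installed.

```shell
# Hypothetical helper: join username and password with a colon
# and Base64-encode the result, as Basic authentication requires.
basic_credentials() {
  # $1 = username, $2 = password
  printf '%s:%s' "$1" "$2" | base64
}

basic_credentials siege haha   # prints c2llZ2U6aGFoYQ==
```

Going the other way, `printf '%s' 'c2llZ2U6aGFoYQ==' | base64 --decode` prints siege:haha, which is why Basic authentication should only be trusted over an encrypted connection.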

In this example, I emulated HTTP Basic authentication with a php program. Typically, Basic auth is set up at the server level. Here’s an example in apache:

<Location "/siege">
   AuthType basic
   AuthName "siege_basic_auth"
   AuthBasicProvider file
   AuthUserFile /var/www/etc/passwd
   AuthGroupFile /var/www/etc/group
   Require valid-user
   Require group siege
   Satisfy All
</Location>

To configure siege to use basic authentication, you need to add a login to your .siegerc file. Search the file for WWW-Authenticate. The directive is login and it takes three values separated by colons: username:password:realm. Our basic.php username and password are ‘siege’ and ‘haha’. So our login looks like this:

login = siege:haha:siege_basic_auth

The third argument (realm) is optional. If you don’t specify a realm, siege will send ‘siege:haha’ every time it faces an HTTP basic challenge. By setting a realm, you can configure it to use multiple logins:

login = admin:secret:Administration
login = siege:haha:siege_basic_auth
login = root:d41ly:high_level
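To picture how a realm selects a login, here is a rough model in shell. It is not siege’s implementation: pick_login is a hypothetical name, and falling back to a realm-less login only when no realm matches is my assumption about precedence.

```shell
# Hypothetical lookup: read .siegerc-style lines on stdin and print
# "user:password" for the realm in $1. A login without a realm is
# used as a fallback (assumed behavior, not taken from siege).
pick_login() {
  awk -v realm="$1" '
    /^login[ \t]*=/ {
      sub(/^login[ \t]*=[ \t]*/, "")
      n = split($0, part, ":")
      if (n >= 3 && part[3] == realm) { print part[1] ":" part[2]; found = 1; exit }
      if (n == 2 && fallback == "")   { fallback = part[1] ":" part[2] }
    }
    END { if (!found && fallback != "") print fallback }
  '
}
```

With the three logins above on stdin, `pick_login siege_basic_auth` prints siege:haha. The sketch doesn’t handle passwords that contain a colon.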

Now you can also restrict access programmatically. This is referred to as form-based authentication. In order to configure siege to log in in this manner, you’ll need to reproduce a browser’s action.

To illustrate this, we’ve included login.php in the html directory of the siege source code. That page accepts both GET and POST requests. It produces an HTML form that looks like this:

<td>Username: </td><td>
<input type='text' name='username' value='' size='30'></td>
<td>Password: </td><td>
<input type='password' name='password' value='' size='30'></td>

To log in to this form, your parameters must match the form’s input names. In this case they’re ‘username’ and ‘password’: POST username=siege&password=haha

If your entire site requires authentication you can add a login URL to your .siegerc file. If this value is set, siege will access that URL before it does anything. Search your .siegerc file for ‘login-url’. Here’s an example using one of the URLs we constructed above:

login-url = POST username=siege&password=haha

After it hits that URL, siege will start running through the list of URLs you created.

Happy hacking.

Garbled Apostrophes And Other Things

Do you have man pages with garbled type? I’m working on a multi-threaded file watcher that searches for patterns in files and executes commands on a match. In order to release it into the wild, I need documentation. That means man pages. So I’m viewing my man pages and I see crap like this: ’-f /path/file’

Those are supposed to be single-quotes, i.e., apostrophes.

For this project, I’m building my man pages from perl PODs with Pod::Man. In case you’d like to do the same, here’s a handy utility for making man pages from perl pods. It converts POD data to *roff.

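#!/usr/bin/perl
# A Pod::Man example script
use Pod::Man;

my $VERSION = "1.0";  # placeholder; set this to your release number
my $input   = $ARGV[0] or barf();
my $output  = $ARGV[1] or barf();

my $parser = Pod::Man->new (release => $VERSION, section => 8);
$parser->parse_from_file ($input, $output);

# Print usage and quit when an argument is missing
sub barf {
  print "usage: $0 <file.pod> <file.1>\n";
  exit 1;
}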

When I saw the garbled text above, I suspected a problem with my method. It turns out that wasn’t the case at all. The culprit was my character set. My language was set to en_US.UTF-8 but my terminal didn’t support that character set. If you’re having a similar problem, you can check your character set with this command:

$ set | grep -i lang

The fix is easy:

export LANG=en_US

Add that to your .profile to make it permanent.
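One caveat: LANG is only the last resort. POSIX locale rules consult LC_ALL first, then LC_CTYPE, then LANG, so a stray LC_ALL can override your fix. Here’s a small sketch that reports which value wins; effective_ctype is a name I made up for illustration.

```shell
# Hypothetical helper: print the locale setting that governs character
# handling, following the POSIX precedence LC_ALL > LC_CTYPE > LANG.
effective_ctype() {
  if [ -n "${LC_ALL:-}" ]; then
    printf '%s\n' "$LC_ALL"
  elif [ -n "${LC_CTYPE:-}" ]; then
    printf '%s\n' "$LC_CTYPE"
  else
    printf '%s\n' "${LANG:-}"
  fi
}
```

If this prints something other than what you exported, look for LC_ALL or LC_CTYPE in your shell startup files. The standard `locale charmap` command reports the character set the system actually ended up with.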

Counting Downloads With Fido

I wanted to illustrate how to use fido with an example. Today we’re going to use it to count software downloads on this site. Exciting! This will be simple since we only have one data source. A few years ago, I moved my software from an FTP repository onto this web server. To quantify software downloads, we can simply monitor the http access log.

Here’s our fido configuration for the log file:

/var/log/httpd/access_log {
 rules  = downloads.conf
 action = /usr/local/bin/tally
 log    = syslog
}

This tells fido to monitor the access_log in real time. Its pattern match rules are in a file called downloads.conf. When fido finds a match, it will execute a program called tally. Finally, the last directive tells fido to use syslog to log its activity.

In order to understand what we’re looking for, you should take a look at the software repository. It contains multiple versions and helpful links to the latest releases and betas. We want to match them all.

Let’s take a look at our downloads.conf file. Since we didn’t specify a full path to the file, fido knows to look for it under $sysconfdir/fido/rules. If you configured it to use /etc, then the rules are found in /etc/fido/rules/downloads.conf. Here’s the file:

# Track and count downloads from the website
SIEGE:  .*siege-.*tar.gz.*
FIDO:   .*fido-.*(rpm|tar\.gz).*
WACKY:  .*wackyd-.*tar.gz.*
DICK:   .*dick.*tar.gz.*
SPROXY: .*sproxy-.*tar.gz.*
CONFIG: .*JoeDog-Config.*
STATS:  .*JoeDog-Stats.*
GETOPT: .*php-getopt.*
WACKY:  .*JoeDog-Wacky.*
PBAR:   .*JoeDog-ProgressBar.*

Each line begins with an optional label. If a label is present, fido will pass it (minus the colon) to the action program. In the example above, if the JoeDog-Config perl module is downloaded, then fido will run /usr/local/bin/tally CONFIG. For more on labels, see the fido user’s manual.
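The tally program itself isn’t shown in this part of the post. As a rough idea of what an action that receives a label could do, here is a sketch that keeps one counter file per label. Everything in it is an assumption for illustration: the TALLY_DIR location, the one-file-per-label layout and the function body are not the real /usr/local/bin/tally.

```shell
# TALLY_DIR is an assumed location, not taken from the original setup.
TALLY_DIR=${TALLY_DIR:-/var/tmp/tally}

# Hypothetical tally: bump the counter stored in $TALLY_DIR/<label>.
tally() {
  label=$1
  mkdir -p "$TALLY_DIR" || return 1
  count=0
  # Pick up the previous count if this label has been seen before
  [ -f "$TALLY_DIR/$label" ] && count=$(cat "$TALLY_DIR/$label")
  count=$((count + 1))
  printf '%s\n' "$count" > "$TALLY_DIR/$label"
}
```

Calling `tally CONFIG` twice leaves 2 in $TALLY_DIR/CONFIG. The sketch does no locking, so two downloads landing in the same instant could lose a count.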


Invalid command ‘TypesConfig’

Ah but the joys of trying to match the missing module with its obtuse apache error. In this case, we tried to use the TypesConfig directive but the module wasn’t loaded at runtime. Here’s the error:

# service httpd configtest
Syntax error on line 107 of /etc/httpd/conf/httpd.conf:
Invalid command 'TypesConfig', perhaps misspelled or defined by a module
not included in the server configuration

In this case, we were missing the mime module. You can add that module in your httpd.conf file with the following directive:

LoadModule mime_module modules/

Happy apaching!

Newlines In WordPress

Did you ever want to add a new line to a WordPress entry but it gives you a new paragraph? Instead of this:

– haha
– papa
– mama

You get this:

– haha

– papa

– mama

I hate that. It adds extra space between each line. Fortunately, there’s an easy fix. In order to produce the first list without spaces between each line, just hold the shift key while you hit return.

Invalid command ‘order’

It would be nice if apache told you which module you were missing. Fortunately, there’s the Internets! Hey, this site is on the Internets; let’s see if we can help. I just ran ‘service httpd configtest’ and received the following error:

# service httpd configtest
Syntax error on line 92 of /etc/httpd/conf/httpd.conf:
Invalid command 'Order', perhaps misspelled or defined by a module 
not included in the server configuration

After a brute force attempt at adding modules, it became clear that I was missing the following module: authz_host_module. I added that in httpd.conf with the following directive:

LoadModule authz_host_module modules/

You can also compile that module into the binary with the following flag: --enable-authz-host (in most cases that’s compiled by default, but I’m using Red Hat’s binary so it was necessary to add it at run time).
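Before resorting to brute force, it can help to check whether the config file already has an active LoadModule line for the module you suspect. A minimal sketch follows; has_loadmodule is a hypothetical helper, and it only scans the one file you hand it, so included files and modules compiled into the binary won’t show up.

```shell
# Hypothetical helper: succeed if the config file in $1 contains an
# uncommented LoadModule line for the module identifier in $2.
has_loadmodule() {
  grep -Eq "^[[:space:]]*LoadModule[[:space:]]+$2[[:space:]]" "$1"
}
```

For example, `has_loadmodule /etc/httpd/conf/httpd.conf authz_host_module || echo "not loaded here"` would have pointed at the missing module right away.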