Fork me on Github
Fork me on Github

Joe Dog Software

Proudly serving the Internets since 1999

Who’s Honoring Me Now?

Your JoeDog’s Fido was selected by Profit Bricks as its 24th Best Free Sysadmin Tool

About Fido:

A multi-threaded file watch utility, Fido can monitor a file or directory to see if its modification time changed, and it’s intuitive enough to recognize a change even if the daemon was down at the time of the change. Written for RedHat Enterprise Linux, Fido should run in all Linux flavors, and the source bundle contains RedHat init.d scripts for users’ convenience.

Key Features:

  • Content changes can be recognized with regular expression pattern matches
  • Monitors files for changes in content or modification times
  • It will kick off a user-defined script when it notices a change
  • Coded to POSIX 1003.1 standards


Fido 1.1.5 SIGHUP and Reload

Good morning, JoeDoggers. Let’s bask in the glow of Your Fido this morning; he’s all grown up and ready for love. What does that mean? Well, it means it now behaves like a contemporary modern daemon. Starting with version 1.1.5, if you send it SIGHUP, it will reload its configuration file.

Really? It’s been out since 2011 and you’re only adding that feature now?

Hey, what do you want from me? It’s free, isn’t it?

Here’s how it works: if you change fido’s configuration file, you can send it SIGHUP to reload its key = value pairs. There’s just one thing it won’t reload: its filenames.

Remember, a fido configuration file is divided into two parts; it contains global settings and file settings. The file settings are distinguished by a filename followed by two brackets like this: {}. Here’s an example:

/usr/local/var/my.log {
 # key = value pairs go here.
}

So if you change /usr/local/var/my.log to anything else, you’ll have to restart fido. If you change any other values, then you can just send it SIGHUP.

So how do I send it SIGHUP?

There’s several ways of doing this.

1.) You can look for the process ID (PID) with the ps command and send it SIGHUP (which is signal number 1):

# ps -aef | grep fido
root 31952 1 0 09:21 ? 00:00:00 /usr/sbin/fido -f /etc/fido/fido.conf
# kill -1 31952

2.) Check your system documentation. Some kill commands support name values such as this:

# ps -aef | grep fido
root 31952 1 0 09:21 ? 00:00:00 /usr/sbin/fido -f /etc/fido/fido.conf
Pom # kill -HUP 31952

3.) We can eliminate the ps command by using fido’s pid file like this:

# kill -1 $(cat /var/run/fido.pid)

You can verify a successful config reload by looking at /var/log/messages.

 [Download: Fido]

 



Fido 1.1.4 And Google’s Last Crawl

Your JoeDog can now announce the release of fido-1.1.4. To illustrate its awesome new feature we’re going to do an exercise.

“Wait. Homework? This blog sucks!”

Yeah, but this homework but this is useful. We’re going to capture the time of Google’s last crawl so you can be a hero with the SEO nerds in your company.

“Okay, fine!”

We can identify the googlebot by its User-agent. Google conveniently refers to itself as Googlebot. Here’s an entry in Your JoeDog’s logs:

208.78.85.241 - - [23/Nov/2014:06:16:24 -0500] "GET /blog/ HTTP/1.1" 
   200 57787 "-" "Googlebot/2.X (+http://www.googlebot.com/bot.html)"

Unfortunately, anybody can masquerade as a googlebot. The only way we can be certain this agent is authentic is to check its IP address.

Pom $ dig -x 208.78.85.241
;; ANSWER SECTION:
241.85.78.208.in-addr.arpa. 1316 IN PTR host241.subnet-208-78-85.gigavenue.com.

Wait a second! That’s not Google, that’s a fraud. Let’s check another entry:

Pom $ dig -x 66.249.65.47
;; ANSWER SECTION:
47.65.249.66.in-addr.arpa. 11310 IN PTR crawl-66-249-65-47.googlebot.com.

Okay, that’s Google. So in order to validate and record the time of Google’s last crawl, we have to check the IP address. How do we achieve this?

We’ll use fido to check our logs for instances of Googlebot but it can’t validate the IP address. Our action program can do that but how do we pass it the address?  Prior to fido-1.1.4, that would have been impossible. Starting with 1.1.4 it can now do regex capture and pass those variables to the action program.

To set this up, you’ll need to a file block in fido.conf which points to your access_log.

/var/log/httpd/joedog-access_log {
 rules = ^([0-9]+.[0-9]+.[0-9]+.[0-9]+).*GoogleBot
 action = /home/jeff/bin/googler $1
}

When fido locates a match, it will capture everything inside the parentheses and send that to the googler script as $1. Here’s the googler script:

#!/usr/bin/perl
use Socket;
use strict;
use vars qw($LOCK_EX $LOCK_UN);
$LOCK_EX = 2;
$LOCK_UN = 8;
my $addr = $ARGV[0];
my $host = gethostbyaddr(inet_aton($addr), AF_INET);
if ($host !~ /.googlebot.com$/) {
 print "ERROR: Forged User-agent ($host)n";
 exit;
}
my $file = "/path/to/joedog.org/google.txt";
open (FILE, ">>$file") or die "Unable to open file: $filen";
flock(FILE, $LOCK_EX);
print FILE timestamp()." | $addrn";
flock(FILE, $LOCK_UN);
exit;
# returns a string in the following format:
# YYYYMMDDHHMMSS
sub timestamp() {
 my $now = time;
 my @date = localtime $now;
 $date[5] += 1900;
 $date[4] += 1;
 my $stamp = sprintf(
 "%02d/%02d/%04d %02d:%02d:%02d",
 $date[4],$date[3],$date[5], $date[2], $date[1], $date[0]
 );
 $stamp .= " | ";
 $stamp .= sprintf(
 "%04d%02d%02d%02d%02d",
 $date[5],$date[4],$date[3], $date[2], $date[1], $date[0]
 );
 return $stamp;
}
sub empty { ! defined $_[0] || ! length $_[0] }

As you can imagine, there’s lots of creative things you can do with the googler script. Your JoeDog hopes to compare crawl frequencies against the site’s freshness to see if there’s a correlation.

[JoeDog: Last Google Crawl]

UPDATE:  As Tim notes in the comments below, the Internets are full of jerks.

Consider this hostname: haha.googlebot.com.fooledyou.ru

The regex has been changed so that the fully qualified hostname must end in .com

 

 

 



Fido Now Supports Regex Capture and Back References

It’s a good thing Your JoeDog is not very bright. If he were an intelligent man, then he’d realize the difficulty of the task he was about to undertake before he undertook it. He would look at that task and think, “I’m not good enough to code that.” Fortunately, he’s not very bright and that means fido has a new feature.

New feature? Exciting!

Beginning with version 1.1.4, fido can do regex capture and pass the back references to the program it should run. Let’s look at an example config to help us make some sense of this. The new feature is highlighted in red.

/var/log/httpd/joedog-access_log {
 rules = ^([0-9]+.[0-9]+.[0-9]+.[0-9]+).*GoogleBot
 action = /home/jeff/bin/googler $1
}

In this exercise, we want to find instances of GoogleBot in the access log, but we also want to verify that it’s actually the GoogleBot and not a crawler with a forged User-agent. To accomplish that, we want to capture the IP address and send it to our program for validation.

So let’s look at the rule:

^([0-9]+.[0-9]+.[0-9]+.[0-9]+).*GoogleBot

If a line being with an quartet of dot separated numbers, i.e., an IP address, and it contains GoogleBot, then we have a match. Notice that IP address is wrapped in parentheses? That’s our capture. Everything within the parens will be assigned to $1. (BTW: Writing that was a royal PITA.) If we had two sets of parens, then the second set would be assigned to $2.

Fido will assign variables by name so it doesn’t matter which order you pass them to your program.

/var/log/httpd/joedog-access_log {
 rules = ^([0-9]+.[0-9]+.[0-9]+.[0-9]+).*(GoogleBot)
 action = /home/jeff/bin/googler $2 $1
}

Will run: /home/jeff/bin/googler GoogleBot 198.14.14.6

I’ll release version 1.1.4 after the documentation is up-to-date. In the meantime, early adapters can grab the code from the source repository.

You’ll need autotools on your system. Inside the fido directory, run this command to build the configure script:

utils/bootstrap

Happy hacking.

 



Fido 1.1.3

Your JoeDog had a requirements change. “Stupid requirements!” He had to ensure each file in a directory and all its sub-directories was less than eight days old. Unfortunately, Your Fido didn’t traverse directory trees. He stood watch only at the top of the tree.

That’s the problem with dogs: they have a mind of their own.

Without much effort, fido learned a new trick. It now recursively searches a directory for files. To leverage this feature, you’ll have to give it a command. “Recurse, boy, recurse!”

/export {
 rules = exceeds 7 days
 exclude = ^.|CVS|Makefile
 action = /usr/local/bin/sendtrap.sh
 recurse = true
}

recurse takes one of two values, true or false. True means search the tree and false means remain at the top level. If you don’t set a recurse directive, then fido will treat it as false, i.e., it will remain in the top directory.

[Trending: Fido-1.1.3]

 



Fido Learns A New Trick

I use Mondoarchive to create Linux recovery disks. Each server writes ISO images to a shared volume on a weekly basis. If any file inside that directory is older than seven days, then a server failed to create an ISO. In order to monitor this directory for failure, I added a new feature to fido. Exciting!

Starting with version 1.1.0 (click to download), fido can monitor a file or directory to see if it — or any file inside it — is older than a user configurable period of time. If fido discovers a file whose modification date exceeds the configured time, it fires an alert.

The following example illustrates how to configure the use case above:

/export {
  rules = exceeds 8 days
  exclude = ^.|^lccns178$|^lccns179$|^lccns335$|lccns336$
  throttle = 12 hours
  action = /etc/fido/notify.sh
}

This file block applies to “/export” which is a directory. Since it’s a directory, the rules apply to every file inside it. In this case ‘rules’ is pretty straight forward. We’re looking for files that exceed eight days in age. This rule will always follow this format: exceeds [int] [modifier]. The modifier can be seconds, minutes, hours or days. If you take the long view — if you’re concerned about events far into the future — then you’ll have to do some math. We don’t designate years so you’ll have to use 1825 days if you want to be alerted five years out.

We also find a new feature inside this block. ‘exclude’ takes a regular expression and tells fido which files to ignore. Currently, ‘exclude’ only works inside a file block with an exceeds rule but I plan to make better use of it.

Finally we notice one final feature that we’ve never seen before. The ‘throttle’ directive tells fido how long to wait between alerts. In this scenario, fido will trigger an alert the second it finds a file which exceeds 8 days. If the problem is not addressed within twelve hours, it will fire another alert. Alerts will continue in twelve hour intervals until the problem is corrected.

I hope you enjoy these features. If there are enhancements you’d like to see, feel free to contact me either in the comments or by email.



Use Fido To Process FTP Uploads

Did you ever want to process a file immediately after it was uploaded via FTP? You could have the upload script execute a remote command after the file is uploaded. That requires shell access that you may or may not be able to grant. On the server, you could run a processing script every minute out of cron but that could get messy.

Fido provides alternative method.

Starting with version 1.0.7, Fido has the ability to monitor a file or directory by its modification date. When the date changes, fido launches a script. We can use this feature to process files that are uploaded via ftp.

In this example, we’ll monitor a directory. In fido.conf, we’ll set up a file block that points to a directory. (For more information about configuring fido, see the user’s manual). This is our configuration:

/home/jdfulmer/incoming {
 rules = modified
 action = /home/jdfulmer/bin/process
 log = /home/jdfulmer/var/log/fido.log
}

With this configuration, fido will continuously watch /home/jdfulmer/incoming for a modification change. When a file is upload, the date will change and fido will launch /home/jdfulmer/bin/process. Pretty sweet, huh?

Not quite. The modification date will change the second ftp lays down the first bite. Our script would start to process the file before it’s fully uploaded. How do we get around that? We can make our script smarter.

For the purpose of this exercise, I’m just going to move uploaded files from incoming to my home directory. Here’s a script that will do that:

#!/bin/sh
PREFIX="/home/jdfulmer/incoming"
FILES=$(ls $PREFIX)
for F in $FILES ; do
  while [ -n "$(lsof | grep $F)" ] ; do
    sleep 1
  done
  mv $PREFIX/$F /home/jdfulmer
done

In order to ensure the file is fully uploaded, I check lsof for its name. If there’s an open file handle under that name, then the script will continue to loop until it’s cleared. When the while loop breaks, the script moves the file.

There’s just one more thing to think about. When the script moves the file what happens to the directory fido is watching? Yes. Its modification date changes. In my example, process runs a second time but does nothing since nothing is there. Depending on your situation, you may need to make the script a little smarter.



Counting Downloads With Fido

I wanted to illustrate how to use fido with an example. Today we’re going to use it to count software downloads on this site. Exciting! This will be simple since we only have one data source. A few years ago, I move my software from an FTP repository onto this web server. To quantify software downloads, we can simply monitor the http access log.

Here’s our fido configuration for the log file:

/var/log/httpd/access_log {
 rules  = downloads.conf
 action = /usr/local/bin/tally
 log    = syslog
}

This tells fido to monitor the access_log in real time. Its pattern match rules are in a file called downloads.conf When fido finds a match, it will execute a program called tally. Finally, the last directive tells fido to use syslog to log its activity.

In order to understand what we’re looking for, you should take a look at the software repository. It contains multiple versions and helpful links to the latest releases and betas. We want to match them all.

Let’s take a look at our downloads.conf file. Since we didn’t specify a full path to the file, fido knows to look for it under $sysconfdir/etc/fido/rules. If you configured it to use /etc, then the rules are found in /etc/fido/rules/downloads.conf. Here’s the file:

#
# Track and count downloads from the website
SIEGE:  .*siege-.*tar.gz.*
FIDO:   .*fido-.*([rpm]|[tar.gz]).*
WACKY:  .*wackyd-.*tar.gz.*
DICK:   .*dick.*tar.gz.*
SPROXY: .*sproxy-.*tar.gz.*
CONFIG: .*JoeDog-Config.*
STATS:  .*JoeDog-Stats.*
GETOPT: .*php-getopt.*
WACKY:  .*JoeDog-Wacky.*
PBAR:   .*JoeDog-ProgressBar.*

Each line begins with an optional label. If a label is present, fido will pass it (minus the colon) to the action program. In the example above, if the JoeDog-Config perl module is downloaded, then fido will run /usr/local/bin/tally CONFIG. For more on labels, see the fido user’s manual.

Continue reading Counting Downloads With Fido