Fork me on Github
Fork me on Github

Joe Dog Software

Proudly serving the Internets since 1999

up arrow New Features For Fido

There’s rumors on the Internets. “Oh, really? What do they say?” They claim your posting frequency affects Google’s crawling frequency. The more you update, the more often it crawls.

Your JoeDog wanted to scrutinize this hypothesis. He wanted to monitor his logs and compare Google’s crawl rate against his web site updates.

So he added this rule to fido.conf

/var/log/httpd/joedog-access_log {
 rules = Googlebot
 action = /home/jeff/bin/googler
}

With that config he produced this document: google.txt

“J!!!!!!!!” Funk shouted in IRC. “You need to verify Googlebot”

“You mean a lot people forge that User-agent?”

“I’ll bet forty percent is forged.”

“D’oh!”

Fido can’t validate an IP address, nor do I want it to. Still, it needs a new feature, namely the ability to interact with its action script. Your JoeDog will add support for regex capture to his rules. This will allow you to capture part of the match and send that text to your action script.

“I have no idea what you just said.”

Okay, let’s modify the rule above with its intended implementation:

/var/log/httpd/joedog-access_log {
 rules = ([0-9]+\.[0-9]+\.[0-9]+\.[0-9]+).*Googlebot
 action = /home/jeff/bin/googler
}

The parentheses are highlighted in red because they represent the proposed feature.

On a match, fido will capture everything inside the parentheses and send it as argument 1 to the googler script. In this case it will be an IP address which googler can then verify as one that belongs to Google. If you have multiple captures, fido will send them as space separated arguments, i.e.,  /home/jeff/bin/googler  173.240.11.11 GET /index.php

For a sh script, $1 = 173.240.11.11, $2 = GET and $3 = /index.php

If all I wanted to do was solve this problem, I’d just write a script that parses the logs. However, I think this will be a valuable feature that will allow fido more flexibility to solve all the world’s problems.