Fido 1.1.4 And Google’s Last Crawl

Your JoeDog can now announce the release of fido-1.1.4. To illustrate its awesome new feature we’re going to do an exercise.

“Wait. Homework? This blog sucks!”

Yeah, but this homework is useful. We’re going to capture the time of Google’s last crawl so you can be a hero with the SEO nerds in your company.

“Okay, fine!”

We can identify the googlebot by its User-agent. Google conveniently refers to itself as Googlebot. Here’s an entry in Your JoeDog’s logs:

208.78.85.241 - - [23/Nov/2014:06:16:24 -0500] "GET /blog/ HTTP/1.1" 
   200 57787 "-" "Googlebot/2.X (+http://www.googlebot.com/bot.html)"

Unfortunately, anybody can masquerade as a googlebot. The only way we can be certain this agent is authentic is to check its IP address.

Pom $ dig -x 208.78.85.241
;; ANSWER SECTION:
241.85.78.208.in-addr.arpa. 1316 IN PTR host241.subnet-208-78-85.gigavenue.com.

Wait a second! That’s not Google, that’s a fraud. Let’s check another entry:

Pom $ dig -x 66.249.65.47
;; ANSWER SECTION:
47.65.249.66.in-addr.arpa. 11310 IN PTR crawl-66-249-65-47.googlebot.com.

Okay, that’s Google. So in order to validate and record the time of Google’s last crawl, we have to check the IP address. How do we achieve this?

We’ll use fido to check our logs for instances of Googlebot, but fido can’t validate the IP address. Our action program can do that, but how do we pass it the address? Prior to fido-1.1.4, that would have been impossible. Starting with 1.1.4, fido can do regex capture and pass the captured values to the action program.

To set this up, you’ll need a file block in fido.conf which points to your access_log.

/var/log/httpd/joedog-access_log {
 rules = ^([0-9]+\.[0-9]+\.[0-9]+\.[0-9]+).*GoogleBot
 action = /home/jeff/bin/googler $1
}

When fido locates a match, it will capture everything inside the parentheses and send that to the googler script as $1. Here’s the googler script:

#!/usr/bin/perl
# googler: confirm an address belongs to googlebot.com, then log the crawl time
use strict;
use warnings;
use Socket;
use Fcntl qw(:flock);

my $addr = $ARGV[0];
die "Usage: googler <ip-address>\n" if empty($addr);

# reverse lookup; $host stays empty if the address has no PTR record
my $packed = inet_aton($addr);
my $host = defined $packed ? gethostbyaddr($packed, AF_INET) : undef;
$host = '' unless defined $host;
if ($host !~ /\.googlebot\.com$/) {
 print "ERROR: Forged User-agent ($host)\n";
 exit;
}

my $file = "/path/to/joedog.org/google.txt";
open(my $fh, '>>', $file) or die "Unable to open file: $file\n";
flock($fh, LOCK_EX);
print $fh timestamp()." | $addr\n";
flock($fh, LOCK_UN);
close($fh);
exit;

# returns a string in the following format:
# MM/DD/YYYY HH:MM:SS | YYYYMMDDHHMMSS
sub timestamp {
 my @date = localtime(time);
 $date[5] += 1900; # years are counted from 1900
 $date[4] += 1;    # months are zero-based
 my $stamp = sprintf(
  "%02d/%02d/%04d %02d:%02d:%02d",
  $date[4], $date[3], $date[5], $date[2], $date[1], $date[0]
 );
 $stamp .= " | ";
 $stamp .= sprintf(
  "%04d%02d%02d%02d%02d%02d",
  $date[5], $date[4], $date[3], $date[2], $date[1], $date[0]
 );
 return $stamp;
}

sub empty { ! defined $_[0] || ! length $_[0] }

As you can imagine, there are lots of creative things you can do with the googler script. Your JoeDog hopes to compare crawl frequencies against the site’s freshness to see if there’s a correlation.

[JoeDog: Last Google Crawl]
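
For anyone who wants to play with the log that googler writes, here is a rough Python sketch (not part of fido or the googler script) that finds the last crawl and the gaps between crawls. It assumes each line starts with a MM/DD/YYYY HH:MM:SS field followed by a pipe, which is what the script above prints. The function names and the sample lines are made up for illustration.

```python
from datetime import datetime


def parse_crawls(lines):
    """Return the crawl times found in google.txt-style lines, oldest first."""
    crawls = []
    for line in lines:
        # Only the first pipe-separated field is parsed.
        first_field = line.split("|")[0].strip()
        try:
            crawls.append(datetime.strptime(first_field, "%m/%d/%Y %H:%M:%S"))
        except ValueError:
            continue  # skip blank or malformed lines
    return sorted(crawls)


def crawl_gaps(crawls):
    """Return the time between each pair of consecutive crawls."""
    return [later - earlier for earlier, later in zip(crawls, crawls[1:])]


# Hypothetical sample lines; real data would be read from google.txt.
sample = [
    "11/23/2014 06:16:24 | 20141123061624 | 66.249.65.47",
    "11/23/2014 09:46:24 | 20141123094624 | 66.249.65.47",
]
crawls = parse_crawls(sample)
gaps = crawl_gaps(crawls)
print("last crawl:", crawls[-1])
print("gaps:", gaps)
```

Comparing those gaps against the dates you publish new content is one way to start looking for the correlation mentioned above.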

UPDATE:  As Tim notes in the comments below, the Internets are full of jerks.

Consider this hostname: haha.googlebot.com.fooledyou.ru

The regex has been changed so that the fully qualified hostname must end in .googlebot.com.
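
To see why the anchor matters, here is a small Python sketch of the same idea. It is an illustration, not the googler script itself: the function names are invented, and unlike the Perl regex it lowercases the name and strips a trailing dot before comparing.

```python
def naive_check(host):
    """Unanchored check: passes any name that merely contains the string."""
    return ".googlebot.com" in host


def is_googlebot_host(host):
    """True only if the fully qualified hostname ends in .googlebot.com."""
    if not host:
        return False
    # Assumption: compare case-insensitively and ignore a trailing root dot.
    return host.rstrip(".").lower().endswith(".googlebot.com")


forged = "haha.googlebot.com.fooledyou.ru"
real = "crawl-66-249-65-47.googlebot.com"

print(naive_check(forged), is_googlebot_host(forged))
print(naive_check(real), is_googlebot_host(real))
```

The naive check accepts the forged hostname because the string appears in the middle of it. The suffix check rejects it.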

 

 

 

Posted in Applications, Fido | 1 Comment



Fido Now Supports Regex Capture and Back References

It’s a good thing Your JoeDog is not very bright. If he were an intelligent man, then he’d realize the difficulty of the task he was about to undertake before he undertook it. He would look at that task and think, “I’m not good enough to code that.” Fortunately, he’s not very bright and that means fido has a new feature.

New feature? Exciting!

Beginning with version 1.1.4, fido can do regex capture and pass the back references to the program it should run. Let’s look at an example config to help us make some sense of this. The new feature is the parenthesized capture in the rule and the $1 in the action.

/var/log/httpd/joedog-access_log {
 rules = ^([0-9]+\.[0-9]+\.[0-9]+\.[0-9]+).*GoogleBot
 action = /home/jeff/bin/googler $1
}

In this exercise, we want to find instances of GoogleBot in the access log, but we also want to verify that it’s actually the GoogleBot and not a crawler with a forged User-agent. To accomplish that, we want to capture the IP address and send it to our program for validation.

So let’s look at the rule:

^([0-9]+\.[0-9]+\.[0-9]+\.[0-9]+).*GoogleBot

If a line begins with a quartet of dot-separated numbers, i.e., an IP address, and it contains GoogleBot, then we have a match. Notice that the IP address pattern is wrapped in parentheses? That’s our capture. Everything within the parens will be assigned to $1. (BTW: Writing that was a royal PITA.) If we had two sets of parens, then the second set would be assigned to $2.
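
If you want to try the rule outside of fido, here is a quick Python sketch of the same capture. It is only an illustration of how the regex behaves, not fido’s code. One assumption to flag: it matches case-insensitively, because the sample log entry in the Googlebot post spells the agent “Googlebot” while the rule spells it “GoogleBot”. Whether fido itself ignores case is not something this sketch claims.

```python
import re

# The rule from fido.conf, copied verbatim.
RULE = r"^([0-9]+\.[0-9]+\.[0-9]+\.[0-9]+).*GoogleBot"

# Assumption: ignore case so "Googlebot" in the log matches "GoogleBot" in the rule.
PATTERN = re.compile(RULE, re.IGNORECASE)


def capture_ip(line):
    """Return the text captured by the first set of parens, or None."""
    match = PATTERN.search(line)
    return match.group(1) if match else None


# The access log entry quoted in the Googlebot post, joined onto one line.
line = (
    '208.78.85.241 - - [23/Nov/2014:06:16:24 -0500] "GET /blog/ HTTP/1.1" '
    '200 57787 "-" "Googlebot/2.X (+http://www.googlebot.com/bot.html)"'
)
captured = capture_ip(line)
print(captured)
```

The value printed is what would land in $1.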

Fido will assign variables by name so it doesn’t matter which order you pass them to your program.

/var/log/httpd/joedog-access_log {
 rules = ^([0-9]+\.[0-9]+\.[0-9]+\.[0-9]+).*(GoogleBot)
 action = /home/jeff/bin/googler $2 $1
}

Will run: /home/jeff/bin/googler GoogleBot 198.14.14.6
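
Here is a hedged Python sketch of that substitution: each $N in the action string is swapped for capture group N, whatever order the references appear in. The expand_action helper and the sample log line are invented for illustration. This is not how fido is implemented internally.

```python
import re


def expand_action(action, match):
    """Replace each $N in the action string with capture group N."""
    def substitute(ref):
        index = int(ref.group(1))
        # Leave the reference untouched if the rule has no such group.
        if not 1 <= index <= match.re.groups:
            return ref.group(0)
        return match.group(index) or ""
    return re.sub(r"\$([0-9]+)", substitute, action)


rule = re.compile(r"^([0-9]+\.[0-9]+\.[0-9]+\.[0-9]+).*(GoogleBot)")

# Hypothetical log line using the address from the example above.
line = '198.14.14.6 - - "GET / HTTP/1.1" 200 512 "-" "GoogleBot"'
match = rule.search(line)
command = expand_action("/home/jeff/bin/googler $2 $1", match)
print(command)
```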

I’ll release version 1.1.4 after the documentation is up-to-date. In the meantime, early adopters can grab the code from the source repository.

You’ll need autotools on your system. Inside the fido directory, run this command to build the configure script:

utils/bootstrap

Happy hacking.

 

Posted in Applications, Fido | Leave a comment



Ted Cruz: Stupid or Evil?

Since this is a nerd blog, Your JoeDog tries to avoid politics. After all, who gives a shit what he thinks, amirite? As you may have guessed, that’s a segue. You know what follows that segue, don’tcha? That’s right: politics.

Today we will discuss a particular senator from the great state of Texas. Your nerdblogger has never been to Texas so he’s not sure if it’s great. Texans tell him, “Don’t Mess With Texas” so, um, “great state” it is!

Texas is represented by a particular senator whose mind droppings may interest the readers of this blog. Those mental farts provide useful insight into his character and the policies he’d like to enact. Unfortunately, they don’t tell us much about the senator himself but this much is certain: Ted Cruz is either the dumbest motherfscker on the Internets or a diabolical genius.

A couple days ago, Cruz dropped this loaf on the Internets:

When you regulate a public utility, it calcifies it — it freezes it in place. Let’s give a simple contrast. The Telecommunications Act of 1934 was adopted to regulate these [pulls out an old rotary phone]. To put regulations in place and what happened? It froze everything in place. This (rotary phone) is regulated by Title II. [displays an iPhone] This is not.

That’s right, Senator Dingleberry claims the Communications Act of 1934 kept land lines in rotary stagnation while smart phones have thrived, free of government’s heavy hand. Funny thing. Smart phones are regulated under the same jurisdiction as rotary phones. Ever hear of the FCC? It was created by the (wait for it) Communications Act of 1934 and it has the power to regulate cell phone providers. As a sitting senator, he probably knows that. After all, he’s on the subcommittee of Communications, Technology, and the God Damn Internet. No, really, that’s the subcommittee’s actual name … we’re told, anyway.

Net Neutrality has nothing to do with changing the Internets. It’s about keeping them the same — exactly the same. It’s about ensuring ISPs can’t extract fees from content providers, throttle competitors or marginalize small enterprises like this little nerdblog. It’s about treating the Internets as a public utility. It’s about ensuring all content providers operate on an even playing field — may the best ones win.

Your JoeDog is told Ted Cruz is smart (he’s not sure if that’s true, but that’s what he’s told). If Senator Smarty Pants is indeed an intelligent man, then he’s lying to you in that video. He’s intentionally distorting reality in order to promote a political agenda. He’s creating “Death Panels” for the Internets in order to appease his political benefactors. And that, my nerd blog friends, is worse than being stupid.

Posted in Community | Leave a comment



Ted Cruz (R-Retard) Gets Splained By The Oatmeal

Ted Cruz recently compared Net Neutrality with Obamacare. They’re exactly alike if by “exactly alike” you mean “have nothing in common.” The Oatmeal tries to explain why the analogy is flawed. That’s probably an impossible task. You can’t make a senator understand a concept if his financial contributions depend on his ignorance. Anyway, enjoy the strip….

[The Oatmeal: Dear Senator Ted Cruz]

UPDATE: Gizmodo weighs in 

UPDATE: Pr0n stars weigh in

Posted in Community | Leave a comment



AI Reporting

Your JoeDog is a Big Fan of artificial intelligence. His pinochle game represents one foray into the field. The computer bids based on its experience. Your JoeDog can’t predict how it will bid a particular hand. It looks for experiences that resemble its current hand and it bids accordingly. Unfortunately, it still plays programmatically. As such, it can never be better than this nerd-blogger.

CBS Sports and Yahoo are doing interesting things with AI. Their fantasy football sites use artificial intelligence to summarize millions of games each week. The software analyzes lots of data and composes articles much like a human reporter. They only fail the Turing Test due to a contemplation of scale: A rational person soon realizes there aren’t enough humans on earth to produce that many articles by Tuesday morning.

There’s a more personal reason why Your JoeDog likes these cyber reporters: they think highly of his coaching skill:

Tonzie Crushers benefited from smart coaching this week. Coach Fulmer left Chris Johnson and Justin Hunter on the bench in favor of Frank Gore and Robert Woods, who were both expected to score less.

These great decisions boosted Tonzie Crushers’ final score by 22.1 points, which just made the final result that much more embarrassing. Putting Gore in the staring lineup also gained more points than any other coaching move this week, making it the Volkswagen Start of the Week.

Well, this week. Last week they thought he was a moran.

 

Posted in Programming | Leave a comment



Programmers….

A physicist, an engineer and a programmer were driving down a mountain pass when the brakes failed. The car started to accelerate and they were soon screaming into the valley. Hanging on for dear life, they smacked the guard rails several times. Fortunately, they came across an escape lane and they were able to navigate up the hill to a stop.

The physicist said, “We need to model temperatures resulting from friction to determine why the brakes failed.”

The engineer said, “I have a case of temperature sensors in the trunk.”

The programmer said, “Let’s not get ahead of ourselves. We need to get the car back up the mountain and see if the failure is reproducible.”

 

 

Posted in On The Job | Leave a comment



So Which Is It?

A bug or a feature?

 

 

Posted in Uncategorized | Leave a comment



My Dilbert Moment

This morning Your JoeDog received a form. Exciting! … wait a second. That’s not exciting. That’s more work!

Indeed.

He had to fill it out and deliver it within Large Corporate Bureaucracy. There were two different delivery options:

  1. Interoffice messenger
  2. Fax machine (they still exist for some reason)

The fax option contained these special instructions:

If sending via fax, do not send original. Retain a copy of the completed form for your records.


Posted in On The Job | Leave a comment



Please Don’t Use Comments To Alter Functionality

“Holy shit!” Your JoeDog exclaimed.

“Why do you swear so much?” an emailer emailed this blog. “Young readers don’t need to be exposed to that.” Listen, if your kid is reading this site, then maybe it’s time to buy him a football. By the time he’s old enough to care about these topics, he’s already heard a lot of vulgar language….

“Holy shit!” Your JoeDog exclaimed. “That’s a code salad!”

Our enterprise backup guy is just like your enterprise backup guy. He’s involved with every system, every project and every meeting yet all he does is put ones and zeros on tape. Generally he calls your attention to meaningless minutiae but once a decade you learn of something important. Yesterday was once a decade. Backup informed Your JoeDog that the NetBackup client wasn’t installed on a new server.

“That seems unlikely,” Your JoeDog said. “Puppet puts it on every server.” Puppet is our configuration management server. It installs software and writes configurations to every server in the enterprise.

“That’s what I thought,” Backup said. “But it’s not there.”

To prove that Puppet puts it on every server, Your JoeDog showed him the code. We’ll examine that code after the jump.

Continue reading “Please Don’t Use Comments To Alter Functionality” »

Posted in On The Job | 2 Comments



Baby Cow!

Meet the newest member of the JoeDog family. This is Baby Cow. With her markings she looks like a little tiny baby cow.

She was abused by a Mennonite farmer who tried to breed her. When that failed he tied her to a pole along a highway with a sign that read “Free bulldog.” By chance, a member of the Long Island Bulldog Rescue happened to see her. She stopped and called the state agency which enforces puppy mill laws. The farmer was fined.

That night Your JoeDog had three beers at a local brewery. Mrs. JoeDog saw an announcement on the Long Island Bulldog Rescue’s Facebook page. They needed someone to foster this dog. Your JoeDog reluctantly agreed because … well, did he mention three beers?

When Baby Cow arrived she was in sorry shape. Her eyes were cloudy and her rear legs were both injured. At first the vet suspected glaucoma but it turns out they were irritated by her lashes. Baby Cow’s legs were another story. She had two torn ACLs, probably the result of standing long hours on top of chicken wire. Puppy Mill breeders frequently stack their dogs in chicken wire crates. You don’t want to be the bottom dog. That one gets peed and pooped on by the dogs above.

She’s already had one operation to fix her eyes. Aren’t they beautiful? She still needs two more to fix her rear legs. LIBR promised to pay for those operations in January. They have been promised grant money from a large national pet store chain. (Your JoeDog is unsure if he can mention the name so he’ll keep it to himself for now.) If the grant falls through, he’ll try to raise it himself.

Your JoeDog may have reluctantly agreed to take her, but he’s not letting her go anywhere now. As Mrs. JoeDog says, “That’s Your Baby Cow.”

Posted in Community | Leave a comment


