The Massive Scale of AWS

Here’s an interesting peek behind the scenes at Amazon Web Services:

Scale is perhaps the most important thing, and no one needs to teach an online retailer like Amazon anything about that. With Amazon, there is very little talk of public cloud, and that is because Amazon believes that, by its nature, cloud means it cannot be private. Over the long haul, Amazon believes the massive scale of the public cloud will mean that very few organizations will run their own datacenters.

Interesting throughout. (H/T to Tim for bringing it to my attention.)

[Enterprise Tech: A Rare Peek Into The Massive Scale of AWS]


Posted in Uncategorized

Fido 1.1.4 And Google’s Last Crawl

Your JoeDog can now announce the release of fido-1.1.4. To illustrate its awesome new feature, we’re going to do an exercise.

“Wait. Homework? This blog sucks!”

Yeah, but this homework is useful. We’re going to capture the time of Google’s last crawl so you can be a hero with the SEO nerds in your company.

“Okay, fine!”

We can identify Googlebot by its User-agent; Google conveniently refers to itself as Googlebot. Here’s an entry in Your JoeDog’s logs:

 - - [23/Nov/2014:06:16:24 -0500] "GET /blog/ HTTP/1.1" 200 57787 "-" "Googlebot/2.X (+"
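To pull the client address and User-agent out of an entry like that, something has to split the line into fields. Here’s a minimal Python sketch. It assumes Apache’s "combined" log format, where the address comes first and the User-agent is the last quoted field. The sample line and its 192.0.2.10 address are hypothetical stand-ins, since the real address isn’t shown above.

```python
import re

# Apache "combined" format: host ident user [time] "request" status bytes "referer" "agent"
COMBINED = re.compile(
    r'^(?P<addr>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_entry(line):
    """Return a dict of named fields, or None if the line isn't in combined format."""
    m = COMBINED.match(line)
    return m.groupdict() if m else None

def claims_googlebot(entry):
    # Only a claim: anybody can put "Googlebot" in a User-agent header.
    return entry is not None and "Googlebot" in entry["agent"]

# Hypothetical log line; 192.0.2.0/24 is a documentation-only address range.
sample = ('192.0.2.10 - - [23/Nov/2014:06:16:24 -0500] "GET /blog/ HTTP/1.1" '
          '200 57787 "-" "Googlebot/2.1"')
entry = parse_entry(sample)
print(entry["addr"], claims_googlebot(entry))  # 192.0.2.10 True
```

A match here only tells you what the client *says* it is. The address is what gets checked next.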

Unfortunately, anybody can masquerade as Googlebot. The only way we can be certain this agent is authentic is to check its IP address.

Pom $ dig -x

Wait a second! That’s not Google, that’s a fraud. Let’s check another entry:

Pom $ dig -x

Okay, that’s Google. So in order to validate and record the time of Google’s last crawl, we have to check the IP address. How do we achieve this?

We’ll use fido to check our logs for instances of Googlebot, but fido can’t validate the IP address. Our action program can do that, but how do we pass it the address? Prior to fido-1.1.4, that would have been impossible. Starting with 1.1.4, fido can do regex capture and pass the captured values to the action program.

To set this up, you’ll need a file block in fido.conf that points to your access_log.

/var/log/httpd/joedog-access_log {
 rules = ^([0-9]+\.[0-9]+\.[0-9]+\.[0-9]+).*Googlebot
 action = /home/jeff/bin/googler $1
}

When fido locates a match, it will capture everything inside the parentheses and send that to the googler script as $1. Here’s the googler script:

#!/usr/bin/perl
use strict;
use Socket;
use Fcntl qw(:flock);   # provides LOCK_EX and LOCK_UN

my $addr   = $ARGV[0] or die "usage: googler IP-ADDRESS\n";
my $packed = inet_aton($addr) or die "ERROR: bad address ($addr)\n";
my $host   = gethostbyaddr($packed, AF_INET);
$host = "" unless defined $host;   # no PTR record at all
if ($host !~ /\.googlebot\.com$/) {
  print "ERROR: Forged User-agent ($host)\n";
  exit 1;
}

my $file = "/path/to/";
open(my $fh, ">>", $file) or die "Unable to open file: $file\n";
flock($fh, LOCK_EX);
print $fh timestamp() . " | $addr\n";
flock($fh, LOCK_UN);
close($fh);
exit 0;

# Returns "MM/DD/YYYY HH:MM:SS | YYYY-MM-DD HH:MM:SS"
# (the second format string is an assumption; it was missing)
sub timestamp {
  my @date = localtime(time);
  $date[5] += 1900;
  $date[4] += 1;
  my $stamp = sprintf(
    "%02d/%02d/%04d %02d:%02d:%02d",
    $date[4], $date[3], $date[5], $date[2], $date[1], $date[0]
  );
  $stamp .= " | ";
  $stamp .= sprintf(
    "%04d-%02d-%02d %02d:%02d:%02d",
    $date[5], $date[4], $date[3], $date[2], $date[1], $date[0]
  );
  return $stamp;
}

sub empty { ! defined $_[0] || ! length $_[0] }
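The googler script trusts the reverse (PTR) lookup alone. Whoever controls an address block controls its PTR records, so a jerk can make his own address reverse-resolve to a name ending in .googlebot.com. Google’s published verification steps close that hole with a forward lookup: resolve the name that the reverse lookup returned and confirm it maps back to the original address.

Here’s a Python sketch of that forward-confirmed check. It isn’t part of fido or googler, and it handles IPv4 only. The resolvers are injectable so it runs offline, and every name and address in the demo is hypothetical. Google also lists google.com as a crawler domain, hence the second suffix.

```python
import socket

def is_real_googlebot(addr,
                      reverse=lambda a: socket.gethostbyaddr(a)[0],
                      forward=lambda h: socket.gethostbyname_ex(h)[2],
                      suffixes=(".googlebot.com", ".google.com")):
    """Forward-confirmed reverse DNS check for an address claiming to be Googlebot."""
    try:
        host = reverse(addr)              # 1. PTR lookup, like `dig -x`
    except OSError:                       # socket.herror/gaierror are OSErrors
        return False                      # no PTR record at all
    host = host.rstrip(".").lower()
    if not host.endswith(suffixes):       # 2. name must END in a Google domain
        return False
    try:
        addrs = forward(host)             # 3. A lookup on that name
    except OSError:
        return False
    return addr in addrs                  # 4. must point back to where we started

# Offline demo: fake DNS tables, hypothetical names, documentation-range addresses.
PTR = {
    "192.0.2.1": "crawl-192-0-2-1.googlebot.com",  # forward and reverse agree
    "198.51.100.7": "crawl.googlebot.com",         # jerk set his own PTR record
    "203.0.113.9": "googlebot.com.jerk.example",   # Google's name buried mid-domain
}
A = {
    "crawl-192-0-2-1.googlebot.com": ["192.0.2.1"],
    "crawl.googlebot.com": ["203.0.113.5"],
}

def fake_reverse(addr):
    if addr not in PTR:
        raise OSError("no PTR record")
    return PTR[addr]

def fake_forward(host):
    if host not in A:
        raise OSError("no A record")
    return A[host]

for ip in ("192.0.2.1", "198.51.100.7", "203.0.113.9", "192.0.2.99"):
    print(ip, is_real_googlebot(ip, fake_reverse, fake_forward))
```

Only 192.0.2.1 passes. The 198.51.100.7 jerk gets past the PTR check but fails the forward lookup, because he doesn’t control the A records for googlebot.com.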

As you can imagine, there are lots of creative things you can do with the googler script. Your JoeDog hopes to compare crawl frequencies against the site’s freshness to see if there’s a correlation.
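Comparing crawl frequency against freshness starts with the gaps between crawls. Here’s a minimal Python sketch. It assumes the googler log’s first field is the MM/DD/YYYY HH:MM:SS stamp that timestamp() writes and that fields are separated by " | ". The sample lines are made up.

```python
from datetime import datetime

def crawl_gaps(lines):
    """Return the gaps, in hours, between consecutive crawls in a googler log."""
    times = []
    for line in lines:
        first = line.split(" | ", 1)[0].strip()   # "MM/DD/YYYY HH:MM:SS"
        if not first:
            continue                              # skip blank lines
        times.append(datetime.strptime(first, "%m/%d/%Y %H:%M:%S"))
    times.sort()
    return [(b - a).total_seconds() / 3600 for a, b in zip(times, times[1:])]

# Hypothetical googler log lines; only the first timestamp field is parsed.
log = [
    "11/23/2014 06:16:24 | 2014-11-23 06:16:24 | 192.0.2.1",
    "11/23/2014 12:16:24 | 2014-11-23 12:16:24 | 192.0.2.1",
    "11/24/2014 00:16:24 | 2014-11-24 00:16:24 | 192.0.2.1",
]
gaps = crawl_gaps(log)
print(gaps)                   # [6.0, 12.0]
print(sum(gaps) / len(gaps))  # 9.0
```

Set those gaps next to the site’s publish times and you have the raw material for the correlation.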

[JoeDog: Last Google Crawl]

UPDATE:  As Tim notes in the comments below, the Internets are full of jerks.

Consider this hostname:

The regex has been changed so that the fully qualified hostname must end in .com.
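The general trick is easy to demonstrate. If the pattern isn’t anchored to the end of the name, a jerk can tuck googlebot.com into the middle of a domain he owns. This is a small Python illustration with hypothetical hostnames; Perl’s $ anchor behaves the same way for these inputs.

```python
import re

LOOSE = re.compile(r"googlebot\.com")      # matches anywhere in the name
STRICT = re.compile(r"\.googlebot\.com$")  # name must END in .googlebot.com

names = [
    "crawl-192-0-2-1.googlebot.com",   # plausible crawler name
    "googlebot.com.jerk.example",      # hypothetical forgery
]
for name in names:
    print(name, bool(LOOSE.search(name)), bool(STRICT.search(name)))
# crawl-192-0-2-1.googlebot.com True True
# googlebot.com.jerk.example True False
```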




Posted in Applications, Fido | 1 Comment

Fido Now Supports Regex Capture and Back References

It’s a good thing Your JoeDog is not very bright. If he were an intelligent man, then he’d realize the difficulty of the task he was about to undertake before he undertook it. He would look at that task and think, “I’m not good enough to code that.” Fortunately, he’s not very bright, and that means fido has a new feature.

New feature? Exciting!

Beginning with version 1.1.4, fido can do regex capture and pass the back references to the program it should run. Let’s look at an example config to help us make some sense of this. The new feature is the parenthesized capture in the rule and the $1 back reference in the action.

/var/log/httpd/joedog-access_log {
 rules = ^([0-9]+\.[0-9]+\.[0-9]+\.[0-9]+).*Googlebot
 action = /home/jeff/bin/googler $1
}

In this exercise, we want to find instances of Googlebot in the access log, but we also want to verify that it’s actually Googlebot and not a crawler with a forged User-agent. To accomplish that, we want to capture the IP address and send it to our program for validation.

So let’s look at the rule:

 ^([0-9]+\.[0-9]+\.[0-9]+\.[0-9]+).*Googlebot

If a line begins with a quartet of dot-separated numbers, i.e., an IP address, and it contains Googlebot, then we have a match. Notice that the IP address is wrapped in parentheses? That’s our capture. Everything within the parens will be assigned to $1. (BTW: Writing that was a royal PITA.) If we had two sets of parens, then the second set would be assigned to $2.
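Python’s re module numbers capture groups the same way, so it’s a handy place to check what the rule captures before handing it to fido. This is a sketch, not fido’s matcher, and fido’s regex engine may differ in details such as case sensitivity. The log lines are made up.

```python
import re

# The fido.conf rule, with "Googlebot" capitalized the way Google sends it.
RULE = re.compile(r"^([0-9]+\.[0-9]+\.[0-9]+\.[0-9]+).*Googlebot")

hit = '192.0.2.10 - - [23/Nov/2014:06:16:24 -0500] "GET /blog/ HTTP/1.1" 200 57787 "-" "Googlebot/2.1"'
miss = '192.0.2.11 - - [23/Nov/2014:06:17:00 -0500] "GET / HTTP/1.1" 200 512 "-" "curl/7.38"'

m = RULE.search(hit)
print(m.group(1))         # 192.0.2.10, the value fido would pass as $1
print(RULE.search(miss))  # None: no Googlebot, no match, no action
```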

Fido assigns variables by name, so it doesn’t matter in which order you pass them to your program.

/var/log/httpd/joedog-access_log {
 rules = ^([0-9]+\.[0-9]+\.[0-9]+\.[0-9]+).*(Googlebot)
 action = /home/jeff/bin/googler $2 $1
}

Will run: /home/jeff/bin/googler Googlebot
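The substitution step can be sketched in Python as well. Each $N in the action is replaced by the Nth captured group, so the order in the action line is up to you. This illustrates the idea; it isn’t fido’s source. Quoting with shlex.quote is an added precaution, since log lines are written by strangers. The log line and address are hypothetical.

```python
import re
import shlex

def expand_action(action, groups):
    """Replace $1, $2, ... in an action string with shell-quoted captured groups.

    groups[0] is $1, groups[1] is $2, and so on. A reference to a group
    that doesn't exist is left untouched.
    """
    def sub(m):
        n = int(m.group(1))
        return shlex.quote(groups[n - 1]) if 1 <= n <= len(groups) else m.group(0)
    return re.sub(r"\$(\d+)", sub, action)

rule = re.compile(r"^([0-9]+\.[0-9]+\.[0-9]+\.[0-9]+).*(Googlebot)")
line = '192.0.2.10 - - [23/Nov/2014:06:16:24 -0500] "GET /blog/ HTTP/1.1" 200 57787 "-" "Googlebot/2.1"'
m = rule.search(line)
print(expand_action("/home/jeff/bin/googler $2 $1", m.groups()))
# /home/jeff/bin/googler Googlebot 192.0.2.10
```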

I’ll release version 1.1.4 after the documentation is up to date. In the meantime, early adopters can grab the code from the source repository.

You’ll need autotools on your system. Inside the fido directory, run this command to build the configure script:


Happy hacking.


Posted in Applications, Fido

Ted Cruz: Stupid or Evil?

Since this is a nerd blog, Your JoeDog tries to avoid politics. After all, who gives a shit what he thinks, amirite? As you may have guessed, that’s a segue. You know what follows that segue, don’tcha? That’s right: politics.

Today we will discuss a particular senator from the great state of Texas. Your nerdblogger has never been to Texas so he’s not sure if it’s great. Texans tell him, “Don’t Mess With Texas” so, um, “great state” it is!

Texas is represented by a particular senator whose mind droppings may interest the readers of this blog. Those mental farts provide useful insight into his character and the policies he’d like to enact. Unfortunately, they don’t tell us much about the senator himself, but this much is certain: Ted Cruz is either the dumbest motherfscker on the Internets or a diabolical genius.

A couple days ago, Cruz dropped this loaf on the Internets:

When you regulate a public utility, it calcifies it — it freezes it in place. Let’s give a simple contrast. The Telecommunications Act of 1934 was adopted to regulate these [pulls out an old rotary phone]. To put regulations in place and what happened? It froze everything in place. This (rotary phone) is regulated by Title II. [displays an iPhone] This is not.

That’s right, Senator Dingleberry claims the Communications Act of 1934 kept land lines in rotary stagnation while smart phones have thrived, free of government’s heavy hand. Funny thing. Smart phones are regulated under the same jurisdiction as rotary phones. Ever hear of the FCC? It was created by the — wait for it — Communications Act of 1934 and it has the power to regulate cell phone providers. As a sitting senator, he probably knows that. After all, he’s on the subcommittee of Communications, Technology, and the God Damn Internet — no, really, that’s the subcommittee’s actual name … we’re told, anyway.

Net Neutrality has nothing to do with changing the Internets. It’s about keeping them the same — exactly the same. It’s about ensuring ISPs can’t extract fees from content providers, throttle competitors or marginalize small enterprises like this little nerdblog. It’s about treating the Internets as a public utility. It’s about ensuring all content providers operate on an even playing field — may the best ones win.

Your JoeDog is told Ted Cruz is smart (he’s not sure if that’s true, but that’s what he’s told). If Senator Smarty Pants is indeed an intelligent man, then he’s lying to you in that video. He’s intentionally distorting reality in order to promote a political agenda. He’s creating “Death Panels” for the Internets in order to appease his political benefactors. And that, my nerd blog friends, is worse than being stupid.

Posted in Community

Ted Cruz (R-Retard) Gets Splained By The Oatmeal

Ted Cruz recently compared Net Neutrality with Obamacare. They’re exactly alike if by “exactly alike” you mean “have nothing in common.” The Oatmeal tries to explain why the analogy is flawed. That’s probably an impossible task. You can’t make a senator understand a concept if his financial contributions depend on his ignorance. Anyway, enjoy the strip….

[The Oatmeal: Dear Senator Ted Cruz]

UPDATE: Gizmodo weighs in 

UPDATE: Pr0n stars weigh in

Posted in Community

AI Reporting

Your JoeDog is a Big Fan of artificial intelligence. His pinochle game represents one foray into the field. The computer bids based on the results of past experience. Your JoeDog can’t predict how it will bid a particular hand. It looks for experiences that resemble its current hand, and it bids accordingly. Unfortunately, it still plays programmatically. As such, it can never be better than this nerd-blogger.
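That "find similar past hands and bid like them" approach is essentially a nearest-neighbor lookup. The pinochle game’s actual code isn’t shown here, so this Python sketch is purely hypothetical. A hand is reduced to a made-up feature tuple (meld points, trump count, aces), and the suggested bid is the average of the bids that worked for the k most similar remembered hands.

```python
import math

def suggest_bid(hand, memory, k=3):
    """Average the successful bids of the k most similar remembered hands.

    hand   -- hypothetical feature tuple, e.g. (meld_points, trump_count, aces)
    memory -- list of (feature_tuple, bid_that_worked) pairs
    """
    if not memory:
        raise ValueError("no experience to draw on")
    # Euclidean distance in feature space stands in for "resembles"
    nearest = sorted(memory, key=lambda rec: math.dist(hand, rec[0]))[:k]
    return sum(bid for _, bid in nearest) / len(nearest)

# Hypothetical past experience: (meld, trumps, aces) -> bid that made
memory = [
    ((20, 6, 2), 300),
    ((22, 7, 3), 320),
    ((4, 3, 1), 250),
    ((18, 6, 2), 290),
]
print(suggest_bid((21, 6, 2), memory))  # about 303.3
```

Because the bid comes from remembered outcomes rather than hand-written rules, its output is hard to predict from the code alone, which matches the behavior described above.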

CBS Sports and Yahoo are doing interesting things with AI. Their fantasy football sites use artificial intelligence to summarize millions of games each week. The software analyzes lots of data and composes articles much like a human reporter. It fails the Turing Test only as a matter of scale: a rational person soon realizes there aren’t enough humans on earth to produce that many articles by Tuesday morning.

There’s a more personal reason why Your JoeDog likes these cyber reporters: they think highly of his coaching skill:

Tonzie Crushers benefited from smart coaching this week. Coach Fulmer left Chris Johnson and Justin Hunter on the bench in favor of Frank Gore and Robert Woods, who were both expected to score less.

These great decisions boosted Tonzie Crushers’ final score by 22.1 points, which just made the final result that much more embarrassing. Putting Gore in the starting lineup also gained more points than any other coaching move this week, making it the Volkswagen Start of the Week.

Well, this week. Last week they thought he was a moran.


Posted in Programming


A physicist, an engineer and a programmer were driving down a mountain pass when the brakes failed. The car started to accelerate and they were soon screaming into the valley. Hanging on for dear life, they smacked the guard rails several times. Fortunately, they came across an escape lane and were able to navigate up the hill to a stop.

The physicist said, “We need to model temperatures resulting from friction to determine why the brakes failed.”

The engineer said, “I have a case of temperature sensors in the trunk.”

The programmer said, “Let’s not get ahead of ourselves. We need to get the car back up the mountain and see if the failure is reproducible.”



Posted in On The Job

So Which Is It?

A bug or a feature?



Posted in Uncategorized

My Dilbert Moment

This morning Your JoeDog received a form. Exciting! … wait a second. That’s not exciting. That’s more work!


He had to fill it out and deliver it within Large Corporate Bureaucracy. There were two different delivery options:

  1. Interoffice messenger
  2. Fax machine (they still exist for some reason)

The fax option contained these special instructions:

If sending via fax, do not send original. Retain a copy of the completed form for your records.


Posted in On The Job

Please Don’t Use Comments To Alter Functionality

“Holy shit!” Your JoeDog exclaimed.

“Why do you swear so much?” an emailer emailed this blog. “Young readers don’t need to be exposed to that.” Listen, if your kid is reading this site, then maybe it’s time to buy him a football. By the time he’s old enough to care about these topics, he’ll have already heard a lot of vulgar language….

“Holy shit!” Your JoeDog exclaimed. “That’s a code salad!”

Our enterprise backup guy is just like your enterprise backup guy. He’s involved with every system, every project and every meeting, yet all he does is put ones and zeros on tape. Generally he calls your attention to meaningless minutiae, but once a decade you learn of something important. Yesterday was once a decade. Backup informed Your JoeDog that the NetBackup client wasn’t installed on a new server.

“That seems unlikely,” Your JoeDog said. “Puppet puts it on every server.” Puppet is our configuration management server. It installs software and writes configurations to every server in the enterprise.

“That’s what I thought,” Backup said. “But it’s not there.”

To prove that Puppet puts it on every server, Your JoeDog showed him the code. We’ll examine that code after the jump.

Continue reading “Please Don’t Use Comments To Alter Functionality” »

Posted in On The Job | 2 Comments
