CTR Is Hard

Sproxy is a word Your JoeDog invented to describe his [S]iege [Proxy]. At the time of this writing, this site has the top three positions for ‘sproxy’ on Google. In the past week, nine hundred people typed ‘sproxy’ into the Google machine. Of those nine hundred, only 110 clicked a link to this site. That’s a 12.22% click-through rate for a made-up word that describes an esoteric piece of software that exists right on this very site. Let’s just say that falls a little below expectation….

Posted in Applications, Siege, Technology | Leave a comment



Fido 1.1.5 SIGHUP and Reload

Good morning, JoeDoggers. Let’s bask in the glow of Your Fido this morning; he’s all grown up and ready for love. What does that mean? Well, it means he now behaves like a modern daemon. Starting with version 1.1.5, if you send it SIGHUP, it will reload its configuration file.

Really? It’s been out since 2011 and you’re only adding that feature now?

Hey, what do you want from me? It’s free, isn’t it?

Here’s how it works: if you change fido’s configuration file, you can send it SIGHUP to reload its key = value pairs. There’s just one thing it won’t reload: its filenames.

Remember, a fido configuration file is divided into two parts: global settings and file settings. A file section starts with a filename followed by a pair of curly braces, {}, and its key = value pairs go between the braces. Here’s an example:

/usr/local/var/my.log {
 # key = value pairs go here.
}

So if you change /usr/local/var/my.log to anything else, you’ll have to restart fido. If you change any other values, then you can just send it SIGHUP.

So how do I send it SIGHUP?

There are several ways of doing this.

1.) You can look for the process ID (PID) with the ps command and send it SIGHUP (which is signal number 1):

# ps -aef | grep fido
root 31952 1 0 09:21 ? 00:00:00 /usr/sbin/fido -f /etc/fido/fido.conf
# kill -1 31952

2.) Check your system documentation. Some kill commands accept signal names in place of numbers, like this:

# ps -aef | grep fido
root 31952 1 0 09:21 ? 00:00:00 /usr/sbin/fido -f /etc/fido/fido.conf
# kill -HUP 31952

3.) We can eliminate the ps command by using fido’s pid file like this:

# kill -1 $(cat /var/run/fido.pid)

You can verify a successful config reload by looking at /var/log/messages.
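If you reload fido often, the pid-file approach can be wrapped in a small function that refuses to signal a missing, garbled, or stale pid. This is a minimal bash sketch, not something that ships with fido. The function name reload_fido is hypothetical. The default path /var/run/fido.pid comes from the example above, so adjust it if your system keeps the pid file elsewhere.

```bash
#!/usr/bin/env bash
# reload_fido [PIDFILE]: send SIGHUP to the process whose PID is stored in
# PIDFILE (default /var/run/fido.pid, an assumption taken from the post).
# Returns non-zero, with a message on stderr, if the pid file is missing,
# garbled, or stale.
reload_fido() {
  local pidfile=${1:-/var/run/fido.pid} pid

  if [ ! -r "$pidfile" ]; then
    echo "reload_fido: cannot read $pidfile" >&2
    return 1
  fi

  # Read the file and drop any whitespace or trailing newline.
  pid=$(<"$pidfile")
  pid=${pid//[[:space:]]/}

  # Accept only a plain, non-empty number.
  case $pid in
    ''|*[!0-9]*)
      echo "reload_fido: bad pid '$pid' in $pidfile" >&2
      return 1 ;;
  esac

  # kill -0 sends no signal; it only checks that the process exists and
  # that we may signal it, which catches a pid file left behind by a crash.
  if ! kill -0 "$pid" 2>/dev/null; then
    echo "reload_fido: cannot signal process $pid (not running, or not yours)" >&2
    return 1
  fi

  kill -HUP "$pid"
}
```

Run it as root, e.g. reload_fido /var/run/fido.pid, then look in /var/log/messages to confirm the reload.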

 [Download: Fido]

Posted in Applications, Fido, Release | Leave a comment



Fido 1.1.4 And Google’s Last Crawl

Your JoeDog can now announce the release of fido-1.1.4. To illustrate its awesome new feature, we’re going to do an exercise.

“Wait. Homework? This blog sucks!”

Yeah, but this homework is useful. We’re going to capture the time of Google’s last crawl so you can be a hero with the SEO nerds in your company.

“Okay, fine!”

We can identify the googlebot by its User-agent. Google conveniently refers to itself as Googlebot. Here’s an entry in Your JoeDog’s logs:

208.78.85.241 - - [23/Nov/2014:06:16:24 -0500] "GET /blog/ HTTP/1.1" 
   200 57787 "-" "Googlebot/2.X (+http://www.googlebot.com/bot.html)"

Unfortunately, anybody can masquerade as a googlebot. The only way we can be certain this agent is authentic is to check its IP address.

Pom $ dig -x 208.78.85.241
;; ANSWER SECTION:
241.85.78.208.in-addr.arpa. 1316 IN PTR host241.subnet-208-78-85.gigavenue.com.

Wait a second! That’s not Google, that’s a fraud. Let’s check another entry:

Pom $ dig -x 66.249.65.47
;; ANSWER SECTION:
47.65.249.66.in-addr.arpa. 11310 IN PTR crawl-66-249-65-47.googlebot.com.

Okay, that’s Google. So in order to validate and record the time of Google’s last crawl, we have to check the IP address. How do we achieve this?

We’ll use fido to check our logs for instances of Googlebot, but fido can’t validate the IP address. Our action program can do that, but how do we pass it the address? Prior to fido-1.1.4, that would have been impossible. Starting with 1.1.4, fido can do regex capture and pass the captured values to the action program.

To set this up, you’ll need to add a file block to fido.conf that points to your access_log.

/var/log/httpd/joedog-access_log {
 rules = ^([0-9]+\.[0-9]+\.[0-9]+\.[0-9]+).*Googlebot
 action = /home/jeff/bin/googler $1
}

When fido locates a match, it will capture everything inside the parentheses and send that to the googler script as $1. Here’s the googler script:

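#!/usr/bin/perl
use strict;
use warnings;
use Socket;              # inet_aton() and AF_INET
use Fcntl qw(:flock);    # LOCK_EX and LOCK_UN

# fido passes the captured IP address as the first argument
my $addr = $ARGV[0];
die "usage: $0 ip-address\n" if empty($addr);

# Reverse-resolve the address; a missing PTR record yields an empty name
my $packed = inet_aton($addr) or die "ERROR: Invalid address ($addr)\n";
my $host   = gethostbyaddr($packed, AF_INET) // '';
if ($host !~ /\.googlebot\.com$/) {
  print "ERROR: Forged User-agent ($addr => $host)\n";
  exit 1;
}

# Append the crawl time, holding an exclusive lock while writing
my $file = "/path/to/joedog.org/google.txt";
open(my $fh, '>>', $file) or die "Unable to open file: $file: $!\n";
flock($fh, LOCK_EX) or die "Unable to lock file: $file: $!\n";
print {$fh} timestamp() . " | $addr\n";
flock($fh, LOCK_UN);
close($fh);
exit 0;

# returns a string in the following format:
# MM/DD/YYYY HH:MM:SS | YYYYMMDDHHMMSS
sub timestamp {
  my @date = localtime(time);
  $date[5] += 1900;
  $date[4] += 1;
  my $stamp = sprintf(
    "%02d/%02d/%04d %02d:%02d:%02d",
    $date[4], $date[3], $date[5], $date[2], $date[1], $date[0]
  );
  $stamp .= " | ";
  $stamp .= sprintf(
    "%04d%02d%02d%02d%02d%02d",
    $date[5], $date[4], $date[3], $date[2], $date[1], $date[0]
  );
  return $stamp;
}

# true when the argument is undefined or an empty string
sub empty { ! defined $_[0] || ! length $_[0] }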

As you can imagine, there are lots of creative things you can do with the googler script. Your JoeDog hopes to compare crawl frequencies against the site’s freshness to see if there’s a correlation.

[JoeDog: Last Google Crawl]

UPDATE:  As Tim notes in the comments below, the Internets are full of jerks.

Consider this hostname: haha.googlebot.com.fooledyou.ru

The regex has been changed so that the fully qualified hostname must end in .googlebot.com; the $ anchor in /\.googlebot\.com$/ rejects a name like that one.
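Even an anchored suffix check only trusts the PTR record. Whoever controls an IP block also controls its PTR records, so a jerk can make his address reverse-resolve to something ending in .googlebot.com. Google’s instructions for verifying Googlebot add a forward lookup: resolve the PTR name and confirm it points back at the same IP. Here’s a minimal bash sketch of that check. It uses dig, as in the examples above. The function names are hypothetical, and it assumes IPv4 addresses (A records).

```bash
#!/usr/bin/env bash
# is_googlebot_host NAME: succeed only if NAME ends in .googlebot.com.
is_googlebot_host() {
  local name=${1%.}        # dig prints names with a trailing dot
  name=${name,,}           # DNS names are case-insensitive
  case $name in
    *.googlebot.com) return 0 ;;   # the whole name must end here
    *) return 1 ;;                 # e.g. haha.googlebot.com.fooledyou.ru
  esac
}

# verify_googlebot IP: reverse lookup, domain check, then forward lookup.
# Assumes dig is installed and IP is an IPv4 address.
verify_googlebot() {
  local ip=$1 host
  # Step 1: PTR lookup; take the first name if there are several.
  host=$(dig +short -x "$ip" | head -n 1)
  [ -n "$host" ] || return 1
  # Step 2: the PTR name must be in googlebot.com.
  is_googlebot_host "$host" || return 1
  # Step 3: that name must resolve back to the same address; otherwise
  # the PTR record was simply made up by whoever owns the IP block.
  dig +short A "${host%.}" | grep -qxF -- "$ip"
}
```

For example, verify_googlebot 66.249.65.47 && echo genuine. The googler script could do the equivalent of step 3 after its gethostbyaddr call.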

Posted in Applications, Fido | 1 Comment



Fido Now Supports Regex Capture and Back References

It’s a good thing Your JoeDog is not very bright. If he were an intelligent man, then he’d realize the difficulty of the task he was about to undertake before he undertook it. He would look at that task and think, “I’m not good enough to code that.” Fortunately, he’s not very bright and that means fido has a new feature.

New feature? Exciting!

Beginning with version 1.1.4, fido can do regex capture and pass the back references to the program it runs. Let’s look at an example config to help us make some sense of this. The new feature is the parenthesized capture group in the rules line and the $1 it fills in on the action line.

/var/log/httpd/joedog-access_log {
 rules = ^([0-9]+\.[0-9]+\.[0-9]+\.[0-9]+).*GoogleBot
 action = /home/jeff/bin/googler $1
}

In this exercise, we want to find instances of GoogleBot in the access log, but we also want to verify that it’s actually the GoogleBot and not a crawler with a forged User-agent. To accomplish that, we want to capture the IP address and send it to our program for validation.

So let’s look at the rule:

^([0-9]+\.[0-9]+\.[0-9]+\.[0-9]+).*GoogleBot

If a line begins with a quartet of dot-separated numbers, i.e., an IP address, and it contains GoogleBot, then we have a match. Notice that the IP address is wrapped in parentheses? That’s our capture. Everything within the parens will be assigned to $1. (BTW: Writing that was a royal PITA.) If we had two sets of parens, then the second set would be assigned to $2.

Fido assigns variables by name, so it doesn’t matter in which order you pass them to your program.

/var/log/httpd/joedog-access_log {
 rules = ^([0-9]+\.[0-9]+\.[0-9]+\.[0-9]+).*(GoogleBot)
 action = /home/jeff/bin/googler $2 $1
}

Will run: /home/jeff/bin/googler GoogleBot 198.14.14.6
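If you’re curious what happens to those numbered variables, bash can mimic the idea with its =~ operator and the BASH_REMATCH array. This is an illustration only, not fido’s implementation. expand_action is a hypothetical name, and the sketch assumes the rule is a POSIX extended regex that bash’s =~ accepts.

```bash
#!/usr/bin/env bash
# expand_action RULE ACTION LINE
# If LINE matches the extended regex RULE, replace $1, $2, ... in ACTION
# with the matching capture groups and print the resulting command line.
# Returns 1 (and prints nothing) when LINE does not match.
expand_action() {
  local rule=$1 action=$2 line=$3 i
  [[ $line =~ $rule ]] || return 1
  # BASH_REMATCH[0] is the whole match; [1], [2], ... are the groups.
  # Substitute the highest-numbered group first so that $1 cannot
  # clobber the start of a $10 reference.
  for (( i = ${#BASH_REMATCH[@]} - 1; i >= 1; i-- )); do
    action=${action//"\$$i"/"${BASH_REMATCH[i]}"}
  done
  printf '%s\n' "$action"
}
```

With the two-group rule and the action googler $2 $1 from the config above, a matching log line from 198.14.14.6 makes expand_action print /home/jeff/bin/googler GoogleBot 198.14.14.6.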

I’ll release version 1.1.4 after the documentation is up-to-date. In the meantime, early adopters can grab the code from the source repository.

You’ll need autotools on your system. Inside the fido directory, run this command to build the configure script:

utils/bootstrap

Happy hacking.

Posted in Applications, Fido | Leave a comment



Referer Spammers

The Internets are full of spam. Maybe you’ve noticed?

It’s in your inbox, in your comments and scattered throughout your web forums. Every spammer is a bag of dicks but the worst bottom feeder on the Internets is the referer spammer.

If you’ve never administered a website, then you’ve probably never heard of referer spam. Yeah, what is that?  Glad you asked. These dregs send requests to your web site with a fabricated referer that points to a site they want to advertise. Ideally, they’ll send requests to a site that publishes its traffic reports. When their URL makes the report, they get a free link back to their site.

Sites that publish their usage reports are easy to find. Put this in the Google Machine and see what pops up: “Top * Total Search Strings”. This is what we’re looking for: Usage Stats: Top Referers. Your JoeDog can get himself on that report by doing this:

Bully $ siege -H "Referer: http://www.joedog.org/" -g http://www.pickart.at/
HEAD / HTTP/1.0
Host: www.pickart.at
Accept: */*
User-Agent: Mozilla/5.0 (unknown-x86_64-linux-gnu) Siege/3.0.8
Referer: http://www.joedog.org/
Connection: close
HTTP/1.1 200 OK
Date: Fri, 03 Oct 2014 17:53:38 GMT
Server: Apache
Connection: close
Content-Type: text/html

Now if he’s really intent on making that report, he’ll repeat that request a few hundred times and place himself at number two on the chart. But here’s the thing: Referer Spammers will spam your logs even if you don’t publish your reports. They’ll go to all that trouble just to lure webmasters to their esoteric fetish sites.
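Under the hood, a Top Referers report is just a tally of the referer field, which is exactly the number a referer spammer inflates with requests like the siege command above. Here’s a minimal bash sketch. It assumes Apache’s combined log format, where the referer is the second double-quoted field on each line. top_referers is a hypothetical name.

```bash
#!/usr/bin/env bash
# top_referers LOG [N]: print the N most frequent referers (default 10)
# from an Apache access log in the "combined" format, skipping "-".
top_referers() {
  local log=$1 n=${2:-10}
  # Splitting each line on double quotes makes field 4 the Referer value.
  awk -F'"' '$4 != "" && $4 != "-" { print $4 }' "$log" |
    sort | uniq -c | sort -rn | head -n "$n"
}
```

Running something like top_referers /var/log/httpd/joedog-access_log 20 against your own logs is a quick way to see which junk domains are trying to buy their way onto that list.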

So what can you do to prevent this stuff? Mostly you can decrease their incentive.

  1. Put your usage stats inside a password protected area
  2. Add a robots.txt with a bot exclusion rule so search engines don’t index it.
  3. Add rel="nofollow" to every link so search engines don’t pass it any ranking credit

I guarantee you’ll still get the stuff. They’ll send fake referrals just to capture the attention of the site’s administrators, but at least you won’t reward them with a boost to their PageRank.

NOTE: Yes, Your JoeDog spelled Referrer with only two r’s. Most humans use three. Phillip Hallam-Baker is not most humans. He was the first guy to miss an ‘r’ in the original HTTP specification. I say, “first guy” because hundreds of eyeballs viewed that document and none of them noticed the misspelling. By the time it became RFC1945, “Referer” was set in stone. It would have been easier to change the world’s English-language dictionaries at that point….

Posted in Apache, Applications, Security | Leave a comment



So Are You Vulnerable To Shell-shock?

Here’s a quick command line test to see if you’re vulnerable to shell-shock, the bash vulnerability that everyone — I mean everyone — is talking about:

$ env x='() { :;}; echo 1. env' bash -c "echo 2. bash"

If your bash is vulnerable, it will execute the echo command hidden inside the environment variable. If it’s not vulnerable, it will only execute the command after -c.

A vulnerable system prints this:

$ env x='() { :;}; echo 1. env' bash -c "echo 2. bash"
1. env
2. bash

A non-vulnerable system prints this:

$ env x='() { :;}; echo 1. env' bash -c "echo 2. bash"
2. bash

On the vulnerable system, bash executes the command that trails the function definition in the environment variable as soon as the shell starts:

env x='() { :;}; echo 1. env' bash -c "echo 2. bash"

The trailing echo 1. env, which sits after the function’s closing brace, should NOT be executed. That’s a bug; it needs to be fixed.

NOTE: The second command was run on the server that hosts this blog entry. You guys can quit trying, mmmkay?
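To check a particular bash binary, or to script the test across several machines, you can wrap it in a function that prints a verdict. Here’s a minimal bash sketch. shellshock_check is a hypothetical name. It only exercises the original Shellshock bug (CVE-2014-6271), not the related bash CVEs disclosed in the days that followed.

```bash
#!/usr/bin/env bash
# shellshock_check [BASH]: run the env-variable test against a bash binary
# (default: the first "bash" on PATH) and print a verdict.
# Returns 2 if the shell could not be run at all.
shellshock_check() {
  local shell=${1:-bash} out
  # A vulnerable bash runs the command after the function definition.
  out=$(env x='() { :;}; echo VULNERABLE' "$shell" -c 'echo test' 2>/dev/null)
  case $out in
    *VULNERABLE*) echo "vulnerable" ;;
    *test*)       echo "not vulnerable" ;;
    *)            echo "unknown (could not run $shell)"; return 2 ;;
  esac
}
```

For example, shellshock_check /bin/bash prints either vulnerable or not vulnerable.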

Posted in Applications, Security, sh | Leave a comment



Rear Recovery Onto Different Hardware

Your JoeDog still likes rear.

He uses it for bare metal recovery and system cloning. Recently he had to clone one server onto older hardware as part of a disaster recovery exercise. It was problematic.

Problem one: The rear recovery disk could not connect to the network.

This system had bonded NICs and Your JoeDog started to suspect they were causing an issue. When the recovery disk booted, he brought down all the network interfaces and tried to assign a new address to the server. The routing table looked fine. The eth0 config looked fine, but the network was unreachable.

Acting on a hunch that bonded NICs were giving him fits, Your JoeDog did a recursive grep of the rear directory …

… wait a minute, what’s a recursive grep?
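You can do it like this:

$ find /usr/share/rear -type f -print0 | xargs -0 grep -Ei bond

If you have GNU grep, grep -ri bond /usr/share/rear does the same thing in one command.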

Cool, thanks …

Anyway, that search turned up this feature: SIMPLIFY_BONDING. With a little more digging, he discovered that it takes ‘y’ or ‘n’, so Your JoeDog added this directive to local.conf and re-archived the server:

SIMPLIFY_BONDING=y

When the server booted from the new recovery disk, the only network interface was eth0. Your JoeDog reset that address with ifconfig and he was able to clone the server from his rear archive. SUCCESS!!!!

Problem two: No success! After the rear recovery, the kernel panicked and the server wouldn’t boot. Unhappy sad time.

Your JoeDog was all, “Hmmm I’ll bet I need to rebuild the kernel for new hardware….”

So he restored again from rear. This time, when the recovery was complete, he chroot’d into the mount point and rebuilt the initrd, the initial RAM disk that carries the drivers the kernel needs to boot.

… wait a minute! How do you do that?
Glad you asked. Here’s my command history:

$ chroot /mnt/local
$ export PATH=/sbin:/bin:/usr/sbin:/usr/bin
$ cd /boot
$ mkinitrd -f -v initrd-2.6.32-431.20.3.el6.x86_64kdump.img \
                 2.6.32-431.20.3.el6.x86_64

NOTE: Whatever you call the kernel, i.e., whatever you use for the second argument of mkinitrd, make sure you have a directory by the same name in /lib/modules, i.e., /lib/modules/2.6.32-431.20.3.el6.x86_64

DOUBLE NOTE: Once you’re inside /boot, do an ls to find available kernel images. They’ll begin with initrd- and end in .img
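Before running mkinitrd, it helps to confirm that the version string you’re about to pass actually has a modules directory. Here’s a minimal bash sketch. list_kernels and check_kernel are hypothetical helper names. ROOT is whatever path the restored system lives under: / inside the chroot, or /mnt/local from the rear rescue shell.

```bash
#!/usr/bin/env bash
# list_kernels [ROOT]: print every kernel version that has a modules
# directory under ROOT/lib/modules (ROOT defaults to /).
list_kernels() {
  local root=${1:-/} d
  for d in "$root"/lib/modules/*/; do
    [ -d "$d" ] || continue          # the glob matched nothing
    d=${d%/}
    printf '%s\n' "${d##*/}"
  done
}

# check_kernel ROOT VERSION: succeed only if ROOT/lib/modules/VERSION exists,
# i.e. VERSION is a sensible second argument for mkinitrd.
check_kernel() {
  local root=$1 version=$2
  [ -n "$version" ] && [ -d "$root/lib/modules/$version" ]
}
```

From inside the chroot, check_kernel / 2.6.32-431.20.3.el6.x86_64 && echo ok confirms the name before you hand it to mkinitrd.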

Now get yourself some rear.

Posted in Applications, Rear | Leave a comment



Shellshocked

Wired provides an interesting angle on the bash shell bug that has all your panties in a bind:

[Brian] Fox drove those tapes to California and went back to work on Bash, other engineers started using the software and even helped build it. And as UNIX gave rise to GNU and Linux—the OS that drives so much of the modern internet—Bash found its way onto tens of thousands of machines. But somewhere along the way, in about 1992, one engineer typed a bug into the code. Last week, more than twenty years later, security researchers finally noticed this flaw in Fox’s ancient program. They called it Shellshock, and they warned it could allow hackers to wreak havoc on the modern internet.

[Wired: The Internet Is Broken]

Posted in Applications, Programming, sh | Leave a comment



Is Hardware Outpacing Software Or Is It The Other Way Around?

Here’s an interesting experiment.

After hearing two strong players argue that the only real progress in chess engines in the last ten years was due to faster computers, a special match was played to challenge this idea. Komodo 8 ran on a smartphone while a top engine of 2006 used a modern i7 computer that runs 50 times faster. This is the difference between Usain Bolt and the Concorde. Guess what happened?

Posted in Applications | Leave a comment



Fido 1.1.3

Your JoeDog had a requirements change. “Stupid requirements!” He had to ensure each file in a directory and all its sub-directories was less than eight days old. Unfortunately, Your Fido didn’t traverse directory trees. He stood watch only at the top of the tree.

That’s the problem with dogs: they have a mind of their own.

Without much effort, fido learned a new trick. It now recursively searches a directory for files. To leverage this feature, you’ll have to give it a command. “Recurse, boy, recurse!”

/export {
 rules = exceeds 7 days
 exclude = ^\.|CVS|Makefile
 action = /usr/local/bin/sendtrap.sh
 recurse = true
}

recurse takes one of two values, true or false. True means search the tree and false means remain at the top level. If you don’t set a recurse directive, then fido will treat it as false, i.e., it will remain in the top directory.
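With recurse = true, fido’s check resembles what find does when it walks a tree. The sketch below approximates the /export example with find. It is not fido’s code, and stale_files is a hypothetical name. It also guesses that exclude is matched against each file’s base name. Note that find’s -mtime +7 counts whole days, so it flags files that are at least eight days old, which lines up with the “less than eight days old” requirement.

```bash
#!/usr/bin/env bash
# stale_files DIR DAYS RECURSE EXCLUDE
# Print regular files under DIR whose age, truncated to whole days, is
# greater than DAYS (find -mtime +DAYS), skipping any file whose base name
# matches the extended regex EXCLUDE. RECURSE=true descends into
# sub-directories; anything else stays at the top level.
stale_files() {
  local dir=$1 days=$2 recurse=$3 exclude=$4 path name
  local depth=()
  [ "$recurse" = true ] || depth=(-maxdepth 1)

  find "$dir" "${depth[@]}" -type f -mtime +"$days" -print |
    while IFS= read -r path; do
      name=${path##*/}
      # Skip names that match the exclude pattern, e.g. '^\.|CVS|Makefile'
      if [ -n "$exclude" ] && printf '%s\n' "$name" | grep -Eq -- "$exclude"; then
        continue
      fi
      printf '%s\n' "$path"
    done
}
```

stale_files /export 7 true '^\.|CVS|Makefile' lists the offenders; swap true for false to stay at the top level.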

[Trending: Fido-1.1.3]

Posted in Applications, Fido, Release | Leave a comment


