It Knows When You Are Sleeping

lurchMohammad Sabah is a former Netflix data scientist who worked on that company’s movie recommendation engine. He brought some of that technology to Workday, an HR software manufacturer. So what’s the result of a marriage between movie recommendations and human resource management? It’s an application called Workday Talent Insights.

What’s that do? It’s kind of cool, actually. The program predicts which employee is likely to quit and what can be done to retain them. It’s not clear how many inputs it evaluates but Your JoeDog can envision an applicable scenario. Let’s say — and why not? — a really tall employee dislikes being known as “Lurch.” The application’s corrective action could be “Stop calling him that!”

 



Block IP Addresses With, um, block

Last night we told you that ISIS assholes were attacking America and its WordPress blogs. Have no fear, kids. In the War Against Internet Dickbags, Your JoeDog is here to help. Whenever he’s attacked by these people, i.e., all the time, he grabs their IP addresses from the logs and blocks their asses. He feeds those addresses to a script called block which, um, blocks them.

The script is just a convenience which makes your life easier. iptables does the heavy lifting. If you’re running a linux system — and why aren’t you? — then you probably have iptables.

Let’s block some dickwads now! (Continued after the jump)

Continue reading Block IP Addresses With, um, block



Who’s Honoring Me Now?

Your JoeDog’s Fido was selected by Profit Bricks as its 24th Best Free Sysadmin Tool

About Fido:

A multi-threaded file watch utility, Fido can monitor a file or directory to see if its modification time changed, and it’s intuitive enough to recognize a change even if the daemon was down at the time of the change. Written for RedHat Enterprise Linux, Fido should run in all Linux flavors, and the source bundle contains RedHat init.d scripts for users’ convenience.

Key Features:

  • Content changes can be recognized with regular expression pattern matches
  • Monitors files for changes in content or modification times
  • It will kick off a user-defined script when it notices a change
  • Coded to POSIX 1003.1 standards


Pinochle 1.0.7

Yesterday was Christmas and you know what? You can’t get a stinkin’ slice of pizza on Christmas. Hardly anything is open on December 25th. Christmas is basically house arrest.

To kill the time between Christmas morning and the grand re-opening of his coffee shop, Your JoeDog played a little pinochle. It’s a pretty good game. The computer bids well and plays a reasonably strong hand but — WTF? — the bid dialog box moves all over the place.

Each time you submit a bid, the dialog moves so you can’t just click a second time without moving your mouse. Well that’s annoying. Indeed. Your JoeDog fixed that yesterday. House arrest ended today….

Life is good.

[JoeDog: Pinochle-1.0.7]

 



Siege 3.0.9

What’s Your JoeDog doing now? He’s knee-deep in old C code. This code generates software that calculates the optimum way to cut sheets of linoleum as they roll off a production line. Aren’t you glad you asked? How old is this code? It was last updated in 1999 when it was ported to HP-UX.

You know how an old song can take you back — sometimes to a good place, sometimes to hell? Old code works like that. This project was coded by other humans, but Your JoeDog sees his own flaws in it. He sees techniques that remind him to hang himself back in 1999.

Nobody codes like that anymore. There’s a reason why we’ve abandoned some techniques in favor of others. For the past two weeks, Your JoeDog has been dereferencing variables, debugging memory leaks and trying to figure out what’s whacking his stack. Context is everything, people. In this one, you don’t want anything whacking your stack.

Now siege already encapsulates much of his current programming philosophy. It’s written in C but it relies on object-oriented architecture. If you encapsulate memory management it makes it easier to pinpoint flaws.

Unfortunately, his personal projects haven’t kept up with industry standards. This coding experience has prompted him to fix his sins before they become unmanageable. Your JoeDog updated to gcc-4.7.4 and he watched the warnings fly! This version fixes all of those warnings. There’s nothing sexy about it but you should probably upgrade anyway.

[JoeDog: Siege-3.0.9]

 

 



CTR Is Hard

Sproxy is a word Your JoeDog invented to describe his [S]iege [Proxy]. At the time of this writing, this site has the top three positions for ‘sproxy’ on Google. In the past week, nine hundred people typed ‘sproxy’ into the Google machine. Of those nine hundred, only 110 clicked a link to this site. That’s a 12.22% click-through rate for a made-up word that describes an esoteric piece of software that exists right on this very site. Let’s just say that falls a little below expectation….

 

 

 



Fido 1.1.5 SIGHUP and Reload

Good morning, JoeDoggers. Let’s bask in the glow of Your Fido this morning; he’s all grown up and ready for love. What does that mean? Well, it means it now behaves like a contemporary modern daemon. Starting with version 1.1.5, if you send it SIGHUP, it will reload its configuration file.

Really? It’s been out since 2011 and you’re only adding that feature now?

Hey, what do you want from me? It’s free, isn’t it?

Here’s how it works: if you change fido’s configuration file, you can send it SIGHUP to reload its key = value pairs. There’s just one thing it won’t reload: its filenames.

Remember, a fido configuration file is divided into two parts; it contains global settings and file settings. The file settings are distinguished by a filename followed by two brackets like this: {}. Here’s an example:

/usr/local/var/my.log {
 # key = value pairs go here.
}

So if you change /usr/local/var/my.log to anything else, you’ll have to restart fido. If you change any other values, then you can just send it SIGHUP.

So how do I send it SIGHUP?

There’s several ways of doing this.

1.) You can look for the process ID (PID) with the ps command and send it SIGHUP (which is signal number 1):

# ps -aef | grep fido
root 31952 1 0 09:21 ? 00:00:00 /usr/sbin/fido -f /etc/fido/fido.conf
# kill -1 31952

2.) Check your system documentation. Some kill commands support name values such as this:

# ps -aef | grep fido
root 31952 1 0 09:21 ? 00:00:00 /usr/sbin/fido -f /etc/fido/fido.conf
Pom # kill -HUP 31952

3.) We can eliminate the ps command by using fido’s pid file like this:

# kill -1 $(cat /var/run/fido.pid)

You can verify a successful config reload by looking at /var/log/messages.

 [Download: Fido]

 



Fido 1.1.4 And Google’s Last Crawl

Your JoeDog can now announce the release of fido-1.1.4. To illustrate its awesome new feature we’re going to do an exercise.

“Wait. Homework? This blog sucks!”

Yeah, but this homework but this is useful. We’re going to capture the time of Google’s last crawl so you can be a hero with the SEO nerds in your company.

“Okay, fine!”

We can identify the googlebot by its User-agent. Google conveniently refers to itself as Googlebot. Here’s an entry in Your JoeDog’s logs:

208.78.85.241 - - [23/Nov/2014:06:16:24 -0500] "GET /blog/ HTTP/1.1" 
   200 57787 "-" "Googlebot/2.X (+http://www.googlebot.com/bot.html)"

Unfortunately, anybody can masquerade as a googlebot. The only way we can be certain this agent is authentic is to check its IP address.

Pom $ dig -x 208.78.85.241
;; ANSWER SECTION:
241.85.78.208.in-addr.arpa. 1316 IN PTR host241.subnet-208-78-85.gigavenue.com.

Wait a second! That’s not Google, that’s a fraud. Let’s check another entry:

Pom $ dig -x 66.249.65.47
;; ANSWER SECTION:
47.65.249.66.in-addr.arpa. 11310 IN PTR crawl-66-249-65-47.googlebot.com.

Okay, that’s Google. So in order to validate and record the time of Google’s last crawl, we have to check the IP address. How do we achieve this?

We’ll use fido to check our logs for instances of Googlebot but it can’t validate the IP address. Our action program can do that but how do we pass it the address?  Prior to fido-1.1.4, that would have been impossible. Starting with 1.1.4 it can now do regex capture and pass those variables to the action program.

To set this up, you’ll need to a file block in fido.conf which points to your access_log.

/var/log/httpd/joedog-access_log {
 rules = ^([0-9]+\.[0-9]+\.[0-9]+\.[0-9]+).*GoogleBot
 action = /home/jeff/bin/googler $1
}

When fido locates a match, it will capture everything inside the parentheses and send that to the googler script as $1. Here’s the googler script:

#!/usr/bin/perl
use Socket;
use strict;
use vars qw($LOCK_EX $LOCK_UN);
$LOCK_EX = 2;
$LOCK_UN = 8;
my $addr = $ARGV[0];
my $host = gethostbyaddr(inet_aton($addr), AF_INET);
if ($host !~ /\.googlebot\.com$/) {
 print "ERROR: Forged User-agent ($host)\n";
 exit;
}
my $file = "/path/to/joedog.org/google.txt";
open (FILE, ">>$file") or die "Unable to open file: $file\n";
flock(FILE, $LOCK_EX);
print FILE timestamp()." | $addr\n";
flock(FILE, $LOCK_UN);
exit;
# returns a string in the following format:
# YYYYMMDDHHMMSS
sub timestamp() {
 my $now = time;
 my @date = localtime $now;
 $date[5] += 1900;
 $date[4] += 1;
 my $stamp = sprintf(
 "%02d/%02d/%04d %02d:%02d:%02d",
 $date[4],$date[3],$date[5], $date[2], $date[1], $date[0]
 );
 $stamp .= " | ";
 $stamp .= sprintf(
 "%04d%02d%02d%02d%02d",
 $date[5],$date[4],$date[3], $date[2], $date[1], $date[0]
 );
 return $stamp;
}
sub empty { ! defined $_[0] || ! length $_[0] }

As you can imagine, there’s lots of creative things you can do with the googler script. Your JoeDog hopes to compare crawl frequencies against the site’s freshness to see if there’s a correlation.

[JoeDog: Last Google Crawl]

UPDATE:  As Tim notes in the comments below, the Internets are full of jerks.

Consider this hostname: haha.googlebot.com.fooledyou.ru

The regex has been changed so that the fully qualified hostname must end in .com

 

 

 



Fido Now Supports Regex Capture and Back References

It’s a good thing Your JoeDog is not very bright. If he were an intelligent man, then he’d realize the difficulty of the task he was about to undertake before he undertook it. He would look at that task and think, “I’m not good enough to code that.” Fortunately, he’s not very bright and that means fido has a new feature.

New feature? Exciting!

Beginning with version 1.1.4, fido can do regex capture and pass the back references to the program it should run. Let’s look at an example config to help us make some sense of this. The new feature is highlighted in red.

/var/log/httpd/joedog-access_log {
 rules = ^([0-9]+\.[0-9]+\.[0-9]+\.[0-9]+).*GoogleBot
 action = /home/jeff/bin/googler $1
}

In this exercise, we want to find instances of GoogleBot in the access log, but we also want to verify that it’s actually the GoogleBot and not a crawler with a forged User-agent. To accomplish that, we want to capture the IP address and send it to our program for validation.

So let’s look at the rule:

^([0-9]+\.[0-9]+\.[0-9]+\.[0-9]+).*GoogleBot

If a line being with an quartet of dot separated numbers, i.e., an IP address, and it contains GoogleBot, then we have a match. Notice that IP address is wrapped in parentheses? That’s our capture. Everything within the parens will be assigned to $1. (BTW: Writing that was a royal PITA.) If we had two sets of parens, then the second set would be assigned to $2.

Fido will assign variables by name so it doesn’t matter which order you pass them to your program.

/var/log/httpd/joedog-access_log {
 rules = ^([0-9]+\.[0-9]+\.[0-9]+\.[0-9]+).*(GoogleBot)
 action = /home/jeff/bin/googler $2 $1
}

Will run: /home/jeff/bin/googler GoogleBot 198.14.14.6

I’ll release version 1.1.4 after the documentation is up-to-date. In the meantime, early adapters can grab the code from the source repository.

You’ll need autotools on your system. Inside the fido directory, run this command to build the configure script:

utils/bootstrap

Happy hacking.

 



Referer Spammers

The Internets are full of spam. Maybe you’ve noticed?

It’s in your inbox, in your comments and scattered throughout your web forums. Every spammer is a bag of dicks but the worst bottom feeder on the Internets is the referer spammer.

If you’ve never administered a website, then you’ve probably never heard of referer spam. Yeah, what is that?  Glad you asked. These dregs send requests to your web site with a fabricated referer that points to a site they want to advertise. Ideally, they’ll send requests to a site that publishes its traffic reports. When their URL makes the report, they get a free link back to their site.

Sites that publish their usage reports are easy to find. Put this in the Google Machine and see what pops up: “Top * Total Search Strings” This is what we’re looking for: Usage Stats: Top Referers.  Your JoeDog can get himself on that report by doing this:

Bully $ siege -H "Referer: http://www.joedog.org/" -g http://www.pickart.at/
HEAD / HTTP/1.0
Host: www.pickart.at
Accept: */*
User-Agent: Mozilla/5.0 (unknown-x86_64-linux-gnu) Siege/3.0.8
Referer: http://www.joedog.org/
Connection: close
HTTP/1.1 200 OK
Date: Fri, 03 Oct 2014 17:53:38 GMT
Server: Apache
Connection: close
Content-Type: text/html

Now if he’s really intent on making that report, he’ll repeat that request a few hundred times and place himself at number two on the chart. But here’s the thing: Referer Spammers will spam your logs even if you don’t publish your reports. They’ll go to all that trouble just to lure webmasters to their esoteric fetish sites.

So what can you do to prevent this stuff? Mostly you can decrease their incentive.

  1. Put your usage stats inside a password protected area
  2. Add a robots.txt with a bot exclusion rule so search engines don’t index it.
  3. Add a nofollow directive inside every link, again so engines don’t index them

I guarantee you’ll still get the stuff. They’ll send faked referrals just to capture the attention of the site’s administrators but at least you won’t award them with a boost to their Page Rank.

NOTE: Yes, Your JoeDog spelled Referrer with only two r’s. Most humans use three. Phillip Hallam-Baker is not most humans. He was the first guy to miss an ‘r’ in the original HTTP specification. I say, “first guy” because hundreds of eyeballs viewed that document and none of them noticed the misspelling. By the time it became RFC1945, “Referer” was set in stone. It would have been easier to change the world’s English-language dictionaries at that point….