Nerd Splaining Large Numbers

Holy shit — the Economist really outdid itself. What now? In this post, they explained why Gangnam Style will break YouTube’s view counter. They used 3726 characters and 612 words to explain that computer integers don’t go on forever. When the Gangnam Style counter reaches 2,147,483,647 it will stop counting. Why?

Integers are stored as a series of ones and zeros. A 32-bit integer can only hold a value that fits in 32 consecutive ones and zeros. Go to this binary to decimal calculator and put 32 ones in the binary field. Press “Calculate” and you’ll get this answer: 4294967295.

But the Gangnam Style counter is maxed at half of that? How come? That’s because the counter is a signed integer: one of the 32 bits marks the sign, which leaves 31 bits for the value. The range falls above and below zero, i.e., from -2,147,483,648 to 2,147,483,647. Gangnam Style is approaching the upper bound.

If YouTube switched to a 64-bit integer they could capture up to 9 quintillion views.
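For anyone who would rather compute the limits than take a calculator's word for it, here is a minimal sketch in Python. Python's own integers never overflow, so the fixed widths are modelled with plain arithmetic; the function names are made up for this illustration.

```python
# Sketch only: Python integers never overflow, so the fixed widths are
# modelled with plain arithmetic. Both function names are illustrative.

def unsigned_max(bits):
    """Largest value that fits when all `bits` bits are ones."""
    return 2 ** bits - 1

def signed_range(bits):
    """(lowest, highest) value of a two's-complement integer of `bits` bits."""
    half = 2 ** (bits - 1)
    return -half, half - 1

print(int("1" * 32, 2))    # what the binary calculator does with 32 ones
print(unsigned_max(32))
print(signed_range(32))
print(signed_range(64))
```

The 32-bit range matches the figures above, and the 64-bit upper bound works out to 9,223,372,036,854,775,807, which is a little over 9 quintillion.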

Remember kids, there are 10 kinds of people in this world. Those who understand binary numbers and those who don’t.

[Economist: Wordy Word Words on Computer Integers]

 

Posted in Programming, Tech Media, Technology



Nobody Ever Typed ‘-1966631820’ Into The Internet

Your JoeDog was debugging C code. Not just any C code, but C code that was last updated in 2001 by a man who’s now retired. Or maybe he’s dead — the point is he can’t be consulted.

Well, sir, this code was inserting 4 billion and change into a field that expected 1 or 0. The insert was based on a result from a previous query. Your JoeDog debugged that variable and determined it was -1966631820. Hoping that number would shed light on his problem, he plugged it into the Internets.

As of 13:22:05 EST, no human has ever typed that into the Internets. Sensing an opportunity to monopolize a keyword, Your JoeDog typey-typed and added this: -1966631820

UPDATE: Couple things. 1.) A JoeDogger says that Google excludes from its results any term prefaced with a minus sign. 2.) Your JoeDog removed the minus and tried again. A minute after publication, he had captured the number one spot on Google for the keyword ‘1966631820’.

 

Posted in On The Job



In Praise of Default Values

Your JoeDog likes options. He feels that if a program takes a variable value, that value should be configurable. A programmer can spend a great deal of time selecting the perfect socket timeout, but unless the user works in the same environment it’s not necessarily perfect for them.

On the occasions when Your JoeDog uses Windows, he finds himself struggling to make the software do his bidding. It takes time to add another text field to a Windows GUI, so developers tend to limit the number of configurable options.

At the same time, he hates complicated software. You shouldn’t need a computer science PhD in order to configure scheduling software. Yet it’s impossible to use Tivoli’s workload scheduler and not feel completely overwhelmed. It can take days to set up.

These notions don’t have to be mutually exclusive. Software can be extremely flexible and simple to use. Your JoeDog achieves both in his own software with a novel concept known as the “default value.” If you don’t set a value, you get the default. If you require more precision, you can change those settings.

Generally speaking, software users don’t care about every configurable value. They have a subset of values they want to change. If everything has a default that doesn’t need to be set for the software to function, then the documentation becomes less overwhelming. If all you want to do is change one setting, then you can search the docs for just that configuration.

Your JoeDog does enough GUI programming that he can speak to the notion he mentioned above. It takes time to add labels and text fields to a program. Those GUI elements also take valuable screen real estate. As a result, many programmers limit the flexibility of their programs.

Here’s a thought: why not make the program configurable with a combination of a GUI and a configuration file? You can place the frequently changed stuff inside the GUI and the more obscure features inside the file. Trust me, the users who really want to change something will discover how to do that if you let them.
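The default-value idea fits in a few lines. What follows is only a sketch in Python: the setting names, their defaults, and the `key = value` file syntax are assumptions chosen for illustration, not taken from any particular program.

```python
# Hypothetical setting names and defaults, chosen only for illustration.
DEFAULTS = {"timeout": "30", "retries": "3", "verbose": "false"}

def load_settings(text, defaults=DEFAULTS):
    """Overlay `key = value` lines from a config file onto the defaults."""
    settings = dict(defaults)                    # start from a copy of the defaults
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()     # drop comments and whitespace
        if "=" not in line:
            continue                             # blank or malformed line
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings

# A user who cares about exactly one setting writes exactly one line.
user_file = "# only the one setting this user cares about\ntimeout = 5\n"
print(load_settings(user_file))
```

Everything the user leaves out keeps its default, so an empty file is still a valid configuration.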

Keep it simple but make it flexible and your users will be appreciative … until you blog about it.

 

 

Posted in Programming



Fido 1.1.5 SIGHUP and Reload

Good morning, JoeDoggers. Let’s bask in the glow of Your Fido this morning; he’s all grown up and ready for love. What does that mean? Well, it means fido now behaves like a modern daemon. Starting with version 1.1.5, if you send it SIGHUP, it will reload its configuration file.

Really? It’s been out since 2011 and you’re only adding that feature now?

Hey, what do you want from me? It’s free, isn’t it?

Here’s how it works: if you change fido’s configuration file, you can send it SIGHUP to reload its key = value pairs. There’s just one thing it won’t reload: its filenames.

Remember, a fido configuration file is divided into two parts: global settings and file settings. The file settings are distinguished by a filename followed by a pair of curly braces like this: {}. Here’s an example:

/usr/local/var/my.log {
 # key = value pairs go here.
}

So if you change /usr/local/var/my.log to anything else, you’ll have to restart fido. If you change any other values, then you can just send it SIGHUP.

So how do I send it SIGHUP?

There are several ways of doing this.

1.) You can look for the process ID (PID) with the ps command and send it SIGHUP (which is signal number 1):

# ps -aef | grep fido
root 31952 1 0 09:21 ? 00:00:00 /usr/sbin/fido -f /etc/fido/fido.conf
# kill -1 31952

2.) Check your system documentation. Some kill commands accept signal names such as this:

# ps -aef | grep fido
root 31952 1 0 09:21 ? 00:00:00 /usr/sbin/fido -f /etc/fido/fido.conf
# kill -HUP 31952

3.) We can eliminate the ps command by using fido’s pid file like this:

# kill -1 $(cat /var/run/fido.pid)

You can verify a successful config reload by looking at /var/log/messages.
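For the curious, here is roughly what the receiving end of a SIGHUP looks like. This is a sketch in Python, not fido's actual C code: the `Daemon` class, `parse_config`, and the `interval` setting are all hypothetical. The handler only records that a reload was requested, and the main loop does the re-reading.

```python
import os
import signal
import tempfile

def parse_config(path):
    """Read `key = value` pairs, ignoring comments and blank lines."""
    pairs = {}
    with open(path) as handle:
        for line in handle:
            line = line.split("#", 1)[0].strip()
            if "=" in line:
                key, value = line.split("=", 1)
                pairs[key.strip()] = value.strip()
    return pairs

class Daemon:
    """Hypothetical daemon that reloads its settings on SIGHUP."""

    def __init__(self, path):
        self.path = path
        self.config = parse_config(path)
        self.reload_requested = False
        signal.signal(signal.SIGHUP, self.on_sighup)

    def on_sighup(self, signum, frame):
        # Keep the handler tiny: just note that a reload was asked for.
        self.reload_requested = True

    def tick(self):
        """One pass of the main loop; re-reads the file only when asked to."""
        if self.reload_requested:
            self.config = parse_config(self.path)
            self.reload_requested = False

# Demonstration: change the file, then send ourselves SIGHUP.
with tempfile.NamedTemporaryFile("w", suffix=".conf", delete=False) as conf:
    conf.write("interval = 10\n")
daemon = Daemon(conf.name)
with open(conf.name, "w") as edited:
    edited.write("interval = 2\n")
daemon.tick()
before = daemon.config["interval"]     # no signal yet, so the old value stands
os.kill(os.getpid(), signal.SIGHUP)    # same effect as `kill -1 <pid>`
daemon.tick()
after = daemon.config["interval"]
os.unlink(conf.name)
print(before, after)
```

Editing the file alone changes nothing; the new value only shows up on the first pass of the loop after the signal arrives.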

 [Download: Fido]

 

Posted in Applications, Fido, Release



A Cyber Pearl Harbor

Earlier this year, Home Depot fell victim to one of the worst known cyber attacks. Its systems were infiltrated and attackers stole personal information from millions of customers. The company suffered little from the attack; its stock is now at an all-time high. In the past year alone, there have been many high-profile cyber attacks that have been met with little more than a shrug.

Leon Panetta, a former US Secretary of Defense, once claimed it would take a cyber “Pearl Harbor” before Americans were willing to do what was necessary to fix their computer infrastructure vulnerabilities. We haven’t faced such a catastrophe but, as the New York Times discovers, people are starting to realize that more attention must be paid to these sorts of threats. Your JoeDog has seen this new attitude firsthand. His company now has more security analysts than systems analysts.

[NY Times: Hacked vs. Hackers]

Posted in Security



Why You Should Test Your Site Under Load

Siege users will never get embarrassed like this….

[Techcrunch: Call The Geek Squad, Best Buy Crashes On Black Friday]

Posted in Uncategorized



No, A Website Doesn’t Write Cookies To Your Hard Drive

Remember Netscape Navigator? Some of you might have called it Nutscrape Irritator. Ha Ha. It’s funny because it’s true.

Well back in the day, Netscape was the shizzle. All of a sudden this stupid gopher thing was filled with images and colors. We were all happily browsing in a 3D colorful world when the lamestream media “discovered” Netscape. They poked around and saw the names of sites they had visited. The names were associated with weird strings they didn’t understand.

“Hmmm, what’s this?” an intrepid cub reporter asked.

“Oh, those? They’re called cookies.”

“How did they get here?”

Now Your JoeDog doesn’t know how that question was answered back in 1995 but fsck that guy. Since the moment some wanna-be tech writer discovered cookies, we’ve been dealing with cookie hysteria. Someone is writing things to our hard drive! Yeah, you know who’s doing that? You are.

Here’s the problem: Hypertext Transfer Protocol (HTTP) is stateless. You send a request to a server and it sends something back. The server doesn’t hear from you again until you make another request. There’s nothing in each ensuing request to positively identify you as the person who made the last request. To get around this problem, Netscape invented the magic cookie.

How does it work? We’ll nerdsplain after the jump….
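Ahead of the full explanation, the round trip can be sketched roughly like this. It uses Python's standard `http.cookies` module to build the headers; the cookie name `session` and the value `abc123` are invented for the example, and no real browser or server is involved.

```python
from http.cookies import SimpleCookie

# Server side: the response simply carries a Set-Cookie header.
response_cookie = SimpleCookie()
response_cookie["session"] = "abc123"        # invented name and value
set_cookie_header = response_cookie.output()
print(set_cookie_header)

# Browser side: the browser chooses to store the cookie, then sends it
# back in a Cookie header on later requests to the same site.
jar = SimpleCookie()
jar.load(set_cookie_header.split(": ", 1)[1])
cookie_header = "Cookie: " + "; ".join(
    f"{name}={morsel.value}" for name, morsel in jar.items()
)
print(cookie_header)

# Server side again: the returned value ties this request to the earlier one.
returned = SimpleCookie()
returned.load(cookie_header.split(": ", 1)[1])
print(returned["session"].value)
```

The server only ever sends a header. Storing the value and handing it back is the browser's doing.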


Posted in HTTP, Protocols



The Massive Scale of AWS

Here’s an interesting peek behind the scenes at Amazon Web Services:

Scale is perhaps the most important thing, and no one needs to teach an online retailer like Amazon anything about that. With Amazon, there is very little talk of public cloud, and that is because Amazon believes that, by its nature, cloud means it cannot be private. Over the long haul, Amazon believes the massive scale of the public cloud will mean that very few organizations will run their own datacenters.

Interesting throughout.  (H/T to Tim for bringing it to my attention)

[Enterprise Tech: A Rare Peek Into The Massive Scale of AWS]

 

Posted in Uncategorized



Fido 1.1.4 And Google’s Last Crawl

Your JoeDog can now announce the release of fido-1.1.4. To illustrate its awesome new feature we’re going to do an exercise.

“Wait. Homework? This blog sucks!”

Yeah, but this homework is useful. We’re going to capture the time of Google’s last crawl so you can be a hero with the SEO nerds in your company.

“Okay, fine!”

We can identify the googlebot by its User-agent. Google conveniently refers to itself as Googlebot. Here’s an entry in Your JoeDog’s logs:

208.78.85.241 - - [23/Nov/2014:06:16:24 -0500] "GET /blog/ HTTP/1.1" 
   200 57787 "-" "Googlebot/2.X (+http://www.googlebot.com/bot.html)"

Unfortunately, anybody can masquerade as a googlebot. The only way we can be certain this agent is authentic is to check its IP address.

Pom $ dig -x 208.78.85.241
;; ANSWER SECTION:
241.85.78.208.in-addr.arpa. 1316 IN PTR host241.subnet-208-78-85.gigavenue.com.

Wait a second! That’s not Google, that’s a fraud. Let’s check another entry:

Pom $ dig -x 66.249.65.47
;; ANSWER SECTION:
47.65.249.66.in-addr.arpa. 11310 IN PTR crawl-66-249-65-47.googlebot.com.

Okay, that’s Google. So in order to validate and record the time of Google’s last crawl, we have to check the IP address. How do we achieve this?

We’ll use fido to check our logs for instances of Googlebot but it can’t validate the IP address. Our action program can do that but how do we pass it the address?  Prior to fido-1.1.4, that would have been impossible. Starting with 1.1.4 it can now do regex capture and pass those variables to the action program.

To set this up, you’ll need a file block in fido.conf which points to your access_log.

/var/log/httpd/joedog-access_log {
 rules = ^([0-9]+\.[0-9]+\.[0-9]+\.[0-9]+).*Googlebot
 action = /home/jeff/bin/googler $1
}

When fido locates a match, it will capture everything inside the parentheses and send that to the googler script as $1. Here’s the googler script:

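#!/usr/bin/perl
# googler: confirm that an address reverse-resolves to googlebot.com,
# then append the crawl time to a log file.
use strict;
use Socket;
use Fcntl qw(:flock); # exports LOCK_EX and LOCK_UN
my $addr = $ARGV[0] or die "usage: googler <ip-address>\n";
my $packed = inet_aton($addr) or die "Invalid address: $addr\n";
# A failed reverse lookup returns undef; treat it as an empty hostname.
my $host = gethostbyaddr($packed, AF_INET) || "";
if ($host !~ /\.googlebot\.com$/) {
 print "ERROR: Forged User-agent ($host)\n";
 exit;
}
my $file = "/path/to/joedog.org/google.txt";
open (FILE, ">>$file") or die "Unable to open file: $file\n";
# Hold an exclusive lock so simultaneous matches don't interleave writes.
flock(FILE, LOCK_EX);
print FILE timestamp()." | $addr\n";
flock(FILE, LOCK_UN);
close(FILE);
exit;
# returns a string in the following format:
# MM/DD/YYYY HH:MM:SS | YYYYMMDDHHMMSS
sub timestamp {
 my $now = time;
 my @date = localtime $now;
 $date[5] += 1900; # localtime counts years from 1900
 $date[4] += 1; # localtime months are zero-based
 my $stamp = sprintf(
 "%02d/%02d/%04d %02d:%02d:%02d",
 $date[4],$date[3],$date[5], $date[2], $date[1], $date[0]
 );
 $stamp .= " | ";
 $stamp .= sprintf(
 "%04d%02d%02d%02d%02d%02d",
 $date[5],$date[4],$date[3], $date[2], $date[1], $date[0]
 );
 return $stamp;
}
# true if the argument is undefined or an empty string (not used above)
sub empty { ! defined $_[0] || ! length $_[0] }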

As you can imagine, there are lots of creative things you can do with the googler script. Your JoeDog hopes to compare crawl frequencies against the site’s freshness to see if there’s a correlation.

[JoeDog: Last Google Crawl]

UPDATE:  As Tim notes in the comments below, the Internets are full of jerks.

Consider this hostname: haha.googlebot.com.fooledyou.ru

The regex has been changed so that the fully qualified hostname must end in .googlebot.com, which this one does not.
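To see why the anchor at the end matters, here is a small sketch of the same check in Python rather than Perl. The function names are made up for the illustration, and the `resolve` parameter is an assumption added so the lookup can be swapped out; by default it is the standard library's `socket.gethostbyaddr`.

```python
import re
import socket

# Same idea as the Perl check: the hostname must END in .googlebot.com
GOOGLEBOT_HOST = re.compile(r"\.googlebot\.com$")

def looks_like_googlebot(hostname):
    """True only when the fully qualified hostname ends in .googlebot.com."""
    return bool(GOOGLEBOT_HOST.search(hostname))

def is_googlebot(addr, resolve=socket.gethostbyaddr):
    """Reverse-resolve `addr` and check the resulting hostname."""
    try:
        hostname = resolve(addr)[0]
    except OSError:          # lookup failed, so the address is unverified
        return False
    return looks_like_googlebot(hostname)

# Hostnames taken from the post; no network lookups happen here.
for name in (
    "crawl-66-249-65-47.googlebot.com",
    "host241.subnet-208-78-85.gigavenue.com",
    "haha.googlebot.com.fooledyou.ru",
):
    print(name, looks_like_googlebot(name))
```

Without the trailing `$`, the third hostname would pass, because `.googlebot.com` does appear in the middle of it.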

 

 

 

Posted in Applications, Fido | 1 Comment



Fido Now Supports Regex Capture and Back References

It’s a good thing Your JoeDog is not very bright. If he were an intelligent man, then he’d realize the difficulty of the task he was about to undertake before he undertook it. He would look at that task and think, “I’m not good enough to code that.” Fortunately, he’s not very bright and that means fido has a new feature.

New feature? Exciting!

Beginning with version 1.1.4, fido can do regex capture and pass the back references to the program it should run. Let’s look at an example config to help us make some sense of this. The new feature is the pair of parentheses in the rule and the $1 in the action.

/var/log/httpd/joedog-access_log {
 rules = ^([0-9]+\.[0-9]+\.[0-9]+\.[0-9]+).*GoogleBot
 action = /home/jeff/bin/googler $1
}

In this exercise, we want to find instances of GoogleBot in the access log, but we also want to verify that it’s actually the GoogleBot and not a crawler with a forged User-agent. To accomplish that, we want to capture the IP address and send it to our program for validation.

So let’s look at the rule:

^([0-9]+\.[0-9]+\.[0-9]+\.[0-9]+).*GoogleBot

If a line begins with a quartet of dot-separated numbers, i.e., an IP address, and it contains GoogleBot, then we have a match. Notice that the IP address is wrapped in parentheses? That’s our capture. Everything within the parens will be assigned to $1. (BTW: Writing that was a royal PITA.) If we had two sets of parens, then the second set would be assigned to $2.

Fido will assign variables by name so it doesn’t matter which order you pass them to your program.

/var/log/httpd/joedog-access_log {
 rules = ^([0-9]+\.[0-9]+\.[0-9]+\.[0-9]+).*(GoogleBot)
 action = /home/jeff/bin/googler $2 $1
}

Will run: /home/jeff/bin/googler GoogleBot 198.14.14.6
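The capture-and-substitute step can be imitated in a few lines. This is a sketch of the idea only: Python's `re` module stands in for fido's matcher, `build_command` is a made-up helper, and the log line is a hypothetical one assembled to fit the example above.

```python
import re

# The rule and action from the config block above.
RULE = re.compile(r"^([0-9]+\.[0-9]+\.[0-9]+\.[0-9]+).*(GoogleBot)")
ACTION = "/home/jeff/bin/googler $2 $1"

def build_command(line, rule=RULE, action=ACTION):
    """Return the action with each $N replaced by capture N, or None."""
    match = rule.search(line)
    if match is None:
        return None          # no match, so nothing to run
    # $1 is the first pair of parentheses, $2 the second, and so on.
    return re.sub(r"\$(\d+)", lambda ref: match.group(int(ref.group(1))), action)

# Hypothetical log line, invented for this example.
line = ('198.14.14.6 - - [23/Nov/2014:06:16:24 -0500] "GET /blog/ HTTP/1.1" '
        '200 57787 "-" "GoogleBot/2.X"')
print(build_command(line))
```

Because each reference is looked up by its number, swapping `$2 $1` for `$1 $2` in the action only changes the order of the arguments.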

I’ll release version 1.1.4 after the documentation is up-to-date. In the meantime, early adopters can grab the code from the source repository.

You’ll need autotools on your system. Inside the fido directory, run this command to build the configure script:

utils/bootstrap

Happy hacking.

 

Posted in Applications, Fido


