The Internets are full of spam. Maybe you’ve noticed?
It’s in your inbox, in your comments and scattered throughout your web forums. Every spammer is a bag of dicks but the worst bottom feeder on the Internets is the referer spammer.
If you’ve never administered a website, then you’ve probably never heard of referer spam. Yeah, what is that? Glad you asked. These dregs send requests to your website with a fabricated Referer header that points to a site they want to advertise. Ideally, they’ll send requests to a site that publishes its traffic reports. When their URL makes the report, they get a free link back to their site.
Sites that publish their usage reports are easy to find. Put this in the Google Machine and see what pops up: “Top * Total Search Strings”. This is what we’re looking for: Usage Stats: Top Referers. Your JoeDog can get himself on that report by doing this:
Bully $ siege -H "Referer: http://www.joedog.org/" -g http://www.pickart.at/
HEAD / HTTP/1.0
User-Agent: Mozilla/5.0 (unknown-x86_64-linux-gnu) Siege/3.0.8
HTTP/1.1 200 OK
Date: Fri, 03 Oct 2014 17:53:38 GMT
Server: Apache
Connection: close
Content-Type: text/html
Now if he’s really intent on making that report, he’ll repeat that request a few hundred times and place himself at number two on the chart. But here’s the thing: referer spammers will spam your logs even if you don’t publish your reports. They’ll go to all that trouble just to lure webmasters to their esoteric fetish sites.
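If you want to see who is climbing your own chart, here is a minimal sketch of a "top referers" count. It assumes your server writes Apache's "combined" log format, where the Referer is the second quoted field on each line; the function name `top_referers` is made up for illustration and is not part of Siege or any stats package.

```shell
# Hypothetical helper: count requests per Referer in an access log.
# Assumes Apache "combined" format:
#   host ident user [time] "request" status bytes "referer" "user-agent"
# Splitting each line on double quotes makes the Referer the 4th field.
# $1 = path to the log file, $2 = how many rows to show (default 10)
top_referers() {
    awk -F'"' '{ print $4 }' "$1" | sort | uniq -c | sort -rn | head -n "${2:-10}"
}
```

Call it as `top_referers /path/to/access_log 20`. A referer that shows up a few hundred times from a site that has never linked to you is a likely candidate for spam.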
So what can you do to prevent this stuff? Mostly you can decrease their incentive.
- Put your usage stats inside a password-protected area.
- Add a robots.txt with a bot exclusion rule so search engines don’t index it.
- Add a rel="nofollow" attribute to every link in the report, so search engines don’t give the linked sites any credit for them.
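The robots.txt item can be sketched like this. The `/usage/` path is an assumption (put in wherever your reports actually live), and `write_robots_txt` is a made-up helper name. Note that it overwrites any robots.txt already sitting in that directory.

```shell
# Hypothetical helper: write a robots.txt that asks well-behaved crawlers
# to stay out of the stats pages. /usage/ is an assumed path.
# $1 = the web server's document root
# WARNING: this replaces an existing robots.txt in that directory.
write_robots_txt() {
    cat > "$1/robots.txt" <<'EOF'
User-agent: *
Disallow: /usage/
EOF
}
```

Keep in mind that robots.txt is only a polite request. Search engines generally honor it, spammers don't, which is why the password protection comes first on the list.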
I guarantee you’ll still get the stuff. They’ll send faked referrals just to capture the attention of the site’s administrators, but at least you won’t reward them with a boost to their PageRank.
NOTE: Yes, Your JoeDog spelled Referrer with only three r’s. Most humans use four. Phillip Hallam-Baker is not most humans. He was the first guy to miss an ‘r’ in the original HTTP specification. I say “first guy” because hundreds of eyeballs viewed that document and none of them noticed the misspelling. By the time it became RFC 1945, “Referer” was set in stone. It would have been easier to change the world’s English-language dictionaries at that point….