The gift that keeps on giving... sort of.
So, like many other websites, I've been having a problem with people hammering my site for various reasons: spammers harvesting email addresses, comment spammers trying to put their garbage ads in my posts, people hotlinking my images so they don't have to bother uploading them to their own sites... the usual things. Lately, I've been getting "spidered" a lot more than usual - someone's been going through my whole website, copying all the content for some reason. If it were Google or Yahoo!, it would likely be so they could get an accurate picture of my site to include in search results and whatnot. And if it were them, I'd be fine with it. But it's not them.
Instead, it's a single IP address - a location on the internet: 195.225.178.15. Not that big a deal, until you start digging. As the screenshot I posted attests, that address was responsible for almost 40% of the traffic to my website in June. In July, it's only 3% of the traffic - but I have no doubt that's going to drop to 0% a few minutes from now.
A while back, when blog spamming first became a big deal, I coped with it in various ways - moving files so that they weren't in the expected locations, disabling them entirely, and generally doing anything I could to stay one step ahead. The MovableType plugin community kept pace with many of the developments in the area and produced a couple of really useful tools. One was MTBlackList, written by Jay Allen, which has, to my understanding, evolved repeatedly and was probably the basis, in some form or another, for the TypePad Anti-Spam service we have today. Another was AutoBan, which would update an .htaccess file to ban visitors who tried to comment on your site too frequently. The idea of my weblog acting as its own gatekeeper appealed to me, and I implemented it as soon as I could. And it was a good thing to have around.
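AutoBan's actual code isn't reproduced here, but the general idea of a self-updating gatekeeper can be sketched in a few lines. The Python below is a minimal illustration under my own assumptions: the class name CommentRateBanner, the default limit of 5 comments per 60 seconds, and the sliding-window counting are all hypothetical, not AutoBan's real settings or logic. It tracks each IP's recent comment attempts and, once one goes over the limit, adds it to a ban list that can be written out as .htaccess deny lines.

```python
from collections import defaultdict, deque


class CommentRateBanner:
    """Hypothetical AutoBan-style gatekeeper: ban IPs that comment too often.

    The limit and window are illustrative assumptions, not AutoBan's settings.
    """

    def __init__(self, max_comments=5, window_seconds=60.0):
        self.max_comments = max_comments
        self.window_seconds = window_seconds
        self.history = defaultdict(deque)  # ip -> timestamps of recent attempts
        self.banned = set()

    def record_comment(self, ip, timestamp):
        """Record one comment attempt; return True if the IP is (now) banned."""
        if ip in self.banned:
            return True
        times = self.history[ip]
        times.append(timestamp)
        # Forget attempts that have slid out of the time window.
        while times and timestamp - times[0] > self.window_seconds:
            times.popleft()
        # Too many attempts inside the window: onto the ban list it goes.
        if len(times) > self.max_comments:
            self.banned.add(ip)
        return ip in self.banned

    def htaccess_lines(self):
        """Render the ban list as Apache 2.2-style "deny from" lines."""
        return [f"deny from {ip}" for ip in sorted(self.banned)]
```

For example, with max_comments=3, a bot posting from 192.0.2.7 once a second gets banned on its fourth attempt, and htaccess_lines() then returns ["deny from 192.0.2.7"] (192.0.2.0/24 is a documentation-only range). Note that "deny from" is the older Apache 2.2 syntax; Apache 2.4 expresses the same thing with "Require not ip" inside a RequireAll block.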
But it didn't really stop them from coming in the first place - it merely made it harder for them to get in the door and spam after the first few comments. I wanted to make it very hard for them. After a little poking around, I discovered Junk Slowdown, written by the same person who wrote AutoBan (whose name eludes me, but will surely come back to me soon). Its job is literally to waste spammers' time. In my case, I have it tied into AutoBan, so once spammers land on the blacklist, they end up wasting their time.
The way it works is that it sends most of a web page to their spider bot, which happily sucks down the content. Notice I said "most of a web page." When it comes to the very last bit of the page - the closing body tag - it inserts a wait command that pauses all output from the script for 30 seconds. This means that every time the spider accesses my site trying to post fake comments, it's stuck waiting 30 seconds. And since comment spammers like to shotgun their spam, they tend to send a lot of comments to the same blog at once. This works to my advantage, because every time they make a request, their spider spends another 30 seconds waiting for my script to finish its job. Meanwhile, it costs my server very little - after all, it's a tiny script (866 bytes) that only outputs text, so without the pause it would barely cause a blip in the server's traffic. With the pause, it barely causes two blips - one at the start, and another 30 seconds later.
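This isn't Junk Slowdown's code; it's a minimal Python WSGI sketch of the same tarpit technique, assuming a placeholder page and the 30-second delay described above. The function name make_tarpit_app and the page text are hypothetical. The sleep function is passed in as a parameter so the stall can be tested without actually waiting.

```python
import time


def make_tarpit_app(delay_seconds=30.0, sleep=time.sleep):
    """Hypothetical WSGI tarpit in the spirit of Junk Slowdown (not its code).

    Sends everything except the closing tags right away, then stalls for
    delay_seconds before finishing the response.
    """
    # Placeholder page content, split just before the closing body tag.
    head = (b"<html><head><title>Thanks</title></head><body>"
            b"<p>Your comment has been received.</p>\n")
    tail = b"</body></html>\n"

    def app(environ, start_response):
        start_response("200 OK", [("Content-Type", "text/html; charset=utf-8")])

        def body():
            yield head            # most of the page goes out immediately
            sleep(delay_seconds)  # the client waits on an open connection
            yield tail            # the closing tags finally arrive

        return body()

    return app
```

To try it locally, you could hand it to the standard library's server with wsgiref.simple_server.make_server("127.0.0.1", 8000, make_tarpit_app()).serve_forever(). Keep in mind that wsgiref's simple server handles one request at a time, and that every stalled request still ties up a connection (and, in a typical setup, a worker) for the full delay, so a real deployment would want a threaded or asynchronous server. As for the payoff: if a bot sends its requests one after another, 100 comment attempts cost it at least 100 × 30 s = 3,000 s, or 50 minutes; a bot that opens many connections in parallel pays far less wall-clock time.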
Is it working? I suppose. Is it effective? It's hard to say. One of the reasons that spam - and comment spam - works is that it really doesn't cost that much. You can probably buy a list of email addresses online for not that much. You can generate that list yourself if you have time to write a little program that runs through the alphabet and generates random email addresses. You could even just take the dictionary and generate email addresses from it - antelope@, anteater@, aardvark@, and so on. Not that hard to do with a little time and thought. Then you just sign up for a dial-up account (spammers are notorious for doing that, since it's cheaper than a high-speed connection, which would also usually require the normal utility credit check and whatnot) and start broadcasting the spam. For website comments, it's pretty similar - you can do a Google search for web addresses containing certain words - like mt-comment.cgi, for example - and get a list of thousands or millions of sites running certain software. Figure out the format of a comment post - which can be pretty easy in some cases - and then set your spider loose on those sites, shotgunning your comments all over the web.
Granted, some of the sites will have set up anti-spam measures like mine - or will have upgraded to software that doesn't have the security holes that allow things like this to happen - but just like email spammers, comment spammers don't need it to work everywhere; they just need a small percentage to get through. Comment spam pays off when it does get through, because then the spammer's URL is part of the internet for all to see - especially the legitimate spiders, the ones that work for Google and Yahoo!. Once the spammer's content is out there, it becomes part of the index of the web - one where many links to your website can be a good thing, and can make you the top result for certain search phrases - which is exactly what the spammers are hoping for.
Does that work? Less and less. Yahoo!, Google, and all the other search engines do what they can to keep the actual criteria for ranking sites both confidential and changing - once people figure out how to game the system, they usually tweak it to stop that from working - so it's a continual game of cat and mouse. Eventually both sides may reach a stalemate, but I don't think that will be any time soon. In the meantime, I'll continue to use any tool at my disposal to keep the spammers at bay - and you should too.
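# block stuff that's been spidering my site too much
# serve denied visitors the sand-trap page instead of a plain 403 error
ErrorDocument 403 /sand-trap.php
# 195.225.176.0/22 spans 195.225.176.0 - 195.225.179.255, which includes 195.225.178.15
deny from 195.225.176.0/22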
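The deny line blocks a whole /22 range rather than just the single address from my stats. If you want to double-check what a CIDR range like that covers, Python's standard ipaddress module makes it easy. This is just a quick verification sketch; the helper name is_blocked and the out-of-range test address are my own examples.

```python
import ipaddress

# The range from the "deny from" line above.
BLOCKED = ipaddress.ip_network("195.225.176.0/22")


def is_blocked(ip: str) -> bool:
    """Return True if ip falls inside the denied /22 range."""
    return ipaddress.ip_address(ip) in BLOCKED


print(BLOCKED.network_address, "-", BLOCKED.broadcast_address)  # 195.225.176.0 - 195.225.179.255
print(BLOCKED.num_addresses)         # 1024
print(is_blocked("195.225.178.15"))  # True: the spider's address
print(is_blocked("195.225.180.1"))   # False: just outside the range
```

A /22 keeps the first 22 bits of the address fixed and leaves the remaining 10 free, so it covers 2^10 = 1,024 addresses: the third octet can be 176 through 179.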