I keep finding spam in my blog comments, and some of it appears to come from automated bots. According to many articles I have seen recently, this is becoming more and more common. Previously I had my blog set up to notify me by email as soon as a new comment was added, but that alone doesn’t cut it anymore. I am tired of cleaning out the bogus comments, but I still want to allow comments from folks who are serious. So tonight I added some blog spam countermeasures to stop bots and slow down the manual spammers as much as possible.
One thing I did was to make posters’ IP addresses visible. Previously I had left the IPs off because I felt anonymity would encourage folks to post more often, and I was getting the IPs in my private logs anyway. Well, now they are visible, save for the first octet, which is obfuscated, so you can at least feel somewhat anonymous. This will hopefully eliminate some shill comment posts. That is not directly related to outright blog comment spam, but it is in the same family.
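For the curious, masking the first octet could look roughly like the Python sketch below. It is an illustration only, not the code running on this site; the function name mask_first_octet and the "xxx" placeholder are made up for the example, and it only handles IPv4 addresses.

```python
import ipaddress


def mask_first_octet(ip: str, placeholder: str = "xxx") -> str:
    """Return an IPv4 address with its first octet replaced by a placeholder."""
    # Validate the input first; a malformed address raises ValueError.
    addr = ipaddress.IPv4Address(ip)
    octets = str(addr).split(".")
    octets[0] = placeholder
    return ".".join(octets)


# 203.0.113.42 is an address from a range reserved for documentation.
print(mask_first_octet("203.0.113.42"))  # xxx.0.113.42
```

The full address would still go to the private logs; only the displayed copy is masked.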
The second countermeasure is for anyone who decides to post their real email address in the email field of the comment form: the HTML that displays the address is now encoded in such a way that spam bot harvesters will not be able to parse it and pick out the email address as easily, if at all. Again, this is not quite directly related to blog spam, but I want folks to feel better about using valid email addresses, and I wanted to cut down on my own incoming email spam. All the changes I made are retroactive across my site, all the way back through my entire archive.
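One common way to do this kind of encoding is to write every character of the address as an HTML character reference. The Python sketch below shows that approach; I am assuming this technique purely for illustration, it is not necessarily the encoding used here, and the function names are invented for the example.

```python
import html


def encode_entities(text: str) -> str:
    """Encode every character as a decimal HTML character reference."""
    return "".join(f"&#{ord(ch)};" for ch in text)


def obfuscated_mailto(address: str) -> str:
    """Build a mailto link whose raw markup contains no literal address."""
    href = encode_entities("mailto:" + address)
    label = encode_entities(address)
    return f'<a href="{href}">{label}</a>'


link = obfuscated_mailto("someone@example.com")
# The raw markup has no "@" for a naive harvester to match on.
print("@" in link)  # False
# Decoding the references, as a browser does, recovers the address.
print(html.unescape(encode_entities("someone@example.com")))  # someone@example.com
```

A browser decodes the references and shows a normal, clickable address. A harvester that bothers to decode entities can still read it, so this raises the bar rather than guaranteeing anything.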
The third countermeasure has actually been in place for some time in order to combat bandwidth-consuming spiders, bots, etc. I simply refined it a bit tonight by adding some additional criteria. Using Apache with mod_rewrite, plus a few other tricks in PHP, I have put measures in place to prevent specific user-agents, IPs and bots from harvesting information from my site or operating the comments system in my blog.
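The real rules live in my Apache configuration and PHP, and I am not going to publish them. The decision they make boils down to something like the Python sketch below. The user-agent patterns and the network range are hypothetical placeholders (the range is one reserved for documentation), not my actual criteria.

```python
import ipaddress
import re

# Hypothetical blocklists for illustration only; not the site's real criteria.
BLOCKED_AGENT_PATTERNS = [
    re.compile(pattern, re.IGNORECASE)
    for pattern in (r"examplebot", r"harvest")
]
BLOCKED_NETWORKS = [ipaddress.ip_network("198.51.100.0/24")]


def is_blocked(user_agent: str, remote_ip: str) -> bool:
    """Return True if the request matches a blocked user-agent or network."""
    if any(p.search(user_agent) for p in BLOCKED_AGENT_PATTERNS):
        return True
    addr = ipaddress.ip_address(remote_ip)
    return any(addr in network for network in BLOCKED_NETWORKS)


print(is_blocked("ExampleBot/1.0", "203.0.113.5"))   # True (user-agent match)
print(is_blocked("Mozilla/5.0", "198.51.100.7"))     # True (network match)
print(is_blocked("Mozilla/5.0", "203.0.113.5"))      # False
```

A user-agent string is trivial to fake, so a check like this only catches bots that identify themselves honestly or come from known addresses.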
The fourth countermeasure, which is hopefully the most effective against unidentifiable bots and is also the most noticeable change, is a visual challenge-and-response system to prevent automated postings. It is similar to what you may have seen on many major sites like Yahoo, Hotmail, Amazon, Network Solutions, eBay, etc., wherever a system might be abused by automated scripts. A unique image containing a series of alphanumeric characters is generated on the fly. This image is presented as part of the comments form on every view and is never the same from one session to the next. A human must view the image, read it, and type the value they see into a special field in the form in order for the comment to be accepted. If the value entered doesn’t match, the comment is rejected and a new image is presented. The image is rendered in such a way that even a script smart enough to download it, run OCR (optical character recognition) on it to work out the characters, and then complete the process without a human would find it difficult, or at least time-consuming enough to deter bots.
I am sure it’s not foolproof and there may be bugs in my implementation, but I thought it was worth a try. I am hoping it is not so cumbersome that it discourages folks from posting legitimate comments to my blog. I welcome any input or thoughts, so go ahead and try posting a comment on this entry and let me know what you think, or tell me about any bugs you find.
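The server-side half of a challenge like this can be sketched in a few lines of Python. This is not my implementation, just an outline of the idea under some assumptions: the session is a plain dictionary, the code is six characters long, the answer is compared case-insensitively, and the image rendering itself is left out. All the names are invented for the example.

```python
import hmac
import secrets
import string

# Assumption: drop look-alike characters so the image is easier to read.
ALPHABET = "".join(
    c for c in string.ascii_uppercase + string.digits if c not in "O0I1"
)


def new_challenge(session: dict, length: int = 6) -> str:
    """Create a fresh random code, remember it in the session, and return it."""
    code = "".join(secrets.choice(ALPHABET) for _ in range(length))
    session["captcha"] = code
    # The code goes to the image renderer only; it is never sent as text.
    return code


def verify_challenge(session: dict, answer: str) -> bool:
    """Check the typed answer; every code is good for one attempt only."""
    expected = session.pop("captcha", None)
    if expected is None:
        return False
    typed = answer.strip().upper()
    return hmac.compare_digest(expected.encode(), typed.encode())


session = {}
code = new_challenge(session)
print(verify_challenge(session, code.lower()))  # True
print(verify_challenge(session, code))          # False, the code was already used
```

Removing the code from the session on every attempt is what forces a new image after a wrong answer, so a bot cannot keep guessing against the same picture.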
P.S. If you were looking for my “Big News – Part 2: The New Job” entry, the follow-up to “Big News – Part 1: I Quit My Job”, it should be up by the end of the week, if not sooner. Stay tuned…