ArticleBiz.com :: Free article content

Using Spider Traps to Discourage Site Scraping
By: Rob Sullivan

Sometimes your competitors will do almost anything to compete with you, including stealing your content.

To do this, they sometimes use automated software that works much like a search engine crawler, which is quicker and easier than copying your site by hand. This can cause real problems for you, starting with wasted bandwidth.

In this article we look at ways to stop this from happening.

Stealing on the web is rampant. I don’t mean stealing people’s user IDs and passwords. I mean stealing from a website itself.

Webmasters and designers steal images they like, or they find a cool piece of JavaScript and steal that as well.

But what really causes problems is when your competitors steal your content.

As we all know, content is king on the web. Whoever has the most content wins. So if a competitor of yours needs to grow quickly, one of the easiest ways to do it is to use a website harvester.

A website harvester works much like a search engine crawler. It requests every URL it can find and then downloads all the content associated with those URLs.

So how do you protect yourself from malicious scrapers?

Simple really. You build a spider trap.

As the name implies, you create a section of your site devoted to luring in the spiders that are not friendly, and then you proceed to either trap them or ban them from accessing your site.

What’s involved in making a spider trap?

Usually a bit of PHP code combined with a database and a URL rewriter.

The first thing you need to do is create the space on the site dedicated to capturing those bad bots. You then use robots.txt to exclude that section from crawling.

You do this because you want to ensure Googlebot, Yahoo! Slurp, MSNbot and the others don’t also get trapped. Since most good spiders follow the robots.txt exclusion protocol, you are politely denying them access to this location.
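As a rough illustration of that exclusion, the sketch below holds a two-line robots.txt in a string and uses Python's standard urllib.robotparser to check what a compliant crawler would be allowed to fetch. The /bot-trap/ directory name and the sample article path are assumptions made for the example; the trap can live anywhere on your site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical trap location; pick any directory name you like.
TRAP_PATH = "/bot-trap/"

# The robots.txt rules that keep well-behaved spiders out of the trap.
ROBOTS_TXT = f"""\
User-agent: *
Disallow: {TRAP_PATH}
"""


def allowed_by_robots(user_agent: str, path: str) -> bool:
    """Return True if a robots.txt-compliant crawler may fetch `path`."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    # A normal page stays crawlable; the trap directory does not.
    print(allowed_by_robots("Googlebot", "/articles/spider-traps.html"))
    print(allowed_by_robots("Googlebot", TRAP_PATH + "index.html"))
```

A spider that honors these rules never requests anything under the trap directory, so any client that shows up there has ignored robots.txt.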

From here there are various options. One of my favorites involves logging to a database or text file and then dynamically denying access to the bad bot.
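One way to picture the logging half of that approach is the sketch below, which uses Python's built-in sqlite3 module. The table name, column names, and function names are all invented for this example, and a text file or another database would serve equally well.

```python
import sqlite3
from datetime import datetime, timezone


def open_ban_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the ban database; the schema here is an assumption."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS banned_bots ("
        " ip TEXT PRIMARY KEY,"
        " user_agent TEXT NOT NULL,"
        " first_seen TEXT NOT NULL)"
    )
    return conn


def record_trap_hit(conn: sqlite3.Connection, ip: str, user_agent: str) -> None:
    """Log a visitor that requested the trap; repeat hits keep the first row."""
    conn.execute(
        "INSERT OR IGNORE INTO banned_bots (ip, user_agent, first_seen)"
        " VALUES (?, ?, ?)",
        (ip, user_agent, datetime.now(timezone.utc).isoformat()),
    )
    conn.commit()


def is_banned(conn: sqlite3.Connection, ip: str) -> bool:
    """Check on every request whether this address has been trapped before."""
    row = conn.execute(
        "SELECT 1 FROM banned_bots WHERE ip = ?", (ip,)
    ).fetchone()
    return row is not None
```

The site would call record_trap_hit from the trap page and is_banned at the top of every other page, refusing the request when it returns True.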

How does it work?

Let me give you a practical example.

I once had a client who was getting harvested many times per day by many different bad spiders. It was so bad at one point that the bad bots were doubling his bandwidth usage.

So we devised a plan: create the trap described above and, as soon as it captured a bot's user agent and IP address, ban that bot from the site.

This is how it worked:

The bad bot would come to the site and find a link on an image. The link would point to the trap directory.

Normally, a regular spider would first check the robots.txt file to ensure it could in fact crawl the content in that directory. Since the file excluded this directory, the “good” spiders wouldn’t go in.
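The flow described here can be sketched as a small piece of WSGI middleware in Python. This is not the client's actual setup, which was built with PHP; the /bot-trap/ prefix, the SpiderTrap class, the sample page, and the request_status helper are all made up for illustration, and the ban list lives in memory instead of a database.

```python
TRAP_PREFIX = "/bot-trap/"  # hypothetical trap directory, excluded in robots.txt


class SpiderTrap:
    """Ban any client that requests the trap, then refuse its later requests."""

    def __init__(self, app, trap_prefix=TRAP_PREFIX):
        self.app = app
        self.trap_prefix = trap_prefix
        self.banned = {}  # maps IP address -> user agent seen in the trap

    def __call__(self, environ, start_response):
        ip = environ.get("REMOTE_ADDR", "")
        path = environ.get("PATH_INFO", "")
        if path.startswith(self.trap_prefix):
            # Only a bot that ignored robots.txt should ever land here.
            self.banned[ip] = environ.get("HTTP_USER_AGENT", "")
        if ip in self.banned:
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        return self.app(environ, start_response)


def site(environ, start_response):
    """Stand-in for the real site: a page with a link on an image."""
    start_response("200 OK", [("Content-Type", "text/html")])
    return [b'<a href="/bot-trap/"><img src="/pixel.gif" alt=""></a>']


def request_status(app, ip, path, user_agent="BadBot/1.0"):
    """Simulate one request and return the status line the app sent."""
    statuses = []
    environ = {"REMOTE_ADDR": ip, "PATH_INFO": path, "HTTP_USER_AGENT": user_agent}
    app(environ, lambda status, headers: statuses.append(status))
    return statuses[0]


if __name__ == "__main__":
    app = SpiderTrap(site)
    print(request_status(app, "203.0.113.7", "/"))           # before the trap
    print(request_status(app, "203.0.113.7", "/bot-trap/"))  # follows the link
    print(request_status(app, "203.0.113.7", "/"))           # now banned
```

A visitor that never touches the trap directory is unaffected, while the bot that followed the image link gets 403 responses from then on.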


Rob Sullivan - SEO Specialist and Internet Marketing Consultant. Reproduction of this article is allowed with an HTML link pointing to http://www.textlinkbrokers.com

Article Source: http://www.ArticleBiz.com


Copyright © 2009 by ArticleBiz.com. All rights reserved.
