Help - Search - Members - Calendar
Full Version: Reading search engine stats
!BigBlueHost Support Forums > General > Search Engine Talk
FunnyTshirts
what is this unknown robot?

Robots/Spiders visitors (Top 25) - Full list - Last visit

8 different robots Hits Bandwidth Last visit

Unknown robot (identified by 'crawl') 21300 3.33 MB 20 Feb 2004 - 06:59

Googlebot (Google) 187 2.10 MB 20 Feb 2004 - 06:33

LinkWalker 97 772.37 KB 01 Feb 2004 - 18:55

Alexa (IA Archiver) 84 712.85 KB 17 Feb 2004 - 17:13

Inktomi Slurp 62 554.46 KB 14 Feb 2004 - 01:53

Unknown robot (identified by 'spider') 4 242.70 KB 18 Feb 2004 - 06:26

Netcraft Web Server Survey 1 0 07 Feb 2004 - 19:47

Wget 1 60.57 KB 16 Feb 2004 - 23:44
pershoot
That spider/bot could be bad or good. It is best if you block it.

here is a list of bad and good bots and their purpose:
http://www.edithfrost.com/archive/2003/07/...ot_the_bot.html

All of the code below goes in to your .htaccess file in the root of your public_html folder.

To ban by referrer:

RewriteEngine On
RewriteCond %{HTTP_REFERER} ^http://(www\.)?whois\.sc [OR]
RewriteCond %{HTTP_REFERER} ^http://(www\.)?ctechld\.com [OR]
RewriteCond %{HTTP_REFERER} ^http://(www\.)?hyperspin\.com [OR]
RewriteCond %{HTTP_REFERER} ^http://(www\.)?HostItCheap\.com
RewriteRule /* http://www.fakedomainofyourchoice.com/ [R,L]

"The ^ character stipulates that the referrer string must begin with the text that follows it.
The (www\.)? part means that the rule is true whether or not the www is present in the referrer string. (ie. It will catch http://banned.com and http://www.banned.com)
Note that [OR] is placed at the end of each line except the last one".

To ban by user agent:

RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ^BlackWidow [OR]
RewriteCond %{HTTP_USER_AGENT} ^Bot\ mailto:craftbot@yahoo.com [OR]
RewriteCond %{HTTP_USER_AGENT} ^ChinaClaw [OR]
RewriteCond %{HTTP_USER_AGENT} ^Custo [OR]
RewriteCond %{HTTP_USER_AGENT} ^DISCo [OR]
RewriteCond %{HTTP_USER_AGENT} ^Download\ Demon [OR]
RewriteCond %{HTTP_USER_AGENT} ^eCatch [OR]
RewriteCond %{HTTP_USER_AGENT} ^EirGrabber [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailWolf [OR]
RewriteCond %{HTTP_USER_AGENT} ^Express\ WebPictures [OR]
RewriteCond %{HTTP_USER_AGENT} ^ExtractorPro [OR]
RewriteCond %{HTTP_USER_AGENT} ^EyeNetIE [OR]
RewriteCond %{HTTP_USER_AGENT} ^FlashGet [OR]
RewriteCond %{HTTP_USER_AGENT} ^GetRight [OR]
RewriteCond %{HTTP_USER_AGENT} ^GetWeb! [OR]
RewriteCond %{HTTP_USER_AGENT} ^Go!Zilla [OR]
RewriteCond %{HTTP_USER_AGENT} ^Go-Ahead-Got-It [OR]
RewriteCond %{HTTP_USER_AGENT} ^GrabNet [OR]
RewriteCond %{HTTP_USER_AGENT} ^Grafula [OR]
RewriteCond %{HTTP_USER_AGENT} ^HMView [OR]
RewriteCond %{HTTP_USER_AGENT} HTTrack [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Image\ Stripper [OR]
RewriteCond %{HTTP_USER_AGENT} ^Image\ Sucker [OR]
RewriteCond %{HTTP_USER_AGENT} Indy\ Library [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^InterGET [OR]
RewriteCond %{HTTP_USER_AGENT} ^Internet\ Ninja [OR]
RewriteCond %{HTTP_USER_AGENT} ^JetCar [OR]
RewriteCond %{HTTP_USER_AGENT} ^JOC\ Web\ Spider [OR]
RewriteCond %{HTTP_USER_AGENT} ^larbin [OR]
RewriteCond %{HTTP_USER_AGENT} ^LeechFTP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mass\ Downloader [OR]
RewriteCond %{HTTP_USER_AGENT} ^MIDown\ tool [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mister\ PiX [OR]
RewriteCond %{HTTP_USER_AGENT} ^Navroad [OR]
RewriteCond %{HTTP_USER_AGENT} ^NearSite [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetAnts [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetSpider [OR]
RewriteCond %{HTTP_USER_AGENT} ^Net\ Vampire [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetZIP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Octopus [OR]
RewriteCond %{HTTP_USER_AGENT} ^Offline\ Explorer [OR]
RewriteCond %{HTTP_USER_AGENT} ^Offline\ Navigator [OR]
RewriteCond %{HTTP_USER_AGENT} ^PageGrabber [OR]
RewriteCond %{HTTP_USER_AGENT} ^Papa\ Foto [OR]
RewriteCond %{HTTP_USER_AGENT} ^pavuk [OR]
RewriteCond %{HTTP_USER_AGENT} ^pcBrowser [OR]
RewriteCond %{HTTP_USER_AGENT} ^RealDownload [OR]
RewriteCond %{HTTP_USER_AGENT} ^ReGet [OR]
RewriteCond %{HTTP_USER_AGENT} ^SiteSnagger [OR]
RewriteCond %{HTTP_USER_AGENT} ^SmartDownload [OR]
RewriteCond %{HTTP_USER_AGENT} ^SuperBot [OR]
RewriteCond %{HTTP_USER_AGENT} ^SuperHTTP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Surfbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^tAkeOut [OR]
RewriteCond %{HTTP_USER_AGENT} ^Teleport\ Pro [OR]
RewriteCond %{HTTP_USER_AGENT} ^VoidEYE [OR]
RewriteCond %{HTTP_USER_AGENT} ^Web\ Image\ Collector [OR]
RewriteCond %{HTTP_USER_AGENT} ^Web\ Sucker [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebAuto [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebCopier [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebFetch [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebGo\ IS [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebLeacher [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebReaper [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebSauger [OR]
RewriteCond %{HTTP_USER_AGENT} ^Website\ eXtractor [OR]
RewriteCond %{HTTP_USER_AGENT} ^Website\ Quester [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebStripper [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebWhacker [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebZIP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Wget [OR]
RewriteCond %{HTTP_USER_AGENT} ^Widow [OR]
RewriteCond %{HTTP_USER_AGENT} ^WWWOFFLE [OR]
RewriteCond %{HTTP_USER_AGENT} ^Xaldon\ WebSpider [OR]
RewriteCond %{HTTP_USER_AGENT} ^Zeus
RewriteRule /* http://www.fakedomainofyourchoice.com/ [R,L]

Src: StuartR

To block by IP or Domain Name:

SetEnvIfNoCase User-Agent "^EmailSiphon" bad_bot
SetEnvIfNoCase User-Agent "^EmailWolf" bad_bot
SetEnvIfNoCase User-Agent "^ExtractorPro" bad_bot
SetEnvIfNoCase User-Agent "^CherryPicker" bad_bot
SetEnvIfNoCase User-Agent "^NICErsPRO" bad_bot
SetEnvIfNoCase User-Agent "^Teleport" bad_bot
SetEnvIfNoCase User-Agent "^EmailCollector" bad_bot
SetEnvIfNoCase User-Agent "^Googlebot-Image" bad_bot
SetEnvIfNoCase User-Agent "^TurnitinBot" bad_bot
SetEnvIfNoCase User-Agent "^NPBot" bad_bot
SetEnvIfNoCase User-Agent "^ia_archiver" bad_bot
SetEnvIfNoCase User-Agent "^Microsoft URL Control" bad_bot
SetEnvIfNoCase User-Agent "^grub-client" bad_bot
SetEnvIfNoCase User-Agent "^libwww-perl" bad_bot


<Limit GET POST>
Order Allow,Deny
Allow from all
Deny from env=bad_bot
deny from fastsearch.net
deny from btopenworld.com
deny from infomaxinc.com
deny from ipinc.com
deny from insightbb.com
deny from alexa.com
deny from p67-120.acedsl.com
deny from sympatico.ca
deny from 64-42-36-251.atgi.net
deny from dyn.optonline.net
deny from 207.191.201.
deny from 65.94.0.
deny from 64.37.103.
deny from 66.114.67.
deny from 68.192.98.81
deny from 67.68.
deny from 63.148.99.

</Limit>

Src: mbeatty
This is a "lo-fi" version of our main content. To view the full version with more information, formatting and images, please click here.
Invision Power Board © 2001-2010 Invision Power Services, Inc.