Fly Fishing Forum

Yahoo! Slurp Spider. What is it?

#1 ·
OK, all you folks well versed in all things internet, I have a question. I was looking at the Who's Online section here under the Quick Links tab in an attempt to see what all the guests listed on the front page are about. At this writing there are 116 members & 2,349 guests. The Who's Online page shows each registered user and what page they are viewing. After the last registered user, the guests show up. Some are listed simply as guests, and then most of the rest were something called Yahoo! Slurp Spider.

What is a Yahoo! Slurp Spider? I have never seen the term before. Is it like the web crawlers that search engines use? In any event, that is where most of the guest hits are coming from. Could someone explain this to me?
 
#2 ·
It's the Yahoo crawler, a pain in the side for most webmasters with forums. It accounts for a high percentage of the guest accesses (it varies with the time of day, etc.), but those crawler hits are separate from registered users, who make up the majority of traffic to the forum.

I've put some blockers in place, since Yahoo insists on being a total ass to site owners by eating up resources and bandwidth when the world uses Google anyway. Check it in a few days, and I will try other means if necessary.
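
One of the simpler throttles, for anyone curious, is a Crawl-delay line in robots.txt, which Slurp honors; it tells the spider to wait a given number of seconds between requests (the 10 below is just an example value):

User-agent: Slurp
Crawl-delay: 10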

Dear Yahoo Search, give it up. Google won.
 
#4 ·
As a web developer, I'd suggest you configure your server to not allow access to parts of your site based on the user-agent string. Robots.txt exclusion is honored by Slurp, and that's a simple way to limit access to parts of your site; see the Slurp help URL in your access logs, or Google for the robots.txt format. Spiders explicitly tell your server what kind of client is accessing the pages, and you are supposed to handle those clients differently if you don't want them indexing all of your site. If you are running Apache with mod_rewrite installed, you can add a rewrite rule like this:

# Serve a stand-in page when a particular user-agent requests index.html
# (the pattern needs quoting because it contains spaces)
RewriteCond %{HTTP_USER_AGENT} "^Some Funky User-Agent$"
RewriteRule ^index\.html$ /dontindexme.html [L]

to hide parts of your site from the spider. Whenever an agent named 'Some Funky User-Agent' comes to access index.html, it gets served something else instead, say a dead page called dontindexme.html. The user-agent string is there for the webmaster's benefit, allowing you to control what happens when a given client visits.
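
If you go the robots.txt route instead, a minimal sketch (the /forums/ path below is only a placeholder for whatever you want excluded) is a plain text file at the site root along these lines:

User-agent: Slurp
Disallow: /forums/

Slurp reads that file before it starts crawling and skips anything matching the Disallow lines.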

OK, back to fly fishing.
 