View Single Post
  #6  
Old 09-21-2006, 11:27 AM
zeslaw
OSP Starters
 
Join Date: Sep 2006
Posts: 9

I've read that you need to be running more than about 1 query/second to really piss them off; 20 queries a day looks like a normal user. You can use curl to grab the page, and you can get the number of backlinks within one query.
Otherwise, if you want to get the sites themselves, I'd use LWP, set the prefs to 100 results, set the user agent to Lynx or something like that, and not run more than one query every 5 seconds. The delay will keep you on their good side.
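
For anyone who wants the timing part spelled out, here is a rough sketch of the same idea in Python rather than LWP. It only shows the 5-second spacing and the custom user agent. The `Throttle` class, `build_request`, `fetch_all`, and the Lynx version string are all made up for illustration, and the list of URLs you feed it is up to you; nothing here is specific to any search engine.

```python
import time
import urllib.request


class Throttle:
    """Make sure successive calls to wait() are at least min_interval seconds apart."""

    def __init__(self, min_interval=5.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock      # injectable so the timing can be tested without waiting
        self._sleep = sleep
        self._last = None        # time of the previous request; None before the first one

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


def build_request(url, user_agent="Lynx/2.8.5rel.1"):
    # The user agent string is a placeholder (assumption), not a recommended value.
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_all(urls, throttle=None, opener=urllib.request.urlopen):
    """Fetch each URL in turn, never faster than the throttle allows."""
    throttle = throttle or Throttle()
    pages = []
    for url in urls:
        throttle.wait()
        with opener(build_request(url)) as resp:
            pages.append(resp.read())
    return pages
```

The first request goes out immediately; every later one waits until 5 seconds have passed since the previous one, so a long list of queries can't turn into a burst.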

I once sent a spider at Yahoo that pissed them off, and they temporarily blocked my usage. It was something like 950 pages, one right after the other. I was just trying out a scraping program I'd grabbed. I'm sure others have seen that kind of behavior using some of the old Yahoo scraping programs out there. Most new programs have anti-bombardment timing built in.
  Webmaster Forums - View Single Post - Does anyone know how to extract