May 5th, 2009 (3 comments)
Scraping Yahoo SERP
A Yahoo SERP scraper is a little more difficult to implement than a Google SERP scraper. The Yahoo guys are mad about redirects (former blackhats?), so every result link goes through a redirect and you have to clean the URLs afterwards. But nothing can stop you from scraping.
Scraper code example
First time here? Read about scraping websites with PHP cURL under proxy. You will find the getPage source code there.
<?php
$result = getPage(
    '[proxy IP]:[port]', // get a proxy from somewhere
    'http://search.yahoo.com/search?p=apple',
    'http://www.yahoo.com/',
    'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.8) Gecko/2009032609 Firefox/3.0.8',
    1,
    5);

if (empty($result['ERR'])) {
    // $matches[1] collects the hrefs, $matches[2] the anchor markup
    preg_match_all('(<h3><a class.*href="(.*)".*>(.*)</a>)siU', $result['EXE'], $matches);

    for ($i = 0; $i < count($matches[1]); $i++) {
        // decode url
        $matches[1][$i] = urldecode($matches[1][$i]);
        // get rid of rds.yahoo.com redirect;
        // leave the URL untouched if it has no "**" redirect part
        if (preg_match('/\*\*(http:\/\/.*$)/siU', $matches[1][$i], $urls)) {
            $matches[1][$i] = $urls[1];
        }
    }

    // strip tags
    for ($i = 0; $i < count($matches[2]); $i++) {
        $matches[2][$i] = strip_tags($matches[2][$i]);
    }

    // Job’s done!
    // $matches[1] contains URLs
    // $matches[2] contains anchors
    // …
} else {
    // Something went wrong...
}
?>
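To see the redirect clean-up step in isolation, here is a minimal sketch. The helper name `cleanYahooRedirect` and the sample URL are made up for illustration; the sample only imitates the general shape of a redirect link (tracking path, then `**`, then the encoded target) and is not a real Yahoo URL.

```php
<?php
// Hypothetical helper (not part of the original scraper): returns the target
// URL hidden behind a "**" redirect marker, or the decoded input unchanged
// when no such marker is present.
function cleanYahooRedirect($href)
{
    $decoded = urldecode($href);
    // Same pattern as in the scraper above: everything after "**" is the target.
    if (preg_match('/\*\*(http:\/\/.*$)/siU', $decoded, $m)) {
        return $m[1];
    }
    return $decoded;
}

// Made-up sample that only imitates the shape of a redirect link.
$sample = 'http://rds.yahoo.com/_ylt=abc/SIG=123/EXP=456/**http%3a//www.apple.com/';
echo cleanYahooRedirect($sample), "\n";                        // prints http://www.apple.com/
echo cleanYahooRedirect('http://www.example.com/page'), "\n";  // prints the URL unchanged
```

Returning the decoded input when nothing matches means a link without a redirect part does not trigger an undefined-index error.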
P.S.: Some URLs can still be unreadable (http://rdre1.yahoo.com/click?u=http://feedpoint.net...). Don’t panic, there’s a workaround.
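For click-style links like the one in the P.S., one possible approach is to read the target from the `u` query parameter. This is an assumption based only on the shape of that example link, not necessarily the workaround meant above. The helper name `extractClickTarget` and the sample URL are hypothetical.

```php
<?php
// Hypothetical helper: pulls the target out of a ".../click?u=<target>" link.
// Returns null when the link has no usable "u" parameter.
function extractClickTarget($href)
{
    $query = parse_url($href, PHP_URL_QUERY);
    if (!is_string($query)) {
        return null; // no query string, or a malformed URL
    }
    parse_str($query, $params); // parse_str() also URL-decodes the values
    if (isset($params['u']) && is_string($params['u'])) {
        return $params['u'];
    }
    return null;
}

// Made-up sample in the same shape as the link from the P.S.
$sample = 'http://rdre1.yahoo.com/click?u=http%3A%2F%2Fwww.example.com%2Ffeed&y=1';
echo extractClickTarget($sample), "\n"; // prints http://www.example.com/feed
```

If the target is not URL-encoded and carries its own `&` parameters, this sketch would cut it off at the first `&`, so treat the result as a best guess.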

Take care
August 13th, 2009 - 10:22
I'm starting to like your site. Nice free code you have.
Public proxies will make the process really slow; you need private ones if you want good results.
Also, Google and Yahoo tend to detect shared proxies quickly and ban them from scraping.
If you have about 30 IPs, you can scrape all day long without a ban.
Just my few cents.
August 15th, 2009 - 15:37
What is the workaround for the unreadable backlinks?
I get an error on the line that removes the redirects. Any possible solution?
August 16th, 2009 - 09:40
Ahmed, use the green-colored URLs. You can scrape them.