June 1st, 2009
Scraping Bing SERP
Bing is no exception when it comes to scraping.
$result = getPage(
    '[proxy IP]:[port]', // get a proxy from somewhere
    'http://www.bing.com/search?q=twitter',
    'http://www.bing.com/',
    'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.8) Gecko/2009032609 Firefox/3.0.8',
    1,
    5);

if (empty($result['ERR'])) {
    preg_match_all(
        '(<div class="sb_tlst">.*<h3>.*<a href="(.*)".*>(.*)</a>.*</h3>.*</div>)siU',
        $result['EXE'], $matches);
    for ($i = 0; $i < count($matches[2]); $i++) {
        $matches[2][$i] = strip_tags($matches[2][$i]);
    }
    // Job’s done!
    // $matches[1] array contains all URLs, and
    // $matches[2] array contains all anchors
    // …
} else {
    // WTF? Problems?
    // ...
}
Grab the getPage function from the post "Scraping websites with PHP cURL under proxy".
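For illustration, the matching step can be wrapped in a small helper that pairs each URL with its anchor text. This is only a sketch: the name extractResults and the sample markup are made up here, the regex is the one from the post, and Bing's real markup may differ. An empty return value is a cheap hint that the markup has changed and the regex needs updating.

```php
<?php
// Hypothetical helper (not part of the original post): run the post's regex
// over a page body and return a list of array('url' => ..., 'anchor' => ...).
function extractResults($html)
{
    preg_match_all(
        '(<div class="sb_tlst">.*<h3>.*<a href="(.*)".*>(.*)</a>.*</h3>.*</div>)siU',
        $html, $matches);

    $results = array();
    foreach ($matches[1] as $i => $url) {
        $results[] = array(
            'url'    => $url,
            'anchor' => strip_tags($matches[2][$i]), // drop <strong> etc.
        );
    }
    return $results;
}

// Made-up sample markup in the shape the regex expects.
$sample = '<div class="sb_tlst"><h3><a href="http://twitter.com/" onmousedown="x">'
        . 'Twitter: <strong>What</strong> are you doing?</a></h3></div>'
        . '<div class="sb_tlst"><h3><a href="http://search.twitter.com/">'
        . 'Twitter Search</a></h3></div>';

$results = extractResults($sample);
```

With a real page you would pass $result['EXE'] instead of $sample.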
June 10th, 2009 - 13:21
Hi, Thanks for the tutorial code. If I want to put the call in a loop to scrape more than the 1st page results, is there a way of throttling the calls to appear more natural?
June 10th, 2009 - 15:44
Winalot, experiment with a random timeout between requests.
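A rough sketch of the random timeout idea, in case it helps. The function names, the 3 to 10 second range, and the `first` offset parameter used for paging are all assumptions, not something taken from the post. Check the paging parameter against the live site before relying on it. The fetcher is passed in as a callable, so getPage can be plugged in without changes.

```php
<?php
// Build the URL for a given result page (1-based).
// Assumption: Bing pages results with a `first` offset (1, 11, 21, ...).
function bingPageUrl($query, $page, $perPage = 10)
{
    $first = ($page - 1) * $perPage + 1;
    return 'http://www.bing.com/search?q=' . urlencode($query) . '&first=' . $first;
}

// Pick a random pause, in microseconds, between $minSec and $maxSec seconds.
function randomDelayMicros($minSec, $maxSec)
{
    return mt_rand((int) ($minSec * 1000000), (int) ($maxSec * 1000000));
}

// Fetch several result pages, sleeping a random interval between requests.
// $fetch is any callable that takes a URL and returns a getPage-style array.
function scrapePages($query, $pages, $fetch, $minSec = 3, $maxSec = 10)
{
    $results = array();
    for ($page = 1; $page <= $pages; $page++) {
        if ($page > 1) {
            usleep(randomDelayMicros($minSec, $maxSec)); // no pause before the first hit
        }
        $results[$page] = call_user_func($fetch, bingPageUrl($query, $page));
    }
    return $results;
}

// Usage with getPage from the other post (arguments as in the example above):
// $pages = scrapePages('twitter', 3, function ($url) {
//     return getPage('[proxy IP]:[port]', $url, 'http://www.bing.com/',
//         'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.8) Gecko/2009032609 Firefox/3.0.8',
//         1, 5);
// });
```

Each entry of the returned array can then be checked for 'ERR' and parsed exactly like a single page.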
June 11th, 2009 - 16:49
Hi, Has bing.com changed the HTML of their results? The regex does not seem to be working anymore.
June 11th, 2009 - 19:38
Hi Winalot, I’ve updated the regex for you. Bing can’t escape us.
June 11th, 2009 - 21:01
Thanks! Keep up the good work.
June 14th, 2009 - 20:44
Hi seozero, Have you had any luck scraping eBay search results? Since they removed their XML feeds I’ve been looking for a way to scrape search results, especially for sold items to see trends etc. Can you work your scraping magic on those? Thanks!
June 15th, 2009 - 10:08
Winalot, I’ve never scraped eBay before, but you made me think of it.
Btw, have you looked at http://developer.ebay.com/products/research/ and http://developer.researchadvanced.com/pages/developers_area/ebay_research_api/api_call_reference.html ?
I think the API can meet your needs.
June 15th, 2009 - 17:30
Hi seozero, Thanks for your reply.
I’m part of the eBay developer network and use their sales API quite a bit.
The problem is that the market API is not free (see http://developer.ebay.com/programs/marketdata/), and the free version you mentioned above only returns a summary.
Therefore I thought I’d just hit the eBay listings themselves!
July 4th, 2009 - 18:26
Thanks a lot for sharing...
February 6th, 2010 - 21:50
Have you done anything with YouTube? I am working on something right now.
February 7th, 2010 - 18:50
@Jay, no. YouTube is on my todo list.
February 7th, 2010 - 22:06
I’ll share mine with you when I’m done with it.
When you get a chance, can you drop me an email? I’d like to show you something that you can use with your scraped content.
February 8th, 2010 - 09:23
Sent you an email.
March 16th, 2010 - 11:59
Has anyone managed to scrape the first 100 Bing results in a single page?
http://www.bing.com/search?q=twitter&count=100
I get the following in return:
HTTP/1.1 200 OK
Cache-Control: no-cache
Date: Tue, 16 Mar 2010 09:50:41 GMT
Content-Length: 0
Connection: keep-alive
Set-Cookie: OVR=flt=0&flt2=0&DomainVertical=0&Cashback=cbtest4&MSCorp=kievfinal&GeoPerf=0&Release=osf1; domain=.bing.com; path=/
If you have figured it out, please send an email to tonixx AT gmail.com
Thx!