
From Zero To SEO: Achieving High Rankings Through Coding

1 Jun 2009

Scraping Bing SERP

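Bing is no exception when it comes to scraping. Fetch the results page through a proxy with a browser-like user agent, then pull the result URLs and anchor texts out of the HTML with a regular expression.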

$result = getPage(
    '[proxy IP]:[port]', // get a proxy from somewhere
    'http://www.bing.com/search?q=twitter',
    'http://www.bing.com/',
    'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.8) Gecko/2009032609 Firefox/3.0.8',
    1,
    5);
 
if (empty($result['ERR'])) {
 
    preg_match_all(
        '(<div class="sb_tlst">.*<h3>.*<a href="(.*)".*>(.*)</a>.*</h3>.*</div>)siU',
        $result['EXE'], $matches);
 
    // Anchors may contain markup such as <strong>; keep the plain text only.
    $matches[2] = array_map('strip_tags', $matches[2]);
 
    // Job’s done!
    // $matches[1] array contains all URLs, and
    // $matches[2] array contains all anchors
    // …
} else {
    // WTF? Problems?
    // ...
}
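The regex above is tied to Bing's exact markup, and as the comments below show, it breaks whenever that markup changes. A hedged alternative is to swap the regex for PHP's DOM extension and query the page with XPath. This is a sketch, not the author's method: the function name `parseBingResults()` is hypothetical, and the XPath selector simply mirrors the structure the regex expects (`div.sb_tlst > h3 > a`), so it needs the same kind of updating when Bing's HTML changes.

```php
<?php
// Hypothetical alternative to the regex: parse the SERP with DOMDocument + XPath.
// The selector mirrors the markup the regex expects (div.sb_tlst > h3 > a);
// adjust it whenever Bing changes its HTML.
function parseBingResults(string $html): array
{
    if ($html === '') {
        return [];  // loadHTML() rejects empty input
    }

    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; silence libxml warnings while loading.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $results = [];
    foreach ($xpath->query('//div[@class="sb_tlst"]//h3/a[@href]') as $a) {
        $results[] = [
            'url'    => $a->getAttribute('href'),
            'anchor' => trim($a->textContent),  // textContent already drops inner tags
        ];
    }
    return $results;
}
```

Usage: `$results = parseBingResults($result['EXE']);` gives one `['url' => ..., 'anchor' => ...]` pair per result, so there is no separate `strip_tags()` pass.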

Grab the getPage function from the post "Scraping websites with PHP cURL under proxy".
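If you don't have that post at hand, here is a minimal stand-in written under stated assumptions; it is not the original function and may differ from it. The parameter order (proxy, URL, referer, user agent, then `1` and `5`) is inferred from the call above: the `1` is assumed to be an "include response headers" flag and the `5` a timeout in seconds. The `EXE` and `ERR` keys match how `$result` is used here; the `INF` key (cURL transfer info) is an extra assumption.

```php
<?php
// Hypothetical stand-in for getPage(); the original lives in the post
// "Scraping websites with PHP cURL under proxy" and may differ.
// Argument order is inferred from the call in this post.
function getPage(string $proxy, string $url, string $referer, string $agent,
                 int $header, int $timeout): array
{
    $ch = curl_init();
    curl_setopt_array($ch, [
        CURLOPT_URL            => $url,
        CURLOPT_REFERER        => $referer,
        CURLOPT_USERAGENT      => $agent,
        CURLOPT_HEADER         => (bool) $header,  // assumed: include response headers in output
        CURLOPT_CONNECTTIMEOUT => $timeout,        // assumed: seconds to wait for a connection
        CURLOPT_TIMEOUT        => $timeout,        // ...and for the whole request
        CURLOPT_RETURNTRANSFER => true,            // return the page instead of printing it
        CURLOPT_FOLLOWLOCATION => true,
    ]);
    if ($proxy !== '') {
        curl_setopt($ch, CURLOPT_PROXY, $proxy);   // "host:port"
    }

    // Array elements are evaluated in order, so info/error reflect this exec.
    $result = [
        'EXE' => curl_exec($ch),     // page content, or false on failure
        'INF' => curl_getinfo($ch),  // transfer info: HTTP code, timings, ...
        'ERR' => curl_error($ch),    // empty string when the request succeeded
    ];
    curl_close($ch);

    return $result;
}
```

Because `empty('')` is true, the `if (empty($result['ERR']))` check in the snippet above takes the success branch whenever cURL reports no error.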

Comments (15)
  1. Hi, thanks for the tutorial code. If I put the call in a loop to scrape more than the first page of results, is there a way to throttle the calls so they look more natural?

  2. Winalot, experiment with a random delay between requests.

  3. Hi, has bing.com changed the HTML of its results? The regex doesn't seem to work anymore.

  4. Hi Winalot, I’ve updated the regex for you. Bing can’t escape us :twisted:

  5. Thanks! Keep up the good work.

  6. Hi seozero, have you had any luck scraping eBay search results? Since they removed their XML feeds, I've been looking for a way to scrape search results, especially for sold items, to spot trends. Can you work your scraping magic on those? Thanks!

  7. Winalot, I've never scraped eBay before, but you made me think of it.

    Btw, have you looked at http://developer.ebay.com/products/research/ and http://developer.researchadvanced.com/pages/developers_area/ebay_research_api/api_call_reference.html ?

    I think the API can meet your needs.

  8. Hi seozero, thanks for your reply.

    I'm part of the eBay developer network and use their sales API quite a bit.

    The problem is that the market data API is not free (see http://developer.ebay.com/programs/marketdata/), and the free version you mentioned above only returns a summary.

    So I thought I'd just hit the eBay listings themselves!

  9. Thanks a lot for sharing!

  10. Have you done anything with YouTube? I'm working on something right now :)

  11. @Jay, no. YouTube is on my to-do list :)

  12. I'll share mine with you when I'm done with it :)

    When you get a chance, can you drop me an email? I'd like to show you something you can use with your scraped content :)

  13. Has anyone managed to scrape the first 100 Bing results on a single page?
    http://www.bing.com/search?q=twitter&count=100

    All I get back is an empty response (Content-Length: 0):
    HTTP/1.1 200 OK
    Cache-Control: no-cache
    Date: Tue, 16 Mar 2010 09:50:41 GMT
    Content-Length: 0
    Connection: keep-alive
    Set-Cookie: OVR=flt=0&flt2=0&DomainVertical=0&Cashback=cbtest4&MSCorp=kievfinal&GeoPerf=0&Release=osf1; domain=.bing.com; path=/

    If you figure it out, please email me at tonixx AT gmail.com

    Thx!

  14. Hi all! I've got the same problem as Toni.

    My function is recursive and walks through all of Bing's result pages, but
    - I cannot get past 20 pages
    - (or) the last URL that still works is http://www.bing.com/search?q=ip%3a216.239.34.21&count=50&first=199

    I also tried storing cookies and sending them back to Bing, without luck.
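The throttling question in comment 1 and the `count`/`first` paging in comments 13 and 14 fit together in one small sketch. The helper names `bingPageUrls()` and `randomPause()` are hypothetical, the 2 to 8 second range in the usage lines is an arbitrary example, and the code assumes `first` is the 1-based position of the first result on a page (so page 2 at 10 results per page is `first=11`). Going by the two commenters, Bing may still stop serving results somewhere around the 200th, whatever the loop asks for.

```php
<?php
// Hypothetical helper: builds Bing result-page URLs using the "count" and
// "first" query parameters from the comments. Assumes "first" is the
// 1-based position of the first result on each page.
function bingPageUrls(string $query, int $pages, int $perPage = 10): array
{
    $urls = [];
    for ($page = 0; $page < $pages; $page++) {
        $urls[] = 'http://www.bing.com/search?' . http_build_query([
            'q'     => $query,
            'count' => $perPage,
            'first' => $page * $perPage + 1,
        ], '', '&');  // explicit separator, independent of php.ini
    }
    return $urls;
}

// Hypothetical helper: sleeps a random number of milliseconds in
// [$minMs, $maxMs] so requests are not evenly spaced; returns the delay.
function randomPause(int $minMs, int $maxMs): int
{
    $ms = random_int($minMs, $maxMs);
    usleep($ms * 1000);
    return $ms;
}

// Usage with getPage() from the post ($agent holds the user-agent string):
// foreach (bingPageUrls('twitter', 5) as $url) {
//     $result = getPage('[proxy IP]:[port]', $url, 'http://www.bing.com/', $agent, 1, 5);
//     // ... parse $result['EXE'] as in the post ...
//     randomPause(2000, 8000);
// }
```

Randomizing the pause, rather than sleeping a fixed interval, is the "random timeout" idea from comment 2: evenly spaced requests are easier to spot as automated.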



