From Zero To SEO: Achieving High Rankings Through Coding

1 Jun 2009

Scraping Bing SERP

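Bing is no exception when it comes to scraping: fetch the results page through a proxy, then pull the URLs and anchors out with a regular expression.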

$result = getPage(
    '[proxy IP]:[port]', // get a proxy from somewhere
    'http://www.bing.com/search?q=twitter',
    'http://www.bing.com/',
    'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.8) Gecko/2009032609 Firefox/3.0.8',
    1,
    5);
 
if (empty($result['ERR'])) {

    // Capture each result's URL (group 1) and anchor HTML (group 2).
    preg_match_all(
        '(<div class="sb_tlst">.*<h3>.*<a href="(.*)".*>(.*)</a>.*</h3>.*</div>)siU',
        $result['EXE'], $matches);

    // Strip leftover markup (such as <strong>) from the anchor text.
    $matches[2] = array_map('strip_tags', $matches[2]);

    // Job's done!
    // $matches[1] array contains all URLs, and
    // $matches[2] array contains all anchors
    // ...
} else {
    // The request failed; $result['ERR'] is not empty.
    // ...
}
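
To see what the pattern captures, here is a minimal sketch that runs the same regex against a hand-written fragment. The sample markup and the extractBingResults helper are assumptions made for illustration: they mirror what the regex expects, not necessarily what Bing actually serves.

```php
<?php
// Hypothetical sample, shaped the way the regex expects the markup to look.
$sampleHtml = '<div class="sb_tlst"><h3><a href="http://twitter.com/" onmousedown="x">'
    . 'Twitter: <strong>What</strong> are you doing?</a></h3></div>' . "\n"
    . '<div class="sb_tlst"><h3><a href="http://en.wikipedia.org/wiki/Twitter">'
    . 'Twitter - Wikipedia</a></h3></div>';

// Hypothetical helper: returns one array('url' => ..., 'anchor' => ...) per result.
function extractBingResults($html)
{
    // PREG_SET_ORDER groups the captures per match: $m[1] is the URL, $m[2] the anchor HTML.
    preg_match_all(
        '(<div class="sb_tlst">.*<h3>.*<a href="(.*)".*>(.*)</a>.*</h3>.*</div>)siU',
        $html, $matches, PREG_SET_ORDER);

    $results = array();
    foreach ($matches as $m) {
        $results[] = array(
            'url'    => $m[1],
            'anchor' => trim(strip_tags($m[2])), // drop tags such as <strong>
        );
    }
    return $results;
}

$results = extractBingResults($sampleHtml);
foreach ($results as $r) {
    echo $r['url'], ' => ', $r['anchor'], "\n";
}
```

Returning one array per result keeps each URL next to its anchor, which is easier to loop over than two parallel arrays.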

Grab the getPage function from "Scraping websites with PHP cURL under proxy".
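
If the call goes into a loop to fetch more than the first page, a random pause between requests keeps them from arriving at a fixed rhythm. The sketch below is one way to do that, not a tested recipe: bingPageUrl and fetchPagesThrottled are hypothetical helpers, the `first` offset parameter and ten results per page are assumptions about Bing's URLs, and the delay range is arbitrary.

```php
<?php
// Build the search URL for a given result page (1-based).
// Assumption: Bing pages with a "first" offset parameter, 10 results per page.
function bingPageUrl($query, $page, $perPage = 10)
{
    $first = ($page - 1) * $perPage + 1;
    return 'http://www.bing.com/search?q=' . urlencode($query) . '&first=' . $first;
}

// Fetch several pages, sleeping a random interval before every page after the first.
// $fetch is any callable that takes a URL and returns the page body.
function fetchPagesThrottled($fetch, $query, $pages, $minSeconds = 3.0, $maxSeconds = 9.0)
{
    $bodies = array();
    for ($page = 1; $page <= $pages; $page++) {
        if ($page > 1) {
            // Pick a delay uniformly between $minSeconds and $maxSeconds.
            $delay = $minSeconds + ($maxSeconds - $minSeconds) * (mt_rand() / mt_getrandmax());
            usleep((int) round($delay * 1000000));
        }
        $bodies[$page] = call_user_func($fetch, bingPageUrl($query, $page));
    }
    return $bodies;
}
```

To plug in getPage, pass a closure that calls it with your proxy and the given URL, and returns $result['EXE'] when $result['ERR'] is empty.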

Comments (14) Trackbacks (0)
  1. Hi, thanks for the tutorial code. If I want to put the call in a loop to scrape more than the first page of results, is there a way to throttle the calls so they appear more natural?

  2. Winalot, experiment with a random timeout between requests.

  3. Hi, has bing.com changed the HTML of its results? The regex does not seem to be working anymore.

  4. Hi Winalot, I’ve updated the regex for you. Bing can’t escape us.

  5. Thanks! Keep up the good work.

  6. Hi seozero, Have you had any luck scraping eBay search results? Since they removed their XML feeds I’ve been looking for a way to scrape search results, especially for sold items to see trends etc. Can you work your scraping magic on those? Thanks!

  7. Winalot, I’ve never scraped eBay before, but you made me think of it.

    Btw, have you looked at http://developer.ebay.com/products/research/ and http://developer.researchadvanced.com/pages/developers_area/ebay_research_api/api_call_reference.html ?

    I think the API can meet your needs.

  8. Hi seozero, Thanks for your reply.

    I’m part of the eBay developer network and use their sales API quite a bit.

    The problem is that the market API is not free (see http://developer.ebay.com/programs/marketdata/), and the free version you mentioned above only returns a summary.

    Therefore I thought I’d just hit the eBay listings themselves!

  9. Thanks a lot for sharing…

  10. Have you done anything with YouTube? I am working on something right now :)

  11. @Jay, no. YouTube is on my to-do list :)

  12. I’ll share mine with you when I’m done with it :)

    When you get a chance, can you drop me an email? I’d like to show you something that you can use with your scraped content :)

  13. Has anyone managed to scrape the first 100 Bing results in a single page:
    http://www.bing.com/search?q=twitter&count=100

    I get the following in return:

    HTTP/1.1 200 OK
    Cache-Control: no-cache
    Date: Tue, 16 Mar 2010 09:50:41 GMT
    Content-Length: 0
    Connection: keep-alive
    Set-Cookie: OVR=flt=0&flt2=0&DomainVertical=0&Cashback=cbtest4&MSCorp=kievfinal&GeoPerf=0&Release=osf1; domain=.bing.com; path=/

    If you figure it out, please send an email to tonixx AT gmail.com

    Thx!

