Live Search, Ask and Cuil SERP scraping

The SERP scraping saga continues. This time I'll give you only the required regexps and URLs, so there is no need to copy and paste code from the other scraping posts. Feel free to improve the regexps and comment.

Live Search

  • URL – http://search.live.com/results.aspx?q=[keyword]
  • regexp – "(<h3><a href=\"(.*)\".*>(.*)</a></h3>)siU"
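
For readers who don't have the code from the earlier posts at hand, here is a minimal sketch of how the URL and regexp above could be used. It is written in Python rather than as a PCRE call, so the "U" (ungreedy) modifier is replaced by lazy ".*?" quantifiers. The function names and the sample markup are made up for illustration, and the real results page may look different.

```python
import re
from urllib.parse import quote_plus

# Python translation of the PCRE pattern above. Python has no "U" (ungreedy)
# modifier, so every ".*" is written as the lazy ".*?" instead.
LIVE_RE = re.compile(r'<h3><a href="(.*?)".*?>(.*?)</a></h3>', re.S | re.I)


def live_search_url(keyword):
    """Substitute the URL-encoded keyword for [keyword] in the results URL."""
    return "http://search.live.com/results.aspx?q=" + quote_plus(keyword)


def parse_live(html):
    """Return two parallel lists: result URLs and their anchor texts."""
    matches = LIVE_RE.findall(html)
    urls = [m[0] for m in matches]
    anchors = [m[1] for m in matches]
    return urls, anchors


# Hypothetical markup, invented for this example only.
SAMPLE = (
    '<h3><a href="http://example.com/a" onmousedown="x()">First result</a></h3>'
    '<p>snippet</p>'
    '<h3><a href="http://example.org/b">Second <strong>result</strong></a></h3>'
)

urls, anchors = parse_live(SAMPLE)
```

Note that the anchor text is captured as-is, including any inline tags inside the link.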

Ask

  • URL – http://www.ask.com/web?q=[keyword]
  • regexp – "(<tr>.*<td>.*<a id=\"r\d+_t\" href=\"(.*)\".*>(.*)</a>.*</td>.*</tr>)siU"
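
The Ask pattern relies on the id of the result link (r1_t, r2_t and so on), so links with other ids are not captured. A rough Python sketch of that behaviour follows. It assumes the same lazy-quantifier translation of the "U" modifier, and the table markup is invented for the example rather than copied from Ask.

```python
import re

# Python translation of the Ask pattern: lazy ".*?" stands in for the
# PCRE "U" modifier, re.S and re.I stand in for "s" and "i".
ASK_RE = re.compile(
    r'<tr>.*?<td>.*?<a id="r\d+_t" href="(.*?)".*?>(.*?)</a>.*?</td>.*?</tr>',
    re.S | re.I,
)


def parse_ask(html):
    """Return a list of (url, anchor) pairs for links whose id is r<number>_t."""
    return [(m.group(1), m.group(2)) for m in ASK_RE.finditer(html)]


# Hypothetical markup, invented for this example only. The first row holds
# a link with a different id, which the pattern does not capture.
SAMPLE = (
    '<tr><td><a id="nav_1" href="http://example.com/ads">Sponsored</a></td></tr>\n'
    '<tr><td><a id="r1_t" href="http://example.com/">Example <b>one</b></a>'
    '<div>snippet</div></td></tr>\n'
    '<tr><td><a id="r2_t" href="http://example.net/page">Example two</a></td></tr>'
)

results = parse_ask(SAMPLE)
```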

Cuil

  • URL – http://www.cuil.com/search?q=[keyword]
  • regexp – "(<h2 class=\"t\"><a.*href=\"(.*)\".*>(.*)</a></h2>)siU"

Sometimes Cuil puts its Timeline feature in the SERP. The regular expression above matches it, but you don't need that entry. The Timeline is easy to find in the URLs array: look for the entry that comes from href="http://#". Don't forget to delete the corresponding element from the anchors array as well.
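
One way to drop the Timeline entry is to filter URL and anchor together, so the two arrays cannot get out of step. The sketch below is in Python, with lazy ".*?" quantifiers in place of the "U" modifier. It assumes the Timeline link is captured as exactly http://#, and both the helper name and the sample markup are hypothetical.

```python
import re

# Python translation of the Cuil pattern (lazy ".*?" replaces the "U" modifier).
CUIL_RE = re.compile(
    r'<h2 class="t"><a.*?href="(.*?)".*?>(.*?)</a></h2>', re.S | re.I
)

# Assumption: the Timeline entry is captured with exactly this URL.
TIMELINE_URL = "http://#"


def parse_cuil(html):
    """Return parallel lists of URLs and anchors with the Timeline removed."""
    pairs = [(u, a) for u, a in CUIL_RE.findall(html) if u != TIMELINE_URL]
    urls = [u for u, _ in pairs]
    anchors = [a for _, a in pairs]
    return urls, anchors


# Hypothetical markup, invented for this example only.
SAMPLE = (
    '<h2 class="t"><a class="x" href="http://example.com/">Example</a></h2>'
    '<h2 class="t"><a href="http://#">Timeline</a></h2>'
    '<h2 class="t"><a href="http://example.org/">Another</a></h2>'
)

urls, anchors = parse_cuil(SAMPLE)
```

Filtering the pairs before splitting them means the anchors array never needs a separate cleanup pass.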

Happy scraping!

2 thoughts on “Live Search, Ask and Cuil SERP scraping”

  1. Hello, “Live Search” is a link to “Bing”, and “Bing” does not allow scraping any information…

    Test: fopen url <- fail (file_get_contents)
    Test: curl <- fail
