Scraping Alexa Hot URLs

Do you want to know what Alexa Toolbar users are reading right now? I know you do. Grab the code example and scrape the page until it dies :twisted:

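// file_get_contents returns false when the request fails
$page = file_get_contents('http://www.alexa.com/hoturls');
if ($page === false) {
    die('Could not fetch the page');
}

// Flags: s lets . match newlines, i ignores case, U makes quantifiers lazy
preg_match_all(
    "(<div class='listing'><a.*href='(.*)'>(.*)</a>.*</div>)siU",
    $page, $matches);

// Job’s done!
// $matches[1] URLs
// $matches[2] anchors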
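
If you would rather have each URL sitting next to its anchor, here is a minimal sketch. It assumes the same markup as the pattern above; the function name extract_hot_urls and the sample HTML are made up for illustration, not taken from the real page.

```php
<?php
// Hypothetical helper: pair each URL with its anchor text.
function extract_hot_urls(string $html): array
{
    // PREG_SET_ORDER groups the captures per match instead of per group.
    preg_match_all(
        "(<div class='listing'><a.*href='(.*)'>(.*)</a>.*</div>)siU",
        $html, $matches, PREG_SET_ORDER);

    $links = [];
    foreach ($matches as $m) {
        $links[] = [
            'url'    => html_entity_decode($m[1]),  // turn &amp; back into &
            'anchor' => trim(strip_tags($m[2])),    // drop any nested markup
        ];
    }
    return $links;
}

// Made-up sample markup that mimics the structure the pattern expects.
$sample = "<div class='listing'><a href='http://example.com/a'>First</a> 1</div>\n"
        . "<div class='listing'><a class='x' href='http://example.com/b'>Second</a></div>";

foreach (extract_hot_urls($sample) as $link) {
    echo $link['anchor'], ' => ', $link['url'], "\n";
}
```

Each element of the result is a small array with 'url' and 'anchor' keys, which is handy once you start inserting rows into a database.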

What I really like about Alexa Hot URLs is that the page is updated every 5 minutes. Get some proxies, replace file_get_contents with PHP cURL, create a database, implement a tricky timeout (or set up a cron task), and you’re done. Hot URLs in your pocket.
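
The cURL-plus-proxies part could look roughly like this. It is only a sketch: fetch_page and fetch_with_proxies are hypothetical names, the 10-second timeout is an arbitrary default, and the proxy addresses are placeholders you would replace with your own.

```php
<?php
// Hypothetical helper: fetch a page with cURL, optionally through a proxy.
// Returns the body on HTTP 200, or null on any failure.
function fetch_page(string $url, ?string $proxy = null, int $timeout = 10): ?string
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);     // return the body instead of printing it
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);     // follow redirects
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout); // seconds allowed for connecting
    curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);        // seconds allowed for the whole request
    if ($proxy !== null) {
        curl_setopt($ch, CURLOPT_PROXY, $proxy);        // "host:port" of the proxy
    }
    $body   = curl_exec($ch);
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);

    return ($body !== false && $status === 200) ? $body : null;
}

// Try each proxy in turn until one of them returns the page.
function fetch_with_proxies(string $url, array $proxies): ?string
{
    foreach ($proxies as $proxy) {
        $page = fetch_page($url, $proxy);
        if ($page !== null) {
            return $page;
        }
    }
    return null;
}

// Usage (placeholder proxy addresses):
// $page = fetch_with_proxies('http://www.alexa.com/hoturls',
//                            ['127.0.0.1:8080', '127.0.0.1:8081']);
```

Since the page changes every 5 minutes, a cron entry such as `*/5 * * * * php /path/to/hoturls.php` (the script path is hypothetical) would pick up each refresh without a sleep loop inside the script.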

See you!

One comment.

  1. what for ?
