Scraping Alexa Hot URLs

Do you want to know what Alexa Toolbar users are reading right now? I know you do. Grab the code example and scrape the page until it dies 😈

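$page = file_get_contents('http://www.alexa.com/hoturls');
if ($page === false) {
    die('Could not fetch the page');
}

// Each hot URL sits in <div class='listing'><a href='...'>...</a></div>.
// Modifiers: s = dot matches newlines, i = case-insensitive, U = ungreedy.
preg_match_all(
    "(<div class='listing'><a.*href='(.*)'>(.*)</a>.*</div>)siU",
    $page, $matches);

// Job’s done!
// $matches[1] URLs
// $matches[2] anchor texts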

What I really like about Alexa Hot URLs is that the page is updated every 5 minutes. Get some proxies, replace file_get_contents with PHP cURL, create a database, implement a tricky timeout (or set up a cron task), and you’re done. Hot URLs in your pocket.
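
If you want a starting point for the cURL step, here is a rough sketch. The helper names fetch_page and extract_hot_urls, the proxy address format and the 10-second timeout are my own assumptions for illustration, not anything Alexa requires, and the regex is simply the one from the snippet above.

```php
<?php
// Hypothetical helper: fetch a page with cURL, optionally through a proxy.
// Returns the body as a string, or null if the request failed.
function fetch_page(string $url, ?string $proxy = null, int $timeout = 10): ?string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,     // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,     // follow redirects
        CURLOPT_CONNECTTIMEOUT => $timeout, // seconds to wait for the connection
        CURLOPT_TIMEOUT        => $timeout, // seconds allowed for the whole request
    ]);
    if ($proxy !== null) {
        // Assumed format: "host:port", e.g. "127.0.0.1:8080"
        curl_setopt($ch, CURLOPT_PROXY, $proxy);
    }
    $body = curl_exec($ch);
    return $body === false ? null : $body;
}

// Hypothetical helper: turn the page HTML into a list of url/anchor pairs,
// using the same pattern as the original snippet.
function extract_hot_urls(string $html): array
{
    preg_match_all(
        "(<div class='listing'><a.*href='(.*)'>(.*)</a>.*</div>)siU",
        $html, $matches, PREG_SET_ORDER);

    $items = [];
    foreach ($matches as $m) {
        $items[] = [
            'url'    => $m[1],
            'anchor' => trim(strip_tags($m[2])),
        ];
    }
    return $items;
}
```

To use it, call fetch_page('http://www.alexa.com/hoturls', $proxy) with one of your proxies, check the result for null, and pass the body to extract_hot_urls. Since the page changes every 5 minutes, a cron entry along the lines of */5 * * * * php /path/to/your/script.php (the path is a placeholder) would be one way to schedule it.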

See you!
