25 Jun 2009
Scraping Alexa Hot URLs
Do you want to know what Alexa Toolbar users are reading right now? I know you do. Grab the code example and scrape the page until it dies:
$page = file_get_contents('http://www.alexa.com/hoturls');
preg_match_all(
    "(<div class='listing'><a.*href='(.*)'>(.*)</a>.*</div>)siU",
    $page,
    $matches
);
// Job’s done!
// $matches[1] URLs
// $matches[2] anchors
What I really like about Alexa Hot URLs is that the page is updated every 5 minutes. Get some proxies, replace file_get_contents with PHP cURL, create a database, implement a tricky timeout (or set up a cron task) and you’re done. Hot URLs in your pocket.
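If you want a starting point for the cURL part, here is a rough sketch. It assumes the page markup still matches the pattern from the post. The function names fetch_page and extract_hot_urls, the "host:port" proxy string, and the 10-second timeout are my own placeholders, not anything Alexa or the snippet above prescribes. The regex is the same one as before, only written with ~ delimiters.

```php
<?php
// Fetch a page with PHP cURL, optionally through a proxy.
// $proxy is a hypothetical "host:port" string; pass null to connect directly.
// Returns the response body, or null if the request failed.
function fetch_page(string $url, ?string $proxy = null, int $timeout = 10): ?string
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);     // return the body instead of printing it
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);     // follow redirects
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout); // seconds allowed for connecting
    curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);        // seconds allowed for the whole request
    if ($proxy !== null) {
        curl_setopt($ch, CURLOPT_PROXY, $proxy);
    }
    $body = curl_exec($ch);
    return $body === false ? null : $body;
}

// Pull URL/anchor pairs out of the page using the pattern from the post.
// Returns a list of ['url' => ..., 'anchor' => ...] arrays.
function extract_hot_urls(string $page): array
{
    preg_match_all(
        "~<div class='listing'><a.*href='(.*)'>(.*)</a>.*</div>~siU",
        $page,
        $matches
    );
    $result = [];
    foreach ($matches[1] as $i => $url) {
        $result[] = ['url' => $url, 'anchor' => $matches[2][$i]];
    }
    return $result;
}
```

Call fetch_page('http://www.alexa.com/hoturls', $proxy) from a cron task every 5 minutes, pass the result to extract_hot_urls when it is not null, and write the pairs to your database. Rotating proxies is then just a matter of picking a different $proxy on each run.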
See you!
August 11th, 2009 - 22:13
What for?