Scraping Alexa Links

Alexa Sites Linking In feature is the fastest way to grasp overall picture of website links. But nobody likes to check them manually. Here is simple scraper that automates this tedious task.

$domain = 'fromzerotoseo.com';
 
$page = 0;
$morePages = false;
 
do {
    /* avoid file_get_contents, use cURL */
    $file = file_get_contents(
        'http://www.alexa.com/site/linksin;' . $page . '/' . $domain);
 
    preg_match_all('(<a rel.*style.*href="(.*)".*>)siU', $file, $matches);
 
    /* save links to db or file. links here -> $matches[1] */
 
    preg_match("(<a class='next' rel='next' href='(.*)')siU", $file, $nextlink);
    if ($nextlink[1]) {
        $morePages = true;
        $page++;
    } else {
        $morePages = false;
    }
} while ($morePages == true);

6 thoughts on “Scraping Alexa Links”

  1. Hey Zero,

    Lovin’ the scraping series. Any chance of scraping Amazon for reviews (only) based on a keyword search?

    Regards,

    WP

  2. @Belajar, to scrap your spammy backlinks.

    ..profiles . friendster . com / davidodang
    groups . google . com / group / bisnis-internet-online?
    www . squidoo . com / belajar-wordpress

    BTW, stop spamming my blog with your crap:

    “Learn How You Can Become A Super Affiliate In Any Niche You Want! Value $9.95”

    douchebag

  3. Winalot, what exactly you need? Product reviews?

    Like, you search for ‘iphone 3gs’, get every product from result page and save all reviews?

Comments are closed.