From Zero To SEO: Achieving High Rankings Through Coding

13 Jul 2009

Scraping Alexa Links

Alexa's Sites Linking In feature is one of the fastest ways to get an overall picture of a website's inbound links. But nobody likes to check them manually. Here is a simple scraper that automates this tedious task.

$domain = 'fromzerotoseo.com';

$page  = 0;
$links = array();

do {
    /* file_get_contents keeps the example short; prefer cURL in real use */
    $file = file_get_contents(
        'http://www.alexa.com/site/linksin;' . $page . '/' . $domain);
    if ($file === false) {
        break; /* request failed, stop paging */
    }

    /* capture the href of every link in the Sites Linking In list */
    preg_match_all('(<a rel.*style.*href="(.*)".*>)siU', $file, $matches);

    /* save links to db or file. links here -> $matches[1] */
    $links = array_merge($links, $matches[1]);

    /* a "next" link means there is another page of results */
    $morePages = preg_match(
        "(<a class='next' rel='next' href='(.*)')siU", $file, $nextlink) === 1
        && !empty($nextlink[1]);
    if ($morePages) {
        $page++;
    }
} while ($morePages);
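The comment in the scraper recommends cURL over file_get_contents, and leaves saving the links as an exercise. Below is a minimal sketch of both pieces. The function names fetchPage and saveLinks, the timeout values, the user agent string, and the one-link-per-line file format are all assumptions made for illustration, not part of the original scraper.

```php
<?php
/**
 * Fetch a URL with cURL (hypothetical helper).
 * Returns the response body as a string, or null on failure.
 * Timeouts and user agent are illustrative assumptions.
 */
function fetchPage($url)
{
    $ch = curl_init($url);
    curl_setopt_array($ch, array(
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true, // follow redirects
        CURLOPT_CONNECTTIMEOUT => 10,   // seconds to wait for the connection
        CURLOPT_TIMEOUT        => 30,   // seconds allowed for the whole request
        CURLOPT_USERAGENT      => 'Mozilla/5.0 (compatible; linksin-scraper)',
    ));
    $body   = curl_exec($ch);
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);

    // Treat transport errors and non-200 responses as failures.
    if ($body === false || $status !== 200) {
        return null;
    }
    return $body;
}

/**
 * Append links to a text file, one per line (hypothetical helper).
 * Trims whitespace, drops empty entries and duplicates within this batch.
 * Returns the number of links written.
 */
function saveLinks(array $links, $path)
{
    $links = array_values(array_unique(
        array_filter(array_map('trim', $links), 'strlen')
    ));
    if (!$links) {
        return 0;
    }
    $written = file_put_contents(
        $path,
        implode("\n", $links) . "\n",
        FILE_APPEND | LOCK_EX // append, and lock the file while writing
    );
    return $written === false ? 0 : count($links);
}
```

Inside the loop, the file_get_contents call could be swapped for fetchPage(...), stopping when it returns null, and saveLinks($matches[1], 'links.txt') could be called once per page. Duplicates are only removed within a single call, so links repeated across pages would still appear more than once in the file.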
Comments (6) Trackbacks (0)
  1. What is the function of the code? thanks.

    From David

  2. Hey Zero,

    Lovin’ the scraping series. Any chance of scraping Amazon for reviews (only) based on a keyword search?

    Regards,

    WP

  3. @Belajar, to scrape your spammy backlinks.

    ..profiles . friendster . com / davidodang
    groups . google . com / group / bisnis-internet-online?
    www . squidoo . com / belajar-wordpress

    BTW, stop spamming my blog with your crap:

    “Learn How You Can Become A Super Affiliate In Any Niche You Want! Value $9.95”

    douchebag

  4. Winalot, what exactly do you need? Product reviews?

    Like, you search for ‘iphone 3gs’, get every product from result page and save all reviews?

  5. Hi Zero,

    Yeah, you got it. I don’t want product details, just the reviews.

    WP

  6. winalot, I think I can manage it. But I’m quite busy right now, so if you can wait a week or two.. ]:)

    Have a nice day

