July 13th, 2009 (6 comments)
Scraping Alexa Links
Alexa's Sites Linking In feature is the fastest way to get an overall picture of a website's links. But nobody likes checking them manually. Here is a simple scraper that automates this tedious task.
$domain = 'fromzerotoseo.com';
$page = 0;
do {
    /* avoid file_get_contents, use cURL */
    $file = file_get_contents('http://www.alexa.com/site/linksin;' . $page . '/' . $domain);
    /* collect the href of every link in the results list */
    preg_match_all('(<a rel.*style.*href="(.*)".*>)siU', $file, $matches);
    /* save links to db or file. links here -> $matches[1] */
    /* a "next" link means there is another page of results */
    preg_match("(<a class='next' rel='next' href='(.*)')siU", $file, $nextlink);
    /* !empty() avoids a notice when no "next" link was matched */
    $morePages = !empty($nextlink[1]);
    if ($morePages) {
        $page++;
    }
} while ($morePages);
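The comment in the scraper recommends cURL over file_get_contents but doesn't show it. Here is a minimal sketch of what that swap could look like. The helper name fetchPage, the timeout default and the user-agent string are my own placeholders, not part of the original script, so adjust them to taste.

```php
<?php
// Hypothetical helper (not from the original post): fetch a URL with cURL.
// Returns the response body, or null on a transport error or an HTTP status >= 400.
function fetchPage(string $url, int $timeoutSeconds = 10): ?string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,            // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,            // follow redirects
        CURLOPT_CONNECTTIMEOUT => $timeoutSeconds, // give up if the connection hangs
        CURLOPT_TIMEOUT        => $timeoutSeconds, // cap the whole transfer
        CURLOPT_USERAGENT      => 'Mozilla/5.0 (compatible; example-scraper)', // placeholder value
    ]);

    $body   = curl_exec($ch);
    $status = (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);

    if ($body === false || $status >= 400) {
        return null;
    }
    return $body;
}
```

In the loop, the file_get_contents(...) call would become fetchPage(...) with the same URL, and a null result is a good reason to stop paging instead of running the regexes on an empty string.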
July 16th, 2009 - 12:14
What is the function of the code? thanks.
From David
July 16th, 2009 - 15:31
Hey Zero,
Lovin’ the scraping series. Any chance of scraping Amazon for reviews (only) based on a keyword search?
Regards,
WP
July 16th, 2009 - 21:33
@Belajar, to scrape your spammy backlinks.
..profiles . friendster . com / davidodang
groups . google . com / group / bisnis-internet-online?
www . squidoo . com / belajar-wordpress
…
BTW, stop spamming my blog with your crap:
“Learn How You Can Become A Super Affiliate In Any Niche You Want! Value $9.95”
douchebag
July 16th, 2009 - 21:38
Winalot, what exactly do you need? Product reviews?
Like, you search for ‘iphone 3gs’, get every product from result page and save all reviews?
July 16th, 2009 - 22:59
Hi Zero,
Yeah, you got it. I don’t want product details, just the reviews.
WP
July 17th, 2009 - 08:21
winalot, I think I can manage it. But I’m quite busy right now, so if you can wait a week or two.. ]:)
Have a nice day