Google SERP scraping solves many SEO problems. For example, you can monitor website rankings and scrape content from top websites. SERP scraping is just a part of SEO life ;} So, let’s scrape, my friends!
Simple Google SERP scraper
This nice little PHP script uses the getPage function, which you can find here.
<?php
$result = getPage(
    '[proxy IP]:[port]',
    'http://www.google.com/search?q=twitter',
    'http://www.google.com/',
    'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.8) Gecko/2009032609 Firefox/3.0.8',
    1,
    5);

if (empty($result['ERR'])) {
    // TODO: check there is no captcha
    // preg_match("/sorry.google.com/", $result['EXE']);
    preg_match_all("(<h3 class=r><a href=\"(.*)\".*>(.*)</a></h3>)siU", $result['EXE'], $matches);
    for ($i = 0; $i < count($matches[2]); $i++) {
        $matches[2][$i] = strip_tags($matches[2][$i]);
    }
    // Job’s done!
    // $matches[1] array contains all URLs, and
    // $matches[2] array contains all anchors
    // …
} else {
    // WTF? Problems?
    // ...
}
?>
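The captcha TODO in the script can be filled in with a small check. This is only a minimal sketch: it assumes a blocked request shows up as a reference to sorry.google.com somewhere in the returned HTML (which is what the commented-out preg_match hints at), and the helper name isCaptchaPage is made up for illustration. Google's block pages may well look different today.

```php
<?php
// Hypothetical helper: returns true when the fetched HTML looks like
// Google's "sorry" (captcha) page rather than a normal results page.
function isCaptchaPage(string $html): bool
{
    // The dots are escaped so they match literally; 'i' ignores case.
    return preg_match('/sorry\.google\.com/i', $html) === 1;
}
```

Inside the script you would call it as isCaptchaPage($result['EXE']) before running preg_match_all, and switch to another proxy when it returns true.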
5 scraper improvement tips
- Use as many proxies as you can, because Google doesn’t like scrapers. It’s foolish to send a hundred requests from one IP address. Make a list of proxies and pick a random proxy each time you scrape Google.
- Use anonymous proxies. No need to explain why.
- Get keywords from a database or file and build the URL on the fly. The urlencode function will help you.
- Be natural, give Google a rest. Use the sleep and rand functions, something like sleep(rand($x, $y)).
- Use multi cURL if you like, but use it wisely.
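Here is a rough sketch of how the first four tips might fit together. The proxy addresses are placeholders from a reserved documentation range, and the helper names (pickProxy, buildSearchUrl, pauseLength) are invented for this example. The getPage call and the sleep are left commented out so the snippet runs on its own.

```php
<?php
// Placeholder proxy list; replace with your own anonymous proxies.
$proxies = ['192.0.2.10:8080', '192.0.2.11:3128', '192.0.2.12:8080'];

// Pick a random proxy from the list for each request.
function pickProxy(array $proxies): string
{
    return $proxies[array_rand($proxies)];
}

// Build the search URL on the fly; urlencode makes the keyword URL-safe.
function buildSearchUrl(string $keyword): string
{
    return 'http://www.google.com/search?q=' . urlencode($keyword);
}

// Random pause length in seconds, between $min and $max inclusive.
function pauseLength(int $min, int $max): int
{
    return rand($min, $max);
}

// In practice the keywords would come from a database or file.
$keywords = ['twitter', 'seo tools', 'c++ tutorial'];

foreach ($keywords as $keyword) {
    $proxy = pickProxy($proxies);
    $url   = buildSearchUrl($keyword);
    // $result = getPage($proxy, $url, ...); // same call as in the script above
    // sleep(pauseLength(5, 15));            // give Google a rest between requests
    echo $proxy . ' -> ' . $url . PHP_EOL;
}
```

Note how urlencode turns the space in "seo tools" into a plus sign and the plus signs in "c++ tutorial" into %2B, so the keyword survives the trip inside the query string.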
See ya!


By DigitalChuck on May 9th, 2009.
Thanks – I found this really useful – I’ve been looking around for days for a decent solution for SERP Scraping, and I’ve seen software that costs over $100 that does the same exact thing that your code does – thanks a TON – this code has given me all sorts of evil ideas LOL
By Jan Alvin on July 12th, 2009.
Hi! I find your script to be VERY useful, and best of all it’s free,
but I can’t seem to make it work. It only shows a blank page. (I’ve already placed the code for the getPage function.)
Please help me.
By seozero on July 12th, 2009.
Jan, add print_r($result); right after the $result = getPage… call and see what data you get.
And it’s always a good idea to try another proxy.
The code works, trust me ;] Have a nice day
By proxies for scraping on August 13th, 2009.
Just as info, you do not need sleeps or similar.
The trick is to change the IP for each new search term. You can browse all the result pages for a keyword (max 10 pages/1000 hits in Google) without changing IP or risking a ban.
But as soon as you change the keyword you need to change your IP as well.
If you don’t do that, Google will add a captcha before you can continue.
If you continue, Google will temporarily block the IP for a few hours.
If you continue again, you can end up with a permanent block, and that’s something you want to avoid at all costs.
greets
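The per-keyword rotation described in this comment could be sketched roughly like this. Treat it as an illustration only: the function name buildPlan and the proxy addresses are made up, and the start offset parameter with 10 results per page is an assumption about Google's URL format, not something stated in the comment.

```php
<?php
// Build a request plan: one proxy per keyword, reused for every
// result page of that keyword. Proxies are assigned round-robin, so
// consecutive keywords get different proxies (needs at least two).
function buildPlan(array $keywords, array $proxies, int $pages): array
{
    $proxies = array_values($proxies);
    $plan = [];
    foreach (array_values($keywords) as $index => $keyword) {
        $proxy = $proxies[$index % count($proxies)];
        for ($page = 0; $page < $pages; $page++) {
            $plan[] = [
                'proxy' => $proxy,
                // Assumed pagination: start=0, 10, 20, ...
                'url'   => 'http://www.google.com/search?q=' . urlencode($keyword)
                         . '&start=' . ($page * 10),
            ];
        }
    }
    return $plan;
}

// Placeholder proxies and keywords for demonstration.
$plan = buildPlan(['twitter', 'seo tools'], ['192.0.2.10:8080', '192.0.2.11:3128'], 3);
foreach ($plan as $step) {
    echo $step['proxy'] . ' -> ' . $step['url'] . PHP_EOL;
}
```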
By seozero on August 13th, 2009.
Valuable info, thanks a lot!
By Brad on August 18th, 2009.
I feel lame for asking…but do you have any recommendations as to where to get proxies? All the ones I got with a Google search come back with ‘couldn’t connect to host’ (and I’ve tried about 50 so far).
By proxy guy on August 18th, 2009.
Hey Brad!
You probably tried to use “public proxies”.
Those are a waste of time.
Most are incredibly slow, overused and short-lived.
If you want good results and a reliable system you’ll have to invest a few bucks.
I’ve been using Cloakfish (www.cloakfish.com) for a while.
Cloakfish is cheap, you can scrape a lot for small bucks.
The downside is lower performance.
Check the site I linked, that’s seo-proxies.com (www.seo-proxies.com). They specialize in scraping and similar jobs and offer PHP scripts that do most of the job (you only have to add the code from this blog and you’re done).
hope that helps
By google scraper on November 20th, 2009.
First:
Thanks for the article, it’s interesting.
I found another one that specializes in Google scraping:
http://google-scraper.squabbel.com
If you don’t allow other URLs here just remove it please.
But that one goes into much more detail and it has a much more advanced PHP project included.
It can filter advertisements and it scrapes all Google pages. (It mainly aims at large-scale scraping.)
Well thanks again for your nice site
By anon on December 15th, 2009.
In your preg_match_all, could you explain what the siU bit at the end does?
By seozero on December 16th, 2009.
anon, please look here:
http://php.net/manual/en/reference.pcre.pattern.modifiers.php
Try to experiment if you can’t get it. Remove/add modifiers one by one and see what happens.
HTH.
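For anyone who wants to run that experiment, here is a small self-contained sketch. In PHP's PCRE, s lets the dot match newlines, i makes the match case-insensitive, and U makes quantifiers ungreedy by default. The sample HTML and the helper name extractUrls are made up for the demonstration; the pattern is the one from the script.

```php
<?php
// Made-up sample: the first anchor text contains a newline,
// the second result uses uppercase H3 tags.
$html = "<h3 class=r><a href=\"http://a.example/\">First\nresult</a></h3>"
      . "<H3 class=r><a href=\"http://b.example/\">Second</a></H3>";

// Hypothetical helper: run the article's pattern with the given
// modifiers and return the captured URLs ($matches[1]).
function extractUrls(string $modifiers, string $html): array
{
    $pattern = '(<h3 class=r><a href="(.*)".*>(.*)</a></h3>)' . $modifiers;
    preg_match_all($pattern, $html, $matches);
    return $matches[1];
}

print_r(extractUrls('siU', $html)); // both URLs
print_r(extractUrls('iU', $html));  // only b.example: without s the dot stops at the newline
print_r(extractUrls('sU', $html));  // only a.example: without i the uppercase <H3> is skipped
print_r(extractUrls('si', $html));  // one bogus "URL": greedy .* swallows both results
```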
By Louie Sison on February 9th, 2010.
Search Engine Optimization is a passion. You got it right in this article