Why remove stopwords

You may be surprised to learn that stopwords can make up as much as 70 percent of your articles, i.e. only 30 percent of the words drive search engine traffic.
Stopwords definition
Stopwords are common words that carry less meaning than keywords. Search engines usually remove stopwords from a keyword phrase to return the most relevant results. In other words, stopwords drive much less traffic than keywords.
So what? Stopwords are a part of human language and there’s nothing you can do about it. Sure, but a high stopword density can make your content look less important to search engines.
Look at the picture below. It shows the two paragraphs above without stopwords.

The text is much shorter than the original: 31 words versus 66. That means 35 of the 66 words, roughly 50 percent, are stopwords, i.e. half of the text is not really important to search engines.
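If you want to measure this for your own posts, the share of stopwords can be counted directly. The snippet below is only a rough sketch: the helper name stopword_density, the tiny inline stopword list, and the rule that a "word" is a run of letters and apostrophes are all my assumptions, not part of the scripts in this post.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of the scripts in this post):
# returns (total words, stopword count, stopword percentage).
sub stopword_density {
    my ($text, @stopwords) = @_;
    my %is_stop = map { ( lc($_) => 1 ) } @stopwords;

    # Assumption: a "word" is a run of ASCII letters and apostrophes.
    my @words = map { lc($_) } ($text =~ /[A-Za-z']+/g);
    my $stops = grep { $is_stop{$_} } @words;   # grep in scalar context counts

    my $total = scalar @words;
    my $pct   = $total ? 100 * $stops / $total : 0;
    return ($total, $stops, $pct);
}

my ($total, $stops, $pct) = stopword_density(
    "The cat sat on the mat and it was happy",
    qw(the on and it was a of),
);
printf "%d of %d words are stopwords (%.0f%%)\n", $stops, $total, $pct;
# prints: 6 of 10 words are stopwords (60%)
```

For a real post you would load the words from your stopwords.txt instead of the inline list.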
Who should bother about stopwords?
If you aren’t afraid to experiment and have time for it, replace some (not all) stopwords with yummy words before submitting a post. This may help you get more search engine traffic to your blog.
Also, guys who scrape content for doorways may be interested. However, I’m not sure Google likes a stopword-free keyword mess. You should probably keep some.
Stopwords lists
There are two stopword lists from trusted websites that you can use: Link Assistant and SEO Book. You can find more or make your own list.
Stopword removal script (Perl)
Perl is the best choice to eat some text. If you know Perl regular expressions, you have the power to make difficult tasks easier.
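To see what the regular expression does before running anything on a whole post, here is a minimal sketch of the core substitution applied to a single string. The helper name strip_stopwords and the whitespace clean-up at the end are my additions for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: remove each stopword from $text as a whole word.
sub strip_stopwords {
    my ($text, @stopwords) = @_;
    for my $word (@stopwords) {
        # \b matches at word boundaries, so "the" does not touch "then";
        # \Q...\E treats the stopword as literal text; /i ignores case.
        $text =~ s/\b\Q$word\E\b//gi;
    }
    $text =~ s/\s+/ /g;        # collapse the gaps left behind (extra step)
    $text =~ s/^\s+|\s+$//g;   # trim both ends (extra step)
    return $text;
}

print strip_stopwords("The cat and then the dog", qw(the and)), "\n";
# prints: cat then dog
```

The word boundaries are what make this safe to run over normal prose: without them, removing "the" would also chew up part of "then" or "theory".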
How to run?
1. Create a stopword list (stopwords.txt) – one stopword per line.
2. Save a post as a text file (post.txt). Use ASCII, not Unicode.
3. Make sure the script is executable (chmod +x stopwords_eater.pl).
4. Run (./stopwords_eater.pl post.txt stopwords.txt out.txt).
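#!/usr/bin/perl
use strict;
use warnings;

# Expect exactly three arguments: text file, stopword file, output file.
if (@ARGV != 3) {
    die "Usage: text file, stop words file, output file.\n";
}

open my $post_fh, '<', $ARGV[0] or die "$! $ARGV[0]!\n";
open my $stpw_fh, '<', $ARGV[1] or die "$! $ARGV[1]!\n";
open my $out_fh,  '>', $ARGV[2] or die "$! $ARGV[2]!\n";

# Slurp the whole post into one string.
my $post;
{
    local $/ = undef;
    $post = <$post_fh>;
}

# Remove every stopword as a whole word, ignoring case.
foreach my $line (<$stpw_fh>) {
    $line =~ s/\s+$//;               # drop the line ending (and any \r)
    next if $line eq '';             # skip blank lines in the list
    $post =~ s/\b\Q$line\E\b//gi;    # \Q...\E keeps the stopword literal
}

$post =~ s/\d//g;             # strip digits
$post =~ s/[?;:!,.'"]//g;     # strip punctuation

print $out_fh $post;

close $post_fh;
close $stpw_fh;
close $out_fh;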
Stopword removal script (PHP)
The PHP script does exactly the same job. It uses the preg_replace function to run Perl-compatible regular expressions.
<?php
// Expect exactly three arguments after the script name.
if (count($argv) != 4) {
    echo("Usage: text file, stop words file, output file.\n");
    exit;
}

if (!file_exists($argv[1])) {
    exit("Unable to open file $argv[1]!\n");
}
if (!file_exists($argv[2])) {
    exit("Unable to open file $argv[2]!\n");
}

$post = file_get_contents($argv[1]);
$stop_words = file($argv[2]);   // one stopword per line

foreach ($stop_words as $word) {
    $word = rtrim($word);       // drop the line ending
    if ($word === '') {
        continue;               // skip blank lines in the list
    }
    // preg_quote() keeps regexp metacharacters in the stopword literal.
    $post = preg_replace('/\b' . preg_quote($word, '/') . '\b/i', "", $post);
}

$post = preg_replace("/\d/", "", $post);            // strip digits
$post = preg_replace("/[?;:!,.'\"]/", "", $post);   // strip punctuation

$output = fopen($argv[3], 'w') or exit("Unable to open file $argv[3]!\n");
fwrite($output, $post);
fclose($output);
?>
How to run?
I hope you are familiar with PHP and are able to run the script from the command line (php, then the script file name, then the same three arguments: post.txt stopwords.txt out.txt).
Enjoy!
February 3rd, 2009 - 13:27
hey man!! thanks for good post. I checked some of my post, and you are right! some post contain ~40 – ~70 stop words!
February 7th, 2009 - 15:22
I often wonder whether stop words make a difference. For instance, if Google does ignore stop words in a user’s search, then what if I as a user search for “The Matrix”? I will get a load of pages about the maths formula “matrix”.
I think you have to make a choice based on how your page title reads. It’s important it makes the user want to click it …
my two cents ..
February 9th, 2009 - 17:40
since search engine are already ignoring it, why should we not use it?
February 10th, 2009 - 15:38
azwan,
I’m not saying that you shouldn’t use stopwords. The more stopwords there are in your content, the less important it looks to search engines. If you replace some stopwords with normal words, you can get more traffic.
February 12th, 2009 - 19:07
maybe i can try your script, but the sample u gave does sound funny without stopword.
February 26th, 2009 - 16:39
Thank you for the information, very interesting! The more I think about this, the more I think it must be true. While I agree that removing all stopwords would make a normal article not flow correctly due to lack of grammar, I think while writing one could focus on using more keywords and replacing stopwords with higher ranked words to make their pages stand out in Google’s eyes.
This is my first visit- I will be back! Nice site
June 17th, 2009 - 16:24
I finished writing an SEO tool for my site and I found that script idea very useful. I created my own stop word file stopwords.txt and wrote a little function to clean my keywords from stop words
function del_stop_words($kw){
$kw = array_map('strtolower',array_diff($kw,array("")));
$sw = explode("\r\n",file_get_contents('stopwords.txt'));
return array_values(array_diff($kw,$sw));
}
array_map I use to make all my values in lower case.
explode “\r\n” I need because after file get content I got “strin
” but not “string”
and array_diff cleaned my $kw.
enjoy dudes!
June 17th, 2009 - 16:31
Thanks for a code man.
June 18th, 2009 - 15:44
Would it be useful / doable to swap known stopwords with related synonyms that are marked as non-stopwords?
June 18th, 2009 - 19:31
Doable, yes. Useful for what? Ranking? It depends. If you are able to generate human-readable content it’s OK. But I wouldn’t do it for sure. I would get http://datapresser.com/ subscription and make some splogs.
July 2nd, 2009 - 13:58
Thanx your programme worked great .
I am new to php coding and your programme helped me a lot
Thanx
November 12th, 2009 - 09:27
could some one help me to run any one of the above codes.
I want to eliminate stop words in french.
Could you clearly tell me how to run a PHP or a Perl script?
November 12th, 2009 - 09:34
when i try to run this line
./script post.txt stopwords.txt out.txt
This is the error that i get
bash: ./script: No such file or directory
could anybody tell me why i am getting this error????
November 12th, 2009 - 13:04
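The “./script” command means run “script” from the current directory. There is no file called “script” there, so use the actual file name of your script instead, e.g. ./stopwords_eater.pl.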
January 22nd, 2010 - 21:45
hi, good info. regarding the perl script, how can you remove punctuation? thanks
February 9th, 2010 - 21:37
i need it for java…can u help………