
You will be shocked to know that your articles can contain up to 70 percent stopwords, i.e. only 30 percent of the words drive search engine traffic.
Stopwords definition
Stopwords are common words that carry less meaning than keywords. Search engines usually remove stopwords from a keyword phrase to return the most relevant results, so stopwords drive much less traffic than keywords.
So what? Stopwords are part of human language and there’s nothing you can do about it. Sure, but a high stopword density can make your content look less important to search engines.
Look at the picture below. It shows the two paragraphs above with the stopwords removed.

The stripped text is shorter than the original: 31 words versus 66. Roughly 50 percent of the words are stopwords, i.e. half of the text is not really important to search engines.
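With the figures above, the share of removed words is (66 - 31) / 66, or about 53 percent. The same check can be run on any text with a few lines of code. The sketch below is in Python rather than the Perl or PHP used later in this post, and the tiny stopword set and the function name stopword_density are made up for illustration; a real list would be much longer.

```python
import re

# Hypothetical mini stopword list; real lists contain hundreds of entries.
STOPWORDS = {"the", "a", "is", "of", "and", "to", "in", "that", "it"}

def stopword_density(text, stopwords=STOPWORDS):
    """Return (total_words, stopword_count, density) for the text."""
    # Treat runs of ASCII letters and apostrophes as words.
    words = re.findall(r"[A-Za-z']+", text.lower())
    total = len(words)
    hits = sum(1 for w in words if w in stopwords)
    density = hits / total if total else 0.0
    return total, hits, density

if __name__ == "__main__":
    sample = "The cat is in the hat and it is happy"
    total, hits, density = stopword_density(sample)
    print(f"{hits} of {total} words are stopwords ({density:.0%})")
```

The result depends heavily on which stopword list you plug in, so treat the percentage as a rough indicator only.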
Who should bother about stopwords?
If you aren’t afraid to experiment and have time for it, replace some (not all) stopwords with yummy words before submitting a post. This may help bring more search engine traffic to your blog.
Also, guys who scrape content for doorway pages may be interested. However, I’m not sure whether Google likes a stopword-free keyword mess. You should probably keep some.
Stopwords lists
There are two stopwords lists from trusted websites that you can use: Link Assistant and SEO Book. You can find more or make your own list.
Stopword removal script (Perl)
Perl is the best choice to eat some text. You have the power to make difficult tasks easier if you know Perl regular expressions.
How to run?
1. Create a stopword list (stopwords.txt) – one stopword per line.
2. Save a post as a text file (post.txt). Use ASCII, not Unicode.
3. Make sure the script is executable (chmod +x stopwords_eater.pl).
4. Run (./script post.txt stopwords.txt out.txt), replacing script with the name of your script file, e.g. stopwords_eater.pl.
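#!/usr/bin/perl
use strict;
use warnings;

# Expect three arguments: text file, stopword file, output file.
if (@ARGV != 3) {
    die "Usage: text file, stop words file, output file.\n";
}

open(my $post_fh, '<', $ARGV[0]) or die "$! $ARGV[0]!\n";
open(my $stpw_fh, '<', $ARGV[1]) or die "$! $ARGV[1]!\n";
open(my $out_fh,  '>', $ARGV[2]) or die "$! $ARGV[2]!\n";

# Slurp the whole post into one string.
my $post = do { local $/; <$post_fh> };

# Remove each stopword as a whole word, ignoring case.
while (my $line = <$stpw_fh>) {
    chomp $line;
    $line =~ s/\r$//;        # tolerate Windows line endings
    next if $line eq '';     # skip blank lines
    $post =~ s/\b\Q$line\E\b//gi;
}

# Strip digits and punctuation.
$post =~ s/\d//g;
$post =~ s/[?;:!,.'"]//g;

print {$out_fh} $post;

close $post_fh;
close $stpw_fh;
close $out_fh;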
Stopword removal script (PHP)
The PHP script does exactly the same job. It uses the preg_replace function to run Perl-compatible regular expressions.
<?php
// Expect three arguments: text file, stopword file, output file.
if (count($argv) != 4) {
    echo("Usage: text file, stop words file, output file.\n");
    exit;
}
if (!file_exists($argv[1])) {
    exit("Unable to open file $argv[1]!\n");
}
if (!file_exists($argv[2])) {
    exit("Unable to open file $argv[2]!\n");
}

$post = file_get_contents($argv[1]);
$stop_words = file($argv[2]);

// Remove each stopword as a whole word, ignoring case.
foreach ($stop_words as $word) {
    $word = rtrim($word);
    if ($word === '') {
        continue; // skip blank lines
    }
    $post = preg_replace('/\b' . preg_quote($word, '/') . '\b/i', '', $post);
}

// Strip digits and punctuation.
$post = preg_replace("/\d/", "", $post);
$post = preg_replace("/[?;:!,.'\"]/", "", $post);

$output = fopen($argv[3], 'w') or exit("Unable to open file $argv[3]!\n");
fwrite($output, $post);
fclose($output);
?>
How to run?
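I hope you are familiar with PHP and are able to run the script from the command line, e.g. php script.php post.txt stopwords.txt out.txt, where script.php is whatever name you saved it under.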
Enjoy!

I often wonder whether stop words make a difference. For instance, if Google does ignore stop words in a user’s search, then what if I, as a user, search for “The Matrix”? I will get a load of pages about the maths formula “matrix”.
I think you have to make a choice based on how your page title reads. It’s important it makes the user want to click it …
my two cents ..
Since search engines are already ignoring them, why should we not use them?
azwan,
I’m not saying that you shouldn’t use stopwords. The more stopwords your content has, the less important it looks to search engines. If you replace some stopwords with normal words, you can get more traffic.
Maybe I can try your script, but the sample you gave does sound funny without stopwords.
I finished writing an SEO tool for my site and found the script idea very useful. I created my own stop word file, stopwords.txt, and wrote a little function to clean stop words out of my keywords:
function del_stop_words($kw){
    $kw = array_map('strtolower', array_diff($kw, array("")));
    $sw = explode("\r\n", file_get_contents('stopwords.txt'));
    return array_values(array_diff($kw, $sw));
}
I use array_map to make all my values lower case.
I need explode on "\r\n" because after file_get_contents I got "string" followed by a line break rather than plain "string",
and array_diff cleans my $kw.
enjoy dudes!
Thanks for the code, man.
Would it be useful / doable to swap known stopwords with related synonyms that are marked as non-stopwords?
Doable, yes. Useful for what? Ranking? It depends. If you are able to generate human-readable content, it’s OK. But I wouldn’t do it, for sure. I would get a http://datapresser.com/ subscription and make some splogs.
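The synonym swap discussed above is easy to sketch. This is a Python illustration, not part of the original scripts; the SYNONYMS mapping and the function name swap_stopwords are invented for the example, and whether the output stays human-readable depends entirely on the mapping you supply.

```python
import re

# Hypothetical mapping from stopwords (lower case) to replacement words.
SYNONYMS = {"big": "enormous", "get": "obtain", "use": "employ"}

def swap_stopwords(text, synonyms=SYNONYMS):
    """Replace whole-word matches, keeping a leading capital if present."""
    if not synonyms:
        return text
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(w) for w in synonyms) + r")\b",
        re.IGNORECASE,
    )

    def replace(match):
        word = match.group(0)
        new = synonyms[word.lower()]
        return new.capitalize() if word[0].isupper() else new

    return pattern.sub(replace, text)

if __name__ == "__main__":
    print(swap_stopwords("Use it to get traffic"))
```

The word boundaries keep the swap from touching longer words such as "useful", but the sketch does nothing about grammar, so the result still needs a human read-through.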
Thanks, your program worked great.
I am new to PHP coding and your program helped me a lot.
Thanks
Could someone help me run either of the above scripts?
I want to eliminate stop words in French.
Could you clearly tell me how to run a PHP or a Perl script?
When I try to run this line
./script post.txt stopwords.txt out.txt
this is the error that I get:
bash: ./script: No such file or directory
Could anybody tell me why I am getting this error?
The “./script” command means run “script” from the current directory.
hi, good info. regarding the perl script, how can you remove punctuation? thanks
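On punctuation: the Perl script above already deletes the characters ? ; : ! , . ' and " in its last substitution. If you want to drop every ASCII punctuation character instead, one possible approach looks like this. It is a Python sketch rather than Perl, and the function name strip_punctuation is made up for the example.

```python
import string

def strip_punctuation(text):
    """Delete every ASCII punctuation character from the text."""
    # str.maketrans with a third argument maps those characters to None.
    return text.translate(str.maketrans("", "", string.punctuation))

if __name__ == "__main__":
    print(strip_punctuation("Hello, world! (test)"))
```

Note that this also removes apostrophes and hyphens inside words, which may or may not be what you want.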
I need it for Java… can you help?
For Java you can use an Apache Lucene analyzer for stop word removal. In addition, use Apache Tika for multiple file support.
Wow didn’t know the percentages were spread out like that and the influence stopwords could have.
-Jean
Hi, can you help me out? I need C or C++ code for stopword removal.
Good script! Now I need to process a post.txt in a Unicode charset. What do I need to change in the script?
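For Unicode text (and for the French stopwords asked about earlier), the main points are to decode the files with the right encoding and to use word boundaries that understand accented letters. In Perl that would typically mean opening the files with an :encoding(UTF-8) layer, and in PHP adding the u modifier to the patterns. The sketch below shows the same idea in Python, where \b is Unicode-aware by default; it assumes the files are UTF-8, and the function names are invented for the example.

```python
import re

def remove_stopwords(text, stopwords):
    """Remove whole-word, case-insensitive matches of each stopword."""
    words = [re.escape(w) for w in stopwords if w]
    if not words:
        return text
    pattern = re.compile(r"\b(?:" + "|".join(words) + r")\b", re.IGNORECASE)
    return pattern.sub("", text)

def process_file(post_path, stopwords_path, out_path, encoding="utf-8"):
    """Read the post and stopword list, write the stripped text."""
    with open(stopwords_path, encoding=encoding) as f:
        stopwords = [line.strip() for line in f]
    with open(post_path, encoding=encoding) as f:
        post = f.read()
    with open(out_path, "w", encoding=encoding) as f:
        f.write(remove_stopwords(post, stopwords))

if __name__ == "__main__":
    print(remove_stopwords("Le café est très bon", ["le", "est", "très"]).split())
```

Like the original scripts, this leaves the surrounding spaces in place, so you may want to collapse repeated whitespace afterwards.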