From Zero To SEO Achieving High Rankings Through Coding

3 Feb 09 (16 comments)

Why remove stopwords

You may be shocked to learn that your articles can consist of up to 70 percent stopwords, i.e. only 30 percent of the words drive search engine traffic.


Stopwords definition

Stopwords are common words that carry less meaning than keywords. Search engines usually remove stopwords from a keyword phrase to return the most relevant result, i.e. stopwords drive much less traffic than keywords.

So what? Stopwords are a part of human language and there’s nothing you can do about it. Sure, but a high stopword density can make your content look less important to search engines.

Look at the picture below. It shows the two paragraphs above with the stopwords removed.

[Image: the two paragraphs above with stopwords removed]
The stripped text is much shorter than the original: 31 words versus 66. Approximately 50 percent of the words are stopwords, i.e. half of the text is not really important for search engines.
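
The arithmetic behind that figure is (66 - 31) / 66, or about 53 percent. If you want to measure the density of your own posts, here is a minimal sketch. The helper name stopword_density and the toy stopword list are made up for illustration, and "a word" is assumed to be any run of letters and apostrophes.

```perl
use strict;
use warnings;

# Hypothetical helper: fraction of the words in $text that are stopwords.
# $stopwords is an array reference; matching is case-insensitive.
sub stopword_density {
    my ($text, $stopwords) = @_;

    my %is_stopword = map { lc($_) => 1 } @$stopwords;

    # Assumption: a word is a run of ASCII letters and apostrophes.
    my @words = $text =~ /[A-Za-z']+/g;
    return 0 unless @words;

    # grep in scalar context returns the number of matches.
    my $hits = grep { $is_stopword{ lc($_) } } @words;
    return $hits / @words;
}

my @list = qw(the a of is);    # toy list, for illustration only
printf "%.0f%%\n", 100 * stopword_density('The price of gold is rising', \@list);
# prints 50%
```

With a real stopword list, a result around 0.5 would match the numbers above.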

Who should bother about stopwords?

If you aren’t afraid to experiment and have time for it, replace some (not all) stopwords with yummy words before submitting a post. This may help to get more search engine traffic to your blog.

Guys who scrape content for doorway pages may also be interested. However, I’m not sure Google likes a stopword-free keyword mess. You should probably keep some.

Stopwords lists

There are two stopword lists from trusted websites that you can use: Link Assistant and SEO Book. You can find more or make your own list.
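
One possible way to start your own list is to count which words appear most often in your posts and review the top of the ranking by hand. This is only a sketch of that idea, not how the lists above were built; the helper name stopword_candidates is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: return the $top_n most frequent words in $text.
# These are only candidates; check them by hand before using them.
sub stopword_candidates {
    my ($text, $top_n) = @_;

    # Count every word, ignoring case.
    my %count;
    $count{ lc($_) }++ for $text =~ /[A-Za-z']+/g;

    # Most frequent first; ties are broken alphabetically.
    my @sorted = sort { $count{$b} <=> $count{$a} || $a cmp $b } keys %count;

    $top_n = @sorted if $top_n > @sorted;
    return @sorted[0 .. $top_n - 1];
}

my @top = stopword_candidates('the cat and the dog and the bird', 2);
print "@top\n";    # prints: the and
```

Save the words you accept one per line and you have a stopwords.txt.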

Stopword removal script (Perl)

Perl is the best choice to eat some text. You have the power to make difficult tasks easier if you know Perl regular expressions.

How to run?

1. Create a stopword list (stopwords.txt) – one stopword per line.

2. Save a post as a text file (post.txt). Use ASCII, not Unicode.

3. Make sure the script is executable (chmod +x stopwords_eater.pl).

4. Run (./script post.txt stopwords.txt out.txt), where script is the file name you saved the script under, e.g. stopwords_eater.pl.

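#!/usr/bin/perl
use strict;
use warnings;

# Expect exactly three arguments.
if (@ARGV != 3) {
	die "Usage: text file, stop words file, output file.\n";
}

# Three-argument open with lexical filehandles.
open my $post_fh, '<', $ARGV[0] or die "$! $ARGV[0]!\n";
open my $stpw_fh, '<', $ARGV[1] or die "$! $ARGV[1]!\n";
open my $out_fh, '>', $ARGV[2] or die "$! $ARGV[2]!\n";

# Read the whole post into one string.
my $post = do { local $/; <$post_fh> };

while (my $line = <$stpw_fh>) {
	$line =~ s/\s+\z//;	# strip the line ending, including \r from Windows files
	next if $line eq '';	# skip blank lines
	# \Q...\E escapes regex metacharacters in the stopword.
	$post =~ s/\b\Q$line\E\b//gi;
}

# Remove digits and punctuation.
$post =~ s/\d//g;
$post =~ s/[?;:!,.'"]//g;

print {$out_fh} $post;

close $post_fh;
close $stpw_fh;
close $out_fh;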
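
The script above runs one substitution over the whole post for every stopword. With a long list, a hash lookup per word may be quicker. Here is a minimal sketch of that variant; it is an alternative technique, not the script above, and the helper name strip_stopwords is hypothetical. It does the same digit and punctuation cleanup, but it also collapses runs of whitespace into single spaces, which the script above does not do.

```perl
use strict;
use warnings;

# Hypothetical helper: remove stopwords in one pass over the text.
# $text is the post as one string, $stopwords an array reference.
sub strip_stopwords {
    my ($text, $stopwords) = @_;

    # Lookup table; keys get the same punctuation cleanup as the text,
    # so a stopword like "don't" still matches after cleanup.
    my %is_stopword;
    for my $word (@$stopwords) {
        my $key = lc $word;
        $key =~ s/[?;:!,.'"]//g;
        $is_stopword{$key} = 1;
    }

    # Drop digits and punctuation, as the script above does.
    $text =~ s/\d//g;
    $text =~ s/[?;:!,.'"]//g;

    # Split on whitespace and keep the words that are not in the table.
    my @kept = grep { !$is_stopword{ lc($_) } } split ' ', $text;

    return join ' ', @kept;
}

print strip_stopwords('The cat sat on the mat.', [qw(the on)]), "\n";
# prints: cat sat mat
```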

Stopword removal script (PHP)

The PHP script does exactly the same job. It uses the preg_replace function to run Perl-compatible regular expressions.

<?php

// Expect the script name plus exactly three arguments.
if (count($argv) != 4) {
	echo("Usage: text file, stop words file, output file.\n");
	exit(1);
}

if (!file_exists($argv[1])) {
	exit("Unable to open file $argv[1]!\n");
}

if (!file_exists($argv[2])) {
	exit("Unable to open file $argv[2]!\n");
}

$post = file_get_contents($argv[1]);
$stop_words = file($argv[2]);

foreach ($stop_words as $word) {
	// Strip the line ending (and any \r) and skip blank lines.
	$word = trim($word);
	if ($word === '') {
		continue;
	}
	// preg_quote() escapes regex metacharacters, including the "/" delimiter.
	$post = preg_replace('/\b' . preg_quote($word, '/') . '\b/i', '', $post);
}

// Remove digits and punctuation.
$post = preg_replace('/\d/', '', $post);
$post = preg_replace('/[?;:!,.\'"]/', '', $post);

$output = fopen($argv[3], 'w') or
	exit("Unable to open file $argv[3]!\n");
fwrite($output, $post);
fclose($output);
?>

How to run?

I hope you are familiar with PHP and are able to run the script from the command line. It takes the same three arguments: text file, stop words file, output file.

Enjoy!

Comments (16) Trackbacks (0)
  1. hey man!! thanks for the good post. I checked some of my posts, and you are right! some posts contain ~40 – ~70 stop words!

  2. I often wonder whether stop words make a difference. For instance, if Google does ignore stop words in a user’s search, then what if I, as a user, search for “The Matrix”? I will get a load of pages about the maths “matrix”.
    I think you have to make a choice based on how your page title reads. It’s important it makes the user want to click it …

    my two cents ..

  3. since search engines are already ignoring them, why should we not use them?

  4. azwan,

    I’m not saying that you shouldn’t use stopwords. The more stopwords there are in your content, the less important it looks to search engines. If you replace some stopwords with normal words, you can get more traffic.

  5. maybe i can try your script, but the sample u gave does sound funny without stopword.

  6. Thank you for the information, very interesting! The more I think about this, the more I think it must be true. While I agree that removing all stopwords would make a normal article read badly because the grammar is lost, I think that while writing one could focus on using more keywords and replacing stopwords with higher ranked words to make one’s pages stand out in Google’s eyes.

    This is my first visit- I will be back! Nice site :)

  7. I finished writing an SEO tool for my site and found the script idea very useful. I created my own stop word file, stopwords.txt, and wrote a little function to clean stop words out of my keywords:
    function del_stop_words($kw){
    $kw = array_map('strtolower',array_diff($kw,array("")));
    $sw = explode("\r\n",file_get_contents('stopwords.txt'));
    return array_values(array_diff($kw,$sw));
    }

    I use array_map to make all my values lower case.
    I need explode on "\r\n" because after file_get_contents I got "string" with a trailing line break instead of plain "string",
    and array_diff cleans my $kw.

    enjoy dudes!

  8. Thanks for the code, man.

  9. Would it be useful / doable to swap known stopwords with related synonyms that are marked as non-stopwords?

  10. Doable, yes. Useful for what? Ranking? It depends. If you are able to generate human-readable content it’s OK. But I wouldn’t do it for sure. I would get http://datapresser.com/ subscription and make some splogs.

  11. Thanx your programme worked great .
    I am new to php coding and your programme helped me a lot

    Thanx

  12. could someone help me to run either of the above scripts?
    I want to eliminate stop words in French.
    Could you clearly tell me how to run a PHP or a Perl script?

  13. when i try to run this line
    ./script post.txt stopwords.txt out.txt
    This is the error that i get
    bash: ./script: No such file or directory
    could anybody tell me why i am getting this error????

  14. the “./script” command means run “script” from the current directory.

  15. hi, good info. regarding the perl script, how can you remove punctuation? thanks

  16. i need it for java…can u help………

