How can I find words like "and", "or", "to", "a", "no", "with", "for", etc. in a sentence using VB.NET and remove them? Also, where can I find a list of all such words?

Thanks

A: 

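The easiest way is:

myString = myString.Replace("and", "")

Strings are immutable, so you have to assign the result back. Loop over your word list and run a statement like the one above for each word. Searching for a list of common English words should turn up something usable.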

List of English 2 Letter Words
List of English 3 Letter Words

colithium
A: 

You can match the words and remove them using regular expressions.

Alan Haggai Alavi
an example or a link to an example might be more helpful
jao
+3  A: 

You can indeed remove the words in your list using the .Replace function (as colithium described), assigning the result back because Replace returns a new string ...

myString = myString.Replace("and", "")


Edit:

... but indeed, a nicer way is to use Regular Expressions (as edg suggested) to avoid replacing parts of words.


Since your question suggests that you want to clean up a sentence so that only the meaningful words remain, you have to do more than just remove two- and three-letter words.

What you need is a list of stop words: http://en.wikipedia.org/wiki/Stop_word

A comma-separated list of stop words for the English language can be found here: http://www.textfixer.com/resources/common-english-words.txt
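
To make the stop-word idea concrete, here is a minimal VB.NET sketch that turns a comma-separated list into a single whole-word regular expression and strips the matches from a sentence. The names BuildStopWordRegex and RemoveStopWords are made up for this illustration, and the short inline list is only a stand-in for a real stop-word file, which you would presumably load yourself (for example with File.ReadAllText). It assumes .NET 4 or later.

```vbnet
Imports System
Imports System.Linq
Imports System.Text.RegularExpressions

Module StopWordDemo
    ' Hypothetical helper: builds a pattern such as \b(?:a|and|for)\b
    ' from a comma-separated list of stop words.
    Function BuildStopWordRegex(commaSeparatedWords As String) As Regex
        Dim words = commaSeparatedWords.Split(","c).
            Select(Function(w) w.Trim()).
            Where(Function(w) w.Length > 0).
            Select(Function(w) Regex.Escape(w))
        Dim pattern = "\b(?:" & String.Join("|", words) & ")\b"
        Return New Regex(pattern, RegexOptions.IgnoreCase)
    End Function

    ' Hypothetical helper: removes every stop word, then collapses the
    ' whitespace left behind by the removed words.
    Function RemoveStopWords(sentence As String, stopWords As Regex) As String
        Dim stripped = stopWords.Replace(sentence, "")
        Return Regex.Replace(stripped, "\s+", " ").Trim()
    End Function

    Sub Main()
        ' Assumption: a tiny sample list standing in for a full stop-word file.
        Dim stopWords = BuildStopWordRegex("a,and,for,no,or,to,with")
        Console.WriteLine(RemoveStopWords("A band plays for fans and friends", stopWords))
        ' prints: band plays fans friends
    End Sub
End Module
```

Building one combined pattern means the sentence is scanned once instead of once per stop word, and the \b anchors keep "band" and "fans" intact.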

WowtaH
good answer BUT I would use Regex instead of String.Replace
Meta-Knight
i agree... i have updated my answer
WowtaH
Agreed on the stop words. They are widely used by search engines to discard 'unimportant' words. Another list is available here: http://www.ranks.nl/resources/stopwords.html
JohnC
+4  A: 

Note that unless you use Regex word boundaries you risk falling afoul of the Scunthorpe (Sfannythorpe) problem.

// requires: using System.Text.RegularExpressions;
string pattern = @"\band\b";  // \b matches only at a word boundary
Regex re = new Regex(pattern);

string input = "a band loves and its fans";

string output = re.Replace(input, "");  // a band loves  its fans

Notice the 'and' in 'band' is untouched.
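
Since the question asks for VB.NET, here is a rough VB.NET equivalent of the snippet above, placed next to the plain String.Replace version for comparison. The function names RemoveNaive and RemoveWholeWord are invented for this sketch.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module WordBoundaryDemo
    ' Plain substring replacement: also hits the "and" inside "band".
    Function RemoveNaive(input As String, word As String) As String
        Return input.Replace(word, "")
    End Function

    ' Whole-word replacement: \b restricts the match to word boundaries.
    Function RemoveWholeWord(input As String, word As String) As String
        Return Regex.Replace(input, "\b" & Regex.Escape(word) & "\b", "")
    End Function

    Sub Main()
        Dim input As String = "a band loves and its fans"
        Console.WriteLine(RemoveNaive(input, "and"))      ' a b loves  its fans
        Console.WriteLine(RemoveWholeWord(input, "and"))  ' a band loves  its fans
    End Sub
End Module
```

Both results keep the double space where the standalone "and" used to be, so you may want to normalise whitespace afterwards.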

Ed Guiness
That's the best way to do the replace.
Meta-Knight