What is the most efficient way to turn a string into a list of words in C#?

For example:

Hello... world 1, this is amazing3,really ,  amazing! *bla*

should turn into the following list:

["Hello", "world", "1", "this", "is", "amazing3", "really", "amazing", "bla"]

Note that it should support languages other than English.

I need this because I want to collect a list of keywords from specific text.

Thanks.

+2  A: 

You need a lexer.

Hans Passant
+5  A: 
char[] separators = new char[]{' ', ',', '!', '*', '.'};  // add more if needed

string str = "Hello... world 1, this is amazing3,really ,  amazing! *bla*";
string[] words = str.Split(separators, StringSplitOptions.RemoveEmptyEntries);
James Curran
TTT
@Alon, add them to the list of separators.
Gage
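
If listing every separator by hand becomes unwieldy (especially for text in languages other than English), one possible alternative is to treat any run of letters or digits as a word and everything else as a separator. The sketch below is only an illustration of that idea, not the answer's original code; the class name WordSplitter and the method SplitWords are hypothetical. It relies on char.IsLetterOrDigit, which uses Unicode categories, so it assumes the words contain no characters outside the Basic Multilingual Plane (surrogate pairs would be treated as separators).

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper, not part of the original answer.
public static class WordSplitter
{
    // Returns every maximal run of letters/digits in the input as one word.
    public static List<string> SplitWords(string text)
    {
        var words = new List<string>();
        var current = new StringBuilder();

        foreach (char c in text)
        {
            if (char.IsLetterOrDigit(c))
            {
                // Still inside a word: keep accumulating.
                current.Append(c);
            }
            else if (current.Length > 0)
            {
                // Hit a separator: flush the word collected so far.
                words.Add(current.ToString());
                current.Clear();
            }
        }

        // Flush the last word if the text does not end with a separator.
        if (current.Length > 0)
        {
            words.Add(current.ToString());
        }

        return words;
    }
}
```

Calling WordSplitter.SplitWords on the sample string from the question should give the nine words listed there, without any separator array to maintain.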
+5  A: 

How about using regular expressions? You could make the expression arbitrarily complex, but what I have here should work for most inputs.

new Regex(@"\b(\w)+\b").Matches(text);
Brian Gideon
this is **exactly** what I needed. Thanks!
TTT
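
For anyone who wants the regex result as an actual list of strings, here is a minimal sketch of how the matches could be collected. The class name RegexWordSplitter and the method SplitWords are hypothetical names chosen for illustration. It uses the simpler pattern \w+, which should find the same words as the expression above. In .NET, \w is Unicode-aware by default, so non-English letters match as well; note that it also matches the underscore.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original answer.
public static class RegexWordSplitter
{
    // \w+ matches a run of word characters (Unicode letters, digits, underscore).
    private static readonly Regex WordPattern = new Regex(@"\w+");

    // Copies the value of every match into a list, in order of appearance.
    public static List<string> SplitWords(string text)
    {
        var words = new List<string>();
        foreach (Match match in WordPattern.Matches(text))
        {
            words.Add(match.Value);
        }
        return words;
    }
}
```

The Regex instance is created once and reused, which avoids re-parsing the pattern on every call when many texts are processed.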