
Hello,

I have a collection of sentences, an IEnumerable<string> (each sentence is a string).

I want to split all the sentences into words (e.g. .Select(t => t.Split(' '))), and then group the result by word to get a list of unique words.

Please help.

A:

First guess:

var uniqueWords = sentences.SelectMany(s => s.Split(' ')).Distinct();
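The difference from the attempt in the question is Select versus SelectMany: Select(s => s.Split(' ')) produces one string[] per sentence, while SelectMany flattens all of those arrays into a single sequence of words. The grouping the question asks about also works; GroupBy(w => w).Select(g => g.Key) yields the same set of words as Distinct(). Below is a minimal sketch of both routes; the class and method names are hypothetical helpers, not part of any library.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class UniqueWordsDemo
{
    // Select keeps one array per sentence: the result is IEnumerable<string[]>.
    public static IEnumerable<string[]> WordsPerSentence(IEnumerable<string> sentences) =>
        sentences.Select(s => s.Split(' '));

    // SelectMany flattens the per-sentence arrays into one IEnumerable<string>.
    public static IEnumerable<string> AllWords(IEnumerable<string> sentences) =>
        sentences.SelectMany(s => s.Split(' '));

    // Distinct removes duplicate words.
    public static IEnumerable<string> UniqueWords(IEnumerable<string> sentences) =>
        AllWords(sentences).Distinct();

    // The GroupBy route: one group per distinct word, keeping only each group's key.
    public static IEnumerable<string> UniqueWordsViaGroupBy(IEnumerable<string> sentences) =>
        AllWords(sentences).GroupBy(w => w).Select(g => g.Key);
}
```

For example, with the sentences "the cat sat" and "the dog sat", WordsPerSentence yields two arrays, while both UniqueWords and UniqueWordsViaGroupBy yield the words the, cat, sat and dog. Distinct is the more direct choice when you only need the words; GroupBy becomes useful if you later want something per word, such as g.Count().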

However, you probably also want to strip punctuation and convert to lowercase. You can do that by passing more separator characters to Split together with StringSplitOptions.RemoveEmptyEntries (so that adjacent separators such as ", " don't produce empty strings), and then calling ToLowerInvariant on each word.
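A minimal sketch of that normalization step follows. The separator set is an assumption chosen for plain English text (whitespace plus common punctuation; apostrophes are deliberately kept so "don't" stays one word), and the class name is hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class NormalizedWords
{
    // Assumed separator set: whitespace plus common punctuation. Adjust for your text.
    static readonly char[] Separators =
        { ' ', '\t', '\r', '\n', '.', ',', ';', ':', '!', '?', '"', '(', ')' };

    public static List<string> UniqueLowercaseWords(IEnumerable<string> sentences) =>
        sentences
            // RemoveEmptyEntries drops the "" that Split returns between adjacent separators.
            .SelectMany(s => s.Split(Separators, StringSplitOptions.RemoveEmptyEntries))
            // Culture-independent lowercasing, so "The" and "the" become the same word.
            .Select(w => w.ToLowerInvariant())
            .Distinct()
            .ToList();
}
```

With the input "The cat sat." and "Then, the dog sat!", this returns the words the, cat, sat, then and dog; without RemoveEmptyEntries the ", " after "Then" would also contribute an empty string.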

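If the sentences come from a database via LINQ to SQL, the source will be an IQueryable rather than an IEnumerable, so LINQ will try to translate the whole query into SQL and run it in the database. That limits you to the methods the provider knows how to translate, and string.Split is not one of them.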

To make LINQ run the rest of the query in memory, where the full power of the BCL is available, use:

var uniqueWords = sentences.AsEnumerable().SelectMany(s => s.Split(' ')).Distinct();

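The extra call to AsEnumerable() makes the operators after it bind to LINQ to Objects instead of the SQL provider: the rows are read from the database when the query is enumerated, and the splitting and Distinct then run in memory, so you can proceed as normal.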
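A refinement worth considering is to project the column you need before calling AsEnumerable(), so the database only has to return that column rather than whole rows. The sketch below assumes a hypothetical Category row type with a Name property; since a real DataContext can't be reproduced here, it is exercised with an in-memory list converted via AsQueryable(), which shows the shape of the query but not the SQL translation itself.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical row type standing in for a LINQ to SQL entity class.
class Category
{
    public int Id { get; set; }
    public string Name { get; set; }
}

static class QueryThenSplit
{
    public static List<string> UniqueNameWords(IQueryable<Category> categories) =>
        categories
            // Still part of the IQueryable: a provider such as LINQ to SQL can translate
            // the null check and the projection, so only the Name column is requested.
            .Where(c => c.Name != null)
            .Select(c => c.Name)
            // From here on, operators bind to Enumerable and run in memory,
            // so string.Split never has to be translated to SQL.
            .AsEnumerable()
            .SelectMany(name => name.Split(' '))
            .Distinct()
            .ToList();
}
```

With LINQ to SQL, the argument would be a Table<T> from the DataContext (Table<T> implements IQueryable<T>), with Category replaced by the actual entity type.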

Daniel Earwicker
I'm trying to apply your code to a table (LINQ to SQL): var uniqueWords = m_DataContext.StoreCategories.SelectMany(s => s.Name.Split(' ')).Distinct(); But there is an error: Method 'System.String[] Split(Char[])' has no supported translation to SQL.
Maxim
I've made an update to my answer. Also I'm going to tag your question appropriately.
Daniel Earwicker
This blog post has a nice visual overview (with images) of how SelectMany works: http://www.codethinked.com/post/2010/03/12/A-Visual-Look-At-The-LINQ-SelectMany-Operator.aspx
Peter Gfader