Hello,
I have a collection of sentences, an IEnumerable<string>.
I want to split all sentences into words (e.g. .Select(t => t.Split(' '))), and after that I need to group the result by word to get a list of unique words.
Please help.
First guess:
var uniqueWords = sentences.SelectMany(s => s.Split(' ')).Distinct();
However, you probably want to remove punctuation and convert to lowercase as well; you can do that by passing more separator characters to Split and asking it to remove empty strings, and then calling ToLowerInvariant on each resulting word.
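A minimal sketch of that punctuation and lowercase handling might look like the following. The class name, the method name, and the exact set of separator characters are assumptions made for illustration; adjust the separators to whatever punctuation actually appears in your sentences.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class UniqueWordsExample
{
    // Assumed separator set for illustration only; extend it to suit your text.
    private static readonly char[] Separators =
        { ' ', '\t', '.', ',', ';', ':', '!', '?', '"', '(', ')' };

    public static IEnumerable<string> UniqueWords(IEnumerable<string> sentences)
    {
        return sentences
            // Split on whitespace and punctuation, dropping the empty strings
            // produced by adjacent separators such as ", ".
            .SelectMany(s => s.Split(Separators, StringSplitOptions.RemoveEmptyEntries))
            // Normalize case so "Hello" and "hello" count as the same word.
            .Select(w => w.ToLowerInvariant())
            .Distinct();
    }

    public static void Main()
    {
        var sentences = new[] { "Hello, world!", "Hello again, World." };
        Console.WriteLine(string.Join(", ", UniqueWords(sentences)));
    }
}
```

Lowercasing before Distinct is one option; passing StringComparer.OrdinalIgnoreCase to Distinct instead would keep the original casing of the first occurrence of each word.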
If the input sentences come from a SQL database, the collection will be an IQueryable instead of an IEnumerable, so LINQ will attempt to execute the query in the database, which limits what you are able to do.
To make LINQ execute in memory, giving you the full power of the BCL, use:
var uniqueWords = sentences.AsEnumerable().SelectMany(s => s.Split(' ')).Distinct();
The extra call to AsEnumerable() gets the raw results from the database into memory, so you can then proceed as normal.