I need to analyze a document and compile statistics on how many times each sequence of words occurs (so the analysis is not of single words but of recurring groups of words). I have read that compression algorithms do something similar to what I want: they build dictionaries of blocks of text, each with a record of its frequency. The article at http://www.codeproject.com/KB/recipes/Patterns.aspx does something similar to what I need. Do you have anything written in C#?

A: 

This is very simple to implement.

  1. Use Split (a member function of the string class) to split the text into words. (You can use the delimiters from the CodeProject URL.)

  2. Use a for loop to enumerate all the n-grams (sequences of n consecutive words), and use a Dictionary<string, int> to count how often each one occurs.
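
The two steps above could be sketched roughly as follows. This is only a minimal illustration, not the algorithm from the CodeProject article: the class name NGramCounter, the method name Count, the delimiter set, and the choice to lowercase the text are all assumptions made for the example.

```csharp
using System;
using System.Collections.Generic;

public static class NGramCounter
{
    // Assumed delimiter set; adjust it to the text being analyzed.
    private static readonly char[] Delimiters =
        { ' ', '\t', '\r', '\n', '.', ',', ';', ':', '!', '?', '(', ')', '"' };

    // Returns each sequence of n consecutive words with its occurrence count.
    public static Dictionary<string, int> Count(string text, int n)
    {
        if (text == null) throw new ArgumentNullException(nameof(text));
        if (n < 1) throw new ArgumentOutOfRangeException(nameof(n));

        // Step 1: split into words, dropping the empty entries that
        // appear between adjacent delimiters. Lowercasing is an assumption
        // so that "The cat" and "the cat" count as the same sequence.
        string[] words = text.ToLowerInvariant()
            .Split(Delimiters, StringSplitOptions.RemoveEmptyEntries);

        // Step 2: slide a window of n words over the array and count.
        var counts = new Dictionary<string, int>();
        for (int i = 0; i + n <= words.Length; i++)
        {
            string gram = string.Join(" ", words, i, n);
            counts.TryGetValue(gram, out int current); // current is 0 if absent
            counts[gram] = current + 1;
        }
        return counts;
    }
}
```

For example, NGramCounter.Count("the cat sat on the mat and the cat sat down", 2) gives a count of 2 for both "the cat" and "cat sat", and 1 for every other pair. To find recurring phrases of several lengths, the method can be called once per value of n, keeping only the entries whose count is greater than 1.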

Yin Zhu