I'd rather not have to fire up LingPipe if possible, which leaves me wondering: are there any quick, easy ways in Java to extract all the bigrams and trigrams from a string of text?
Thanks
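The easiest way is always to use an existing library. You can take a look at the SimMetrics library. You can also use Lucene's NGramTokenizer, though note that it produces character n-grams; for word n-grams, Lucene's ShingleFilter is the usual choice. You can also implement this yourself: first find all the words in the text (for example with StringTokenizer), then generate the n-grams you need by sliding a window of n words over that list.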
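If you go the do-it-yourself route, here is a minimal sketch of the approach just described. It assumes that "words" are simply whitespace-separated tokens (so punctuation stays attached to the neighbouring word) and that you want word n-grams rather than character n-grams. The class name `NGrams` and the method `ngrams` are made up for this example; they do not come from any library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical helper class, named for illustration only.
final class NGrams {

    // Returns every run of n consecutive words in text, in order of appearance.
    static List<String> ngrams(String text, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be >= 1");
        }

        // Step 1: collect the words. StringTokenizer splits on whitespace by default.
        // Assumption: no punctuation stripping or lowercasing is wanted.
        List<String> words = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(text);
        while (tokenizer.hasMoreTokens()) {
            words.add(tokenizer.nextToken());
        }

        // Step 2: slide a window of n words over the list and join each window.
        List<String> result = new ArrayList<>();
        for (int i = 0; i + n <= words.size(); i++) {
            result.add(String.join(" ", words.subList(i, i + n)));
        }
        return result;
    }
}
```

With this sketch, `NGrams.ngrams("the quick brown fox", 2)` gives `[the quick, quick brown, brown fox]` and `NGrams.ngrams("the quick brown fox", 3)` gives `[the quick brown, quick brown fox]`. If the text has fewer than n words, you get an empty list. For anything beyond simple input you would probably want to normalise case and strip punctuation first, which is where a library starts to pay off.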