views:

534

answers:

2

Hello everyone,

Which one is the structure that provides best performance results? Trie, Suffix Tree or Suffix Array? There are other equivalent structures?

What are good Java implementations of these structures?

Thanks for your answers.

EDIT: In this case I want to make string matching between a large dictionary of names and a large set of natural language texts, in order to identify the names of the dictionary on texts.

Best Regards, ukrania

+1  A: 

Using Suffix Trees you can write something that will match your dictionary to your text in O(n+m+k) time where n is letters in your dictionary, m is letters in your text, and k is the number of matches. Tries are much slower for this. I'm not sure what a Suffix Array is, so I can't comment on that.

That said, it's non-trivial to code and I don't happen to know of any Java libraries that provide the necessary functions.

swestrup
About Suffix Arrays: http://en.wikipedia.org/wiki/Suffix_array
ukrania
Yeah, I've been reading up on Suffix Arrays. Turn out they have the same asymptotic speed as Suffix trees but are more space efficient. They're definitely an option.
swestrup
A: 

EDIT: In this case I want to make string matching between a large dictionary of names and a large set of natural language texts, in order to identify the names of the dictionary on texts.

This sounds like an application for the Aho-Corasick algorithm: construct an automaton from the dictionary (in linear time), which can then be used to find all the occurrences of any of the dictionary words in multiple texts (also in linear time).

(The description in these lecture notes, linked from the "External links" section of the Wikipedia page, is a lot easier to read than the description on the page itself.)

Matthew Slattery