Can someone please point me to tutorials on "Token Suffix Trees"?

Google gives only links to research papers that already use them! :(

Thanks in advance.

A: 

From googling that same phrase and scanning the first couple of results, my guess is that they are talking about a suffix tree in which the "letters" (or "characters", or "elements") are not individual ASCII or Unicode characters as we are accustomed to, but rather the lexical tokens of some computer language.

So e.g. for C you would have a "letter" called int, another "letter" called (, and so on. I'm not sure exactly how tokens that are prefixes of other tokens (e.g. + is a prefix of ++) would be handled, but my guess is that they are handled the same way the lexer deals with them, which (for C at least) is by always greedily building the longest possible token, so e.g. the 5 input characters +++++ will be lexed as ++, ++, +.
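To make that guess concrete, here is a minimal Python sketch, not taken from any of the papers. It assumes a tiny made-up token set (the `TOKEN_RE` pattern and the function names `tokenize`, `build_suffix_trie` and `contains` are all hypothetical), lexes greedily so that the longest token wins, and then indexes the token sequence in a naive, uncompressed suffix trie whose edges are whole tokens. A real suffix tree would compress the edges and could be built in linear time with something like Ukkonen's algorithm; this quadratic version is only meant to show what "letters are tokens" means.

```python
import re

# Hypothetical token set for illustration only. "++" is listed before "+"
# so the alternation picks the longest operator (greedy lexing).
TOKEN_RE = re.compile(r"\s*(\+\+|\+|[A-Za-z_]\w*|\d+|[()=;])")

def tokenize(text):
    """Split text into tokens, always taking the longest operator available."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens

def build_suffix_trie(tokens):
    """Naive O(n^2) suffix trie: each edge is a whole token, not a character."""
    root = {}
    for start in range(len(tokens)):
        node = root
        for token in tokens[start:]:
            node = node.setdefault(token, {})
    return root

def contains(trie, query):
    """Return True if the token sequence `query` occurs in the indexed input."""
    node = trie
    for token in query:
        if token not in node:
            return False
        node = node[token]
    return True

if __name__ == "__main__":
    print(tokenize("+++++"))
    trie = build_suffix_trie(tokenize("int x = y++ + 1;"))
    print(contains(trie, ["y", "++"]))
    print(contains(trie, ["+", "+"]))
```

Note that the last query looks for two separate + tokens in a row and does not match, even though the characters ++ do appear in the input: at the token level, ++ is a single "letter" that is distinct from +.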

j_random_hacker
Yes, you are right - the "letters" are HTML tokens for the project I am looking at. Thanks for the effort, though. :)
Bart J
A: 

Not sure if it is what you are looking for, but your question reminds me of what I know as 'suffix trees on words', e.g. http://www.larsson.dogma.net/words-alg.pdf

Fabian Steeg