views:

54

answers:

2

I'm working on a Braille translation library, and I need to translate a string of text into braille. I plan to do this in multiple passes, but I need a way to keep track of which parts of the string have been translated and which have not, so I don't retranslate them.

I could always create a class which would track the ranges of positions in the string which had been processed, and then design my search/replace algorithm to ignore them on subsequent passes, but I'm wondering if there isn't a more elegant way to accomplish the same thing.

I would imagine that multi-pass string translation isn't all that uncommon, I'm just not sure what the options are for doing it.

+4  A: 

A more usual approach would be to tokenize your input, then work on the tokens. For example, start by tokenizing the string into a token for each character. Then, in a first pass generate a straightforward braille mapping, token by token. In subsequent passes, you can replace more of the tokens - for example, by replacing sequences of input tokens with a single output token.

Because your tokens are objects or structs, rather than simple characters, you can attach additional information to each - such as the source token(s) you translated (or rather, transliterated) the current token from.

Nick Johnson
Thanks a lot for this! I think it just might work!
Aaron
reminds me of Limpel Ziv encoding.
easement
A: 

Check out some basic compiler theory..

flesh