Hi,
I'm sorry for the generic title of this question but I wish I was able to articulate it less generically. :-}
I'd like to write a piece of software (in this case, using C++) which translates a stream of input tokens into a stream of output tokens. There are just five input tokens (lets call them 0, 1, 2, 3, 4) and each of them can have a few different attributes (like, There might be an 4.x property or 0.foo). There are a few more output tokens, about ten, let's call them (Out0..Out9) each of them also has a few properties.
Now, we've been working on a mapping of sequences from input tokens to possible output tokens, like this:
01 -> Out0
34 -> Out1
0101 -> Out3
...so different sequences of input tokens map to a single output token.
In my scenario, the set of input tokens is fixed, but the set of output tokens is not - we might decide to introduce new 'productions'. My question is:
Does anybody know whether there are good patterns and/or idioms which help in such a situation?
Right now I have a set of 'Compressor' object, each of which can eat the input tokens and eventually produces the output tokens. The problem is that some input tokens clash, consider 'Out0' and 'Out3' in the above case. The input '0101' should yield Out3 but not Out0. However, the input '0104 should yield Out0 and then leave 0 and 4 in the queue.
I'm wondering whether there are maybe patterns from data compression or other areas which might be beneficial.
This work of 'reducing' an input of lowlevel tokens to highlevel tokens and dealing with possible conflicts is common among parser writers, no? Are there are useful patterns there?
UPDATE:
A bit more information:
in my concrete case, the input tokens are C structs, and the output tokens are C++ objects. I have no control whatsoever over the input stream of tokens, but I can queue them and then modify the queue in case that is beneficial.
I solved clashes (like Out3 (0101) vs. Out0 (01)) by trying to match Out3 first and then Out0, but it's a bit ugly. The possible productions are in a list and I simply try to apply them to the input stream, one after the other
The list of possible productions can be extended by the user, so I cannot generate one huge DAG and then have a state machine which implements that to handle every possible transition. Of course, this means that the user can add clashes, but that's just the way it is.