I based my answer on this one, since what you're trying to do is get a non-greedy match. This seems difficult to do in pyparsing, but not impossible with some cleverness and compromise. The following seems to work:
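from pyparsing import *

# The three parameter names that can end an entry
Parameter = Literal('SPEED_X') | Literal('SPEED_Y') | Literal('SPEED_Z')
# The underscore separating identifier from parameter; dropped from the results
UndParam = Suppress('_') + Parameter
# Everything up to (but not including) the first '_' + Parameter
Identifier = SkipTo(UndParam)
Value = Word(nums)
Entry = Identifier + UndParam + Value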
When we run this from the interactive interpreter, we can see the following:
>>> Entry.parseString('ABC_123_SPEED_X 123')
(['ABC_123', 'SPEED_X', '123'], {})
Note that this is a compromise; because I use SkipTo, the Identifier can be full of evil, disgusting characters, not just beautiful alphanums with the occasional underscore.
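To make the compromise concrete, here is a small sketch showing that the SkipTo-based Identifier accepts punctuation that was presumably never meant to be legal. The input string 'A!@#_SPEED_Y 7' is made up for illustration and does not come from the question.

```python
from pyparsing import Literal, SkipTo, Suppress, Word, nums

Parameter = Literal('SPEED_X') | Literal('SPEED_Y') | Literal('SPEED_Z')
UndParam = Suppress('_') + Parameter
Identifier = SkipTo(UndParam)  # takes everything up to the first '_' + Parameter
Value = Word(nums)
Entry = Identifier + UndParam + Value

# Hypothetical input: '!@#' is clearly not alphanumeric, yet it is accepted.
messy = Entry.parseString('A!@#_SPEED_Y 7').asList()
# The reference string from above still parses as expected.
clean = Entry.parseString('ABC_123_SPEED_X 123').asList()

print(messy)
print(clean)
```

If stray characters in the identifier matter for your data, this version is too permissive.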
EDIT: Thanks to Paul McGuire, we can concoct a truly elegant solution by setting Identifier to the following:
Identifier = Combine(Word(alphanums) +
                     ZeroOrMore('_' + ~Parameter + Word(alphanums)))
Let's inspect how this works. First, ignore the outer Combine; we'll get to this later. Starting with Word(alphanums), we know we'll get the 'ABC' part of the reference string, 'ABC_123_SPEED_X 123'. It's important to note that we didn't allow the "word" to contain underscores in this case. We build that separately into the logic.
Next, we need to capture the '_123' part without also sucking in '_SPEED_X'. Let's also skip over ZeroOrMore at this point and return to it later. We start with the underscore as a Literal, but we can shortcut with just '_', which will get us the leading underscore, but not all of '_123'. Instinctively, we would place another Word(alphanums) to capture the rest, but once that pair is repeated, that's exactly what will get us in trouble: it goes on to consume all of the remaining '_123_SPEED_X'. Instead, we say, "So long as what follows the underscore is not the Parameter, parse that as part of my Identifier." We state that in pyparsing terms as '_' + ~Parameter + Word(alphanums). Since we assume we can have an arbitrary number of underscore + WordButNotParameter repeats, we wrap that expression in a ZeroOrMore construct. (If you always expect at least one underscore + WordButNotParameter following the initial Word, you can use OneOrMore.)
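The difference the negative lookahead makes can be seen by parsing the reference string with and without it. This is only a sketch: the names greedy and guarded are mine, and both versions are wrapped in Combine purely so each result prints as a single string.

```python
from pyparsing import Combine, Literal, Word, ZeroOrMore, alphanums

Parameter = Literal('SPEED_X') | Literal('SPEED_Y') | Literal('SPEED_Z')

# Without the lookahead, the repeat happily eats '_SPEED' and '_X' as well.
greedy = Combine(Word(alphanums) + ZeroOrMore('_' + Word(alphanums)))

# With ~Parameter, the repeat stops at an underscore that is followed by a Parameter.
guarded = Combine(Word(alphanums) +
                  ZeroOrMore('_' + ~Parameter + Word(alphanums)))

text = 'ABC_123_SPEED_X 123'
greedy_tokens = greedy.parseString(text).asList()
guarded_tokens = guarded.parseString(text).asList()

print(greedy_tokens)
print(guarded_tokens)
```

The greedy version leaves nothing for the parameter to match, which is why the full Entry would fail with it.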
Finally, we need to wrap the initial Word and the special underscore + Word repeats together so that it's understood they are contiguous, not separated by whitespace, so we wrap the whole expression up in a Combine construct. This way 'ABC _123_SPEED_X' will raise a parse error, but 'ABC_123_SPEED_X' will parse correctly.
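Putting the pieces together, the following sketch checks both cases just described. The helper try_parse is a hypothetical convenience of mine, not part of pyparsing; it returns None instead of raising.

```python
from pyparsing import (Combine, Literal, ParseException, Suppress, Word,
                       ZeroOrMore, alphanums, nums)

Parameter = Literal('SPEED_X') | Literal('SPEED_Y') | Literal('SPEED_Z')
UndParam = Suppress('_') + Parameter
Identifier = Combine(Word(alphanums) +
                     ZeroOrMore('_' + ~Parameter + Word(alphanums)))
Value = Word(nums)
Entry = Identifier + UndParam + Value


def try_parse(text):
    """Return the parsed tokens as a list, or None if the text does not match."""
    try:
        return Entry.parseString(text).asList()
    except ParseException:
        return None


contiguous = try_parse('ABC_123_SPEED_X 123')   # identifier has no internal space
with_space = try_parse('ABC _123_SPEED_X 123')  # space inside the identifier

print(contiguous)
print(with_space)
```

Without Combine, pyparsing would skip the whitespace between tokens as usual, so the second string would likely be accepted as well.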
Note also that I had to change Keyword to Literal because the ways of the former are far too subtle and quick to anger. I do not trust Keywords, nor could I get matching with them.
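One likely explanation for the Keyword trouble, offered as my reading of pyparsing's behaviour rather than anything established above: Keyword refuses to match when the character immediately before or after it is an identifier character, and by default the underscore counts as one. Since the parameter here always follows a '_', a default Keyword can never match. A minimal sketch follows; the matches helper is hypothetical, and the narrowed-identifier variant is shown only as an illustration.

```python
from pyparsing import Keyword, Literal, ParseException, Suppress, alphanums


def matches(expr, text):
    """Return True if expr parses the start of text, False otherwise."""
    try:
        expr.parseString(text)
        return True
    except ParseException:
        return False


with_keyword = Suppress('_') + Keyword('SPEED_X')
with_literal = Suppress('_') + Literal('SPEED_X')
# Illustration only: pass a narrower set of identifier characters
# so that '_' no longer counts as part of a keyword.
with_narrow_keyword = Suppress('_') + Keyword('SPEED_X', alphanums)

keyword_ok = matches(with_keyword, '_SPEED_X')
literal_ok = matches(with_literal, '_SPEED_X')
narrow_ok = matches(with_narrow_keyword, '_SPEED_X')

print(keyword_ok, literal_ok, narrow_ok)
```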