tags:

python/pyparsing

When I use the scanString method, it gives the start and end locations of each matched token in the text.

e.g.

from pyparsing import Word, alphas

line = "cat bat"
pat = Word(alphas)
for i in pat.scanString(line):
    print(i)

I get the following:

((['cat'], {}), 0, 3)
((['bat'], {}), 4, 7)

But the end location of "cat" should be 2, right? Why is it reporting the next location as the end location?

+1  A: 

This is consistent with Python's [begin:end] slicing conventions, where the "end" is the index of the next character. By putting the end as the next location, it is very straightforward to extract the matching substring using the returned values:

for t, start, end in pat.scanString(line):
    print(line[start:end])
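As a small sketch of why the half-open convention is convenient, the example below checks two properties for each match: the slice reproduces the matched token, and the match length is just end minus start. The assertions are illustrative additions, not part of pyparsing itself.

```python
from pyparsing import Word, alphas

line = "cat bat"
pat = Word(alphas)

for tokens, start, end in pat.scanString(line):
    # The half-open range [start, end) slices out exactly the matched text.
    assert line[start:end] == tokens[0]
    # The match length is simply end - start, with no +1 correction.
    assert end - start == len(tokens[0])
    print(line[start:end], start, end)
```

If the end were reported as 2 for "cat", both checks would need an extra +1.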

You can see how this is used if you look in the pyparsing source code for the implementation of transformString.
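To give a rough idea of how the start and end values can be used to rebuild a string around the matches, here is a minimal sketch. It is not a copy of pyparsing's transformString implementation; the helper name upper_matches and the choice to upper-case each match are assumptions made only for illustration.

```python
from pyparsing import Word, alphas

def upper_matches(pattern, text):
    """Hypothetical helper: return text with every match upper-cased."""
    pieces = []
    last_end = 0
    for tokens, start, end in pattern.scanString(text):
        # Copy the unmatched text between the previous match and this one.
        pieces.append(text[last_end:start])
        # Substitute the transformed match.
        pieces.append(text[start:end].upper())
        last_end = end
    # Copy whatever follows the final match.
    pieces.append(text[last_end:])
    return "".join(pieces)

print(upper_matches(Word(alphas), "cat 42 bat!"))  # CAT 42 BAT!
```

Because each end is the index just past the match, it can be reused directly as the start of the next unmatched slice.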

Paul McGuire