ansaurus

Question

Answer 1

+6 A:

It's not pretty, but this may do what you want:

def paired_iter(it):
    try:
        token = next(it)
    except StopIteration:
        return  # empty input: nothing to pair
    for lookahead in it:
        yield (token, lookahead)
        token = lookahead
    yield (token, None)

def scan(s):
    for c in s:
        yield c

for this_token, next_token in paired_iter(scan("ABCDEF")):
    print("this:%s next:%s" % (this_token, next_token))

Prints:

this:A next:B
this:B next:C
this:C next:D
this:D next:E
this:E next:F
this:F next:None

Paul McGuire 2009-10-05 01:23:56

`next` is a Python builtin.

jleedev 2009-10-05 02:20:46

Sorry, I'm still thinking pre-Python3! Changed to next_token instead.

Paul McGuire 2009-10-05 05:12:24

scan() can be replaced by the builtin iter()

NicDumZ 2009-10-05 06:31:25

@NicDumZ: True. I thought about that, but I wanted to emulate the code in the OP's question.

Paul McGuire 2009-10-05 11:54:54
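As NicDumZ notes, the scan() helper isn't strictly needed. The same (token, next_token) pairing can also be sketched with only the standard library by swapping in a different technique: itertools.tee plus itertools.zip_longest (Python 3 name; Python 2.6+ calls it izip_longest). The name pairwise_lookahead is hypothetical; this is a variant of paired_iter, not a replacement for the answer above.

```python
import itertools


def pairwise_lookahead(iterable):
    """Yield (token, next_token) pairs; next_token is None for the last item.

    Hypothetical helper: a tee/zip_longest variant of paired_iter.
    """
    current, ahead = itertools.tee(iterable)
    next(ahead, None)  # move the second copy one step ahead (no error if empty)
    # zip_longest pads the shorter copy with None, giving the final (last, None) pair.
    return itertools.zip_longest(current, ahead)


for this_token, next_token in pairwise_lookahead("ABCDEF"):
    print("this:%s next:%s" % (this_token, next_token))
```

tee() accepts any iterable, including a plain string, so no scan() or iter() wrapper is needed. As with paired_iter, None doubles as the end marker, so an input that itself contains None is ambiguous.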

Answer 2

+1 A:

You can write a wrapper that buffers some number of items from the generator, and provides a lookahead() function to peek at those buffered items:

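import collections

class Lookahead:
    def __init__(self, iterable):
        self.iter = iter(iterable)
        self.buffer = collections.deque()  # items peeked at but not yet returned

    def __iter__(self):
        return self

    def __next__(self):
        # Hand out previously peeked items before pulling new ones.
        if self.buffer:
            return self.buffer.popleft()
        return next(self.iter)

    def lookahead(self, n):
        """Return the item n entries ahead in the iteration (0 = the next item),
        or None if the iterator runs out first."""
        while n >= len(self.buffer):
            try:
                self.buffer.append(next(self.iter))
            except StopIteration:
                return None
        return self.buffer[n]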

Ned Batchelder 2009-10-05 02:03:46

Really nice, both simple and flexible. I think this implementation mostly fits what I would have imagined, thank you. By the way, I'm wondering how issues like this are commonly handled by scanners, parsers and the like in Python. I've gone through some Python core library code, like the SRE module or the tokenizer, but I haven't seen anything like a lookahead iterator being used.

jena 2009-10-05 13:03:19

Answer 3

A:

Paul's is a good answer. A class-based approach with arbitrary lookahead might look something like:

class lookahead(object):
    def __init__(self, generator, lookahead_count=1):
        self.gen = iter(generator)
        self.look_count = lookahead_count

    def __iter__(self):
        # Pre-fill the buffer with up to look_count items.
        self.lookahead = []
        self.stopped = False
        try:
            for i in range(self.look_count):
                self.lookahead.append(next(self.gen))
        except StopIteration:
            self.stopped = True
        return self

    def __next__(self):
        # Top up the buffer so it still holds look_count items after one is popped.
        if not self.stopped:
            try:
                self.lookahead.append(next(self.gen))
            except StopIteration:
                self.stopped = True
        if self.lookahead:
            return self.lookahead.pop(0)
        raise StopIteration

x = lookahead("abcdef", 3)
for i in x:
    print("%s %s" % (i, x.lookahead))

Anthony Towns 2009-10-05 02:04:01

Answer 4

A:

Here is an example that allows a single item to be sent back to the generator:

def gen():
    for i in range(100):
        v = yield i          # when you call next(), v will be set to None
        if v is not None:    # the caller used send() to push v back
            yield None       # this yields None to the send() call
            yield v          # so this yield is for the first next() after send()

g = gen()

x = next(g)
print(0, x)

x = next(g)
print(1, x)

x = next(g)
print(2, x)  # oops, push it back

x = g.send(x)

x = next(g)
print(3, x)  # x should be 2 again

x = next(g)
print(4, x)

gnibbler 2009-10-05 02:10:48
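The same send()-based push-back can be generalized to wrap any iterable instead of range(100). A minimal sketch, with the hypothetical name with_pushback: like the generator above, it supports pushing back one item at a time, and None cannot be pushed back because next() itself sends None. Checking `is not None` rather than truthiness lets falsy items such as 0 be pushed back.

```python
def with_pushback(iterable):
    """Yield items from iterable; g.send(x) makes the following next(g) return x.

    Hypothetical helper generalizing the push-back generator above.
    A value sent while a pushed-back item is pending is silently dropped.
    """
    for item in iterable:
        sent = yield item
        if sent is not None:
            yield None  # returned by the send() call itself
            yield sent  # returned by the next next() call


g = with_pushback("abc")
first = next(g)    # 'a'
second = next(g)   # 'b'
g.send(second)     # push 'b' back
again = next(g)    # 'b' again
third = next(g)    # 'c'
```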

Answer 5

+5 A:

Pretty good answers there, but my favorite approach would be to use itertools.tee -- given an iterator, it returns two (or more if requested) that can be advanced independently. It buffers in memory just as much as needed (i.e., not much, if the iterators don't get very "out of step" from each other). E.g.:

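import itertools
import collections.abc

class IteratorWithLookahead(collections.abc.Iterator):
    def __init__(self, it):
        # Two independent copies: self.it yields items, self.nextit runs one step ahead.
        self.it, self.nextit = itertools.tee(iter(it))
        self._advance()

    def _advance(self):
        # None signals that the underlying iterator is exhausted.
        self.lookahead = next(self.nextit, None)

    def __next__(self):
        self._advance()
        return next(self.it)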

You can wrap any iterator with this class, and then use the .lookahead attribute of the wrapper to know what the next item to be returned in the future will be. I like to leave all the real logic to itertools.tee and just provide this thin glue!-)

Alex Martelli 2009-10-05 03:00:18
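Both this wrapper and the Lookahead class in Answer 2 report exhaustion as None, which is ambiguous if the underlying iterator can yield None itself. One possible adjustment, sketched here under that assumption, keeps the same tee-based design but stores a private sentinel object and exposes a has_lookahead property. The class name LookaheadWithSentinel, the _END sentinel and has_lookahead are all hypothetical.

```python
import collections.abc
import itertools

_END = object()  # private sentinel: distinct (by identity) from every real item


class LookaheadWithSentinel(collections.abc.Iterator):
    """Hypothetical tee-based wrapper that can tell a real None from exhaustion."""

    def __init__(self, it):
        self.it, self.nextit = itertools.tee(iter(it))
        self._peek = next(self.nextit, _END)

    def __next__(self):
        self._peek = next(self.nextit, _END)
        return next(self.it)

    @property
    def has_lookahead(self):
        return self._peek is not _END

    @property
    def lookahead(self):
        # Keep the original attribute's behaviour: None once exhausted.
        return None if self._peek is _END else self._peek


w = LookaheadWithSentinel([None, 1])
pairs = []
for item in w:
    pairs.append((item, w.has_lookahead))
# pairs == [(None, True), (1, False)]
```

The None-valued lookahead attribute is kept for convenience; code that must handle None items would test has_lookahead instead.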

Answer 6

A:

Since you say you are tokenizing a string and not a general iterable, I suggest the simplest solution of just expanding your tokenizer to return a 3-tuple: (token_type, token_value, token_index), where token_index is the index of the token in the string. Then you can look forward, backward, or anywhere else in the string. Just don't go past the end. Simplest and most flexible solution I think.
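A minimal sketch of such a tokenizer, assuming a small hypothetical grammar (integers, identifiers and single-character operators) and treating token_index as the character offset returned by the regex match's start(). The token type names and the TOKEN_RE pattern are illustrative, not taken from the question.

```python
import re

# Hypothetical grammar for illustration: integers, identifiers, one-char operators.
TOKEN_RE = re.compile(
    r"(?P<NUMBER>\d+)|(?P<NAME>[A-Za-z_]\w*)|(?P<OP>[-+*/()=])|(?P<SKIP>\s+)"
)


def scan(string):
    """Yield (token_type, token_value, token_index) 3-tuples.

    token_index is the character offset of the token in string. Note that
    finditer() silently skips characters that no alternative matches.
    """
    for m in TOKEN_RE.finditer(string):
        if m.lastgroup != "SKIP":
            yield (m.lastgroup, m.group(), m.start())


source = "x = 42 + y"
tokens = list(scan(source))

# Look ahead in the token list by position...
next_type = tokens[1][0]  # 'OP'
# ...or in the raw string, starting just past the first token.
token_type, token_value, token_index = tokens[0]
after_first = source[token_index + len(token_value):]
```

Because each tuple carries its type, peeking via tokens[i + 1] gives the token type as well as the value.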

Also, you needn't use a list comprehension to create a list from a generator. Just call the list() constructor on it:

 token_list = list(scan(string))

Don O'Donnell 2009-10-05 04:25:33

This is a very interesting idea, since it avoids the issue in the first place. But I think there are two downsides. First, if accessing tokens from the token stream is handled by a different instance than the scanner, both the token stream and the original string would have to be provided. However, I could live with that, and it might be a good idea to let the scanner do the accessing work anyway. Second, peeking at a token by using the original string only provides the value, not other annotations like the token's type, which might be essential in some cases (as in mine).

jena 2009-10-05 12:50:15

Using lookahead with generators
