views:

157

answers:

3

I want to collapse or expand sub-sequences of a list

e.g. ['A', 'B', 'D', 'E', 'H'] -> ['AB', 'DE', 'H'] and vice versa

EDIT: the example above may cause misunderstanding. the following is better:

e.g. ['foo', 'bar', 'wtf'] <-> ['baz', 'wtf']

currently I wrote some ugly code like:

while True:
  for i, x in enumerate(s):
    if x == 'foo' and s[i+1] == 'bar':
      s[i:i+2] = 'baz'
      break
  else:
    break

For people who asking 'why do that thing':

Actually I'm working on a optimizing compiler and this is the peephole part. Writing pattern matching is a little annoying.

P.S. I found the following code works, but a bit ridiculous, why enumerate know our modification?

s = ['foo', 'bar', 'wtf', 'foo', 'bar', 'wtf', 'foo', 'bar', 'wtf']

def collapse():
    for i, x in enumerate(s):
        if s[i] == 'foo' and s[i+1] == 'bar':
            s[i:i+2] = ['baz']

def expand():
    for i, x in enumerate(s):
        if s[i] == 'baz':
            s[i:i+1] = ['foo', 'bar']

collapse()
print s
expand()
print s
+1  A: 

See itertools. Specifically, here's a recipe for more or less what you want (actually, what I thought you wanted after your kind of misleading original post!):

from itertools import tee, izip

def pairwise(iterable):
    "s -> (s0,s1), (s1,s2), (s2, s3), ..."
    a, b = tee(iterable)
    next(b, None)
    return izip(a, b)

This will return tuples which you can join().

To undo this, just join() your final sequence and iterate over the individual items (chars).

I will try to come up with an answer to your new/real question.

Andrew Jaffe
sorry I don't get the point relative to my question :( maybe the example is too bad. I've edited it. Please check it again
forgot
@forgot, instead of using indexing to get the next element, use this to do automatic lookahead. It still gets a little complicated since you have to skip what you're adding on the next step, but such is life. You'd have to add a lot of complication to make your current approach work right and buglessly too.
Mike Graham
@Mike Graham, absolutely, sad but true :)
forgot
A: 

I think that your way with enumerate is pretty good actually. Enumerate is able to keep track of the modifications because it creates a generator that uses an iterator of the array you input. The problem I see is if you change your array to be:

s = ['foo', 'bar', 'wtf', 'foo', 'bar', 'wtf', 'foo']

then the last 'foo' that doesn't have a 'bar' will give you an exception when your code tries to look at the item beyond the end of the array. I'm not sure how to fix this yet because my attempts haven't been successful.

Edit:

It may not be very elegant, but this code for the collapse() function will work even on the above case:

def collapse():
    i = 1
    L = len(s)
    while i < L:
        if s[i-1] == 'foo' and s[i] == 'bar':
            s[i-1:i+1] = ['baz']
            L -= 1
        i += 1
Justin Peel
People certainly have every right to downvote my answer, but why was my answer so bad?
Justin Peel
I haven't vote yet. You answer is not bad, but seems python coders don't like keep index in this way. btw you can simply replace variable L by 'while i < len(s):'
forgot
@forgot yes, you can replace that if you want. I thought about putting it like that at first, but I'm pretty certain that the len() function would be a little slower. If you're planning to use this as a compiler, I thought that speed might be important. However, I admit that it is premature optimization (which is evil).
Justin Peel
+2  A: 

I wouldn't call this much better but it's a different way to do it, and it also handles the quirk Justin points out. (I was more interested in finding a subsequence from a list, and I couldn't find a good function on Google)

def findsubseq(L, subseq):
    if not subseq: return # just die on zero-len input
    i = -1
    try:
        while True:
            i = L.index(subseq[0], i+1)
            for j in range(1, len(subseq)):
                if L[i+j] != subseq[j]:
                    break
            else:
                yield i
    except ValueError: pass
    except IndexError: pass

def replace(target, changethis, tothis):
    subseqs = [x for x in findsubseq(target, changethis)]
    subseqs.reverse()
    for i in subseqs:
        target[i:i+len(changethis)] = tothis
def collapse():
    global s
    replace(s, ['foo', 'bar'], ['baz'])
def expand():
    global s
    replace(s, ['baz'], ['foo', 'bar'])

s = ['foo', 'bar', 'wtf', 'foo', 'bar', 'wtf',
       'foo', 'bar', 'bar', 'bar', 'foo']
print s
collapse()
print s
expand()
print s

C:\Scripts>subseq.py
['foo', 'bar', 'wtf', 'foo', 'bar', 'wtf', 'foo', 'bar', 'bar', 'bar', 'foo']
['baz', 'wtf', 'baz', 'wtf', 'baz', 'bar', 'bar', 'foo']
['foo', 'bar', 'wtf', 'foo', 'bar', 'wtf', 'foo', 'bar', 'bar', 'bar', 'foo']

Edit: generalized it to just a simple replace function

Nick T
I was wondering if this could be reflowed so the generator would provide indicies to replace as it chugs along rather than 1) getting everything, 2) flipping them to avoid shrinking/growing throwing them off, 3) replacing, but the generator might need to adjust where it's searching (`list.index()`) if `changethis` is empty, which might lead to `changethis == tothis` causing it to hang. Easily remedied, but that's left as an exercise to the reader.
Nick T