(I am working interactively with a WordprocessingDocument object in IronPython using the OpenXML SDK, but this is really a general Python question that should be applicable across all implementations)

I am trying to scrape some tables out of a number of Word documents. For each table, I have an iterator that gives me table row objects. I then use the following list comprehension to get a tuple of cells from each row:

for row in rows:
    t = tuple([c.InnerText for c in row.Descendants[TableCell]()])

Each tuple contains 4 elements. Now, in column t[1] for each tuple, I need to apply a regex to the data. I know that tuples are immutable, so I'm happy to either create a new tuple, or build the tuple in a different way. Given that row.Descendants[TableCell]() returns an iterator, what's the most Pythonic (or at least simplest) way to construct a tuple from an iterator where I want to modify the nth element returned?

My brute-force method right now is to create a tuple from the left slice (t[:n]), the modified data for t[n], and the right slice (t[n+1:]), but I feel like the itertools module should have something to help me out here.
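
For reference, the slice-based approach described above can be sketched as follows. The helper name replace_nth and the sample row are made up for illustration; plain strings stand in for the cell text. Note that the left slice stops at n (exclusive), so it is t[:n], and the new value has to be wrapped in a one-element tuple before concatenating.

```python
def replace_nth(t, n, new_value):
    """Return a copy of tuple t with the element at index n replaced."""
    # t[:n] is everything before index n; t[n + 1:] is everything after it.
    return t[:n] + (new_value,) + t[n + 1:]

# Hypothetical row of four cell strings.
row = ("a", "b", "c", "d")
print(replace_nth(row, 1, "B"))  # ('a', 'B', 'c', 'd')
```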

+1  A: 

If every tuple contains 4 elements, then, frankly, I think you'd be better off assigning them to individual variables, manipulating those, and then building your tuple:

for row in rows:
    t1, t2, t3, t4 = tuple([c.InnerText for c in row.Descendants[TableCell]()])
    t2 = ...  # the second element, i.e. t[1] in the question
    t = (t1, t2, t3, t4)
Pavel Minaev
Unfortunately, the number of columns I'll be working with varies from table to table, so I can't assign each to a concrete variable.
technomalogical
+6  A: 
def item(i, v):
  if i != 1: return v
  return strangestuff(v)

for row in rows:
  t = tuple(item(i, c.InnerText)
            for i, c in enumerate(row.Descendants[TableCell]())
           )
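
A self-contained sketch of the same enumerate idea, runnable without the OpenXML SDK. The names row_to_tuple and clean are hypothetical, the whitespace-collapsing regex is only a stand-in for whatever transformation is actually needed, and a plain iterator of strings takes the place of row.Descendants[TableCell]().

```python
import re

def clean(text):
    # Hypothetical transformation: collapse runs of whitespace to one space.
    return re.sub(r"\s+", " ", text).strip()

def row_to_tuple(cells, n, func):
    """Build a tuple from any iterable, applying func only to the item at index n."""
    return tuple(func(v) if i == n else v for i, v in enumerate(cells))

# An iterator of strings standing in for the cell texts of one row.
cells = iter(["id-1", "  some   messy\ttext ", "x", "y"])
print(row_to_tuple(cells, 1, clean))  # ('id-1', 'some messy text', 'x', 'y')
```

Because the iterable is consumed once inside the generator expression, this works for any number of columns.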
Alex Martelli
This is why I should always keep a copy of "Python in a Nutshell" on my desk. I had no idea about the enumerate built-in, having never needed it before. While waiting for responses, I concocted a similar construct using itertools.izip and itertools.count. Thanks Alex!
technomalogical
@tech-).
Alex Martelli
+1  A: 

I would do this:

temp_list = [c.InnerText for c in row.Descendants[TableCell]()]
temp_list[2] = "Something different"
t = tuple(temp_list)

It would work like this:

>>> temp_list = [i for i in range(4)]
>>> temp_list[2] = "Something different"
>>> t = tuple(temp_list)
>>> t
(0, 1, 'Something different', 3)
hughdbrown
A: 

What I've generally done, though I'm not a fan of it:

l = list(oldtuple)
l[2] = foo
t = tuple(l)

I kind of want something like the update() method that dicts have:

newtuple = update(oldtuple, (None, None, val, None))

Or perhaps the right structure is a zip

newtuple = update(oldtuple, ((2, val), (3, val)))
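
Tuples have no built-in update(), so the second form would have to be written by hand. A minimal sketch, assuming changes is an iterable of (index, value) pairs; the function name and signature simply mirror the wished-for call above and are not part of any library.

```python
def update(old, changes):
    """Return a new tuple with each (index, value) pair in changes applied."""
    items = list(old)  # copy into a mutable list; the original tuple is untouched
    for index, value in changes:
        items[index] = value
    return tuple(items)

print(update((0, 1, 2, 3), ((2, "val"), (3, "other"))))  # (0, 1, 'val', 'other')
```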

jrodman