
Hi all,

Doing some XML processing in Python. (Edit: I'm forced to use Python 2.4 for this project, boo!) I want to know the most Pythonic way to do this (create the union of all values in multiple lists):

def getUniqueAttributeValues(xml_attribute_nodes):
    # split attribute values by whitespace into lists
    result_lists=list(item.getContent().split() for item in xml_attribute_nodes)

    # find all unique values
    unique_results=[]
    for result_list in result_lists:
        for result in result_list:
            if result in unique_results:
                continue
            unique_results.append(result)

    return unique_results

Thanks,

-aj
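The loop in the question keeps the values in first-seen order, but `result in unique_results` scans the whole list for every value. If that order matters, a minimal sketch (assuming the inputs are already the attribute strings; `unique_values_in_order` is a hypothetical helper name, and with real nodes you would pass `item.getContent()` for each node) tracks seen values in a set instead:

```python
def unique_values_in_order(attribute_strings):
    """Return whitespace-separated values, deduplicated, in first-seen order."""
    seen = set()  # average O(1) membership checks instead of a list scan
    unique_results = []
    for text in attribute_strings:
        for value in text.split():
            if value not in seen:
                seen.add(value)
                unique_results.append(value)
    return unique_results

# unique_values_in_order(["a b", "b c", "a d"]) returns ['a', 'b', 'c', 'd']
```

The built-in `set` type used here is available from Python 2.4 on.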

A: 
def getUniqueAttributeValues(xml_attribute_nodes):
    return set(l 
       for item in xml_attribute_nodes
       for l in item.getContent().split())

If you want to have a list, just convert the set to a list before returning.

Torsten Marek
You might want to test this. I get `NameError: global name 'item' is not defined` with your code, and this with my translation:

>>> L = [[1,2,3],[1,2]]
>>> [e for e in subL for subL in L]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'subL' is not defined
telliott99
The ordering of the loops was wrong; it's fixed now.
Torsten Marek
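The `for` clauses of a generator expression or list comprehension are read left to right in the same order as the equivalent nested `for` statements, so the loop over the nodes has to come before the loop over the split values. A minimal check of the corrected answer, using a hypothetical `FakeNode` class that only mimics the `getContent()` method from the question (no XML library needed). Since a set has no defined order, `sorted()` turns it into a deterministic list:

```python
class FakeNode(object):
    """Hypothetical stand-in for an XML attribute node."""
    def __init__(self, content):
        self.content = content

    def getContent(self):
        return self.content


def getUniqueAttributeValues(xml_attribute_nodes):
    # Outer loop (nodes) first, inner loop (split values) second,
    # exactly like two nested for statements.
    return set(value
               for item in xml_attribute_nodes
               for value in item.getContent().split())


nodes = [FakeNode("red green"), FakeNode("green blue")]
unique_list = sorted(getUniqueAttributeValues(nodes))  # ['blue', 'green', 'red']

# The same ordering rule applied to the list from the comment above:
L = [[1, 2, 3], [1, 2]]
flat = [e for subL in L for e in subL]  # [1, 2, 3, 1, 2]
```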
+1  A: 

Lists don't support unions, since they are ordered and can hold duplicates; sets do. Check out set.union.

fatcat1111
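A minimal sketch of a set-based union of two lists, using only features available in Python 2.4: the single-argument `set.union()` method, which accepts any iterable, and the `|` operator, which needs sets on both sides:

```python
a = [1, 2, 3]
b = [1, 2, 4]

# union() accepts any iterable, so b does not need to be converted first.
u1 = set(a).union(b)
# The | operator requires both operands to be sets.
u2 = set(a) | set(b)

# sorted() turns the unordered set back into a list: [1, 2, 3, 4]
as_list = sorted(u1)
```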
+6  A: 

set.union does what you want:

>>> results_list = [[1,2,3], [1,2,4]]
>>> results_union = set().union(*results_list)
>>> print results_union
set([1, 2, 3, 4])

You can also do this with more than two lists.

sth
@sth, thanks for the example, but when I run it I get an error:

Traceback (most recent call last):
  File "so_example.py", line 33, in ?
    results_union=set().union(*result_lists)
TypeError: union() takes exactly one argument (3 given)
AJ
@AJ: According to the documentation (http://docs.python.org/library/stdtypes.html#set.union), `union()` only accepts multiple arguments in Python 2.6 or higher. You seem to be using an earlier version, so you probably have to use an explicit loop:

    total = set()
    for x in results_list:
        total.update(x)
sth
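The explicit loop from the comment above, wrapped in a function so it can be reused for any number of lists. `union_of_lists` is a hypothetical name; `set.update()` with a single iterable argument is available in Python 2.4:

```python
def union_of_lists(lists):
    """Union of any number of iterables, using only single-argument update()."""
    total = set()
    for x in lists:
        total.update(x)  # add every element of x to total
    return total

# Works for any number of lists, including zero (which gives an empty set).
result = union_of_lists([[1, 2, 3], [1, 2, 4], [5]])
```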
A: 

I used the following to do intersections, which avoids the need for sets.

a, b = [[1, 2, 3], [1, 2]]
s = filter(lambda x: x in b, a)

or,

s = [x for x in b if x in a]
Bear
This doesn't work for an arbitrary number of lists.
Seth Johnson
Why would you even want to "avoid the need for sets"? They're faster, and clearer, for this purpose. And your "x in a" does a linear, brute-force search through the list each time you execute it. Yuck.
Peter Hansen
Sets require type casting, and linear speed isn't bad unless you are dealing with a large N.
Bear
"Type casting"? In Python? Since when? Sets are basically dicts with only the keys, and they use hash and equality comparisons. Using "x in a" on a list does an equality comparison too. What's all this about type casting?
Peter Hansen
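For comparison with the list-based intersection in this answer, a set-based sketch that also handles an arbitrary number of lists. `intersection_of_lists` is a hypothetical helper; it uses only `set()` and single-argument `intersection_update()`, both available in Python 2.4. Set operations look elements up by hash, so they avoid the repeated linear `x in a` scans discussed in the comments:

```python
def intersection_of_lists(lists):
    """Elements common to every list; an empty input gives an empty set."""
    if not lists:
        return set()
    common = set(lists[0])
    for other in lists[1:]:
        common.intersection_update(other)  # keep only elements also in other
    return common

result = intersection_of_lists([[1, 2, 3], [1, 2], [2, 3, 1]])
```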
+3  A: 

Since you seem to be using Python 2.5 (it would be nice to mention in your Q if you need an A for versions != 2.6, the current production one, by the way;-) and want a list rather than a set as the result, I recommend:

   import itertools

   ...

   return list(set(itertools.chain(*result_lists)))

itertools is generally a great way to work with iterators (and so with many kinds of sequences or collections), and I heartily recommend you become familiar with it. itertools.chain, in particular, is documented in the itertools section of the Python Library Reference.

Alex Martelli
+1 A perfect example of a good time to dip into the wonderful `itertools` package.
gotgenes
@Alex, thanks. I edited my question to specify the version and remove blame from myself for being so behind in versions :) I'll make a point of looking into itertools; I appreciate the suggestion.
AJ
@AJ, no blame, we all can suffer under such constraints after all (but please do remember to specify in future Qs!-); `itertools.chain` works fine in Python 2.4 as well, by the way.
Alex Martelli
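A small illustration of `itertools.chain` flattening the split attribute lists before deduplication, as in the answer above. The input lists are made up for the example; as with any set-based approach, the original order is lost, so `sorted()` is used to get a deterministic list:

```python
import itertools

result_lists = [["a", "b"], ["b", "c"], ["a", "d"]]

# chain(*result_lists) yields "a", "b", "b", "c", "a", "d" one at a time,
# without building an intermediate flattened list.
flattened = list(itertools.chain(*result_lists))

# Deduplicate via a set, then sort to get a deterministic list.
unique_values = sorted(set(itertools.chain(*result_lists)))
```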