Hi,

I wish to compare two nested lists of unequal length. I am interested only in a match between the first element of each sublist. Should a match exist, I wish to add the match to another list for subsequent transformation into a tab-delimited file. Here is an example of what I am working with:

x = [['1', 'a', 'b'], ['2', 'c', 'd']]

y = [['1', 'z', 'x'], ['4', 'z', 'x']]

match = []

def find_match():
    for i in x:
        for j in y:
            if i[0] == j[0]:
                 match.append(j)
            return match

This returns:

[['1', 'x'], ['1', 'y'], ['1', 'x'], ['1', 'y'], ['1', 'z', 'x']]

Would it be good practise to reprocess the list to remove duplicates or can this be done in a simpler fashion?

Also, is it better to use tuples and/or tuples of tuples for the purposes of comparison?

Any help is greatly appreciated.

Regards, Seafoid.

+1  A: 
if i[1] == j[1]

checks whether the second elements of the arrays are identical. You want if i[0] == j[0].

Otherwise, I find your code quite readable and wouldn't necessarily change it.

Tim Pietzcker
+2  A: 

I don't know if I interpret your question correctly, but given your example it seems that you might be using a wrong index:

change

if i[1] == j[1]:

into

if i[0] == j[0]:
disown
You are correct! Thanks for pointing that out. I edited to incorporate that and showed the output.
Seafoid
A: 

A simpler expression should work here too:

list_of_lists = filter(lambda l: l[0][0] == l[1][0], zip(x, y))
map(lambda l: l[1], list_of_lists)
pajton
This compares the whole sublists, which doesn't seem to be what OP wanted.
Mike Graham
Also, I'm not sure this is what I would call simpler. I almost always find it nicer to use list comprehensions than filter/map with an anonymous function. For example, I think `[suby for subx, suby in zip(x, y) if subx == suby]` is exactly equivalent to your code, but a whole lot nicer to read than the `filter` and `map` version. I think `[suby for subx in x for suby in y if subx[0] == suby[0]]` is closer to OP's code.
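To make the difference between pairing by position and comparing every combination concrete, here is a small sketch. Both comprehensions compare first elements only; `y_reordered` is made-up data (the question's y in a different order), not something from the question.

```python
x = [['1', 'a', 'b'], ['2', 'c', 'd']]
# Hypothetical data: the question's y with its sublists swapped.
y_reordered = [['4', 'z', 'x'], ['1', 'z', 'x']]

# zip pairs sublists by position: only x[0] vs y[0] and x[1] vs y[1] are compared.
pairwise = [suby for subx, suby in zip(x, y_reordered) if subx[0] == suby[0]]

# The nested comprehension compares every sublist of x with every sublist of y.
all_pairs = [suby for subx in x for suby in y_reordered if subx[0] == suby[0]]

print(pairwise)   # []
print(all_pairs)  # [['1', 'z', 'x']]
```

With the question's original ordering both forms happen to give the same result, which is why the difference is easy to miss.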
Mike Graham
@Mike I like list comprehensions as well, though the functional approach tempts me once in a while :-). Thanks for pointing out the mistake; corrected now.
pajton
@pajton, Using `map` and `filter` isn't really more functional than using a list comprehension.
Mike Graham
+1  A: 

You can do this a lot more simply by using sets.

set_x = set([i[0] for i in x])
set_y = set([i[0] for i in y])
matches = list(set_x & set_y)
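Note that the intersection gives only the shared first elements (here `['1']`), not the sublists themselves. If the full sublists of y are wanted, as in the original function, one possible variation is to build a set of keys from x and filter y against it. This is only a sketch; `find_matches` and `keys_in_x` are names picked for illustration.

```python
def find_matches(x, y):
    """Return the sublists of y whose first element also starts a sublist of x."""
    keys_in_x = {sub[0] for sub in x}  # membership tests on a set are fast
    # Iterating over y keeps y's order; each sublist of y is emitted at most once.
    return [sub for sub in y if sub[0] in keys_in_x]

x = [['1', 'a', 'b'], ['2', 'c', 'd']]
y = [['1', 'z', 'x'], ['4', 'z', 'x']]
print(find_matches(x, y))  # [['1', 'z', 'x']]
```

Unlike the nested loops in the question, this emits a sublist of y only once even when several sublists of x share the same first element.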
Daniel Roseman
Thanks Daniel - I didn't know that I could specify indices within set. I thought that set would only return complete matches for all contents of the list.
Seafoid
Good idea [to work with sets]. You should stress, however, that the results will differ from those of the original program with regard to ordering and possibly duplicate values (which may or may not matter, depending on the eventual use of the results).
mjv
@Seafoid: the indices are applied to the sublists of x and y, not to any of the sets.
mjv
You should not use [] in the set constructors. Removing the brackets creates a generator expression, which does not need to allocate memory for an entire list.
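For instance, assuming the question's x and y, the sets can be built straight from a generator expression. A set comprehension (available in Python 2.7 and 3) is an equivalent spelling:

```python
x = [['1', 'a', 'b'], ['2', 'c', 'd']]
y = [['1', 'z', 'x'], ['4', 'z', 'x']]

set_x = set(i[0] for i in x)  # generator expression, no intermediate list
set_y = {i[0] for i in y}     # set comprehension, same result as set(...)
matches = list(set_x & set_y)
print(matches)  # ['1']
```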
mikerobi
+1  A: 
  • Use sets to obtain collections with no duplicates.

    • You'll have to use tuples instead of lists as the items because set items must be hashable.
  • The code you posted doesn't seem to generate the output you posted. I do not have any idea how you are supposed to generate that output from that input. For example, the output has 'y' and the input does not.

  • I think the design of your function could be much improved. Currently you define x, y, and match at the module level and read and mutate them explicitly. This is not how you want to design functions. As a general rule, a function shouldn't mutate something at the global level. It should be explicitly passed everything it needs and return a result, not implicitly receive information and change something outside itself.

    I would change

    x = some list
    y = some list
    match = []
    def find_match():
        for i in x:
            for j in y:
                if i[0] == j[0]:
                     match.append(j)
        return match # This is the only line I changed. I think you meant 
                     # your return to be over here?
    find_match()
    

    to

    x = some list
    y = some list
    
    
    def find_match(x, y):
        match = []
        for i in x:
            for j in y:
                if i[0] == j[0]:
                    match.append(j)
        return match
    match = find_match(x, y)
    
  • To take that last change to the next level, I usually replace the pattern

    def f(...):
        return_value = []
        for...
            return_value.append(foo)
        return return_value
    

    with the similar generator

    def f(...):
        for...
            yield foo
    

    which would make the above function

    def find_match(x, y):
        for i in x:
            for j in y:
                if i[0] == j[0]:
                    yield j
    

    Another way to express this generator's effect is with the generator expression `(j for i in x for j in y if i[0] == j[0])`.
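Tying this back to the question, here is a minimal sketch, assuming Python 3, that consumes the generator, drops duplicate rows by converting each sublist to a hashable tuple, and writes the result as tab-delimited text with the standard `csv` module. The `unique_rows` helper, the sample data with a repeated key, and the in-memory `io.StringIO` buffer (standing in for a real file) are illustrative assumptions, not something from the question.

```python
import csv
import io

def find_match(x, y):
    """Yield each sublist of y whose first element matches one in x."""
    for i in x:
        for j in y:
            if i[0] == j[0]:
                yield j

def unique_rows(rows):
    """Yield rows in first-seen order, skipping exact duplicates."""
    seen = set()
    for row in rows:
        key = tuple(row)  # lists are unhashable, tuples are not
        if key not in seen:
            seen.add(key)
            yield row

# Hypothetical data: x repeats the key '1', so the nested loops yield a duplicate.
x = [['1', 'a', 'b'], ['1', 'c', 'd']]
y = [['1', 'z', 'x'], ['4', 'z', 'x']]

matches = list(unique_rows(find_match(x, y)))
print(matches)  # [['1', 'z', 'x']]

buffer = io.StringIO()  # stands in for a real file opened with newline=''
csv.writer(buffer, delimiter='\t', lineterminator='\n').writerows(matches)
print(repr(buffer.getvalue()))  # '1\tz\tx\n'
```

Deduplicating inside a generator keeps the order in which matches were found, which a plain `set` of tuples would not guarantee.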

Mike Graham