Hi, I have several long lists in python and have compare them and find the lists that are equal to each other except the last elements in the them. Which is the fastest way?
A:
To compare two lists, I think something like this would avoid copying any part of your lists, and stops as soon as a mismatch is found:
len(a)==len(b) and all(a[i] == b[i] for i in range(len(a)-1))
To find all matches in an arbitrary set of lists, I think you'd need to compare each pair of lists -- or at least, each pair which you haven't checked some equivalent version of (e.g., if A=B and B=C, you don't need to check A=C). I don't know offhand of an algorithm that makes this simple.
Alternatively, if the lists are outrageously long and you want to avoid traversing them, you could maybe compute a checksum of the first N-1 elements of each, and then just compare the checksums.
Ken
2010-10-05 20:16:14
How would computing a checksum avoid traversing the lists?
Nathon
2010-10-05 20:18:16
If you need to compare every list against every other, you're looking at N*(N-1)/2 list traversals. If you compute hashes for each, you only have N traversals of the original lists. Depending on how many lists "several" is and how long the "long lists" are, that could avoid a whole lot of traversing! Provided the checksum is good enough, anyway.
Ken
2010-10-05 20:25:25
This answer could potentially compare every element in a list to another of the same length -- not all except for the last element, so it doesn't address the question.
martineau
2010-10-05 20:46:29
Ken: You don't need hashing, you can just do `all(map(equal, *list_of_lists))`, where `equal = lambda xs: reduce(operator.eq, xs)`.
Piet Delport
2010-10-05 20:49:31
martineau: You're right, I forgot a `-1`. But the other answers don't even try to address many parts of the question, like "fastest" or "find the lists that are equal to each other". :-)
Ken
2010-10-05 21:15:58
Piet: I can't tell what you're trying to do there. How does that "find the lists that are equal to each other"? I made a list-of-lists-of-ints and it doesn't even run here (`TypeError: <lambda>() takes exactly 1 argument (... given)`). It returns one bool -- what is that value supposed to be?
Ken
2010-10-05 21:22:04
Forget Python, forget lists. If you have a big set of big objects and want to "find [those] that are equal to each other", you need to compare each combination. Hashing is simply a way to avoid expensive comparisons (in this case, traversal of long lists).
Ken
2010-10-05 21:31:00