ansaurus

Question

Pythonic way to compare two lists and print the unmatched items?

Answer 1

A:

The Python stdlib has a class, difflib.SequenceMatcher, that looks like it can do what you want, though I don't know how to use it!
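For readers curious about what that would look like, here is a minimal sketch, not a recommendation. It assumes each record has been reduced to a hashable tuple (SequenceMatcher needs hashable elements, so plain dicts will not work), and the sample data is made up. Note that SequenceMatcher computes an order-sensitive diff, so it only answers the "unmatched items" question when the two lists are in a comparable order.

```python
from difflib import SequenceMatcher

# Hypothetical sample data: each record is reduced to a hashable tuple,
# because SequenceMatcher requires hashable sequence elements.
old = [("x", 1), ("y", 2), ("z", 3)]
new = [("x", 1), ("z", 3), ("w", 4)]

matcher = SequenceMatcher(None, old, new)

only_in_old = []
only_in_new = []
for tag, i1, i2, j1, j2 in matcher.get_opcodes():
    # 'equal' spans are matched; every other tag marks unmatched items.
    if tag in ("delete", "replace"):
        only_in_old.extend(old[i1:i2])
    if tag in ("insert", "replace"):
        only_in_new.extend(new[j1:j2])

print(only_in_old)
print(only_in_new)
```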

Ned Batchelder 2010-08-19 21:03:16

Answer 2

+5 A:

The equivalent of what you're currently doing, but the other way around, is:

unmatched_items_10 = [d for d in entries10 if d not in entries9]

While more concise than your way of coding it, this has the same performance problem: it will take time proportional to the product of the numbers of items in the two lists, because each membership test scans entries9 linearly. If the lengths you're interested in are about 9 or 10 (as those numbers seem to indicate), no problem.

But for lists of substantial length you can get much better performance by sorting the lists and "stepping through" them "in parallel" so to speak (time proportional to N log N where N is the length of the longer list). There are other possibilities, too (of growing complication;-) if even this more advanced approach is not sufficient to get you the performance you need. I'll refrain from suggesting very complicated stuff unless you indicate that you do require it to get good performance (in which case, please mention the typical lengths of each list and the typical contents of the dicts that are their items, since of course such "details" are the crucial consideration for picking algorithms that are a good compromise between speed and simplicity).
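As a rough sketch of the sort-and-step idea (the answer does not spell out an implementation, so the function name, the key function, and the sample data below are all assumptions), one could sort both sides by a comparison key and advance a single index through the second list. It assumes the keys are mutually orderable, and it returns results in sorted order rather than original order.

```python
from operator import itemgetter


def unmatched_by_merge(left, right, key):
    """Return items of `left` whose key equals no key found in `right`.

    Hypothetical helper: both sides are sorted by `key`, then walked
    in parallel, so the cost is dominated by the two sorts.
    """
    left_sorted = sorted(left, key=key)
    right_keys = sorted(key(item) for item in right)
    result = []
    j = 0
    for item in left_sorted:
        k = key(item)
        # Skip keys on the right that are smaller than the current key.
        while j < len(right_keys) and right_keys[j] < k:
            j += 1
        # No equal key on the right means the item is unmatched.
        if j == len(right_keys) or right_keys[j] != k:
            result.append(item)
    return result


# Made-up sample data; "a" and "b" stand in for the fields being compared.
entries9 = [{"a": 1, "b": "x", "c": 0}, {"a": 2, "b": "y", "c": 0}]
entries10 = [
    {"a": 2, "b": "y", "c": 9},
    {"a": 3, "b": "z", "c": 9},
    {"a": 1, "b": "q", "c": 9},
]

unmatched_items_10 = unmatched_by_merge(entries10, entries9, key=itemgetter("a", "b"))
print(unmatched_items_10)
```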

Edit: the OP edited his Q to show that what he cares about, for any two dicts d1 and d2, one from each of the two lists, is not whether d1 == d2 (which is what the in operator checks), but rather whether d1[a]==d2[a] and d1[b]==d2[b]. In this case the in operator cannot be used (well, not without some funky wrapping, but that's a complication that's best avoided when feasible;-), but the all builtin replaces it handily:

unmatched_items_10 = [d for d in entries10
                      if all(d[a]!=d2[a] or d[b]!=d2[b] for d2 in entries9)]

I have switched the logic around (to != and or, per De Morgan's laws) since we want the dicts that are not matched. However, if you prefer:

unmatched_items_10 = [d for d in entries10
                      if not any(d[a]==d2[a] and d[b]==d2[b] for d2 in entries9)]

Personally, I don't like if not any and if not all, for stylistic reasons, but the maths are impeccable (by what the Wikipedia page calls the Extensions to De Morgan's laws, since any is an existential quantifier and all a universal quantifier, so to speak;-). Performance should be just about equivalent (but then, the OP did clarify in a comment that performance is not very important for them on this task).
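To see the equivalence concretely, here is a small self-contained check. The key names bound to a and b and the sample dicts are made up for illustration; only the two comprehension shapes come from the answer.

```python
# Hypothetical key names and sample data, used only for illustration.
a, b = "a", "b"
entries9 = [{"a": 1, "b": "x"}, {"a": 2, "b": "y"}]
entries10 = [{"a": 2, "b": "y"}, {"a": 1, "b": "z"}, {"a": 3, "b": "x"}]

# Form 1: every dict in entries9 differs from d in at least one field.
with_all = [d for d in entries10
            if all(d[a] != d2[a] or d[b] != d2[b] for d2 in entries9)]

# Form 2: no dict in entries9 agrees with d on both fields.
with_not_any = [d for d in entries10
                if not any(d[a] == d2[a] and d[b] == d2[b] for d2 in entries9)]

print(with_all == with_not_any)
print(with_all)
```

Both forms short-circuit: all stops at the first matching dict it meets in entries9, and any stops at the same point, which is why their performance is about the same.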

Alex Martelli 2010-08-19 21:08:59

Thank you for this detailed answer. Performance isn't an issue - it's a one-off script to clean up some data and it doesn't matter how long it takes. Unfortunately though, 'd not in entries9' doesn't work, because the match condition is more complicated - I have to compare certain fields. It's more like "if d[a]==entries9_item[a] and d[b]==entries9_item[b]". I'll update the question to make this clearer.

AP257 2010-08-19 21:28:11

@AP257, it _would_ have been nice of you to mention that in the first place, you know -- equality checks are obviously special cases, and that's what you were using;-). Anyway, editing my answer to show how the code changes.

Alex Martelli 2010-08-20 01:43:13

Sorry. Thank you for this - very neat use of all() and any(). To get the joint_items list, do you think I should simply do "joint_items = [d for d in entries10 if all(d[a]==d1[a] or d[b]==d2[b] for d2 in entries9)]"? That seems repetitive, but probably safer than messing around with the original objects.

AP257 2010-08-20 10:41:11

@AP257, safer, yes, though you want `any`, not `all`. Though _three_ loops are starting to stretch it, there's no good clean way to do it with a single loop, so the potential performance gains are small. If the `d[a]` and `d[b]` for every `d` are hashable, there are much faster ways of course (but you did say you don't care much about performance here, so I'd just do three loops).
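One possible reading of the "much faster ways" for hashable fields, sketched under assumptions (the key names and sample data are invented, and this is not necessarily what the commenter had in mind): collect the (d[a], d[b]) pairs from entries9 into a set once, then test each dict in entries10 against it. The last line shows the `any` version of joint_items for comparison; note it pairs `any` with `and`.

```python
# Hypothetical key names and sample data, used only for illustration.
a, b = "a", "b"
entries9 = [{"a": 1, "b": "x", "c": 0}, {"a": 2, "b": "y", "c": 0}]
entries10 = [{"a": 2, "b": "y", "c": 9}, {"a": 3, "b": "z", "c": 9}]

# Build the lookup set once; membership tests are then O(1) on average.
seen9 = {(d[a], d[b]) for d in entries9}

joint_items = [d for d in entries10 if (d[a], d[b]) in seen9]
unmatched_items_10 = [d for d in entries10 if (d[a], d[b]) not in seen9]

# The loop-based equivalent of joint_items, for comparison.
joint_items_any = [d for d in entries10
                   if any(d[a] == d2[a] and d[b] == d2[b] for d2 in entries9)]

print(joint_items)
print(unmatched_items_10)
```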

Alex Martelli 2010-08-20 14:25:42

great. thanks again.

AP257 2010-08-20 15:16:50

Answer 3

A:

You may consider using sets and their associated methods, like intersection. You will, however, need to turn your dictionaries into immutable data (e.g. strings) so that you can store them in a set. Would something like this work?

a = set(str(x) for x in entries9)
b = set(str(x) for x in entries10)  

# You'll have to change the above lines if you only care about _some_ of the keys

joint_items = a.intersection(b)
unmatched_items = a - b

# Now you can turn them back into dicts:
joint_items     = [eval(i) for i in joint_items]
unmatched_items = [eval(i) for i in unmatched_items]

scrible 2010-08-20 01:13:39

I would use `dict.items` and `dict` rather than `str` and `eval`, if possible.
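A sketch of what that suggestion might look like, assuming every value in the dicts is hashable (the sample data is made up): each dict becomes a frozenset of its (key, value) pairs, which can live in a set, and `dict` turns the pairs back into a dictionary without `eval`.

```python
# Hypothetical sample data; all values must be hashable for this to work.
entries9 = [{"a": 1, "b": "x"}, {"a": 2, "b": "y"}]
entries10 = [{"a": 2, "b": "y"}, {"a": 3, "b": "z"}]

# A frozenset of (key, value) pairs is hashable and ignores key order.
set9 = {frozenset(d.items()) for d in entries9}
set10 = {frozenset(d.items()) for d in entries10}

# Rebuild dicts from the pairs; no string parsing or eval is involved.
joint_items = [dict(pairs) for pairs in set9 & set10]
unmatched_items = [dict(pairs) for pairs in set9 - set10]

print(joint_items)
print(unmatched_items)
```

Unlike the `str` version, this does not depend on the order in which keys were inserted into each dict.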

DiggyF 2010-08-20 17:17:48
