ansaurus

Question

Answer 1

+1 A:

You are looking for a symmetric difference:

chst = { ( line.split( "," )[ -2 ], line.split( "," )[ 7 ] ) for line in open( ... ) }
chbs = { ( line.split( "," )[ -2 ], line.split( "," )[ 7 ] ) for line in open( ... ) }

diff = chst ^ chbs

If you need the asymmetric differences, use -:

chst - chbs # tuples in chst but not in chbs
chbs - chst # tuples in chbs but not in chst
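To make the three operators concrete, here is a minimal sketch on in-memory data. The sample lines and the key_of helper are made up for illustration; only the field positions ( -2 for the device, 7 for the IP ) come from the snippet above.

```python
def key_of(line):
    """Return the (device, IP) tuple for one comma-separated log line."""
    fields = line.split(",")
    return (fields[-2], fields[7])


# Hypothetical sample data standing in for the two log files.
chst_lines = [
    "a,b,c,d,e,f,g,10.0.0.1,device1,x\n",
    "a,b,c,d,e,f,g,10.0.0.2,device2,x\n",
]
chbs_lines = [
    "a,b,c,d,e,f,g,10.0.0.2,device2,y\n",
    "a,b,c,d,e,f,g,10.0.0.3,device3,y\n",
]

chst = {key_of(line) for line in chst_lines}
chbs = {key_of(line) for line in chbs_lines}

only_in_chst = chst - chbs  # tuples in chst but not in chbs
only_in_chbs = chbs - chst  # tuples in chbs but not in chst
diff = chst ^ chbs          # tuples in exactly one of the two sets

# The symmetric difference is the union of the two asymmetric ones.
assert diff == only_in_chst | only_in_chbs
```

Splitting each line once inside a helper also avoids calling split twice per line, which may matter on large files.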

If you need the actual line rather than just the ( device, IP ) tuple, you can use dictionaries instead of sets:

chst = { ( line.split( "," )[ -2 ], line.split( "," )[ 7 ] ): line for line in open( ... ) }
chbs = { ( line.split( "," )[ -2 ], line.split( "," )[ 7 ] ): line for line in open( ... ) }

diff = chst.items( ) ^ chbs.items( )

This works because dict.items( ) returns a view on the items, which has set-like properties. Note that in Python 2.7 this method is called dict.viewitems( ).
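One thing to keep in mind with the dictionary version: the items view compares whole ( key, line ) pairs, so a device that appears in both files with a different line shows up twice in the result. If only the ( device, IP ) key should decide, the keys views can be compared instead and the lines looked up afterwards. A rough Python 3 sketch, with made-up sample lines and a hypothetical key_of helper:

```python
def key_of(line):
    """Return the (device, IP) tuple for one comma-separated log line."""
    fields = line.split(",")
    return (fields[-2], fields[7])


# Hypothetical sample data; device2 is in both files, but the lines differ.
chst_lines = [
    "a,b,c,d,e,f,g,10.0.0.1,device1,x\n",
    "a,b,c,d,e,f,g,10.0.0.2,device2,x\n",
]
chbs_lines = [
    "a,b,c,d,e,f,g,10.0.0.2,device2,y\n",
    "a,b,c,d,e,f,g,10.0.0.3,device3,y\n",
]

chst = {key_of(line): line for line in chst_lines}
chbs = {key_of(line): line for line in chbs_lines}

# Compare by key only, then fetch the original lines.
missing_from_chbs = [chst[key] for key in chst.keys() - chbs.keys()]
missing_from_chst = [chbs[key] for key in chbs.keys() - chst.keys()]

# Compare (key, line) pairs: device2 appears once per file here,
# because its two lines are not identical.
changed_or_missing = chst.items() ^ chbs.items()
```

Which of the two is appropriate depends on whether a changed line for a known device counts as a difference.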

katrielalex 2010-08-11 08:42:31

The sets module is deprecated since Python 2.6. set and frozenset have been builtins since Python 2.4.

Jim Brissom 2010-08-11 08:52:38

Oops, the backporting team *has* been busy! Fixed.

katrielalex 2010-08-11 08:58:15

I'm also quite sure that just calling items won't work (and it is unrelated to dict views): you would have to call viewitems on that dict, which is supported starting with 2.7. The items method just returns a list of key/value pairs, and the ^ operator is not supported for lists, whereas viewitems returns an actual view of type dict_items.

Jim Brissom 2010-08-11 09:04:55

I tried it before posting and it works in Py3k.

katrielalex 2010-08-11 09:14:41

Thanks all, here's what worked: the top two lines from the first answer plus the two x - y differences. I then joined those strings and tested it. It ran very quickly on the large datasets, and I did some sample searches of the results in the files to test; all seemed good. Well done.

Bill 2010-08-11 10:56:11

Answer 2

A:

There's a bug in line 9, where you are doing ='.'.join(line) instead of =','.join(line), i.e. a dot in the quotes instead of a comma. Or maybe the lines in chbs should be split on dots instead of commas later.

At the moment, if there are three lines for device7 in chbs but none in chst, the script will tell you three times, but your description of the problem implies that you don't need to know how many times it appears. Do you really want that, or is a single report OK for multiple occurrences? If a single report is enough, you could simplify the script by just using the device name as the dictionary key and checking whether the other dictionary has that key.
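A minimal sketch of that simplification, assuming the device name is the second-to-last comma-separated field (as in the first answer); the sample lines and the first_seen helper are invented for illustration:

```python
def first_seen(lines):
    """Map each device name to the number of the first line it appears on."""
    seen = {}
    for number, line in enumerate(lines, start=1):
        device = line.split(",")[-2]  # assumed position of the device name
        seen.setdefault(device, number)  # keep only the first occurrence
    return seen


# Hypothetical sample data: device7 appears three times in chbs only.
chst_lines = [
    "a,b,c,d,e,f,g,10.0.0.1,device1,x\n",
]
chbs_lines = [
    "a,b,c,d,e,f,g,10.0.0.7,device7,y\n",
    "a,b,c,d,e,f,g,10.0.0.1,device1,y\n",
    "a,b,c,d,e,f,g,10.0.0.7,device7,y\n",
    "a,b,c,d,e,f,g,10.0.0.7,device7,y\n",
]

chst = first_seen(chst_lines)
chbs = first_seen(chbs_lines)

# One report per device, however many lines mention it.
report = [(device, line_no) for device, line_no in chbs.items()
          if device not in chst]
```

Because each device is a key, it is reported once, and the membership test against the other dictionary is a constant-time lookup on average.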

Also, at the moment you're recording the line numbers but not really using them. If you do need to know how many times a device appears, why not report that instead of having to count them? In that case, when adding a device key to the dictionary, first check whether it's already there and, if so, increment a counter (perhaps in another dictionary also keyed by the device name).
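If the counts are wanted, one possible shortcut is collections.Counter from the standard library, which does the "check if it's there, then increment" bookkeeping itself. This swaps in Counter for the hand-written counter dictionary described above; the sample lines and the count_devices helper are assumptions for illustration, as is the position of the device field.

```python
from collections import Counter


def count_devices(lines):
    """Count how many lines mention each device name."""
    # Assumes the device name is the second-to-last comma-separated field.
    return Counter(line.split(",")[-2] for line in lines)


# Hypothetical sample data: device7 appears three times in chbs only.
chst_lines = [
    "a,b,c,d,e,f,g,10.0.0.1,device1,x\n",
]
chbs_lines = [
    "a,b,c,d,e,f,g,10.0.0.7,device7,y\n",
    "a,b,c,d,e,f,g,10.0.0.1,device1,y\n",
    "a,b,c,d,e,f,g,10.0.0.7,device7,y\n",
    "a,b,c,d,e,f,g,10.0.0.7,device7,y\n",
]

chst_counts = count_devices(chst_lines)
chbs_counts = count_devices(chbs_lines)

# Devices seen in chbs but never in chst, with their occurrence counts.
report = {device: n for device, n in chbs_counts.items()
          if device not in chst_counts}

for device, n in report.items():
    print(f"{device} appears {n} time(s) in chbs but not in chst")
```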

Simon Hibbs 2010-08-11 09:14:55

Thanks Simon, indeed a typo. The way I had it took way too long anyway, so I'm grateful for the answer above.

Bill 2010-08-11 10:53:54

python analyse 2 logfiles
