I have two large log files, a and b. I want to find devices that appear in a but not in b, and vice versa (excluding lines whose device appears in both files). The files look like this example:
04/09/2010,13:11:52,Authen OK,user1,Default Group,00-24-2B-A1-08-88,29,10.1.1.1,(Default),,,,,,13,EAP-TLS,,device1,
04/19/2010,15:35:24,Authen OK,user2,Default Group,00-24-2B-A1-05-EA,29,10.1.1.2,(Default),,,,,,13,EAP-TLS,,device2,
04/09/2010,13:11:52,Authen OK,user3,Default Group,00-24-2B-A1-08-88,29,10.1.1.3,(Default),,,,,,13,EAP-TLS,,device3,
04/19/2010,15:35:24,Authen OK,user4,Default Group,00-24-2B-A1-05-EA,29,10.1.1.4,(Default),,,,,,13,EAP-TLS,,device4,
To reiterate: for each device that is in log file a but not in b, or in b but not in a, I need the device name (field [-2]) and the IP address (field [7]).
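Here is a minimal sketch of that field layout. The helper name `device_and_ip` is hypothetical and not from the original code. Every line ends with a trailing comma, so splitting on commas leaves an empty last field. That is why the device name sits at index -2 while the IP is at index 7.

```python
# Hypothetical helper (not from the original post): extract the device name
# and IP address from one log line in the format shown above.
def device_and_ip(line):
    # Drop the line ending, then split on commas. The trailing comma leaves
    # an empty last field, so the device name is fields[-2].
    fields = line.rstrip('\r\n').split(',')
    return fields[-2], fields[7]

sample = ("04/09/2010,13:11:52,Authen OK,user1,Default Group,"
          "00-24-2B-A1-08-88,29,10.1.1.1,(Default),,,,,,13,EAP-TLS,,device1,")
print(device_and_ip(sample))  # ('device1', '10.1.1.1')
```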
Here's what I've done so far. It seems a little clunky and is very slow (each file has about 400K lines), and I'm cross-referencing twice. Can anyone suggest a more efficient approach? Perhaps my logic is wrong?
chst={}
chbs={}
# key each line by "device,line-number" so repeated devices are all kept
for i,line in enumerate(open('chst.txt').readlines()):
    line=line.split(',')
    chst[line[-2]+','+str(i)]=','.join(line)
for i,line in enumerate(open('chbs.txt').readlines()):
    line=line.split(',')
    chbs[line[-2]+','+str(i)]=','.join(line)
print "these lines are in CHST but not in CHBS"
for a in chst:
    # search for the device name as a substring of the other file's lines
    if a.split(',')[0] not in str(chbs.values()):
        line=chst[a].split(',')
        print line[-2], line[7]
print "\nthese lines are in CHBS but not in CHST"
for a in chbs:
    if a.split(',')[0] not in str(chst.values()):
        line=chbs[a].split(',')
        print line[-2], line[7]
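One possible alternative, offered as a sketch rather than a definitive solution: index each file by device name once, then take set differences. This replaces the per-line search through `str(...values())`, which rebuilds and scans a very large string on every iteration. Because the set lookup matches device names exactly, it also avoids the substring false positive where `device1` is "found" inside `device10`. The function names `load_devices` and `compare` are hypothetical. One assumption changes behavior: each device keeps only the last IP seen for it, whereas the original prints every matching line.

```python
# Sketch (hypothetical helpers, not the poster's code): build a
# device -> IP map per file, then compare the key sets.

def load_devices(lines):
    """Map device name -> IP for an iterable of log lines.

    If a device appears more than once, the last IP seen wins (assumption).
    """
    devices = {}
    for line in lines:
        fields = line.rstrip('\r\n').split(',')
        if len(fields) > 7:  # skip blank or malformed lines
            devices[fields[-2]] = fields[7]
    return devices

def compare(lines_a, lines_b):
    """Return ([(device, ip) only in a], [(device, ip) only in b])."""
    a = load_devices(lines_a)
    b = load_devices(lines_b)
    # Set difference uses exact, hashed lookups instead of substring scans.
    only_a = sorted(set(a) - set(b))
    only_b = sorted(set(b) - set(a))
    return [(d, a[d]) for d in only_a], [(d, b[d]) for d in only_b]
```

Because `compare` accepts any iterable of lines, it can be given open file objects directly (for example `compare(open('chst.txt'), open('chbs.txt'))`). Each file is then read in a single pass without `readlines()`. If every log line per device is needed rather than one IP, `load_devices` could store a list of lines per device instead.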