Sorry for the double post, I will update this question if I can't get things to work :)

I am trying to compare two files. Their contents are listed below:

 File 1                           File 2

"d.complex.1"                     "d.complex.1"

  1                                 4
  5                                 5
  48                                47
  65                                21

d.complex.10                    d.complex.10

  46                                6
  21                                46
 109                               121
 192                               192

I am trying to compare the contents of the two files, but not in a trivial way. Let me explain with an example. In the file contents above, d.complex.1 in file_1 shares the number "5" with d.complex.1 in file_2, while the same d.complex.1 in file_1 has nothing in common with d.complex.10 in file_2. I want to print only those pairs of d.complex. entries that have nothing in common with each other. Think of each d.complex. as a heading: I want to compare the numbers below each heading, and if none of them match, print that particular d.complex. name from both files. If even one number is present under both headings, the pair should be rejected.

My code: the approach I chose was to use sets and take the difference. The code I wrote was:

import string

first_complex=open( "file1.txt", "r" )
first_complex_lines=first_complex.readlines()
first_complex_lines=map( string.strip, first_complex_lines )
first_complex.close()

second_complex=open( "file2.txt", "r" )
second_complex_lines=second_complex.readlines()
second_complex_lines=map( string.strip, second_complex_lines )
second_complex.close()


list_1=[]
list_2=[]

res_1=[]
for line in first_complex_lines:
    if line.startswith( "d.complex" ):
        res_1.append( [] )
    res_1[-1].append( line )

res_2=[]
for line in second_complex_lines:
    if line.startswith( "d.complex" ):
        res_2.append( [] )
    res_2[-1].append( line )
h=len( res_1 )
k=len( res_2 )
for i in res_1:
   for j in res_2:
       print i[0]
       print j[0]
       target_set=set ( i )
       target_set_1=set( j )
       for s in target_set:
           if s not in target_set_1:
               if s[0] != "d":
                   print s

The above code gives output like this (just an example):

d.complex.1.dssp
d.complex.1.dssp
1
48
65

d.complex.1.dssp
d.complex.10.dssp    
46
21

109

What I would like to have is:

d.complex.1
d.complex.1 (name from file2)

d.complex.1
d.complex.10 (name from file2)

I am sorry for confusing you guys, but this is all that is required.

I am very new to Python, so my approach above might be flawed. I have also never used sets before :(. Can someone give me a hand here?

+2  A: 

The problem is that you are using the intersection instead of the difference :)

If you use target_set.difference(target_set_1) you will have the results you're looking for.

I'm not sure if I'm completely getting what you want, but is this what you are looking for?

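def complex_file_to_dict(filename):
    # Map each 'd.complex' heading to the set of numbers listed under it.
    out = dict()
    with open(filename) as f:
        for line in f:
            line = line.strip()
            if line.startswith('d.complex'):
                # A heading starts a new, empty set.
                name = line
                out[name] = set()
            elif line:
                # Any other non-blank line belongs to the latest heading.
                out[name].add(line)

    return out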

res_1 = complex_file_to_dict('a.txt')
res_2 = complex_file_to_dict('b.txt')

for k, set_1 in res_1.iteritems():
    print k
    set_2 = res_2.get(k, set())
    for v in set_1 - set_2:
        print v
    print
WoLpH
Oh yes, I got that wrong, but I would like the d.complex names that were compared to be printed as a header. Also, the above code does not compare each set in the first file against all the sets in the second file. I hope you understand what I mean. :)
forextremejunk
I've updated my answer with a script that might do what you want :)
WoLpH
I think he wants to iterate through all the sets in the second file for each set in the first. I've appended my answer with a modified version of your code.
miles82
+1  A: 

You need to use difference instead of intersection, since the latter gives you the items that are in both sets. You can also use the set1 - set2 syntax. See the Python docs for sets.
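As a small illustration of how these operations differ, here is a sketch in Python 3 syntax using the numbers listed under d.complex.1 in the question. The variable names are made up for the example.

```python
# Numbers listed under d.complex.1 in each file (copied from the question).
file1_numbers = {"1", "5", "48", "65"}
file2_numbers = {"4", "5", "47", "21"}

# Difference: items in the first set that are missing from the second.
# Equivalent to file1_numbers.difference(file2_numbers).
only_in_file1 = file1_numbers - file2_numbers

# Intersection: items present in both sets.
shared = file1_numbers & file2_numbers

# isdisjoint() is True only when the two sets share nothing at all.
nothing_in_common = file1_numbers.isdisjoint(file2_numbers)

print(sorted(only_in_file1, key=int))  # ['1', '48', '65']
print(shared)                          # {'5'}
print(nothing_in_common)               # False
```

Because "5" appears in both sets, the difference is non-empty but the sets are not disjoint, so a non-empty difference alone does not mean the two headings have nothing in common.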

I think you're after this (thanks to Rick for the original code):

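def complex_file_to_dict(filename):
    # Map each 'd.complex' heading to the set of numbers listed under it.
    out = dict()
    with open(filename) as f:
        for line in f:
            line = line.strip()
            if line.startswith('d.complex'):
                # A heading starts a new, empty set.
                name = line
                out[name] = set()
            elif line:
                # Any other non-blank line belongs to the latest heading.
                out[name].add(line)

    return out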

res_1 = complex_file_to_dict('file1.txt')
res_2 = complex_file_to_dict('file2.txt')

for k1, set_1 in res_1.iteritems():
    for k2, set_2 in res_2.iteritems():
        print k1
        print k2
        for v in set_1 - set_2:
            print v
        print

EDIT: You can change the loop to this:

for k1, set_1 in res_1.iteritems():
    for k2, set_2 in res_2.iteritems():
        print k1
        print k2,
        l = [v for v in set_1 - set_2]
        print '(' + ', '.join(l) + ')'

to get the output like this:

d.complex.1
d.complex.1 (1, 65, 48)
d.complex.1
d.complex.10 (1, 65, 48)
d.complex.10
d.complex.1 (46, 109, 192)
d.complex.10
d.complex.10 (109, 21)
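
For the modified question (print only the heading pairs whose numbers have nothing in common), one possible approach is to test each pair with set.isdisjoint() and print just the two names. The following is a sketch in Python 3 syntax, not something run against the real files. parse_complex_lines and disjoint_pairs are hypothetical helper names, the inline sample data stands in for file1.txt and file2.txt, and stripping the quotes around headings is an assumption based on the listing in the question.

```python
def parse_complex_lines(lines):
    """Map each d.complex heading to the set of numbers listed under it."""
    groups = {}
    name = None
    for raw in lines:
        line = raw.strip().strip('"')  # assumption: headings may be quoted
        if line.startswith("d.complex"):
            name = line
            groups[name] = set()
        elif line and name is not None:
            groups[name].add(line)
    return groups


def disjoint_pairs(groups_1, groups_2):
    """Return (name_1, name_2) pairs whose number sets share no element."""
    return [
        (name_1, name_2)
        for name_1, set_1 in groups_1.items()
        for name_2, set_2 in groups_2.items()
        if set_1.isdisjoint(set_2)
    ]


# Sample data copied from the question. With real files you could pass
# open("file1.txt") and open("file2.txt") instead of these lists.
file_1 = ['"d.complex.1"', "1", "5", "48", "65",
          "d.complex.10", "46", "21", "109", "192"]
file_2 = ['"d.complex.1"', "4", "5", "47", "21",
          "d.complex.10", "6", "46", "121", "192"]

pairs = disjoint_pairs(parse_complex_lines(file_1), parse_complex_lines(file_2))
for name_1, name_2 in pairs:
    print(name_1)
    print(name_2)
    print()
```

With the sample data, only d.complex.1 (from file 1) and d.complex.10 (from file 2) are printed, since every other pair shares at least one number.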
miles82
This code does what it says, thank you. If you have some time, please have a look at the modified question above.
forextremejunk