I am trying to compare two files. Their contents are listed below:
File 1           File 2
"d.complex.1"    "d.complex.1"
1                4
5                5
48               47
65               21
d.complex.10     d.complex.10
46               5
21               46
109              121
192              192
Each file contains 2000 d.complex blocks in total. The values listed under d.complex.1 in the first file have to be checked against every one of the 2000 d.complex blocks in the second file, and any values that do not match are to be printed. For example, in the files above, the number 48 under d.complex.1 in file 1 is not present under d.complex.1 in file 2, so that number has to be stored in a list (to print out later). The same d.complex.1 then has to be compared with d.complex.10 of file 2, and since 1, 48 and 65 are not there, they have to be appended to a list.
The method I chose was to use sets and then do an intersection. The code I wrote was:
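To make the comparison concrete, here is a minimal sketch of the check for the d.complex.1 block, using only the sample values listed above. It uses a set difference (rather than an intersection) to get the values of the file 1 block that are absent from a file 2 block. The variable names are illustrative, Python 3 syntax is assumed, and the values are assumed to be integers so they can be sorted numerically.

```python
# Values under d.complex.1 in file 1 (taken from the sample above)
file1_block = {"1", "5", "48", "65"}

# Values under d.complex.1 and d.complex.10 in file 2 (sample above)
file2_blocks = {
    "d.complex.1": {"4", "5", "47", "21"},
    "d.complex.10": {"5", "46", "121", "192"},
}

missing = {}
for name, values in file2_blocks.items():
    # Set difference: values in the file 1 block that the file 2 block lacks.
    # Sorting by int is an assumption that every value is an integer.
    missing[name] = sorted(file1_block - values, key=int)

for name, values in missing.items():
    print(name, values)
```

For both file 2 blocks the values 1, 48 and 65 are reported, which matches the example described above.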
import string

# Read both files and strip surrounding whitespace from every line
first_complex=open( "file1.txt", "r" )
first_complex_lines=first_complex.readlines()
first_complex_lines=map( string.strip, first_complex_lines )
first_complex.close()
second_complex=open( "file2.txt", "r" )
second_complex_lines=second_complex.readlines()
second_complex_lines=map( string.strip, second_complex_lines )
second_complex.close()
list_1=[]
list_2=[]
# Group the lines of file 1 into blocks; each block starts with its d.complex header
res_1=[]
for line in first_complex_lines:
    if line.startswith( "d.complex" ):
        res_1.append( [] )
    res_1[-1].append( line )
# Same grouping for file 2
res_2=[]
for line in second_complex_lines:
    if line.startswith( "d.complex" ):
        res_2.append( [] )
    res_2[-1].append( line )
h=len( res_1 )
k=len( res_2 )
# Compare every block of file 1 with every block of file 2
for i in res_1:
    for j in res_2:
        print i[0]
        print j[0]
        # Note: the header line (i[0], j[0]) is part of each set
        target_set=set ( i )
        target_set_1=set( j )
        for s in target_set:
            if s not in target_set_1:
                print s
The above code gives output like this (just an example):
1
48
65
d.complex.1.dssp
d.complex.1.dssp
46
21
109
d.complex.1.dssp
d.complex.1.dssp
d.complex.10.dssp
Although the output above is correct, I want a more efficient way of doing this. Can anyone help me? Also, d.complex.1.dssp is printed twice instead of once, which is not good either.
What I would like to have is:
d.complex.1
d.complex.1 (name from file2)
1
48
65
d.complex.1
d.complex.10 (name from file2)
1
48
65
I am very new to Python, so my approach above may be flawed. I have also never used sets before :(. Can someone give me a hand here?
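One possible way to get the desired layout is sketched below, under stated assumptions rather than as a definitive solution. Each file is parsed once into a mapping from header to a set of values, so the header is never part of the set and is printed only once per comparison. The function names `read_blocks` and `report_missing` are hypothetical. The sketch assumes Python 3.7+ (where dictionaries keep insertion order), that headers are unique within a file, and that all values are integers. Stripping double quotes is also an assumption, made in case the headers are quoted in the real files as in the sample.

```python
def read_blocks(lines):
    """Group lines into {header: set of values}, one entry per d.complex header."""
    blocks = {}
    current = None
    for raw in lines:
        # Assumption: headers may be wrapped in double quotes, as in the sample
        line = raw.strip().strip('"')
        if not line:
            continue
        if line.startswith("d.complex"):
            current = blocks.setdefault(line, set())
        elif current is not None:
            current.add(line)
    return blocks


def report_missing(blocks_1, blocks_2):
    """For every pair of blocks, list the file 1 values absent from the file 2 block."""
    out = []
    for name_1, values_1 in blocks_1.items():
        for name_2, values_2 in blocks_2.items():
            out.append(name_1)
            out.append(name_2)
            # Set difference; sorting by int assumes every value is an integer
            out.extend(sorted(values_1 - values_2, key=int))
    return out


# Sample data from the question, in place of the real files
file1 = ["d.complex.1", "1", "5", "48", "65",
         "d.complex.10", "46", "21", "109", "192"]
file2 = ["d.complex.1", "4", "5", "47", "21",
         "d.complex.10", "5", "46", "121", "192"]

report = report_missing(read_blocks(file1), read_blocks(file2))
print("\n".join(report))
```

With real files, the lists can be replaced by open file objects, for example `with open("file1.txt") as f: blocks_1 = read_blocks(f)`. The sets are built once per block instead of once per pair, but every block of file 1 is still compared with every block of file 2, so 2000 blocks per file still means 2000 x 2000 set differences.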