Hi, I need to run a check, and if the check passes, I want the result to be printed. Below is the code:

import string
import codecs
import sys
y=sys.argv[1]

list_1=[]
f=1.0
x=0.05
write_in = open ("new_file.txt", "w")
write_in_1 = open ("new_file_1.txt", "w")
ligand_file=open( y, "r" ) #Open the file named on the command line
ligand_lines=ligand_file.readlines() # Read all the lines into a list
ligand_lines=map( string.strip, ligand_lines ) #Remove the newline character from every line
ligand_file.close()

ligand_file=open( "unique_count_c_from_ac.txt", "r" ) #Open the second input file
ligand_lines_1=ligand_file.readlines() # Read all the lines into a list
ligand_lines_1=map( string.strip, ligand_lines_1 ) #Remove the newline character from every line
ligand_file.close()
s=[]
for i in ligand_lines:
    for j in ligand_lines_1:
        j = j.split()
        if i == j[1]:
            print j

The code above works, but when I print j, the output looks like ['351', '342'], whereas I want 351 342 (with a single space in between). Since this is more of a Python question, I have not included the input files (they just contain numbers).

Can anyone help me?

Cheers,

Chavanak

+4  A: 

To convert a list of strings into a single string with spaces between the list's items, use ' '.join(seq).

>>> ' '.join(['1','2','3'])
'1 2 3'

You can replace ' ' with whatever string you want in between the items.
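Applied to the question's j, a quick sketch (written with print() so it runs under Python 3; in the asker's Python 2 code the fix is simply print ' '.join(j) in place of print j). The sample line "351 342" is made up to match the output shown in the question.

```python
# One line of unique_count_c_from_ac.txt after split(), as in the question
j = "351 342".split()

print(j)             # the list itself: ['351', '342']
print(' '.join(j))   # items separated by one space: 351 342
print(', '.join(j))  # any separator string works: 351, 342
```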

Mark Rushakoff
and add 'print' to the front of that if you're not using the interactive prompt.
dash-tom-bang
`str.join` takes any iterable of strings as its argument. The one you passed was a *list*, not an array; "array" refers to a different, not-commonly-used type in Python.
Mike Graham
Yeah, I've been looking at too much Ruby lately.
Mark Rushakoff
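To illustrate the comment about str.join, a small sketch (Python 3 syntax) showing that join() accepts a tuple or a generator as well as a list, and that the items must already be strings. The values are illustrative, not read from the asker's files.

```python
values = [351, 342]  # illustrative integers, not read from a file

# A tuple of strings works just like a list
print(' '.join(('351', '342')))          # 351 342

# A generator expression can convert each item to a string on the fly
print(' '.join(str(v) for v in values))  # 351 342

# Non-string items are rejected with a TypeError
try:
    ' '.join(values)
except TypeError as exc:
    print('TypeError: %s' % exc)
```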
+2  A: 

Mark Rushakoff seems to have solved your immediate problem, but there are some other improvements that could be made to your code.

  • Always use context managers (with open(filename, mode) as f:) to open files, rather than relying on close being called manually.
  • Avoid reading a whole file into memory unless you need to. Looping over some_file.readlines() can be replaced with looping over some_file directly.

    • For example, you could have used map(string.strip, ligand_file) or, better yet, [line.strip() for line in ligand_file].
  • Don't put the type of an object in its name (as in list_1); that information can be found in other ways.
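A minimal sketch of the first two points together (Python 3 syntax). read_stripped is a hypothetical helper name, and the temporary file only stands in for the real input files.

```python
import os
import tempfile


def read_stripped(path):
    """Return the lines of the file at `path` with surrounding whitespace removed."""
    # The with block closes the file even if an exception is raised inside it
    with open(path, "r") as ligand_file:
        # Iterate over the file object directly instead of calling readlines()
        return [line.strip() for line in ligand_file]


# Demo with a throwaway file standing in for the real input
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as out:
    out.write("351\n342\n")
print(read_stripped(path))  # ['351', '342']
os.remove(path)
```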

For example, the code you posted can be simplified to something along the lines of

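import sys

some_real_name = sys.argv[1]
other_file = "unique_count_c_from_ac.txt"

# Python 2.7+ accepts several context managers in one with statement,
# which replaces the deprecated contextlib.nested
with open(some_real_name, "r") as ligand_1, open(other_file, "r") as ligand_2:
    for line_1 in ligand_1:
        # Take care of the trailing newline
        line_1 = line_1.strip()

        # A file object is exhausted after one pass, so rewind the second
        # file before scanning it again for each line of the first one
        ligand_2.seek(0)

        for line_2 in ligand_2:
            # split() with no arguments also discards the trailing newline
            numbers = line_2.split()

            if len(numbers) > 1 and line_1 == numbers[1]:
                # If the second number from this line matches the number that is
                # in the user's file, print all the numbers from this line
                print ' '.join(numbers)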

which is more reliable and, I believe, easier to read.

Note that the algorithmic performance of this is far from ideal because of the nested loops. Depending on your needs, it could be improved, but I don't know enough about the data you need to extract to tell you whether that is possible.

The running time of both my code and yours is currently O(n*m*q), where n is the number of lines in one file, m is the number of lines in the other, and q is the length of the lines in unique_count_c_from_ac.txt. If two of these are fixed or small, you get linear performance. If two can grow arbitrarily (I imagine n and m can?), you could look into improving the algorithm, probably with sets or dicts.
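One way to apply the dict suggestion, sketched under the assumption that the goal is exactly what both versions print: every line of unique_count_c_from_ac.txt whose second field matches a line of the user's file, grouped by that user line. matching_rows is a hypothetical name, and the in-memory lists stand in for the two open files (file objects could be passed instead, since both arguments are only iterated once). The second file is split once into a dict keyed by its second field, so each user line costs one average O(1) lookup instead of a full scan, bringing the work down to roughly O(n + m*q) plus the size of the output.

```python
from collections import defaultdict


def matching_rows(query_lines, table_lines):
    """Yield each row of table_lines whose second field equals a query line,
    in the same order the nested loops would produce them."""
    # One pass over the table: map each second field to the rows that have it
    by_second = defaultdict(list)
    for line in table_lines:
        numbers = line.split()
        if len(numbers) > 1:  # skip blank or one-field lines
            by_second[numbers[1]].append(numbers)

    # One dict lookup per query line instead of a scan of the whole table
    for query in query_lines:
        for numbers in by_second.get(query.strip(), []):
            yield numbers


# Example with in-memory data standing in for the two files
queries = ["342\n", "7\n"]
table = ["351 342\n", "12 7\n", "400 342\n"]
for numbers in matching_rows(queries, table):
    print(' '.join(numbers))
# 351 342
# 400 342
# 12 7
```

Because each dict entry keeps its rows in file order, the output order matches the nested-loop version.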

Mike Graham