Hi,

I have two files. One is a CSV containing the search strings (one per line). The other is a huge file in which each line starts with a search term, followed by extra information that I would like to extract.

The search terms file is called 'search.csv' and looks like this:

3ksr

3ky8

2g5w

2gou

The file containing the other info is called 'CSA.dat' and looks like this:

3ksr,INFO.....

3ky8,INFO.....

2g5w,INFO.....

2gou,INFO.....

However, it is a very big file (over 8 MB), and each search term occurs more than once, with different information for every occurrence. I have some sample code:

import fileinput
import csv

csa = fileinput.input("CSA.dat", inplace=1)
pdb = csv.reader(open("search.csv"))
outfile = csv.writer(open("outfile.csv"), dielect = 'excel', delimiter = '\t')

for id in pdb:
    for line in csa:
        if id in str(line):
            outfile.writerow([id, line])

csa.close()

However, this code doesn't work and seems to delete CSA.dat every time I try to run it (it's backed up in an archive), or it says 'Text file busy'. Please help! Thanks in advance!

+1  A: 

Depending on how many search terms you have, and assuming they're all 4 characters:

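# search.csv has one term per line, so split on whitespace;
# a set makes the membership test below fast
with open('search.csv') as f:
    terms = set(f.read().split())

with open('CSA.dat', 'r') as f:
    for line in f:
        # the search term is the first 4 characters of each line
        if line[:4] in terms:
            # do something with line
            print(line)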

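If they're not all 4 characters, you can use line[:line.find(',')], which returns everything up to the first ','. Note that if no comma is found, find returns -1, so the slice drops the last character of the line (usually the trailing newline) rather than returning the line unchanged.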
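For keys of varying length, here is a minimal sketch that uses str.partition instead of find, which sidesteps the -1 case. The helper name key_of is made up for illustration and is not part of the original code.

```python
def key_of(line):
    """Return the text before the first comma, or the whole line if there is none."""
    # partition always returns a 3-tuple; its first item is everything
    # before the separator, or the full string when no separator exists.
    return line.rstrip("\n").partition(",")[0]


print(key_of("3ksr,INFO.....\n"))
print(key_of("no comma here\n"))
```

You could then test `key_of(line) in terms` in place of `line[:4] in terms`.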

edit: I had never heard of fileinput, but I just looked at it and "you're doing it wrong."

Its documentation describes it as: "Helper class to quickly write a loop over all standard input files."

fileinput is for passing files to your program as command line arguments, which you're not doing. open(filename, mode) is how you open files in Python.

And for something that seems this simple, the csv reader is overkill, though it's probably worth using the csv writer for your output if you really want it in an Excel format.
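Putting the pieces together, a rough sketch might look like the following. It assumes Python 3, that the key is everything before the first comma, and that tab-separated output (as in the question) is what's wanted. The function name extract_matches is hypothetical.

```python
import csv


def extract_matches(search_path, data_path, out_path):
    """Write every line of data_path whose first field appears in search_path."""
    # One search term per line; blank lines are skipped.
    # A set gives fast membership tests for the big file.
    with open(search_path) as f:
        terms = {line.strip() for line in f if line.strip()}

    matches = 0
    # newline="" lets the csv module control line endings itself.
    with open(data_path) as data, open(out_path, "w", newline="") as out:
        writer = csv.writer(out, dialect="excel", delimiter="\t")
        for line in data:
            # Split into the key and everything after the first comma.
            key, _, info = line.rstrip("\n").partition(",")
            if key in terms:
                writer.writerow([key, info])
                matches += 1
    return matches
```

With the files from the question this would be called as extract_matches("search.csv", "CSA.dat", "outfile.csv"). The big file is read line by line, so its size shouldn't matter much, and it is only ever opened for reading.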

Wayne Werner
csv module is handy if there's any quoting involved in either direction...
bstpierre
A: 

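It appears that the deletion of CSA.dat happens because you pass inplace=1 to fileinput.input(). In in-place mode, standard output is redirected into the input file, so any line you don't print back is lost. Since your loop never prints anything, the file ends up empty.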
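To see the effect safely, here is a small sketch that runs on a throwaway temporary file. It assumes Python 3, and the file name demo.dat is made up for the demonstration.

```python
import fileinput
import os
import tempfile

# Throwaway file so that nothing real is touched.
path = os.path.join(tempfile.mkdtemp(), "demo.dat")
with open(path, "w") as f:
    f.write("3ksr,INFO\n3ky8,INFO\n")

# With inplace=True, print() output goes into the file being read.
# Only the lines that are printed back survive.
with fileinput.input(path, inplace=True) as lines:
    for line in lines:
        if line.startswith("3ksr"):
            print(line, end="")

with open(path) as f:
    remaining = f.read()
```

Afterwards, remaining holds only the 3ksr line. The other line is gone because it was never printed.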

ssegvic