
I have a large file called fulldataset, and I would like to write some of its lines to a new file called newdataset: only the lines that contain one of the ID numbers listed in listfile. All the ID numbers start with XY, and they occur in the middle of each line rather than at the start.

Here is an example line from fulldataset:

Robert, Brown, "XY-12344343", 1929232, 324934923, 
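Assuming every ID has the form XY- followed by digits, as in the example line above (an assumption, not something the question states for every line), the ID can be pulled out of a line with a regular expression and looked up in a set. This is a different technique from the substring search used below, shown only as a sketch; the pattern, the helper name extract_id, and the sample IDs are hypothetical.

```python
import re

# Hypothetical pattern: "XY-" followed by one or more digits, as in the example line.
ID_PATTERN = re.compile(r"XY-\d+")


def extract_id(line):
    """Return the first XY-style ID found in the line, or None if there is none."""
    match = ID_PATTERN.search(line)
    return match.group(0) if match else None


example = 'Robert, Brown, "XY-12344343", 1929232, 324934923, \n'
# Hypothetical IDs, as they might be read (and stripped) from listfile.
wanted_ids = {"XY-12344343", "XY-99999999"}

print(extract_id(example))                # XY-12344343
print(extract_id(example) in wanted_ids)  # True
```

The quotation marks around the ID don't matter here, because the pattern doesn't include them. A set lookup also stays fast when listfile holds many IDs, whereas checking every ID against every line grows with both file sizes.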

Here is the program I have so far. It runs fine, but doesn't write anything into the new file.

datafile = open('C:\\listfile.txt', 'r')
completedataset = open('C:\\fulldataset.txt', 'r')
smallerdataset = open('C:\\newdataset.txt', 'w')

matchedLines = []

for line in datafile:
    if line.find("XY"):
        matchedLines.append( line )

counter = 1
for line in completedataset:
    print counter
    counter +=1

    for t in matchedLines:
        if t in line:
            fulldataset.write(line)
            del line
            break

datafile.close()
completedataset.close()
fulldataset.close()
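One likely reason the first program writes nothing: str.find returns the index of the match, or -1 when there is no match. A listfile line that starts with XY gives index 0, which is falsy, so the if skips exactly the lines it should keep, while lines without XY return -1, which is truthy. Separately, lines read from a file keep their trailing newline, so t in line only matches when the ID is at the very end of a line. A small demonstration, assuming a listfile line holds just the ID:

```python
line_from_listfile = "XY-12344343\n"   # lines read from a file keep their "\n"
full_line = 'Robert, Brown, "XY-12344343", 1929232, 324934923, \n'

print(line_from_listfile.find("XY"))        # 0, falsy, so the if skips it
print("no id here".find("XY"))              # -1, truthy, so the if keeps it
print(bool(line_from_listfile.find("XY")))  # False

# The membership operator is the check that was intended:
print("XY" in line_from_listfile)           # True

# The trailing newline also breaks the substring match against the full line:
print(line_from_listfile in full_line)          # False
print(line_from_listfile.strip() in full_line)  # True
```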

EDIT:

OK, here is the new program:

datafile = open('C:\\tryexcel33.txt', 'r')
completedataset = open('C:\\fulldataset.txt', 'r')
smallerdataset = open('C:\\newdataset.txt', 'w')


counter = 1
for line in completedataset:
    print counter
    counter +=1

    if any( id in line for id in datafile ):
        smallerdataset.write( line )
        break

datafile.close()
completedataset.close()
fulldataset.close()

I still don't get anything written to the new file. I think part of the problem might be that in the full file each ID number has a " in front of it, which isn't there in the listfile. Any thoughts?

A:

I don't understand your code. Here's the code to do what you've asked:

# strip( ) drops the trailing newline that each line of datafile keeps
ids = set( entry.strip( ) for entry in datafile )
for line in completedataset:
    if any( id in line for id in ids ):
        smallerdataset.write( line )
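A self-contained Python 3 sketch of the same approach, with two assumptions made explicit: listfile has one ID per line, and surrounding whitespace (including the trailing newline) must be stripped before the IDs are used as substrings. The function names load_ids, matching_lines and filter_by_ids are hypothetical.

```python
def load_ids(id_lines):
    """Build a set of IDs from an iterable of lines, one ID per line (assumed format)."""
    # strip() drops the trailing "\n"; blank lines are skipped so "" can't match every line
    return {entry.strip() for entry in id_lines if entry.strip()}


def matching_lines(data_lines, ids):
    """Yield each line that contains at least one of the IDs as a substring."""
    for line in data_lines:
        # substring search, so surrounding quotes or commas don't matter
        if any(id_ in line for id_ in ids):
            yield line


def filter_by_ids(list_path, full_path, out_path):
    """Copy matching lines from full_path to out_path; return how many were written."""
    with open(list_path) as list_file:
        ids = load_ids(list_file)  # read once into a set, so it can be reused per line
    written = 0
    with open(full_path) as full_file, open(out_path, "w") as out_file:
        for line in matching_lines(full_file, ids):
            out_file.write(line)
            written += 1
    return written
```

With the question's paths, the call would be filter_by_ids("C:\\listfile.txt", "C:\\fulldataset.txt", "C:\\newdataset.txt"). The with statements close all three files even if an error occurs, and there is no break, so every matching line is written.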

EDIT: I did the best I could with incomplete data. The fact that the IDs in the fulldataset are prefixed with XY is irrelevant, since we are searching through the whole string anyway ("foo" in "XY-foo" is still true). If no lines are being written, that's because the lines of datafile are not exactly IDs. Please post a sample from datafile.
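The point that a prefix doesn't affect the in test can be checked directly. The same check also shows one caveat of plain substring matching, worth testing against the real data: a shorter ID also matches inside a longer one. Quoting the search string helps only if every ID in fulldataset is quoted, which is an assumption based on the example line.

```python
full_line = 'Robert, Brown, "XY-12344343", 1929232, 324934923, \n'

# A prefix (XY-, or the quotation mark) does not affect a substring test.
print("foo" in "XY-foo")           # True
print("XY-12344343" in full_line)  # True

# Caveat: a shorter ID also matches inside a longer one.
print("XY-1234" in full_line)      # True, although XY-1234 is a different ID

# Including the quotes in the search string avoids that, if every ID is quoted.
print('"XY-1234"' in full_line)      # False
print('"XY-12344343"' in full_line)  # True
```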

You are also reusing the variable line, which will make your code go wrong in mysterious ways.

You also have a break statement, which will cause at most one line to be written. Why?
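In the edited program the break sits in the loop over the full dataset, so that loop stops right after the first write. A sketch with a hypothetical in-memory list standing in for fulldataset:

```python
full_lines = ['A, "XY-1", 1\n', 'B, "XY-2", 2\n', 'C, "XY-3", 3\n']
ids = {"XY-1", "XY-3"}

written_with_break = []
for line in full_lines:
    if any(id_ in line for id_ in ids):
        written_with_break.append(line)
        break  # leaves the outer loop after the first match

written_without_break = [line for line in full_lines if any(id_ in line for id_ in ids)]

print(len(written_with_break))     # 1
print(len(written_without_break))  # 2
```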


EDIT

Many apologies, I just re-read the code: for some reason I had assumed that datafile was a list. It's actually a file, and a file object can only be iterated over once (after the first pass it yields nothing), so my previous code won't work. Please see the fixed code.
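Why a file can't stand in for a list here: iterating over a file object consumes it. In the question's edited program, the generator inside any() reads through datafile while checking the first line of fulldataset (entirely, if nothing matches), and for every later line it sees nothing. A small demonstration, with io.StringIO standing in for the real file:

```python
import io

# io.StringIO stands in for the open listfile; a real file object behaves the same way.
datafile = io.StringIO("XY-111\nXY-222\n")

first_pass = list(datafile)
second_pass = list(datafile)

print(first_pass)   # ['XY-111\n', 'XY-222\n']
print(second_pass)  # [] because the iterator is exhausted after one pass

# Reading the IDs once into a set makes them reusable for every line.
datafile.seek(0)
ids = {line.strip() for line in datafile}
print(sorted(ids))  # ['XY-111', 'XY-222']
```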

katrielalex
+1 - I was writing the same thing!
laurent-rpnet
It still didn't work. See my edit...
Robert A. Fettikowski
@Robert: See above.
katrielalex
@Robert: Many apologies, I was silly. Please see edit.
katrielalex
@killown: dude, chill -- you've been aggressively downvoting some people's perfectly good answers over the last few days. I have read the Style Guide (in fact, I've read *all* the PEPs, though not in great depth). I don't like the look of whitespace-free parentheses. If you want to follow the style guide, go ahead. I'm sticking with my way.
katrielalex
@katrielalex Following your way is following the wrong way. You're pursuing me in my answers, giving me downvotes that make no sense, and trying to manipulate the facts. If you keep doing this, I will flag and report you. Try to accept that you're wrong; I am just doing the right thing. Posting code that goes against the style guide is a bad idea, since anyone might think it is the right style to follow.
killown
@killown: Do whatever you feel best; you seem to think I am "pursuing" you. The style guide is just that: a __guide__. I only downvote when I feel it's merited.
katrielalex
@katrielalex: And you merited the downvote for not following the style guide. Try to be more polite to people; nobody is being paid to answer questions, yet your criticisms are things like "your code is useless" or "it's a waste", as you said a few minutes ago.
killown