views:

188

answers:

8

I am new to programming and having some problem figuring out nested loops. I have a list of data that I want to extract from a larger file. I am able to extract one item of data from the larger file successfully but I need to extract 100 different trials from this larger file of thousands of trials. Each trial is one line of data of the larger file. This is the program I have used to extract one line of data successfully one at a time. In this example it extracts the data for trial 1. It is based off of examples I have seen in prior questions and tutorials. The problem is that I don't need trials 1-100, or any ordered pattern. I need trials 134, 274, 388, etc. It skips around. So I don't know how to do a nested loop using the for statement if it doesn't have a range that I can enter. Any help is appreciated. Thanks.

completedataset = open('completedataset.txt', 'r')

smallerdataset = open('smallerdataset.txt', 'w')


for line in completedataset:
    if 'trial1' in line: smallerdataset(line)


completedataset.close()
smallerdataset.close()

I'd really like to do it like this:

trials = ('trial12', 'trial23', 'trial34')

for line in completedataset: for trial in trials: if trial in line: smallerdataset(line)

but this isn't working. Can anyone help me modify this program so that it works correctly?

A: 

It seems to me that you need a list that holds all the trial numbers that you are interested in. So maybe you could try something like this:

completedataset = open('completedataset.txt', 'r')
smallerdataset = open('smallerdataset.txt', 'w')

trials = [134, 274, 388]
completedata = completedataset.readlines()

for t in trials:
    for line in completedata:
        if "trial"+str(t) in line:
            smallerdataset.write(line)
completedataset.close()
smallerdataset.close()
inspectorG4dget
A: 

You could just do this:

trials = ['trial1', 'trial134', 'trial274']

for line in completedataset:
    for trial in trials:
        if trial in line: smallerdataset(line)

For more efficient operation you could match each line with 'trial[0-9]+' -regex and look up whether the symbol can be found from a set.

Cheery
So I tried this one first since it looked the simplest and it says unexpected character after line continuation character and highlights the ':'. Suggestions?
Robert A. Fettikowski
A: 

If each trial in the complete set is a known byte size, you can use file.seek(n), where n is the byte to start reading at. For example, if each line in the file is 3 bytes long, you could do something such as:

myfile = open('file.txt', 'r')
myfile.seek(lineToStartAt * 3)

myfile.readline()#etc

If the number of bytes per line is variable or unknown, you would simply have to read in lines and discard the lines you don't care for (as in KLee1's answer)

Brian S
A: 

Assuming you know the trials ahead of time, you can do

trials = ('trial12', 'trial23', 'trial34')

for line in completedataset:
    for trial in trials:
        if trial in line: smallerdataset(line)
MikeD
So I tried this one first since it looked the simplest and it says unexpected character after line continuation character and highlights the ':'. Suggestions?
Robert A. Fettikowski
I'd really like to do it like this....can anyone help?
Robert A. Fettikowski
Which ':' does it highlight? Like Haukur suggests, it sounds like you have a backslash somewhere at the end of a line. Not really sure what I can do to help without actually looking at your code.
MikeD
A: 

You are going to run into some problems with the way you are specifying your trials. If you look for lines containing 'trial1', you will also get lines containing 'trial123'. If you larger dataset is structured in some way you can trying looking for the trial number in a particular field. For instance, if the data is comma delimited you can make use of the csv package. Finally, using a generator expression instead of the loop will make things a little cleaner. Assuming that the trial number was in the first column of your dataset you could do something like:

import csv

trials = ['trial134', 'trial1', 'trial56']
data = csv.reader(open('completedataset.txt'))

with open('smalldataset.txt','w') as outf:
    csv.writer(outf).writerows(l for l in data if l[0] in trials)
macedoine
A: 

Assuming you had a function that, looking at a line, is able to tell you whether that line is "desired", the proper structure for your code would be very simple:

with open('completedataset.txt', 'r') as completedataset:
    with open('smallerdataset.txt', 'w') as smallerdataset:
        for line in completedataset:
            if iwantthisone(line):
                smallerdataset.write(line)

The with statements take care of the closing for you. In Python 2.7, you could merge the two withs into one; in Python 2.5, you need to start your module with a from __future__ import with_statement; in Python 2.6, currently the most widespread version, the above code is the right form.

So, absolutely everything boils down to that iwantthisone function. You don't tell us anything about the format of your lines, making it impossible for us to help you much further. But assume for example that the first word in each line identifies the test, e.g. test432 ..., and you have the numbers of the tests you want in a set named want_these, e.g set([113, 432, 251, ...]). Then, a very simple way to write iwantthisone might be:

def iwantthisone(line):
    firstword = line.split(None, 1)[0]
    testnumber = int(firstword[4:])
    return testnumber in want_these

The proper contents of iwantthisone entirely depend on your lines' format and how do you tell what lines you actually do want to keep, of course. But I hope this general structure still helps.

Note that there are really no nested loops in this general structure I recommend!-)

Alex Martelli
A: 

About the error message you show in in your comments: the line continuation character is a backslash, so it's telling you that you have an errant backslash character somewhere in that line.

Haukur Hreinsson
A: 

Assuming the lines always start with the trial identifier you can use the startswith function and filter to pull out the ones you want.

completedataset = open('completedataset.txt', 'r')
smallerdataset = open('smallerdataset.txt', 'w')

wantedtrials = ('trial134', 'trial274', 'trial388')

for line in completedataset:
    if filter(line.startswith, wantedtrials):
        smallerdataset.write(line)