A sample of the text file I have is:

> 1 -4.6    -4.6    -7.6
> 
> 2 -1.7    -3.8    -3.1
> 
> 3 -1.6    -1.6    -3.1

The data is separated by tabs in the text file, and the first column indicates the position.

I need to iterate through every value in the text file, apart from column 0, and find the lowest value.

Once the lowest value has been found, that value needs to be written to a new text file along with the column name and position. Column 0 has the name "position", column 1 "fifteen", column 2 "sixteen" and column 3 "seventeen".

For example, the lowest value in the above data is "-7.6", and it is in column 3, which has the name "seventeen". Therefore "7.6", "seventeen" and its position value, which in this case is 1, need to be written to the new text file.

I then need a number of rows deleted from the above text file.

E.g. the lowest value above is "-7.6"; it is found at position "1" in column 3, which has the name "seventeen". I therefore need seventeen rows deleted from the text file, starting from and including position 1.

So the column in which the lowest value is found denotes the number of rows that need to be deleted, and the position at which it is found gives the start point of the deletion.

A: 

Open this file for reading, another file for writing, and copy all the lines that don't match the filter:

# somepredicate is a placeholder for whatever test marks a line for removal
with open('somefile', 'r') as readfile, open('otherfile', 'w') as writefile:
    for line in readfile:
        # copy only the lines that should be kept
        if not somepredicate(line):
            writefile.write(line)
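
The answer leaves `somepredicate` undefined. As a rough sketch only, one way it might look for the question's data is a test on the position in the first column. The helper name `make_position_filter` is made up for illustration, and the sketch assumes plain tab-separated lines with no leading "> ":

```python
def make_position_filter(start, count):
    """Build a predicate that is True for lines whose position (first
    tab-separated field) lies in [start, start + count).

    Hypothetical helper, not part of the original answer.
    """
    def somepredicate(line):
        fields = line.split('\t')
        try:
            position = int(fields[0])
        except ValueError:
            # blank or malformed lines are never marked for removal
            return False
        return start <= position < start + count
    return somepredicate


# e.g. drop seventeen rows starting from and including position 1
somepredicate = make_position_filter(1, 17)
```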
Ignacio Vazquez-Abrams
Of course, at this point you may as well just write your program as a standard input filter (read from stdin, write to stdout) and do the appropriate redirections from your shell.
jemfinch
Sure, that's a viable approach (and the one I usually take).
Ignacio Vazquez-Abrams
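
For reference, the stdin/stdout filter mentioned in the comments might be sketched roughly as below. The names `filter_lines` and `main` are invented for illustration, and the predicate used here (drop blank lines) is only a stand-in for the real test:

```python
import sys


def filter_lines(lines, should_drop):
    """Yield only the lines for which should_drop(line) is false."""
    for line in lines:
        if not should_drop(line):
            yield line


def main():
    # Placeholder predicate (an assumption): drop blank lines only.
    for line in filter_lines(sys.stdin, lambda l: l.strip() == ''):
        sys.stdout.write(line)

# Call main() at the bottom of the script when running it from the shell.
```

Saved as, say, `filter.py` with a call to `main()` at the end, this would be run with shell redirection along the lines of `python filter.py < somefile > otherfile`.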
A: 

Here's a stab at what I think you wanted (though your requirements were kind of difficult to follow):

def extract_bio_data(input_path, output_path):
    #open the output file and write its headers
    output_file = open(output_path, 'w')
    output_file.write('\t'.join(('position', 'min_value', 'rows_skipped')) + '\n')

    #map column indexes (after popping the row number) to the number of rows to skip
    col_index = { 0: 15, 
                  1: 16, 
                  2: 17 }

    skip_to_position = 0
    for line in open(input_path, 'r'):
        #remove the '> ' from the beginning of the line and strip newline characters off the end
        line = line[2:].strip()

        #if the line contains no data, skip it
        if line == '':
            continue

        #split the columns on whitespace (change this to split('\t') for splitting only on tabs)
        columns = line.split()

        #extract the row number/position of this data
        position = int(columns.pop(0))

        #this is where we skip rows/positions
        if position < skip_to_position:  
            continue

        #if two columns share the minimum value, this will be the first encountered in the list
        min_index = columns.index(min(columns, key=float))

        #this is an integer version of the 'column name' which corresponds to the number of rows that need to be skipped
        rows_to_skip = col_index[min_index]

        #write data to your new file (row number, minimum value, number of rows skipped)
        output_file.write('\t'.join(str(x) for x in (position, columns[min_index], rows_to_skip)) + '\n')

        #set the number of data rows to skip from this position
        skip_to_position = position + rows_to_skip

    #close the output file so that buffered rows are flushed to disk
    output_file.close()


if __name__ == '__main__':
    in_path = r'c:\temp\test_input.txt'
    out_path = r'c:\temp\test_output.txt'
    extract_bio_data(in_path, out_path)

Things that weren't clear to me:

  1. Is there really "> " at the beginning of each line or is that a copy/paste error?
    • I assumed it wasn't an error.
  2. Did you want "7.6" or "-7.6" written to the new file?
    • I assumed you wanted the original value.
  3. Did you want to skip rows in the file, or positions based on the first column?
    • I assumed you wanted to skip positions.
  4. You say you want to delete data from the original file.
    • I assumed that skipping positions was sufficient.
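
If the question is read literally instead (find the single lowest value in the whole file, then delete rows starting at its position), a rough sketch might look like the following. All function and constant names are invented for illustration, and it assumes the lines have no leading "> " and that "deleting" means producing a new list of rows rather than editing the file in place:

```python
# Column names from the question, in file order after the position column.
COLUMN_NAMES = ('fifteen', 'sixteen', 'seventeen')
# Number of rows to delete for each column name.
ROWS_TO_DELETE = {'fifteen': 15, 'sixteen': 16, 'seventeen': 17}


def parse_rows(lines):
    """Turn whitespace-separated lines into (position, [values]) tuples."""
    rows = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        rows.append((int(fields[0]), [float(v) for v in fields[1:]]))
    return rows


def find_lowest(rows):
    """Return (value, column_name, position) for the lowest value,
    or None if there are no rows. Ties keep the first value seen."""
    best = None
    for position, values in rows:
        for name, value in zip(COLUMN_NAMES, values):
            if best is None or value < best[0]:
                best = (value, name, position)
    return best


def delete_from(rows, start, count):
    """Drop rows whose position lies in [start, start + count)."""
    return [(p, v) for p, v in rows if not (start <= p < start + count)]
```

With the sample data, `find_lowest` would report the value -7.6 in column "seventeen" at position 1, and `delete_from(rows, 1, ROWS_TO_DELETE['seventeen'])` would then remove positions 1 through 17.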
tgray