ansaurus

Question

How to eliminate the last digit from each of the top lines

Answer 1

+4 A:

if line.startswith('>Sequence'):
  line = line.rstrip('\n')[:-2] # drop the newline, then the last 2 characters (the '.' and the digit)

Or, if there could be more than one digit after the period:

if line.startswith('>Sequence'):
  dot_pos = line.rfind('.') # position of the rightmost period (-1 if there is none)
  if dot_pos != -1:
    line = line[:dot_pos] # truncate up to, but not including, the dot
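A minimal Python 3 sketch comparing the two header-trimming approaches above on a few lines. The helper names `trim_two_chars` and `trim_at_last_dot` and the sample lines are hypothetical. Lines read from a file keep their trailing newline, so both helpers strip it first; otherwise `[:-2]` would count the newline as one of the two characters.

```python
def trim_two_chars(line):
    # Fixed-width approach: assumes exactly one digit after the last period
    line = line.rstrip('\n')
    if line.startswith('>Sequence'):
        line = line[:-2]
    return line

def trim_at_last_dot(line):
    # rfind approach: handles any number of digits after the last period
    line = line.rstrip('\n')
    if line.startswith('>Sequence'):
        dot_pos = line.rfind('.')
        if dot_pos != -1:
            line = line[:dot_pos]
    return line

lines = ['>Sequence 1.1.1\n', 'atgcgcgc\n', '>Sequence 1.2.10\n']
print([trim_two_chars(l) for l in lines])
# ['>Sequence 1.1', 'atgcgcgc', '>Sequence 1.2.']
print([trim_at_last_dot(l) for l in lines])
# ['>Sequence 1.1', 'atgcgcgc', '>Sequence 1.2']
```

The two agree while the last number is a single digit; only the `rfind` version copes with `1.2.10`.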

Edit: if the sequence is on the same line as >Sequence

If we know there will always be exactly one digit to remove, we can cut out the period and the digit with:

line = line[:13] + line[15:]

This uses a Python feature called slicing. Indexes are zero-based and the end of the range is exclusive, so line[0:13] gives the first 13 characters of line (indexes 0 to 12). When the slice starts at the beginning the 0 is optional, so line[:13] does the same thing. Similarly, line[15:] gives the substring from index 15 to the end of the string, so the two characters at indexes 13 and 14 (the period and the digit) are skipped.
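A worked example of the slice arithmetic on a made-up header line in the same-line format described in the comments (Python 3; the sample line is hypothetical):

```python
line = '>Sequence 1.1.1 atgcgcgcgatatata'

# '>Sequence 1.1' occupies indexes 0-12, index 13 is the last '.',
# index 14 is the last digit and index 15 is the space before the sequence.
head = line[:13]       # same as line[0:13]; end index 13 is excluded
removed = line[13:15]  # the two characters being cut out
tail = line[15:]       # from index 15 to the end of the string

print(repr(head))      # '>Sequence 1.1'
print(repr(removed))   # '.1'
print(repr(tail))      # ' atgcgcgcgatatata'
print(head + tail)     # >Sequence 1.1 atgcgcgcgatatata
```

This only works while every header has the same width up to the last digit; a header like `>Sequence 1.10.1` would shift all the indexes.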

mikej 2009-08-14 15:57:51

What should I do in this case: >Sequence 1.1.1 atgcgcgcgatatatashhshshsh? Now I only have to remove the final ".1" (or whatever the last number is), without touching any digit to its left or right. Thanks

2009-08-14 16:07:00

Are you saying the atgcgcgcgatatat is on the same line as >Sequence 1.1.1, or is that just the way your comment has been formatted? Please explain a bit more what you mean.

mikej 2009-08-14 16:14:34

Sure. Yes, you interpreted it right: the atgcgcgatga sequence is on the same line. So in this case I have to remove the last digit from the series of digits. One thing is certain: this last digit will always be the 15th character (index 14) of every line starting with '>'.

2009-08-14 16:18:17

@Arshan please include all possible formats in your question

Otto Allmendinger 2009-08-14 16:23:33

Here is the detailed question. The file looks like this:

>Sequence 1.1.1 atatatccchhchcasjssjsjjsjsjsjsj
>Sequence 1.2.2 atatatatatatatassdjdjdjfjfjfjjjgjg
>Sequence 1.2.1 atatatatatatatatatatatatatatatat

Now I have to remove the last digit from every line that starts with '>'. In the first line I have to remove the rightmost '.1', and in the second line the rightmost '.2'.

2009-08-14 16:28:37

Please note that every line starting with '>' is a new line.

2009-08-14 16:29:13

@Arshan you can use the 'edit' link to update and clarify your question. Then you can use all the formatting which is not available in comments.

mikej 2009-08-14 16:33:42

OK, editing done. Please check.

2009-08-14 16:37:17

Thanks, it helped and it's done :)

2009-08-14 17:37:59

Answer 2

+2 A:

Map '.'.join(line.split('.')[:-1]) over each line of the file that starts with '>'.
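A minimal Python 3 sketch of the split-and-join idea, assuming (as in the original question) that each header sits on its own line. Joining with `'.'` rather than `""` keeps the remaining periods, and sequence lines are passed through untouched because splitting a line with no period and dropping the last piece would leave an empty string. The helper name `drop_last_field` and the sample lines are hypothetical.

```python
def drop_last_field(line):
    # Split on '.', drop the last piece (the final digit), and re-join
    # with '.' so the other periods survive.
    if line.startswith('>'):
        return '.'.join(line.split('.')[:-1])
    return line  # sequence lines have no periods; leave them alone

lines = ['>Sequence 1.1.1', 'atatatccch', '>Sequence 1.2.2']
print(list(map(drop_last_field, lines)))
# ['>Sequence 1.1', 'atatatccch', '>Sequence 1.2']
```

`line.rsplit('.', 1)[0]` gives the same result for a header containing at least one period, and returns the line unchanged when there is none. Neither variant suits the same-line format from the edited question, since everything after the last period (including the sequence) would be dropped.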

Steve B. 2009-08-14 15:58:11

Answer 3

+7 A:

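import fileinput
import re

# Files to edit come from the command-line arguments.
# inplace=True sends print() output back into the file being read;
# backup='.bak' keeps a copy of each original file.
for line in fileinput.input(inplace=True, backup='.bak'):
  line = line.rstrip()
  if line.startswith('>'):
    line = re.sub(r'\.\d$', '', line) # drop a trailing period and digit
  print(line)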

Many details can be changed depending on exactly what processing you want, which you have not clearly specified, but this is the general idea.
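The `$` anchor in the pattern above only matches when the digit ends the line. For the format described in the comments, where the sequence follows the header on the same line, one possible variant (my assumption, not part of this answer) removes the first `.digits` group that is followed by whitespace or the end of the line. It is shown on plain strings rather than with `fileinput` so it runs on its own; the helper name `trim_header` is hypothetical.

```python
import re

# A '.' plus digits followed by whitespace or the end of the string.
# The first '.1' in '1.1.1' is skipped because another '.' follows it.
LAST_PART = re.compile(r'\.\d+(?=\s|$)')

def trim_header(line):
    if line.startswith('>'):
        # count=1: replace only the end of the version number
        return LAST_PART.sub('', line, count=1)
    return line

for line in ['>Sequence 1.1.1 atatatccchhchca', '>Sequence 1.2.2', 'atatatat']:
    print(trim_header(line))
# >Sequence 1.1 atatatccchhchca
# >Sequence 1.2
# atatatat
```

Inside the `fileinput` loop, `trim_header(line)` would simply take the place of the `re.sub` call.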

Alex Martelli 2009-08-14 15:58:29

Cool use of fileinput. I'd never heard of this module.

hughdbrown 2009-08-14 17:20:04

Thanks all. It helped

2009-08-14 17:23:13

So Arshan, accept an answer that's helped you most -- that's fundamental StackOverflow etiquette!

Alex Martelli 2009-08-14 18:57:22

@hughdbrown, glad you liked it -- it's a great module especially for "pseudo-inplace" alteration of textfiles.

Alex Martelli 2009-08-14 18:58:13

Answer 4

+4 A:

import re
trimmedtext = re.sub(r'(\d+\.\d+)\.\d', r'\1', text) # \1 puts group 1 back

That should do it. It's somewhat simpler than searching for start characters (and it won't affect your DNA chains).
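A runnable Python 3 sketch of this substitution applied to the sample data from the question at once. Python's `re.sub` refers to a captured group as `\1` (or `\g<1>`) in the replacement string; a `$1` replacement, the convention of some other regex flavours, would be inserted literally.

```python
import re

text = """>Sequence 1.1.1 atatatccchhchcasjssjsjjsjsjsjsj
>Sequence 1.2.2 atatatatatatatassdjdjdjfjfjfjjjgjg
>Sequence 1.2.1 atatatatatatatatatatatatatatatat
"""

# (\d+\.\d+) captures the first two numbers, e.g. "1.1"; the trailing
# \.\d is the part that gets dropped.
trimmed = re.sub(r'(\d+\.\d+)\.\d', r'\1', text)
print(trimmed)
# >Sequence 1.1 atatatccchhchcasjssjsjjsjsjsjsj
# >Sequence 1.2 atatatatatatatassdjdjdjfjfjfjjjgjg
# >Sequence 1.2 atatatatatatatatatatatatatatatat
```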

Oli 2009-08-14 15:59:03

Looks great, but sorry, I couldn't understand your code. I only started Python yesterday, so could you kindly start from opening the file? Thanks.

2009-08-14 16:09:48

@Arshan: Have you read the Python tutorial yet? It should help you understand the basic steps like reading files and iterating through lines to give you the context for using this solution.

Nathan Kitchen 2009-08-14 16:24:44

Oli is using a regular expression to perform a substitution on the text. Patterns such as (\d+\.\d+)\.\d are a general concept and not specific to Python.
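To see what each part of the pattern matches, a short Python 3 sketch using `re.search` on a made-up header line: `\d+` is one or more digits, `\.` is a literal period, and the parentheses capture the part that the substitution keeps.

```python
import re

pattern = re.compile(r'(\d+\.\d+)\.\d')
m = pattern.search('>Sequence 1.2.1 atatatat')

print(m.group(0))  # 1.2.1  (the whole match, which gets replaced)
print(m.group(1))  # 1.2    (the parenthesised group, written back as \1)
```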

mikej 2009-08-14 16:30:46

Yes, I know how to open, read and write a file, but I'm not sure about iterating over the lines in a file.

2009-08-14 16:31:30

You don't *need* to iterate over lines with this. You can if you want to, but you can also just push the whole file through at once with `open('filename').read()`. And yes, my code above is based on the regex module (re) built into Python. Regex is worth learning, as it's very useful for operations like this. It's also great for input validation.
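The whole-file approach might look like the Python 3 sketch below. `trim_file` and the file names are hypothetical; using `with` instead of a bare `open(...).read()` only makes sure each file is closed. Reading everything at once is simple, but the whole file has to fit in memory, which iterating line by line avoids.

```python
import os
import re
import tempfile

def trim_file(in_path, out_path):
    # Read the whole file into one string, substitute, write it back out.
    with open(in_path) as f:
        text = f.read()
    trimmed = re.sub(r'(\d+\.\d+)\.\d', r'\1', text)
    with open(out_path, 'w') as f:
        f.write(trimmed)

# Demo with throwaway files in a temporary directory (deleted afterwards)
with tempfile.TemporaryDirectory() as tmpdir:
    src = os.path.join(tmpdir, 'sequences.txt')
    dst = os.path.join(tmpdir, 'sequences_trimmed.txt')
    with open(src, 'w') as f:
        f.write('>Sequence 1.1.1 atatatccc\n>Sequence 1.2.2 atatatata\n')
    trim_file(src, dst)
    with open(dst) as f:
        result = f.read()

print(result)
```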

Oli 2009-08-14 17:54:15

Answer 5

+1 A:

Here's a short script. Run it like: script [filename to clean]. Lots of error handling omitted.

It operates using generators, so it should work fine on huge files as well.

import sys

def clean_line(line):
    # Strip trailing whitespace; on header lines also drop the final
    # period and digit (assumes exactly one digit).
    line = line.rstrip()
    if line.startswith(">"):
        return line[:-2]
    return line

def clean(lines):
    # Generator: yields one cleaned line at a time, so the whole file
    # never has to be held in memory.
    for line in lines:
        yield clean_line(line)

if __name__ == "__main__":
    filename = sys.argv[1]
    out_name = filename + ".clean"

    print("Cleaning %s; output to %s..." % (filename, out_name))

    # 'with' closes both files whether the loop succeeds or raises
    with open(filename, "r") as infile, open(out_name, "w") as outfile:
        for line in clean(infile):
            # text mode already converts "\n" to the platform line ending
            outfile.write(line + "\n")
            print(": " + line)
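The generator can be tried without a real file by feeding it an `io.StringIO` object, which supports line iteration like an open text file. This is a standalone Python 3 sketch repeating the script's logic; the sample data is made up.

```python
import io

def clean_line(line):
    # Same logic as the script: strip whitespace, and on header lines
    # drop the final period and digit (assumes a single digit).
    line = line.rstrip()
    return line[:-2] if line.startswith('>') else line

def clean(lines):
    # Each line is processed only when the loop asks for the next one
    for line in lines:
        yield clean_line(line)

# io.StringIO behaves like an open text file, so no real file is needed
fake_file = io.StringIO('>Sequence 1.1.1\natatatccc\n>Sequence 1.2.2\n')
for line in clean(fake_file):
    print(line)
# >Sequence 1.1
# atatatccc
# >Sequence 1.2
```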

Skurmedel 2009-08-14 16:13:19
