Hi all. I have a file.

Sequence 1.1.1 ATGCGCGCGATAAGGCGCTA
ATATTATAGCGCGCGCGCGGATATATATATATATATATATT
Sequence 1.2.2 ATATGCGCGCGCGCGCGGCG
ACCCCGCGCGCGCGCGGCGCGATATATATATATATATATATT
Sequence 2.1.1 ATTCGCGCGAGTATAGCGGCG

Now, I would like to remove the last number from each line that starts with '>'. For example, in the first line I would like to remove '.1' (the rightmost one), and in the second instance I would like to remove '.2', and then write the result, along with the rest of the file, to a new file. Thanks,

A: 
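import re

# Open both files in one with block so they are closed (and 'out' is flushed) on exit.
with open('in') as input_file, open('out', 'w') as output_file:
    for line in input_file:
        # Keep the first two dot-separated numbers and drop the third, e.g. 2.10.1 -> 2.10.
        line = re.sub(r'(\d+[.]\d+)[.]\d+', r'\1', line)
        output_file.write(line)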
Roberto Bonvallet
Thanks so much. It worked.
One issue left, please. It works when I have 1.9.1 etc., so the last number is removed, but when I have 2.10.1 it does not work.
Resolved. Thank you.
I fixed it so it works with numbers with more than one digit.
Roberto Bonvallet
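For anyone checking the multi-digit case from the comments, here is a small sketch that applies the same pattern to a few sample strings. The helper name strip_last_field and the sample strings are only illustrative assumptions; the pattern itself is the one from the answer.

```python
import re

# Same pattern as in the answer: two dot-separated numbers are captured,
# and the third number (with its leading dot) is dropped.
TRAILING_FIELD = re.compile(r'(\d+[.]\d+)[.]\d+')


def strip_last_field(line):
    """Hypothetical helper: remove the last number from an 'a.b.c' identifier."""
    return TRAILING_FIELD.sub(r'\1', line)


# Illustrative inputs: single-digit, multi-digit, and a line with no numbers.
for sample in ['Sequence 1.9.1', 'Sequence 2.10.1', 'ATATGCGCGC']:
    print(strip_last_field(sample))
```

Because each \d+ matches one or more digits, 2.10.1 becomes 2.10 just as 1.9.1 becomes 1.9, and lines without an a.b.c number pass through unchanged.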