Hi all. In the file below I have 3 occurrences of '.1'. I want to remove the last one and write the rest of the file to a new file. Please suggest a way to do this in Python. Thank you all.

d1dlwa_ a.1.1.1 (A:) Protozoan/bacterial hemoglobin {Ciliate (Paramecium caudatum) [TaxId: 5885]} slfeqlggqaavqavtaqfyaniqadatvatffngidmpnqtnktaaflcaalggpnawt

+7  A: 

If the file's not too horrendously huge, by far the simplest approach is:

with open('oldfile', 'r') as f:
    data = f.read()

data = data.replace('.1.1.1', '.1.1')

with open('newfile', 'w') as f:
    f.write(data)

If the file IS horrendously huge, you'll need to read it and write it by pieces. For example, if each line ISN'T too horrendously huge:

with open('oldfile', 'r') as inf, open('newfile', 'w') as ouf:
    for line in inf:
        ouf.write(line.replace('.1.1.1', '.1.1'))
Alex Martelli
Quick, to the point.
hughdbrown
Instead of data = data.replace('.1.1.1', '.1.1') you can use a regular expression to remove the last '.1' no matter how many there are: import re; data = re.sub( '\.1\>', '', data )
Steve K
@Steve, sure, you can do all sorts of complicated things if you need them; "do the simplest thing that could possibly work" is the operating principle -- avoiding problems such as your code would give (use r'...', '\>' doesn't work in Python REs [that's word boundary in _vi_ REs; in Python it's r'\b'], AND once finally repaired your code removes ALL THREE occurrences of '.1', not just the last one...;-). Keeping it simple is thus very advisable.
Alex Martelli
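For readers who do want the regular-expression route, here is a minimal sketch of a pattern that removes only a '.1' sitting at the end of a dotted field. It assumes the field is followed by whitespace or the end of a line, as in the sample line from the question; the name drop_trailing_dot_one is made up for illustration. Note that it would also trim any other token ending in '.1' (for example "2.1" becomes "2"), so it suits only data shaped like the sample.

```python
import re

# Assumption: the '.1' to remove is immediately followed by whitespace
# or by the end of a line, as in "a.1.1.1 (A:)".
LAST_DOT_ONE = re.compile(r'\.1(?=\s|$)', re.MULTILINE)

def drop_trailing_dot_one(text):
    """Remove a '.1' only where it ends a whitespace-delimited token."""
    return LAST_DOT_ONE.sub('', text)
```

The lookahead (?=\s|$) is what keeps the earlier '.1' groups intact: they are followed by '.', not by whitespace, so they never match.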
Thanks for the help.
@Arshan, always glad to help, but be sure to accept an answer that has helped you: that's the fundamental bit of SO etiquette!
Alex Martelli
the version that works with big file sizes is very clear and easy, I don't see why it shouldn't be used even if the files are small.
nosklo
@nosklo: speed. Measure both versions on a million files of a few megabytes each.
Alex Martelli
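One way to run that kind of measurement on a small scale, as a rough sketch: the function names, the generated test file, and the default sizes below are all assumptions for illustration, and the timings will depend on the machine and the file sizes, so no result is claimed here.

```python
import os
import tempfile
import timeit

def convert_whole(src, dst):
    # Read the entire file, replace once, write once.
    with open(src) as f:
        data = f.read()
    with open(dst, 'w') as f:
        f.write(data.replace('.1.1.1', '.1.1'))

def convert_by_line(src, dst):
    # Process the file one line at a time.
    with open(src) as inf, open(dst, 'w') as ouf:
        for line in inf:
            ouf.write(line.replace('.1.1.1', '.1.1'))

def compare(n_lines=20000, repeat=5):
    """Time both versions on a generated file; returns seconds per function name."""
    results = {}
    with tempfile.TemporaryDirectory() as tmpdir:
        src = os.path.join(tmpdir, 'oldfile')
        dst = os.path.join(tmpdir, 'newfile')
        # Hypothetical test data modelled on the sample line.
        with open(src, 'w') as f:
            f.write('d1dlwa_ a.1.1.1 (A:) sample\n' * n_lines)
        for func in (convert_whole, convert_by_line):
            results[func.__name__] = timeit.timeit(
                lambda: func(src, dst), number=repeat)
    return results
```

Calling compare() and printing the returned dictionary gives one data point; raising n_lines or repeat makes the comparison closer to the many-files scenario described above.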
A: 

You can do something like this:


line = line.split(" ")
# In the sample line, the a.1.1.1 field is the second item (index 1).
line[1] = line[1][0:line[1].rindex(".")]
print(" ".join(line))

Not the prettiest code, but from my console tests, it works.

Geo
Thanks for the help.
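The split-based idea can also be wrapped in a small function, sketched here under the assumption that the dotted field is the second space-separated item, as in the sample line. The name drop_last_component and the field parameter are hypothetical. Like the rindex version, it drops whatever follows the last dot, not specifically '.1'.

```python
def drop_last_component(line, field=1):
    """Strip the last dot-separated component from one field of a line."""
    parts = line.split(" ")
    # Leave the line untouched if the field is missing or has no dot,
    # instead of raising as rindex() would.
    if field < len(parts) and "." in parts[field]:
        parts[field] = parts[field].rsplit(".", 1)[0]
    return " ".join(parts)
```

For example, drop_last_component("d1dlwa_ a.1.1.1 (A:)") gives "d1dlwa_ a.1.1 (A:)".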
+3  A: 

Works with any size file:

open('newfile', 'w').writelines(line.replace('.1.1.1', '.1.1') 
                                for line in open('oldfile'))
nosklo