ansaurus

Question

Python - appending word position nos. to Unicode text

Answer 1

A:

Try changing:

counter = 1
for line in words:
    # etc...

to:

for line in words:
    counter = 1
    # etc...

This will reset the counter to 1 for each new line.
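As a rough illustration of the difference (the sample `lines` data and the variable names below are made up for this sketch, not taken from the asker's file), resetting inside the outer loop makes the numbering restart on every line:

```python
# Illustrative sketch only: `lines` is made-up sample data.
lines = [u"a b c", u"d e"]

numbered = []
for line in lines:
    counter = 1  # reset here, so every line starts again at 1
    tagged = []
    for word in line.split():
        tagged.append(u"%s(%d)" % (word, counter))
        counter += 1
    numbered.append(u" ".join(tagged))

print(numbered)
```

With `counter = 1` placed before the outer loop instead, the second line would continue at 4 rather than restarting at 1.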

Mark Byers 2010-02-20 11:18:56

Hello Sir..:) Thank you for your answer. I was able to more or less solve the previous query about mapping in Python with the help of my project guide, by placing characters like 'AA' in the dictionary first and then 'A' (I hope you remember we were discussing this). I tried your suggestion, but it is not working: I am still not getting any output in the console or in the file. Sir, how can I also prevent the serial numbers from being given word positions? Please look into that too if you can, as it was an added question.

mgj 2010-02-20 12:43:03

Answer 2

+2 A:

I think your code should look something like:

# the input part is fine as is
lines = text.split('\n')
outlines = []
for line in lines:
    lout = []
    counter = 1
    for i, word in enumerate(line.split()):
        if i == 0:  # leave 1st word of line alone, it's a marker:
            lout.append(word)
            continue
        # process each and every other word
        if word[-1] in separators and len(word) > 1:
            # parenthesize counter + 1: % binds tighter than +
            lout.append(word[:-1] + (u'(%d) ' % counter) +
                        word[-1] + (u'(%d)' % (counter + 1)))
            counter += 1
        else :
            lout.append(word + u'(%d)' % counter)
        counter += 1
    outlines.append(' '.join(lout))

f1 = open(output_file, 'w')
# outlines holds unicode strings (Python 2), so encode before writing
f1.write(u'\n'.join(outlines).encode('UTF-8'))
f1.close()

I can't test this code, so there might be minor issues left, but I think the main principles in it are sound: work on two levels (by line within the file, with \n as separator, and by word within each line, with space as separator), and each time use lists (with append and join) rather than building up strings piece by piece.
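To show just the two-level split/join idea in isolation, here is a minimal sketch. The names `map_words` and `tag` are hypothetical helpers invented for this example, and it deliberately ignores the separator handling above:

```python
# Sketch only: map_words and tag are hypothetical names, not part of the
# asker's code; separator handling is left out on purpose.
def map_words(text, transform):
    """Split text into lines, lines into words, transform, and re-join."""
    out_lines = []
    for line in text.split(u'\n'):
        # inner level: collect the processed words of one line in a list
        out_words = [transform(i, w) for i, w in enumerate(line.split())]
        out_lines.append(u' '.join(out_words))
    # outer level: join the finished lines back together
    return u'\n'.join(out_lines)


def tag(i, word):
    # leave the first word (the line marker) alone, number the rest
    return word if i == 0 else u'%s(%d)' % (word, i)


result = map_words(u"1. foo bar\n2. baz", tag)
print(result)
```

Because each level collects its pieces in a list and joins once, the separators end up only between items, with no trailing space or newline to strip afterwards.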

Alex Martelli 2010-02-20 12:47:12

Hello Sir..:) Thank you for your answer. I was able to more or less solve the previous query about mapping in Python with the help of my project guide, by placing characters like 'AA' in the dictionary first and then 'A' (I hope you remember we were discussing this, with Mark Byers Sir as well). Sir, are you the author of the books Python Cookbook and Python in a Nutshell? Getting your guidance is an honour Sir...:) Thank you....:) God Bless..

mgj 2010-02-20 14:36:11

@mgj, yep I wrote the Cookbook and Nutshell, tx for the kind words. Funny to get kind words and no upvote though;-).

Alex Martelli 2010-02-20 16:04:28

@Alex: Don't feel bad, I upvoted you :-). It's funny you found non-upvoting funny but not the use of "Sir".

Alok 2010-02-20 17:31:33

@Alok, I do call people "Sir" sometimes (though in conversation rather than writing) so why should I find it funny if others do too?-)

Alex Martelli 2010-02-20 18:29:38

Answer 3

+1 A:

This code will give you the desired output. I added a check for the number at the start of the line, which should not be numbered.

I adapted your original code, which was (mostly) working. You just needed to reset the counter at the end of an input line, and add a newline to your output as well.

#!/usr/bin/python
# -*- coding: UTF-8 -*-
# encoding: utf-8

import re

list1 = []
separators = [u'।', ',', '.']
text = open('hinstest1.txt').read().decode('UTF-8')
output_file = ('ophwp1.txt')

for line in text.splitlines():
    counter = 1
    output = ''
    for word in line.split():
        # Special case for the number at the start of the line
        # The regex matches one or more decimal digits (\d+) followed by a dot (\.)
        if re.match(r'\d+\.', word):
            output += word + ' '
            continue
        # Special case: the last char is a separator joined to the word
        if word[-1] in separators and len(word) > 1:
            # word up to the second to last char
            output += word[:-1] + u'(%d) ' % counter
            counter += 1
            # last char
            output += word[-1] + u'(%d) ' % counter
            counter += 1
        else:
            output += word + u'(%d) ' % counter
            counter += 1
    output += u'\n'
    list1.append(output.encode('UTF-8'))

f1=open(output_file,'w')
f1.write(''.join(list1))
f1.close()

I tested this code on the input file you provided and, for the most part, I retained your coding style.
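One detail of the line-number check that may be worth knowing: `re.match` anchors the pattern only at the start of the word, so any token that begins with digits followed by a dot is skipped, not just the leading serial number. A small sketch of that behaviour (the helper name `is_line_marker` and the sample tokens are made up for illustration):

```python
import re

MARKER = re.compile(r'\d+\.')  # same pattern as in the code above


def is_line_marker(word):
    # re.match anchors at the start of the string but not at the end
    return MARKER.match(word) is not None


samples = [u'1.', u'12.', u'3.14', u'word', u'word.']
flags = [is_line_marker(s) for s in samples]
print(list(zip(samples, flags)))
```

If a token such as `3.14` could appear in the middle of a line and should still be numbered, a stricter pattern like `r'\d+\.$'` is one possible adjustment; whether that matters depends on the input text.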

Danilo Piazzalunga 2010-02-20 12:55:22

Thank you very much for your time...:) The code works perfectly well..:) Thanks again Danilo..:)

mgj 2010-02-20 14:15:20
