Hi :) I can't figure out what the error in this program is. Could you please help me out with it? Thank you :)

The input file contains the following:

3.  भारत का इतिहास काफी समृद्ध एवं विस्तृत है।
57. जैसे आज के झारखंड प्रदेश से, उन दिनों, बहुत से लोग चाय बागानों में मजदूरी करने के उद्देश्य से असम आए।

(These are basically sample sentences; in the output, I need each Hindi word to have its word position appended to it.)

For example, the output for the first sentence would look like this:

3.  भारत(1) का(2) इतिहास(3) काफी(4) समृद्ध(5) एवं(6) विस्तृत(7) है(8) ।(9)

I should get similar output for the following sentence(s).

The code looks like this:

#!/usr/bin/python
# -*- coding: UTF-8 -*-
# encoding: utf-8
separators = [u'।', ',', '.']
text = open("hinstest1.txt").read()
#This converts the encoded text to an internal unicode object, where
# all characters are properly recognized as an entity:
text = text.decode("UTF-8")
#this breaks the text on the white spaces, yielding a list of words:
words = text.split()

counter = 1

output = ""
#if the last char is a separator, and is joined to the word:
for word in words:
    if word[-1] in separators and len(word) > 1:
        #word up to the second to last char:
        output += word[:-1] + u'(%d) ' % counter
        counter += 1
        #last char
        output += word[-1] +  u'(%d) ' % counter
    else:
        output += word + u'(%d) ' % counter
        counter += 1

    print output

The error I am getting is:

  File "pyth_hinwp.py", line 22
    output += word[-1] +  u'(%d) ' % counter
                         ^
SyntaxError: invalid syntax

I know this question is similar to one I asked earlier, but since I could not successfully run some of the answers given to me there, I am restructuring the question around the point where I am currently stuck.

A: 

If you have a syntax error, your editor may be showing it even before you run the code. In any case, why don't you try deleting the character where the error is indicated? I am not able to reproduce the problem after copying your code.

Anurag Uniyal
+3  A: 

What is posted here does not have the error. Note that what is posted has TWO space characters between the + and the u in `output += word[-1] +  u'(%d) ' % counter`. What is probably happening is that you have a whitespace character other than a space in there. A possibility is NBSP (U+00A0) aka "no-break space". What SO does to format your code is likely to scrub away such things.

Diagnosis: At the Python interactive prompt, type

open("pyth_hinwp.py").readlines()[22-1]

What do you see between the + and the u?

Fix: in your editor, delete both characters between the + and the u. Insert a single space.

By the way, with a syntax error, the problem is entirely within the named SOURCE file; the code has not been run (because it couldn't be compiled) and so what is in your INPUT file has no bearing on the problem.
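
If you would rather scan the whole source for stray characters than inspect one line by eye, something along these lines might help. It is only a rough sketch: `find_odd_whitespace` is a made-up helper name, the choice of Unicode categories to flag is an assumption, and the `sample` text just imitates two lines of the posted script with an NBSP put back in.

```python
# -*- coding: utf-8 -*-
import unicodedata

# Whitespace that is normal in Python source; everything else is suspect.
ORDINARY_WHITESPACE = u" \t\r\n"
# Assumption: space separators, line/paragraph separators and invisible
# "format" characters (e.g. zero-width space) are the ones worth flagging.
SUSPECT_CATEGORIES = ("Zs", "Zl", "Zp", "Cf")


def find_odd_whitespace(text):
    """Return (line, column, name) for each unusual invisible character.

    `text` must already be decoded to unicode. Lines and columns are 1-based.
    """
    hits = []
    # Split on "\n" only, so that exotic line separators are reported
    # instead of being silently treated as line breaks.
    for line_no, line in enumerate(text.split(u"\n"), 1):
        for col, ch in enumerate(line, 1):
            if ch in ORDINARY_WHITESPACE:
                continue
            if ch.isspace() or unicodedata.category(ch) in SUSPECT_CATEGORIES:
                # Fall back to the code point when the character has no name.
                name = unicodedata.name(ch, u"U+%04X" % ord(ch))
                hits.append((line_no, col, name))
    return hits


# Hypothetical sample: the second line has U+00A0 between the + and the u.
sample = (u"separators = [u'\u0964', ',', '.']\n"
          u"output += word[-1] +\u00a0u'(%d) ' % counter\n")

for line_no, col, name in find_odd_whitespace(sample):
    print("line %d, column %d: %s" % (line_no, col, name))
```

On the sample it reports a NO-BREAK SPACE on line 2 and leaves the Devanagari danda alone, since that is punctuation rather than whitespace. To check a real file, you could pass in something like `io.open("pyth_hinwp.py", encoding="utf-8").read()`.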

John Machin
Thank you for your response :) I tried running what you said at the interactive prompt. This is what I got: `"\t\toutput += word[-1] +\xc2\xa0u'(%d) ' % counter\r\n"`. What do you think I can do to rectify this error?
mgj
`'\xc2\xa0'` is, as I guessed, an NBSP (U+00A0) encoded in UTF-8. Fix == rectify. Generalising what I wrote in my answer: use an editor to delete whatever is between the + and the u, and then insert a single space.
John Machin
Also, do not use any "word processing" editor of any kind to produce Python code ever. You must use the barest, simplest text-only editor. Spacing matters, and invisible characters (like a non-breaking space) are impossible to diagnose. Use `idle` or `komodo edit` or `BBEdit` or some programming tool. Do not use a word processor.
S.Lott