Hi all

First, a short summary:

Python version: 3.1; system: Linux (Ubuntu)

I am trying to do some data retrieval through Python and BeautifulSoup.

Unfortunately, some of the tables I am trying to process contain cells with the following text:

789.82 ± 10.28

For this to work I need two things:

1. How do I handle "weird" symbols such as ±?
2. How do I remove the ± and everything to the right of it from the string?

Currently I get an error like: SyntaxError: Non-ASCII character '\xc2' in file ......

Thank you for your help

[edit]:

# data retrieval from HTML files from DETHERM
# -*- coding: utf8 -*-

import sys,os,re
from BeautifulSoup import BeautifulSoup


sys.path.insert(0, os.getcwd())

raw_data = open('download.php.html','r')
soup = BeautifulSoup(raw_data)


for numdiv in soup.findAll('div', {"id" : "sec"}):
    currenttable = numdiv.find('table',{"class" : "data"})
    if currenttable:
        numrow=0
        for row in currenttable.findAll('td', {"class" : "dataHead"}):
            numrow=numrow+1

        for col in currenttable.findAll('td'):
            col2 = ''.join(col.findAll(text=True))
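            # Test membership with 'in': index() raises ValueError when the sign
            # is absent and returns 0 (falsy) when it is the first character.
            if '±' in col2:
                col2 = col2[:col2.index('±')]
            print(col2)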
        print(numrow)
        ref=numdiv.find('a')
        niceref=''.join(ref.findAll(text=True))
        print(niceref)

Now this code fails with the following error:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 0: ordinal not in range(128)

Where does the reference to ASCII come from?
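For context on the ASCII reference: 'ascii' is the codec used to turn raw bytes into text. In UTF-8 the ± sign is stored as the two bytes 0xC2 0xB1, and ASCII only covers bytes 0 to 127, so decoding those bytes as ASCII fails on the very first byte, which matches "byte 0xc2 in position 0". The sketch below reproduces the same message under Python 3. In Python 2 this decode happens implicitly whenever a byte string such as '±' is combined with a unicode string such as the text BeautifulSoup returns; that this is what the script above runs into is an assumption, not something confirmed in the thread.

```python
# U+00B1 PLUS-MINUS SIGN, written as an escape so this file stays ASCII-only.
plus_minus = '\u00b1'

# In UTF-8 the sign is stored as two bytes; the first one is 0xC2.
raw = plus_minus.encode('utf-8')
print(raw)  # b'\xc2\xb1'

try:
    # ASCII only defines bytes 0-127, so 0xC2 (194) cannot be decoded.
    raw.decode('ascii')
except UnicodeDecodeError as exc:
    print(exc)  # 'ascii' codec can't decode byte 0xc2 in position 0: ordinal not in range(128)

# Decoding with the correct codec recovers the original character.
print(raw.decode('utf-8') == plus_minus)  # True
```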

A: 

You need to save your Python source file encoded as UTF-8. Beyond that, it's quite trivial:

>>> s = '789.82 ± 10.28'
>>> s[:s.index('±')]
'789.82 '
>>> s.partition('±')
('789.82 ', '±', ' 10.28')
SilentGhost
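Note that `s.index('±')` raises `ValueError` for any cell that has no ± sign, such as header cells or plain numbers, while `partition` never raises. A hedged sketch of two helpers built on the `partition` approach above, one that strips the uncertainty and one that also converts both parts to numbers. The names `strip_uncertainty` and `parse_measurement`, and returning `None` when there is no uncertainty, are illustrative choices, not anything from the thread; ± is spelled as the escape `'\u00b1'` so the source file stays ASCII-only.

```python
# U+00B1 PLUS-MINUS SIGN, escaped so the source file stays ASCII-only.
PLUS_MINUS = '\u00b1'


def strip_uncertainty(text):
    """Return text with the plus-minus sign and everything after it removed."""
    # partition() never raises: if the separator is missing, head is the
    # whole string and the other two parts are empty strings.
    head, _, _ = text.partition(PLUS_MINUS)
    return head.strip()


def parse_measurement(text):
    """Split '789.82 +/- 10.28' (with a real plus-minus sign) into (789.82, 10.28).

    Returns (value, None) when the text has no uncertainty part.
    Raises ValueError if the value is not a number (e.g. a header cell).
    """
    value, sep, uncertainty = text.partition(PLUS_MINUS)
    # float() ignores the surrounding spaces left over from the split.
    return float(value), (float(uncertainty) if sep else None)


print(strip_uncertainty('789.82 \u00b1 10.28'))  # 789.82
print(strip_uncertainty('Temperature'))           # Temperature
print(parse_measurement('789.82 \u00b1 10.28'))  # (789.82, 10.28)
print(parse_measurement('42.0'))                 # (42.0, None)
```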
I thought IDLE kept track of that.
Daniel
@Daniel: of what?
SilentGhost
The encoding of the Python source file. I have included the code together with the new error I received.
Daniel
@Daniel: what does it have to do with IDLE? Are you using it to run your script?
SilentGhost
No, the script is run from a GNOME terminal. I am just looking for the correct place to change the file's encoding.
Daniel
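Two separate encodings are involved here. The first is the encoding of the script's own source, declared with a `# -*- coding: utf-8 -*-` comment that must sit on the first or second line of the file (Python 3 already assumes UTF-8 source by default; the "Non-ASCII character" SyntaxError quoted in the question is the wording Python 2 uses, which suggests, though the thread does not confirm it, that the script may actually be running under Python 2). The second is the encoding of the data being read. In Python 3, `open()` takes an `encoding` argument, so the HTML can be decoded explicitly instead of relying on the locale default. A minimal sketch, assuming Python 3 and a UTF-8 page (the page's own charset declaration is the thing to check); the file name comes from the question, while the file contents written here are made up so the example runs on its own:

```python
import os
import tempfile

# U+00B1 PLUS-MINUS SIGN, escaped so the source file stays ASCII-only.
PLUS_MINUS = '\u00b1'

# Create a small stand-in for 'download.php.html' (hypothetical contents).
path = os.path.join(tempfile.mkdtemp(), 'download.php.html')
with open(path, 'w', encoding='utf-8') as f:
    f.write('<td>789.82 ' + PLUS_MINUS + ' 10.28</td>')

# Decode the file's bytes as UTF-8 explicitly rather than using the locale default.
with open(path, 'r', encoding='utf-8') as f:
    html = f.read()

print(PLUS_MINUS in html)  # True
```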