Hi,
I'm using the Python Shell in this way:
>>> s = 'Ã'
>>> s
'\xc3'
How can I print the variable s so that it shows the character Ã? This is the first and easiest question. What I'm really doing is fetching the content of a web page that has non-ASCII characters like the one above, and others with accent marks like á, é, í, ñ, etc. I'm also trying to run a regex that has these characters in its pattern against the content of the web page.
How can I solve this problem?
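To make the first question concrete, here is a minimal sketch in Python 3 syntax. It assumes the byte 0xC3 shown by the shell is Latin-1 (ISO-8859-1) encoded, which I have not confirmed; the names `raw` and `text` are only for illustration.

```python
# The single byte that the shell displayed as '\xc3'.
raw = b'\xc3'

# Assumption: the byte is Latin-1 encoded, where 0xC3 maps to U+00C3.
text = raw.decode('latin-1')

# repr() shows the escaped form, which is what the shell echoes back.
print(repr(raw))

# print() writes the character itself, provided the terminal can display it.
print(text)
```

The shell echo uses the escaped representation, while print writes the decoded character, so the two outputs differ even though they describe the same data.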
This is an example of one regex:
u'<td[^>]*>\s*Definición\s*</td><td class="value"[^>]*>\s*(?P<data>[\w ,-:\.\(\)]+)\s*</td>'
If I use the Expresso application, it works fine.
EDIT[05/26/2009 16:38]: Sorry about my explanation. I'll try to explain better.
I have to get some text from a page. I have the URL of that page and I have the regex to get that text. The first thing I thought was that the regex was wrong. I checked it with Expresso and it works fine: I got the text I wanted. So the second thing I did was print the content of the page, and that was when I saw that the content was not what I see in the source code of the web page. The differences are the non-ASCII characters like á, é, í, etc. Now I don't know what I have to do, or whether the problem is in the encoding of the page content or in the pattern text of the regex. One of the regexes I've defined is the one above.
The question would be: is there any problem with using a regex whose pattern text has non-ASCII characters?
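The steps I describe can be sketched roughly as follows, in Python 3 syntax. This is only an illustration under assumptions: `raw_page` is a made-up stand-in for the bytes the server sends, the ISO-8859-1 encoding is a guess, and `extract_definition` is a hypothetical helper name, not something from my real code.

```python
import re

# Hypothetical stand-in for the downloaded page bytes. The real page and
# its encoding (assumed here to be ISO-8859-1) are not shown in this post.
raw_page = (
    '<td>Definición</td><td class="value">Explicación breve</td>'
).encode('iso-8859-1')

# The regex from this post, compiled as a text (not bytes) pattern.
PATTERN = re.compile(
    r'<td[^>]*>\s*Definición\s*</td><td class="value"[^>]*>'
    r'\s*(?P<data>[\w ,-:\.\(\)]+)\s*</td>',
    re.UNICODE,
)

def extract_definition(raw_bytes, encoding='iso-8859-1'):
    """Decode the page bytes to text, then search the text with the regex."""
    text = raw_bytes.decode(encoding)
    match = PATTERN.search(text)
    return match.group('data') if match else None

result = extract_definition(raw_page)
print(result)
```

If the bytes are decoded with an encoding that differs from the one the page actually uses, the accented characters in the text no longer equal the ones in the pattern, and the search finds nothing.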