The following matches in IDLE, but does not match when run from a method in a module file:

import re
re.search('\\bשלום\\b','שלום עולם',re.UNICODE)

while the following matches in both cases:

import re
re.search('שלום','שלום עולם',re.UNICODE)

(Note that Stack Overflow displays the first and second arguments in the line above in swapped order, because Hebrew is a right-to-left language.)

How can I make the first snippet match inside a .py file?

Update: What I should have written about the first snippet is that it matches in IDLE, but does not match when run in the Eclipse console with PyDev.

+2  A: 

It seems to work for me when I use unicode strings:

# -*- coding: utf-8 -*-

import re
match = re.search(u'\\bשלום\\b', u'שלום עולם', re.U)

See it in action: http://codepad.org/xWz5cZj5
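
To illustrate why the `u` prefix matters, here is a minimal sketch written for Python 3, where text and bytes are separate types. The bytes half only approximates what a plain `'...'` literal holds in a UTF-8 encoded Python 2 source file; that mapping is an assumption, and the variable names are just for illustration.

```python
# -*- coding: utf-8 -*-
import re

text = 'שלום עולם'
word = 'שלום'

# Text (unicode) pattern: Hebrew letters count as word characters,
# so \b finds a boundary on each side of the word.
text_match = re.search(r'\b' + word + r'\b', text)

# Bytes pattern: only ASCII letters, digits and underscore count as
# word characters, so there is no \b boundary around the UTF-8 bytes.
bytes_match = re.search(rb'\b' + word.encode('utf-8') + rb'\b',
                        text.encode('utf-8'))

print(text_match is not None)
print(bytes_match is None)
```

Without the `\b` anchors both searches find the word, which fits the observation in the question that the second snippet matches everywhere.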

Kobi
Is the `# coding=utf-8` notation the same as `# -*- coding: utf-8 -*-`? I'm asking because this is the first time I've seen it written like this. If not, please correct it.
ΤΖΩΤΖΙΟΥ
@ΤΖΩΤΖΙΟΥ - sorry to disappoint you, but I don't know. `:|` I don't know any Python, in fact, and learned every bit from Google and the documentation. I did that odd thing because I want to learn Python (one day), and I know Hebrew.
Kobi
No disappointment here, don't worry; you did fine for someone not knowing Python :) It was possible that there was an alternative notation that I didn't know. I corrected it for you.
ΤΖΩΤΖΙΟΥ
@ΤΖΩΤΖΙΟΥ - No problem. The code I posted doesn't work without it, but I guess codepad.org isn't an accurate representation. Thanks!
Kobi
@ΤΖΩΤΖΙΟΥ: See http://docs.python.org/reference/lexical_analysis.html#encoding-declarations
John Machin
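
As a rough check of the two notations discussed in the comments, the sketch below applies the regular expression quoted in the linked language reference, `coding[=:]\s*([-\w.]+)`, to both comment forms. It only tests the comment text itself; the interpreter's full rules (for example, that the declaration must sit on the first or second line) are not modelled, and `CODING_RE` is a name chosen here for illustration.

```python
import re

# Expression quoted in the language reference for encoding declarations.
CODING_RE = re.compile(r'coding[=:]\s*([-\w.]+)')

lines = ['# coding=utf-8', '# -*- coding: utf-8 -*-']

# Extract the declared encoding name from each comment form.
declared = [CODING_RE.search(line).group(1) for line in lines]

for line, name in zip(lines, declared):
    print(line, '->', name)
```

Both forms yield the same encoding name, so under that expression they are equivalent.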