ansaurus

Question

Python: How can I replace full-width characters with half-width characters?

Answer 1

+3 A:

I don't think there's a built-in function to do multiple replacements in one pass, so you'll have to do it yourself.

One way to do it:

>>> src = (u'１',u'２',u'３',u'４',u'５',u'６',u'７',u'８',u'９',u'０')
>>> dst = ('1','2','3','4','5','6','7','8','9','0')
>>> string = u'a１２３'
>>> for i, j in zip(src, dst):
...     string = string.replace(i, j)
... 
>>> string
u'a123'

Or using a dictionary:

>>> trans = {u'１': '1', u'２': '2', u'３': '3', u'４': '4', u'５': '5', u'６': '6', u'７': '7', u'８': '8', u'９': '9', u'０': '0'}
>>> string = u'a１２３'
>>> for i, j in trans.iteritems():
...     string = string.replace(i, j)
...     
>>> string
u'a123'

Or finally, using regex (and this might actually be the fastest):

>>> import re
>>> trans = {u'１': '1', u'２': '2', u'３': '3', u'４': '4', u'５': '5', u'６': '6', u'７': '7', u'８': '8', u'９': '9', u'０': '0'}
>>> lookup = re.compile(u'|'.join(trans.keys()), re.UNICODE)
>>> string = u'a１２３'
>>> lookup.sub(lambda x: trans[x.group()], string)
u'a123'

Max Shawabkeh 2010-03-11 02:49:46

Answer 2

+3 A:

Using the unicode.translate method:

>>> table = dict(zip(map(ord,u'０１２３４５６７８９'),map(ord,u'0123456789')))
>>> print u'１２３'.translate(table)
123

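translate() needs a mapping from code points (integers) to code points, not from characters to characters, hence the ord() calls. Writing the strings as u'...' literals also keeps them as Unicode text rather than encoded bytes.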
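
For readers on Python 3, here is a minimal sketch of the same idea, assuming Python 3, where every str is Unicode and str.maketrans builds the code-point mapping. The names DIGIT_TABLE and digits_to_halfwidth are illustrative and do not come from the answer above.

```python
# Python 3 sketch (assumption: the answer above targets Python 2).
# str.maketrans turns two equal-length strings into an {int: int} table,
# which is the same shape as the dict(zip(map(ord, ...))) table above.
DIGIT_TABLE = str.maketrans('０１２３４５６７８９', '0123456789')

def digits_to_halfwidth(text):
    """Replace full-width digits with ASCII digits; leave the rest alone."""
    return text.translate(DIGIT_TABLE)

print(digits_to_halfwidth('a１２３'))  # a123
```

The table is built once and can be reused for every string, so the replacement happens in a single pass over the text.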

jleedev 2010-03-11 03:00:20

Nice! I didn't know `unicode` had a `translate()` method different from pure `str`, though in retrospect it makes perfect sense.

Max Shawabkeh 2010-03-11 03:14:42

Answer 3

+4 A:

The built-in unicodedata module can do it:

>>> import unicodedata
>>> foo = u'１２３４５６７８９０'
>>> unicodedata.normalize('NFKC', foo)
u'1234567890'

Note that it also normalizes all sorts of other things at the same time, like separate accent marks and Roman numeral symbols.
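
To make those side effects concrete, here is a small sketch, assuming Python 3. The helper name nfkc is illustrative. It may also be worth knowing that NFKC maps half-width katakana to the full-width form, which is the opposite direction from the digits.

```python
import unicodedata

def nfkc(text):
    """Apply NFKC compatibility normalization to text."""
    return unicodedata.normalize('NFKC', text)

# Full-width digits become ASCII digits, as in the answer above.
print(nfkc('\uff11\uff12\uff13'))   # 123
# The single Roman numeral symbol U+2163 becomes the two letters "IV".
print(nfkc('\u2163'))               # IV
# "e" followed by a combining acute accent is composed into U+00E9.
print(ascii(nfkc('e\u0301')))       # '\xe9'
# Half-width katakana U+FF71 is widened to U+30A2.
print(ascii(nfkc('\uff71')))        # '\u30a2'
```

If only the digits (or only the full-width ASCII block) should change, an explicit translation table is the narrower tool.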

Daniel Newby 2010-03-11 03:03:34

Answer 4

A:

Regex approach

>>> import re
>>> re.sub(u"[\uff10-\uff19]",lambda x:chr(ord(x.group(0))-0xfee0),u"４５６")
u'456'
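
The 0xfee0 offset works because the full-width forms U+FF01 to U+FF5E are laid out in the same order as ASCII U+0021 to U+007E. A rough sketch, assuming Python 3, that widens the character class from digits to that whole block might look like this. FULLWIDTH_ASCII and to_halfwidth are illustrative names, and mapping the ideographic space U+3000 to a plain space is an extra assumption, not part of the answer above.

```python
import re

# Matches the full-width ASCII variants plus the ideographic space.
FULLWIDTH_ASCII = re.compile('[\uff01-\uff5e\u3000]')

def _narrow(match):
    """Return the half-width counterpart of one matched character."""
    ch = match.group(0)
    if ch == '\u3000':
        return ' '  # assumption: treat ideographic space as ASCII space
    return chr(ord(ch) - 0xFEE0)

def to_halfwidth(text):
    """Replace full-width ASCII variants in text with plain ASCII."""
    return FULLWIDTH_ASCII.sub(_narrow, text)

print(to_halfwidth('４５６'))            # 456
print(to_halfwidth('Ｐｙｔｈｏｎ！'))    # Python!
```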

S.Mark 2010-03-11 03:14:45
