ansaurus

Question

How do I convert unicode characters to floats in Python?

Answer 1

+1 A:

Since there are only a fixed number of fractions defined in Unicode, a dictionary seems appropriate:

Fractions = {
    u'¼': 0.25,
    u'½': 0.5,
    u'¾': 0.75,
    u'⅕': 0.2,
    # add any other fractions here
}
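
As a rough sketch of how such a table might be used (written for Python 3, where the u prefix is optional; the names FRACTIONS and fraction_to_float are made up for illustration and do not come from the answer):

```python
# Hypothetical lookup table mirroring the dictionary above.
FRACTIONS = {
    '¼': 0.25,
    '½': 0.5,
    '¾': 0.75,
    '⅕': 0.2,
}


def fraction_to_float(char):
    """Return the float for a known fraction character, or None if unknown."""
    return FRACTIONS.get(char)


print(fraction_to_float('½'))  # 0.5
print(fraction_to_float('x'))  # None
```

The obvious drawback is that every character has to be listed by hand.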

Update: the unicodedata module is a much better solution.

Greg Hewgill 2009-08-12 01:27:59

Specifically, you're looking at characters U+00BC through U+00BE (http://www.unicode.org/charts/PDF/U0080.pdf) and U+2153 through U+215E (http://www.unicode.org/charts/PDF/U2150.pdf). Just search the index (http://www.unicode.org/Public/UNIDATA/Index.txt) for "vulgar".

Ben Blank 2009-08-12 01:32:25

Answer 2

+15 A:

You want to use the unicodedata module:

import unicodedata
unicodedata.numeric(u'⅕')

Entered at the interactive prompt, this displays:

0.20000000000000001

If the character has no numeric value, unicodedata.numeric(unichr[, default]) returns default; if default is not given, it raises ValueError.
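
A small sketch of both behaviours, assuming Python 3 (the variable names are illustrative only):

```python
import unicodedata

# A character with a numeric value.
fifth = unicodedata.numeric('⅕')
print(fifth)  # 0.2

# No numeric value, but a default is supplied: the default comes back.
fallback = unicodedata.numeric('x', None)
print(fallback)  # None

# No numeric value and no default: ValueError is raised.
try:
    unicodedata.numeric('x')
    raised = False
except ValueError:
    raised = True
print(raised)  # True
```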

Karl Voigtland 2009-08-12 01:28:06

Hey, that's pretty cool!

Greg Hewgill 2009-08-12 01:31:48

Python should get a new slogan by borrowing from Apple: "There's a module for that".

John Fouhy 2009-08-12 01:36:03

Yup, batteries included.

Karl Voigtland 2009-08-12 01:37:43

I didn't realize until I read the docs that ftp.unicode.org has a UnicodeData.txt file, which is where the unicodedata module gets all its data.

Karl Voigtland 2009-08-12 01:40:54

I had no idea that you could do that!

mhawke 2009-08-12 01:53:21

Neither did I - that's truly amazing

Martin Beckett 2009-08-12 02:11:01

For the morbidly curious, it seems the Python implementation of numeric is basically just a big lookup table; see python/trunk/Objects/unicodectype.c. Also, there are obviously a lot more Unicode characters with numeric values than just the standard fractions ... check out http://www.fileformat.info/info/unicode/char/0f2e/index.htm for example!
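
To illustrate the point about non-fraction characters, here is a quick sketch in Python 3. U+0F2E is the character from the link above; U+2167 (ROMAN NUMERAL EIGHT) is an extra example chosen here, not one mentioned in the thread:

```python
import unicodedata

# Map each character's Unicode name to its numeric value.
values = {}
for char in ('\u0f2e', '\u2167'):
    values[unicodedata.name(char)] = unicodedata.numeric(char)

for name, value in values.items():
    print(name, value)
# TIBETAN DIGIT HALF FIVE 4.5
# ROMAN NUMERAL EIGHT 8.0
```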

akent 2009-08-12 02:15:43

Answer 3

+1 A:

Maybe you could decompose the fraction using the "unicodedata" module and then look for the FRACTION SLASH character and then it's just a matter of simple division.

For example:

>>> import unicodedata
>>> unicodedata.lookup('VULGAR FRACTION ONE QUARTER')
u'\xbc'
>>> unicodedata.decomposition(unicodedata.lookup('VULGAR FRACTION ONE QUARTER'))
'<fraction> 0031 2044 0034'
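
The division step could look roughly like this in Python 3. This is only a sketch of the idea; fraction_from_decomposition is a made-up name, and it assumes the decomposition has the form numerator, FRACTION SLASH (U+2044), denominator:

```python
import unicodedata

FRACTION_SLASH = '\u2044'


def fraction_from_decomposition(char):
    """Compute a float from a '<fraction>' compatibility decomposition."""
    parts = unicodedata.decomposition(char).split()
    if not parts or parts[0] != '<fraction>':
        raise ValueError('%r has no <fraction> decomposition' % char)
    # The remaining fields are hex code points, e.g. 0031 2044 0034 for '¼'.
    text = ''.join(chr(int(code, 16)) for code in parts[1:])
    numerator, _, denominator = text.partition(FRACTION_SLASH)
    return int(numerator) / int(denominator)


print(fraction_from_decomposition('¼'))  # 0.25
```

Characters whose decomposition lacks a denominator (U+215F FRACTION NUMERATOR ONE, for instance) make int() raise ValueError here, so the sketch only covers complete fractions.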

Update: I'll leave this answer here for reference but using unicodedata.numeric() as per Karl's answer is a much better idea.

akent 2009-08-12 01:31:32

Answer 4

+1 A:

In Python 3.1, you don't need the 'u' prefix, and it will produce 0.2 instead of 0.20000000000000001.

>>> unicodedata.numeric('⅕')
0.2

Selinap 2009-08-12 12:30:40

assert (0.2 == 0.20000000000000001) ... What you possibly meant to say is that the float produced by unicodedata.numeric() has NOT changed, but repr() has been enhanced to produce a less frightening but still computationally equivalent answer where possible.
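
A short check of that claim, assuming Python 3.1 or later (where repr() gives the shortest string that round-trips):

```python
import unicodedata

value = unicodedata.numeric('⅕')

# Both literals denote the same float, so the comparison holds.
print(value == 0.2 == 0.20000000000000001)  # True

# repr() picks the shortest representation that round-trips.
print(repr(value))  # 0.2

# Asking for 17 significant digits shows the older-style output.
print('%.17g' % value)  # 0.20000000000000001
```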

John Machin 2009-08-12 14:54:09
