I am parsing a webpage which has Unicode representations of fractions. I would like to be able to take those strings directly and convert them to floats. For example:
"⅕" would become 0.2
Any suggestions of how to do this in Python?
Since there are only a fixed number of fractions defined in Unicode, a dictionary seems appropriate:
Fractions = {
    u'¼': 0.25,
    u'½': 0.5,
    u'¾': 0.75,
    u'⅕': 0.2,
    # add any other fractions here
}
Update: the unicodedata module is a much better solution.
You want to use the unicodedata module:
import unicodedata
unicodedata.numeric(u'⅕')
In a Python 2 interactive session, this displays:
0.20000000000000001
If the character does not have a numeric value, then unicodedata.numeric(unichr[, default]) will return default, or, if default is not given, will raise ValueError.
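To make the fallback behaviour concrete, here is a minimal sketch in Python 3. The helper name fraction_to_float and its default parameter are hypothetical, chosen only for illustration. It assumes the input is a single character, since unicodedata.numeric() does not accept longer strings.

```python
import unicodedata

def fraction_to_float(ch, default=None):
    """Return the numeric value of a single character.

    `fraction_to_float` and `default` are hypothetical names for this sketch.
    If the character has no numeric value, `default` is returned instead
    of letting ValueError propagate.
    """
    try:
        return unicodedata.numeric(ch)
    except ValueError:
        # Raised when the character has no numeric value and no default is given.
        return default

print(fraction_to_float('½'))
print(fraction_to_float('x'))
```

Catching ValueError explicitly is a design choice here; passing a second argument to unicodedata.numeric() directly, as described above, should work just as well.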
Maybe you could decompose the fraction using the "unicodedata" module and look for the FRACTION SLASH character (U+2044). After that, it's just a matter of simple division.
For example:
>>> import unicodedata
>>> unicodedata.lookup('VULGAR FRACTION ONE QUARTER')
u'\xbc'
>>> unicodedata.decomposition(unicodedata.lookup('VULGAR FRACTION ONE QUARTER'))
'<fraction> 0031 2044 0034'
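The decomposition string shown above could be turned into a float roughly like this. It is only a sketch in Python 3: the function name fraction_from_decomposition is hypothetical, and it assumes the decomposition has the form '<fraction> numerator 2044 denominator' with plain ASCII digits on both sides of the slash.

```python
import unicodedata

FRACTION_SLASH = '\u2044'

def fraction_from_decomposition(ch):
    """Hypothetical helper: compute a float from a character's decomposition."""
    # Example decomposition for '¼': '<fraction> 0031 2044 0034'
    parts = unicodedata.decomposition(ch).split()
    if not parts or parts[0] != '<fraction>':
        raise ValueError('%r does not decompose to a fraction' % ch)
    # Turn the hex code points back into characters, e.g. '1', '⁄', '4'.
    text = ''.join(chr(int(code_point, 16)) for code_point in parts[1:])
    numerator, denominator = text.split(FRACTION_SLASH)
    # int() raises ValueError if either side is empty or not a digit string.
    return int(numerator) / int(denominator)

print(fraction_from_decomposition('¼'))
```

Characters whose decomposition lacks a denominator would make int() raise ValueError in this sketch, so it is less robust than asking for the numeric value directly.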
Update: I'll leave this answer here for reference but using unicodedata.numeric() as per Karl's answer is a much better idea.
In Python 3.1, you don't need the 'u' prefix, and the result is displayed as 0.2 instead of 0.20000000000000001.
>>> unicodedata.numeric('⅕')
0.2