ansaurus

Question

Removing non-breaking spaces from strings using Python

Answer 1

A:

Nothing in what you write indicates that you're necessarily doing anything wrong: if the original string had a non-breaking space between 'Foo' and 'Bar', you now have a normal space there instead. This assumes that at some point you decoded your input string (which I imagine is a bytestring, unless you're on Python 3 or the file was opened with the open function from the codecs module) into a Unicode string; otherwise the replace is unlikely to find a Unicode character in a non-Unicode string of bytes. Still, nothing in what you write clearly indicates a problem.

Can you clarify what the input is (print repr(myString) just before the replace), what the output is (print repr(myString) again just after the replace), and why you think that's a problem? Without repr, strings that are actually different might look the same when printed; repr makes the difference visible.
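As a rough illustration of why repr helps, here is a minimal sketch. It is written for Python 3 (the thread itself is about Python 2), and sample_text is a made-up value, not the asker's actual data.

```python
# Python 3 sketch; sample_text is a hypothetical example value.
sample_text = "Foo\u00a0Bar"  # non-breaking space between the two words
cleaned_text = sample_text.replace("\u00a0", " ")

# Printed directly, both strings look like "Foo Bar" on most terminals.
print(sample_text)
print(cleaned_text)

# repr exposes the difference: the non-breaking space shows up as an escape.
print(repr(sample_text))
print(repr(cleaned_text))
```

In Python 2 the same check would be written as print repr(myString), as suggested above.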

Alex Martelli 2010-04-07 18:18:26

Answer 2

+2 A:

No, u"\u00A0" is the escape code for a non-breaking space. In Python 2, "\u00A0" without the u prefix is 6 characters that are not any sort of escape code. Read this.
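The difference can be checked by counting characters. The sketch below is for Python 3, where a raw string stands in for the Python 2 byte string (an assumption made so the example runs on a current interpreter); the variable names are illustrative only.

```python
# Python 3 sketch. In Python 3 every str literal is Unicode, so a raw string
# is used here to mimic how Python 2 treats "\u00A0" without the u prefix.
escaped = "\u00A0"      # one character: NO-BREAK SPACE
unescaped = r"\u00A0"   # six characters: backslash, u, 0, 0, A, 0

print(len(escaped))
print(len(unescaped))

# Replacing the six-character sequence does not touch a real non-breaking space.
text = "Foo\u00a0Bar"
print(text.replace(unescaped, " ") == text)
```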

Ignacio Vazquez-Abrams 2010-04-07 18:29:52

Thanks for that link Ignacio!

dontsaythekidsname 2010-04-07 18:43:45

The link you provided might be good for a beginner, but it is misleading: it completely neglects Unicode normalization. For example, `'ć'` is `u'\u0107'`, but it can also be represented as `u'c\u0301'`. See http://unicode.org/reports/tr15/
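A small sketch of the normalization point, using the standard unicodedata module in Python 3. The variable names are illustrative, and the last line about NFKC is an extra observation, not something claimed in the comment above.

```python
import unicodedata

composed = "\u0107"      # LATIN SMALL LETTER C WITH ACUTE, one code point
decomposed = "c\u0301"   # letter c followed by COMBINING ACUTE ACCENT

# The two forms render the same but do not compare equal as strings.
print(composed == decomposed)

# Normalizing both to the same form (NFC or NFD) makes them comparable.
nfc = unicodedata.normalize("NFC", decomposed)
nfd = unicodedata.normalize("NFD", composed)
print(nfc == composed)
print(nfd == decomposed)

# Compatibility normalization (NFKC) also maps a non-breaking space to a
# plain space, but it changes other characters too, so use it with care.
nbsp_normalized = unicodedata.normalize("NFKC", "Foo\u00a0Bar")
print(repr(nbsp_normalized))
```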

J.F. Sebastian 2010-04-07 20:32:18

Answer 3

+2 A:

You don't have a unicode string, but a UTF-8 encoded sequence of bytes (which is what strings are in Python 2.x).

Try

myString = myString.replace("\xc2\xa0", " ")

Better would be to switch to unicode -- see this article for ideas. Thus you could say

uniString = unicode(myString, "UTF-8")
uniString = uniString.replace(u"\u00A0", " ")

and it should also work (caveat: I don't have Python 2.x available right now), although you will need to encode it back to bytes when writing it to a file or printing it to the screen.
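For readers on Python 3, the same two approaches might look roughly like the sketch below. The function names and the sample bytes are hypothetical, and the input is assumed to be valid UTF-8.

```python
def replace_nbsp_in_bytes(raw_bytes):
    """Replace UTF-8 encoded non-breaking spaces directly at the byte level."""
    return raw_bytes.replace(b"\xc2\xa0", b" ")


def replace_nbsp_via_text(raw_bytes, encoding="utf-8"):
    """Decode to text, replace the non-breaking space, then encode back."""
    text = raw_bytes.decode(encoding)
    text = text.replace("\u00a0", " ")
    return text.encode(encoding)


# Hypothetical sample input: "Foo", a UTF-8 non-breaking space, then "Bar".
sample_bytes = b"Foo\xc2\xa0Bar"

print(replace_nbsp_in_bytes(sample_bytes))
print(replace_nbsp_via_text(sample_bytes))
```

The decode-then-replace route keeps working if the input encoding changes (pass a different encoding argument), whereas the byte-level replace is tied to the UTF-8 byte pair.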

Kathy Van Stone 2010-04-07 18:32:48
