How can I split a line in Python at a non-printing ASCII character (such as the long minus sign, hex 0x97, octal 227)? I won't need the character itself. The information after it will be saved in a variable.
You can use re.split.
>>> import re
>>> re.split(r'\W+', 'Words, words, words.')
['Words', 'words', 'words', '']
Adjust the pattern so that it matches only the separator characters; everything the pattern matches is dropped from the result.
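As an illustration of adjusting the pattern, a character class lets re.split treat several dash-like characters as separators. This is a minimal sketch assuming Python 3 (where str is Unicode); the name split_on_dashes and the particular set of characters are illustrative choices, not something the question specifies.

```python
import re

# Hypothetical separator set: U+0097, en dash U+2013, em dash U+2014.
DASHES = re.compile('[\x97\u2013\u2014]')

def split_on_dashes(line):
    """Split line at any of the separators and strip surrounding whitespace."""
    return [part.strip() for part in DASHES.split(line)]

print(split_on_dashes('hello \u2013 world'))  # ['hello', 'world']
print(split_on_dashes('key\x97value'))        # ['key', 'value']
```

A line without any of these characters comes back as a single-element list.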
See also: stripping-non-printable-characters-from-a-string-in-python
Example (with the long minus):
>>> # \xe2\x80\x93 is the UTF-8 encoding of the en dash (long dash or long minus)
>>> s = 'hello – world'
>>> s
'hello \xe2\x80\x93 world'
>>> import re
>>> re.split("\xe2\x80\x93", s)
['hello ', ' world']
Or, the same with Unicode:
>>> # \u2013 is the en dash (also called a long dash or long minus)
>>> s = u'hello – world'
>>> s
u'hello \u2013 world'
>>> import re
>>> re.split(u"\u2013", s)
[u'hello ', u' world']
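The two sessions above differ only in whether the string holds encoded bytes or Unicode text. In Python 3 the same distinction is between bytes and str, and the separator has to be of the same type as the data. The sketch below assumes Python 3 and assumes the input is Windows-1252 encoded (the question does not state the encoding); in that code page, byte 0x97 is the em dash U+2014. The sample data is made up.

```python
raw = b'hello \x97 world'  # made-up sample; 0x97 is the em dash in Windows-1252

# Option 1: split the bytes directly; the separator must be bytes too.
left, right = raw.split(b'\x97', 1)

# Option 2: decode first, then split on the Unicode character.
text = raw.decode('cp1252')            # byte 0x97 becomes U+2014
before, after = text.split('\u2014', 1)

print(right)        # b' world'
print(repr(after))  # ' world'
```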
Just use the string/unicode split method. It doesn't care what string you split on, as long as it is a constant. If you need a regex, use re.split instead.
To get the separator, either escape it as the other answers have shown ("\x97"),
or
use chr(0x97) for byte strings (values 0-255) or unichr(0x97) for unicode strings.
So an example would be:
'will not be split'.split(chr(0x97))
'will be split here:\x97 and this is the second string'.split(chr(0x97))
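Since only the text after the separator is needed, the split can be limited to one occurrence and the last piece taken. A minimal sketch, assuming Python 3; the variable names are illustrative.

```python
SEP = chr(0x97)  # the same character as '\x97'

line = 'will be split here:' + SEP + ' and this is the second string'

# maxsplit=1 stops after the first separator, so any later ones stay in the tail.
pieces = line.split(SEP, 1)
after = pieces[1] if len(pieces) == 2 else ''  # empty string when SEP is absent

print(pieces)       # ['will be split here:', ' and this is the second string']
print(repr(after))  # ' and this is the second string'
```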
_, _, your_result = your_input_string.partition('\x97')
or
your_result = your_input_string.partition('\x97')[2]
If your_input_string does not contain a '\x97', then your_result will be empty. If your_input_string contains multiple '\x97' characters, your_result will contain everything after the first '\x97' character, including other '\x97' characters.
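The behaviour just described can be checked with a small helper. This is only a sketch, assuming Python 3; text_after is a hypothetical name, and returning None when the separator is missing is one possible design choice that distinguishes "no separator" from "nothing after the separator".

```python
def text_after(line, sep='\x97'):
    """Return the text after the first sep, or None if sep does not occur."""
    _, found, tail = line.partition(sep)
    # partition leaves the middle element empty when sep is not in line
    return tail if found else None

print(repr(text_after('key\x97value')))       # 'value'
print(repr(text_after('a\x97b\x97c')))        # 'b\x97c'
print(repr(text_after('no separator here')))  # None
```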