I have some code that pulls data from a COM port, and I want to make sure that what I got really is a printable string (i.e. ASCII, maybe UTF-8) before printing it. Is there a function for doing this? The first half dozen places I looked didn't have anything that looks like what I want. (string has printable, but I didn't see anything, there or in the string methods, to check whether every char in one string is in another.)

Note: control characters are not printable for my purposes.


Edit: I was/am looking for a single function, not a roll-your-own solution.

What I ended up with is:

all(ord(c) < 127 and c in string.printable for c in input_str)
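
One thing worth checking against the note about control characters: string.printable includes string.whitespace, so tab, linefeed, return, vertical tab and formfeed all pass this test. A minimal sketch, wrapping the expression in a hypothetical helper named looks_printable (the name is not from the original post) to show what it accepts:

```python
import string

def looks_printable(input_str):
    # Same expression as above, wrapped in a hypothetical helper so it can be tried out.
    return all(ord(c) < 127 and c in string.printable for c in input_str)

print(looks_printable("Hello World!"))  # plain text passes
print(looks_printable(chr(7)))          # BEL is rejected
print(looks_printable("a\tb\x0c"))      # tab and formfeed pass: both are in string.whitespace
```

If those whitespace control characters should count as junk too, the allowed set has to be narrowed separately.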
+1  A: 

try/except seems the best way:

def isprintable(s, codec='utf8'):
    # s must be a byte string (str in Python 2, bytes in Python 3).
    try:
        s.decode(codec)
    except UnicodeDecodeError:
        return False
    return True

I would not rely on string.printable, which might deem "non-printable" control characters that can commonly be "printed" for terminal control purposes (e.g., in "colorization" ANSI escape sequences, if your terminal is ANSI-compliant). But that, of course, depends on your exact purposes for wanting to check this!-)
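
To make the behaviour of the decode-based check concrete, here is a small sketch. It assumes the input is a byte string (as read from a serial port), and uses a hypothetical name, decodes_cleanly, to avoid implying it is a library function. It shows that the check rejects bytes that are not valid in the codec, while control characters such as BEL still pass:

```python
def decodes_cleanly(data, codec="utf8"):
    """Hypothetical helper: True if the byte string is valid in the given codec."""
    try:
        data.decode(codec)
    except UnicodeDecodeError:
        return False
    return True

print(decodes_cleanly(b"Hello World!"))  # ASCII is valid UTF-8
print(decodes_cleanly(b"\xff\xfe"))      # 0xFF never occurs in valid UTF-8
print(decodes_cleanly(b"\x07"))          # BEL is a control character, but it decodes fine
```

So this approach tests the encoding rather than filtering control characters, which matches the point about escape sequences above.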

Alex Martelli
string.printable is well defined: "a combination of digits, letters, punctuation, and whitespace." Whitespace, OTOH, is a little less so: "On most systems this includes the characters space, tab, linefeed, return, formfeed, and vertical tab."
BCS
@BCS, it's basically the same concept as C's bad old `isprint` macro, and exhibits exactly the same failings (no control sequences / escape sequences -- but many terminals and printers can accept some control / escape sequences for cosmetic purposes such as colorization, and, depending on the app's purposes, forbidding such characters from the output may therefore prove unwise).
Alex Martelli
My concern is that whitespace could include *more* than those 6 chars. I know that if my data source ever contains "control chars", that I can assume they are junk.
BCS
+2  A: 

As you've said the string module has printable so it's just a case of checking if all the characters in your string are in printable:

>>> hello = 'Hello World!'
>>> bell = chr(7)
>>> import string
>>> all(c in string.printable for c in hello)
True
>>> all(c in string.printable for c in bell)
False

You could convert both strings to sets - so the set would contain each character in the string once - and check if the set created by your string is a subset of the printable characters:

>>> printset = set(string.printable)
>>> helloset = set(hello)
>>> bellset = set(bell)
>>> helloset
set(['!', ' ', 'e', 'd', 'H', 'l', 'o', 'r', 'W'])
>>> helloset.issubset(printset)
True
>>> set(bell).issubset(printset)
False

So, in summary, you'd probably want to do this:

import string
printset = set(string.printable)
isprintable = set(yourstring).issubset(printset)
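
If control characters (including tab and newline) should be rejected, the same subset test can be run against a narrower set. A rough sketch, assuming that space is the only whitespace character to allow; ALLOWED and is_printable are hypothetical names for illustration:

```python
import string

# Assumption: space is the only acceptable whitespace character.
ALLOWED = frozenset(string.digits + string.ascii_letters + string.punctuation + " ")

def is_printable(text, allowed=ALLOWED):
    """True if every character of text is in the allowed set."""
    return set(text).issubset(allowed)

print(is_printable("Hello World!"))                  # letters, space, punctuation
print(is_printable("a\nb"))                          # newline is not in ALLOWED
print(is_printable("a\nb", set(string.printable)))   # but it is in string.printable
```

Passing the allowed set as a parameter keeps the decision about what counts as printable in one place.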
Dave Webb
I was kinda hoping for a non-roll-your-own solution. Why the heck doesn't Python have this as a function?
BCS
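
For readers on Python 3 (an assumption; the code in this thread targets Python 2), the str type does have built-in methods that come close: str.isprintable() and, from Python 3.7, str.isascii(). A minimal sketch combining them, with is_printable_ascii as a hypothetical helper name:

```python
def is_printable_ascii(text):
    # str.isprintable() exists in Python 3; str.isascii() needs Python 3.7 or later.
    return text.isascii() and text.isprintable()

print(is_printable_ascii("Hello World!"))  # space counts as printable
print(is_printable_ascii("a\tb"))          # tab is not printable for str.isprintable()
print(is_printable_ascii("caf\u00e9"))     # non-ASCII letter fails the isascii() test
```

Both methods return True for the empty string, so an empty read needs its own check if that matters.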