ansaurus

Question

Getting correct string length in Python for strings with ANSI color codes

Answer 1

A:

The Python code printing out the text should check sys.stdout.isatty() and not use color codes if it is False.
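
A minimal sketch of that check, assuming a hypothetical `colorize` helper and a hard-coded green SGR code (neither is from the question); it only wraps the text when the target stream reports that it is a terminal:

```python
import sys

GREEN = "\x1b[32m"   # SGR code for green foreground (illustrative choice)
RESET = "\x1b[0m"    # SGR reset


def colorize(text, code, stream=None):
    """Wrap text in an escape sequence only if the stream is a terminal."""
    stream = sys.stdout if stream is None else stream
    # Some file-like objects have no isatty(); treat those as non-terminals.
    isatty = getattr(stream, "isatty", None)
    if isatty is not None and isatty():
        return code + text + RESET
    return text


sys.stdout.write(colorize("done", GREEN) + "\n")
```

When output is piped to a file, the text comes back unchanged, so `len()` on it is already the displayed width.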

Ignacio Vazquez-Abrams 2010-02-02 19:21:02

That is good to know, but doesn't answer my question. This script is always run inside of PuTTY or something similar anyway, so I'm not too concerned.

Paul D. 2010-02-02 19:40:48

Answer 2

+3 A:

The pyparsing wiki includes this helpful expression for matching on ANSI escape sequences:

ESC = Literal('\x1b')
integer = Word(nums)
escapeSeq = Combine(ESC + '[' + Optional(delimitedList(integer,';')) + 
                oneOf(list(alphas)))

Here's how to make this into an escape-sequence-stripper:

from pyparsing import *

ESC = Literal('\x1b')
integer = Word(nums)
escapeSeq = Combine(ESC + '[' + Optional(delimitedList(integer,';')) + 
                oneOf(list(alphas)))

nonAnsiString = lambda s : Suppress(escapeSeq).transformString(s)

unColorString = nonAnsiString('\x1b[1m0.0\x1b[0m')
print unColorString, len(unColorString)

prints:

0.0 3

Paul McGuire 2010-02-02 19:33:30

Technically the delimited list can have strings in it too, although it is unlikely you will ever meet such a sequence. See also http://stackoverflow.com/questions/1833873/python-regex-escape-characters/1834669#1834669

bobince 2010-02-02 20:29:52

OH, don't I know it! In my youth, we made those VT100's dance, flashing their LED's, changing their scroll regions, outputting double-high-double-wide fonts, in bold reverse video - ah, what heady days those were...

Paul McGuire 2010-02-02 21:40:51

Thanks, that worked perfectly! I was hoping there was just some blahlibrary.unescape() method somewhere I was overlooking, but this is the next best thing!

Paul D. 2010-02-03 15:16:08

Answer 3

A:

Looking in ANSI_escape_code, the sequence in your example is Select Graphic Rendition (probably bold).

Try to control column positioning with the Cursor Position (CSI n ; m H) sequence. This way, the width of the preceding text does not affect the current column position, and there is no need to worry about string widths.
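
As a rough sketch of that idea (the helper names `cursor_position` and `render_row` are made up for illustration, and rows and columns are assumed to be 1-based as in the CSI n ; m H sequence), each cell is preceded by an explicit cursor move, so the escape sequences inside earlier cells never shift later ones:

```python
ESC = "\x1b"


def cursor_position(row, col):
    """Build the Cursor Position sequence CSI row ; col H (1-based)."""
    return "%s[%d;%dH" % (ESC, row, col)


def render_row(row, cells, column_starts):
    """Place each cell at its own column instead of padding with spaces."""
    parts = []
    for text, col in zip(cells, column_starts):
        parts.append(cursor_position(row, col) + text)
    return "".join(parts)


# Two cells on screen row 5: one bold, one plain, starting at columns 1 and 20.
line = render_row(5, ["\x1b[1mname\x1b[0m", "value"], [1, 20])
```

Writing `line` to a terminal that honours these sequences puts "value" at column 20 regardless of how many escape bytes the first cell contains.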

A better option, if you target Unix, is using the curses module window-objects. For example, a string can be positioned on the screen with:

window.addnstr([y, x], str, n[, attr])

Paint at most n characters of the string str at (y, x) with attributes attr, overwriting anything previously on the display.

gimel 2010-02-02 19:45:53

Thanks - I'll take a look at curses.

Paul D. 2010-02-02 20:21:58

Answer 4

+1 A:

I don't understand TWO things.

(1) It is your code, under your control. You want to add escape sequences to your data and then strip them out again so that you can calculate the length of your data?? It seems much simpler to calculate the padding before adding the escape sequences. What am I missing?

Let's presume that none of the escape sequences change the cursor position. If they do, the currently accepted answer won't work anyway.

Let's assume that you have the string data for each column (before adding escape sequences) in a list named string_data and the pre-determined column widths are in a list named width. Try something like this:

temp = []
for colx, text in enumerate(string_data):
    npad = width[colx] - len(text) # calculate padding size
    assert npad >= 0
    enhanced = fancy_text(text, colx, etc, whatever) # add escape sequences
    temp.append(enhanced + " " * npad)
sys.stdout.write("".join(temp))

Update after OP's comment """The reason I want to strip them out and calculate the length after the string contains the color codes is because all the data is built up programmatically. I have a bunch of colorize methods and I'm building up the data something like this: str = "%s/%s/%s" % (GREEN(data1), BLUE(data2), RED(data3)) It would be pretty difficult to color the text after the fact."""

If the data is built up of pieces each with its own formatting, you can still compute the displayed length and pad as appropriate. Here's a function which does that for one cell's contents:

BLACK, RED, GREEN, YELLOW, BLUE, MAGENTA, CYAN, WHITE = range(40, 48)
BOLD = 1

def render_and_pad(reqd_width, components, sep="/"):
    temp = []
    actual_width = 0
    for fmt_code, text in components:
        actual_width += len(text)
        strg = "\x1b[%dm%s\x1b[m" % (fmt_code, text)
        temp.append(strg)
    if temp:
        actual_width += len(sep) * (len(temp) - 1)  # width taken by separators
    npad = reqd_width - actual_width
    assert npad >= 0
    return sep.join(temp) + " " * npad

print repr(
    render_and_pad(20, zip([BOLD, GREEN, YELLOW], ["foo", "bar", "zot"]))
    )

If you think that the call is overburdened by punctuation, you could do something like:

BOLD = lambda s: (1, s)
BLACK = lambda s: (40, s)
# etc
def render_and_pad(reqd_width, sep, *components):
    # etc

x = render_and_pad(20, '/', BOLD(data1), GREEN(data2), YELLOW(data3))

(2) I don't understand why you don't want to use the supplied-with-Python regular expression kit. No "hackery" (for any possible meaning of "hackery" that I'm aware of) is involved:

>>> import re
>>> test = "1\x1b[a2\x1b[42b3\x1b[98;99c4\x1b[77;66;55d5"
>>> expected = "12345"
>>> # regex = re.compile(r"\x1b\[[;\d]*[A-Za-z]")
... regex = re.compile(r"""
...     \x1b     # literal ESC
...     \[       # literal [
...     [;\d]*   # zero or more digits or semicolons
...     [A-Za-z] # a letter
...     """, re.VERBOSE)
>>> print regex.findall(test)
['\x1b[a', '\x1b[42b', '\x1b[98;99c', '\x1b[77;66;55d']
>>> actual = regex.sub("", test)
>>> print repr(actual)
'12345'
>>> assert actual == expected
>>>
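
To tie this back to the original question, here is a small sketch that uses the same pattern to measure and pad by displayed width. The names `visible_len` and `ljust_visible` are hypothetical, and like the regex above it only recognises sequences of the form ESC [ digits-and-semicolons letter:

```python
import re

# Same pattern as in the session above, written on one line.
ANSI_CSI = re.compile(r"\x1b\[[;\d]*[A-Za-z]")


def visible_len(s):
    """Length of s as displayed, ignoring matched escape sequences."""
    return len(ANSI_CSI.sub("", s))


def ljust_visible(s, width, fill=" "):
    """Like str.ljust, but pads according to the displayed width."""
    return s + fill * max(0, width - visible_len(s))


colored = "\x1b[1m0.0\x1b[0m"
padded = ljust_visible(colored, 6)
```

Here `visible_len(colored)` is 3 while `len(colored)` is 11, and `padded` ends with three spaces.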

Update after OP's comment """I still prefer Paul's answer since it's more concise"""

More concise than what? Isn't the regex solution concise enough for you:

# === setup ===
import re
strip_ANSI_escape_sequences_regex = re.compile(r"""
    \x1b     # literal ESC
    \[       # literal [
    [;\d]*   # zero or more digits or semicolons
    [A-Za-z] # a letter
    """, re.VERBOSE)
def strip_ANSI_escape_sequences(s):
    return strip_ANSI_escape_sequences_regex.sub("", s)

# === usage ===
raw_data = strip_ANSI_escape_sequences(formatted_data)

??

John Machin 2010-02-02 22:55:38

Thanks for the answer John. The reason I want to strip them out and calculate the length *after* the string contains the color codes is because all the data is built up programmatically. I have a bunch of colorize methods and I'm building up the data something like this: str = "%s/%s/%s" % (GREEN(data1), BLUE(data2), RED(data3)). It would be pretty difficult to color the text after the fact. As for hackery, maybe what I should have said was, "I imagine this is a solved problem and I just can't find the right library". Guess not, but I still prefer Paul's answer since it's more concise.

Paul D. 2010-02-03 15:14:47

OK, I'll bite on this. I see what you're getting at with calculating the length as you go. The solution you propose doesn't quite work, just because knowing the desired column length ahead of time requires that you know the length of the largest string to be stored in a cell of that column. Nonetheless, I don't think it'd be too hard to write something that got around this and didn't require stripping the color sequences out after the fact.
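
One way this could look, as a sketch only: keep each cell as a (format code, plain text) pair, as in John's render_and_pad, make a first pass over the plain text to find each column's width, and add the escape sequences in a second pass. The function name `render_table` and the one-colour-per-cell layout are assumptions made for illustration:

```python
def render_table(rows, gap=" "):
    """rows: list of rows, each a list of (sgr_code, text) cells."""
    if not rows:
        return []
    # Pass 1: column widths come from the plain text only.
    ncols = max(len(row) for row in rows)
    widths = [0] * ncols
    for row in rows:
        for i, (_, text) in enumerate(row):
            widths[i] = max(widths[i], len(text))
    # Pass 2: colour each cell, then pad with the width computed above.
    lines = []
    for row in rows:
        parts = []
        for i, (code, text) in enumerate(row):
            pad = " " * (widths[i] - len(text))
            parts.append("\x1b[%dm%s\x1b[m%s" % (code, text, pad))
        lines.append(gap.join(parts))
    return lines


table = render_table([
    [(32, "ok"), (31, "failed")],
    [(33, "warning"), (32, "x")],
])
```

Because the widths are taken before any escape sequence is added, nothing has to be stripped back out afterwards.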

Paul D. 2010-02-04 04:06:11

As for the regex comment, I don't have any problem using Python's built in support. I just generally tend to shy away from doing regex parsing because it's easy to mess it up and forget some edge case. See the unending list of questions here on SO from people trying to use regexes with HTML for proof of that. Or, just look at the comment on Paul's post, which points out that what he provided doesn't actually account for non-color control codes. That said, when only worried about colors, it's pretty straightforward as you've shown.

Paul D. 2010-02-04 04:07:14

"Forgetting" the implementation of an edge case is independent of the implementation tool (pyParsing, regex, assembly language). Regexes are incapable of parsing HTML properly; the unending list of questions from people who don't know that proves nothing. Actually, the comment on Paul's post referred to sequences with a string constant parameter instead of an integer constant, and mentioned that they are rare. Colour-related sequences are a small subset of the set graphic rendition command, and that's just one of many commands. You haven't addressed "more concise".

John Machin 2010-02-04 09:17:39
