ansaurus

Question

Python tabstop-aware len() and padding functions

Answer 1

+6 A:

TABWIDTH=8
def my_len(s):
    return len(s.expandtabs(TABWIDTH))

def pad_with_tabs(s,maxlen):
    return s+"\t"*((maxlen-len(s)-1)/TABWIDTH+1)

Why did I use expandtabs()?
Well it's fast

$ python -m timeit '"Bear\tnecessities\t".expandtabs()'
1000000 loops, best of 3: 0.602 usec per loop
$ python -m timeit 'for c in "Bear\tnecessities\t":pass'
100000 loops, best of 3: 2.32 usec per loop
$ python -m timeit '[c for c in "Bear\tnecessities\t"]'
100000 loops, best of 3: 4.17 usec per loop
$ python -m timeit 'map(None,"Bear\tnecessities\t")'
100000 loops, best of 3: 2.25 usec per loop

Anything that iterates over your string is going to be slower, because just the iteration is ~4 times slower than expandtabs even when you do nothing in the loop.

$ python -m timeit '"Bear\tnecessities\t".split("\t")'
1000000 loops, best of 3: 0.868 usec per loop

Even just splitting on tabs takes longer. You'd still need to iterate over the split and pad each item to the tabstop

gnibbler 2009-11-17 02:34:14

`pad_with_tabs` should probably call `my_len` instead of `len`, in case there are embedded tabs in the string to be tab-padded.

Paul McGuire 2009-11-18 07:41:55

Answer 2

A:

TABWIDTH * int( math.ceil(len(s)*1.0/TABWIDTH) ) is indeed a massive over-kill; you can get the same result much more simply. For positive i and n, use:

def round_up_positive_int(i, n):
    return ((i + n - 1) // n) * n

This procedure works in just about any language I've ever used, after appropriate translation.

Then you can do next_pos = round_up_positive_int(len(s), TABWIDTH)

For a slight increase in the elegance of your code, instead of

while(s_len < maxlen):

use this:

while s_len < maxlen:

John Machin 2009-11-17 14:37:37

Answer 3

+1 A:

I believe gnibbler's is the best for most prectical cases. But anyway, here is a naive (without accounting CR, LF etc) solution to compute the length of string without creating expanded copy:

def tab_aware_len(s, tabstop=8):
    pos = -1
    extra_length = 0
    while True:
        pos = s.find('\t', pos+1)
        if pos<0:
            return len(s) + extra_length
        extra_length += tabstop - (pos+extra_length) % tabstop - 1

Probably it could be useful for some huge strings or even memory mapped files. And here is padding function a bit optimized:

def pad_with_tabs(s, max_len, tabstop=8):
    length = tab_aware_len(s, tabstop)
    if length<max_len:
        s += '\t' * ((max_len-1)//tabstop + 1 - length//tabstop)
    return s

Denis Otkidach 2009-11-17 14:46:06

ansaurus

tags:

views:

answers:

Python tabstop-aware len() and padding functions

related questions