I am looking for a way to do smart truncation in Python. By this I mean that if a string is longer than X characters, only the first X characters are shown, followed by a suffix like '...'. By "smart" I mean that it does not cut words off in the middle; instead, it cuts at spaces. For instance, not "this is rea...", but "this is really...".

Anything that would lead me in the right direction would be appreciated.

Thanks.

+11  A: 

I actually wrote a solution for this on a recent project of mine. I've condensed most of it to make it a little smaller.

def smart_truncate(content, length=100, suffix='...'):
    if len(content) <= length:
        return content
    else:
        return ' '.join(content[:length+1].split(' ')[0:-1]) + suffix

The if-statement checks whether your content already fits within the cutoff length. If it doesn't, the code truncates to the desired length plus one character, splits on spaces, removes the last element (so that you don't cut off a word), and then joins the rest back together (while tacking on the '...').
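To make the steps above concrete, here is a small walk-through of the same expression on a sample string. The intermediate variable names (`clipped`, `words`, `kept`) are only for illustration and are not part of the original function.

```python
# Step-by-step version of the one-liner, using illustrative variable names.
content = 'The quick brown fox jumped over the lazy dog.'
length = 23
suffix = '...'

# Take one character more than `length`, so a word that ends exactly
# at `length` is followed by its space and survives the next steps.
clipped = content[:length + 1]

# Split on single spaces; the last element is a (possibly partial) word,
# or '' when `clipped` ends with a space.
words = clipped.split(' ')

# Drop that last element so no word is cut in the middle.
kept = words[0:-1]

result = ' '.join(kept) + suffix

print(repr(clipped))  # 'The quick brown fox jump'
print(kept)           # ['The', 'quick', 'brown', 'fox']
print(result)         # The quick brown fox...
```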

Adam
+12  A: 

Here's a slightly better version of the last line in Adam's solution:

return content[:length].rsplit(' ', 1)[0]+suffix

(This is slightly more efficient, and returns a more sensible result when there are no spaces in the first part of the string.)
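As a rough illustration of the no-spaces case, the sketch below puts both last lines side by side. The function names `truncate_split` and `truncate_rsplit` are made up here just to tell the two variants apart.

```python
def truncate_split(content, length=100, suffix='...'):
    """Variant using split/join (hypothetical name for the first answer's approach)."""
    if len(content) <= length:
        return content
    return ' '.join(content[:length + 1].split(' ')[0:-1]) + suffix


def truncate_rsplit(content, length=100, suffix='...'):
    """Variant using rsplit (hypothetical name for the approach in this answer)."""
    if len(content) <= length:
        return content
    return content[:length].rsplit(' ', 1)[0] + suffix


no_spaces = 'Supercalifragilisticexpialidocious'

# With no space to split on, split/join drops the only element
# and leaves nothing but the suffix.
print(truncate_split(no_spaces, 10))   # ...

# rsplit finds no separator, so the clipped text is kept as-is.
print(truncate_rsplit(no_spaces, 10))  # Supercalif...
```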

bobince
That's interesting about the rsplit. I guess I never ran across it.
Adam
A quick test of the two approaches (Python 2.4.3):

Adam's code:
>>> smart_truncate('The quick brown fox jumped over the lazy dog.', 26)
'The quick brown fox jumped...'

With bobince's code:
>>> smart_truncate('The quick brown fox jumped over the lazy dog.', 26)
'The quick brown fox...'
Patrick Cuff
Yeah, I added length+1 to the truncation to handle the case where the cut naturally falls exactly at the end of a word.
Adam
This one is better. But I'd put it under the if and skip the else; it's more Pythonic.
e-satis
Well, then, let's use the conditional expression:

def smart_truncate(content, length=100, suffix='...'):
    return (content if len(content) <= length
            else content[:length].rsplit(' ', 1)[0] + suffix)
hughdbrown
+3  A: 
def smart_truncate(s, width):
    if s[width].isspace():
        return s[0:width]
    else:
        return s[0:width].rsplit(None, 1)[0]

Testing it:

>>> smart_truncate('The quick brown fox jumped over the lazy dog.', 23) + "..."
'The quick brown fox...'
Vebjorn Ljosa
Note: If width >= len(s), s[width] raises an IndexError. You probably want an extra check for the case where truncation isn't needed.
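One possible way to add that check is sketched below. The name `smart_truncate_safe` is hypothetical; the body is the function from this answer with a length guard in front.

```python
def smart_truncate_safe(s, width):
    """Sketch of the function above with a guard for short strings."""
    # If the string already fits, return it unchanged; this also avoids
    # the IndexError that s[width] would raise.
    if len(s) <= width:
        return s
    # The cut lands on whitespace, so no word is split.
    if s[width].isspace():
        return s[:width]
    # Otherwise drop the partial last word.
    return s[:width].rsplit(None, 1)[0]


text = 'The quick brown fox jumped over the lazy dog.'
print(smart_truncate_safe(text, 23) + '...')  # The quick brown fox...
print(smart_truncate_safe('short', 23))       # short
```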
Brian
+4  A: 
import re

def smart_truncate1(text, max_length=100, suffix='...'):
    """Returns a string of at most `max_length` characters, cutting
    only at word-boundaries. If the string was truncated, `suffix`
    will be appended.
    """

    if len(text) > max_length:
        pattern = r'^(.{0,%d}\S)\s.*' % (max_length-len(suffix)-1)
        return re.sub(pattern, r'\1' + suffix, text)
    else:
        return text

OR

def smart_truncate2(text, min_length=100, suffix='...'):
    """If the `text` is more than `min_length` characters long,
    it will be cut at the next word-boundary and `suffix` will
    be appended.
    """

    pattern = r'^(.{%d,}?\S)\s.*' % (min_length-1)
    return re.sub(pattern, r'\1' + suffix, text)

OR

def smart_truncate3(text, length=100, suffix='...'):
    """Truncates `text`, on a word boundary, as close to
    the target length as it can come.
    """

    slen = len(suffix)
    pattern = r'^(.{0,%d}\S)\s+\S+' % (length-slen-1)
    if len(text) > length:
        match = re.match(pattern, text)
        if match:
            length0 = match.end(0)
            length1 = match.end(1)
            if abs(length0+slen-length) < abs(length1+slen-length):
                return match.group(0) + suffix
            else:
                return match.group(1) + suffix
    return text
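
To see how the pattern in the first variant works, the sketch below builds it for a small `max_length` and prints what the capture group keeps. The sample text and the value 26 are just illustrative choices.

```python
import re

text = 'The quick brown fox jumped over the lazy dog.'
max_length = 26
suffix = '...'

# Reserve room for the suffix; the extra -1 accounts for the \S that
# must end the captured group.
pattern = r'^(.{0,%d}\S)\s.*' % (max_length - len(suffix) - 1)
print(pattern)  # ^(.{0,22}\S)\s.*

# The group greedily takes as much as fits, then backtracks until it
# ends on a non-space character that is followed by whitespace.
match = re.match(pattern, text)
print(match.group(1))  # The quick brown fox

# Everything after the group is replaced by the suffix.
result = re.sub(pattern, r'\1' + suffix, text)
print(result)  # The quick brown fox...
```

Note that `.` does not match newlines by default, so multi-line input may need the `re.DOTALL` flag.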
MizardX
I always love regex-based solutions :)
Corey Goldberg
+4  A: 

There are a few subtleties that may or may not be issues for you, such as handling of tabs (e.g. if you're displaying them as 8 spaces, but treating them as 1 character internally), handling various flavours of breaking and non-breaking whitespace, or allowing breaking on hyphenation. If any of this is desirable, you may want to take a look at the textwrap module, e.g.:

import textwrap

def truncate(text, max_size):
    if len(text) <= max_size:
        return text
    return textwrap.wrap(text, max_size-3)[0] + "..."

The default behaviour for words longer than max_size is to break them (making max_size a hard limit). You can change to the soft limit used by some of the other solutions here by passing break_long_words=False to wrap(), in which case it will return the whole word. If you want this behaviour, change the last line to:

    lines = textwrap.wrap(text, max_size-3, break_long_words=False)
    return lines[0] + ("..." if len(lines)>1 else "")

There are a few other options like expand_tabs that may be of interest depending on the exact behaviour you want.
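
As a quick comparison of the hard and soft limits described above, here is a small sketch. The names `truncate_hard` and `truncate_soft` are hypothetical and only separate the two variants.

```python
import textwrap


def truncate_hard(text, max_size):
    """Hard limit: a word longer than the width is broken."""
    if len(text) <= max_size:
        return text
    return textwrap.wrap(text, max_size - 3)[0] + "..."


def truncate_soft(text, max_size):
    """Soft limit: an over-long first word is returned whole."""
    if len(text) <= max_size:
        return text
    lines = textwrap.wrap(text, max_size - 3, break_long_words=False)
    return lines[0] + ("..." if len(lines) > 1 else "")


text = 'Supercalifragilisticexpialidocious is a long word'

print(truncate_hard(text, 13))  # Supercalif...
print(truncate_soft(text, 13))  # Supercalifragilisticexpialidocious...
```

The hard variant never exceeds max_size, while the soft variant can, because it refuses to split a word.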

Brian