I have a long string (multiple paragraphs) which I need to split into a list of line strings. The determination of what makes a "line" is based on:

  • The number of characters in the line is less than or equal to X (where X is a fixed number of columns per line)
  • OR, there is a newline in the original string (that will force a new "line" to be created).

I know I can do this algorithmically, but I was wondering if Python has something that can handle this case. It's essentially word-wrapping a string.

And, by the way, the output lines must be broken on word boundaries, not character boundaries.

Here's an example of input and output:

Input:

"Within eight hours of Wilson's outburst, his Democratic opponent, former-Marine Rob Miller, had received nearly 3,000 individual contributions raising approximately $100,000, the Democratic Congressional Campaign Committee said.

Wilson, a conservative Republican who promotes a strong national defense and reining in the size of government, won a special election to the House in 2001, succeeding the late Rep. Floyd Spence, R-S.C. Wilson had worked on Spence's staff on Capitol Hill and also had served as an intern for Sen. Strom Thurmond, R-S.C."

Output:

"Within eight hours of Wilson's outburst, his"
"Democratic opponent, former-Marine Rob Miller,"
" had received nearly 3,000 individual "
"contributions raising approximately $100,000,"
" the Democratic Congressional Campaign Committee"
" said."
""
"Wilson, a conservative Republican who promotes a "
"strong national defense and reining in the size "
"of government, won a special election to the House"
" in 2001, succeeding the late Rep. Floyd Spence, "
"R-S.C. Wilson had worked on Spence's staff on "
"Capitol Hill and also had served as an intern"
" for Sen. Strom Thurmond, R-S.C."
+4  A: 

You probably want to use the textwrap module in the standard library:

http://docs.python.org/library/textwrap.html
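
As a minimal sketch of what that module does (the `sample` string and the width of 50 are just illustrative choices, taken from the question's example), `textwrap.wrap` returns a list of lines broken on word boundaries:

```python
import textwrap

# Illustrative input: the start of the first paragraph from the question.
sample = ("Within eight hours of Wilson's outburst, his Democratic opponent, "
          "former-Marine Rob Miller, had received nearly 3,000 individual "
          "contributions.")

# wrap() returns a list of strings, each at most `width` characters long,
# with breaks placed between words rather than inside them.
lines = textwrap.wrap(sample, width=50)

for line in lines:
    print(repr(line))
```

Note that `wrap` treats newlines in its input as ordinary whitespace, so on its own it does not preserve paragraph breaks.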

Paul McMillan
+4  A: 

EDIT

What you are looking for is textwrap, but that is only part of the solution, not the whole of it. To take newlines into account, you need to wrap each line of the input separately and join the results:

from textwrap import wrap
'\n'.join(['\n'.join(wrap(block, width=50)) for block in text.splitlines()])

>>> print('\n'.join(['\n'.join(wrap(block, width=50)) for block in text.splitlines()]))

Within eight hours of Wilson's outburst, his
Democratic opponent, former-Marine Rob Miller, had
received nearly 3,000 individual contributions
raising approximately $100,000, the Democratic
Congressional Campaign Committee said.

Wilson, a conservative Republican who promotes a
strong national defense and reining in the size of
government, won a special election to the House in
2001, succeeding the late Rep. Floyd Spence,
R-S.C. Wilson had worked on Spence's staff on
Capitol Hill and also had served as an intern for
Sen. Strom Thurmond, R-S.C.
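
Since the question asks for a list of line strings rather than one joined string, the same idea can be sketched as a small function. The name `wrap_to_lines` and the short `sample` text are hypothetical, made up for illustration; only `wrap` and `splitlines` come from the code above.

```python
from textwrap import wrap


def wrap_to_lines(text, width=50):
    """Hypothetical helper: wrap each input line separately, return a flat list."""
    lines = []
    for block in text.splitlines():
        # wrap() returns [] for an empty block, so substitute [""] to keep
        # the blank line that separates paragraphs.
        lines.extend(wrap(block, width=width) or [""])
    return lines


# Illustrative input with a blank line between two paragraphs.
sample = "first paragraph that is long enough to need wrapping here\n\nsecond"
result = wrap_to_lines(sample, width=20)
print(result)
```

Each original newline still forces a new line, and no output line is longer than `width` unless a single word is (by default `wrap` breaks such words to fit).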
Nadia Alramli
Good answer! That's how I thought to do it, too. But what's with the "Wilson, a" after the blank line?
Andrei Vajna II
@Andrei, I updated my answer with a yet better solution.
Nadia Alramli
Cool! But now it looks messy. :P
Andrei Vajna II
@Andrei, hehehe. It's not that bad!
Nadia Alramli
Perfect. Thank you.
Karim