Emacs's auto-fill mode hard-wraps long lines to make the document look nice. I need to join the wrapped lines back together after reading them from the document.

For example, this text (where (CR) marks a line break and is not literal text in the file)

  - Blah, Blah, and (CR)
    Blah, Blah, Blah, (CR)
    Blah, Blah (CR)
  - A, B, C (CR) 
    Blah, Blah, Blah, (CR)
    Blah, Blah (CR)

is read into a list of strings with the readlines() function, and I want to produce

["Blah, Blah, and Blah, Blah, Blah, Blah, Blah", "A, B, C Blah, Blah, Blah, Blah, Blah"]

I thought about writing a loop that checks for '-' and concatenates all the stored strings before it, but I expect Python has a more efficient way to do this.
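For reference, the loop idea mentioned above could look roughly like the sketch below. It assumes that every item starts with '-' once leading whitespace is removed and that any other non-blank line continues the previous item. The name join_wrapped and the sample list are made up for illustration.

```python
def join_wrapped(lines):
    """Merge hard-wrapped continuation lines into the '-' item they belong to."""
    items = []
    for raw in lines:
        text = raw.strip()
        if not text:
            continue  # ignore blank lines
        if text.startswith("-") or not items:
            # a new item begins: drop the leading '-' marker and surrounding spaces
            items.append(text.lstrip("-").strip())
        else:
            # a continuation line: append it to the current item
            items[-1] += " " + text
    return items


# hypothetical sample, shaped like the output of readlines()
sample = ["  - Blah, Blah, and\n", "    Blah, Blah, Blah,\n", "    Blah, Blah\n",
          "  - A, B, C\n", "    Blah, Blah, Blah,\n", "    Blah, Blah\n"]
print(join_wrapped(sample))
```

A continuation line that itself begins with '-' would be mistaken for a new item, so this only fits text laid out like the example.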

ADDED:

Based on kindall's code, I could get what I want as follows.

lines = ["- We shift our gears toward nextGen effort"," contribute the work with nextGen."]
out = [(" " if line.startswith(" ") else "\n") + line.strip() for line in lines]
print(out)
res = ''.join(out).split('\n')[1:]
print(res)

The result is as follows.

['\n- We shift our gears toward nextGen effort', ' contribute the work with nextGen.']
['- We shift our gears toward nextGen effort contribute the work with nextGen.']
A: 

Use file.readlines(). It returns a list of strings, each string being a line of the file:

readlines(...)
    readlines([size]) -> list of strings, each a line from the file.

    Call readline() repeatedly and return a list of the lines so read.
    The optional size argument, if given, is an approximate bound on the
    total number of bytes in the lines returned.

EDIT: As pointed out in the comments, readlines() is not the best way to go. Disregard that suggestion and use the following one instead.

If you pass the text that Emacs produces into a Python function as one long string, you could use this:

[s.replace("\n", "") for s in emacsOutput.split('-')]
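One caveat with the one-liner above: splitting on every '-' also breaks on hyphens inside the text, and removing "\n" leaves the indentation behind. A possible variant, sketched here under the assumption that an item marker is a '-' at the start of a line followed by whitespace, splits only on those markers and collapses the remaining whitespace. The names split_items and emacs_output are illustrative.

```python
import re


def split_items(text):
    """Split on '-' markers that begin a line, then collapse all whitespace."""
    # re.MULTILINE lets ^ match at the start of every line, not just the first
    chunks = re.split(r"^\s*-\s+", text, flags=re.MULTILINE)
    # " ".join(chunk.split()) turns newlines and indentation into single spaces
    return [" ".join(chunk.split()) for chunk in chunks if chunk.strip()]


emacs_output = ("- Blah, Blah, and\n  Blah, Blah, Blah,\n  Blah, Blah\n"
                "- A, B, C\n  Blah, Blah, Blah,\n  Blah, Blah\n")
print(split_items(emacs_output))
```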

Hope this helps

inspectorG4dget
The questioner knows about `readlines` but wants to know how to combine the lines correctly after reading!
Andrew Jaffe
@Andrew: It's true that I need to ask how to combine the lines, but I had a typo in the function name readlines() in the post, so he just pointed me to file.readlines().
prosseek
I thought I was giving an overly simplistic solution. Anyway, @prosseek, can you please post the format of the input file that you are trying to read so that I can write the appropriate function for you?
inspectorG4dget
@inspectorG4dget: I think I posted the file format in the post. Each item starts with '-', and one long line is split/reformatted into multiple lines by Emacs.
prosseek
It's almost never necessary to use `readlines`. Almost all uses stem from forgetting that `f` is itself an iterable and can be treated like a sequence of its lines.
aaronasterling
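To illustrate the last comment, here is a minimal sketch of iterating over a file object directly instead of calling readlines(). An io.StringIO object stands in for a real open file so that the snippet runs on its own; the sample text is made up.

```python
import io

# io.StringIO stands in for a file opened with open(...); the text is a made-up sample
f = io.StringIO("- first item\n  wrapped part\n- second item\n")

# iterating over the file object yields one line at a time, newline included
stripped = [line.rstrip("\n") for line in f]
print(stripped)
```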
A: 

Why not use the join function?

Something like

" ".join(string)

will concatenate every element of your list into a single string, with a single space between elements.
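Note that a single join call merges everything into one string, so for the question's layout the lines would first have to be grouped per item. A small sketch, assuming the grouping has already been done somehow (the groups list below is hand-written for illustration):

```python
# each inner list holds the already-stripped lines of one '-' item (hand-written sample)
groups = [["- Blah, Blah, and", "Blah, Blah, Blah,", "Blah, Blah"],
          ["- A, B, C", "Blah, Blah, Blah,", "Blah, Blah"]]

# one join per group gives one string per item
joined = [" ".join(group) for group in groups]
print(joined)
```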

sahhhm
This doesn't really fit the context of the question.
Daenyth
+2  A: 

I'm not sure if you just want:

result = thefile.read()

or maybe:

result = ''.join(line.strip() for line in thefile)

or something else ...

dugres
A: 

Your example does not clarify much. Why are the lines that have no hyphenation joined together into a single string? Isn't there a \n at the end of each line?

Anyhow, given that you are using readlines(), which immediately loads your whole file into memory, you could instead use read() and do the parsing yourself. For example:

>>> a = "line one!\nThis is a li\n-ne that has been broken.\nWhile this is line 3."
>>> print(a)
line one!
This is a li
-ne that has been broken.
While this is line 3.
>>> a.replace("\n-","").split("\n")
['line one!', 'This is a line that has been broken.', 'While this is line 3.']

Of course, in the example above, a should be the return value of file.read().
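The same read-then-parse idea might be adapted to the layout in the question roughly as follows. This is only a sketch: it assumes that continuation lines are indented and do not start with '-', and it uses a regular expression instead of str.replace. The names unwrap_text and text are illustrative.

```python
import re


def unwrap_text(text):
    """Join indented continuation lines onto the previous line."""
    # a newline followed by indentation that does NOT lead into a '-' marker
    # is treated as a soft break and replaced with a single space
    joined = re.sub(r"\n[ \t]+(?![ \t-])", " ", text)
    return joined.splitlines()


# stands in for the return value of file.read()
text = ("- Blah, Blah, and\n  Blah, Blah, Blah,\n  Blah, Blah\n"
        "- A, B, C\n  Blah, Blah, Blah,\n  Blah, Blah\n")
print(unwrap_text(text))
```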

mac
+1  A: 

As I read it, your problem is to undo hard-wrapping and restore each set of indented lines to a single soft-wrapped line. This is one way to do it:

# hard-coded input, could also readlines() from a file
lines = ["- Blah, Blah, and", 
         "  Blah, Blah, Blah,",
         "  Blah, Blah",
         "- Blah, Blah, and",
         "  Blah, Blah, Blah,",
         "  Blah, Blah"]

out = [(" " if line.startswith(" ") else "\n") + line.strip() for line in lines]
out = ''.join(out)[1:].split('\n')

print(out)
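The same idea can be wrapped in a function and fed from a file object. The sketch below is one possible packaging, not part of the code above: the name unwrap is made up, io.StringIO stands in for a real file, and lstrip("\n") replaces the [1:] slice so that nothing is lost if the first line happens to be indented. Like the original, it assumes item lines start in column 0 and continuation lines start with a space.

```python
import io


def unwrap(lines):
    """Undo hard wrapping: lines starting with a space continue the previous line."""
    marked = [(" " if line.startswith(" ") else "\n") + line.strip() for line in lines]
    # lstrip("\n") drops the separator that was placed in front of the first item
    return "".join(marked).lstrip("\n").split("\n")


# io.StringIO stands in for a file opened with open(...)
fake_file = io.StringIO("- Blah, Blah, and\n  Blah, Blah, Blah,\n  Blah, Blah\n"
                        "- A, B, C\n  Blah, Blah, Blah,\n  Blah, Blah\n")
result = unwrap(fake_file)
print(result)
```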
kindall