I have a CSV-like text file that has about 1000 lines. Between each record in the file is a long series of dashes. The records generally end with a \n, but sometimes there is an extra \n before the end of the record. Simplified example:
"1x", "1y", "Hi there"
-------------------------------
"2x", "2y", "Hello - I'm lost"
-------------------------------
"3x", "3y", "How ya
doing?"
-------------------------------
I want to replace the extra \n's with spaces, i.e. concatenate the lines between the dashes. I thought I would be able to do this (Python 2.5):
text = open("thefile.txt", "r").read()
better_text = re.sub(r'\n(?!\-)', ' ', text)
but that seems to replace every \n, not just the ones that are not followed by a dash. What am I doing wrong?
I am asking this question in an attempt to improve my own regex skills and understand the mistakes that I made. The end goal is to generate a text file in a format that is usable by a specific VBA for Word macro that generates a styled Word document which will then be digested by a Word-friendly CMS.