There are two main alternatives: read everything in as a single string and remove the newlines:
clean = open('thefile.txt').read().replace('\n', '')
or, read line by line, removing the newline that ends each line, and join it up again:
clean = ''.join(l[:-1] for l in open('thefile.txt'))
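One caveat with the second form: `l[:-1]` assumes every line (including the last) ends in `'\n'`, and will clip the final character of a last line that doesn't. A sketch of a variant using `rstrip('\n')` that avoids that assumption (`remove_newlines` and the throwaway file are just illustration, not part of the timings above):

```python
import os
import tempfile

def remove_newlines(path):
    # rstrip('\n') is a no-op on lines without a trailing newline,
    # unlike l[:-1], which would drop the last real character.
    with open(path) as f:
        return ''.join(line.rstrip('\n') for line in f)

# quick check on a throwaway file whose last line has NO trailing newline
fd, path = tempfile.mkstemp(text=True)
with os.fdopen(fd, 'w') as f:
    f.write('to be\nor not\nto be')
print(remove_newlines(path))  # -> 'to beor notto be'
os.remove(path)
```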
The former alternative is probably faster but, as always, I strongly recommend you MEASURE speed (e.g., with python -mtimeit) for the cases of specific interest to you, rather than just assuming you know how performance will compare. REs are probably slower still, but, again: don't guess, MEASURE!
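If you'd rather measure from inside a script than from the shell, the same comparison can be done with the timeit module. A self-contained sketch (it builds its own small throwaway file; substitute your real file's path in practice):

```python
import os
import tempfile
import timeit

# build a small throwaway file so the snippet runs anywhere
fd, path = tempfile.mkstemp(text=True)
with os.fdopen(fd, 'w') as f:
    f.writelines('line %d\n' % i for i in range(10000))

# time each of the three approaches over 50 runs
read_replace = timeit.timeit(
    "open(%r).read().replace('\\n', '')" % path, number=50)
join_lines = timeit.timeit(
    "''.join(l[:-1] for l in open(%r))" % path, number=50)
re_sub = timeit.timeit(
    "re.sub('\\n', '', open(%r).read())" % path,
    setup='import re', number=50)

print(read_replace, join_lines, re_sub)
os.remove(path)
```

The numbers will vary by machine and file, of course, which is exactly why measuring on your own data beats guessing.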
So here are some numbers for a specific text file on my laptop:
$ python -mtimeit -s"import re" "re.sub('\n','',open('AV1611Bible.txt').read())"
10 loops, best of 3: 53.9 msec per loop
$ python -mtimeit "''.join(l[:-1] for l in open('AV1611Bible.txt'))"
10 loops, best of 3: 51.3 msec per loop
$ python -mtimeit "open('AV1611Bible.txt').read().replace('\n', '')"
10 loops, best of 3: 35.1 msec per loop
The file is a version of the KJ Bible, downloaded and unzipped from here (I do think it's important to run such measurements on one easily fetched file, so others can readily reproduce them!).
Of course, a few milliseconds more or less on a file of 4.3 MB, 34,000 lines, may not matter much to you one way or another; but as the fastest approach is also the simplest one (far from an unusual occurrence, especially in Python;-), I think that's a pretty good recommendation.