views:

484

answers:

3
#!/usr/bin/python

import random
import string

appendToFile = open("appendedFile", "a" )

# Generator

for i in range(1, 100000):

    chars = "".join( [random.choice(string.letters) for i in xrange(15)] )
    chars2 = "".join( [random.choice(string.letters) for i in xrange(15)] )

    appendToFile.write(chars + ":" + chars2 + "\n")

appendToFile.close()

Code modified from this question.

The above code generates 100,000 lines of random text in the format of STRING:STRING. Resultant text file is 3.1 MB.

How would one rapidly alphabetise the file, using the first STRING in STRING:STRING? Case is irrelevant.

Bubble sort is very slow, no?

+4  A: 

The obvious first approach is simply to use the built-in sort feature in Python. Is this not what you had in mind? If not, why? With only 100,000 lines of random text, the built-in sort would be very fast.

lst = open("appendedFile", "rt").readlines()
lst.sort(key=str.lower)

Done. We could do it as a one-liner if you really wanted to:

lst = sorted(open("appendedFile", "rt").readlines(), key=str.lower)

EDIT: I just checked, and strings.letters includes both upper-case and lower-case letters. So, above code is modified to be case-insensitive.

EDIT: more on sorting in Python: http://wiki.python.org/moin/HowTo/Sorting

steveha
If not, why? - New to python XD. I created that code so that you'd have a file to work from.
Nazarius Kappertaal
FYI, the built-in Python sort for lists is a merge sort. Complexity O(n*log n).
BrainCore
+1  A: 

Try this (case insensitive):

l=file(appendedFile).readlines()
l.sort(key=lambda x:x.lower())

For these kinds of sizes optimalisation is not really necessary (timings on my slow machine ;-):

christophe@orion:~$ time python -c "l=file('appendedFile').readlines();l.sort(key=lambda x:x.lower())"

real    0m0.615s
user    0m0.576s
sys 0m0.024s
ChristopheD
Thanks for the timing mechanism - didn't know it existed.
Nazarius Kappertaal
The "time" command is available under Linux. It is probably available under Mac OS X. You can also get it for Windows but Microsoft didn't build it in. The easiest way to get it for Windows is to install Cygwin. A purely Python-based approach, which is thus portable, is to use the "timeit" module: http://docs.python.org/library/timeit.html
steveha
@CrhistopheD, you don't need the `lambda` function; you can just use: `key=str.lower`
steveha
+2  A: 

This is very fast (under 1 second on my computer). It uses a case-insensitive sort, which is assume what you mean by "case is irrelevant"?

#!/usr/bin/python

appendToFile = open("appendedFile", "r")
sortToFile = open("sortedFile", "w")

for line in sorted(appendToFile, key = str.lower):
    sortToFile.write(line)
Mark Byers
Why not just: `... sorted(..., key=str.lower)` ?
Don O'Donnell