A slightly meta answer to Autoplectic's suggestion of using zip():
With 3 lines in the input file (from the supplied data in the question), the zip() method takes an average of 0.404729390144 seconds, compared to 0.341339087486 seconds with the simple for loop constructing two lists (the code from mipadi's currently accepted answer).
With 10,000 lines in the input file (randomly generated 3-12 character words; I reduced the timeit.repeat() values to 100 runs, repeated twice), zip() took an average of 1.43965339661 seconds, compared to 1.52318406105 seconds with the for loop.
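I didn't keep the script that produced the 10,000-line file, but something along these lines would generate comparable input. This is only a sketch in Python 3: the helper names (make_word, make_lines, write_test_file), the fixed seed and the lowercase-only alphabet are my assumptions, not what was originally used.

```python
import random
import string


def make_word(rng, min_len=3, max_len=12):
    """Return a random lowercase word of min_len..max_len characters."""
    length = rng.randint(min_len, max_len)
    return "".join(rng.choice(string.ascii_lowercase) for _ in range(length))


def make_lines(count, seed=0):
    """Return `count` lines, each holding two whitespace-separated words."""
    rng = random.Random(seed)  # seeded so the file is reproducible
    return ["%s %s\n" % (make_word(rng), make_word(rng)) for _ in range(count)]


def write_test_file(path, count=10000, seed=0):
    """Write the generated lines to `path` (hypothetical helper)."""
    with open(path, "w") as handle:
        handle.writelines(make_lines(count, seed))
```

Calling write_test_file("english2german.txt") would then create a file in the same two-column layout that the benchmark reads.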
Both benchmarks were run with Python 2.5.1.
Hardly a huge difference. Given how much more readable the simple for loop is, I would recommend using it. The zip() code might be a bit quicker with large files, but the difference is only about 0.083 seconds with 10,000 lines, and that is the total over a whole timed batch, not a single pass over the file.
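To make the readability comparison concrete, here are the two approaches side by side on in-memory data instead of a file. This is a sketch in Python 3; the sample words and the function names split_with_zip and split_with_loop are made up for illustration.

```python
# Three in-memory lines standing in for the file (made-up sample data).
lines = ["one eins\n", "two zwei\n", "three drei\n"]


def split_with_zip(lines):
    # zip(*rows) transposes the rows: all first words, then all second words.
    eng, ger = zip(*(line.split() for line in lines))
    return eng, ger


def split_with_loop(lines):
    # Build the two columns explicitly, one line at a time.
    englist = []
    gerlist = []
    for line in lines:
        e, g = line.split()
        englist.append(e)
        gerlist.append(g)
    return englist, gerlist
```

Both return the same words in the same order. The differences are that zip() gives tuples while the loop gives lists, and that the zip() version raises a ValueError on empty input because there is nothing to unpack into the two names.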
Benchmarking code:
import timeit

# zip() version:
# http://stackoverflow.com/questions/743248/something-wrong-with-output-from-list-in-python/743313#743313
code_zip = """english2german = open('english2german.txt')
eng, ger = zip(*( line.split() for line in english2german ))
"""

# for-loop version:
# http://stackoverflow.com/questions/743248/something-wrong-with-output-from-list-in-python/743268#743268
code_for = """english2german = open("english2german.txt")
englist = []
gerlist = []
for line in english2german:
    (e, g) = line.split()
    englist.append(e)
    gerlist.append(g)
"""

for code in [code_zip, code_for]:
    t = timeit.Timer(stmt = code)
    try:
        # 10 batches of 10,000 executions each; every entry in `times`
        # is the total time for one batch.
        times = t.repeat(10, 10000)
    except:
        t.print_exc()
    else:
        print "Code:"
        print code
        print "Time:"
        print times
        print "Average:"
        print sum(times) / len(times)
        print "-" * 20
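The script above is Python 2. For anyone wanting to repeat the comparison on Python 3, a rough port might look like the following. It is a sketch, not the code that produced the numbers in this answer: the benchmark and run_demo helpers, the temporary three-line sample file, and the default repeat and number values are all my assumptions.

```python
import os
import tempfile
import timeit

# Same two snippets as above, reading the file name from a `path` variable.
CODE_ZIP = """
with open(path) as english2german:
    eng, ger = zip(*(line.split() for line in english2german))
"""

CODE_FOR = """
with open(path) as english2german:
    englist = []
    gerlist = []
    for line in english2german:
        e, g = line.split()
        englist.append(e)
        gerlist.append(g)
"""


def benchmark(path, repeat=3, number=100):
    """Return {label: average seconds per batch of `number` runs}."""
    results = {}
    for label, code in (("zip", CODE_ZIP), ("for", CODE_FOR)):
        # `globals` makes `path` visible to the timed statement.
        timer = timeit.Timer(stmt=code, globals={"path": path})
        times = timer.repeat(repeat=repeat, number=number)
        results[label] = sum(times) / len(times)
    return results


def run_demo():
    # Made-up three-line sample; pass a real file to benchmark() instead
    # to time larger inputs.
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
        tmp.write("one eins\ntwo zwei\nthree drei\n")
    try:
        return benchmark(tmp.name)
    finally:
        os.remove(tmp.name)
```

Running print(run_demo()) prints the two averages. Timings will vary between machines and Python versions, so they should not be expected to match the 2.5.1 figures.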