ansaurus

Question

Something wrong with output from list in python

Answer 1

+6 A:

You want something like this:

english2german = open("english2german.txt")
englist = []
gerlist = []

for line in english2german:
    (e, g) = line.split()
    englist.append(e)
    gerlist.append(g)

The problem with your original code is that englist[i:] is a slice of the list, not a single index. A string is also iterable, so assigning one to a slice spreads its letters across several indices. In other words, gerlist[0:] = "alfa" results in gerlist = ['a', 'l', 'f', 'a'].
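
A minimal sketch of that behaviour, using made-up list names (sliced and appended are illustrative, not from the question), contrasted with append():

```python
# Slice assignment replaces a range of the list with the items of any
# iterable. A string iterates character by character, so every letter
# becomes its own list element.
sliced = []
sliced[0:] = "alfa"
print(sliced)    # ['a', 'l', 'f', 'a']

# append() stores the whole string as one element instead.
appended = []
appended.append("alfa")
print(appended)  # ['alfa']
```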

mipadi 2009-04-13 07:30:34

+1 for readability.

tgray 2009-04-13 14:06:42

Answer 2

+1 A:

Like this you mean:

english2german = open('k.txt', 'r')
englist = []
gerlist = []

for line in english2german:
    words = line.split()  # split once and reuse the result
    englist.append(words[0])
    gerlist.append(words[1])

print englist
print gerlist

which generates:

['A', 'B', 'C']
['Alfa', 'Betta', 'Charlie']

amo-ej1 2009-04-13 07:32:34

Answer 3

+5 A:

Even shorter, and likely faster:

In [1]: english2german = open('english2german.txt')
In [2]: eng, ger = zip(*( line.split() for line in english2german ))
In [3]: eng
Out[3]: ('A', 'B', 'C')
In [4]: ger
Out[4]: ('Alfa', 'Betta', 'Charlie')

If you're using Python 3.0 or from future_builtins import zip, this is memory-efficient too. Otherwise, replace zip with izip from itertools if english2german is very long.

Autoplectic 2009-04-13 07:58:57

That's.. horrible. It might be faster, but I really doubt it's "usefully-faster", and it's far harder to read (the * especially)

dbr 2009-04-13 13:59:50

It's the 'unzip' operation; it's a fairly common idiom to join up pairs of things.

Autoplectic 2009-04-13 14:14:35
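
For readers unfamiliar with the idiom discussed in these comments, here is a small sketch of what the * does in zip(*...). The pairs list is sample data mirroring the question's output, not the original file:

```python
# Each inner tuple plays the role of one line's split() result.
pairs = [("A", "Alfa"), ("B", "Betta"), ("C", "Charlie")]

# The * unpacks the list into separate arguments, so this call is
# equivalent to zip(("A", "Alfa"), ("B", "Betta"), ("C", "Charlie")).
# zip then groups the first items together and the second items together.
eng, ger = zip(*pairs)
print(eng)  # ('A', 'B', 'C')
print(ger)  # ('Alfa', 'Betta', 'Charlie')

# Zipping the two results back together restores the original pairs.
rezipped = list(zip(eng, ger))
```

Note that the unpacking into eng, ger only works when every line yields exactly two fields.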

I've benchmarked the zip method against the code in mipadi's answer. zip is slightly slower with a small set of data and slightly quicker with 10,000 lines, but the difference is only about 0.05 seconds in each case.

dbr 2009-04-13 14:40:20

Answer 4

+1 A:

The solutions already posted are OK if there are no spaces in any of the words (i.e. each line contains a single space). If I understand correctly, you are trying to build a dictionary, so I suggest you consider that definitions can also be multi-word expressions. In that case, you'd be better off separating the word from its definition with some character other than a space. Something like "|", which cannot appear in a word.

Then, you do something like this:

for line in english2german:
    # strip the trailing newline so it does not end up in g
    (e, g) = line.rstrip("\n").split("|")
    englist.append(e)
    gerlist.append(g)

ionut bizau 2009-04-13 08:46:23

-1: changing the file format. Use partition instead of split: same effect, no change to the file format.

S.Lott 2009-04-13 10:10:00

Oh well, I didn't say he *has* to change the file format! I just *suggested*. I don't really see how partition can fix the problem I described, anyway.

ionut bizau 2009-04-13 10:45:29
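
To make the disagreement in these comments concrete, here is a hedged sketch of what str.partition does with multi-word entries. The helper split_entry and the sample lines are hypothetical, not taken from the question's file:

```python
def split_entry(line, sep=" "):
    """Split one line at the FIRST occurrence of sep (hypothetical helper)."""
    english, _, german = line.rstrip("\n").partition(sep)
    return english, german

# One word on each side: same result as split().
single = split_entry("A Alfa\n")

# Multi-word German side: partition keeps everything after the first space,
# whereas (e, g) = line.split() would raise ValueError (three fields).
multi_german = split_entry("C Charlie Chaplin\n")

# Multi-word English side: the first space is the wrong place to cut,
# so a space-separated file stays ambiguous here.
multi_english = split_entry("ice cream Eiscreme\n")

# With an explicit separator character the cut is unambiguous.
piped = split_entry("ice cream|Eiscreme\n", sep="|")

print(single, multi_german, multi_english, piped)
```

So partition handles a multi-word definition without changing the file format, but only a dedicated separator resolves multi-word terms on the left-hand side.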

Answer 5

+2 A:

Just an addition: you're working with files, so please close them :) or use the with construct:

with open('english2german.txt') as english2german:
  englist, gerlist = zip(*(line.split() for line in english2german))

ZeD 2009-04-13 14:04:15

Answer 6

+1 A:

Slightly meta-answer(?) to Autoplectic's suggestion of using zip()

With 3 lines in the input file (from the supplied data in the question):

The zip() method takes an average of 0.404729390144 seconds per 10,000 runs, compared to 0.341339087486 seconds for the simple for loop that builds two lists (the code from mipadi's currently accepted answer).

With 10,000 lines in the input file (randomly generated 3-12 character words; I reduced the timeit.repeat() values to 100 runs, repeated twice):

zip() took an average of 1.43965339661 seconds, compared to 1.52318406105 seconds with the for loop.

Both benchmarks were done using Python version 2.5.1.

Hardly a huge difference. Given how much more readable the simple for loop is, I would recommend using it. The zip code might be a bit quicker with large files, but the difference is only about 0.083 seconds over 100 runs with 10,000 lines.

Benchmarking code:

import timeit

# http://stackoverflow.com/questions/743248/something-wrong-with-output-from-list-in-python/743313#743313
code_zip = """english2german = open('english2german.txt')
eng, ger = zip(*( line.split() for line in english2german ))
"""

# http://stackoverflow.com/questions/743248/something-wrong-with-output-from-list-in-python/743268#743268
code_for = """english2german = open("english2german.txt")
englist = []
gerlist = []

for line in english2german:
    (e, g) = line.split()
    englist.append(e)
    gerlist.append(g)
"""

for code in [code_zip, code_for]:
    t = timeit.Timer(stmt = code)
    try:
        times = t.repeat(10, 10000)
    except Exception:
        t.print_exc()
    else:
        print "Code:"
        print code
        print "Time:"
        print times
        print "Average:"
        print sum(times) / len(times)
        print "-" * 20

dbr 2009-04-13 14:37:42
