I want Python to read a list of words from a text file and print the contents of the file as two lists. The data in the text file is in this form:

A Alfa
B Betta
C Charlie

I want Python to print one list with A, B, C and one with Alfa, Betta, Charlie.

This is what I've written:

english2german = open('english2german.txt', 'r')
englist = []
gerlist = []

for i, line in enumerate(english2german):
    englist[i:], gerlist[i:] = line.split()

This makes two lists, but it only prints the first letter of each word. How can I make my code print the whole word?

+6  A: 

You want something like this:

english2german = open("english2german.txt")
englist = []
gerlist = []

for line in english2german:
    (e, g) = line.split()
    englist.append(e)
    gerlist.append(g)

The problem with your original code is that englist[i:] is a slice of the list, not a single index. Assigning to a slice replaces that range with the items of the right-hand side, and a string is also iterable, so you were storing each letter as a separate element. In other words, something like gerlist[0:] = "alfa" actually results in gerlist = ['a', 'l', 'f', 'a'].
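
To make the slice behaviour concrete, here is a minimal sketch that replays the question's loop on the sample data. The in-memory list named lines and the names fixed and broken are stand-ins chosen for illustration, not part of the original code.

```python
# Slice assignment replaces a range of the list with the *items* of the
# right-hand side; a string is iterated character by character.
gerlist = []
gerlist[0:] = "Alfa"
print(gerlist)  # ['A', 'l', 'f', 'a']

# append() adds the whole string as a single element.
fixed = []
fixed.append("Alfa")
print(fixed)  # ['Alfa']

# Replaying the question's loop on the sample data (hypothetical
# in-memory lines instead of the real file).
lines = ["A Alfa", "B Betta", "C Charlie"]
englist, broken = [], []
for i, line in enumerate(lines):
    englist[i:], broken[i:] = line.split()

print(englist)  # ['A', 'B', 'C']
print(broken)   # ['A', 'B', 'C', 'h', 'a', 'r', 'l', 'i', 'e']
```

The English list happens to come out right only because each key is a single character; the second list shows the first letters followed by the leftover characters of the last word.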

mipadi
+1 for readability.
tgray
+1  A: 

Like this you mean:

english2german = open('k.txt', 'r')
englist = []
gerlist = []

for line in english2german:
    # Split each line once into its two whitespace-separated words.
    words = line.split()
    englist.append(words[0])
    gerlist.append(words[1])

print(englist)
print(gerlist)

which generates:

['A', 'B', 'C']
['Alfa', 'Betta', 'Charlie']

amo-ej1
+5  A: 

And even shorter, and likely faster:

In [1]: english2german = open('english2german.txt')
In [2]: eng, ger = zip(*( line.split() for line in english2german ))
In [3]: eng
Out[3]: ('A', 'B', 'C')
In [4]: ger
Out[4]: ('Alfa', 'Betta', 'Charlie')

If you're using Python 3.0, or have done from future_builtins import zip, this is memory-efficient too. Otherwise, replace zip with izip from itertools if english2german is very long.
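
For anyone puzzled by the *, here is a small sketch of what the unzip idiom does, using a hypothetical in-memory list named pairs in place of the split lines from the file. Because * has to expand every row into a separate argument, all rows are read before zip runs.

```python
# Pairs as they would come out of line.split() for each line of the file.
pairs = [["A", "Alfa"], ["B", "Betta"], ["C", "Charlie"]]

# zip(*pairs) is the same call as zip(pairs[0], pairs[1], pairs[2]):
# it groups all first items together and all second items together.
eng, ger = zip(*pairs)
print(eng)  # ('A', 'B', 'C')
print(ger)  # ('Alfa', 'Betta', 'Charlie')

# The results are tuples; convert them if mutable lists are needed.
englist, gerlist = list(eng), list(ger)
```

Note that the two-name unpacking on the left only works if every line splits into exactly two words.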

Autoplectic
That's.. horrible. It might be faster, but I really doubt it's "usefully-faster", and it's far harder to read (the * especially)
dbr
It's the 'unzip' operation; it's a fairly common idiom to join up pairs of things.
Autoplectic
I've benchmarked the zip method against the code in mipadi's answer. zip is slightly slower with a small set of data, but slightly quicker with 10,000 lines... but the difference is about 0.05 on each..
dbr
+1  A: 

The solutions already posted are OK if there are no spaces in any of the words (i.e. each line contains a single space). If I understand correctly, you are trying to build a dictionary, so I suggest you consider that definitions can also be multi-word expressions. In that case, you'd be better off using some character other than a space to separate the word from its definition. Something like "|", which cannot appear in a word.

Then, you do something like this:

for line in english2german:
    # Strip the trailing newline first; otherwise it ends up in g.
    (e, g) = line.rstrip("\n").split("|")
    englist.append(e)
    gerlist.append(g)
ionut bizau
-1: changing the file format. Use partition instead of split -- same effect -- no change to the file format.
S.Lott
Oh well, I didn't say he *has* to change the file format! I just *suggested*. I don't really see how partition can fix the problem I described, anyway.
ionut bizau
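
For reference, a minimal sketch of the partition suggestion from the comments, assuming the English key is always a single word and only the translation may contain spaces. The sample lines are made up for illustration. It does not help when the key itself contains a space, which is the case a dedicated separator such as "|" covers.

```python
# Hypothetical sample lines; the second one has a multi-word translation.
lines = ["A Alfa\n", "B Betta Bravo\n", "C Charlie\n"]

englist = []
gerlist = []
for line in lines:
    # partition() splits at the first space only and always returns
    # a 3-tuple: (text before, separator, text after).
    e, _sep, g = line.strip().partition(" ")
    englist.append(e)
    gerlist.append(g)

print(englist)  # ['A', 'B', 'C']
print(gerlist)  # ['Alfa', 'Betta Bravo', 'Charlie']
```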
+2  A: 

Just an addition: you're working with files, so please close them :) or use the with construct:

with open('english2german.txt') as english2german:
  englist, gerlist = zip(*(line.split() for line in english2german))
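
The same construct works with the plain for loop from the accepted answer. Here is a self-contained sketch; the temporary file is only there so the example runs on its own, and its path stands in for english2german.txt.

```python
import os
import tempfile

# Write the question's sample data to a temporary file so the sketch
# runs on its own; this path is a stand-in for english2german.txt.
path = os.path.join(tempfile.mkdtemp(), "english2german.txt")
with open(path, "w") as f:
    f.write("A Alfa\nB Betta\nC Charlie\n")

englist = []
gerlist = []
with open(path) as english2german:
    for line in english2german:
        e, g = line.split()
        englist.append(e)
        gerlist.append(g)

# The file is closed once the with block exits, even after an error.
print(english2german.closed)  # True
print(englist)  # ['A', 'B', 'C']
print(gerlist)  # ['Alfa', 'Betta', 'Charlie']
```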
ZeD
+1  A: 

Slightly meta-answer(?) to Autoplectic's suggestion of using zip()

With 3 lines in the input file (from the supplied data in the question):

The zip() method takes an average of 0.404729390144 seconds, compared to 0.341339087486 with the simple for loop constructing two lists (the code from mipadi's currently accepted answer).

With 10,000 lines in the input file (randomly generated words of 3-12 characters; I reduced the timeit.repeat() values to 100 runs, repeated twice):

zip() took an average of 1.43965339661 seconds, compared to 1.52318406105 with the for loop.

Both benchmarks were done using Python version 2.5.1

Hardly a huge difference. Given how much more readable the simple for loop is, I would recommend using it. The zip code might be a bit quicker with large files, but the difference is only about 0.083 seconds with 10,000 lines.
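
The script that generated the 10,000-line input is not shown here. A rough sketch of how such a file could be produced, assuming lowercase ASCII words and two words per line; the helper names random_word and write_test_file, the fixed seed, and the temporary path are all made up for illustration:

```python
import os
import random
import string
import tempfile


def random_word(rng, min_len=3, max_len=12):
    """Return a random lowercase word of min_len to max_len characters."""
    length = rng.randint(min_len, max_len)  # both ends inclusive
    return "".join(rng.choice(string.ascii_lowercase) for _ in range(length))


def write_test_file(path, n_lines=10000, seed=0):
    """Write n_lines lines of the form '<word> <word>' to path."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    with open(path, "w") as f:
        for _ in range(n_lines):
            f.write("%s %s\n" % (random_word(rng), random_word(rng)))


# Written to a temporary directory here; the benchmark itself expects
# the file to be named english2german.txt in the working directory.
path = os.path.join(tempfile.mkdtemp(), "english2german.txt")
write_test_file(path)
```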

Benchmarking code:

import timeit

# http://stackoverflow.com/questions/743248/something-wrong-with-output-from-list-in-python/743313#743313
code_zip = """english2german = open('english2german.txt')
eng, ger = zip(*( line.split() for line in english2german ))
"""

# http://stackoverflow.com/questions/743248/something-wrong-with-output-from-list-in-python/743268#743268
code_for = """english2german = open("english2german.txt")
englist = []
gerlist = []

for line in english2german:
    (e, g) = line.split()
    englist.append(e)
    gerlist.append(g)
"""

for code in [code_zip, code_for]:
    t = timeit.Timer(stmt=code)
    try:
        # 10 repeats of 10,000 executions each
        times = t.repeat(10, 10000)
    except Exception:
        # Print a traceback that points into the timed snippet.
        t.print_exc()
    else:
        print("Code:")
        print(code)
        print("Time:")
        print(times)
        print("Average:")
        print(sum(times) / len(times))
        print("-" * 20)
dbr