I'm a Python newbie and am having trouble grokking nested list comprehensions. I'm trying to write some code that reads in a file and builds, for each line, a list of that line's characters.

So if the file contains:

xxxcd
cdcdjkhjasld
asdasdxasda

The resulting list would be:

[
['x','x','x','c','d'],
['c','d','c','d','j','k','h','j','a','s','l','d'],
['a','s','d','a','s','d','x','a','s','d','a']
]

I have written the following code, and it works, but I have a nagging feeling that I should be able to write a nested list comprehension to do this in fewer lines of code. Any suggestions would be appreciated.

data = []
f = open(file,'r')
for line in f:
    line = line.strip().upper()
    chars = []  # not named `list`, which would shadow the built-in
    for c in line:
        chars.append(c)
    data.append(chars)
+1  A: 
data = [list(line.strip().upper()) for line in open(file,'r')]
interjay
+1, but I'd rather leave each entry in the list as a string. Strings can be indexed, so you can still treat the result like a two-dimensional array of characters. The `list(...)` part is only needed if the OP plans to make changes to the rows.
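
A minimal sketch of that idea, assuming the sample data from the question. The `io.StringIO` object is only a stand-in for the real file so the snippet runs on its own, and `rows` is a name made up for illustration:

```python
import io

# Stand-in for the real file so the example is self-contained.
f = io.StringIO("xxxcd\ncdcdjkhjasld\nasdasdxasda\n")

# Keep each row as a string instead of a list of characters.
rows = [line.strip().upper() for line in f]

print(rows[1][4])    # row 1, column 4 -> 'J'
print(rows[0][:3])   # slicing works too -> 'XXX'
print(len(rows[2]))  # -> 11

# Item assignment is the one thing that fails: strings are immutable.
try:
    rows[0][0] = "Y"
except TypeError as exc:
    print("cannot assign:", exc)
```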
Chris Lutz
+1  A: 

Here is one level of list comprehension.

data = []
f = open(file,'r')

for line in f:
    data.append([ch for ch in line.strip().upper()])

But we can do the whole thing in one go:

f = open(file, 'rt')
data = [list(line.strip().upper()) for line in f]

This is using list() to convert a string to a list of single-character strings. We could also use nested list comprehensions, and put the open() inline:

data = [[ch for ch in line.strip().upper()] for line in open(file, 'rt')]

At this point, though, I think the list comprehensions are detracting from easy readability of what is going on.

For complicated processing, such as lists inside lists, you might want to use a for loop for the outer layer and a list comprehension for the inner loop.

Also, as Chris Lutz said in a comment, in this case there really isn't a reason to explicitly split each line into a list of characters; you can index, slice, and iterate over a string just as you would a list, and you can use string methods on a string, but not on a list. (Well, you could use ''.join() to turn the list back into a string, but why not just leave it as a string?)
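
A small illustration of that difference, using one line from the question's sample data (the variable names here are just for the example):

```python
row = "xxxcd"
chars = list(row)  # ['x', 'x', 'x', 'c', 'd']

# String methods work directly on the string...
print(row.upper())            # 'XXXCD'
print(row.replace("x", "-"))  # '---cd'

# ...but a list of characters has no such methods.
try:
    chars.upper()
except AttributeError as exc:
    print("lists have no string methods:", exc)

# Getting them back requires re-joining the list first.
rejoined = "".join(chars)
print(rejoined.upper())       # 'XXXCD'
```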

steveha
+5  A: 

This should help (you'll probably have to play around with it to strip the newlines or format it however you want, but the basic idea should work):

f = open(r"temp.txt")
[[c for c in line] for line in f]
Edan Maor
A: 

First off, you could combine the line.strip().upper() part with your outer for-loop, like this:

for line in [l.strip().upper() for l in f]:
    # do stuff

Then you could make the iteration over the characters into a list comprehension, but it wouldn't be shorter or clearer. The neatest way to do what you do there is this:

list(someString)

Thus you could do:

data = [list(l.strip().upper()) for l in f]

I don't know if it states your intentions that well, though. Error handling is also an issue: the whole expression will die if there is a problem along the way.
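
One possible way to deal with that, sketched under some assumptions: `read_char_rows` is a made-up helper name, returning None on failure is just one choice, and only I/O errors (OSError) are caught. The temporary directory exists only so the snippet can run on its own:

```python
import os
import tempfile

def read_char_rows(path):
    """Return one list of characters per line, or None if the file can't be read."""
    try:
        # The with-block closes the file even if the comprehension raises.
        with open(path, "r") as f:
            return [list(line.strip().upper()) for line in f]
    except OSError as exc:
        print("could not read", path, "-", exc)
        return None

# Demo: one file that exists and one that doesn't.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.txt")
    with open(path, "w") as out:
        out.write("xxxcd\nab\n")
    good = read_char_rows(path)
    missing = read_char_rows(os.path.join(tmp, "no_such_file.txt"))

print(good)
print(missing)
```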


If you don't need to store the whole file and all the lines in memory, you could make it into a generator expression. This is very useful when processing huge files and you only need to process a chunk at a time. Generator expressions use parentheses instead, like so:

data = (list(l.strip().upper()) for l in f)

data will become a generator which runs the expression for each line in the file, but only when you iterate over it; compare that to a list comprehension, which builds the whole list in memory at once. Note that data is not a list but a generator, and is more akin to an iterator in C++ or an IEnumerator in C#.

A generator can be fed into a list easily: list(someGenerator). That would defeat the purpose somewhat, but is sometimes a necessity.
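
To make the lazy behaviour concrete, here is a small sketch. The `io.StringIO` object stands in for the real file so the snippet is self-contained, and the names `first`, `rest` and `leftover` are only for illustration:

```python
import io

# Stand-in for the real file.
f = io.StringIO("xxxcd\ncdcdjkhjasld\nasdasdxasda\n")

# Nothing is read yet; this only sets up the generator.
data = (list(l.strip().upper()) for l in f)

first = next(data)     # reads and processes just the first line
rest = list(data)      # consumes whatever lines remain
leftover = list(data)  # a generator can only be iterated once

print(first)
print(len(rest))
print(leftover)
```

Because a generator is exhausted after one pass, converting it with list() is the way to go if the rows are needed more than once.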

Skurmedel
A: 
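>>> f = open('teste.txt')
>>> # rstrip('\n') instead of [:-1], so a last line without a newline keeps its final character
>>> print(list(map(lambda x: list(x.rstrip('\n')), f)))
[['x', 'x', 'x', 'c', 'd'], ['c', 'd', 'c', 'd', 'j', 'k', 'h', 'j', 'a', 's', 'l', 'd'], ['a', 's', 'd', 'a', 's', 'd', 'x', 'a', 's', 'd', 'a']]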
jbochi
+2  A: 

In your case, you can use the list constructor to handle the inner loop and a list comprehension for the outer loop. Something like:

f = open(file)
data = [list(line.strip().upper()) for line in f]

Given a string as input, the list constructor will create a list where each character of the string is a single element in the list.

The list comprehension is functionally equivalent to:

data = []
for line in f:
    data.append(list(line.strip().upper()))
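
To see what the list constructor does with a string on its own, here is a tiny sketch (the sample string is taken from the question; the variable names are arbitrary):

```python
s = "xxxcd"

# list() iterates over the string and collects each character.
as_list = list(s)
print(as_list)        # ['x', 'x', 'x', 'c', 'd']

# It gives the same result as the explicit inner comprehension.
same = [c for c in s]
print(as_list == same)

# An empty string yields an empty list.
print(list(""))       # []
```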
Whisty
+1  A: 

The only really significant difference between strings and lists of characters is that strings are immutable. You can iterate over and slice strings just as you would lists. And it's much more convenient to handle strings as strings, since they support string methods and lists don't.

So for most applications, I wouldn't bother converting the items in data to a list; I'd just do:

data = [line.strip() for line in open(filename, 'r')]

When I needed to manipulate strings in data as mutable lists, I'd use list to convert them, and join to put them back, e.g.:

data[2] = ''.join(sorted(list(data[2])))

Of course, if all you're going to do with these strings is modify them, then go ahead, store them as lists.
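
A short sketch of that round trip, with the question's sample lines hard-coded so it runs on its own (the particular edits are arbitrary and only there to show in-place mutation):

```python
data = ["xxxcd", "cdcdjkhjasld", "asdasdxasda"]

# Convert one row to a mutable list, change it in place...
chars = list(data[0])
chars[0] = "Y"
chars.reverse()

# ...then join it back into a string and store it again.
data[0] = "".join(chars)
print(data[0])  # 'dcxxY'
```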

Robert Rossney