I am new to Python. Can anybody please explain the following syntax?

               for i in [line.split('"') for line in open('a.txt')]:
                    ......
                    ......
                    ......
+1  A: 

Try running the code in parts, as in this sample: http://diveintopython.org/power_of_introspection/index.html#apihelper.divein

Felipe Cardoso Martins
Thanks for the explanation
Rajeev
+4  A: 

The file a.txt is opened and read line by line
For each line from the file, the line is split on the " characters --> we'll call these tokens.
The lines of code in the indented block presumably use these tokens somehow

In a nutshell, this would parse the contents of a file into tokens delimited with either newline characters or quotation mark characters.

If the input file is:

ab"cdef"g
h"ijk"lmno"p
q

.. the program would return the tokens:

ab
cdef
g\n
h
ijk
lmno
p\n
q\n
bguiz
Thanks for the explanation
Rajeev
this isn't quite correct. It would return `["ab", "cdef", "g"] ["h", ...] ...`
cobbal
... and, importantly, it'll read the entire file into memory instead of reading one line at a time. Don't do it.
Glenn Maynard
more than "isn't quite correct"; see my answer
John Machin
@John Machin : I've added the newline characters, thanks for pointing that out
bguiz
@John Machin : Also here I have used the concept of tokens, which I am aware are in a 2-dimensional data structure; I have flattened it for simplicity - good enough if all you are concerned with is the value of `i` in the block.
bguiz
@bguiz: you need to fix "delimited with either newline characters or quotation mark characters". "Flattened for simplicity" is simply wrong.
John Machin
+6  A: 
afile = open('a.txt')
for line in afile:
    for field in line.split('"'):
        pass  # do something with field

There really isn't much good reason for crowding such a simple concept into such a hard-to-read expression.

msw
Thanks for the explanation
Rajeev
+1  A: 

Well, first off, what do you get if you run the code with a file a.txt containing, say:

"This", is a "test"
Here we "GO" again
"Quoted Line"
and no quotes?

Printing i on each pass through the loop gives:

['', 'This', ', is a ', 'test', '\n']
['Here we ', 'GO', ' again\n']
['', 'Quoted Line', '\n']
['and no quotes?\n']

So, you are getting a list for each line, where the line lists consist of the contents of the string divided by the quote " character.
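
To reproduce that output without creating a file, here is a minimal sketch. It assumes an in-memory io.StringIO object as a stand-in for open('a.txt'); the names sample and rows are made up for illustration.

```python
import io

# In-memory stand-in for a.txt (an assumption, so no real file is needed).
sample = io.StringIO(
    u'"This", is a "test"\n'
    u'Here we "GO" again\n'
    u'"Quoted Line"\n'
    u'and no quotes?\n'
)

# Same shape as the original line: one list of fields per line of input.
rows = [line.split('"') for line in sample]

for i in rows:
    print(i)
```

Note that a line starting with a quote produces an empty string as its first field, and the newline stays attached to the last field.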

So try running these:

for line in open('a.txt'):
    print line

So, it will iterate over the lines of the file a.txt, which has been opened with open.

line_parts = [line.split('"') for line in open('a.txt')]
print line_parts

The first line is what is referred to as a list comprehension. It runs the split method on each line of the file and collects the results in a list.

So, now that you see the output of these, you can hopefully see why it is doing what it is doing. Let me know if this makes any sense. I had a couple of drinks :)

sberry2A
A: 

It will read the file 'a.txt' line by line and split each line using the delimiter '"'. Each split produces a list, so once reading is done the comprehension has built a list of lists. The for loop then assigns each inner list to 'i' in turn.

Example:
Personal firewall "software may warn about the connection IDLE
Personal firewall "software" may warn about the connection IDLE
Personal "firewall" "software may warn about the" connection IDLE
Output:
['Personal firewall ', 'software may warn about the connection IDLE\n']
['Personal firewall ', 'software', ' may warn about the connection IDLE\n']
['Personal ', 'firewall', ' ', 'software may warn about the', ' connection IDLE\n']
Favonius
+1  A: 

Other than judgment calls on whether that obfuscates or not, to understand it you need to look up list comprehension in python, and the functions used there.

A list comprehension is a way to create lists that is efficient in both performance and character count, at the cost of readability in some cases.

E.g., if you have a list [1,2,3,4] and you want one containing the same elements with 1 added to each, you could do something like

originalList = [1,2,3,4]
newList = [x+1 for x in originalList]
print newList

This gets even more interesting (and occasionally unreadable) when you introduce lambdas and multidimensionality.

With that in mind, to puzzle out that line of code you have to read it backwards. open() returns a file object for the text file. The "for line in open(...)" part iterates over that object, giving you one line of the file per iteration; it plays the role of "for x in originalList" in the example above.

The content of your iterator, in that list comprehension, being a string, has a split method, which is used there.

At the end of that segment between square brackets, you have a list of elements (coming from split) that were separated by '"' originally, and each of those is an entry in the list created by your list comprehension.

The result of that list comprehension is then iterated over again in that "for i in[]".
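
Read inside-out like that, the one-liner can be unrolled into explicit steps. This is only a sketch: source is a hypothetical in-memory stand-in for open('a.txt'), and split_lines is a name invented here.

```python
import io

# Hypothetical stand-in for open('a.txt'); any iterable of lines works.
source = io.StringIO(u'ab"cdef"g\nh"ijk"lmno"p\nq\n')

# Step 1: iterating over the file object yields one line (a string) at a time.
# Step 2: each line is split on the double-quote character, giving a list.
# Step 3: the comprehension collects those lists into one outer list.
split_lines = []
for line in source:
    split_lines.append(line.split('"'))

# Step 4: the outer "for i in [...]" then walks the collected lists,
# so i is a whole list of fields on each pass, not a single field.
for i in split_lines:
    print(i)
```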

ThE_JacO
+4  A: 

After changing the extremely badly named i to fields, the original code is equivalent to:

file_handle = open('a.txt') # open the file
for line in file_handle: # iterate over lines in the file
    fields = line.split('"') # split line into fields
    # === End of equivalent code ===
    # Now do something with fields, for example:
    for field in fields:
        pass  # Now do something with field

Using a list comprehension in the manner of the original code is both obfuscatory and inefficient. As shown by the above equivalent code, there is no need for a list comprehension at all. The original code builds a temporary list, which is immediately iterated over and finally discarded, perhaps after occupying a large chunk of memory for a long time.
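
If a one-line form is still wanted, a generator expression (parentheses instead of square brackets) is a different technique that avoids building the temporary list, because it produces each split line only when the loop asks for it. A minimal sketch, assuming an in-memory stand-in for the file; split_lazily and source are hypothetical names:

```python
import io


def split_lazily(lines):
    """Return a generator that splits each line on '"' only when requested."""
    return (line.split('"') for line in lines)


# Hypothetical stand-in for open('a.txt').
source = io.StringIO(u'ab"cdef"g\nh"ijk"lmno"p\nq\n')

fields_iter = split_lazily(source)

# Nothing has been read yet; next() pulls and splits just the first line.
first = next(fields_iter)

# The remaining lines are consumed one at a time as the iterator advances.
rest = list(fields_iter)
```

The plain for loop shown above remains the more readable choice; the generator form only matters when the expression has to stay on one line.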

Note: the currently accepted answer is incorrect. (1) The original code does NOT produce a field/token at a time; each i will refer to a list of fields (2) fields are NOT separated by newlines; in fact the last field for each line (except perhaps the last line) will include a newline. See below.

If the input file is:

ab"cdef"g
h"ijk"lmno"p
q

then the "values" of i will be

['ab', 'cdef', 'g\n']
['h', 'ijk', 'lmno', 'p\n']
['q\n']
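
If the trailing newline in the last field is unwanted, one option is to strip it before splitting. The original code does not do this; split_fields is a hypothetical helper shown only as a sketch.

```python
def split_fields(line):
    """Split a line on '"' after removing any trailing newline characters."""
    return line.rstrip('\n').split('"')


lines = ['ab"cdef"g\n', 'h"ijk"lmno"p\n', 'q\n']
cleaned = [split_fields(line) for line in lines]
```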

Aside: having fields separated by quotes is rather unusual, isn't it?

John Machin