I am very new to Python and BeautifulSoup.

In the for statement, what is incident? Is it a class, a type, or a variable? The line following the for has me totally lost.

Can someone please explain this code to me?

for incident in soup('td', width="90%"):
    where, linebreak, what = incident.contents[:3]
    print where.strip()
    print what.strip()
    break
print 'done'
+3  A: 

The first statement starts a loop which parses an HTML document looking for td elements with width set to 90%. The object representing the td element is bound to the name incident.

The second line is a multiple assignment and can be rewritten as follows:

where = incident.contents[0]
linebreak = incident.contents[1]
what = incident.contents[2]

In other words it extracts the contents from the td tag and gives each element a more meaningful name.
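
As a minimal sketch of that equivalence, here is the same unpacking on a plain list standing in for incident.contents. The list values are made up for illustration, and the snippet uses Python 3 syntax (print as a function) rather than the Python 2 print statement in the question.

```python
# Hypothetical stand-in for incident.contents: text, a line break, more text.
contents = ["  somewhere  ", "<br/>", "  something happened  ", "extra"]

# One statement: slice off the first three items and bind each to a name.
where, linebreak, what = contents[:3]

# Equivalent long form, one index per assignment.
where2 = contents[0]
linebreak2 = contents[1]
what2 = contents[2]

print(where.strip())
print(what.strip())
```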

The final line in the loop causes the loop to break after processing only the first element. The code could have been rewritten without a loop, which would have made it clearer.
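
One possible loop-free version, sketched with a plain list standing in for the result of soup('td', width="90%"); the data and the name cells are hypothetical. With BeautifulSoup itself, soup.find('td', width="90%") should give the first match directly, or None when nothing matches.

```python
# Hypothetical stand-in for soup('td', width="90%");
# each item mimics one incident.contents list.
cells = [
    ["  first place  ", "<br/>", "  first event  "],
    ["  second place  ", "<br/>", "  second event  "],
]

# Take the first match, or None when nothing matched, without a for/break pair.
first = next(iter(cells), None)

if first is not None:
    where, linebreak, what = first[:3]
    print(where.strip())
    print(what.strip())
print("done")
```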

Mark Byers
A: 

First off, Python cares about where newlines and spaces are, so you should use the code tag to present Python code. As is, I have to guess at how your code was originally formatted.

for incident in soup('td', width="90%"): 
    where, linebreak, what = incident.contents[:3] 
    print where.strip()
    print what.strip() 
    break 
print 'done'

The 'for x in y:' statement assumes that 'y' is some kind of iterable (list-like) thing - an ordered collection of objects. Then, for each element in the list, it assigns the element to the name 'x', and runs the indented block.
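
A small sketch of what "iterable" covers, using made-up values and Python 3 syntax: lists, tuples, strings, and dicts can all appear on the right-hand side of "in".

```python
# Each of these is iterable, so each works in a for statement.
for item in ["a", "b"]:         # list
    print(item)
for item in ("a", "b"):         # tuple
    print(item)
for char in "ab":               # string: yields one character at a time
    print(char)
for key in {"x": 1, "y": 2}:    # dict: yields its keys
    print(key)

# Collecting the loop variable shows what it was bound to on each pass.
seen = []
for char in "ab":
    seen.append(char)
```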

In this case, there appears to be a callable, soup(), which returns a list of incidents. Each incident is an object with an attribute called 'contents', which is itself a list; [:3] means 'the first three elements of the list'. So that line takes the first three things in the contents of the incident and assigns them the names 'where', 'linebreak', and 'what'. The strip() method removes whitespace from the start and end of a string. So we print the 'where' and the 'what'. 'break' exits from the for-loop, so in this case it only runs once, which is a little odd.

Russell Borogove
`ordered collection of objects` It doesn't have to be ordered
Falmarri
Thanks. I left out the top part of the code
NewB
page = urllib2.urlopen("http://www.icc-ccs.org/prc/piracyreport.php")
soup = BeautifulSoup(page)
NewB
The soup method is from BeautifulSoup. When I looked through the class browser, I expected to see a method on the class whose parameters matched the call (as in C++). What is the best way to find the method being invoked?
NewB
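
On the question of which method is invoked: calling an object, as in soup(...), runs the __call__ method of its class, and Python does not select methods by matching parameter types the way C++ overloading does. In BeautifulSoup, calling a soup or tag object forwards to its find-all search (findAll in version 3, find_all in bs4). The sketch below uses a hypothetical Finder class to show the mechanism in Python 3 syntax; it is not BeautifulSoup's actual code.

```python
class Finder:
    """Hypothetical class: calling an instance runs __call__."""

    def __init__(self, rows):
        self.rows = rows

    def find_all(self, name, **attrs):
        # Keep rows whose tag name and keyword attributes all match.
        return [
            row for row in self.rows
            if row["name"] == name
            and all(row.get(key) == value for key, value in attrs.items())
        ]

    def __call__(self, *args, **kwargs):
        # finder(...) is shorthand for finder.find_all(...).
        return self.find_all(*args, **kwargs)


finder = Finder([
    {"name": "td", "width": "90%"},
    {"name": "td", "width": "10%"},
])
matches = finder("td", width="90%")
print(matches)
```

In an interactive session, help(type(soup).__call__) is one way to see what a call on the object actually runs.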
"It doesn't have to be ordered" - pedantry. By the time the keys of a dict hit the for statement, they're in an order.
Russell Borogove
+1  A: 

Welcome to Stack Overflow! Let's take a look at what's happening. I've added links to further reading along the way, do take a look at them before asking further questions.

    for incident in soup('td', width="90%"): 

incident is just an arbitrary local variable that is bound, in turn, to each item of the iterable returned by soup. Generally speaking, the iterable in a for statement is probably a list, but may be a tuple or even a string. If it's possible to iterate over something, like a file, then Python will accept it in a for statement and go through its items.

In this case, soup is returning a list of td HTML elements with a width of 90%. We can see this because of what happens on the next line:

        where, linebreak, what = incident.contents[:3]

where, linebreak and what are all arbitrary local variables as well. They are all being assigned in a single statement. In Python, this is known as multiple assignment. Where do those three elements come from? incident.contents[:3] is asking for the first three elements of incident.contents, using slice notation.
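
A short sketch of how the slice and the assignment interact, with made-up lists and Python 3 syntax. Slicing past the end of a list is harmless, but the assignment needs exactly as many values as there are names, so a td with fewer than three children would raise an error on this line.

```python
contents = ["a", "b", "c", "d", "e"]
first_three = contents[:3]      # items at index 0, 1 and 2

short = ["a", "b"]
short_slice = short[:3]         # slicing past the end just gives fewer items

try:
    where, linebreak, what = short_slice
except ValueError as error:
    # Unpacking needs exactly as many values as there are names.
    message = str(error)
    print(message)
```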

        print where.strip()
        print what.strip()

These two lines print where and what onto the screen.¹ But what is strip doing? It's removing whitespace from both ends of the string. So, " some text " becomes "some text".
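
A quick sketch of strip and its one-sided relatives on a made-up string, in Python 3 syntax. repr is used so the remaining whitespace is visible in the output.

```python
text = "  some text \n"

print(repr(text.strip()))     # whitespace removed from both ends
print(repr(text.lstrip()))    # left end only
print(repr(text.rstrip()))    # right end only

# strip() returns a new string; the original is left unchanged.
print(repr(text))
```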

        break

break is just breaking the for loop after its first cycle. It doesn't break the whole program. Instead, it returns the program's flow to the next line after the loop.
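
The effect of break can be sketched on a plain list (made-up values, Python 3 syntax): the loop body runs once, and execution then carries on with the first statement after the loop.

```python
visited = []
for number in [1, 2, 3]:
    visited.append(number)
    break                     # leave the loop after the first pass

# Execution continues here once the loop has been left.
print(visited)
print("done")
```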

    print 'done'

This is just doing what it says, sending the word 'done' to the screen. If you are using this program, you know it is complete when you see 'done' (without the quotes) appear on the screen.

¹ To be more technically precise, they send the bytes to standard out (normally known as stdout).

Tim McNamara
`break` is necessary in this case because it ensures that the `for` loop only runs once.
aaronasterling
thanks @Aaron, will edit answer.
Tim McNamara