I am trying the sample code for the piracy report. The line of code

for incident in soup('td', width="90%"):

searches the soup for a td element with the attribute width="90%", correct? It invokes this method of class BeautifulStoneSoup(Tag, SGMLParser):

def __init__(self, markup="", parseOnlyThese=None, fromEncoding=None,
             markupMassage=True, smartQuotesTo=XML_ENTITIES,
             convertEntities=None, selfClosingTags=None, isHTML=False):

which eventually invokes

SGMLParser.__init__(self)

Am I correct about the class flow above?

The HTML in the report now looks like this:

<td class="fabrik_row___jos_fabrik_icc-ccs-piracymap2010___narrations" ><p>22.09.2010: 0236 UTC: Posn: 03:49.9N – 006:54.6E: Off Bonny River: Nigeria.<p/>
<p>About 21 armed pirates in three crafts boarded a pipe layer crane vessel undertow. All crew locked themselves in accommodations. Pirates were able to take one crewmember as hostage. Master called Nigerian naval vessel in vicinity. Later pirates released the crew and left the vessel. All crew safe.<p/></td>

There is no width attribute in the markup any more. I changed the line of code to look for the class instead:

for incident in soup('td', class="fabrik_row_jos_fabrik_icc-ccs-piracymap2010_narrations"):

It appears that class is a reserved word, maybe?

How do I get the current example code to run, and has more changed in the application than just the HTML output?

The URL I am using:

page = urllib2.urlopen("http://www.icc-ccs.org/index.php?option=com_fabrik&view=table&tableid=534&calculations=0&Itemid=82")
A: 

class is a reserved word in Python, so it cannot be passed as a keyword argument to that method.

This call works, but find returns only the first match rather than a list (findAll returns every match):

soup.find("tr", { "class" : "fabrik_row_jos_fabrik_icc-ccs-piracymap2010_narrations" })
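A minimal sketch of why the keyword form fails and the dictionary form does not. The find_all function below is a hypothetical stand-in, not BeautifulSoup's own method; it only echoes back the attribute filter it receives, and "narrations" is a shortened placeholder class name. The sketch is written for Python 3, unlike the Python 2 code in this thread.

```python
import keyword


def find_all(name, attrs=None, **kwargs):
    """Hypothetical stand-in for a soup search; returns the merged attribute filter."""
    merged = dict(attrs or {})
    merged.update(kwargs)
    return name, merged


# `class` is a Python keyword, so it cannot be written as a keyword argument.
is_reserved = keyword.iskeyword("class")

# Compiling the keyword form shows that it is rejected before it ever runs.
try:
    compile('find_all("td", class="narrations")', "<example>", "eval")
    keyword_form_ok = True
except SyntaxError:
    keyword_form_ok = False

# Passing the attribute in a dictionary avoids the keyword entirely.
name, class_filter = find_all("td", {"class": "narrations"})

# An ordinary attribute name such as width is fine as a keyword argument.
_, width_filter = find_all("td", width="90%")
```

The dictionary form is the same shape as the soup.find call above, which is why that call is accepted while class="..." is a syntax error.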

I also confirmed the class flow for the parse. The example will run, but the HTML must be parsed with different methods because width='90%' is no longer in the HTML.

I am still working on the proper methods and will post back when I get it working.

NewB
A: 

There must be a better way....

import urllib2
from BeautifulSoup import BeautifulSoup

page = urllib2.urlopen("http://www.icc-ccs.org/index.php?option=com_fabrik&view=table&tableid=534&calculations=0&Itemid=82")
soup = BeautifulSoup(page)
soup.find("table", {"class": "fabrikTable"})
list1 = soup.table.findAll('p', limit=50)

# Each report takes four entries in list1: the time is at i, the incident at i + 2.
i = 0
imax = len(list1)
# Stop while i + 2 is still a valid index, so the last report cannot run off the end.
while i + 2 < imax:
    Itime = list1[i]
    Incident = list1[i + 2]
    print "Time ", Itime
    print "Incident", Incident
    print " "
    i = i + 4
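One possible alternative is to select the narration cells by their class instead of counting paragraphs. The sketch below uses only the standard library's html.parser rather than BeautifulSoup, and it targets Python 3, so it is a different technique from the code above. It assumes each incident sits in a td whose class ends in "narrations", as in the HTML sample in the question. NarrationParser and SAMPLE are illustrative names, and the narrative text is shortened.

```python
from html.parser import HTMLParser

# Sample cell copied from the question, with the narrative shortened.
SAMPLE = (
    '<table><tr>'
    '<td class="fabrik_row___jos_fabrik_icc-ccs-piracymap2010___narrations">'
    '<p>22.09.2010: 0236 UTC: Posn: 03:49.9N – 006:54.6E: Off Bonny River: Nigeria.<p/>'
    '<p>About 21 armed pirates in three crafts boarded a pipe layer crane vessel undertow.<p/>'
    '</td></tr></table>'
)


class NarrationParser(HTMLParser):
    """Collects the non-empty text chunks of each <td> whose class ends in 'narrations'."""

    def __init__(self):
        super().__init__()
        self.in_cell = False
        self.incidents = []  # one list of text chunks per matching cell

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            cls = dict(attrs).get("class") or ""
            self.in_cell = cls.endswith("narrations")
            if self.in_cell:
                self.incidents.append([])

    def handle_endtag(self, tag):
        if tag == "td":
            self.in_cell = False

    def handle_data(self, data):
        text = data.strip()
        if self.in_cell and text:
            self.incidents[-1].append(text)


parser = NarrationParser()
parser.feed(SAMPLE)
parser.close()

for texts in parser.incidents:
    if texts:
        # First chunk is the time and position line; the rest is the narrative.
        print("Time    ", texts[0])
        print("Incident", " ".join(texts[1:]))
```

Because the empty p tags carry no text, they drop out on their own and no index stepping is needed.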

NewB