Working code: a Google dictionary lookup in Python using Beautiful Soup. Simply run it and enter a word.

I've simply extracted the first definition from a specific list item. However, to get plain text, I've had to split the data at the line break and then strip it to remove the superfluous list tag.

My question is: is there a way to extract the data contained within a specific list without the string manipulation above? Perhaps there is a function in Beautiful Soup that I have yet to see?

This is the relevant section of code:

    # Retrieve HTML and parse with BeautifulSoup.
    doc = userAgentSwitcher().open(queryURL).read()
    soup = BeautifulSoup(doc)

    # Extract the first list item and encode it.
    definition = soup('li', limit=2)[0].encode('utf-8')

    # Format the return as word : definition, removing superfluous data.
    # Note: strip("<li>") removes any of the characters '<', 'l', 'i', '>'
    # from both ends of the string, not the literal "<li>" tag.
    print word + " : " + definition.split("<br />")[0].strip("<li>")
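The strip step is more fragile than it looks, because str.strip() takes a set of characters rather than a literal prefix. This small stdlib-only sketch shows the effect; the sample list-item string and the remove_prefix helper are hypothetical and are not part of the question's code.

```python
# Hypothetical list item shaped like the one in the question.
item = "<li>light in weight<br />not heavy</li>"

# Same steps as the question: split at the line break, then strip "<li>".
first_line = item.split("<br />")[0]   # "<li>light in weight"
naive = first_line.strip("<li>")       # strips any of '<', 'l', 'i', '>' from both ends
print(naive)                           # the leading "li" of "light" is lost too


def remove_prefix(s, prefix):
    """Hypothetical helper: drop `prefix` only if `s` literally starts with it."""
    return s[len(prefix):] if s.startswith(prefix) else s


safe = remove_prefix(first_line, "<li>")
print(safe)
```

Any definition that begins or ends with "l" or "i" is truncated by the naive version, which is one more reason to let the parser hand back the text instead of post-processing the tag markup.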
A:

I think you are looking for findAll(text=True); it extracts the text from inside the tags:

    definitions = soup('ul')[0].findAll(text=True)

This returns a list of all the text contents, broken at the tag boundaries.
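For reference, here is a minimal sketch of the same idea with the current bs4 package, where findAll(text=True) is spelled find_all(string=True). It assumes bs4 is installed and uses a made-up HTML fragment shaped like the page described in the question, not Google's real markup.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment: each definition is an <li>, and the first line
# of the definition is followed by a <br />.
html = """
<ul>
  <li>a round fruit with red or green skin<br />an apple tree</li>
  <li>second definition</li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")

# Take the first list item, then ask for its text nodes only.
first_li = soup.find("li")
pieces = first_li.find_all(string=True)   # text split at tag boundaries (here, the <br />)

# The first text node is the part before the <br />, with no tag markup attached.
definition = pieces[0].strip()
print("apple : " + definition)
```

Searching inside the <li> rather than the whole <ul> avoids the whitespace-only text nodes that sit between the list items in this fragment, so pieces[0] is the definition itself.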

Jason Culverhouse