So I asked a question earlier about retrieving high scores from an HTML page, and another user gave me the following code to help. I am new to Python and BeautifulSoup, so I'm trying to go through other people's code piece by piece. I understand most of it, but I don't get what this piece of code is or what it does:

    def parse_string(el):
       text = ''.join(el.findAll(text=True))
       return text.strip()

Here is the entire code:

    from urllib2 import urlopen
    from BeautifulSoup import BeautifulSoup
    import sys

    URL = "http://hiscore.runescape.com/hiscorepersonal.ws?user1=" + sys.argv[1]

    # Grab page html, create BeautifulSoup object
    html = urlopen(URL).read()
    soup = BeautifulSoup(html)

    # Grab the <table id="mini_player"> element
    scores = soup.find('table', {'id':'mini_player'})

    # Get a list of all the <tr>s in the table, skip the header row
    rows = scores.findAll('tr')[1:]

    # Helper function to return concatenation of all character data in an element
    def parse_string(el):
       text = ''.join(el.findAll(text=True))
       return text.strip()

    for row in rows:

       # Get all the text from the <td>s
       data = map(parse_string, row.findAll('td'))

       # Skip the first td, which is an image
       data = data[1:]

       # Do something with the data...
       print data
A:

el.findAll(text=True) returns a list of all the text nodes contained within an element and its sub-elements. By text I mean everything that is not part of a tag: in <b>hello</b>, "hello" is text, but <b> and </b> are not.

That function therefore joins together all the text found beneath the given element and strips whitespace from the front and back.
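
To see the idea without installing anything, here is a rough stand-in written for Python 3 that uses html.parser from the standard library instead of BeautifulSoup. It is only a sketch of the same "collect every text node, join, strip" behaviour; the names TextCollector and text_of and the sample cell are made up for illustration and are not part of the original code.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Hypothetical helper: remembers every piece of text, ignores the tags."""

    def __init__(self):
        super().__init__()
        self.pieces = []

    def handle_data(self, data):
        # Called for each run of text found between tags.
        self.pieces.append(data)


def text_of(html_fragment):
    """Rough equivalent of parse_string, but taking raw HTML as a string."""
    collector = TextCollector()
    collector.feed(html_fragment)
    collector.close()
    # Join the pieces with no separator, then trim the outer whitespace.
    return ''.join(collector.pieces).strip()


# Made-up table cell, similar in shape to one <td> of a score table.
cell = "<td>\n  <a href='#'>Attack</a> <b>99</b>\n</td>"
print(repr(text_of(cell)))  # 'Attack 99'
```

The whitespace between the inner tags survives because strip() only touches the two ends of the joined string.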

Here's a link to the findAll documentation: http://www.crummy.com/software/BeautifulSoup/documentation.html#arg-text

Eli Courtwright
use backticks for HTML. :)
Paolo Bergantino
Why is there a '' with nothing in it at the start of the text = line? And what do join and strip do exactly? And why did this have to be defined as a function before it was applied to data? Thanks.
Alex
''.join means join each item with an empty string (so there's no delimiter).
Jacob
What is its purpose, then, if ''.join('hello world') = 'hello world'?
Alex
The purpose is to join *sequences* of strings into a single string: so ''.join(["hel", "low", "ord"]) gives "helloworld", for example.
Alex Martelli