I know this is basic.

I'm just wondering what an elegant way to do it would be.

For example:

I want the 'python01.wav' and 'py*thon' strings from this list.

The list looks like this:

[
[('name', 'entry')],
[('class', 'entry')],
[('type', 'text/javascript'), ('src', '/term_added.php?hw=python')],
[('type', 'text/javascript')],
[('class', 'headword')],
[('class', 'hw')],
[],
[('class', 'pr')],
[('class', 'unicode')],
[('class', 'unicode')],
[('class', 'unicode')],
[('class', 'unicode')],
[],
[('href', '#'), ('onclick', "playAudio('python01.wav', 'py*thon'); return false;"), ('class', 'audio_link'), ('target', '_blank')],
[('src', '/images/audio.gif'), ('alt', 'Listen to audio'), ('title', 'Listen to audio')],
[],
[('class', 'fl')],
[],
[('class', 'in')],
[('class', 'il')],
[('class', 'if')],
[],
[('class', 'def')],
[('class', 'gram')],
[],
]

Thank you for your help!

+1  A: 
return ('python01.wav', 'py*thon')

This satisfies your specification perfectly.

But if I had to guess, I don't think it's what you want.

So why don't you give us enough information that we can actually figure out what strings you want to get? Is it everything between single-quotes in one of the strings? Everything between single-quotes that contains the letters p,y,t,h,o,n in that order? The arguments to a playAudio call?

Without knowing what you want, we can't give you a solution that solves your problem.

Anon.
Sorry. I meant the arguments to the playAudio call.
zjk
+2  A: 

Perhaps not the greatest solution, but it appears to do what you want:

l = []  # replace with the list from your example
for e in l: # for each list
    for t in e: # for each tuple
        for s in t: # each string
            if s.startswith('playAudio('):
                # keep only the text between 'playAudio(' and the first ')'
                inner = s[len('playAudio('):s.find(')')]
                args = inner.split(',') # split the two arguments on the comma
                print("%s,%s" % (args[0].strip(), args[1].strip()))

I leave 'optimizing' this as an exercise for you. If you could explain where this data is coming from and what sort of characteristics it has (can playAudio only be attached to things with an HREF attribute?), we could give you a better solution.
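As one possible way to tighten the loop above, here is a minimal sketch that uses a regular expression instead of slicing. It assumes the data is a list of lists of (name, value) tuples, as in the question, and that playAudio always takes two single-quoted arguments. The names PLAY_AUDIO and find_play_audio_args are made up for illustration, and the code is written for Python 3.

```python
import re

# Hypothetical pattern: captures the two single-quoted arguments of a
# playAudio('...', '...') call. Assumes the arguments contain no quotes.
PLAY_AUDIO = re.compile(r"playAudio\(\s*'([^']*)'\s*,\s*'([^']*)'\s*\)")

def find_play_audio_args(attr_lists):
    """Return a list of (file, word) tuples found in any attribute value."""
    found = []
    for attrs in attr_lists:
        for _name, value in attrs:
            # Attribute values can be None for valueless attributes.
            match = PLAY_AUDIO.search(value or '')
            if match:
                found.append(match.groups())
    return found

# A shortened copy of the list from the question.
sample = [
    [('class', 'pr')],
    [],
    [('href', '#'),
     ('onclick', "playAudio('python01.wav', 'py*thon'); return false;"),
     ('class', 'audio_link'), ('target', '_blank')],
]

print(find_play_audio_args(sample))
# prints [('python01.wav', 'py*thon')]
```

The regex returns the arguments without their surrounding quotes, which is probably what you want if the file name is going to be used to build a URL.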

EDIT:

Personally for your specific example, I would do this:

from BeautifulSoup import BeautifulSoup, SoupStrainer
import re
import urllib2

doc = urllib2.urlopen("http://www.learnersdictionary.com/search/python").read()
doc = doc.replace('</SCR', '')
audioLinks = SoupStrainer('a', onclick=re.compile(r'^playAudio'))
soup = [str(elm) for elm in BeautifulSoup(doc, parseOnlyThese=audioLinks)]
for elm in soup:
    print re.search(r'playAudio\((.*[^)])\)', elm).group(1)
    # prints 'python01.wav', 'py*thon'
Nick Presta
Thanks. This is from http://www.learnersdictionary.com/search/python. I'm learning English, so I want to hear how words are pronounced.
zjk
I get this data from HTMLParser, which is in Python's standard library.
zjk
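Since the data comes from HTMLParser, one option is to pick out the playAudio arguments inside the parser itself rather than collecting every attribute list first. The following is only a sketch under a few assumptions: it uses the Python 3 module name (html.parser; in Python 2 the module is called HTMLParser), the class name AudioLinkParser is made up, and the HTML fragment is a hand-written stand-in modelled on the attributes in the question, not the real page.

```python
import re
from html.parser import HTMLParser  # Python 3 name; Python 2 uses "HTMLParser"

# Hypothetical pattern for playAudio('file', 'word') with single-quoted arguments.
PLAY_AUDIO = re.compile(r"playAudio\(\s*'([^']*)'\s*,\s*'([^']*)'\s*\)")

class AudioLinkParser(HTMLParser):
    """Records the playAudio arguments found in <a onclick="..."> tags."""

    def __init__(self):
        super().__init__()
        self.audio_args = []

    def handle_starttag(self, tag, attrs):
        if tag != 'a':
            return
        # attrs is a list of (name, value) tuples; value may be None.
        onclick = dict(attrs).get('onclick') or ''
        match = PLAY_AUDIO.search(onclick)
        if match:
            self.audio_args.append(match.groups())

# Made-up fragment resembling the audio link described in the question.
html = ('<a href="#" onclick="playAudio(\'python01.wav\', \'py*thon\'); '
        'return false;" class="audio_link" target="_blank">'
        '<img src="/images/audio.gif" alt="Listen to audio"></a>')

parser = AudioLinkParser()
parser.feed(html)
print(parser.audio_args)
# prints [('python01.wav', 'py*thon')]
```

Filtering on the tag name and the onclick attribute keeps unrelated attributes (class, src, and so on) out of the result entirely.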