Let's say I have a structure like this:

<folder name="folder1">
     <folder name="folder2">
          <bookmark href="link.html">
     </folder>
</folder>

Given a reference to the bookmark tag, which BeautifulSoup call would return all of the folder tags that enclose it? For example,

bookmarks = soup.findAll('bookmark')

then beautifulsoupcommand(bookmarks[0]) would return:

[<folder name="folder1">,<folder name="folder2">]

I'd also like to know where each folder's closing tag occurs. Any ideas?

Thanks in advance!

+2  A: 

bookmarks[0].findParents('folder') returns a list of all enclosing folder tags. You can then iterate over them and read each folder's name attribute, for example with p.get('name').

eumiro
@eumiro: In my test, BS returns only the first (immediate) parent when the parent and grandparent have the same tag name. It does return both generations of parents if the parent and grandparent have different tag names.
Manoj Govindan
+3  A: 

Here is my stab at it:

>>> from BeautifulSoup import BeautifulSoup
>>> html = """<folder name="folder1">
     <folder name="folder2">
          <bookmark href="link.html">
     </folder>
</folder>
"""
>>> soup = BeautifulSoup(html)
>>> bookmarks = soup.findAll('bookmark')
>>> [p.get('name') for p in bookmarks[0].findAllPrevious(name = 'folder')]
[u'folder2', u'folder1']

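The key difference from @eumiro's answer is that I use findAllPrevious instead of findParents. When I tested @eumiro's solution, findParents returned only the first (immediate) parent, apparently because the parent and grandparent have the same tag name. Note that findAllPrevious matches every folder tag that appears earlier in the document, not only ancestors, so on a larger file it can also return folders that have already been closed.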

>>> [p.get('name') for p in bookmarks[0].findParents('folder')]
[u'folder2']

>>> [p.get('name') for p in bookmarks[0].findParents()]
[u'folder2', None]

findParents does return both generations of parents if the parent and grandparent have different tag names.

>>> html = """<folder name="folder1">
     <folder_parent name="folder2">
          <bookmark href="link.html">
     </folder_parent>
</folder>
"""
>>> soup = BeautifulSoup(html)
>>> bookmarks = soup.findAll('bookmark')
>>> [p.get('name') for p in bookmarks[0].findParents()]
[u'folder2', u'folder1', None]
Manoj Govindan
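
Neither answer covers the second part of the question, knowing when the closing tags are hit. As a rough sketch of one way to get both pieces of information, the snippet below swaps BeautifulSoup for the standard library's html.parser in Python 3 and keeps a stack of the folders that are currently open. The class name FolderTracker and its attributes (open_folders, bookmarks, events) are made up for this illustration and are not part of any library.

```python
from html.parser import HTMLParser


class FolderTracker(HTMLParser):
    """Hypothetical helper: tracks which <folder> tags are open at each point."""

    def __init__(self):
        super().__init__()
        self.open_folders = []  # names of folders currently open, outermost first
        self.bookmarks = []     # (href, enclosing folder names) per bookmark
        self.events = []        # ("open" | "close", folder name) in document order

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "folder":
            self.open_folders.append(attrs.get("name"))
            self.events.append(("open", attrs.get("name")))
        elif tag == "bookmark":
            # Snapshot the stack: these are the folders enclosing this bookmark.
            self.bookmarks.append((attrs.get("href"), list(self.open_folders)))

    def handle_endtag(self, tag):
        # A closing </folder> always ends the most recently opened folder.
        if tag == "folder" and self.open_folders:
            self.events.append(("close", self.open_folders.pop()))


html = """<folder name="folder1">
     <folder name="folder2">
          <bookmark href="link.html">
     </folder>
</folder>
"""

tracker = FolderTracker()
tracker.feed(html)
tracker.close()

print(tracker.bookmarks)  # [('link.html', ['folder1', 'folder2'])]
print(tracker.events)
```

Because the stack is driven by the start and end tags themselves, folders with the same tag name nest as written, and the events list records the point at which each folder closes. The trade-off is that this gives plain names rather than tag objects, so it fits best when only the folder path of each bookmark is needed.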