I have code that does something like this:

item.previous.parent.parent.aTag['href']

Now I would like to be able to add filters quickly, so hardcoding is no longer an option. How can I access the same tags with a path encoded in a string?

Of course, I could invent a format like [('getattr', 'previous'), ('getattr', 'parent'), ..., ('getitem', 'href')] and evaluate it with __getattr__ and __getitem__.
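For illustration, a minimal sketch of that idea in plain Python might look like the following. The names `parse_path` and `resolve` and the dotted/bracketed path syntax are assumptions made up for this example, not part of BeautifulSoup, and the `item` object is a stand-in built from `SimpleNamespace` rather than a real parsed tag.

```python
import re
from types import SimpleNamespace

# One token is either an attribute name (optionally preceded by a dot)
# or a bracketed key such as ['href']. Bracketed keys are kept as strings.
_TOKEN = re.compile(r"\.?([A-Za-z_]\w*)|\[\s*['\"]?([^\]'\"]+?)['\"]?\s*\]")


def parse_path(path):
    """Turn "a.b['c']" into [('getattr', 'a'), ('getattr', 'b'), ('getitem', 'c')]."""
    steps, pos = [], 0
    for match in _TOKEN.finditer(path):
        if match.start() != pos:
            # Something between two tokens did not match the grammar.
            raise ValueError("cannot parse path at position %d" % pos)
        pos = match.end()
        if match.group(1) is not None:
            steps.append(("getattr", match.group(1)))
        else:
            steps.append(("getitem", match.group(2)))
    if pos != len(path):
        raise ValueError("cannot parse path at position %d" % pos)
    return steps


def resolve(obj, path):
    """Walk obj along the steps described by the path string."""
    for op, key in parse_path(path):
        obj = getattr(obj, key) if op == "getattr" else obj[key]
    return obj


# Hypothetical stand-in for a parsed document node.
item = SimpleNamespace(
    previous=SimpleNamespace(
        parent=SimpleNamespace(
            parent=SimpleNamespace(aTag={"href": "http://example.com/"})
        )
    )
)

link = resolve(item, "previous.parent.parent.aTag['href']")
```

Because `resolve` only relies on `getattr` and item access, it should work on any object that supports the hardcoded chain; error handling for missing attributes is left out here.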

Now the question: is there already a more elegant way of doing this, or do I need to implement it myself?

+1  A: 

It seems to me that you would be better off using an XPath expression. This discussion has some information about an XPath plugin for BeautifulSoup called BSXPath. I haven't used it, so I do not know whether it will serve your purpose.

If you are willing to replace BeautifulSoup, lxml has a really powerful XPath implementation.
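As a rough sketch of how string-based filters could look with lxml, assuming lxml is installed: the `FILTERS` table, the `apply_filter` helper, and the sample markup are invented for this example, and the XPath expression is not a translation of the navigation chain in the question.

```python
from lxml import html

# Hypothetical filter table: each filter is just an XPath string,
# so adding a new filter means adding one entry here.
FILTERS = {
    "entry_links": '//div[@class="entry"]//a/@href',
}

# Deliberately sloppy markup: unclosed <p>, missing </body> and </html>.
SAMPLE = (
    '<html><body><div class="entry">'
    '<p>first <a href="/one">one</a>'
    '<p>second <a href="/two">two</a>'
    "</div>"
)


def apply_filter(markup, name):
    """Parse markup leniently and evaluate the named XPath filter."""
    doc = html.document_fromstring(markup)
    # Attribute results come back as string-like objects; make them plain str.
    return [str(value) for value in doc.xpath(FILTERS[name])]


links = apply_filter(SAMPLE, "entry_links")
```

The appeal of this approach is that the whole path lives in one string, so no custom getattr/getitem format has to be parsed by hand.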

update

See @llasram's comment to this answer:

These days lxml has an html module which the BeautifulSoup implementer himself recommends over BeautifulSoup. It even has a soupparser sub-module which will use BeautifulSoup to do the parsing!

Manoj Govindan
I thought about XPath, but the nice thing about BeautifulSoup is that I do not need to rely on correct and stable HTML; it can parse anything that looks like HTML.
allo
@allo: valid point. I think in that case you will have to implement it yourself.
Manoj Govindan
These days lxml has an `html` module which the BeautifulSoup implementer himself recommends over BeautifulSoup. It even has a `soupparser` sub-module which will use BeautifulSoup to do the parsing!
llasram
lxml seems good.
allo