I'm working on a web parser using urllib. I need to save only the lines that lie within a certain div tag. For instance, if I'm saving all text in the div "body", all text between that div's opening and closing tags should be returned. Other divs nested inside it are fine, but the saving should stop as soon as I reach the closing tag of the parent div. Any ideas?
My idea:
Search for the div you're looking for.
Record its position.
Keep track of any divs after that point: +1 for each new div, -1 for each closing div.
When the count is back to 0, you're at the closing tag of your parent div? Save that location.
Then save the data from the beginning position to the end position?
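
The counting idea above can be sketched with the standard library's html.parser.HTMLParser, which reports opening and closing tags as events, so the counter does not have to scan the raw text by hand. This is only a rough sketch under some assumptions: that "body" is the div's id attribute, that the markup is reasonably well-formed, and that only the first matching div matters. The names DivTextExtractor and extract_div_text are made up for illustration.

```python
from html.parser import HTMLParser


class DivTextExtractor(HTMLParser):
    """Collect text inside the first <div> whose id equals target_id (hypothetical helper)."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id  # assumption: the div is identified by its id attribute
        self.depth = 0              # 0 means we are outside the target div
        self.done = False           # set once the target div has been closed
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag != "div" or self.done:
            return
        if self.depth > 0:
            self.depth += 1         # nested div inside the target: +1
        elif dict(attrs).get("id") == self.target_id:
            self.depth = 1          # found the target div: start counting

    def handle_endtag(self, tag):
        if tag == "div" and self.depth > 0:
            self.depth -= 1         # closing div: -1
            if self.depth == 0:
                self.done = True    # back to 0: the target div just closed

    def handle_data(self, data):
        if self.depth > 0:
            self.chunks.append(data)


def extract_div_text(html, target_id):
    """Return the text found inside the first div with the given id."""
    parser = DivTextExtractor(target_id)
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks)


sample = '<div id="top">skip<div id="body">a<div>b</div>c</div>after</div>'
result = extract_div_text(sample, "body")
print(result)
```

The page source fetched with urllib (decoded to a string) could be passed in place of sample. If the div is identified by class rather than id, the attribute lookup in handle_starttag would need to change accordingly.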