Hello!
I am brand new to Python, and I need some help with the syntax for finding and iterating through HTML tags using lxml. Here are the use cases I am dealing with:
- The HTML file is fairly well formed (but not perfect). It has multiple tables on screen: one containing a set of search results, and one each for a header and a footer. Each result row contains a link to the search result detail.
1) I need to find the middle table with the search result rows. This one I was able to figure out:
    self.mySearchTables = self.mySearchTree.findall(".//table")
    self.myResultRows = self.mySearchTables[1].findall(".//tr")
2) I need to find the links contained in this table. This is where I'm getting stuck:
    for searchRow in self.myResultRows:
        searchLink = searchRow.findall(".//a")
It doesn't seem to actually locate the link elements.
3) I need the plain text of the link. I imagine it would be something like searchLink.text if I actually got the link elements in the first place.
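To make the three steps above concrete, here is a minimal sketch of what they might look like end to end. It assumes the page is parsed with lxml.html (which tolerates imperfect markup) and that the results table is the second table on the page. SAMPLE_PAGE and result_links are made-up names for illustration, not part of my real code. Note that findall returns a list, so the text has to be read from each link element in turn rather than from the list itself.

```python
from lxml import html  # third-party package: pip install lxml

# Hypothetical stand-in for the real page: header, results and footer tables.
SAMPLE_PAGE = """
<html><body>
<table><tr><td>Header</td></tr></table>
<table>
  <tr><td><a href="/detail/1">First result</a></td></tr>
  <tr><td><a href="/detail/2">Second <b>result</b></a></td></tr>
  <tr><td>No link in this row</td></tr>
</table>
<table><tr><td>Footer</td></tr></table>
</body></html>
"""


def result_links(page_source):
    """Return (href, text) pairs for every link in the second table."""
    tree = html.fromstring(page_source)

    # Step 1: the results table is assumed to be the second <table>.
    tables = tree.findall(".//table")
    rows = tables[1].findall(".//tr")

    links = []
    for row in rows:
        # Step 2: findall returns a (possibly empty) list of <a> elements.
        for link in row.findall(".//a"):
            # Step 3: text_content() also collects text nested in child
            # tags such as <b>; link.text stops at the first child tag.
            links.append((link.get("href"), link.text_content().strip()))
    return links


for href, text in result_links(SAMPLE_PAGE):
    print(href, text)
```

Rows without a link simply contribute nothing, because the inner loop runs over an empty list.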
Finally, in the actual API reference for lxml, I wasn't able to find information on the find and findall calls. I gleaned these from bits of code I found on Google. Am I missing something about how to effectively find and iterate over HTML tags using lxml?
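One note on where these calls come from: lxml.etree follows the ElementTree API, so find, findall and findtext behave like the methods of the same name in the standard library's xml.etree.ElementTree, which may be why they are easier to find documented there. The sketch below uses the standard library on a small well-formed fragment to show how the three calls differ. FRAGMENT is made-up markup, and real-world HTML would still need lxml's more forgiving parser.

```python
import xml.etree.ElementTree as ET

# Made-up, well-formed fragment standing in for a results table.
FRAGMENT = (
    "<table>"
    "<tr><td><a href='/detail/1'>First result</a></td></tr>"
    "<tr><td>plain cell</td></tr>"
    "</table>"
)

table = ET.fromstring(FRAGMENT)

first_link = table.find(".//a")      # first matching element, or None
all_links = table.findall(".//a")    # list of every matching element
missing = table.find(".//img")       # no match, so this is None
link_text = table.findtext(".//a")   # text of the first matching element

print(first_link.get("href"), len(all_links), missing, link_text)
```

Checking the result of find against None before reading .text avoids an AttributeError on rows that have no link.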
Thanks in advance for your help!
Shaheeb Roshan