I'm trying to take a string of text and "extract" the rest of the text in the paragraph/document from the HTML.
My current approach is to find the "parent tag" of the string in the HTML after parsing it with lxml. (If you know of a better way to tackle this problem, I'm all ears!)
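A minimal sketch of that "parent tag" approach, assuming the HTML is available as a string and lxml is installed. It relies on lxml's XPath "smart strings": each string returned by a `//text()` query remembers the element it came from (`getparent()`), and `is_tail` tells you whether it was that element's tail text. The function name `find_containing_element` is hypothetical, and the sketch only finds the string if it sits inside a single text node.

```python
import lxml.html


def find_containing_element(html, needle):
    """Return the element that directly encloses the text node containing needle.

    Hypothetical helper: returns None if no single text node contains needle.
    """
    tree = lxml.html.fromstring(html)
    # //text() yields lxml "smart strings" that remember where they came from.
    for text_node in tree.xpath("//text()"):
        if needle in text_node:
            parent = text_node.getparent()
            # Tail text is attached to the element it follows (e.g. a <b>),
            # so the tag that actually encloses it is one level further up.
            if text_node.is_tail:
                parent = parent.getparent()
            return parent
    return None


sample = "<html><body><div><p>Intro <b>bold</b> TEXT STRING HERE and more.</p></div></body></html>"
element = find_containing_element(sample, "TEXT STRING HERE")
print(element.tag)  # p
```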
For example, search the tree for "TEXT STRING HERE" and return the "p" tag. (Note that I won't know the exact layout of the HTML beforehand.)
<html>
<head>
...
</head>
<body>
....
<div>
...
<p>TEXT STRING HERE ......</p>
...
</div>
</body>
</html>
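For markup shaped like the example above, one possible sketch (assuming lxml; `paragraph_text` and the `BLOCK_TAGS` set are hypothetical choices, not anything lxml defines) finds the element holding the string, climbs past inline tags such as `<a>` or `<b>` to the nearest block-level ancestor, and returns that element's full text via `text_content()`:

```python
import lxml.html

# Tags treated as paragraph-like containers. This set is an assumption;
# adjust it for the pages you are working with.
BLOCK_TAGS = {"p", "div", "li", "td", "th", "blockquote", "pre",
              "h1", "h2", "h3", "h4", "h5", "h6"}


def paragraph_text(html, needle):
    """Return (tag, full text) of the nearest block element containing needle, or None."""
    tree = lxml.html.fromstring(html)
    # Elements with a direct text node (a child's tail text counts too)
    # containing needle. $needle is an XPath variable, so quotes in the
    # search string need no escaping.
    matches = tree.xpath("//*[text()[contains(., $needle)]]", needle=needle)
    if not matches:
        return None
    element = matches[0]
    # If the hit is inside inline markup (<a>, <b>, <span>, ...), climb
    # to the nearest block-level ancestor.
    if element.tag not in BLOCK_TAGS:
        for ancestor in element.iterancestors():
            if ancestor.tag in BLOCK_TAGS:
                element = ancestor
                break
    # text_content() joins all text inside the element, nested tags included.
    return element.tag, str(element.text_content())


sample = """<html>
<head><title>Example</title></head>
<body>
<div>
<p>Some <a href="#">TEXT STRING HERE</a> and the rest of the paragraph.</p>
</div>
</body>
</html>"""

print(paragraph_text(sample, "TEXT STRING HERE"))
# ('p', 'Some TEXT STRING HERE and the rest of the paragraph.')
```

Since the layout isn't known in advance, the block-tag set is the main knob to tune. This sketch does not match a string that is split across tags (e.g. `TEXT <b>STRING</b> HERE`), because it only inspects individual text nodes.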
Thanks for your help!