How can I find the content between two words or two sets of random characters?
The scraped page is not guaranteed to be HTML only, and the important data can be inside a JavaScript block, so I can't strip out the JavaScript.
Consider this:
<html>
<body>
<div>StartYYYY "Extract HTML", ENDYYYY
</body>
Some JavaScript code STARTXXXX "Extract JS Code" ENDXXXX.
</html>
As you can see, the HTML markup may not be complete. I can fetch the page, and then, without worrying about the markup, I want to extract the content shown above as "Extract HTML" and "Extract JS Code".
What I am looking for is something like this in Python:
data = FindBetweenText(UniqueTextBeforeContent, UniqueTextAfterContent, page)
Here, page is the downloaded page and data would hold the text I am looking for. I would rather stay away from regex, as some of the cases can be too complex for it.
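
To make the intended behaviour concrete, here is a minimal sketch of such a function using only plain string searching (str.find), with no regex. The name find_between_text, the choice to return None when a marker is missing, and the shortened sample page are assumptions made for illustration. The markers are matched case-sensitively, and only the first occurrence is returned.

```python
def find_between_text(before, after, page):
    """Return the text between the first `before` and the next `after`.

    Returns None if either marker is not found (an assumed convention).
    """
    start = page.find(before)
    if start == -1:
        return None
    start += len(before)          # skip past the opening marker
    end = page.find(after, start)  # search only after the opening marker
    if end == -1:
        return None
    return page[start:end]


# Shortened, deliberately incomplete markup modelled on the sample page.
page = (
    '<html>\n<body>\n'
    '<div>StartYYYY "Extract HTML", ENDYYYY\n'
    '</body>\n'
    'STARTXXXX "Extract JS Code" ENDXXXX.\n'
    '</html>'
)

html_data = find_between_text("StartYYYY", "ENDYYYY", page)
js_data = find_between_text("STARTXXXX", "ENDXXXX", page)
print(repr(html_data))
print(repr(js_data))
```

Because the page is treated as one flat string, it does not matter whether the markers sit inside HTML, inside a script block, or in broken markup. The returned text includes any surrounding whitespace and punctuation between the markers, so a call to strip() may be needed afterwards.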