I'm working on a project that requires me to extract some text from an HTML file in Python.
<tr>
<td>Target binary file name:</td>
<td class="right">Doc1.docx</td>
</tr>
^ A small portion of the HTML file that I'm interested in.
#! /usr/bin/python
import os
import re

if __name__ == '__main__':
    f = open('./results/sample_result.html')
    soup = f.read()
    p = re.compile("binary")
    for line in soup:
        m = p.search(line)
        if m:
            print "finally"
            break
^ Sample code I wrote to test whether I could extract the data. I've written several nearly identical programs to extract text from .txt files, and they worked just fine. Is there something I'm missing with regard to regex and HTML?
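
One thing that may be worth checking here: `f.read()` returns the whole file as a single string, and iterating over a string yields one character at a time, so a pattern like `binary` can never match a one-character `line`. The sketch below is only an illustration under assumptions: `SAMPLE_HTML` is a hypothetical stand-in for the real file, the helper names `lines_containing` and `extract_target_name` are made up for this example, and the second regex assumes the value cell directly follows the label cell, as in the snippet above.

```python
import re

# Hypothetical sample mirroring the HTML snippet from the question.
SAMPLE_HTML = """<tr>
<td>Target binary file name:</td>
<td class="right">Doc1.docx</td>
</tr>
"""


def lines_containing(text, pattern):
    """Return the lines of `text` that match `pattern`."""
    regex = re.compile(pattern)
    # splitlines() yields whole lines; iterating over `text` directly
    # would yield single characters instead.
    return [line for line in text.splitlines() if regex.search(line)]


def extract_target_name(text):
    """Return the cell after the 'Target binary file name:' label, or None."""
    # Assumes the value cell immediately follows the label cell,
    # separated only by whitespace (including newlines).
    regex = re.compile(
        r'<td>Target binary file name:</td>\s*<td class="right">(.*?)</td>'
    )
    match = regex.search(text)
    return match.group(1) if match else None


if __name__ == '__main__':
    print(lines_containing(SAMPLE_HTML, "binary"))
    print(extract_target_name(SAMPLE_HTML))
```

With a real file, iterating over the file object itself (`for line in f:`) or over `f.read().splitlines()` should give whole lines. For anything beyond a quick check, an HTML parser is usually more robust than a regex against markup.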