Somewhat related to my earlier question. I'm making a simple HTML parser to play around with in Python 2.7. I would like to have multiple parse types, i.e. it can parse for links, script tags, images, etc. I'm using the HTMLParser module, so my initial thought was to just make a separate class for each thing I want to parse, but that seemed rather silly. Is there a way to do this without creating multiple classes? I am more familiar with C#, so I figured I'd just pass a parameter to the __init__ method to specify what exactly to parse for, just like I would in .NET. However, I don't seem to be doing it correctly: it doesn't work, and it just doesn't 'look' right. How would I modify the code below so that I can have just the one class, with the parameters that are passed indicating the type of HTML tags to parse? Here's the current working code:
import urllib2
from HTMLParser import HTMLParser

class LinksParser(HTMLParser):
    def __init__(self, url):
        HTMLParser.__init__(self)
        # Fetch the page and parse it immediately
        req = urllib2.urlopen(url)
        self.feed(req.read())

    def handle_starttag(self, tag, attrs):
        # Only anchor tags are of interest here
        if tag != 'a':
            return
        for name, value in attrs:
            print("Found Link --> [{0}]{1}".format(name, value))
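
For reference, here is a minimal sketch of the single-class idea described above, where the tags of interest are passed to the constructor. The names `TagParser`, `tags`, and `found` are hypothetical and only for illustration, and the HTML is fed in as a string rather than fetched from a URL so the example stays self-contained. The import fallback is an assumption added so the sketch also runs outside Python 2.7.

```python
try:
    from HTMLParser import HTMLParser   # Python 2.7
except ImportError:
    from html.parser import HTMLParser  # fallback for Python 3


class TagParser(HTMLParser):
    """Hypothetical single parser: `tags` selects which start tags to collect."""

    def __init__(self, tags):
        HTMLParser.__init__(self)
        self.tags = set(tags)   # e.g. ['a'], ['img', 'script']
        self.found = []         # collected (tag, attribute dict) pairs

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names before calling this method
        if tag in self.tags:
            self.found.append((tag, dict(attrs)))


# Usage: one class, with the parameter deciding what gets parsed
parser = TagParser(['a', 'img'])
parser.feed('<a href="/home">Home</a><img src="logo.png">'
            '<script src="x.js"></script>')
for tag, attrs in parser.found:
    print("Found {0} --> {1}".format(tag, attrs))
```

Collecting results in a list instead of printing inside `handle_starttag` is a design choice of this sketch, not something the original code does; it keeps the parsing separate from whatever is done with the matches.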