views: 156
answers: 2

When using the HTMLParser class in Python, is it possible to abort processing within a handle_* function? Early in the processing, I get all the data I need, so it seems like a waste to continue processing. There's an example below of extracting the meta description for a document.

from HTMLParser import HTMLParser

class MyParser(HTMLParser):

    def handle_starttag(self, tag, attrs):
        in_meta = False
        if tag == 'meta':
            for attr in attrs:
                if attr[0].lower() == 'name' and attr[1].lower() == 'description':
                    in_meta = True
                if in_meta and attr[0].lower() == 'content':
                    print(attr[1])
                    # Would like to tell the parser to stop now,
                    # since I have all the data that I need
+1  A: 

You can raise an exception and wrap your .feed() call in a try block.
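Something like this rough sketch, for example (untried here, assuming Python 2's HTMLParser as in the question, with the page's HTML already read into htmlsrc as in the other answer):

from HTMLParser import HTMLParser

class StopParsing(Exception):
    """Raised from a handler to signal that we already have what we need."""
    pass

class DescriptionParser(HTMLParser):
    description = None

    def handle_starttag(self, tag, attrs):
        if tag == 'meta':
            attrs = dict(attrs)
            if (attrs.get('name') or '').lower() == 'description':
                self.description = attrs.get('content')
                raise StopParsing()  # stop feeding: the description was found

parser = DescriptionParser()
try:
    parser.feed(htmlsrc)
except StopParsing:
    pass
print(parser.description)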

You can also call self.reset() once you decide that you are done (I have not actually tried it, but according to the documentation it will "Reset the instance. Loses all unprocessed data.", which sounds like precisely what you need).

shylent
An exception doesn't sound like a nice idea here: exceptions should be used only for exceptional conditions, and in this case you're proposing to use one as a control-flow tool. As for the reset() method, I've considered it too, but I can't figure out whether it's really relevant here.
Eli Bendersky
re: "exceptions .. for exceptional conditions" - not so true for python. Do you know, that StopIteration is raised whenever an iterator "runs out of" iterations? That's not much of an "exceptional condition", now is it? In fact it is distinctly similar to the condition, that the questioner wants to handle, - a "break now" kind of condition.
shylent
@shylent: true about StopIteration, but it is rarely handled manually; rather, it is wrapped so that the user almost never sees it directly. Nevertheless, you're making a good point.
Eli Bendersky
A: 

If you use pyparsing's scanString method, you have more control over how far you actually go through the input string. In your example, we create an expression that matches a <meta> tag, and add a parse action that ensures that we only match the tag with name="description". This code assumes that you have read the page's HTML into the variable htmlsrc:

from pyparsing import makeHTMLTags, withAttribute

# makeHTMLTags creates both open and closing tags, only care about the open tag
metaTag = makeHTMLTags("meta")[0]
metaTag.setParseAction(withAttribute(name="description"))

try:
    # scanString is a generator that returns each match as it is found
    # in the input
    tokens,startloc,endloc = metaTag.scanString(htmlsrc).next()

    # attributes can be accessed like object attributes if they are 
    # valid Python names
    print tokens.content

    # if the attribute name clashes with a Python keyword, or is 
    # otherwise unsuitable as an identifier, use dict-like access instead
    print tokens["content"]

except StopIteration:
    print "no matching meta tag found"
Paul McGuire
Thanks for the answer. I'm sure this works as well and I appreciate having somewhat of an introduction to pyparsing. I would mark both correct if I could.
Michael Mior