I only want to parse one element of interest in an XML file (e.g. see below: the class element whose name equals math), and I want to stop as soon as the first element matching this condition has been parsed. (Since there is only one class whose name is math, there is no need to continue once that element has been found.)

However, with the implementation below, the code keeps reading the whole file after it has found the element I am interested in (the XML file is very long, so this takes a long time). My question is: how do I stop it once the first class element with name = math has been parsed?

    my $twig = XML::Twig->new( TwigRoots => { "class[\@name='math']" => \&class } );
    $twig->parsefile( shift @ARGV );

Besides, I also want to delete this class from the XML file itself (not only from memory) after it has been parsed, so that the next time I parse classes with other names, this class element will not be parsed again. Is it possible to do that?

+4  A: 

It seems what you're looking for are XML::Twig's finish_print and finish_now:

finish_print

Stops twig processing, flush the twig and proceed to finish printing the document as fast as possible. Use this method when modifying a document and the modification is done.

finish_now

Stops twig processing, does not finish parsing the document (which could actually be not well-formed after the point where finish_now is called). Execution resumes after the parse or parsefile call. The content of the twig is what has been parsed so far (all open elements at the time finish_now is called are considered closed).
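
As a rough illustration of the finish_now description above, here is a minimal sketch of calling it from inside the handler. The sample document, the element names other than class, and the @found array are made up for the example, and it parses an in-memory string rather than a file. It assumes an XML::Twig version that provides finish_now.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical sample document; the real input would come from a file.
my $xml = <<'XML';
<school>
  <class name="art"><student>Ann</student></class>
  <class name="math"><student>Bob</student></class>
  <class name="music"><student>Cid</student></class>
</school>
XML

my @found;    # names of the classes the handler was called for

my $twig = XML::Twig->new(
    twig_roots => {
        # Only build a twig for the class whose name attribute is "math".
        'class[@name="math"]' => sub {
            my ( $t, $class ) = @_;
            push @found, $class->att('name');
            $t->finish_now;    # stop parsing; execution resumes after parse()
        },
    },
);
$twig->parse($xml);

print "handled: @found\n";
```

With a real file, the last call would be $twig->parsefile($file) instead of parse. Everything before the matching class is still read by the parser, so this only saves the time spent on whatever follows the match.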

DVK
Here's an example of using finish_now: http://cpansearch.perl.org/src/MIROD/XML-Twig-3.35/tools/xml_grep/xml_grep
DVK
Thanks, DVK. It seems I have to install Perl 5.10.x to use finish_now, while my system has 5.8.4. Is it easy to install 5.10.x? Besides, even with finish_now, if the class comes after a class with lots of content, it still takes time to get there. Can I give a line number from which twig should start parsing elements? I can use grep to get the line numbers of all class elements. Why let twig look for the element of interest line by line, which is so slow?
In summary, suppose the class with name=math starts at line 2000: can I have twig parse the XML from line 2000, without going through the file from the beginning? I do not understand why twig spends so much time parsing my XML file even though I set TwigRoots => {"class[\@name='math']. I think there should be some way to skip reading the sub-elements under other classes. Maybe I am wrong, and it still parses line by line but just does not store them in memory.
It works now. I should use finish_now, not finish_now().