I need to parse several large XML files (one is ~8GB, the others are ~4MB each) and merge them. Since neither SAX nor Tie::File
is suitable because of memory and time constraints, I decided to try XML::Twig.
Suppose each XML file is composed of several elements as follows:
<class name="math">
<student>luke1</student>
... (a very long list of students)
<student>luke8000000</student>
</class>
<class name="english">
<student>mary1</student>
...
<student>mary1000000</student>
</class>
As you can see, even if I use TwigRoots => {"class[\@name='english']" => \&counter},
I still have to wait a long time before Twig starts parsing the english class,
because it first has to read through every line of the math class
(correct me if it does not actually need to read every line).
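For reference, here is a minimal sketch of the approach described above. It assumes the classes are wrapped in a single root element (not shown in the sample) and that the counter handler only counts students; count_english_students and the file name passed to it are placeholders, not my real code. twig_roots is the lowercase spelling of TwigRoots.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical helper: count <student> children of the english class.
# Only the matching <class> is built into a tree, but the parser still
# has to read everything that comes before it in the file.
sub count_english_students {
    my ($file) = @_;
    my $count = 0;

    my $twig = XML::Twig->new(
        twig_roots => {
            "class[\@name='english']" => sub {
                my ($t, $class) = @_;
                my @students = $class->children('student');
                $count += scalar @students;
                $t->purge;    # release the memory used by the parsed class
            },
        },
    );
    $twig->parsefile($file);

    return $count;
}
```

Calling count_english_students('classes.xml') returns the number of students, but only after the whole file has been read.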
Is there any way to make Twig start parsing at a given line number rather than at the beginning of the file? I can get the line number of the opening tag of the english class
with grep, which is much faster.
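To make the lookup step concrete, here is a rough Perl equivalent of that grep call. It is only a sketch: find_tag_line is a hypothetical helper, and it assumes the opening tag sits on a single line. It also returns the byte offset of that line in case an offset is more useful than a line number. It covers only locating the tag, not getting Twig to start there.

```perl
use strict;
use warnings;

# Hypothetical helper: return (line number, byte offset of line start)
# for the first line matching $pattern, or an empty list if none matches.
sub find_tag_line {
    my ($file, $pattern) = @_;
    open my $fh, '<', $file or die "Cannot open $file: $!";

    my $offset = 0;    # byte offset where the current line begins
    while (my $line = <$fh>) {
        if ($line =~ $pattern) {
            my $line_no = $.;    # line number of the line just read
            close $fh;
            return ($line_no, $offset);
        }
        $offset = tell $fh;      # start of the next line
    }

    close $fh;
    return;
}

# Matches the opening tag with or without quotes around the attribute value.
my $english_tag = qr/<class\s+name\s*=\s*["']?english["']?\s*>/;
```

Usage would be something like my ($line_no, $offset) = find_tag_line('classes.xml', $english_tag);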
Thanks in advance.