I am working on a wrapper that parses a data file for an existing web tool. However, the data files are large, so I can't load one entirely into memory; I have to walk through it, loading it in small chunks. The existing tool expects data in a style similar to SimpleXML (`$obj->parentnode->childnode->childnode` returns a string or some sort of node object). Thankfully the structure is similar to XML, but the syntax is odd, and because of extenuating circumstances I can't just translate it into a saner format. So I have to emulate the interface on the fly.
As I walk through the file I won't need to parse the whole tree, just the sub-node names of the current node. Each sub-node name and its associated offset will be stored in the parent node. If the contents of a sub-node need to be accessed, the parent-node object will be cloned, the offset values will be updated, and the sub-node object will begin parsing its content until it finds the requested sub-node.
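Here's a rough sketch of what I mean. The class and method names (`LazyNode`, `scanChildren`) are placeholders, and I've stubbed the odd syntax as `name{` lines so the example runs; the real parser would be chunked rather than line-based:

```php
<?php
// Sketch of the lazy-node idea. LazyNode/scanChildren are hypothetical
// names; the file format is stubbed as "name{" lines for illustration.
class LazyNode
{
    private $fh;              // file handle, shared with clones
    private $offset;          // where this node's body starts in the file
    private $children = [];   // child name => offset, filled lazily

    public function __construct($fh, $offset = 0)
    {
        $this->fh = $fh;
        $this->offset = $offset;
    }

    // Emulate SimpleXML-style access: $node->child
    public function __get($name)
    {
        if (!isset($this->children[$name])) {
            $this->scanChildren($name);   // parse forward only until found
        }
        if (!isset($this->children[$name])) {
            return null;                  // no such sub-node
        }
        $child = clone $this;                     // clone the parent object...
        $child->offset = $this->children[$name];  // ...and move its offset
        $child->children = [];                    // clone parses its own subtree
        return $child;
    }

    private function scanChildren($stopAt)
    {
        fseek($this->fh, $this->offset);
        // Stub parser: record each "name{" line and its offset.
        while (($line = fgets($this->fh)) !== false) {
            if (preg_match('/^(\w+)\{/', trim($line), $m)) {
                $this->children[$m[1]] = ftell($this->fh);
                if ($m[1] === $stopAt) {
                    return;
                }
            }
        }
    }
}
```

The key point is that every clone shares the one handle and just `fseek`s to its own offset before reading, which is what prompts the questions below.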
The questions I have are:
- Cloning the parent node object will give the child clones the file handle. Should all the clones use the same handle and `fseek` to jump around the file if needed (and that is a pretty big if)?
- Do I need to close the file? Or will garbage collection at the end of script execution close it? What dangers do I face if I don't?
- Will I need to create a handle for each clone, or should they all share one? If each clone needs its own, is there an upper limit on how many handles can be open at once?
- Is there a way for a cloned object to hold a reference to the original object? If I'm closing the handle in the object's destructor, I probably shouldn't close it when the object is a clone. And being able to trace upwards may come in handy.
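To make the last two questions concrete, this is the kind of thing I was picturing: an `$owner` flag cleared in `__clone` so only the original closes the handle, and a back-reference set by the cloner. All the names (`NodeHandle`, `$owner`, `$parent`, `cloneFrom`) are hypothetical:

```php
<?php
// Sketch: only the original owner closes the shared handle.
// NodeHandle, $owner, $parent and cloneFrom() are hypothetical names.
class NodeHandle
{
    private $fh;
    private $owner = true;   // true only on the object that opened the file
    public $parent = null;   // back-reference to the object we were cloned from

    public function __construct($path)
    {
        $this->fh = fopen($path, 'rb');
    }

    public function handle()
    {
        return $this->fh;
    }

    public function __clone()
    {
        // PHP calls __clone on the copy, so $this is the clone here.
        $this->owner = false;   // clones share the handle but don't own it
    }

    public function cloneFrom()
    {
        $child = clone $this;
        $child->parent = $this;  // clone can trace upwards to the original
        return $child;
    }

    public function __destruct()
    {
        if ($this->owner && is_resource($this->fh)) {
            fclose($this->fh);   // clones skip this, avoiding a double close
        }
    }
}
```

Destroying a clone then leaves the original's handle untouched, while the `$parent` chain keeps the original alive (and reachable) as long as any clone exists. Is this a sane approach, or am I missing a gotcha?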