I've got a large XML file, which takes over 40 seconds to parse with XML::Simple.

I'd like to be able to cache the resulting parsed object so that on the next run I can just retrieve the parsed object and not reparse the whole file.

I've looked at using Data::Dumper, but the documentation is a bit lacking on how to store and retrieve its output from disk files. Other classes I've looked at (e.g. Cache::Cache) appear designed for storage of many small objects, not a single large one.

Can anyone recommend a module designed for this?

EDIT. The XML file is ftp://ftp.rfc-editor.org/in-notes/rfc-index.xml, and I went with Storable for speeding up subsequent runs. Changing the XML parser would have required very significant code changes.

On my Mac Pro, benchmark figures for reading the entire file with XML::Simple (test1) vs Storable (test2) are:

      s/iter  test1  test2
test1   47.8     --  -100%
test2  0.148 32185%     --
+8  A: 

Data::Dumper is actually VERY simple. If your object is a hashref $HashRef:

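# Write
use Data::Dumper;
open(my $fh, '>', 'your_filename') || die "Can not open: $!";
print {$fh} Data::Dumper->Dump([$HashRef], ['HashRef']);
close($fh) || die "Error closing file: $!";

# Read
# "do" returns the value of the last statement in the file (the assignment).
# The leading "./" matters: since Perl 5.26 "." is no longer in @INC.
# No eval wrapper: a successful eval block would reset $@ and hide the error.
my $HashRef = do './your_filename';
die "Error parsing: $@" if $@;
die "Error reading: $!" unless defined $HashRef;
# Now $HashRef is what it was before writing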

Another good option is using Storable. From POD:

use Storable;
store \%table, 'file';
$hashref = retrieve('file');

For a very good guide on various options (as well as a better example of Data::Dumper usage), see Chapter 14, "Persistence", of brian d foy's "Mastering Perl" book.
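As a rough sketch of how the Storable approach could be wrapped into a "parse once, reuse afterwards" helper: the function below re-parses only when the cache file is missing or older than the source. The name `load_cached` and the `$parse` callback are made up for illustration; in the asker's case the callback would presumably be something like `sub { XML::Simple::XMLin($_[0]) }`, which is not exercised here.

```perl
use strict;
use warnings;
use Storable qw(nstore retrieve);

# Hypothetical helper: return the parsed structure for $source, reusing
# $cache when it is at least as new as $source. $parse is a code ref
# that performs the slow parse and returns a reference.
sub load_cached {
    my ($source, $cache, $parse) = @_;

    # -M gives a file's age in days, so a smaller value means newer.
    if (-e $cache && -M $cache <= -M $source) {
        return retrieve($cache);
    }

    my $data = $parse->($source);
    nstore($data, $cache) or die "Cannot write $cache: $!";
    return $data;
}
```

Comparing modification times means an updated XML file invalidates the cache automatically, so a stale cache file never needs deleting by hand.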

DVK
thanks for the `Data::Dumper` example - it was the `$var = eval { do "file" }` bit I was missing. However `Storable` is even better - it's exactly what I was looking for :)
Alnitak
Curiously, when I look at that Google Books link, exactly the pages you need for the Dumper stuff are the ones they decided not to show.
brian d foy
@brian - that's weird - I can see them perfectly fine... 4 pages down from where the link opens for me.
DVK
Maybe they block out different pages at different times or for different sessions. As I recall, I was able to get all of Learning Perl in a couple of days a year ago (which is why I think Google blocking any pages at all is just lame).
brian d foy
@brian - slightly tangentially - I always wondered what an author's perspective on having their book on Google Books is... do you feel comfortable about that? cheated out of income? Opposite? (free promotion)?
DVK
Maybe if you run into me at a conference I'll tell you, but I don't have very nice thoughts about it and it's a lot more complicated than most people think it is. :)
brian d foy
what's the best way to purchase to increase the return to you?
Alnitak
@Alnitak: use brian's Amazon affiliate link in his user profile or on his web page ;)
Ether
Well, I think the best return to me is to buy multiple copies directly from O'Reilly at full price so you never have to carry it around. Just leave a copy everywhere you will ever be. That's the worst for you though. An author gets a percentage of the wholesale rate from Amazon, which is about 50% (or lower) of the cover price (which is why most books are 30% off, or whatever).
brian d foy
Also, if you can, check out www.effectiveperlprogramming.com, the website for my latest book. There are various ways to give us money through that, along with a lot of Perl content. :)
brian d foy
+5  A: 

Storable. That's the lazy answer. (Prefer nstore over store.)
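To illustrate the `nstore` remark, here is a small sketch (the sample data and file names are invented): `store` writes in the machine's native byte order, while `nstore` writes in network order, so the file stays readable on a different architecture. `retrieve` handles both, and `Storable::file_magic` can report which format a file uses.

```perl
use strict;
use warnings;
use Storable qw(store nstore retrieve);
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);
my $data = { title => 'rfc-index', entries => [ 1 .. 3 ] };   # made-up sample

store($data,  "$dir/native.sto")   or die "store failed: $!";
nstore($data, "$dir/portable.sto") or die "nstore failed: $!";

# file_magic returns a hash ref describing the file header,
# including a "netorder" flag.
my $native   = Storable::file_magic("$dir/native.sto");
my $portable = Storable::file_magic("$dir/portable.sto");
printf "native netorder=%d, portable netorder=%d\n",
    $native->{netorder}, $portable->{netorder};

# retrieve() reads either format without being told which one it is.
my $copy = retrieve("$dir/portable.sto");
```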

The opposite of data dumping is eval.

The good answer is: You really want to learn to use an XML module suitable for heavy processing such as XML::Twig or XML::LibXML to speed up parsing, so you do not need this caching monkey code.

daxim
@daxim - (1) learning to use Storable or equivalent is a good tool regardless; (2) If he does some heavy (pre)processing of XML, the savings from a cached raw data structure can be substantial resource-wise.
DVK
⑴ I did not omit it from my answer precisely for this reason. ☺ ⑵ Speculation, we don't know the use case.
daxim
In a file, the opposite of dumping is `do`
ysth