The only example code I have found so far is so old it won't work anymore (uses deprecated classes). All I need is something basic that demonstrates:

  1. Loading and parsing the XML from a file

  2. Defining the SAX event handler(s)

  3. Reading the attributes or text values of the element passed to the event handler

+3  A: 

How about the distribution itself?

Go to the XML::LibXML distribution page and click Browse.

Note the following caution in the documentation:

At the moment XML::LibXML provides only an incomplete interface to libxml2's native SAX implementation. The current implementation is not tested in production environment. It may causes significant memory problems or shows wrong behaviour.

There is also XML::SAX, which comes with nice documentation. I have used it a few times and it worked well for my purposes.

Sinan Ünür
Thanks. I never noticed the Browse link before. I did see the warning you cite, however. Would you recommend a different Perl SAX parser? I'm not picky. It is not for large files, but I prefer the event-driven approach for this problem because I am reading sparse data output by Excel.
Paul Chernoch
The XML::SAX documentation looks intelligible. You get my vote.
Paul Chernoch
@Paul Thank you.
Sinan Ünür
+3  A: 

Sinan's suggestion was good, but it didn't connect all the dots. Here is a very simple program that I cobbled together:

file 1: The handlers (MySAXHandler.pm)

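  package MySAXHandler;
  use strict;
  use warnings;
  use base qw(XML::SAX::Base);

  # Called once, before any element events
  sub start_document {
    my ($self, $doc) = @_;
    # process document start event
  }

  # Called for every opening tag; $el is a hash ref describing the element
  sub start_element {
    my ($self, $el) = @_;
    # process element start event
    print "Element: " . $el->{LocalName} . "\n";
  }

  1;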

file 2: The test program (test.pl)

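#!/usr/bin/perl

use strict;
use warnings;
use lib '.';    # Perl 5.26+ no longer searches the current directory for modules
use XML::SAX;   # also loads XML::SAX::ParserFactory
use MySAXHandler;

# Let the factory pick an installed SAX parser and attach our handler to it
my $parser = XML::SAX::ParserFactory->parser(
        Handler => MySAXHandler->new
);

$parser->parse_uri("some-xml-file.xml");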

Note: reading the value of an element attribute was not documented in a way I could use, and it took me over an hour to work out the syntax. The keys of the Attributes hash have the form {NamespaceURI}LocalName, not prefix:name. In my XML file the attribute was ss:Index, and the namespace declaration for ss was xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet". So to get the Index attribute, I needed this:

my $ssIndex = $el->{Attributes}{'{urn:schemas-microsoft-com:office:spreadsheet}Index'}{Value};

That was painful.
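
For the text values, the handler also needs a characters method. Here is a minimal sketch that puts the attribute lookup and the text together in one file. The package name CellCollector, the keys current and cells, and the tiny inline sample document are all made up for illustration. It uses parse_string so it runs without an external file, but parse_uri works the same way.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::SAX;

# Namespace URI bound to the ss: prefix in the spreadsheet XML
my $SS_NS = 'urn:schemas-microsoft-com:office:spreadsheet';

{
    # Hypothetical handler: records ss:Index and the text of every <Cell>
    package CellCollector;
    use base qw(XML::SAX::Base);

    sub start_element {
        my ($self, $el) = @_;
        return unless $el->{LocalName} eq 'Cell';
        # Attribute keys look like "{NamespaceURI}LocalName"
        my $attr = $el->{Attributes}{ '{' . $SS_NS . '}Index' };
        $self->{current} = {
            index => ($attr ? $attr->{Value} : undef),
            text  => '',
        };
    }

    sub characters {
        my ($self, $chars) = @_;
        # Text can arrive in several chunks, so append instead of assigning
        $self->{current}{text} .= $chars->{Data} if $self->{current};
    }

    sub end_element {
        my ($self, $el) = @_;
        return unless $el->{LocalName} eq 'Cell' && $self->{current};
        push @{ $self->{cells} }, delete $self->{current};
    }
}

# Made-up sample input, just enough to exercise the handler
my $xml = <<'XML';
<Row xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet">
  <Cell ss:Index="3">hello</Cell>
  <Cell ss:Index="7">world</Cell>
</Row>
XML

my $handler = CellCollector->new;
my $parser  = XML::SAX::ParserFactory->parser(Handler => $handler);
$parser->parse_string($xml);

for my $cell (@{ $handler->{cells} || [] }) {
    my $index = defined $cell->{index} ? $cell->{index} : '(none)';
    print "Index=$index text=$cell->{text}\n";
}
```

The handler keeps its state in the handler object itself, so the main program can read the collected cells after the parse finishes. Text inside nested elements (such as a Data element inside a Cell) is appended to the same cell as long as the Cell is still open.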

Paul Chernoch