views: 615
answers: 2

I am trying to parse XML in Perl using the XML::SAX parser. My question is about printing attribute values. Right now I can print only the text inside the elements, but my goal is to print:

Element Name: Element Value:
     Element Attribute Name: Element Attribute Value:
   Element Child Name: Element Child Value
     Element Child Attribute Name: Element Child Attribute Value

Here is my books1.xsd:

<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="urn:books"
            xmlns:bks="urn:books">

  <xsd:element name="books" type="bks:BooksForm"/>

  <xsd:complexType name="BooksForm">
    <xsd:sequence>
      <xsd:element name="book"
                   type="bks:BookForm"
                   minOccurs="0"
                   maxOccurs="unbounded"/>
      </xsd:sequence>
  </xsd:complexType>

  <xsd:complexType name="BookForm">
    <xsd:sequence>
      <xsd:element name="author"   type="xsd:string"/>
      <xsd:element name="title"    type="xsd:string"/>
      <xsd:element name="genre"    type="xsd:string"/>
      <xsd:element name="price"    type="xsd:float" />
      <xsd:element name="pub_date" type="xsd:date" />
      <xsd:element name="review"   type="xsd:string"/>
    </xsd:sequence>
    <xsd:attribute name="id"       type="xsd:string"/>
  </xsd:complexType>
</xsd:schema>

Here is my sample Books.xml:

<?xml version="1.0" encoding="UTF-8"?>
<!--Sample XML file generated by XMLSpy v2009 sp1 (http://www.altova.com)-->
<bks:books xsi:schemaLocation="urn:books Untitled1.xsd" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:bks="urn:books">
        <book id="String">
                <author>String</author>
                <title>String</title>
                <genre>String</genre>
                <price>3.14159E0</price>
                <pub_date>1967-08-13</pub_date>
                <review>String</review>
        </book>
</bks:books>

Here is my parser.pl file:

#!usr/bin/perl -w

use lib '.';    # Perl 5.26+ no longer searches the current directory for modules
use XML::SAX::ParserFactory;
use MyHandler;

my $handler = MyHandler->new();
my $parser  = XML::SAX::ParserFactory->parser(Handler => $handler);
$parser->parse_uri("books1.xml");

Here is my MyHandler.pm module:

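package MyHandler;

use base qw(XML::SAX::Base);

my $in_books = 0;    # true while we are inside <bks:books>

# Called for every opening tag; $data->{Name} is the qualified name.
sub start_element {
        my ($self,$data) = @_;
        if($data->{Name} eq 'bks:books'){
           $in_books++;
        }
}

# Called for every closing tag.
sub end_element {
        my($self,$data) = @_;
        if($data->{Name} eq 'bks:books'){
                $in_books--;
                print "\n";
        }
}

# Called with chunks of text content; print them while inside <bks:books>.
sub characters{
        my($self,$data) = @_;
        if($in_books){
                print $data->{Data};
        }
}
1;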
+3  A: 

I can see a couple of things that might be wrong with your code segment:

  • In your start_element method, you refer to an undeclared variable $in_books. This should probably be $in_productOffering. Tip: if you put use strict; at the top of your module, Perl will refuse to compile it when you accidentally use an undeclared variable.
  • Your start_element method checks for books, but the XML file only has bks:books and book elements.
  • Your script starts with #!usr/bin/perl -w, but the interpreter path needs a leading slash, i.e. #!/usr/bin/perl -w
  • The SAX parser does not need the XSD file.
Andomar
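To illustrate the `use strict` tip, here is a minimal sketch (not from the thread) that compiles a snippet with the same kind of mistake: it declares `$in_books` but increments the never-declared `$in_productOffering`. Under `use strict` the string `eval` fails at compile time and `$@` holds the "Global symbol ... requires explicit package name" error.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Compile a snippet that declares $in_books but increments a different,
# never-declared variable. With 'use strict' in effect, compilation fails.
my $snippet = q{
    use strict;
    my $in_books = 0;
    $in_productOffering++;    # typo: this variable was never declared
    1;
};

my $ok    = eval $snippet;
my $error = $ok ? '' : $@;
if ($ok) {
    print "compiled fine\n";
} else {
    print "strict caught it: $error";
}
```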
I have fixed the errors, but my main question is: how can I get attribute names and their values using the SAX parser?
Rachel
If your edit to the question reflects your changes, you're still declaring `$in_productOffering` but using `$in_books`. Include `use strict;` right after `package MyHandler;`
Andomar
I have made the changes you suggested. How can I get attribute names and values along with element names and values? Right now I am only getting the element values.
Rachel
In the `start_element` method, the attributes are in `$data->{Attributes}`. You can access the id attribute as `$data->{Attributes}->{'{}id'}->{Value}`. The `{}` stands for "no namespace".
Andomar
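A minimal sketch of that lookup inside `start_element`. The handler name `IdHandler` and the `last_id` field are hypothetical, and the event hash is built by hand in the shape the PerlSAX 2 interface describes (attributes keyed as `{NamespaceURI}LocalName`, each value a hashref with `Name`, `Value`, `NamespaceURI`, `Prefix`, `LocalName`), so it runs without XML::SAX installed.

```perl
#!/usr/bin/perl
package IdHandler;    # hypothetical name for this sketch
use strict;
use warnings;

sub new { return bless {}, shift }

# PerlSAX 2 keys each attribute as "{NamespaceURI}LocalName". The id
# attribute on <book> has no namespace, so its key is '{}id'.
sub start_element {
    my ($self, $data) = @_;
    return unless $data->{Name} eq 'book';

    my $id = $data->{Attributes}{'{}id'};    # undef if the attribute is absent
    print "book id = $id->{Value}\n" if $id;
    $self->{last_id} = $id ? $id->{Value} : undef;
}

package main;

# Hand-built event shaped like the one a SAX driver passes for
# <book id="String">, so the sketch runs without XML::SAX installed.
my $handler = IdHandler->new;
$handler->start_element({
    Name       => 'book',
    LocalName  => 'book',
    Attributes => {
        '{}id' => {
            Name => 'id', LocalName => 'id', Value => 'String',
            NamespaceURI => '', Prefix => '',
        },
    },
});
```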
Do I have to name every attribute I want printed, or is there a way for SAX to report all attribute names and values, along with element names and values, as it traverses the document?
Rachel
The attributes are a regular hashtable :) You can iterate over them, for example `my %attribs = %{$data->{'Attributes'}}; foreach( keys( %attribs )) { print " $_ = " . $attribs{$_}->{Value} . "\n"; }`
Andomar
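A sketch of iterating over every attribute, assuming the same PerlSAX 2 hash shape. Each hash value is itself a hashref, so the loop reads its `{Name}` (the qualified name, without the `{}` prefix of the key) and `{Value}` fields. `AttrHandler` and `attribute_pairs` are hypothetical names, and the `baseVersion` attribute is borrowed from a later comment in this thread; it is not in the sample Books.xml.

```perl
#!/usr/bin/perl
package AttrHandler;    # hypothetical name for this sketch
use strict;
use warnings;

sub new { return bless {}, shift }

# Return "name = value" for every attribute of the element, sorted by key
# so the order is stable. Each value is a hashref: read {Name} and {Value}.
sub attribute_pairs {
    my ($data) = @_;
    my $attrs = $data->{Attributes} || {};
    return map { my $attr = $attrs->{$_}; "$attr->{Name} = $attr->{Value}" }
           sort keys %$attrs;
}

sub start_element {
    my ($self, $data) = @_;
    print "$data->{Name}\n";
    print "  $_\n" for attribute_pairs($data);
}

package main;

# Hand-built event; a real SAX driver would supply this hash.
my $event = {
    Name       => 'book',
    Attributes => {
        '{}id'          => { Name => 'id',          Value => 'String' },
        '{}baseVersion' => { Name => 'baseVersion', Value => '187' },
    },
};
AttrHandler->new->start_element($event);
```

Printing `$attr->{Name}` rather than the hash key keeps the `{namespace}` part out of the output while still showing prefixes such as `xsi:` for namespaced attributes.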
Thank you, Andomar, for your guidance. One more question: in my XML the baseVersion attribute value is 187, but when I print the attribute values I get baseVersion = HASH(0x8fc479c), which is a hash reference, and this happens for every attribute. How can I get the actual value instead of the hash reference?
Rachel
Try `$baseVersion->{Value}` to get its value. Have a look at the `Data::Dumper` module; it allows you to print the "inside" of the hash like `print Dumper($baseVersion);`
Andomar
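A small sketch of what `Data::Dumper` shows for one attribute entry. The entry is built by hand in the PerlSAX 2 shape (the value 187 is the one quoted in the thread); printing the reference itself gives `HASH(0x...)`, `Dumper` reveals the fields inside, and `->{Value}` gets the actual value.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;

$Data::Dumper::Sortkeys = 1;    # print hash keys in a stable order

# Hand-built attribute entry shaped like the one a SAX driver stores under
# $data->{Attributes}{'{}baseVersion'}.
my $baseVersion = {
    Name  => 'baseVersion', LocalName    => 'baseVersion',
    Value => '187',         NamespaceURI => '', Prefix => '',
};

print "stringified: $baseVersion\n";       # a reference prints as HASH(0x...)
print Dumper($baseVersion);                # shows the fields inside the hash
print "value: $baseVersion->{Value}\n";    # the attribute's actual value
```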
Instead of doing this separately for each attribute, is there a way to get the actual value of every attribute?
Rachel
I tried following the same logic to get the element names as with the attributes, but it didn't work. I believe both the attributes and the Name are stored in a hash.
Rachel
`my %attribs = %{$data->{'Attributes'}}; foreach(keys(%attribs)){ print (Dumper("$_ = ".$attribs{$_}))->{Value}; }` I have tried this, but the output is `$VAR1 = '{}baseVersion = HASH(0x8fd8a84)';` and not the actual value.
Rachel
I was able to print the actual values for all attributes. I had initially used a block to print the attribute values, and that printed the hash references; when I wrote the command as a single line, it printed the actual values. I don't know whether that is the reason, but it worked. What do you think the reason could be?
Rachel
Might be related to the position of the `)`, not sure!
Andomar
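One plausible reading of the earlier attempt (an interpretation, not confirmed in the thread): in `print (Dumper("$_ = ".$attribs{$_}))->{Value};` the string `"$_ = ".$attribs{$_}` is built before anything else, so the hashref is already stringified to `HASH(0x...)`, and the trailing `->{Value}` is applied to the return value of `print (...)`, not to the attribute. The sketch below contrasts concatenating first with dereferencing first, using a hand-built attribute hash.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hand-built attribute hash in the shape a SAX driver provides.
my %attribs = (
    '{}baseVersion' => { Name => 'baseVersion', Value => '187' },
);

my (@wrong, @right);
for my $key (sort keys %attribs) {
    # Concatenation runs first, so the hashref is stringified to
    # "HASH(0x...)" before any {Value} lookup could happen.
    push @wrong, "$key = " . $attribs{$key};

    # Dereference first, then concatenate.
    push @right, "$key = " . $attribs{$key}->{Value};
}
print "wrong: $_\n" for @wrong;
print "right: $_\n" for @right;
```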
A: 

It looks like you want to print a subset of the DOM tree. Use XML::DOM. See also Why does my XSD file fail to parse with XML::LibXML?

Sinan Ünür
I do not want to print a subset of the DOM tree. I want to parse an XML file with a SAX parser and print element name/value pairs together with all of their attribute name/value pairs. Using the approach Andomar suggested, I can get the attribute values, but they are displayed as hash references rather than their actual values. So of my goal (element name/value pairs plus attribute name/value pairs with their actual values), I can only print element values, and attribute values as hash references.
Rachel
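Putting the comments together, here is a sketch of a SAX handler that prints each element's name and text value, its attributes indented beneath it, and children indented one level further, roughly the layout asked for in the question. Assumptions: `TreePrinter` is a hypothetical name (in the question this code would live in MyHandler.pm); an element's text is collected until its first child starts or it ends, and text that follows a child element is ignored; the parser is only invoked when a file name is given and XML::SAX is installed. Depending on the SAX driver, the `xmlns` declarations on `bks:books` may also be listed among its attributes.

```perl
#!/usr/bin/perl
package TreePrinter;    # hypothetical name for this sketch
use strict;
use warnings;

sub new {
    my ($class) = @_;
    return bless { stack => [], lines => [] }, $class;
}

sub start_element {
    my ($self, $data) = @_;
    my $stack = $self->{stack};
    # A child is starting, so the parent's leading text is complete: print it.
    $self->_emit($stack->[-1]) if @$stack && !$stack->[-1]{printed};
    push @$stack, {
        name    => $data->{Name},
        attrs   => $data->{Attributes} || {},
        text    => '',
        depth   => scalar @$stack,
        printed => 0,
    };
}

sub characters {
    my ($self, $data) = @_;
    my $top = $self->{stack}[-1] or return;
    # Text may arrive in several chunks; ignore it once the element is printed.
    $top->{text} .= $data->{Data} unless $top->{printed};
}

sub end_element {
    my ($self, $data) = @_;
    my $frame = pop @{ $self->{stack} };
    $self->_emit($frame) if $frame && !$frame->{printed};
}

# Print "name: value" plus one indented "attr = value" line per attribute.
sub _emit {
    my ($self, $frame) = @_;
    my $indent = '  ' x $frame->{depth};
    (my $text = $frame->{text}) =~ s/^\s+|\s+$//g;
    my @out = ($indent . $frame->{name} . ':' . (length $text ? " $text" : ''));
    for my $key (sort keys %{ $frame->{attrs} }) {
        my $attr = $frame->{attrs}{$key};
        push @out, "$indent    $attr->{Name} = $attr->{Value}";
    }
    print "$_\n" for @out;
    push @{ $self->{lines} }, @out;
    $frame->{printed} = 1;
}

package main;

# Parse the file named on the command line, if XML::SAX is available.
if (@ARGV && eval { require XML::SAX::ParserFactory; 1 }) {
    my $handler = TreePrinter->new;
    XML::SAX::ParserFactory->parser(Handler => $handler)->parse_uri($ARGV[0]);
}
```

Run it as, for example, `perl tree_printer.pl Books.xml` (the script name is hypothetical). Deferring each element's line until its first child starts or it ends is what lets the text value appear on the same line as the element name while the output still follows document order.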