Hi guys, I am fairly new to XML development and have a few questions about XML parsing with XPath and libxml.

I have XML structured as:

<resultset>
        <result count=1>
           <row>
               <name> He-Man! </name> 
               <home> Greyskull </home>
           <row>
        </result>
        <result count=2>
           <row>
               <name> Spider-Man</name> 
               <home> Some downtown apartment </home>
           <row>
           <row>
               <name> Disco-Man!</name> 
               <home> The 70's dance floor </home>
           <row>
        </result>
<resultset>

I need to pick out the names from this XML, but where the count is 2, I need the name only from the first record. I ran through a few tutorials, but I am unable to come up with an XPath query that serves this purpose.

//name will select all name elements.

/result[@count > 1 ]/row[1]/name | /result[@count =1 ]/row/name

Can this be done with XPath? Is it better done via XPath or by walking the XML tree?

Can someone point me to some examples of complex searches through XML documents?

Edit: The actual scenario requires selecting a subset of the XML rows, which are at times nested two levels deep. It sounds like I need to OR ('|') many paths together to select the nodes I require. I am not sure whether that would be efficient compared to walking the tree. The sample above was typed up to replicate the problem :)

Thanks!

A: 

I'd probably keep my XPath simpler and just extract both cases, then loop over both node sets.
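
A rough sketch of that two-query idea, written in Python with the standard library's xml.etree.ElementTree rather than libxml2 (an assumption, chosen so the snippet runs without extra dependencies). ElementTree supports only a subset of XPath, so the `@count > 1` comparison is done in Python instead of in the expression. The XML is a well-formed copy of the question's sample, and the variable names are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Well-formed copy of the question's sample (closing tags and quotes added).
XML = """
<resultset>
  <result count="1">
    <row><name> He-Man! </name><home> Greyskull </home></row>
  </result>
  <result count="2">
    <row><name> Spider-Man</name><home> Some downtown apartment </home></row>
    <row><name> Disco-Man!</name><home> The 70's dance floor </home></row>
  </result>
</resultset>
"""

root = ET.fromstring(XML)  # root is the <resultset> element

# Case 1: results with exactly one row -> take every name.
single = root.findall("./result[@count='1']/row/name")

# Case 2: results with more than one row -> name of the first row only.
# ElementTree predicates cannot compare numbers, so filter in Python.
multi = [
    name
    for result in root.findall("./result")
    if int(result.get("count", "0")) > 1
    for name in result.findall("./row[1]/name")
]

# Loop over both node sets.
names = [node.text.strip() for node in single + multi]
print(names)  # ['He-Man!', 'Spider-Man']
```

Note that concatenating the two lists does not preserve document order if a multi-row result comes before a single-row one.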

If you do need to go down the single-XPath route, you should try out your XPath expressions in something that lets you enter them live, rather than having to recompile C/C++ code. You should be able to do that by loading your XML into Firefox and using Firebug; for example, typing $x('//name') in the Firebug console gives three nodes.

NOTE, however, that your XML is invalid. You have a bunch of "<row>"s that should be "</row>", and the same goes for "<resultset>". Your counts also need to be

<result count="1">

i.e. with quote marks around the value.

Michael Anderson
Thanks, I just typed up a simplified version of the one I am trying to parse :)
Ricko M
+1  A: 

Try this XPath -

/resultset/result[@count=2]/row/name

This returns a list of all nodes matching this XPath. From that list, just take the first element (as you need only the first record).
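
To make "take the first element" concrete, here is a minimal sketch using Python's standard xml.etree.ElementTree as a stand-in for libxml2 (an assumption; the thread itself is about libxml). ElementTree paths are relative to the element you search from and compare attribute values as strings, so the expression is adapted accordingly. The XML is a trimmed, well-formed version of the question's sample.

```python
import xml.etree.ElementTree as ET

# Trimmed, well-formed version of the count="2" part of the sample.
XML = """<resultset>
  <result count="2">
    <row><name> Spider-Man</name></row>
    <row><name> Disco-Man!</name></row>
  </result>
</resultset>"""

root = ET.fromstring(XML)  # root is the <resultset> element

# Relative form of /resultset/result[@count=2]/row/name.
matches = root.findall("./result[@count='2']/row/name")

# The query may match nothing, so guard before indexing.
first = matches[0].text.strip() if matches else None
print(len(matches), first)  # 2 Spider-Man
```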

MovieYoda
@movieyoda: +1 Good answer. You wrote: *From this just take the first element*. In XPath that would be: `(/resultset/result[@count=2]/row/name)[1]`.
Alejandro
@alejandro: Yup, that's what you do to get the first element. Also, as far as possible, try to use absolute paths in XPath expressions. That way you are sure to get what you want instead of some spurious nodes getting selected.
MovieYoda
@movieyoda: About *some spurious nodes getting selected*: that happens because a predicate binds to its own step more tightly than `/` joins the steps, and positional predicates follow the axis direction. That is why I've completed your answer with the *take the first* expression.
Alejandro
@movieyoda: Conceptually, yes. I am not sure, though, how libxml2 actually implements it. If I had to select specific elements at different levels in the hierarchy, I assume I would have to "|" multiple paths, and I was not sure how that would perform. I am also unclear on how to handle missing nodes in the XML (if any, due to error handling at the source). Anyway, I chose to walk the tree manually for my problem, as it offered greater flexibility. Thanks movieyoda, Alejandro.
Ricko M
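
For comparison, a manual tree walk that tolerates missing nodes might look roughly like the sketch below. It uses Python's standard xml.etree.ElementTree instead of libxml2 (an assumption made to keep it dependency-free), the function name `first_names` is hypothetical, and the sample has been altered on purpose: one row lacks a name and one result has no rows. It also assumes that count always equals the number of rows, in which case "the first row of each result" covers both the count=1 and count=2 cases.

```python
import xml.etree.ElementTree as ET

# Altered sample (hypothetical): a row without <name> and an empty result.
XML = """<resultset>
  <result count="1">
    <row><name> He-Man! </name><home> Greyskull </home></row>
  </result>
  <result count="2">
    <row><home> Some downtown apartment </home></row>
    <row><name> Disco-Man!</name><home> The 70's dance floor </home></row>
  </result>
  <result count="0"/>
</resultset>"""


def first_names(root):
    """Return the name from the first row of each result, skipping gaps."""
    names = []
    for result in root.iter("result"):
        row = result.find("row")    # first <row> child, or None
        if row is None:
            continue                # result without any rows
        name = row.find("name")     # first <name> child, or None
        if name is None or not (name.text or "").strip():
            continue                # row without a usable name
        names.append(name.text.strip())
    return names


names = first_names(ET.fromstring(XML))
print(names)  # ['He-Man!']
```

Each missing-node case is an explicit branch here, which is the flexibility the walk buys over a single expression.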