tags:

views:

21

answers:

2

I am navigating this office open xml file using XPath 1.0 (extract):

<sheetData ref="A1:XFD108">
    <row spans="1:3" r="1">
        <c t="s" r="A1">
            <is>
                <t>FirstCell</t>
            </is>
        </c>
        <c t="s" r="C1">
            <is>
                <t>SecondCell</t>
            </is>
        </c>
    </row>
    <row spans="1:3" r="2">
        <c t="s" r="A2">
            <is>
                <t>ThirdCell</t>
            </is>
        </c>
        <c t="s" r="C2">
            <is>
                <t>[persons.ID]</t>
            </is>
        </c>
    </row>
</sheetData>

I need to find the cell that says "[persons.ID]", which is a variable. Technically, I need to find the first <row> containing a descendant::t that starts with [ and closes with ]. I currently have:

.//row//t[starts-with(text(), '[') and 
substring(text(), string-length(text())) = ']']/ancestor::row

So I filter and then go up again. It works, but I'd like to understand XPath better here - I found no way filter the predicate. Can you point me to a valid equivalent of doing something like .//row[descendant::t[starts-with()...]].

Any help is greatly appreciated.

+1  A: 

One option:

.//row[starts-with(descendant::t/text(),'[') and substring(descendant::t/text(), string-length(descendant::t/text())) = ']' ]

This will give you the row, however one significant problem could be if your row has two t elements that would satisfy different conditions, but not both conditions. e.g. one t starts with [, and another ends with ]

Obvsiously, what you have doesn't have this problem

Another option: use translate

.//row[translate(descendant::t/text(),"0123456789","") = "[]"]

That will strip the numeric characters and then it's a simple comparison to the [] characters

Jonathan Fingland
Thanks for highlighting the potential issue for the first option.
PeerBr
Thanks for highlighting the potential issue for the first option. However, when testing the first option even with a subset of your solution (.//row[starts-with(descendant::t/text(),'[')]), I get a "too many items" error in XMLSpy and no results in my prog. Probably because there can be >1 descendant::t? Regarding the second option, I think that's ok if the within [] are integers, but they are in practice [a-zA-Z0-9\.] and could possibly be any char, so I think that's not suited to my particular problem.
PeerBr
@PeerBr: you *could* add the additional character possibilities to the 2nd parameter of translate though that could, obviously, get quite large.
Jonathan Fingland
+2  A: 

Technically, I need to find the first containing a descendant::t that starts with [ and closes with ].

/sheetData/row[c/is/t[starts-with(.,'[')]
                     [substring(.,string-length(.))=']']]
              [1]

or

/sheetData/row[.//t[starts-with(.,'[') and
                    substring(.,string-length(.))=']']][1]

or

(//row[.//t[starts-with(.,'[') and
            substring(.,string-length(.))=']']])[1]
Alejandro
3) is exactly what I was looking for - I didn't know I could nest predicates, and the . operator still puzzles me as well sometimes :-) Thank you.
PeerBr
@Alejandro: +1 for a good answer.
Dimitre Novatchev
@PeerBr: You are wellcome. Also, do note that last expression hasn't the best performance (starting `//` operator, forcing expression evaluation first with `()` maybe loosing optimization). If you know your input schema, you should go with the first expression.
Alejandro