<td></td><td>foo</td>

I would like to get ['', 'foo'], but libxml's XPath //td/text() returns just ['foo']. How do I get the empty tag back as '' instead of having it skipped?

+4  A: 

As long as you are selecting text nodes specifically, you can't, because there simply is no text node in the first <td>.

When you change your XPath expression to '//td', you get the two <td> nodes. Use their text value in further processing.
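
A minimal sketch of that approach in Python, in case it helps. It assumes the fragment is wrapped in a `<tr>` root (added here only so it parses as well-formed XML) and uses the standard library's `xml.etree.ElementTree`; `lxml.etree` offers the same `findall` and `.text` interface, so the idea should carry over.

```python
import xml.etree.ElementTree as ET

# Assumption: the <tr> wrapper is not in the question; it only gives the
# fragment a single root element so the parser accepts it.
row = ET.fromstring("<tr><td></td><td>foo</td></tr>")

# Select the <td> elements themselves rather than their text nodes.
cells = row.findall(".//td")

# .text is None for an element with no text, so fall back to ''.
texts = [td.text or "" for td in cells]
print(texts)  # ['', 'foo']
```

Note that `.text` only covers the text before the first child element, so this is enough for flat cells like the ones in the question but not for cells with nested markup.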

Tomalak
I wound up finding all the `td` nodes and calling .text on them. Not as cool as doing everything in one big XPath ;-) but it works.
joeforker
@joeforker: As long as you don't have access to the all-shiny XPath 2.0, that's your only option. :-)
Tomalak
+2  A: 

While @Tomalak is perfectly right, in XPath 2.0 one can use:

//td/string(.)

and this produces a sequence of strings, each one containing the string value of the corresponding td element.

So, in your case the result will be the desired one:

"", "foo"
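
Without an XPath 2.0 engine, the same per-element string value can be approximated in Python. This is only a sketch: `string_value` is a hypothetical helper name, the `<tr>` wrapper and the nested `<b>` are made up for illustration, and it uses the standard library's `xml.etree.ElementTree` rather than evaluating `string(.)` itself.

```python
import xml.etree.ElementTree as ET

# Assumption: sample markup extended with a nested <b> to show that the
# string value includes descendant text, not just the leading text.
row = ET.fromstring("<tr><td></td><td>foo <b>bar</b></td></tr>")

def string_value(element):
    """Concatenate all text inside element, similar to XPath string(.)."""
    return "".join(element.itertext())

# One string per <td>, empty cells included.
values = [string_value(td) for td in row.findall(".//td")]
print(values)  # ['', 'foo bar']
```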

Dimitre Novatchev
+1. This does exactly what my solution does for XPath 1.0 - it takes the `<td>` nodes and then uses their respective text value.
Tomalak