
I have a piece of HTML like this:

<dt>name</dt>
<dd>value</dd>
<dt>name2</dt>
<dd>value2</dd>

I want to find all places where the structure is incorrect, meaning there is no dd tag after a dt tag.

I tried this:

//dt/following-sibling::dt

but this doesn't work. Any suggestions?
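For reference, `//dt/following-sibling::dt` selects every dt that has another dt somewhere before it among its siblings, whether or not a dd sits in between. On the well-formed sample above it therefore already returns `<dt>name2</dt>`. A minimal C# sketch showing this, assuming the snippet is wrapped in a hypothetical `<root>` element so it parses as one XML document (`QuestionXPathDemo` and `Select` are illustrative names):

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

public static class QuestionXPathDemo
{
    // Well-formed sample from the question, wrapped in a hypothetical <root>
    // element so that it is a single XML document.
    public const string WellFormed =
        "<root><dt>name</dt><dd>value</dd><dt>name2</dt><dd>value2</dd></root>";

    // Returns the OuterXml of every node matched by the XPath expression.
    public static List<string> Select(string xml, string xpath)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);
        var result = new List<string>();
        foreach (XmlNode node in doc.SelectNodes(xpath))
        {
            result.Add(node.OuterXml);
        }
        return result;
    }

    public static void Main()
    {
        // "//dt/following-sibling::dt" means "any dt that comes after another dt",
        // so it matches name2 even though every dt here is followed by a dd.
        foreach (string s in Select(WellFormed, "//dt/following-sibling::dt"))
        {
            Console.WriteLine(s); // <dt>name2</dt>
        }
    }
}
```

The expression asks the wrong question: it looks for "some later dt" rather than "the very next element".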

+4  A: 

EDIT: as noted by @Gaim, my original version failed to capture a terminal dt.

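using System;
using System.Xml;

// Sample input: name2, name4 and name6 are not immediately followed by a dd
string xml = @"
    <root>
    <dt>name</dt>
    <dd>value</dd>
    <dt>name2</dt>
    <dt>name3</dt>
    <dd>value3</dd>
    <dt>name4</dt>
    <dt>name5</dt>
    <dd>value5</dd>
    <dt>name6</dt>
    </root>
    ";

XmlDocument doc = new XmlDocument();
doc.LoadXml(xml);

// Select every dt whose first following sibling element is not a dd
// (this includes a dt that has no following sibling at all)
XmlNodeList nodes =
    doc.SelectNodes("//dt[not(following-sibling::*[1][self::dd])]");

foreach (XmlNode node in nodes)
{
    Console.WriteLine(node.OuterXml);
}

// Keep the console window open
Console.ReadLine();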

The output is the dt nodes that are not immediately followed by a dd:

<dt>name2</dt>
<dt>name4</dt>
<dt>name6</dt>

What we are doing here is saying:

//dt

All dt nodes, anywhere....

[not(following-sibling::*[1]

....such that it's not the case that their first following sibling element (whatever it is called)....

[self::dd]]

...is called dd.
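The three steps above can also be spelled out procedurally, which may make the role of `following-sibling::*[1]` clearer. A minimal sketch, assuming the same `XmlDocument` API and a document without namespaces; `ManualCheck` and `FindDtWithoutDd` are hypothetical names. It walks each dt's following siblings until it reaches the first element (skipping text, comments and other non-element nodes, as `*` does) and keeps the dt unless that element exists and is named dd.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

public static class ManualCheck
{
    // Hypothetical helper: procedural equivalent of
    // //dt[not(following-sibling::*[1][self::dd])] for documents without namespaces.
    public static List<string> FindDtWithoutDd(XmlDocument doc)
    {
        var result = new List<string>();
        // //dt : every dt element anywhere in the document, in document order
        foreach (XmlNode dt in doc.GetElementsByTagName("dt"))
        {
            // following-sibling::*[1] : first following sibling that is an element
            XmlNode next = dt.NextSibling;
            while (next != null && next.NodeType != XmlNodeType.Element)
            {
                next = next.NextSibling;
            }
            // not(...[self::dd]) : keep dt unless that element exists and is a dd
            if (next == null || next.Name != "dd")
            {
                result.Add(dt.OuterXml);
            }
        }
        return result;
    }

    public static void Main()
    {
        var doc = new XmlDocument();
        doc.LoadXml("<root><dt>name</dt><dd>value</dd><dt>name2</dt>" +
                    "<dt>name3</dt><dd>value3</dd><dt>name4</dt></root>");
        foreach (string s in FindDtWithoutDd(doc))
        {
            Console.WriteLine(s); // <dt>name2</dt>, then <dt>name4</dt>
        }
    }
}
```

The `next == null` branch is what makes a trailing dt count as an error, mirroring how `not()` is true when the node-set is empty.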

AakashM
+1 -- The XPath expression can be boiled down to `//dt[following-sibling::*[1][self::dt]]`
Tomalak
@Tomalak Your XPath doesn't match all cases; look at my answer. It matches only the first of the two marked dt elements there.
Gaim
@Gaim: You are right. The `not()` approach is the correct one; I did not think about the case where a `<dt>` is the last sibling.
Tomalak
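To make the difference discussed in these comments concrete, here is a minimal C# sketch that runs both expressions against a trimmed version of the answer's sample (`CompareExpressions` and `Matches` are illustrative names). The positive form only matches a dt whose next element is another dt, so a dt that is the last child, having no next element at all, is missed; the `not()` form treats a missing next element as "not a dd" and matches it.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

public static class CompareExpressions
{
    // Trimmed copy of the answer's sample: the last dt has no following sibling.
    public const string Sample =
        "<root><dt>name</dt><dd>value</dd><dt>name2</dt><dt>name3</dt>" +
        "<dd>value3</dd><dt>name6</dt></root>";

    // Text content of every node matched by the expression, in document order.
    public static List<string> Matches(string xpath)
    {
        var doc = new XmlDocument();
        doc.LoadXml(Sample);
        var names = new List<string>();
        foreach (XmlNode node in doc.SelectNodes(xpath))
        {
            names.Add(node.InnerText);
        }
        return names;
    }

    public static void Main()
    {
        // Positive test: "next element is a dt" -> name2 only.
        Console.WriteLine(string.Join(", ",
            Matches("//dt[following-sibling::*[1][self::dt]]")));
        // Negated test: "next element is not a dd (or is missing)" -> name2, name6.
        Console.WriteLine(string.Join(", ",
            Matches("//dt[not(following-sibling::*[1][self::dd])]")));
    }
}
```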
+3  A: 

I am not sure that I understand you, but here is my solution. This XPath matches ALL <dt> elements that are not directly followed by a <dd>. Here is a test structure:

<xml>
  <dt>name</dt> <!-- match -->

  <dt>name2</dt>
  <dd>value2</dd>

  <dt>name</dt>
  <dd>value</dd>

  <dt>name2</dt>  <!-- match -->
</xml>

Here is the XPath:

//dt[ name( following-sibling::*[1] ) != 'dd' ]

or

//dt[  not( following-sibling::*[1]/self::dd ) ]

They do the same thing.
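One way to check that claim is to run both expressions over the same test structure. A minimal C# sketch (`GaimExpressions` and `OuterXmlOf` are illustrative names), using `<root>` instead of `<xml>` as the wrapper element because XML reserves names that start with "xml". For the last dt, `following-sibling::*[1]` is an empty node-set; XPath 1.0's `name()` returns the empty string for an empty node-set, and `'' != 'dd'` is true, so that dt matches in the first form just as `not()` of an empty node-set is true in the second.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

public static class GaimExpressions
{
    // Gaim's test structure, with <root> as the wrapper element.
    public const string TestXml =
        "<root>" +
        "<dt>name</dt>" +                  // expected match
        "<dt>name2</dt><dd>value2</dd>" +
        "<dt>name</dt><dd>value</dd>" +
        "<dt>name2</dt>" +                 // expected match
        "</root>";

    // OuterXml of every node matched by the expression, in document order.
    public static List<string> OuterXmlOf(string xpath)
    {
        var doc = new XmlDocument();
        doc.LoadXml(TestXml);
        var result = new List<string>();
        foreach (XmlNode node in doc.SelectNodes(xpath))
        {
            result.Add(node.OuterXml);
        }
        return result;
    }

    public static void Main()
    {
        // name() of an empty node-set is "", so a trailing dt also satisfies != 'dd'.
        var byName = OuterXmlOf("//dt[ name( following-sibling::*[1] ) != 'dd' ]");
        var bySelf = OuterXmlOf("//dt[ not( following-sibling::*[1]/self::dd ) ]");
        Console.WriteLine(string.Join(" ", byName)); // <dt>name</dt> <dt>name2</dt>
        Console.WriteLine(string.Join(" ", bySelf)); // <dt>name</dt> <dt>name2</dt>
    }
}
```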

Gaim
+1 better than my original, which failed to capture a terminal dd-lacking dt
AakashM