I'm trying to select all nodes whose text contains a certain word (e.g. Company) because the word needs a registered trademark symbol (®) after it.

Here is part of the XHTML (this <p> is inside a table cell).

<p>
  <strong>
    <a style="color:#0E5A8B; text-decoration:none" target="_blank" href="http://www.trekk.com">
      <span class="title">
        A Company Content Title
      </span>
    </a>
    <br />
    <span style="color:#000000">
      February 23, 2010 10:00 A.M. PT<br />
    </span>
  </strong>
  Sample Content<br />
  <a style="color:#000" target="_blank" href="http://www.trekk.com">
    Register now
  </a>
</p>

I load the XHTML into a System.Xml.XmlDocument and try to select the nodes using

NewsletterHtmlDoc.SelectNodes("//*[contains(text(),'Company')]")

The resulting XmlNodeList contains 2 XmlNodes.

  1. <p> with InnerText = A Company Content Title February 23, 2010 10:00 A.M. PT Sample Content Register now

  2. <span class="title"> with InnerText = A Company Content Title

My goal is to select only the second one, the <span> tag, and I am not sure why the <p> tag is also being selected. If it selects the <p>, why doesn't it also select the <strong> or <a>, or the <table> or <td> that contain the <p>?

A: 

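I figured it out. The Sample Content text was a direct text child of the <p> element, so text() matched on the <p> itself, and the InnerText of the returned <p> then showed the text of all of its descendants. The <strong> and <a> elements have no direct text of their own, which is why they were not selected. Putting Sample Content in its own element left the <p> with no direct text and fixed the problem.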

<p>
  <strong>
    <a style="color:#0E5A8B; text-decoration:none" target="_blank" href="http://www.trekk.com">
      <span class="title">
        A Company Content Title
      </span>
    </a>
    <br />
    <span style="color:#000000">
      February 23, 2010 10:00 A.M. PT<br />
    </span>
  </strong>
  <span>
    Sample Content
  </span>
  <br />
  <a style="color:#000" target="_blank" href="http://www.trekk.com">
    Register now
  </a>
</p>
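
For anyone who cannot change the markup, here is a minimal sketch of the same behaviour in code. It is an illustration under stated assumptions, not the exact newsletter: the markup is condensed, the direct text of the <p> is changed to the hypothetical "Sample Company Content" so that it contains the search word, and the names XPathTextDemo and MatchedNames are made up for this example. In XPath 1.0, contains(text(), ...) converts the text() node-set to the string value of its first node, so only an element's own first direct text node is tested. Restricting the path to span is one possible way to get just the title.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

public static class XPathTextDemo
{
    // Hypothetical, condensed version of the question's markup.
    // Assumption: the direct text of <p> contains the word "Company".
    public const string Xhtml =
        "<p><strong>" +
        "<a href=\"http://www.trekk.com\"><span class=\"title\">A Company Content Title</span></a>" +
        "<br /><span>February 23, 2010 10:00 A.M. PT<br /></span>" +
        "</strong>" +
        "Sample Company Content<br />" +
        "<a href=\"http://www.trekk.com\">Register now</a></p>";

    // Returns the names of the elements matched by the XPath, in document order.
    public static string MatchedNames(string xpath)
    {
        var doc = new XmlDocument();
        doc.LoadXml(Xhtml);

        var names = new List<string>();
        foreach (XmlNode node in doc.SelectNodes(xpath))
            names.Add(node.Name);
        return string.Join(",", names);
    }

    public static void Main()
    {
        // <p> matches through its own direct text; <strong> and <a> have no
        // direct text nodes, so text() is empty for them and they are skipped.
        Console.WriteLine(MatchedNames("//*[contains(text(),'Company')]"));    // p,span

        // Limiting the element name leaves only the title span.
        Console.WriteLine(MatchedNames("//span[contains(text(),'Company')]")); // span
    }
}
```

If the word could sit in a later text node of an element (for example after a <br />), a predicate such as //span[text()[contains(., 'Company')]] tests each direct text node separately instead of only the first one.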
mr.moses