I have a document which when simplified looks like this:

<?xml version="1.0"?>
<document>
    <br/>
    <div class="Heading">Introduction</div>
    <div class="Text">Sed quis malesuada ligula. Aliquam eu felis nulla, ac tempus purus.</div>
    <br/>
    <div class="Heading">Background</div>
    <div class="Text">Curabitur adipiscing tortor ipsum. In gravida congue tincidunt. Aliquam</div>
    <br/>
    <div class="Heading">Summary</div>
    <div class="Text">Pellentesque consequat scelerisque urna, sit amet consequat quam lacinia ac.</div>
    <br/>
</document>

What I would like to do is obtain the text of the introduction ("Sed ... purus."), so what I need is an XPath expression something like this:

(//div[@class="Text"])[0]/following-sibling::node(0)

Clearly this is rubbish; what I'm looking for is an expression that means "select the text of the div node that has a class of Text, where the previous div node has a class of Heading and the text of that previous node is Introduction".

I'm also considering LINQ to XML.

What XPath expression will do this?

+1  A: 

I think this might do it:

//div
  [@class='Text']
  [preceding-sibling::div
    [@class='Heading'
     and text() = 'Introduction']
  ]

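Works for me in testing. Let me explain it. Lines beginning with # are annotations only (XPath 1.0 has no comment syntax, so strip them before running the expression).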

# Select all divs
//div
  # With class 'Text'
  [@class='Text']
  # That have at least one preceding div sibling
  [preceding-sibling::div
    # With the class 'Heading'
    [@class='Heading'
    # And whose text is exactly 'Introduction'
    and text() = 'Introduction']
  ]
Welbog
What tool are you using to test this, and is preceding-sibling an XPath 2.0-ism? I'm using Liquid XML Studio and C# .NET and I'm getting an invalid token error.
IanT8
@IanT8: I used Visual Studio 2005 (which I believe internally uses `System.Xml` stuff for its parsing), which uses XPath 1.0. Worked just fine for me.
Welbog
Ah, syntax error on my part. Any idea why this expression selects all <div class="Text"> nodes: "//div[@class='Text'][preceding-sibling::div[@class='Heading' and text() = 'Introduction']]"? If I move the Introduction div and its text to the end of the document, however, it does the right thing. In the end I used Joseph's "[1]/text()" addition to get the result regardless of where the introduction occurs.
IanT8
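On the question of why every Text div is selected: the predicate on preceding-sibling::div is satisfied if any earlier div sibling matches, and the Introduction heading comes before all three Text divs. One possible way to tighten it, assuming each Text div directly follows its own heading among the div siblings, is to add [1] so that only the nearest preceding div is tested. The sketch below compares the two expressions; the class, constant, and method names are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml.XPath;

static class NearestHeadingSketch
{
    // Expression from the answer above: matches any Text div that has an
    // Introduction heading anywhere among its preceding siblings.
    public const string AnyPreceding =
        "//div[@class='Text']" +
        "[preceding-sibling::div[@class='Heading' and text()='Introduction']]";

    // Assumed fix: on a reverse axis, [1] is the closest preceding div,
    // so only the heading immediately before the Text div is tested.
    public const string NearestPreceding =
        "//div[@class='Text']" +
        "[preceding-sibling::div[1][@class='Heading' and text()='Introduction']]";

    // The simplified document from the question.
    public const string SampleXml =
        "<document>" +
        "<br/>" +
        "<div class='Heading'>Introduction</div>" +
        "<div class='Text'>Sed quis malesuada ligula. Aliquam eu felis nulla, ac tempus purus.</div>" +
        "<br/>" +
        "<div class='Heading'>Background</div>" +
        "<div class='Text'>Curabitur adipiscing tortor ipsum. In gravida congue tincidunt. Aliquam</div>" +
        "<br/>" +
        "<div class='Heading'>Summary</div>" +
        "<div class='Text'>Pellentesque consequat scelerisque urna, sit amet consequat quam lacinia ac.</div>" +
        "<br/>" +
        "</document>";

    // Returns the string value of every node the expression selects.
    public static List<string> SelectValues(string xml, string xpath)
    {
        XPathNavigator nav = new XPathDocument(new StringReader(xml)).CreateNavigator();
        List<string> values = new List<string>();
        XPathNodeIterator iter = nav.Select(xpath);
        while (iter.MoveNext())
        {
            values.Add(iter.Current.Value);
        }
        return values;
    }

    static void Main()
    {
        Console.WriteLine(SelectValues(SampleXml, AnyPreceding).Count);     // prints 3
        Console.WriteLine(SelectValues(SampleXml, NearestPreceding).Count); // prints 1
        Console.WriteLine(SelectValues(SampleXml, NearestPreceding)[0]);    // the Introduction text
    }
}
```

If the layout assumption does not hold (for example, several Text divs under one heading), the [1] form would only pick up the first of them.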
+1  A: 

Here is a C# console app that gets you exactly what you are looking for. Notice that I used Welbog's XPath but extended it to return the precise data you asked for.

The output to the console is: Sed quis malesuada ligula. Aliquam eu felis nulla, ac tempus purus.

using System;
using System.IO;
using System.Xml.XPath;

namespace StackOverflow
{
    class Program
    {
        static void Main(string[] args)
        {
            string xPathStatement =
                "document/div[@class='Text']" +
                "[preceding-sibling::div" +
                "[@class='Heading' and text() = 'Introduction']][1]/text()";

            string xmldoc =
                "<?xml version='1.0'?>" +
                "<document>" +
                "<br/>" +
                "<div class='Heading'>Introduction</div>" +
                "<div class='Text'>Sed quis malesuada ligula. Aliquam eu felis nulla, ac tempus purus.</div>" +
                "<br/>" +
                "<div class='Heading'>Background</div>" +
                "<div class='Text'>Curabitur adipiscing tortor ipsum. In gravida congue tincidunt. Aliquam</div>" +
                "<br/>" +
                "<div class='Heading'>Summary</div>" +
                "<div class='Text'>Pellentesque consequat scelerisque urna, sit amet consequat quam lacinia ac.</div>" +
                "<br/>" +
                "</document>";

            XPathDocument doc = new XPathDocument(new StringReader(xmldoc));

            XPathNavigator nav = doc.CreateNavigator();

            XPathNodeIterator iter = nav.Select(xPathStatement);

            if (iter.MoveNext())
            {
                Console.WriteLine(iter.Current.OuterXml);
            }
        }
    }
}
Joseph DeCarlo
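Since the question also mentions LINQ to XML, here is a rough sketch of the same lookup done that way. It assumes the document shape from the question (heading and text divs are siblings directly under the root, and the text div is the next div after its heading). The class and method names are hypothetical.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

static class LinqHeadingSketch
{
    // The simplified document from the question.
    public const string SampleXml =
        "<document>" +
        "<br/>" +
        "<div class='Heading'>Introduction</div>" +
        "<div class='Text'>Sed quis malesuada ligula. Aliquam eu felis nulla, ac tempus purus.</div>" +
        "<br/>" +
        "<div class='Heading'>Background</div>" +
        "<div class='Text'>Curabitur adipiscing tortor ipsum. In gravida congue tincidunt. Aliquam</div>" +
        "<br/>" +
        "</document>";

    // Returns the text of the Text div that immediately follows (among div
    // siblings) the Heading div with the given title, or null if none is found.
    public static string TextUnderHeading(string xml, string heading)
    {
        XDocument doc = XDocument.Parse(xml);
        return doc.Root.Elements("div")
            // Find the heading div by class and exact text.
            .Where(d => (string)d.Attribute("class") == "Heading" && d.Value == heading)
            // Take only the next div sibling after each matching heading.
            .SelectMany(h => h.ElementsAfterSelf("div").Take(1))
            // Keep it only if it really is a Text div.
            .Where(d => (string)d.Attribute("class") == "Text")
            .Select(d => d.Value)
            .FirstOrDefault();
    }

    static void Main()
    {
        // Prints: Sed quis malesuada ligula. Aliquam eu felis nulla, ac tempus purus.
        Console.WriteLine(TextUnderHeading(SampleXml, "Introduction"));
    }
}
```

Passing the heading as a parameter avoids building an XPath string by concatenation, which may be convenient if the heading text can contain quotes.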