I use the following XPath query to list the objects under a site: "ListObject[@Title='SomeValue']". SomeValue is dynamic. The query works as long as SomeValue does not contain an apostrophe ('). I'm using C#. I also tried using an escape sequence, but it didn't work.

What am I doing wrong?

A: 

If you're not going to have any double-quotes in SomeValue, you can use escaped double-quotes to specify the value you're searching for in your XPath search string.

ListObject[@Title=\"SomeValue\"]
48klocs
That's not how you escape characters in XML.
Welbog
That's true. But an XPath query isn't XML text, and at any rate he's not escaping the quotation marks for XPath; he's escaping them for C#. The actual, literal XPath is ListObject[@Title="SomeValue"]
Robert Rossney
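To make the distinction in the last comment concrete, here is a minimal sketch. The sample document, the class name EscapedQuoteDemo and the helpers BuildQuery and CountMatches are assumptions made for illustration; only ListObject and @Title come from the question. It only works when the value contains no double quotes.

```csharp
using System;
using System.Xml;

static class EscapedQuoteDemo
{
    // Hypothetical helper: wraps the value in double quotes.
    // Only safe when the value itself contains no double quotes.
    public static string BuildQuery(string someValue)
    {
        // The backslashes are consumed by the C# compiler; the XPath engine
        // only ever sees plain double quotes.
        return "/Site/ListObject[@Title=\"" + someValue + "\"]";
    }

    public static int CountMatches(string someValue)
    {
        XmlDocument doc = new XmlDocument();
        // Assumed sample document; the question does not show its XML.
        doc.LoadXml("<Site><ListObject Title=\"Bob's List\"/>" +
                    "<ListObject Title=\"Other\"/></Site>");
        return doc.SelectNodes(BuildQuery(someValue)).Count;
    }

    static void Main()
    {
        Console.WriteLine(BuildQuery("Bob's List"));
        Console.WriteLine(CountMatches("Bob's List"));
    }
}
```

The first line printed is the XPath as the engine receives it: /Site/ListObject[@Title="Bob's List"].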
A: 

EDIT: After a heavy unit testing session, and checking the XPath Standards, I have revised my function as follows:

public static string ToXPath(string value) {

    const string apostrophe = "'";
    const string quote = "\"";

    if(value.Contains(quote)) {
        if(value.Contains(apostrophe)) {
            throw new XPathException("Illegal XPath string literal.");
        } else {
            return apostrophe + value + apostrophe;
        }
    } else {
        return quote + value + quote;
    }
}

It appears that XPath doesn't have a character escaping system at all; it's quite primitive really. Evidently my original code only worked by coincidence. My apologies for misleading anyone!

Original answer below for reference only - please ignore

For safety, make sure that every occurrence of the 5 characters that have predefined XML entities is escaped in your XPath string, e.g.

public static string ToXPath(string value) {
    return "'" + XmlEncode(value) + "'";
}

public static string XmlEncode(string value) {
    StringBuilder text = new StringBuilder(value);
    text.Replace("&", "&amp;");
    text.Replace("'", "&apos;");
    text.Replace(@"""", "&quot;");
    text.Replace("<", "&lt;");
    text.Replace(">", "&gt;");
    return text.ToString();
}

I have done this before and it works fine. If it doesn't work for you, maybe there is some additional context to the problem that you need to make us aware of.

Christian Hayter
You shouldn't even have to treat XML as a plain string. Things like escaping and unescaping are abstracted away for you by the built-in XML libraries. You're reinventing the wheel here.
Welbog
If you could point me to a BCL class that abstracts away the process of building an XPath query string, I would gladly ditch these functions.
Christian Hayter
+11  A: 

This is surprisingly difficult to do.

Take a look at the XPath Recommendation, and you'll see that it defines a literal as:

Literal ::=   '"' [^"]* '"' 
            | "'" [^']* "'"

Which is to say, string literals in XPath expressions can contain apostrophes or double quotes but not both.

You can't use escaping to get around this. A literal like this:

'Some&apos;Value'

will match this XML text:

Some&amp;apos;Value
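A small sketch of that behaviour, assuming a made-up document in which <a> holds the text Some'Value and <b> holds the literal text Some&apos;Value (the class and method names are hypothetical):

```csharp
using System;
using System.Xml;

static class EntityLiteralDemo
{
    // Returns the name of the element whose text equals the XPath literal
    // 'Some&apos;Value', or null if nothing matches.
    public static string MatchedElementName()
    {
        XmlDocument doc = new XmlDocument();
        // After XML parsing, <a> contains  Some'Value  and <b> contains
        // the six-character sequence &apos; as ordinary text.
        doc.LoadXml("<root><a>Some&apos;Value</a>" +
                    "<b>Some&amp;apos;Value</b></root>");

        // XPath does not decode entity references inside a string literal.
        XmlNode hit = doc.SelectSingleNode("/root/*[. = 'Some&apos;Value']");
        return hit == null ? null : hit.Name;
    }

    static void Main()
    {
        Console.WriteLine(MatchedElementName()); // prints "b"
    }
}
```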

This does mean that it's possible for there to be a piece of XML text that you can't generate an XPath literal to match, e.g.:

<elm att="&quot;&apos"/>

But that doesn't mean it's impossible to match that text with XPath, it's just tricky. In any case where the value you're trying to match contains both single and double quotes, you can construct an expression that uses concat to produce the text that it's going to match:

elm[@att=concat('"', "'")]
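As a quick check of that expression, here is a sketch that loads the element above under an assumed <root> wrapper and runs the concat() query against it (ConcatDemo and Matches are hypothetical names):

```csharp
using System;
using System.Xml;

static class ConcatDemo
{
    // True if the concat() expression selects the element whose attribute
    // value is a double quote followed by an apostrophe.
    public static bool Matches()
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml("<root><elm att=\"&quot;&apos;\"/></root>");

        // The XPath engine sees: /root/elm[@att=concat('"', "'")]
        string xpath = "/root/elm[@att=concat('\"', \"'\")]";
        return doc.SelectSingleNode(xpath) != null;
    }

    static void Main()
    {
        Console.WriteLine(Matches()); // prints "True"
    }
}
```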

So that leads us to this, which is a lot more complicated than I'd like it to be:

/// <summary>
/// Produce an XPath literal equal to the value if possible; if not, produce
/// an XPath expression that will match the value.
/// 
/// Note that this function will produce very long XPath expressions if a value
/// contains a long run of double quotes.
/// </summary>
/// <param name="value">The value to match.</param>
/// <returns>If the value does not contain both single and double quotes, an
/// XPath literal equal to the value.  If it contains both, an XPath expression,
/// using concat(), that evaluates to the value.</returns>
static string XPathLiteral(string value)
{
    // if the value does not contain both single and double quotes,
    // construct an XPath literal
    if (!value.Contains("\""))
    {
        return "\"" + value + "\"";
    }
    if (!value.Contains("'"))
    {
        return "'" + value + "'";
    }

    // if the value contains both single and double quotes, construct an
    // expression that concatenates all non-double-quote substrings with
    // the quotes, e.g.:
    //
    //    concat("foo", '"', "bar")
    StringBuilder sb = new StringBuilder();
    sb.Append("concat(");
    string[] substrings = value.Split('\"');
    for (int i = 0; i < substrings.Length; i++ )
    {
        bool needComma = (i>0);
        if (substrings[i] != "")
        {
            if (i > 0)
            {
                sb.Append(", ");
            }
            sb.Append("\"");
            sb.Append(substrings[i]);
            sb.Append("\"");
            needComma = true;
        }
        if (i < substrings.Length - 1)
        {
            if (needComma)
            {
                sb.Append(", ");                    
            }
            sb.Append("'\"'");
        }

    }
    sb.Append(")");
    return sb.ToString();
}

And yes, I tested it with all the edge cases. That's why the logic is so stupidly complex:

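static void Main(string[] args)
{
    // test document; the XPath below expects a <root> element
    XmlDocument d = new XmlDocument();
    d.LoadXml("<root/>");

    foreach (string s in new[]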
    {
        "foo",              // no quotes
        "\"foo",            // double quotes only
        "'foo",             // single quotes only
        "'foo\"bar",        // both; double quotes in mid-string
        "'foo\"bar\"baz",   // multiple double quotes in mid-string
        "'foo\"",           // string ends with double quotes
        "'foo\"\"",         // string ends with run of double quotes
        "\"'foo",           // string begins with double quotes
        "\"\"'foo",         // string begins with run of double quotes
        "'foo\"\"bar"       // run of double quotes in mid-string
    })
    {
        Console.Write(s);
        Console.Write(" = ");
        Console.WriteLine(XPathLiteral(s));
        XmlElement elm = d.CreateElement("test");
        d.DocumentElement.AppendChild(elm);
        elm.SetAttribute("value", s);

        string xpath = "/root/test[@value = " + XPathLiteral(s) + "]";
        if (d.SelectSingleNode(xpath) == elm)
        {
            Console.WriteLine("OK");
        }
        else
        {
            Console.WriteLine("Should have found a match for {0}, and didn't.", s);
        }
    }
    Console.ReadKey();
}
Robert Rossney
Excellent work, thanks Rob. I'll copy that for my code if you don't mind. :-)
Christian Hayter
Please do. I actually have no use for it myself; I only did this because at first I found the problem interesting and then as I dug in its difficulty started to annoy me. My ADHD is your gain.
Robert Rossney
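For comparison, the same concat() idea can be written more compactly if you don't mind empty "" arguments appearing in the generated expression. This is only a sketch of a variant, not a replacement for the tested function above; XPathLiteralCompact, Finds and the <Site> sample document are names assumed for illustration.

```csharp
using System;
using System.Xml;

static class CompactLiteralDemo
{
    public static string XPathLiteralCompact(string value)
    {
        if (!value.Contains("\"")) return "\"" + value + "\"";
        if (!value.Contains("'")) return "'" + value + "'";

        // Both kinds of quote are present: double-quote every piece between
        // the double quotes and rejoin the pieces with the literal '"'.
        // Empty pieces become "" arguments, which concat() accepts.
        string[] pieces = value.Split('"');
        return "concat(\"" + string.Join("\", '\"', \"", pieces) + "\")";
    }

    // Adds a ListObject with the given title to an assumed sample document
    // and checks that the generated query finds exactly that element.
    public static bool Finds(string title)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml("<Site/>");
        XmlElement item = doc.CreateElement("ListObject");
        item.SetAttribute("Title", title);
        doc.DocumentElement.AppendChild(item);

        string xpath = "/Site/ListObject[@Title=" + XPathLiteralCompact(title) + "]";
        return doc.SelectSingleNode(xpath) == item;
    }

    static void Main()
    {
        Console.WriteLine(XPathLiteralCompact("Bob's \"big\" list"));
        Console.WriteLine(Finds("Bob's \"big\" list"));
    }
}
```

The trade-off is readability of the generated XPath: a value that starts or ends with a double quote produces a leading or trailing "" argument, which the longer version avoids.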
A: 

I had this problem a while back. Seemingly the simplest, though not the fastest, solution is to add a new node to the XML document that has an attribute with the value 'SomeValue', then look for that attribute value using a simple XPath search. After you're finished with the operation, you can delete the "temporary node" from the XML document.

This way, the whole comparison happens "inside", so you don't have to construct the weird XPath query.

I seem to remember that in order to speed things up, you should be adding the temp value to the root node.
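One way to read this suggestion, sketched under assumptions: instead of a whole temporary node, the value is stored in a temporary attribute on the root element and the query compares @Title against it. The attribute name __lookup, the helper FindByTitle and the sample document are all made up for this example.

```csharp
using System;
using System.Xml;

static class TempNodeDemo
{
    public static XmlNode FindByTitle(XmlDocument doc, string title)
    {
        XmlElement root = doc.DocumentElement;
        // "__lookup" is a hypothetical attribute name chosen for this sketch.
        root.SetAttribute("__lookup", title);
        try
        {
            // The value never appears in the query text, so no quoting
            // is needed; XPath compares the two attribute values directly.
            return doc.SelectSingleNode("//ListObject[@Title = /*/@__lookup]");
        }
        finally
        {
            // Always remove the temporary attribute again.
            root.RemoveAttribute("__lookup");
        }
    }

    static void Main()
    {
        XmlDocument doc = new XmlDocument();
        // Assumed sample document.
        doc.LoadXml("<Site><ListObject Title=\"It's &quot;quoted&quot;\"/>" +
                    "<ListObject Title=\"Plain\"/></Site>");

        XmlNode hit = FindByTitle(doc, "It's \"quoted\"");
        Console.WriteLine(hit != null);
    }
}
```

Note that this modifies the document while the query runs, so it is not suitable for a document shared between threads.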

Good luck...

Gyuri
BTW, this question might solve your problem too; it asks pretty much the same thing as you do: http://stackoverflow.com/questions/642125/encoding-xpath-expressions-with-both-single-and-double-quotes
Gyuri