We are getting an XML document from a vendor that we need to perform an XSL transform on using their stylesheet so that we can convert the resulting HTML to a PDF. The actual stylesheet is referenced in an href attribute of the ?xml-stylesheet definition in the XML document. Is there any way that I can get that URL out using C#? I don't trust the vendor not to change the URL and obviously don't want to hardcode it.

The start of the XML file with the full ?xml-stylesheet element looks like this:

<?xml version="1.0" encoding="utf-8"?>
<?xml-stylesheet type="text/xsl" href="http://www.fakeurl.com/StyleSheet.xsl"?>
+1  A: 

LINQ to XML code:

XDocument xDoc = ...;

// Only look at xml-stylesheet processing instructions, then pull out the href pseudo-attribute.
var xslUrlQuery = from node in xDoc.Nodes().OfType<XProcessingInstruction>()
        where node.Target == "xml-stylesheet"
        select Regex.Match(node.Data, "href=\"(?<url>.*?)\"").Groups["url"].Value;

Or LINQ to Objects, over an XmlDocument named doc:

var xslUrls = (from XmlNode childNode in doc.ChildNodes
                   where childNode.NodeType == XmlNodeType.ProcessingInstruction && childNode.Name == "xml-stylesheet"
                   select (XmlProcessingInstruction) childNode
                   into procNode
                   select Regex.Match(procNode.Data, "href=\"(?<url>.*?)\"").Groups["url"].Value).ToList();

xDoc.XPathSelectElement() will not work here: it only returns XElement results, and a processing instruction is an XProcessingInstruction, not an XElement.

Mikael Svenson
I would prefer to use the DOM or LinqToXml, but the more I dig the more it looks like this might be the only option.
AJ
Yea, I've been struggling with that, too. If there were some way I could treat the ProcessingInstruction like an Element, it would be simpler.
AJ
+1  A: 
Pent Ploompuu
+2  A: 

You can also use XPath. Given an XmlDocument loaded with your source:

XmlProcessingInstruction instruction = doc.SelectSingleNode("//processing-instruction(\"xml-stylesheet\")") as XmlProcessingInstruction;
if (instruction != null) {
    Console.WriteLine(instruction.InnerText);
}

Then just parse InnerText with Regex.
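A minimal sketch of that last step, assuming the href value is wrapped in either double or single quotes. The class name StylesheetHref and the method FromXml are hypothetical names used only for illustration, and the XML is loaded from a string to keep the example self-contained.

```csharp
using System;
using System.Text.RegularExpressions;
using System.Xml;

static class StylesheetHref
{
    // Hypothetical helper: returns the href pseudo-attribute of the first
    // xml-stylesheet processing instruction, or null if none is found.
    public static string FromXml(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);

        var instruction = doc.SelectSingleNode("//processing-instruction(\"xml-stylesheet\")")
                          as XmlProcessingInstruction;
        if (instruction == null)
        {
            return null;
        }

        // Data holds the text after the target, e.g. type="text/xsl" href="...".
        // The named group q remembers which quote character opened the value.
        Match match = Regex.Match(instruction.Data,
            @"\bhref\s*=\s*(?<q>[""'])(?<url>.*?)\k<q>");
        return match.Success ? match.Groups["url"].Value : null;
    }
}
```

Calling StylesheetHref.FromXml on the document from the question should hand back the stylesheet URL as a plain string, which can then be passed to whatever performs the transform.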

Ishmael
Use this XPath expression and you don't need any Regex: `translate(substring-after(processing-instruction('xml-stylesheet'),'href='),'"','')`
Mads Hansen
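For reference, one way the XPath expression from the comment above could be evaluated from C# is through XPathNavigator.Evaluate, which returns a string for a string-valued expression. This is only a sketch: the class name StylesheetHrefXPath and the method FromXml are hypothetical. Note that the expression assumes href is the last pseudo-attribute and is double-quoted; anything that follows href would stay in the result.

```csharp
using System;
using System.Xml;
using System.Xml.XPath;

static class StylesheetHrefXPath
{
    // Hypothetical helper: evaluates the XPath expression against the document
    // root and returns the result, or an empty string if there is no
    // xml-stylesheet processing instruction.
    public static string FromXml(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);

        XPathNavigator navigator = doc.CreateNavigator();

        // substring-after keeps everything following href=,
        // translate then strips the double quotes.
        return (string)navigator.Evaluate(
            "translate(substring-after(processing-instruction('xml-stylesheet'),'href='),'\"','')");
    }
}
```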
+1  A: 

Because a processing instruction can have any content, it formally does not have any attributes. But if you know there are "pseudo-attributes", as in the case of an xml-stylesheet processing instruction, then you can use the value of the processing instruction to construct the markup of a single element and parse that with the XML parser:

    XmlDocument doc = new XmlDocument();
    doc.Load(@"file.xml");
    XmlNode pi = doc.SelectSingleNode("processing-instruction('xml-stylesheet')");
    if (pi != null)
    {
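        // Wrap the pseudo-attributes in a dummy <pi .../> element so the XML parser reads them as real attributes.
        XmlElement piEl = (XmlElement)doc.ReadNode(XmlReader.Create(new StringReader("<pi " + pi.Value + "/>")));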
        string href = piEl.GetAttribute("href");
        Console.WriteLine(href);
    }
    else
    {
        Console.WriteLine("No pi found.");
    }
Martin Honnen