
Problem: The XML file I want to retrieve references an XSLT stylesheet, so the response contains only the transformed XHTML content.

For example, take http://armory.wow-europe.com/arena-ladder.xml?ts=2&b=Blackout. If you open this URL in Firefox, you can see the original XML that the web server returns. You can see the transformed XHTML by inspecting the page with Firebug or by viewing the source in another browser (e.g. Opera).

I would like to retrieve the original XML with C# and parse it. (I have done some parsing with JDOM in Java before; something similar would be very nice.)

Thanks for your help :)

A:

This site apparently checks the User-Agent request header and serves the raw XML only to Firefox. If you set the User-Agent header of your request to Mozilla/5.0 (X11; U; Linux i686) Gecko/20071127 Firefox/2.0.0.11, the response will contain XML.
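A minimal C# sketch of this idea, assuming the server still keys on the User-Agent header as described (the Armory URL itself may no longer respond). The class FirefoxXmlFetcher and its methods are hypothetical names; LooksLikeTransformedXhtml is an assumed heuristic that treats an html root element as a sign you received the transformed page rather than the raw XML.

```csharp
using System;
using System.IO;
using System.Net;
using System.Xml.Linq;

// Hypothetical helper illustrating the User-Agent approach.
public static class FirefoxXmlFetcher
{
    // The Firefox User-Agent string quoted in the answer.
    public const string FirefoxUserAgent =
        "Mozilla/5.0 (X11; U; Linux i686) Gecko/20071127 Firefox/2.0.0.11";

    // Builds the request with the Firefox User-Agent; nothing is sent yet.
    public static HttpWebRequest CreateRequest(string url)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.UserAgent = FirefoxUserAgent;
        return request;
    }

    // Sends the request and parses the response body into an XDocument.
    public static XDocument FetchXml(string url)
    {
        using (var response = (HttpWebResponse)CreateRequest(url).GetResponse())
        using (Stream body = response.GetResponseStream())
        {
            return XDocument.Load(body);
        }
    }

    // Assumed heuristic: transformed XHTML has an <html> root element,
    // while the raw XML does not.
    public static bool LooksLikeTransformedXhtml(XDocument doc)
    {
        return doc.Root != null && doc.Root.Name.LocalName == "html";
    }

    public static void Main()
    {
        string url = "http://armory.wow-europe.com/arena-ladder.xml?ts=2&b=Blackout";
        // Creating the request does not touch the network.
        Console.WriteLine(CreateRequest(url).UserAgent);
        // With network access: XDocument doc = FetchXml(url);
        // then check LooksLikeTransformedXhtml(doc) before parsing it.
    }
}
```

The XHTML check matters because transformed XHTML is usually well-formed XML too, so XDocument.Load alone will not tell you which version the server sent.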

Robert Rossney
A: 

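You can use an overridden XmlUrlResolver that sends a browser User-Agent with every request, which gets around the server-side transformation. The code below loads the raw XML, looks up its xml-stylesheet reference, and applies the XSLT locally: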

using System;
using System.IO;
using System.Net;
using System.Text;
using System.Xml;
using System.Xml.Xsl;

class Program
{
    static void Main(string[] args)
    {
        var html = GetHtml(@"http://armory.wow-europe.com/arena-ladder.xml?ts=2&b=Blackout");
        Console.WriteLine(html);
    }

    public static string GetHtml(string url)
    {
        // Every fetch made through this resolver sends a browser User-Agent,
        // so the server returns the raw XML instead of pre-transformed XHTML.
        NonRedirectingXmlUrlResolver resolver = new NonRedirectingXmlUrlResolver();
        XmlReaderSettings pagesettings = new XmlReaderSettings();
        pagesettings.XmlResolver = resolver;

        // First pass: load the raw XML (this is the untransformed document).
        XmlDocument doc = new XmlDocument();
        using (XmlReader page = XmlReader.Create(url, pagesettings))
        {
            doc.Load(page);
        }

        // GetAttribute returns "" when href is missing, so check for both cases.
        string href = GetXsltHref(doc);
        if (string.IsNullOrEmpty(href))
            throw new InvalidOperationException("The document has no xml-stylesheet href.");
        string xslhref = resolver.ResolveUri(new Uri(url), href).AbsoluteUri;

        XslCompiledTransform transform = new XslCompiledTransform();
        transform.Load(xslhref, new XsltSettings(true, true), new XmlUrlResolver());

        // Second pass: apply the stylesheet locally. Disposing the writer
        // flushes it, so the StringBuilder holds the complete output.
        StringBuilder sb = new StringBuilder();
        using (StringWriter sw = new StringWriter(sb))
        using (XmlWriter writer = XmlWriter.Create(sw, transform.OutputSettings))
        using (XmlReader page = XmlReader.Create(url, pagesettings))
        {
            transform.Transform(page, null, writer, resolver);
        }
        return sb.ToString();
    }

    // Reads the href pseudo-attribute of the <?xml-stylesheet?> processing
    // instruction by wrapping its data in a dummy element and parsing that.
    public static string GetXsltHref(XmlDocument doc)
    {
        XmlProcessingInstruction styleSheet =
            doc.SelectSingleNode("processing-instruction('xml-stylesheet')") as XmlProcessingInstruction;
        if (styleSheet == null)
            return null;
        XmlDocument pidoc = new XmlDocument();
        pidoc.LoadXml(string.Format("<xsl {0}/>", styleSheet.Data));
        return pidoc.DocumentElement.GetAttribute("href");
    }
}

// Resolver that fetches every URI with a browser User-Agent, so the server
// sends the raw XML rather than pre-transformed XHTML.
public class NonRedirectingXmlUrlResolver : XmlUrlResolver
{
    public override object GetEntity(Uri absoluteUri, string role, Type ofObjectToReturn)
    {
        HttpWebRequest httpRequest = (HttpWebRequest)WebRequest.Create(absoluteUri);
        httpRequest.UserAgent = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)";
        HttpWebResponse httpResponse = (HttpWebResponse)httpRequest.GetResponse();
        // The caller closes the returned stream, which also releases the response.
        return httpResponse.GetResponseStream();
    }
}
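Since the question asks for the untransformed XML, note that doc in GetHtml already holds it after the first Load, so the XSLT step is optional. For the JDOM-style parsing, LINQ to XML (System.Xml.Linq) is the closest C# analog. A minimal sketch, with hypothetical names (RawXmlHelpers, StylesheetHref) and a made-up sample document rather than the real Armory schema, that reads the xml-stylesheet href with the same wrap-the-PI-data trick as GetXsltHref and then walks the elements:

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical LINQ to XML counterpart of GetXsltHref, plus a simple element walk.
public static class RawXmlHelpers
{
    // Returns the href of the first <?xml-stylesheet ...?> instruction, or null if there is none.
    public static string StylesheetHref(XDocument doc)
    {
        XProcessingInstruction pi = doc.Nodes()
            .OfType<XProcessingInstruction>()
            .FirstOrDefault(p => p.Target == "xml-stylesheet");
        if (pi == null)
            return null;

        // The PI data looks like attributes (type="text/xsl" href="..."),
        // so wrapping it in a dummy element lets the XML parser read them.
        XElement wrapper = XElement.Parse("<xsl " + pi.Data + "/>");
        return (string)wrapper.Attribute("href");
    }

    public static void Main()
    {
        // Made-up sample; the real Armory XML uses its own element names.
        const string sample =
            "<?xml-stylesheet type=\"text/xsl\" href=\"/layout/ladder.xsl\"?>" +
            "<data><team name=\"A\"/><team name=\"B\"/></data>";

        XDocument doc = XDocument.Parse(sample);
        Console.WriteLine(StylesheetHref(doc));   // /layout/ladder.xsl

        // Iterate child elements and read attributes, much like JDOM's getChildren/getAttributeValue.
        foreach (XElement team in doc.Root.Elements("team"))
            Console.WriteLine((string)team.Attribute("name"));
    }
}
```

The same Elements() and Attribute() calls work on an XDocument loaded from a network stream instead of a string.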
leakyboat