I'm trying to load an XHTML document into an `XDocument`, but I'm getting "Reference to undeclared entity" exceptions. I need to resolve entities like `&reg;` and `&raquo;`.

I believe my document is well-formed; here is the head:

    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
    <html xmlns="http://www.w3.org/1999/xhtml">

I get these exceptions when I call `XDocument.Load` with a `StringReader` over the document.
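A minimal way to see where this exception comes from, without any network access, is to parse two tiny documents. This is a sketch assuming .NET 4.0 or later; `EntityRepro` and its members are hypothetical names. The parser only knows `reg` if a DTD it has actually read declares it, so a document whose external XHTML DTD is never fetched is, as far as entity expansion goes, in the same position as the first string below.

```csharp
using System.Xml;
using System.Xml.Linq;

// Hypothetical repro: whether &reg; parses depends only on whether
// a DTD the parser has actually read declares the entity.
public static class EntityRepro
{
    // No DTD at all: "reg" is undeclared, so XDocument.Parse throws an XmlException.
    public const string WithoutDeclaration = "<p>A&reg;</p>";

    // Internal DTD subset declaring reg as U+00AE: parsing succeeds.
    public const string WithInternalDeclaration =
        "<!DOCTYPE p [<!ENTITY reg \"&#174;\">]><p>A&reg;</p>";

    // Returns the text content of the root element.
    public static string ParseRootText(string xml)
    {
        return XDocument.Parse(xml).Root.Value;
    }

    // True if parsing fails with an XmlException
    // (e.g. "Reference to undeclared entity 'reg'").
    public static bool FailsToParse(string xml)
    {
        try
        {
            XDocument.Parse(xml);
            return false;
        }
        catch (XmlException)
        {
            return true;
        }
    }
}
```

The second string only parses because its internal subset declares `reg`. The XHTML DOCTYPE in the question has no internal subset, so the entity definitions have to come from the external DTD, and supplying that is the job of an `XmlResolver`.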

A:

This combines information from MSDN and a few blog posts.

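    using System;
    using System.Collections.Generic;
    using System.Configuration;
    using System.Diagnostics;
    using System.IO;
    using System.Net;
    using System.Net.Cache;
    using System.Xml;
    using System.Xml.Linq;

        // output holds the XHTML markup as a string
        XDocument document;

        using (var stringReader = new StringReader(output))
        {
            var settings = new XmlReaderSettings
            {
                // Let the reader process the DOCTYPE (on .NET 3.5 this was ProhibitDtd = false)
                DtdProcessing = DtdProcessing.Parse,
                XmlResolver = new LocalXhtmlXmlResolver(bool.Parse(ConfigurationManager.AppSettings["CacheDTDs"]))
            };

            // Dispose the XmlReader too, not just the StringReader
            using (var xmlReader = XmlReader.Create(stringReader, settings))
            {
                document = XDocument.Load(xmlReader);
            }
        }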

    private class LocalXhtmlXmlResolver : XmlUrlResolver
    {
        private static readonly Dictionary<string, Uri> KnownUris = new Dictionary<string, Uri>
        {
            { "-//W3C//DTD XHTML 1.0 Strict//EN", new Uri("http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd") },
            { "-//W3C//DTD XHTML 1.0 Transitional//EN", new Uri("http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd") },
            { "-//W3C//DTD XHTML 1.0 Frameset//EN", new Uri("http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd") },
            { "-//W3C//DTD XHTML 1.1//EN", new Uri("http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd") }
        };

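        private readonly bool enableHttpCaching;
        private ICredentials credentials;

        public LocalXhtmlXmlResolver(bool enableHttpCaching)
        {
            this.enableHttpCaching = enableHttpCaching;
        }

        // Keep a copy of the credentials so the cached HTTP path below can use them;
        // XmlUrlResolver only exposes a setter for this property
        public override ICredentials Credentials
        {
            set
            {
                this.credentials = value;
                base.Credentials = value;
            }
        }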

        public override Uri ResolveUri(Uri baseUri, string relativeUri)
        {
            // Map well-known XHTML public identifiers to their DTD URLs
            Uri knownUri;
            if (relativeUri != null && KnownUris.TryGetValue(relativeUri, out knownUri))
            {
                return knownUri;
            }

            Debug.WriteLine("Could not find: " + relativeUri);
            return base.ResolveUri(baseUri, relativeUri);
        }

        public override object GetEntity(Uri absoluteUri, string role, Type ofObjectToReturn)
        {
            if (absoluteUri == null)
            {
                throw new ArgumentNullException("absoluteUri");
            }

            // Fetch http resources through the HTTP request cache (if caching is enabled)
            if (absoluteUri.Scheme == "http" && this.enableHttpCaching && (ofObjectToReturn == null || ofObjectToReturn == typeof(Stream)))
            {
                var request = WebRequest.Create(absoluteUri);

                request.CachePolicy = new HttpRequestCachePolicy(HttpRequestCacheLevel.Default);

                if (this.credentials != null)
                {
                    request.Credentials = this.credentials;
                }

                var response = request.GetResponse();

                return response.GetResponseStream();
            }

            // Otherwise fall back to the default XmlUrlResolver behavior (fetch the resource from its source)
            return base.GetEntity(absoluteUri, role, ofObjectToReturn);
        }
    }
Dave
Resolving DTDs from the Web is generally a bad idea: apart from unnecessarily hitting the W3C servers with requests, it is quite slow and relies on an Internet connection being available and reliable. A much better approach is to store local copies of those DTDs, either as embedded resources loaded via `Assembly.GetManifestResourceStream` or as local files in the same directory as your executable.
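A sketch of the local-files approach, assuming .NET 4.0 or later (for `DtdProcessing`). `LocalDirectoryDtdResolver` and `LocalXhtmlLoader` are hypothetical names, and the directory is assumed to hold copies of the DTDs plus the entity files they include (for XHTML 1.0 that is `xhtml-lat1.ent`, `xhtml-symbol.ent` and `xhtml-special.ent`), saved under the same file names as on www.w3.org. Any http(s) URL is mapped to the file with the same name in that directory.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Linq;

// Hypothetical resolver: serves DTDs and entity files from a local directory
// instead of downloading them from www.w3.org.
public class LocalDirectoryDtdResolver : XmlUrlResolver
{
    private readonly string dtdDirectory;

    public LocalDirectoryDtdResolver(string dtdDirectory)
    {
        if (dtdDirectory == null)
        {
            throw new ArgumentNullException(nameof(dtdDirectory));
        }
        this.dtdDirectory = dtdDirectory;
    }

    public override object GetEntity(Uri absoluteUri, string role, Type ofObjectToReturn)
    {
        if (absoluteUri == null)
        {
            throw new ArgumentNullException(nameof(absoluteUri));
        }

        // Map any http(s) URL to a file of the same name in the local directory, e.g.
        // http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd -> <dir>/xhtml1-strict.dtd
        if (absoluteUri.Scheme == Uri.UriSchemeHttp || absoluteUri.Scheme == Uri.UriSchemeHttps)
        {
            string localPath = Path.Combine(dtdDirectory, Path.GetFileName(absoluteUri.AbsolutePath));
            if (!File.Exists(localPath))
            {
                // Fail loudly instead of silently falling back to the network
                throw new FileNotFoundException("No local copy of " + absoluteUri, localPath);
            }
            return File.OpenRead(localPath);
        }

        // file:// and other URIs keep the default XmlUrlResolver behavior
        return base.GetEntity(absoluteUri, role, ofObjectToReturn);
    }
}

public static class LocalXhtmlLoader
{
    // Parses XHTML markup, resolving its DOCTYPE against the local DTD directory.
    public static XDocument Load(string xhtml, string dtdDirectory)
    {
        var settings = new XmlReaderSettings
        {
            DtdProcessing = DtdProcessing.Parse,
            XmlResolver = new LocalDirectoryDtdResolver(dtdDirectory)
        };

        using (var stringReader = new StringReader(xhtml))
        using (var xmlReader = XmlReader.Create(stringReader, settings))
        {
            return XDocument.Load(xmlReader);
        }
    }
}
```

Usage would look like `LocalXhtmlLoader.Load(output, Path.Combine(AppDomain.CurrentDomain.BaseDirectory, "dtds"))`, where the `dtds` subfolder is again an assumption. Throwing when a local copy is missing, rather than falling back to `base.GetEntity`, makes a forgotten file show up immediately instead of quietly reintroducing network traffic. Switching to embedded resources would only change how the stream is opened (`Assembly.GetManifestResourceStream` instead of `File.OpenRead`).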
Pavel Minaev
@Pavel thanks for the insight!
Dave
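On .NET Framework 4.0 and later, the framework also includes `System.Xml.Resolvers.XmlPreloadedResolver`, which can be constructed with `XmlKnownDtds.Xhtml10` to serve the XHTML 1.0 DTDs and entity sets from copies bundled with .NET. A minimal sketch (the `PreloadedXhtmlLoader` name is hypothetical) that avoids both the network and maintaining local files:

```csharp
using System.IO;
using System.Xml;
using System.Xml.Linq;
using System.Xml.Resolvers;

public static class PreloadedXhtmlLoader
{
    // Parses XHTML 1.0 markup using the DTDs preloaded by XmlPreloadedResolver.
    public static XDocument Load(string xhtml)
    {
        var settings = new XmlReaderSettings
        {
            DtdProcessing = DtdProcessing.Parse,
            // XHTML 1.0 Strict/Transitional/Frameset DTDs and their entity sets.
            // No fallback resolver is supplied, so URIs outside the preloaded
            // set should fail rather than be downloaded.
            XmlResolver = new XmlPreloadedResolver(XmlKnownDtds.Xhtml10)
        };

        using (var stringReader = new StringReader(xhtml))
        using (var xmlReader = XmlReader.Create(stringReader, settings))
        {
            return XDocument.Load(xmlReader);
        }
    }
}
```

If some other DTDs still need to be fetched, there is also a constructor overload that takes a fallback `XmlResolver` alongside the `XmlKnownDtds` value.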