views:

351

answers:

8

It feels dirty. But maybe it isn't... is it ok to use a StringBuilder for writing XML? My gut instinct says "although this feels wrong, it's probably pretty darn performant because it's not loading extra libraries and overhead it's not doing whatever extra method calls XmlWriter invokes." It also seems like it's just less code in general. What's the benefit in XmlWriter?

Here's what it looks like. I'm building an OpenSearch XML doc based on the domain you come in from.

public void ProcessRequest(HttpContext context)
{
    context.Response.ContentType = "text/xml";

    string domain = WebUtils.ReturnParsedSourceUrl(null); //returns something like www.sample.com
    string cachedChan = context.Cache[domain + "_opensearchdescription"] as String;

    if (cachedChan == null)
    {
        StringBuilder sb = new StringBuilder();
        sb.Append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>");
        sb.Append("<OpenSearchDescription xmlns=\"http://a9.com/-/spec/opensearch/1.1/\" xmlns:moz=\"http://www.mozilla.org/2006/browser/search/\"&gt;");
        sb.Append("    <ShortName>Search</ShortName>");
        sb.Append("    <Description>Use " + domain + " to search.</Description>");
        sb.Append("    <Contact>[email protected]</Contact>");
        sb.Append("    <Url type=\"text/html\" method=\"get\" template=\"http://" + domain + "/Search.aspx?q={searchTerms}\" />");
        sb.Append("    <moz:SearchForm>http://" + domain + "/Search.aspx</moz:SearchForm>");
        sb.Append("    <Image height=\"16\" width=\"16\" type=\"image/x-icon\">http://" + domain + "/favicon.ico</Image>");
        sb.Append("</OpenSearchDescription>");

        cachedChan = sb.ToString();

        context.Cache.Insert(domain + "_opensearchdescription", cachedChan, null, DateTime.Now.AddDays(14), TimeSpan.Zero);
    }

    context.Response.Write(cachedChan);
}
+8  A: 

That's very wrong. Use one of the .NET APIs which understand XML to write XML.

Using a System.Xml.XmlWriter will not cause any performance problem by loading "any extra libraries".


The reason to use the XML APIs is that they understand the rules of XML. For instance, they'll know the set of characters that need to be quoted inside an element, and the different set that need to be quoted inside an attribute.

This might not be an issue in your case: maybe you're certain that domain will not have any characters in it that will need to be quoted. In any broader situation, it's best to let the XML APIs do XML - which they know how to do - so you don't have to do it yourself.


Here's an example of how easy it is to produce valid XML using LINQ to XML:

public static string MakeXml()
{
    XNamespace xmlns = "http://a9.com/-/spec/opensearch/1.1/";
    XNamespace moz = "http://www.mozilla.org/2006/browser/search/";
    string domain = "http://localhost";
    string searchTerms = "abc";
    var doc = new XDocument(
        new XDeclaration("1.0", "UTF-8", "yes"),
        new XElement(
            xmlns + "OpenSearchDescription",
            new XElement(xmlns + "ShortName", "Search"),
            new XElement(
                xmlns + "Description",
                String.Format("Use {0} to search.", domain)),
            new XElement(xmlns + "Contact", "[email protected]"),
            new XElement(
                xmlns + "Url",
                new XAttribute("type", "text/html"),
                new XAttribute("method", "get"),
                new XAttribute(
                    "template",
                    String.Format(
                        "http://{0}/Search.aspx?q={1}",
                        domain,
                        searchTerms))),
            new XElement(
                moz + "SearchForm",
                String.Format("http://{0}/Search.aspx", domain)),
            new XElement(
                xmlns + "Image",
                new XAttribute("height", 16),
                new XAttribute("width", 16),
                new XAttribute("type", "image/x-icon"),
                String.Format("http://{0}/favicon.ico", domain))));
    return doc.ToString(); // If you _must_ have a string
}
John Saunders
yes and no. this is not very wrong, although not due to loading.
Pavel Radzivilovsky
What's the benefit to refactoring to using XmlWriter for a small file that won't be changing besides "it uses the .NET APIs"? I think the code I have is short and clean, and it seems that the XmlWriter would introduce a lot of extra cruft.
Jack Lawson
John is right. Using SB class will work until you build the first invalid XML, most likely due to a missed XML escape sequence. But the issue is more subtle: the OP code is *already* wrong. It declares UTF-8 encoding but it doe snot actually use a correct UTF8 text writer. Right now you don't pass the XML to any consumer that will care to validate this, but pass the resulted string to an SQL Server XML parameter, watch the thing go kaboom.
Remus Rusanu
+1 for mentioning escapes. I prefer the System.Linq.Xml API. It's very readable and refactorable.
Jim Schubert
Alright. I think I've found my answer between these 8 answers and all the comments. Here's what my plan is: Refactor a bit and use Jim Schubert's code. However, this is absolutely the answer too, because I think that in every other case, you're absolutely right. However, I'm not worried about the value of "domain", I've already validated the XML, and this is a file that will almost certainly never change in structure. And if it does, that'll be when I refactor this to use XmlWriter, as that proves that it's more fluid than I thought.
Jack Lawson
Thanks for the example; it helps to see the code. I do agree that this is best in most cases, where the values and structure may change; but for mine, that shows all of the extra "cruft" I was trying to avoid. I only have a single dynamic value: domain. I think the LINQ to SQL is actually harder to follow; maybe it's because I don't work with XML enough, but the structure is hard to visualize. There's tabbing and whitespace, but the way the elements and attributes work feels awkward and forced; the flow isn't there. But I can promise I'll follow this pattern for more dynamic XML in the future.
Jack Lawson
+1  A: 

Well, this is subtle. Like all other optimizations in life, you break abstraction boundaries and pay the price for that, in order to gain efficiency.

From my experience, it is indeed significantly faster, not because of loading libraries of course (if anything, that would make it slower) but because it saves on string allocations. I don't remember exactly how much faster, sorry. Measuring it with a profiler will be hard because you also save on garbage collection costs.

But, don't blame me when you will have to deal with encodings and escaping, and hell knows what else, and remember to read the XML standard carefully before getting these XMLs out anywhere.

Pavel Radzivilovsky
Perhaps. It's a pretty small document, and it's not something that's going to change, though.
Jack Lawson
A: 

You could create a strongly typed object and use XmlSerialization classes to generate the xml data

Anthony Shaw
That seems like a little overkill for such a small file that won't be used elsewhere, no?
Jack Lawson
really? a down vote for that? It was an acceptable solution. Maybe not your ideal solution, but it would have worked
Anthony Shaw
A: 

Your gut is wrong. Whether you hand-write the XML or use an XmlWriter, the most efficient way to send XML to an HttpResponse would be appent text directly to the Response. Building the whole string and then sending it wastes resources.

Randolpho
is this at all possible with the page lifecycle structure of asp.net?
Pavel Radzivilovsky
@Pavel Radzivilovsky: Response.Write will break the page lifecycle and write directly to the network connection (in a streamed fashion) immediately when it's called. Not that it matters in this case, since the page lifecycle doesn't exist; he's writing a custom HTTP Handler.
Randolpho
+2  A: 

I wouldn't use StringBuilder for this, because you have to call the Append method for every line. You could use XmlWriter and that won't hurt performance.

You can reduce the amount of IL code generated by doing the following:

private const string XML_TEMPLATE = @"<?xml version=\"1.0\" encoding=\"UTF-8\"?>
<OpenSearchDescription xmlns=\"http://a9.com/-/spec/opensearch/1.1/\" xmlns:moz=\"http://www.mozilla.org/2006/browser/search/\"&gt;
    <ShortName>Search</ShortName>
    <Description>Use {0} to search.</Description>
    <Contact>[email protected]</Contact>
    <Url type=\"text/html\" method=\"get\" template=\"http://{0}/Search.aspx?q={searchTerms}\" />
    <moz:SearchForm>http://{0}/Search.aspx&lt;/moz:SearchForm&gt;
    <Image height=\"16\" width=\"16\" type=\"image/x-icon\">http://{0}/favicon.ico&lt;/Image&gt;
</OpenSearchDescription>";

And in your method:

    if (cachedChan == null)
    {
        cachedChan = String.Format(XML_TEMPLATE, domain);

        context.Cache.Insert(domain + "_opensearchdescription", 
               cachedChan, null, DateTime.Now.AddDays(14), TimeSpan.Zero);
    }

That should work well for you, because the method as you have it now will have to create a new string for every StringBuilder.Append() call, then call that method. The String.Format call only generates 17 lines of IL code, compared to StringBuilder generating 8 lines of ctor code, then 6 lines for every Append call. Although, with today's technology, an extra 50 lines of IL won't be noticeable.

Jim Schubert
I can't believe I brainfarted and didn't even think about using string replacements. Thanks :DI'm trying to walk the line between microoptimizing and maintaining readable code. I think yours hits that.
Jack Lawson
Also, I just noticed you have `{searchterms}` in brackets. For this to work as I've posted it, you'd have to change this to either not have another set of brackets, or change the String.Format to pass these brackets into the template.
Jim Schubert
Replacing {searchterms} with {1} and passing in "{searchterms}" as a parameter, correct?
Jack Lawson
@Jack: that's correct.
Jim Schubert
+1  A: 

Well, there's nothing wrong with manually writing XML strings per se, but it is far more error prone. Unless you have a compelling performance reason to do this (that is, you've measured and found that the XML formatting is a bottleneck) I'd use the XML classes instead. You'll save a lot in debugging and development time.

As an aside, why are you mixing dynamic string operations with your builder calls? Instead of:

sb.Append("    <Description>Use " + domain + " to search.</Description>"); 

try this:

sb.Append("    <Description>Use ").Append(domain).Append(" to search.</Description>");
Peter Ruderman
Lack of coffee, probably. I'll, at a minimum, follow Jim Schubert's suggestion of string replacements.I guess one other thing I was thinking about is just the extra cruft of using the XmlWriter class. It seems like there's just a lot of extra, verbose code I'd be adding in order to use XmlWriter, when I can pretty simply just dump out the XML in a string; it's small and extremely unlikely to ever change structure.
Jack Lawson
I know what you mean... it's a lot of extra typing up front. If you're sure that maintenance won't be an issue, then go ahead and use a string.
Peter Ruderman
A: 

Will the domain variables ever return "&" characters, or another character that needs to be encoded? You may want to spend the time to do defensive programming and validate your input.

bryanjonker
It won't. There will be A LOT of broken things if WebUtils.ReturnParsedSourceUrl(null) ever changes. I could check for every possible exception, but it seems a little like overkill for a small, unobtrusive page like this.
Jack Lawson
+1  A: 

Please do not use StringBuilder. Anyone who tells you that it is significantly faster hasn't presented you with any real data. The difference in speed is inconsequential, and you will have a nightmare of maintenence ahead of you.

Have a looK: http://stackoverflow.com/questions/2478427/stringbuilder-vs-xmltextwriter

Josh