
I wrote a custom XML reader because I needed something that would not read ahead from the source stream. I wanted the ability to have an object read its data from the stream without negatively affecting the stream for the parent object. That way, the stream can be passed down the object tree.

It's a minimal implementation, meant only to serve the purpose of the project that uses it (right now). It works well enough, except for one method -- ReadString. That method reads the current element's content as a string, stopping when the matching end element is reached; it determines this by counting nesting levels. Along the way, it reads from the stream character by character, appending to a StringBuilder to build the resulting string.

For a collection element, this can take a long time. I'm sure there is much that can be done to better implement this, so this is where my continuing education begins once again. I could really use some help/guidance. Some notes about methods it calls:

Read - returns the next byte in the stream, or -1 at end of stream.

ReadUntilChar - calls Read until the specified character or -1 is reached, appending everything read to the result with a StringBuilder.
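
For context, here is a guessed sketch of what those two helpers might look like. Only their contracts (next byte or -1; read until a character) come from the descriptions above; the class name and everything else is hypothetical, with `m_stream` borrowed from the posted code.

```csharp
using System.IO;
using System.Text;

// Hypothetical shapes for the two helper methods described above.
class MinimalXmlReader {
    private readonly Stream m_stream;
    public MinimalXmlReader(Stream stream) { m_stream = stream; }

    // Returns the next byte in the stream, or -1 at end of stream.
    public int Read() {
        return m_stream.ReadByte();
    }

    // Calls Read until the specified character (which is consumed) or -1 is
    // reached, appending everything before it to a StringBuilder.
    public string ReadUntilChar(char c) {
        var sb = new StringBuilder();
        int b;
        while ((b = Read()) != -1 && (char)b != c) {
            sb.Append((char)b);
        }
        return sb.ToString();
    }
}
```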

Without further ado, here is my two-legged turtle. Constants have been replaced with the actual values.

public string ReadString() {
    int level = 0;
    long originalPosition = m_stream.Position;
    StringBuilder sb = new StringBuilder();
    int read; // int, not sbyte: Read() returns -1 at end of stream, and a narrowing cast would mangle bytes >= 0x80
    try {
        // We are already within the element that contains the string.
        // Read until we reach an end element when the level == 0.
        // We want to leave the reader positioned at the end element.
        do {
            sb.Append(ReadUntilChar('<'));
            if((read = Read()) == '/') {
                // End element
                if(level == 0) {
                    // End element for the element in context, the string is complete.
                    // Replace the two bytes of the end element read.
                    m_stream.Seek(-2, System.IO.SeekOrigin.Current);
                    break;
                } else {
                    // End element for a child element.
                    // Add the two bytes read to the resulting string and continue.
                    sb.Append('<');
                    sb.Append('/');
                    level--;
                }
            } else {
                // Start element
                level++;
                sb.Append('<');
                sb.Append((char)read);
            }
        } while(read != -1);

        return sb.ToString().Trim();
    } catch {
        // Return to the original position that we started at.
        m_stream.Seek(originalPosition, System.IO.SeekOrigin.Begin);
        throw;
    }
}
+1  A: 

Your implementation assumes the Stream is seekable. If it is known to be seekable, why do any of this yourself? Just create an XmlReader at your position, consume the data, ditch the reader, and seek the Stream back to where you started.
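
A minimal sketch of that idea, assuming a seekable stream (the method name and settings here are illustrative, not part of the original suggestion):

```csharp
using System.IO;
using System.Xml;

// Remember the position, let a throwaway XmlReader consume the element (it
// will read ahead and buffer internally), then seek the stream back so the
// parent object sees it exactly as before.
static string ReadElementAsString(Stream stream) {
    long start = stream.Position;
    try {
        var settings = new XmlReaderSettings {
            ConformanceLevel = ConformanceLevel.Fragment,
            CloseInput = false // leave the underlying stream open
        };
        using (XmlReader reader = XmlReader.Create(stream, settings)) {
            reader.MoveToContent();
            return reader.ReadInnerXml(); // markup inside the current element
        }
    } finally {
        // Undo the reader's buffering by rewinding the seekable stream.
        stream.Seek(start, SeekOrigin.Begin);
    }
}
```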

How large is the XML? You may find that throwing the data into a DOM (XmlDocument / XDocument / etc.) is a viable way of getting a reader that does what you need without requiring lots of rework. In the case of XmlDocument, an XmlNodeReader would suffice, for example (it would also provide XPath support if you want to use non-trivial queries).
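
The DOM approach might look like this (sample data and names are illustrative): load the XML into an XmlDocument, then wrap any node in an XmlNodeReader, which walks the in-memory tree rather than the original stream.

```csharp
using System.Xml;

// Load a fragment into a DOM and read part of it back through an XmlNodeReader;
// the original stream is never disturbed.
static string ReadChildContent(string xml, string xpath) {
    var doc = new XmlDocument();
    doc.LoadXml(xml);

    XmlNode node = doc.SelectSingleNode(xpath); // XPath queries come for free
    using (var reader = new XmlNodeReader(node)) {
        reader.MoveToContent();
        return reader.ReadElementContentAsString();
    }
}
```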

Marc Gravell
In the case of a file stream, would I be losing performance gains by doing the repeated seeking backwards by possibly large amounts?
oakskc
@oakskc - there is only one way to find that out; try it... I wouldn't *expect* this to be *horribly* expensive, though.
Marc Gravell
A: 

Why not use an existing one, like this one?

RC
I'm trying to avoid loading a while DOM. Partly for resource reasons and partly to force myself into a different way of doing it.
oakskc
+3  A: 

Right off the bat, you should be using a profiler for performance optimization if you haven't already (I'd recommend SlimTune if you're on a budget). Without one you're just taking slightly educated stabs in the dark.

Once you've profiled the parser you should have a good idea of where the ReadString() method is spending all its time, which will make your optimizing much easier.

One suggestion I'd make at the algorithm level is to scan the stream first, and then build the contents out: instead of consuming each character as you see it, mark where you find the <, >, and </ tokens. Once you have those positions you can pull the data out of the stream in blocks rather than throwing characters into a StringBuilder one at a time. This will optimize away a significant number of StringBuilder.Append calls, which may improve your performance (this is where profiling would help).
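
A rough sketch of that scan-then-copy idea, assuming a seekable stream of single-byte characters (names and details are illustrative, not taken from the question's code):

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;

// First pass: record where each '<' occurs so block boundaries are known.
static List<long> MarkTagStarts(Stream stream) {
    var marks = new List<long>();
    int b;
    while ((b = stream.ReadByte()) != -1) {
        if (b == '<') marks.Add(stream.Position - 1); // position of the '<'
    }
    return marks;
}

// Second pass: one bulk Read per block instead of one Append per byte.
static string CopyBlock(Stream stream, long start, long end) {
    var buffer = new byte[end - start];
    stream.Seek(start, SeekOrigin.Begin);
    stream.Read(buffer, 0, buffer.Length);
    return Encoding.ASCII.GetString(buffer);
}
```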

You may find this analysis useful for optimizing string operations, if they prove to be the source of the slowness.

But really, profile.

ShZ
+1: for profiling advice. @oakskc: Most science isn't about thinking, it's about measurement and observation (then you think). The ancient philosophers "proved" that a horse must have one foot on the ground at all times, even when galloping. In the late 19th century, a running horse was captured on camera and the images were slowed down, and, through observation, the opposite was proved. You can't solve performance problems by ONLY thinking about them :)
Binary Worrier
+1 for recommending SlimTune. Profilers are WAY out of my budget, but SlimTune showed me that Stream.Length was a costly operation in my code. Once I stopped calling that repeatedly (my stream length doesn't change during this), it helped a lot.
oakskc
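
For reference, the Stream.Length fix from the last comment can be sketched like this (the method and names are illustrative, not the original code):

```csharp
using System.IO;

// Hoist the Stream.Length call out of the loop: on some streams (FileStream
// in particular) the Length property can involve an OS call each time.
static int CountRemainingBytes(Stream stream) {
    long length = stream.Length; // read once, up front
    int count = 0;
    while (stream.Position < length) {
        stream.ReadByte();
        count++;
    }
    return count;
}
```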