Is there a way to access an XmlReader asynchronously? The XML is coming in off the network from many different clients, as in XMPP; it is a constant stream of <action>...</action> tags.

What I'm after is a BeginRead/EndRead-like interface. The best solution I've managed to come up with is to do an asynchronous read for 0 bytes on the underlying network stream and then, when some data arrives, call Read on the XmlReader. This will, however, block until all of the data for the node becomes available. That solution looks roughly like this:

private Stream syncstream;
private NetworkStream ns;
private XmlReader reader;

//this code runs first
public void Init()
{
    syncstream = Stream.Synchronized(ns);
    reader = XmlReader.Create(syncstream);
    byte[] x = new byte[1];
    syncstream.BeginRead(x, 0, 0, new AsyncCallback(ReadCallback), null);
}

private void ReadCallback(IAsyncResult ar)
{
    syncstream.EndRead(ar);
    reader.Read(); //this will block for a while, until the entire node is available
    //do something to the xml node
    byte[] x = new byte[1];
    syncstream.BeginRead(x, 0, 0, new AsyncCallback(ReadCallback), null);
}

EDIT: Is this a possible algorithm for working out whether a string contains a complete XML node?

Func<string, bool> nodeChecker = currentBuffer =>
{
    //if there is nothing, definitely no tag
    if (currentBuffer == "") return false;
    //if we have <![CDATA[ and not ]]>, hold on, else pass it on
    if (currentBuffer.Contains("<![CDATA[") && !currentBuffer.Contains("]]>")) return false;
    if (currentBuffer.Contains("<![CDATA[") && currentBuffer.Contains("]]>")) return true;
    //these tag-related things will also catch <? ?> processing instructions
    //if there is a < but no >, we still have an open tag
    if (currentBuffer.Contains("<") && !currentBuffer.Contains(">")) return false;
    //if there is a <...>, we have a complete element.
    //>...< will never happen because we will pass it on to the parser when we get to >
    if (currentBuffer.Contains("<") && currentBuffer.Contains(">")) return true;
    //if there is no < >, we have a complete text node
    if (!currentBuffer.Contains("<") && !currentBuffer.Contains(">")) return true;
    //> and no < will never happen, we will pass it on to the parser when we get to >
    //by default, don't block
    return false;
};
+1  A: 

This is really tricky, because XmlReader doesn't provide any asynchronous interface.
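For readers on .NET Framework 4.5 or later: XmlReader gained Task-based methods such as ReadAsync, which are enabled by setting XmlReaderSettings.Async to true. The sketch below shows one way that might be applied to a stream of root-level <action> elements. The names AsyncXmlSketch and ReadActionTextsAsync are made up for illustration, and ConformanceLevel.Fragment is an assumption about how the stream is framed (many root-level elements rather than one document).

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Threading.Tasks;
using System.Xml;

// Hypothetical helper, not part of any library.
public static class AsyncXmlSketch
{
    // Collects the text content of every node in the stream. While the reader
    // waits for more input inside ReadAsync, no thread is blocked.
    public static async Task<List<string>> ReadActionTextsAsync(Stream input)
    {
        var settings = new XmlReaderSettings
        {
            Async = true,                                 // required for the *Async methods
            ConformanceLevel = ConformanceLevel.Fragment  // assumption: many root-level elements
        };

        var texts = new List<string>();
        using (XmlReader reader = XmlReader.Create(input, settings))
        {
            while (await reader.ReadAsync())
            {
                if (reader.NodeType == XmlNodeType.Text)
                {
                    texts.Add(await reader.GetValueAsync());
                }
            }
        }
        return texts;
    }
}
```

With a NetworkStream as the input, each client would get its own reader and its own call to this method; whether that fits a given server depends on how errors and disconnects need to be handled.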

I'm not really sure how asynchronously BeginRead behaves when you ask it to read 0 bytes. It might invoke the callback immediately and then block when you call Read. That would be the same as calling Read directly and then scheduling the next Read on the thread pool, for example using ThreadPool.QueueUserWorkItem.

It may be better to use BeginRead on the network stream to read data in, for example, 10kB chunks (while the system waits for the data, you wouldn't be blocking any thread). When you receive a chunk, you would copy it into a local MemoryStream, and your XmlReader would read data from this MemoryStream.

This still has a problem, though: after copying 10kB of data and calling Read several times, the last call would block. You would then probably need to copy smaller chunks of data to unblock the pending call to Read. Once that's done, you could again start a new BeginRead call to read a larger portion of data asynchronously.

Honestly, this sounds pretty complicated, so I'm quite interested if anybody comes up with a better answer. However, it gives you at least some guaranteed asynchronous operations that take some time and do not block any threads in the meantime (which is the essential goal of asynchronous programming).
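A minimal sketch of the chunked read described above. It uses Stream.ReadAsync with async/await in place of BeginRead/EndRead, which is a substitution and assumes .NET 4.5 or later. ChunkReader, PumpAsync and the onChunk callback are hypothetical names; the callback stands in for "copy the chunk into a local MemoryStream".

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

// Hypothetical helper, not part of any library.
public static class ChunkReader
{
    // Reads the source in chunks of up to chunkSize bytes and hands each chunk
    // to onChunk. No thread is blocked while waiting for data to arrive.
    public static async Task PumpAsync(Stream source, Action<byte[], int> onChunk,
                                       int chunkSize = 10 * 1024)
    {
        var chunk = new byte[chunkSize];
        int bytesRead;
        // ReadAsync returns 0 once the stream has ended (connection closed).
        while ((bytesRead = await source.ReadAsync(chunk, 0, chunk.Length)) > 0)
        {
            // The buffer is reused, so the callback must copy what it needs.
            onChunk(chunk, bytesRead);
        }
    }
}
```

This only covers the non-blocking part. Deciding when the buffered bytes contain enough for the next call to Read is still the hard part discussed above.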

(Side note: You could try using F# asynchronous workflows to write this, because they make asynchronous code a lot simpler. The technique I described will be tricky even in F#, though.)

Tomas Petricek
I threw together a quick test, and BeginRead'ing 0 bytes is perfectly fine: the callback isn't invoked until some data is ready. I'll have a shot at your algorithm now.
KJ Tsanaktsidis
Also, if I knew the message length, the problem you describe wouldn't exist, would it?
KJ Tsanaktsidis
If BeginRead makes it wait for at least some data, then it's probably okay (if you're downloading small chunks). If you knew the length of a message (one <action> item), then you could read exactly the number of bytes needed to perform the next `Read` call. But this may still be problematic (e.g. with different text encodings).
Tomas Petricek
Yeah, ideally what I'd want is some way of buffering up received data until there's enough to read the next node without blocking. I'll post what I'm thinking as an edit to my question...
KJ Tsanaktsidis
+2  A: 

The easiest thing to do is just put it on another thread, perhaps a ThreadPool thread, depending on how long it stays active. (Don't use thread pool threads for truly long-running tasks.)
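A rough sketch of the dedicated-thread variant, assuming the stream carries many root-level elements (hence ConformanceLevel.Fragment). ThreadedReaderSketch and the output queue are illustrative names, not an existing API. The blocking Read call only ever ties up the thread created here.

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Threading;
using System.Xml;

// Hypothetical helper, not part of any library.
public static class ThreadedReaderSketch
{
    // Starts a background thread that blocks in XmlReader.Read and publishes
    // the name of each element it encounters to the output queue.
    public static Thread Start(Stream input, BlockingCollection<string> output)
    {
        var thread = new Thread(() =>
        {
            var settings = new XmlReaderSettings
            {
                ConformanceLevel = ConformanceLevel.Fragment // assumption about framing
            };
            try
            {
                using (XmlReader reader = XmlReader.Create(input, settings))
                {
                    while (reader.Read()) // blocks this thread only
                    {
                        if (reader.NodeType == XmlNodeType.Element)
                        {
                            output.Add(reader.Name);
                        }
                    }
                }
            }
            catch (XmlException)
            {
                // Malformed input ends this reader; a real server would log it.
            }
            finally
            {
                output.CompleteAdding(); // tell consumers no more items will arrive
            }
        });
        thread.IsBackground = true;
        thread.Start();
        return thread;
    }
}
```

A consumer can drain the queue with output.GetConsumingEnumerable(). Note that this is still one thread per stream, which is the scaling concern raised in the comments below.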

kyoryu
I thought one-thread-per-client didn't scale very well?
KJ Tsanaktsidis
It doesn't. I didn't necessarily say one thread per client :)
kyoryu
So if every client had its own XML stream for the life of the connection, how would you avoid having each XmlReader in its own thread?
KJ Tsanaktsidis
Job queues? How are these streams supposed to combine?
kyoryu
A: 

Are you looking for something like the XamlReader.LoadAsync method?

An asynchronous XAML load operation will initially return an object that is purely the root object. Asynchronously, XAML parsing then continues, and any child objects are filled in under the root.

Rob Fonseca-Ensor
I don't think XamlReader fires events when new nodes become available, only when it has completed loading the markup, which, in my case, would be when the connection is closed. Would be an interesting use of XAML though :P
KJ Tsanaktsidis
Thought as much. Leaving my answer up though in case it helps someone else later...
Rob Fonseca-Ensor
+1  A: 

XmlReader buffers in 4kB chunks, if I remember correctly from when I looked into this a couple of years ago. You could pad your inbound data to 4kB (ick!), or use a better parser. I fixed this by porting James Clark's XP (Java) to C# as a part of Jabber-Net, here:

http://code.google.com/p/jabber-net/source/browse/#svn/trunk/xpnet

It's LGPL, only handles UTF8, isn't packaged for use, and has almost no documentation, so I wouldn't recommend using it. :)

Joe Hildebrand
Could you give me a quick rundown on how to use this parser? Will multiple instances parse different sockets asynchronously without requiring their own thread (like in XMPP)?
KJ Tsanaktsidis
See: http://code.google.com/p/jabber-net/source/browse/trunk/jabber/protocol/AsynchElementStream.cs for an example. Create a UTF8Encoding, throw bytes at it with tokenizeContent or tokenizeCdataSection, and look at the tokens that come out. Where the bytes come from, and the synchronization to ensure that you aren't modifying one parser's state on different threads, is up to you. If you want to do XMPP, you could just use all of Jabber-Net, and save yourself some hassle.
Joe Hildebrand
So, it would seem that the *general* solution is to find an XML parser with an interface that lets me put bytes into it myself at my leisure instead of supplying a stream. The parser will parse content as I supply it, keeping any bytes it has not parsed yet because they do not form a complete XML node. Sound about right?
KJ Tsanaktsidis
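The push-style idea in the comment above can be illustrated with a deliberately naive sketch. This is not xpnet's API and not a real XML tokenizer: ActionFramer is a made-up class that assumes UTF-8 input, that <action> elements are never nested or self-closing, and that the text "</action>" never appears inside CDATA or comments. It only shows the shape of the interface: the caller feeds bytes, and incomplete data is kept until more arrives.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical framer, for illustration only.
public sealed class ActionFramer
{
    private const string CloseTag = "</action>";
    // A stateful decoder copes with multi-byte characters split across chunks.
    private readonly Decoder decoder = Encoding.UTF8.GetDecoder();
    private readonly StringBuilder pending = new StringBuilder();

    // Accepts the next chunk of bytes and returns every element completed by it.
    public IList<string> Feed(byte[] data, int count)
    {
        var chars = new char[decoder.GetCharCount(data, 0, count)];
        int written = decoder.GetChars(data, 0, count, chars, 0);
        pending.Append(chars, 0, written);

        var complete = new List<string>();
        string text = pending.ToString();
        int start = 0;
        int end;
        while ((end = text.IndexOf(CloseTag, start, StringComparison.Ordinal)) >= 0)
        {
            end += CloseTag.Length;
            complete.Add(text.Substring(start, end - start).Trim());
            start = end;
        }

        // Keep whatever does not yet form a complete element.
        pending.Clear();
        pending.Append(text, start, text.Length - start);
        return complete;
    }
}
```

Each returned string is a complete element, so it could then be handed to something like XElement.Parse without any risk of blocking. A real incremental tokenizer, as in the port mentioned in this answer, handles the cases this sketch assumes away.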
Also, it doesn't look like xpnet is LGPL; the copying.txt file pretty much says you can do whatever you like with it. Am I missing something?
KJ Tsanaktsidis
For your first question: yes. Expat is a great example of such a parser. For your second, the intent was that the C# port was LGPL, but I included James Clark's copying.txt due to this text: "The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software." The port is close enough to the original XP code that I figured I'd keep the copying.txt file to be safe.
Joe Hildebrand
Ahk. Well, LGPL is fine for me, so I'll have a shot at using your C# port of expat. I suppose that makes this the answer I was looking for :P Thank you very much :)
KJ Tsanaktsidis