I need to process XML progressively as it is loaded from a ResponseStream in async callbacks.

The reply has the following schema:

  <root>
     <node ...>
        .....
     </node>
     <node />
     ...
  </root>

I need to be able to process each <node> as soon as it has arrived, before the whole response is complete.

Is there a normal way to parse it using standard .NET?

A: 

Yes, there is a reader that you can use. Basically, it moves along a stream and reports every node it identifies (element, attribute, etc.) as it goes.

TomTom
Any more detail than, yes, you can do this?
Oded
Yes, please describe which reader exactly. I have tried passing the incoming data into a StreamReader and creating an XmlReader from it, but it throws exceptions about incomplete XML instead of letting me read the part that is already available. I know SAX parsers could help me, but they are not part of core .NET.
datacompboy
A: 
System.Xml.XmlTextReader 

"Represents a reader that provides fast, non-cached, forward-only access to XML data."

http://msdn.microsoft.com/en-us/library/system.xml.xmltextreader.aspx

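Edit: This is a quick hack, but it does demonstrate that the reader is in fact lazy: for a file larger than the reader's internal buffer, the log interleaves "Reading from stream..." and "Processing node..." lines.

    using System;
    using System.IO;
    using System.Xml;

    public class XmlTextReaderTest
    {
        public void RunTest()
        {
            // Each Read() pulls only as many bytes from the stream as it needs.
            using (var reader = new XmlTextReader(new Fs(@"c:\TestXml.xml")))
            {
                while (reader.Read())
                    File.AppendAllText(@"c:\xLog.txt", "Processing node..." + Environment.NewLine);
            }
        }
    }

    // FileStream that logs every time the reader asks it for more bytes.
    public class Fs : FileStream
    {
        public Fs(string path)
            : base(path, FileMode.Open)
        {
        }

        public override int Read(byte[] array, int offset, int count)
        {
            File.AppendAllText(@"c:\xLog.txt", "Reading from stream..." + Environment.NewLine);
            return base.Read(array, offset, count);
        }
    }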
Ani
I have not found a way to call it progressively. It throws exceptions if called on an incomplete stream. Can you describe how to use it when the data is loaded progressively?
datacompboy
So how should I put the text I have read into a MemoryStream so that .Read() returns false instead of throwing an exception while no new data has arrived? I read the data with responseStream.BeginRead() and need to decode the partial XML in parallel. Maybe I was unclear at the top of the question, but really: how do I push a new piece of data into an XmlReader?
datacompboy
Can you tell me where the data is coming from? If you are creating the data on the fly through a custom process, you may have to write a custom Stream or TextReader implementation.
Ani
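Ani's custom-Stream suggestion, applied to the "how do I push data into an XmlReader" question, could look roughly like the following. This is a minimal sketch, not code from the thread: `PushStream`, `Push` and `Complete` are hypothetical names, and it assumes the XmlReader runs on its own consumer thread, because its Read() calls block until more bytes are pushed.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.IO;

// Hypothetical push-to-pull adapter: a producer (e.g. an async network callback)
// pushes byte blocks in, and a consumer such as XmlReader pulls them out via Read().
public sealed class PushStream : Stream
{
    private readonly BlockingCollection<byte[]> _chunks = new BlockingCollection<byte[]>();
    private readonly IEnumerator<byte[]> _source;
    private byte[] _current = new byte[0];
    private int _pos;

    public PushStream()
    {
        // The consuming enumerator blocks in MoveNext() until a chunk is available,
        // and returns false once Complete() has been called and the queue is drained.
        _source = _chunks.GetConsumingEnumerable().GetEnumerator();
    }

    // Producer side: copy the received bytes so the caller can reuse its buffer.
    public void Push(byte[] buffer, int offset, int count)
    {
        var copy = new byte[count];
        Buffer.BlockCopy(buffer, offset, copy, 0, count);
        _chunks.Add(copy);
    }

    // Producer side: signal that no more data will arrive (end of response).
    public void Complete() => _chunks.CompleteAdding();

    // Consumer side: blocks until at least one byte is available or the stream has ended.
    public override int Read(byte[] buffer, int offset, int count)
    {
        if (count == 0) return 0;
        while (_pos == _current.Length)
        {
            if (!_source.MoveNext()) return 0; // completed and drained: end of stream
            _current = _source.Current;
            _pos = 0;
        }
        int n = Math.Min(count, _current.Length - _pos);
        Buffer.BlockCopy(_current, _pos, buffer, offset, n);
        _pos += n;
        return n;
    }

    public override bool CanRead => true;
    public override bool CanSeek => false;
    public override bool CanWrite => false;
    public override long Length => throw new NotSupportedException();
    public override long Position
    {
        get => throw new NotSupportedException();
        set => throw new NotSupportedException();
    }
    public override void Flush() { }
    public override long Seek(long offset, SeekOrigin origin) => throw new NotSupportedException();
    public override void SetLength(long value) => throw new NotSupportedException();
    public override void Write(byte[] buffer, int offset, int count) => throw new NotSupportedException();
}
```

In the question's setup, the BeginRead/EndRead callback would call `Push(buffer, 0, bytesRead)` for each block and `Complete()` when EndRead returns 0, while a dedicated thread loops over `XmlReader.Create(pushStream)`. The trade-off is one blocked thread per download.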
The data comes from a long-poll call to the server.
datacompboy
What is the transport? Are you currently using the NetworkStream class?
Ani
`HttpWebRequest`
datacompboy
A: 

Don't call it from async callbacks; you don't need to (trust me, this will become clearer...).

The ResponseStream will load as information becomes available. In the case of a small (for quite large values of "small", I'm afraid) stream that is not sent chunked, this will be when the entire stream has been downloaded. However, if the stream is sent with chunked transfer-encoding (this is what happens in ASP.NET if buffering is turned off or Response.Flush() is called; other web-server technologies have their equivalents), then the stream will be available as soon as the first chunk arrives.

Create your XmlReader from the ResponseStream once GetResponse() has returned. It will start processing as soon as the first chunk is available, and will obtain subsequent chunks as they arrive, quite transparently to your code.

Make sure that dealing with these nodes as they become available actually benefits the code further along the line. For example, if you are outputting to the console or a form, do so as each node (or a small batch of nodes) is processed, whereas if you are creating objects from these nodes, yield return them rather than building up a collection.
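Creating the reader once GetResponse() returns and yielding nodes instead of collecting them might look like the following minimal sketch. `NodeStreamer` and `ReadNodes` are hypothetical names; the element name `node` is taken from the question's schema, and XNode.ReadFrom is used to materialise one element at a time.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Net;
using System.Xml;
using System.Xml.Linq;

public static class NodeStreamer
{
    // Lazily yields each <node> element once the reader has consumed it,
    // pulling more bytes from the underlying stream only when it needs them.
    public static IEnumerable<XElement> ReadNodes(Stream stream, string nodeName = "node")
    {
        using (var reader = XmlReader.Create(stream))
        {
            reader.MoveToContent(); // skip any XML declaration and move to <root>
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.LocalName == nodeName)
                {
                    // ReadFrom consumes the whole element and leaves the reader on the
                    // following node, so adjacent <node>s are not skipped.
                    yield return (XElement)XNode.ReadFrom(reader);
                }
                else
                {
                    reader.Read();
                }
            }
        }
    }

    // Wiring to the question's HttpWebRequest: the reader is created after
    // GetResponse() returns, and nodes are yielded as their bytes arrive.
    public static IEnumerable<XElement> ReadNodes(HttpWebRequest request)
    {
        using (var response = request.GetResponse())
        using (var stream = response.GetResponseStream())
        {
            foreach (var node in ReadNodes(stream))
                yield return node;
        }
    }
}
```

A caller can then write `foreach (var node in NodeStreamer.ReadNodes(request)) { ... }` and handle each element as it is produced; whether that happens early still depends on the server sending chunked data.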

Now, the big thing here is clearly whether the web stream is chunked, rather than your processing code. If the producer is another party that cannot be persuaded to do this, then you will need to drop to a lower level in your processing. However, in that case doing so is quite likely a false optimisation, because the whole processing will be done on their end before they send the first byte, and that is where the biggest delay will be. Really, if the delay in downloading the entire response is a problem for your code, then you need them to start sending chunked data, as the delay with even the most efficient approach on your part will still be too great.

For the record, I've quite recently confirmed that in such a use of XmlReader on a WebResponse dealing with chunked data (where I controlled both the client and server code, and could have both running in a debugger and check on the order of operation), the processing is indeed done as each chunk is available.

Jon Hanna
So if the server sends the data as one _very large_ non-chunked stream, there is no way to push() data into an XmlReader? I don't want to create a new thread for every download stream being processed. Using async callbacks was OK, but processing starts too late if the stream is too big.
datacompboy
You can, by dropping below the level of WebResponse. But in this case, if the data from the web is produced on the fly, the bigger problem is still probably going to be the wait between sending the request and receiving the first byte of data, so I'd lobby heavily for chunked data (which would also reduce the memory overhead on their server).
Jon Hanna
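On the concern about not dedicating a thread to each download: .NET Framework 4.5 and later (which postdate this thread) let an XmlReader read asynchronously when XmlReaderSettings.Async is true, so parsing awaits network data instead of blocking a thread. The following is a minimal sketch under that assumption; `AsyncNodeReader`, `ProcessNodesAsync` and the `onNode` callback are hypothetical names, and it does not make a non-chunked response arrive any sooner.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;
using System.Xml;
using System.Xml.Linq;

public static class AsyncNodeReader
{
    // Parses <node> elements as their bytes arrive, awaiting the stream rather
    // than blocking a thread while more data is on its way. Returns the node count.
    public static async Task<int> ProcessNodesAsync(Stream stream, Action<XElement> onNode)
    {
        var settings = new XmlReaderSettings { Async = true }; // required for the *Async methods
        int count = 0;
        using (var reader = XmlReader.Create(stream, settings))
        {
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.LocalName == "node")
                {
                    // ReadOuterXmlAsync returns the element's markup and moves past it.
                    string outer = await reader.ReadOuterXmlAsync();
                    onNode(XElement.Parse(outer));
                    count++;
                }
                else
                {
                    await reader.ReadAsync();
                }
            }
        }
        return count;
    }
}
```

With HttpWebRequest, this would be called on `response.GetResponseStream()` after awaiting `GetResponseAsync()`.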