I'm running into an issue where reading from a HttpResponseStream fails because the StreamReader that I'm wrapping around it reads faster than the response stream receives the actual response. I'm retrieving a reasonably small file (around 60k), but the parser which processes the response into an actual object fails because it hits an unexpected character (code 65535), which from experience I know to be the character produced when you read from a StreamReader and there are no further characters available.

For the record I know that the content being returned is valid and will parse correctly since the failure occurs at different points in the file each time I run the code. It's the parser.Load() line in the following where it fails.

Is there a way to ensure I've read all the content before attempting to parse it short of copying the response stream into a MemoryStream or string and then processing it?

    /// <summary>
    /// Makes a Query where the expected Result is an RDF Graph ie. CONSTRUCT and DESCRIBE Queries
    /// </summary>
    /// <param name="sparqlQuery">SPARQL Query String</param>
    /// <returns>RDF Graph</returns>
    public Graph QueryWithResultGraph(String sparqlQuery)
    {
        try
        {
            //Build the Query URI
            StringBuilder queryUri = new StringBuilder();
            queryUri.Append(this._endpoint.ToString());
            queryUri.Append("?query=");
            queryUri.Append(Uri.EscapeDataString(sparqlQuery));

            if (!this._defaultGraphUri.Equals(String.Empty))
            {
                queryUri.Append("&default-graph-uri=");
                queryUri.Append(Uri.EscapeUriString(this._defaultGraphUri));
            }

            //Make the Query via HTTP
            HttpWebResponse httpResponse = this.DoQuery(new Uri(queryUri.ToString()),false);

            //Set up an Empty Graph ready
            Graph g = new Graph();
            g.BaseURI = this._endpoint;

            //Parse into a Graph based on Content Type
            String ctype = httpResponse.ContentType;
            IRDFReader parser = MIMETypesHelper.GetParser(ctype);
            parser.Load(g, new StreamReader(httpResponse.GetResponseStream()));

            return g;
        }
        catch (UriFormatException uriEx)
        {
            //URI Format Invalid
            throw new Exception("The format of the URI was invalid", uriEx);
        }
        catch (WebException webEx)
        {
            //Some sort of HTTP Error occurred
            throw new Exception("A HTTP Error occurred", webEx);
        }
        catch (RDFException)
        {
            //Some problem with the RDF or Parsing thereof
            throw;
        }
        catch (Exception)
        {
            //Other Exception
            throw;
        }
    }

    /// <summary>
    /// Internal Helper Method which executes the HTTP Requests against the SPARQL Endpoint
    /// </summary>
    /// <param name="target">URI to make Request to</param>
    /// <param name="sparqlOnly">Indicates if only SPARQL Result Sets should be accepted</param>
    /// <returns>HTTP Response</returns>
    private HttpWebResponse DoQuery(Uri target, bool sparqlOnly)
    {
        //Expect errors in this function to be handled by the calling function

        //Set-up the Request
        HttpWebRequest httpRequest;
        HttpWebResponse httpResponse;
        httpRequest = (HttpWebRequest)WebRequest.Create(target);

        //Use HTTP GET/POST according to user set preference
        if (!sparqlOnly)
        {
            httpRequest.Accept = MIMETypesHelper.HTTPAcceptHeader();
            //For the time being drop the application/json as this doesn't play nice with Virtuoso
            httpRequest.Accept = httpRequest.Accept.Replace("," + MIMETypesHelper.JSON[0], String.Empty);
        }
        else
        {
            httpRequest.Accept = MIMETypesHelper.HTTPSPARQLAcceptHeader();
        }
        httpRequest.Method = this._httpMode;
        httpRequest.Timeout = this._timeout;

        //HTTP Debugging
        if (Options.HTTPDebugging)
        {
            Tools.HTTPDebugRequest(httpRequest);
        }

        httpResponse = (HttpWebResponse)httpRequest.GetResponse();

        //HTTP Debugging
        if (Options.HTTPDebugging)
        {
            Tools.HTTPDebugResponse(httpResponse);
        }

        return httpResponse;
    }

Edit

To clarify what I already stated: this is not a bug in the parser; it is an issue of the StreamReader reading faster than the response stream provides data. I can get around this by doing the following, but would like suggestions for better or more elegant solutions:

            //Parse into a Graph based on Content Type
            String ctype = httpResponse.ContentType;
            IRDFReader parser = MIMETypesHelper.GetParser(ctype);
            Stream response = httpResponse.GetResponseStream();
            MemoryStream temp = new MemoryStream();
            Tools.StreamCopy(response, temp);
            response.Close();
            temp.Seek(0, SeekOrigin.Begin);
            parser.Load(g, new StreamReader(temp));
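
Tools.StreamCopy is not shown in the question. A minimal sketch of what such a helper might look like, assuming it simply copies bytes until the source reports end of stream (the class name StreamCopySketch is made up for illustration):

```csharp
using System.IO;

public static class StreamCopySketch
{
    // Hypothetical stand-in for Tools.StreamCopy: copies everything
    // remaining in 'input' to 'output'.
    public static void StreamCopy(Stream input, Stream output)
    {
        byte[] buffer = new byte[4096];
        int read;
        // Stream.Read returns 0 only once the end of the stream is reached,
        // so the loop keeps going through short reads on a slow stream.
        while ((read = input.Read(buffer, 0, buffer.Length)) > 0)
        {
            output.Write(buffer, 0, read);
        }
    }
}
```

On .NET 4.0 and later, the built-in Stream.CopyTo(Stream) does the same job, so a hand-written helper is only needed on older frameworks.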

Edit 2

BlockingStreamReader class as per Eamon's suggestion:

/// <summary>
/// A wrapper to a Stream which does all its Read() and Peek() calls using ReadBlock() to handle slow underlying streams (eg Network Streams)
/// </summary>
public sealed class BlockingStreamReader : StreamReader
{
    private bool _peeked = false;
    private int _peekChar = -1;

    public BlockingStreamReader(StreamReader reader) : base(reader.BaseStream) { }

    public BlockingStreamReader(Stream stream) : base(stream) { }

    public override int Read()
    {
        if (this._peeked)
        {
            this._peeked = false;
            return this._peekChar;
        }
        else
        {
            if (this.EndOfStream) return -1;

            char[] cs = new char[1];
            base.ReadBlock(cs, 0, 1);

            return cs[0];
        }
    }

    public override int Peek()
    {
        if (this._peeked)
        {
            return this._peekChar;
        }
        else
        {
            if (this.EndOfStream) return -1;

            this._peeked = true;

            char[] cs = new char[1];
            base.ReadBlock(cs, 0, 1);

            this._peekChar = cs[0];
            return this._peekChar;
        }
    }

    public new bool EndOfStream
    {
        get
        {
            return (base.EndOfStream && !this._peeked);
        }
    }
}
+1  A: 
Eamon Nerbonne
I was aware of that. I'm not sure that it necessarily counts as a bug in StreamReader; it just appears to be how it behaves when the underlying stream may be slow. The parser is not the issue: if I use the second code fragment (added to the original question), which reads the entire stream before parsing, it parses fine.
RobV
This _is_ a bug in the parser with very high likelihood. It is by design that if the underlying stream is "slow", StreamReader returns fewer characters than requested. Using a MemoryStream as the underlying stream causes StreamReader to always return the full number of characters, which works around the bug in the parser.
Eamon Nerbonne
The parser uses an underlying tokeniser which reads character by character using the Read() method, hence you are most likely right. I'll test the ReadBlock() thing and accept your answer if that proves to solve the issue.
RobV
The ReadBlock() method does not entirely solve my issue: even if I use it, I still need to make a lot of calls to Peek(), which runs into the same issue as Read().
RobV
Then you have three options: (1) just precache the entire stream in a MemoryStream; (2) implement your own TextReader subclass that wraps another TextReader and blocks on Peek() and Read() (this is actually quite simple; you only need to implement Peek+Read in terms of ReadBlock); or (3) replace calls to Peek with a local one-character lookahead char filled using ReadBlock (which you'll need to ensure is then manually included the next time you read). I'd prefer option (2).
Eamon Nerbonne
(oh and if you subclass - don't forget that TextReaders are IDisposable)
Eamon Nerbonne
Option (2) is rather elegant and exactly the kind of thing I was hoping for. There's a slight complication: you can't override EndOfStream and have to shadow it instead, which means that wherever you need the StreamReader to be blocking you have to ensure you've typed it as the subclass rather than as StreamReader, or you'll hit end of stream prematurely. I've posted my code as an edit to the question.
RobV
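
The shadowing pitfall described in the last comment can be reproduced in isolation. The sketch below uses a made-up ShadowingReader whose shadowed EndOfStream always returns false, purely so the two code paths are easy to tell apart; it is not the BlockingStreamReader from the question.

```csharp
using System.IO;

// Hypothetical reader that hides EndOfStream with 'new',
// the same mechanism BlockingStreamReader relies on.
public sealed class ShadowingReader : StreamReader
{
    public ShadowingReader(Stream stream) : base(stream) { }

    // StreamReader.EndOfStream is not virtual, so this hides it
    // rather than overriding it.
    public new bool EndOfStream
    {
        get { return false; }
    }
}

public static class ShadowingDemo
{
    // Variable typed as the subclass: the shadowing property is used.
    public static bool ViaSubclass()
    {
        using (ShadowingReader r = new ShadowingReader(new MemoryStream()))
        {
            return r.EndOfStream;
        }
    }

    // Variable typed as StreamReader: the base property is used,
    // even though the object is a ShadowingReader.
    public static bool ViaBaseType()
    {
        using (StreamReader r = new ShadowingReader(new MemoryStream()))
        {
            return r.EndOfStream;
        }
    }
}
```

Because member hiding is resolved from the compile-time type, any parser that accepts a plain StreamReader parameter would see the base EndOfStream, which is the complication the comment points out.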
A: 

To support a blocking read scenario, rather than subclassing StreamReader, you can subclass TextReader: this avoids issues with EndOfStream, and it means you can make any reader blocking - not just StreamReaders:

using System;
using System.IO;

// Wraps any TextReader so that Peek() and Read() are answered via Read()
// and ReadBlock() on the wrapped reader.
public sealed class BlockingReader : TextReader
{
    bool hasPeeked;
    int peekChar;
    readonly TextReader reader;

    public BlockingReader(TextReader reader) { this.reader = reader; }

    public override int Read()
    {
        if (!hasPeeked)
            return reader.Read();
        // Hand back the character consumed by an earlier Peek()
        hasPeeked = false;
        return peekChar;
    }

    public override int Peek()
    {
        if (!hasPeeked)
        {
            // Use Read() rather than Peek() on the wrapped reader and cache the result
            peekChar = reader.Read();
            hasPeeked = true;
        }
        return peekChar;
    }

    public override int Read(char[] buffer, int index, int count)
    {
        if (buffer == null)
            throw new ArgumentNullException("buffer");
        if (index < 0)
            throw new ArgumentOutOfRangeException("index");
        if (count < 0)
            throw new ArgumentOutOfRangeException("count");
        if ((buffer.Length - index) < count)
            throw new ArgumentException("Buffer too small");

        int peekCharsRead = 0;
        if (hasPeeked && count > 0)
        {
            hasPeeked = false;
            // A Peek() that hit the end of the stream cached -1: nothing to copy
            if (peekChar == -1)
                return 0;
            buffer[index] = (char)peekChar;
            index++;
            count--;
            peekCharsRead++;
        }

        return peekCharsRead + reader.ReadBlock(buffer, index, count);
    }

    protected override void Dispose(bool disposing)
    {
        try
        {
            if (disposing)
                reader.Dispose();
        }
        finally
        {
            base.Dispose(disposing);
        }
    }
}
Eamon Nerbonne
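
Both readers in this thread lean on ReadBlock() because, as noted in the comments, a StreamReader over a slow stream may return fewer characters than requested from Read(char[], int, int). A rough sketch of that difference follows; TrickleStream and ShortReadDemo are hypothetical names, and the one-byte-per-call stream is only an assumed stand-in for a slow network stream.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical slow stream: hands out at most one byte per Read call.
public sealed class TrickleStream : Stream
{
    private readonly MemoryStream inner;

    public TrickleStream(byte[] data) { inner = new MemoryStream(data); }

    public override int Read(byte[] buffer, int offset, int count)
    {
        return inner.Read(buffer, offset, Math.Min(count, 1));
    }

    public override bool CanRead { get { return true; } }
    public override bool CanSeek { get { return false; } }
    public override bool CanWrite { get { return false; } }
    public override long Length { get { throw new NotSupportedException(); } }
    public override long Position
    {
        get { throw new NotSupportedException(); }
        set { throw new NotSupportedException(); }
    }
    public override void Flush() { }
    public override long Seek(long offset, SeekOrigin origin) { throw new NotSupportedException(); }
    public override void SetLength(long value) { throw new NotSupportedException(); }
    public override void Write(byte[] buffer, int offset, int count) { throw new NotSupportedException(); }
}

public static class ShortReadDemo
{
    private static StreamReader Open()
    {
        byte[] data = Encoding.ASCII.GetBytes("hello world");
        return new StreamReader(new TrickleStream(data), Encoding.ASCII);
    }

    // Read() is allowed to stop early and report how many chars it got.
    public static int CharsFromRead()
    {
        using (StreamReader r = Open()) { return r.Read(new char[5], 0, 5); }
    }

    // ReadBlock() keeps reading until 'count' chars arrive or the stream ends.
    public static int CharsFromReadBlock()
    {
        using (StreamReader r = Open()) { return r.ReadBlock(new char[5], 0, 5); }
    }
}
```

CharsFromReadBlock() fills all five requested characters, whereas CharsFromRead() is only guaranteed to return somewhere between one and five; the exact figure depends on the framework's buffering, so code that needs a full buffer should not rely on it.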