Is it possible to read a file over FTP as a System.IO.Stream?

using (Stream s = Ftp.OpenFile(url....))
{
    s.Seek(offset, SeekOrigin.Begin);
    int n = s.Read(...);
}

And similarly, with HTTP?

using (Stream s = Http.OpenFile(url....))
{
    s.Seek(offset, SeekOrigin.Begin);
    int n = s.Read(...);
}
A: 

Yes and no.

You cannot seek in network streams, but you can open a URL as a stream using a WebRequest/WebResponse pair; see the WebRequest.Create() method.

Lucero
A: 

This is actually the expected way of reading data from a WebResponse:

WebRequest request = WebRequest.Create("http://example.com/file.txt");
using (WebResponse response = request.GetResponse())
{
    using (StreamReader reader = new
        StreamReader(response.GetResponseStream()))
    {
        // Read the stream here
    }
}

You don't need (and in fact can't use) the Seek method on the response stream. Wrapping it in a BufferedStream does not help either, because a BufferedStream is only seekable when the stream underneath it is. As you might expect with a network stream, once the bytes are transmitted you can't go back and "reread" them unless you've already saved them in memory (for example in a MemoryStream) or on disk. But in most cases you'll want to use the StreamReader anyway.
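
To make the "saved in memory" option concrete, here is a minimal sketch that copies a forward-only stream into a MemoryStream so that it can be seeked afterwards. The names StreamBuffering and BufferToMemory are made up for illustration, and Stream.CopyTo assumes .NET 4.0 or later. It downloads the entire body before you can seek, so it is only reasonable for fairly small files.

```csharp
using System;
using System.IO;

public static class StreamBuffering
{
    // Hypothetical helper (not part of the framework): reads the whole
    // source stream into memory and returns a seekable copy of it.
    public static MemoryStream BufferToMemory(Stream source)
    {
        if (source == null)
            throw new ArgumentNullException("source");

        var buffered = new MemoryStream();
        source.CopyTo(buffered);   // reads until the source is exhausted
        buffered.Position = 0;     // rewind so the caller starts at byte 0
        return buffered;
    }
}
```

You would pass it the result of response.GetResponseStream() and then call Seek on the returned MemoryStream as often as you like.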

For FTP it's exactly the same, but using FtpWebRequest instead of HttpWebRequest. Both return a WebResponse from the GetResponse method.

Certain FTP servers also support the REST (restart) command, which would start the transfer from a certain byte offset in the file - there's a post here about downloading a partial file that way (i.e. resuming a broken transfer). If you want to do this for HTTP, you need to use the HttpWebRequest.AddRange method to set the Range header (HTTP 1.1 only).


Here is an example of a wrapper you could use to do this for HTTP:

public class RangedHttpWebStream : Stream
{
    private Stream realStream;
    private long startPosition;
    private long? requestedLength;
    private HttpWebRequest request;

    public RangedHttpWebStream(HttpWebRequest request)
    {
        if (request == null)
            throw new ArgumentNullException("request");
        this.request = request;
    }

    public override bool CanRead
    {
        get { return true; }
    }

    public override bool CanSeek
    {
        get { return (realStream == null); }
    }

    public override bool CanWrite
    {
        get { return false; }
    }

    public override void Flush()
    {
    }

    public override long Length
    {
        get { return requestedLength ?? -1; }
    }

    public override long Position
    {
        get { return startPosition; }
        set { Seek(value, SeekOrigin.Begin); }
    }

    public override int Read(byte[] buffer, int offset, int count)
    {
        if (realStream == null)
        {
            UpdateRange();
            WebResponse response = request.GetResponse();
            realStream = response.GetResponseStream();
        }
        return realStream.Read(buffer, offset, count);
    }

    public override long Seek(long offset, SeekOrigin origin)
    {
        if (realStream != null)
            throw new InvalidOperationException("Seek cannot be performed " +
                "once reading has started.");
        switch (origin)
        {
            case SeekOrigin.Begin:
                startPosition = offset;
                break;
            case SeekOrigin.Current:
                startPosition += offset;
                break;
            default:
                throw new NotSupportedException("Seek can only be performed " +
                    "from the beginning of the stream or current position.");
        }
        return startPosition;
    }

    public override void SetLength(long value)
    {
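        // The single-string constructor treats its argument as the parameter
        // name, so pass the name and the message separately.
        if (value < 0)
            throw new ArgumentOutOfRangeException("value",
                "Parameter 'value' cannot be less than zero.");
        if (value > Int32.MaxValue)
            throw new ArgumentOutOfRangeException("value",
                "Parameter 'value' cannot be greater than Int32.MaxValue.");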
        requestedLength = value;
    }

    public override void Write(byte[] buffer, int offset, int count)
    {
        throw new NotSupportedException("The stream does not support writing.");
    }

    protected override void Dispose(bool disposing)
    {
        if ((disposing) && (realStream != null))
            realStream.Dispose();
        base.Dispose(disposing);
    }

    private void UpdateRange()
    {
        if (startPosition < 0)
            throw new IOException("Attempted to seek before " +
                "beginning of stream.");
        if (startPosition > Int32.MaxValue)
            throw new IOException("Attempted to seek past Int32.MaxValue.  " +
                "This is invalid for an HTTP stream.");
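        if (requestedLength != null)
        {
            // Both ends of an HTTP Range are inclusive, so the last byte
            // requested is start + length - 1. A length of zero cannot be
            // expressed as a Range and makes AddRange throw.
            long endPosition = startPosition + requestedLength.Value - 1;
            if (endPosition > Int32.MaxValue)
                throw new IOException("Attempted to read past " +
                    "Int32.MaxValue.  This is invalid for an HTTP stream.");
            request.AddRange((int)startPosition, (int)endPosition);
        }
        else
        {
            // A non-negative argument means "from this offset to the end";
            // a negative one would request the last N bytes instead.
            request.AddRange((int)startPosition);
        }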
    }
}

The vast majority of this is just error-checking - making sure that the seek and length offsets specify a valid range and that no further seeks/ranges are attempted after the request is actually sent.

This operates on an HttpWebRequest, so to make it easier to use you can write an extension method:

public static class HttpExtensions
{
    public static Stream GetSmartStream(this HttpWebRequest request)
    {
        return new RangedHttpWebStream(request);
    }
}

Test program (actually tested) looks like this:

static void Main(string[] args)
{
    var request = (HttpWebRequest)WebRequest.Create(
        "http://localhost/test.txt");
    using (Stream stream = request.GetSmartStream())
    {
        stream.Seek(20, SeekOrigin.Begin);
        stream.Seek(1, SeekOrigin.Current);
        stream.SetLength(100);
        using (StreamReader reader = new StreamReader(stream))
        {
            string content = reader.ReadToEnd();
            Console.Write(content);
        }
    }
    Console.ReadLine();
}

It's mainly a copy-paste job to do this for FTP. The relevant request property to modify is FtpWebRequest.ContentOffset. Unlike HTTP, you can't set an end offset, so you'll have to change the SetLength method to throw a NotSupportedException.
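
As a rough sketch of the FTP side, the part that differs is how the start offset is attached to the request. The class and method names below (FtpOffsetExample, CreateRequest) are hypothetical; only FtpWebRequest.ContentOffset and WebRequestMethods.Ftp.DownloadFile come from the framework. Whether the offset is honoured still depends on the server supporting REST.

```csharp
using System;
using System.Net;

public static class FtpOffsetExample
{
    // Hypothetical helper: builds a download request that asks the server
    // to start sending at startPosition instead of at byte 0.
    public static FtpWebRequest CreateRequest(string url, long startPosition)
    {
        if (startPosition < 0)
            throw new ArgumentOutOfRangeException("startPosition",
                "The start position cannot be negative.");

        var request = (FtpWebRequest)WebRequest.Create(url);
        request.Method = WebRequestMethods.Ftp.DownloadFile;
        // There is no end offset in FTP; the transfer runs from here to
        // the end of the file unless the caller stops reading early.
        request.ContentOffset = startPosition;
        return request;
    }
}
```

In an FTP version of the wrapper, UpdateRange would set ContentOffset like this in place of the AddRange calls, and the caller would simply stop reading once it has enough bytes.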

Aaronaught
Thanks for the answer. I don't agree that reading the data and dumping it is the same as a Seek(). Ideally a seek doesn't read anything, it just moves the stream position, so that subsequent calls to Read() start at the desired spot.
Cheeso
@Cheeso: For persistent data it is of course not the same; for a network stream it effectively is because the bits are transient data. The only exceptions are as I mentioned above, (a) if you buffer it (in which case it's your buffer that's seekable), or (b) if the underlying protocol exposes a specific method of skipping the start bytes (FTP `REST` or HTTP `Range`). Once the data transfer has started, all you really have is a socket, and the socket has no knowledge of what's come before or what is coming next.
Aaronaught
I understand. That's what I was after: Stream abstractions that take advantage of FTP/REST and HTTP/Range in order to provide efficiency.
Cheeso
@Cheeso: I see what you mean. However, such an abstraction would only be able to seek one time, since that's how the underlying protocols work. And in both cases, the underlying "seeks" need to be triggered before the stream is ever created; in the case of HTTP it's actually part of the *request*, and in the case of FTP it's a command issued on a different port/socket before the data transfer (and its corresponding stream) is created. It *would* be nice to be able to do what you suggest; I just don't think it's feasible given the order of operations inherent in these protocols.
Aaronaught
I suppose it wouldn't be impossible to implement a "smart stream" that uses some sort of lazy-loading of the "real" stream - it would just have to be *very* well-documented because its actual behaviour would likely be somewhat unintuitive. Now that I understand the question I could probably write one, but unfortunately my time is up today. :P (If there are no other answers when I get back, I'll see what I can do...)
Aaronaught
A: 

As an exercise, I wrote some examples:

  • FtpStream
    Reads a file over FTP, as a stream. Contiguous reads use a single data connection to the server. If you call Seek(), that connection is dropped and a new one is created.

  • HttpStream
    Reads a resource over HTTP (GET), as a stream. Supports Seek using the Range header. Each read is a new HTTP GET. This one should be optimized to avoid a new connection on a contiguous read.
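
For the HttpStream approach, each Read has to be translated into a Range header for its own GET. The following is only a sketch of that arithmetic, not the code of the HttpStream example itself; RangeMath and ToRangeHeader are hypothetical names. The point it illustrates is that both ends of an HTTP byte range are inclusive.

```csharp
using System;
using System.Globalization;

public static class RangeMath
{
    // Hypothetical helper: builds the Range header value for reading
    // 'count' bytes starting at the current stream position.
    public static string ToRangeHeader(long position, int count)
    {
        if (position < 0)
            throw new ArgumentOutOfRangeException("position");
        if (count <= 0)
            throw new ArgumentOutOfRangeException("count");

        // The last byte is inclusive, hence the "- 1".
        long last = position + count - 1;
        return string.Format(CultureInfo.InvariantCulture,
            "bytes={0}-{1}", position, last);
    }
}
```

For example, reading 100 bytes at position 20 gives "bytes=20-119". A server that ignores the header replies with the full body and status 200 rather than 206, so a real implementation would need to check for that.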

Cheeso