Hi everybody,

If I download part of a file by specifying a single range in the request header, the response body contains just the requested bytes of the file.

But if I specify multiple ranges as below, the response body contains an additional header for each requested range (which describes that range), and this corrupts the downloaded file.

static void Main(string[] args)
{

    Console.Write("Please enter target File: ");
    string Target = Console.ReadLine();
    string Source = @"http://mozilla-mirror.3347.voxcdn.com/pub/mozilla.org/firefox/releases/3.6/win32/de/Firefox%20Setup%203.6.exe";

    HttpWebRequest request = (HttpWebRequest)WebRequest.Create(Source);
    request.Credentials = CredentialCache.DefaultCredentials;

    // define multiple ranges (both bounds are inclusive, so the ranges must not share byte 1000000)
    request.AddRange(      0,  999999);
    request.AddRange(1000000, 1999999);

    HttpWebResponse response = (HttpWebResponse)request.GetResponse();

    using (Stream source = response.GetResponseStream())
    {
        using (FileStream target = File.Open(Target, FileMode.OpenOrCreate, FileAccess.Write, FileShare.ReadWrite))
        {
            byte[] buffer = new byte[4096];
            int BytesRead = 0;
            int TotalBytesRead = 0;

            while((BytesRead = source.Read(buffer, 0, buffer.Length)) > 0) 
            {
                target.Write(buffer, 0, BytesRead);
                TotalBytesRead += BytesRead;

                Console.WriteLine("{0}", TotalBytesRead);
            }
        }
    }

    Console.WriteLine("Downloading Finished!");
    Console.ReadLine();
}

Request as shown in Wireshark:

http://img197.imageshack.us/img197/8199/requesty.png

The response body should contain only the byte stream of the file, but it additionally contains an unwanted response header at the beginning of each requested range:

http://img28.imageshack.us/img28/586/response.png

Is it possible to avoid the additional response header in the body without requesting each Range separately?

or

Is there a built-in way to filter out the additional response headers, whose size could vary depending on the HTTP server?

best regards

cap_Chap

+1  A: 

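No, that's how multiple ranges work in HTTP/1.1. A request for several ranges is answered with a 206 (Partial Content) response whose body has the media type multipart/byteranges: every part starts with a boundary line followed by its own Content-Type and Content-Range fields, and those are the extra headers you are seeing. See RFC 2616, Section 19.2.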
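To make that structure a bit more concrete, here is a minimal sketch of reading the two pieces of metadata involved: the boundary parameter from the response's Content-Type, and the offsets from a part's Content-Range field. It assumes the System.Net.Http.Headers types are available; the class and method names (ByteRangesHeaders, GetBoundary, ParsePartRange) are made up for this illustration.

```csharp
using System;
using System.Linq;
using System.Net.Http.Headers;

// Hypothetical helper, for illustration only.
static class ByteRangesHeaders
{
    // Returns the boundary parameter of a Content-Type value such as
    // "multipart/byteranges; boundary=THIS_STRING_SEPARATES", without quotes.
    public static string GetBoundary(string contentType)
    {
        MediaTypeHeaderValue mediaType = MediaTypeHeaderValue.Parse(contentType);
        NameValueHeaderValue parameter = mediaType.Parameters.FirstOrDefault(
            p => string.Equals(p.Name, "boundary", StringComparison.OrdinalIgnoreCase));

        if (parameter == null || string.IsNullOrEmpty(parameter.Value))
            throw new FormatException("no boundary parameter found");

        // The value may be a quoted string; the delimiter itself has no quotes.
        return parameter.Value.Trim('"');
    }

    // Parses a part's Content-Range value such as "bytes 500-999/8000".
    // From and To are inclusive byte offsets into the complete file.
    public static ContentRangeHeaderValue ParsePartRange(string contentRange)
    {
        return ContentRangeHeaderValue.Parse(contentRange);
    }
}
```

Each part in the body is then introduced by a line consisting of "--" plus that boundary, and the last part is followed by "--" plus the boundary plus "--".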

Julian Reschke
A: 

Hi,

Thanks for your help. As described in the RFC section referenced above, this is how HTTP is supposed to respond to a request with multiple ranges.

So:

Is it possible to avoid the additional response header in the body without requesting each Range separately?

=> No.

Is there a built-in way to filter out the additional response headers, whose size could vary depending on the HTTP server?

=> I don't know but ...

Maybe some of you could take a critical look at the following chunk of code, which filters the part headers out of the file data,

public void DoDownload(Range[] Ranges)
    {

        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(m_Source);
        request.Credentials = CredentialCache.DefaultCredentials;

        foreach (Range r in Ranges)
        {
            request.AddRange(r.From, r.To);
        }

        HttpWebResponse response = (HttpWebResponse)request.GetResponse();

        string boundary = "";
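        // the boundary value may be quoted and may be followed by further parameters
        Match m = Regex.Match(response.ContentType, @"boundary=""?(?<boundary>[^"";\s]+)", RegexOptions.IgnoreCase);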

        if (m.Success)
        {
            boundary = m.Groups["boundary"].Value;
        }
        else
        {
            throw new InvalidDataException("invalid packet data: no boundary specification found.");
        }


        using (Stream source = response.GetResponseStream())
        {
            using (FileStream target = File.Open(m_TargetFile, FileMode.OpenOrCreate, FileAccess.Write, FileShare.ReadWrite))
            {
                // buffer for payload
                byte[] buffer = new byte[4096];
                // buffer for current range header
                byte[] header = new byte[200];
                // next header after x bytes
                int NextHeader = 0;
                // current position in header[]
                int HeaderPosition = 0;
                // current position in buffer[]
                int BufferPosition = 0;
                // left data to proceed
                int BytesToProceed = 0;
                // total data written without range-headers
                long TotalBytesWritten = 0;
                // count of last data written to target file
                int BytesWritten = 0;
                // size of processed header data
                int HeaderSize = 0;
                // count of last data read from ResponseStream
                int BytesRead = 0;


                while ((BytesRead = source.Read(buffer, 0, buffer.Length)) > 0)
                {
                    BufferPosition = 0;
                    BytesToProceed = BytesRead;
                    HeaderSize = 0;

                    while (BytesToProceed > 0)
                    {
                        if (NextHeader == 0)
                        {
                            for (;HeaderPosition < header.Length; HeaderPosition++, BufferPosition++, HeaderSize++)
                            {
                                if (BytesToProceed > HeaderPosition && BufferPosition < BytesRead)
                                {
                                    header[HeaderPosition] = buffer[BufferPosition];

                                    if (HeaderPosition >= 4 &&
                                        header[HeaderPosition - 3] == 0x0d &&
                                        header[HeaderPosition - 2] == 0x0a &&
                                        header[HeaderPosition - 1] == 0x0d &&
                                        header[HeaderPosition] == 0x0a)
                                    {
                                        string RangeHeader = Encoding.ASCII.GetString(header, 0, HeaderPosition + 1);
                                        // escape the boundary: it may contain regex metacharacters such as '+', '.' or '('
                                        Match mm = Regex.Match(RangeHeader,
                                            @"^\r\n(--)?" + Regex.Escape(boundary) + @".*?(?<from>\d+)\s*-\s*(?<to>\d+)/.*\r\n\r\n", RegexOptions.Singleline);

                                        if (mm.Success)
                                        {
                                            int RangeStart = Convert.ToInt32(mm.Groups["from"].Value);
                                            int RangeEnd = Convert.ToInt32(mm.Groups["to"].Value);

                                            NextHeader = (RangeEnd - RangeStart) + 1; 

                                            target.Seek(RangeStart, SeekOrigin.Begin);
                                            BufferPosition++;

                                            BytesToProceed -= HeaderSize + 1;

                                            HeaderPosition = 0;
                                            HeaderSize = 0;

                                            break;
                                        }
                                        else { throw new InvalidDataException("invalid header: missing range specification.");}
                                    }
                                }
                                else { goto READ_NEW; }
                            }

                            if (NextHeader == 0)
                                throw new InvalidDataException("invalid packet data: no range-header found.");
                        }

                        BytesWritten = (NextHeader > BytesToProceed) ? BytesToProceed : NextHeader;

                        target.Write(buffer, BufferPosition, BytesWritten);

                        BytesToProceed -= BytesWritten;
                        NextHeader -= BytesWritten;
                        BufferPosition += BytesWritten;

                        TotalBytesWritten += BytesWritten;
                    }

                READ_NEW:;
                }
            }
        }
    }

and give me some hints if there is another or better way to do this.
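One possible alternative, sketched here under the assumption that the whole response body fits in memory as a byte array: locate each boundary delimiter, read the Content-Range field from the part's header block, and then skip the payload by its known length instead of scanning it. The names MultipartByteRanges and FindParts are hypothetical, and this is not a complete multipart parser.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
static class MultipartByteRanges
{
    static readonly byte[] HeaderEnd = { 0x0d, 0x0a, 0x0d, 0x0a };

    // Returns, for every part, the inclusive file offsets (From, To) and the
    // index in 'body' at which that part's payload starts (Offset).
    public static List<(long From, long To, int Offset)> FindParts(byte[] body, string boundary)
    {
        var parts = new List<(long From, long To, int Offset)>();
        byte[] delimiter = Encoding.ASCII.GetBytes("--" + boundary);

        int pos = IndexOf(body, delimiter, 0);
        while (pos >= 0)
        {
            int afterDelimiter = pos + delimiter.Length;

            // "--boundary--" is the closing delimiter: no more parts follow.
            if (afterDelimiter + 1 < body.Length &&
                body[afterDelimiter] == (byte)'-' && body[afterDelimiter + 1] == (byte)'-')
                break;

            int headerEnd = IndexOf(body, HeaderEnd, afterDelimiter);
            if (headerEnd < 0)
                throw new InvalidDataException("part header is not terminated");

            string headers = Encoding.ASCII.GetString(body, afterDelimiter, headerEnd - afterDelimiter);
            Match m = Regex.Match(headers, @"Content-Range:\s*bytes\s+(\d+)-(\d+)/", RegexOptions.IgnoreCase);
            if (!m.Success)
                throw new InvalidDataException("part has no Content-Range field");

            long from = long.Parse(m.Groups[1].Value);
            long to = long.Parse(m.Groups[2].Value);
            int payloadOffset = headerEnd + HeaderEnd.Length;

            // Skip the payload by its length so file data is never searched for delimiters.
            long next = payloadOffset + (to - from + 1);
            if (next > body.Length)
                throw new InvalidDataException("part payload is truncated");

            parts.Add((from, to, payloadOffset));
            pos = IndexOf(body, delimiter, (int)next);
        }
        return parts;
    }

    // Naive byte search; returns -1 if 'needle' does not occur at or after 'start'.
    static int IndexOf(byte[] haystack, byte[] needle, int start)
    {
        for (int i = start; i <= haystack.Length - needle.Length; i++)
        {
            int j = 0;
            while (j < needle.Length && haystack[i + j] == needle[j]) j++;
            if (j == needle.Length) return i;
        }
        return -1;
    }
}
```

For each returned part one would seek the target file to From and write To - From + 1 bytes starting at Offset. Buffering the whole body is the price for the simpler logic; for large downloads a streaming approach like the code above is still needed.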

best regards cap_Chap

cap_Chap
Looks scary. Though it doesn't seem like there are any HTTP client libraries for C# that support multipart responses (probably because they're rarely used). Or maybe I didn't look very hard. Anyway, if you're stuck for an alternative, why not just make multiple HTTP requests with a single range?
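A rough sketch of that one-request-per-range approach, assuming the server honours Range headers and answers each request with 206. The names RangeDownloader, SplitIntoRanges and DownloadRange are made up for this example, and the chunk size is an arbitrary choice.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Net;

// Hypothetical helper, for illustration only.
static class RangeDownloader
{
    // Splits [0, length) into consecutive, non-overlapping ranges with inclusive bounds.
    public static List<(long From, long To)> SplitIntoRanges(long length, long chunkSize)
    {
        if (chunkSize <= 0) throw new ArgumentOutOfRangeException("chunkSize");

        var ranges = new List<(long From, long To)>();
        for (long start = 0; start < length; start += chunkSize)
            ranges.Add((start, Math.Min(start + chunkSize, length) - 1));
        return ranges;
    }

    // Requests bytes from..to (inclusive) and writes them into 'target' at offset 'from'.
    public static void DownloadRange(string url, long from, long to, Stream target)
    {
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
        request.AddRange(from, to);

        using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
        {
            // Anything other than 206 means the body is not the requested range.
            if (response.StatusCode != HttpStatusCode.PartialContent)
                throw new InvalidDataException("expected 206, got " + response.StatusCode);

            using (Stream source = response.GetResponseStream())
            {
                target.Seek(from, SeekOrigin.Begin);
                source.CopyTo(target);
            }
        }
    }
}
```

With a single range per request the body is the plain byte stream, so nothing has to be filtered; the cost is one round trip per range.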
Christopher