I'd like to read binary data from a blob, using the Stream interface around it.

But I don't want the blob to have to be loaded entirely client side and stored in memory, or in a file.

I want the code that uses the blob to be able to seek and read, with only as much data as is needed to support each seek/read brought over the wire.

For example, pretend the blob is a 250MB Photoshop image. The thumbnailer code knows how to read the first 8 bytes of an image, recognize it's a PSD file, seek to the offset that will contain the 3k thumbnail, and read that.

So rather than trying to allocate 250MB of memory, or having to create a temporary file, and having to wait for 250MB to be brought over the wire, the hypothetical SQLServerBlobStreamServerCursor class knows how to limit data traffic to that which is actually asked for.


Research

HOW TO: Read and Write a File to and from a BLOB Column by Using Chunking in ADO.NET and Visual Basic .NET, which talks about being able to read, and write, in chunks. But the code in the article is cut off, which makes it unreadable. I'll look at it later.

Also this guy mentioned a new SQL Server 2005 [column].Write() T-SQL syntax to write data, which could be used to write data in small chunks (to avoid consuming all your server's memory). Maybe there's a [column].Read() pseudo-method?

Microsoft has an article: Conserving Resources When Writing BLOB Values to SQL Server

A: 

You will want to use ADO.NET's SqlDataReader object with the SequentialAccess CommandBehavior. This will allow you to define a buffer size and read the data in chunks.

See this article: http://msdn.microsoft.com/en-us/library/87z0hy49(VS.71).aspx

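int bufferSize = 8;
// The buffer has to be allocated before GetBytes can fill it.
byte[] outbyte = new byte[bufferSize];
SqlDataReader myReader = myCmd.ExecuteReader(CommandBehavior.SequentialAccess);
...
// Arguments: column ordinal, offset within the field, destination buffer,
// offset within the buffer, number of bytes to read.
long returnBytes = myReader.GetBytes(1, 0, outbyte, 0, bufferSize);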
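
To read a whole field in chunks, the single GetBytes call above is usually wrapped in a loop. A minimal sketch, assuming the reader is already positioned on a row; the class and method names (BlobChunks, CopyFieldTo) are made up for illustration, and it is written against the DbDataReader base class, which SqlDataReader derives from:

```csharp
using System;
using System.Data.Common;
using System.IO;

public static class BlobChunks
{
    // Hypothetical helper: copies one binary field to a stream,
    // bufferSize bytes at a time. Returns the total number of bytes copied.
    public static long CopyFieldTo(DbDataReader reader, int ordinal,
                                   Stream destination, int bufferSize)
    {
        byte[] buffer = new byte[bufferSize];
        long fieldOffset = 0;
        long read;

        // GetBytes returns how many bytes it actually copied;
        // 0 means the field is exhausted.
        while ((read = reader.GetBytes(ordinal, fieldOffset, buffer, 0, buffer.Length)) > 0)
        {
            destination.Write(buffer, 0, (int)read);
            fieldOffset += read;
        }
        return fieldOffset;
    }
}
```

This keeps client memory use down to one buffer, but it only ever walks forward through the field, so by itself it does not give you seeking.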
Terrapin
But with SequentialAccess, am I not forced to read the data sequentially, in chunks? I need to seek around: for example, seek to byte offset 237,539,182 and read n bytes starting there, then seek back y bytes and read m bytes.
Ian Boyd
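The "sequential" part of SequentialAccess is that you have to read the columns sequentially because it reads the data one column at a time instead of grabbing the whole row. Within a column, GetBytes also only moves forward: once you have read past an offset, you cannot go back to it.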
Terrapin
I don't think it will be possible to seek around in the data. I believe SQL Server will want to return the entire field.
Terrapin
A: 

With newer versions of SQL Server you can just use plain SQL with the SUBSTRING() function on binary data types as well as text. See http://msdn.microsoft.com/en-us/library/ms187748.aspx

To get the size of the image: select datalength(blobcolumn) from myimages where imgid = 12345;

To read the first 8 bytes: select substring(blobcolumn, 1, 8) from myimages where imgid = 12345;

To read 877 bytes, offset 115000 through 115876: select substring(blobcolumn, 115001, 877) from myimages where imgid = 12345;

Remember that the substring function uses 1-based positions, not 0-based: the first byte is at position 1.

If you care about the column potentially changing between reading parts of it, you can put all the different selects within a transaction.
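
The SUBSTRING() queries above map fairly directly onto the seekable Stream asked for in the question. A minimal sketch, not a tested implementation: BlobRangeStream and the fetchRange callback are hypothetical names. The callback receives a 0-based offset and a byte count, and is assumed to run the SUBSTRING query with offset + 1 as the start position and return the bytes; the length is assumed to come from the datalength() query.

```csharp
using System;
using System.IO;

// Hypothetical read-only, seekable Stream that asks for only the byte
// ranges that are actually read.
public sealed class BlobRangeStream : Stream
{
    private readonly long _length;
    private readonly Func<long, int, byte[]> _fetchRange; // (0-based offset, count) -> bytes
    private long _position;

    public BlobRangeStream(long length, Func<long, int, byte[]> fetchRange)
    {
        if (length < 0) throw new ArgumentOutOfRangeException(nameof(length));
        _length = length;
        _fetchRange = fetchRange ?? throw new ArgumentNullException(nameof(fetchRange));
    }

    public override bool CanRead => true;
    public override bool CanSeek => true;
    public override bool CanWrite => false;
    public override long Length => _length;

    public override long Position
    {
        get => _position;
        set
        {
            if (value < 0) throw new ArgumentOutOfRangeException(nameof(value));
            _position = value;
        }
    }

    // Seeking costs nothing on the wire: it only moves the local position.
    public override long Seek(long offset, SeekOrigin origin)
    {
        long start = origin == SeekOrigin.Begin ? 0
                   : origin == SeekOrigin.Current ? _position
                   : _length;
        Position = start + offset;
        return _position;
    }

    public override int Read(byte[] buffer, int offset, int count)
    {
        long remaining = _length - _position;
        if (count <= 0 || remaining <= 0) return 0;

        // Never ask for bytes past the end of the blob.
        int wanted = (int)Math.Min(count, remaining);
        byte[] chunk = _fetchRange(_position, wanted);
        int got = Math.Min(chunk.Length, wanted);

        Buffer.BlockCopy(chunk, 0, buffer, offset, got);
        _position += got;
        return got;
    }

    public override void Flush() { }
    public override void SetLength(long value) => throw new NotSupportedException();
    public override void Write(byte[] buffer, int offset, int count) => throw new NotSupportedException();
}
```

Every Read becomes one round trip, so a caller that reads a few bytes at a time would probably want to wrap this in a BufferedStream. If the blob can change between reads, the callback could run all its queries inside one transaction, as noted above.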

This is untested by me, but older versions of MS SQL Server apparently require the use of the (now deprecated) READTEXT verb and TEXTPTR() function. Such as:

select textptr(blobcolumn) from myimages where imgid = 12345;

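Grab the returned pointer value, say PTR, and use it in the subsequent queries. Unlike SUBSTRING, READTEXT takes a 0-based offset (the number of bytes to skip) and wants the column qualified with its table name:

readtext myimages.blobcolumn @PTR 115000 877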

Deron Meranda