views:

587

answers:

7

We bought an "off the shelf" application a lonnng time ago that is capable of storing files as a blob within SQL Server. We've noticed that the database has more than doubled in size within the past six months due to more frequent usage of this blob field by one department. As a result, the application has become painfully slow.

I've been tasked with removing the blob field from the database and saving the file onto the actual file system. Unfortunately, the application does not store what the file type is within the database. Although I can read the file as it exists in the database, I don't know what extension to save the file as. The application's support desk no longer supports this version of the software and will not talk to us about extracting the data. Unfortunately, we do not have access to their source code.

Any suggestions would be greatly appreciated! Thanks in advance!

+1  A: 

you can look at the first few bytes and figure it out for the most common file types

http://www.garykessler.net/library/file_sigs.html

BlackTigerX
+2  A: 

Don't save it as any type. Save it as a file with no extension. If you don't know what it is, don't fake it. If the app that saved it requests it, return it from the filesystem the same way it would be returned from the database; as binary data. The database doesn't care what type of data the Binary Object is; neither should you.

McWafflestix
+2  A: 

You might try using TriD http://mark0.net/soft-trid-e.html

It will scan the files and try to determine the extension.

Rob Boek
Or use the windows port of the linx file commnad http://downloads.sourceforge.net/gnuwin32/file-5.03-setup.exe
jitter
A: 

How do you know that this information is human readable?

files extensions purposes are to associate a particular software for opening the file so that when you double click on a .txt file, notepad.exe handles it.

If you're only writing them to the filesystem for the sake of offloading the database, then there is no need for any specific extension. If you want to associate a software with it, then use the appropriate extension

Eric
+1  A: 

You could use the FindMimeFromData() function in UrlMon.dll (using pinvoke).

See this page for an example and this MSDN page for the documentation of the function.

M4N
A: 

Another option would be to physically partition your database. You could store the table that contains the blob field in a different database file that could be stored on a different hard disk/spindle. The disk that contains the blob table could be further optimized by making it RAID 0 or RAID 10.

This could speed things up. Also, this approach eliminates the need to change the application logic.

Matt Spradley
A: 

In your words "... I've been tasked with ..." - are you sure this is not a problem that can be solved by performing some optimisation on the database? Going down the path of hacking a legacy application for which you don't have the source code and for which there is no support is something that you want to avoid as much as possible.

So, look at the problem here - slowness of the application. So why is it slow? Something a database index couldn't solve?

If you have to extract the BLOBs to the filesystem, then why is the format of the file relevant? Surely the application is designed to do a query on the database in order to get the file anyway--how would extracting the BLOBs to the filesystem help you?

As an aside though - I usually use the Unix file command to identify files through magic numbers.

Eric Smith