views:

741

answers:

3

I'm being asked to add queryability to a database (Oracle) filled with mostly binary data. So I need to be able to query binary ranges within a blobs of a few kilobytes. I've never done this before, so I'm wondering what are some good practices and pitfalls to consider when starting a project like this.

Thanks

+2  A: 

Without knowing your exact requirements, I can make only some general comments.

BLOBS in oracle are not the most speedy of types. Ensure that you don't build to many performance bottlenecks into your design, and do run performance tests on the functionality you do build as soon as possible to ensure it'll meet requirements.

dbms_lob is your friend. In particular you may find the read and substr function (for reading parts of the blob) out useful.

Stay away from external C style procedures - they're likely to be much to slow. PL/SQL functions tend to be much faster. I don't know about Java procedures. As the java engine is more integrated into Oracle they may be very good to use. It may be worth doing an initial proof of concept to compare PL/SQL against Java.

With Java, you'll be able to read the data in as a byte [] stream and manipulate it to your hearts content using the world f Java. External procedures for Java are easy to do - you can even just give Oracle the Java source code.

For PL/SQL, one technique we've found very useful is to convert the blob to a raw, cast that to a varchar, and then convert that to hex, then manipulating the hex (strings) with standard Oracle string functions. I.e:

create or replace function retrieve_data_from_blob (
    b blob
    , tag_code
)
as
    lw long raw;
    data varchar(30000);
    result varchar(100);
    amount pls_integer := 30000;
begin
    -- covert blob to long raw.
    -- amount will hold total bytes read.
    dbms_lob.read(b, amount, 1, lw);
    data := util_raw.rawtohex(lw);
    -- retrieve_embedded retrieves data tagged with tag_code
    -- from internal binary structure by reading hex data
    return retrieve_embedded(data, tag_code);
end;
/

This would work for blobs up to 15Kb in size. The retrieve_embedded function may, for example, be able to read the first "byte" by doing a substr(data, 1, 8), converting that to a decimal via to_number(hexdata, 'xxxxxxxx') use it as an offset... etc.

Jamie Love
+1  A: 

Storage parameters can make quite a significant difference when it comes to both storing and retrieving relatively small BLOBs (< DB_BLOCK_SIZE * 2 or so). In general, you want to minimize row migration and row chaining, as well as minimize wasted free space.

Perhaps the greatest effect on the performance is enabling or disabling 'IN ROW' storage - it's definitely worth experimenting with.

Andrew from NZSG
+3  A: 

Add a column of MD5, which is the MD5 checksum of the BLOB data. Or, you can create a new table with the same primary key and the MD5 column.

Your cache module outside the database can make use of that column not to have to retrieve the BLOB column twice in a cache-hit.

OR, you could drop the BLOB data in the database and store it in a file system with the MD5 value as a filename with an http server as a network file server.

yogman