I am using and working on software which uses MySQL as a backend engine (it can also use others such as PostgreSQL, Oracle, or SQLite, but MySQL is the one we actually run). The software was designed in such a way that the binary data we want to access is kept as BLOBs in individual columns: each table has one BLOB column, several integer/float columns that characterize the BLOB, and one string column holding the BLOB's MD5 hash. The tables typically have 2 to 4 indexes, one of which is always a UNIQUE index on the MD5 column. Some tables already hold millions of rows and have grown to multiple gigabytes in size. We keep separate per-year MySQL databases on the same server (so far). The hardware is quite reasonable (I think) for general applications (a Dell PowerEdge 2U-form-factor server).
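For concreteness, each table looks roughly like this (table and column names are invented for illustration; the real schema differs in its details):

    -- hypothetical sketch of one of our per-year tables
    CREATE TABLE blobs_2009 (
        id          INT UNSIGNED NOT NULL AUTO_INCREMENT,
        payload     LONGBLOB NOT NULL,       -- the binary data itself
        size_bytes  INT UNSIGNED NOT NULL,   -- example numeric columns characterizing the BLOB
        score       FLOAT NOT NULL,
        md5         CHAR(32) NOT NULL,       -- hex MD5 digest of payload
        PRIMARY KEY (id),
        UNIQUE KEY  uq_md5 (md5),            -- the UNIQUE index mentioned above
        KEY         idx_size (size_bytes)    -- plus one or two more non-unique indexes
    );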
MySQL SELECT queries are relatively fast. There's little to complain about there, since they mostly run in batch mode. However, INSERT queries take a long time, and the time increases with table size (number of rows). Admittedly, this is because the MD5 column has a UNIQUE index, so each INSERT has to check whether the new row's MD5 string already exists in the table. And it's not surprising (I think) that performance degrades further with the other, non-unique indexes. But I still can't put my mind at rest about this architectural choice (I suspect keeping BLOBs in the table rows instead of on disk has a significant negative impact). Insertions are not critical for us, but the slowdown is an annoying thing to live with.
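To make the cost concrete, each insertion is essentially the following (illustrative names again, matching the sketch above):

    INSERT INTO blobs_2009 (payload, size_bytes, score, md5)
    VALUES (?, ?, ?, ?);
    -- Before this row can be written, MySQL must probe the uq_md5
    -- B-tree for a duplicate. MD5 values are effectively random, so
    -- each probe lands on a random index page; once the index outgrows
    -- available memory, that means roughly one cold page read per
    -- inserted row, which would explain why insert time grows with
    -- the row count.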
Does anyone have experience with similar situations? With MySQL, or even other (preferably Linux-based) RDBMSes? Any insights you would care to provide, perhaps some performance figures?
BTW, the working language is C++ (our code wraps the C calls to MySQL's API).