What is your advice on configuring PostgreSQL so that BLOBs can be written quickly?

We are using PostgreSQL to insert BLOBs at a high rate.

  1. We call lo_write() roughly 220 times a second.
  2. We write roughly 30 KB of binary data per lo_write().
  3. This is equivalent to roughly 6 MB/s.

Our machine uses RAID-5, so the write speed should be in the neighborhood of 200 MB/s.

We have already tuned postgresql.conf as follows (excerpt below):

  1. shared_buffers = 1GB
  2. fsync = off
  3. logging_collector = off (pretty much everything related to logging is off)
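
In postgresql.conf syntax, those changes look roughly like this (a paraphrase of the list above, not our literal file):

shared_buffers = 1GB
fsync = off                   # durability deliberately traded for write speed
logging_collector = off       # the other logging options are off as well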

We have made sure that if we don't store the BLOB as part of our INSERT query, PostgreSQL keeps up fine. It only slows down when we store the BLOB as part of the query.

EDIT: I'm using Windows XP/Server and PostgreSQL 8.3 with PostGIS 1.3.6. The reason I need to store the BLOBs in the DB is that my application requires me to search for these BLOBs in real time.

Background: our application does high-performance, real-time signal processing, and we store the signals in the database as BLOBs.

EDIT: This is the C++ code that we used to perform the benchmark. Apparently, we are getting about 16 MB/s on our RAID config.

#include "libpq-fe.h"
#include "libpq/libpq-fs.h"
#include <sstream>
#include <iostream>
#include <tbb/tick_count.h>

void test()
{
  // Connect once and reuse the connection for the whole benchmark.
  const char* conninfo = "user=postgres password=1234 dbname=test_db";
  PGconn* conn = PQconnectdb(conninfo);
  if (PQstatus(conn) != CONNECTION_OK)
  {
    std::string msg = std::string("failed to connect to DB engine: ") + PQerrorMessage(conn);
    PQfinish(conn);
    throw std::runtime_error(msg);
  }

  const int dataSize = 18512;
  char* data = (char*) malloc( dataSize );
  memset( data, 0, dataSize );  // fill the buffer; the contents don't matter for the benchmark


  const int count = 10000;

  tbb::tick_count t0 = tbb::tick_count::now();
  for( int idx = 0; idx < count; idx++ )
  {
    // Insert acoustic binary into large object table "pg_largeobject" 
    Oid objectId;
    int lobj_fd;
    PGresult *res;

    res = PQexec(conn, "begin");
    PQclear(res);

    objectId = lo_creat(conn, INV_READ|INV_WRITE);
    if (objectId == 0)
    {
      throw std::exception("AddAcousticTable: Cannot create large object\n");
    }

    lobj_fd = lo_open(conn, objectId, INV_WRITE);

    // lo_write returns the number of bytes actually written (or -1 on error).
    const int writeBytes = lo_write(conn, lobj_fd, data, dataSize);
    if (writeBytes != dataSize)
    {
      std::stringstream msg;
      msg << "PsSasDataDB::AddToAcousticTable(): incorrect number of bytes written for large object: ";
      msg << writeBytes;
      throw std::runtime_error(msg.str());
    }
    }

    lo_close(conn, lobj_fd);

    res = PQexec(conn, "end");
    PQclear(res);
  }
  tbb::tick_count t1 = tbb::tick_count::now();

  std::cout << "Performance: " << (double(count*dataSize)/(t1-t0).seconds())/(1024.0*1024.0) << " MB/seconds";

  free( data );
}
int main()
{
  try
  {
    test();
  }
  catch( const std::exception& e )
  {
    std::cerr << e.what();
  }

  return 0;
}
+2  A: 

RAID5 :

It is great for reads, for writing lots of big data, and for cost. It sucks for small random writes.

Since you probably don't use your DB only to store big blobs, but also do some regular writing / updating / inserting, RAID5 is going to be a real PITA.

  • You INSERT or UPDATE a row in a table. -> a page in the table needs to be written -> for each index, at least one index page needs to be updated -> at commit, the log needs to be written, flushed to disk, and synced.

Each of those small one-page (8 kB) writes is smaller than your RAID5 stripe size, so the RAID controller has to seek several (or all) of your RAID disks, read several stripes, and re-write the updated stripe and parity. For syncing, you have to wait for ALL disks to sync, during which time they are not servicing any other requests... Some RAID controllers are also particularly bad at doing this quickly; it depends on your hardware.

-> for high random write throughput, it is a good idea to use RAID 1 (or 10 or 01) for data, and an extra RAID1 volume for the log on 2 separate disks.

So, if you have 3 disks in your RAID5, remove one, put data on a disk and log on the other, and benchmark. If it is good, put in an extra disk, and make 2x RAID1 volumes (log, data).

If your load was read-heavy instead of write-heavy, for the same budget (number of disks) it would be better to put the log on the same volume as the data and RAID10 it all.


The reason why I needed to store BLOB in the DB is because my application requires me to search for these BLOBs in real-time.

You put BLOBs in the database if you want the stuff the database does well (transactions, security, everything on one server available from anywhere, coherent backups, no headaches, etc.).

If you want to do searches, you are not searching the BLOBs themselves; you are using ordinary database tables and SQL queries to obtain a reference to the BLOB you want. Then you fetch the BLOB, but you could just as easily fetch a filesystem file instead. So your application could probably use filesystem files instead, which would be a lot faster.
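
For example, something along these lines (an untested sketch; the table signal_files, its columns, and the directory /data/blobs are made-up names for illustration):

#include "libpq-fe.h"
#include <cstddef>
#include <cstdio>
#include <stdexcept>
#include <string>

// Store one signal: a metadata row in the database, the payload as a plain file.
void store_signal(PGconn* conn, const char* data, std::size_t dataSize)
{
  PQclear(PQexec(conn, "BEGIN"));

  // The SERIAL primary key doubles as a unique, never-reused file name.
  PGresult* res = PQexec(conn,
      "INSERT INTO signal_files (recorded_at) VALUES (now()) RETURNING id");
  if (PQresultStatus(res) != PGRES_TUPLES_OK)
  {
    PQclear(res);
    PQclear(PQexec(conn, "ROLLBACK"));
    throw std::runtime_error("INSERT into signal_files failed");
  }
  std::string path = std::string("/data/blobs/") + PQgetvalue(res, 0, 0) + ".bin";
  PQclear(res);

  // Write the file before committing, so a committed row always has a file behind it.
  FILE* f = fopen(path.c_str(), "wb");
  if (f == NULL || fwrite(data, 1, dataSize, f) != dataSize)
  {
    if (f) fclose(f);
    PQclear(PQexec(conn, "ROLLBACK"));
    throw std::runtime_error("could not write " + path);
  }
  fclose(f);

  PQclear(PQexec(conn, "COMMIT"));
}

Lookups then stay plain SQL against signal_files, and the file is opened only once you know which id you want.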

peufeu
By storing the binary data on disk, doesn't the data then become separated from the database, which would lead to maintenance problems? I.e., when a backup is being done, if a file is lost, etc., then we would not be able to retrieve that file. What are your thoughts?
ShaChris23
Who cares if the data is 'separated'? Just make sure you back up whatever directory you dump your data in; if you already have a cohesive backup system, adding one more directory to it should be very easy. BLOBs are typically more trouble than they're worth, and using the filesystem for them instead of the database should make lots of things *much* easier.
kquinn
Where you can have a problem with backups is consistency... Say you back up the files, then the DB, but a user added a file at the same time, so it is in the DB but the file backup missed it... The solution is to back up the DB first, then the files. Also, only COMMIT the DB record after writing the filesystem files, and delete the file after COMMITting the delete from the DB. If an INSERT fails you might get files without DB records (not a problem: since the insert failed, the application knows it failed), but you need some weekly cron job to check for them and clean up (or not).
peufeu
Also, you must never reuse a filename. PostgreSQL makes this easy: since you use a SEQUENCE as your primary key, it gives you a nice source of unique, non-repeating filenames. So if a record gets deleted and another one inserted, the filenames will be different, and a concurrent backup won't mix them up.
peufeu
+1  A: 

Don't give up on RAID5. Modern RAID systems can give you 300-350 MB/s in a RAID-5 or RAID-6.

I've used pg_largeobject to store files before too. It only led to backup and OID headaches, though that was PostgreSQL 7.2/7.3.

I agree with the comments above, though, that storing the file on the file system and, if you must, putting a function/view frontend on it would be more efficient. An additional advantage of a function/view combo for storing and fetching the files is that it can support different storage types, such as the local file system, a grid file system, the cloud, and so on.

That you can get 300+ MB/s of sequential performance isn't relevant when the concerns about RAID5 relate to its random performance.
Greg Smith
+2  A: 

The biggest win for increasing write performance on the database side is to set checkpoint_segments to a much higher number. The default is 3; you want 30 or more at a minimum. Your database logs are probably full of warnings about this if you look at them.

Beyond that, getting good write performance turns into more of a disk hardware problem than anything else. What RAID controller are you using? Is it set up optimally, with a working battery-backed write cache? That's the most important thing to get right: if you do that, RAID5 might be tolerable for this sort of situation, because the controller cache will coalesce some of the random writes.

P.S. Don't turn fsync off; that's dangerous. Instead, use asynchronous commit, which accomplishes the same basic goal (cutting down on the number of disk flushes) in a much safer way.
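
In postgresql.conf terms, that advice looks something like this (the exact checkpoint_segments value depends on your workload):

checkpoint_segments = 32      # the default of 3 is far too low for this write rate
fsync = on                    # keep crash safety
synchronous_commit = off      # cuts down on commit flushes without risking corruption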

Greg Smith