I have been working with warehousing for a while now.

I am intrigued by Columnar Databases and the speed that they have to offer for data retrievals.

I have a multi-part question:

  • How do Columnar Databases work?
  • How do they differ from relational databases?
  • Is there a trial version of a columnar database I can install to play around? (I am on Windows 7)
+2  A: 

Wikipedia provides a high level overview as a starting point and a list of open source and commercial column oriented databases.

Mark
+6  A: 
mjv
+1  A: 

Product information. This may help. These were featured products in a Google search.

http://www.vertica.com/

http://www.paraccel.com/

http://www.asterdata.com/index.php

S.Lott
One thing to note about Vertica is that I have seen a reference on the web to the license cost being $150k per TB of data stored, so it ain't cheap.
Mark
@Mark: Cost is not the issue. Information is the issue. If the OP wants information, vendors are often the best possible source.
S.Lott
+4  A: 

How do columnar databases work? The defining concept of a column-store is that the values of a table are stored contiguously by column. Thus the classic supplier table from CJ Date's supplier and parts database:

SNO  STATUS CITY    SNAME
---  ------ ----    -----
S1       20 London  Smith
S2       10 Paris   Jones
S3       30 Paris   Blake
S4       20 London  Clark
S5       30 Athens  Adams

would be stored on disk or in memory something like:

S1S2S3S4S52010302030LondonParis Paris LondonAthensSmithJonesBlakeClarkAdams 

This is in contrast to a traditional row-store, which would store the data more like this:

S120LondonSmithS210Paris JonesS330Paris BlakeS420LondonClarkS530AthensAdams

From this simple concept flow all of the fundamental differences in performance, for better or worse, between a column-store and a row-store. For example, a column-store will excel at doing aggregations like totals and averages, because it only has to read the one column involved, but inserting a single row can be expensive, because the new values have to be written into every column's storage. The inverse holds true for row-stores. This should be apparent from the above diagram.
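To make the two layouts concrete, here is a minimal Python sketch using the supplier table above. It is only an illustration with in-memory lists, not how any particular product stores data; the names `row_store`, `column_store`, `total_status_row_store`, `total_status_column_store` and `insert_row_column_store` are made up for this example.

```python
# Supplier rows from the table above: (SNO, STATUS, CITY, SNAME)
COLUMN_NAMES = ("SNO", "STATUS", "CITY", "SNAME")
rows = [
    ("S1", 20, "London", "Smith"),
    ("S2", 10, "Paris", "Jones"),
    ("S3", 30, "Paris", "Blake"),
    ("S4", 20, "London", "Clark"),
    ("S5", 30, "Athens", "Adams"),
]

# Row-store layout: complete records laid out one after another.
row_store = [value for row in rows for value in row]

# Column-store layout: all values of one column kept together.
column_store = {
    name: [row[i] for row in rows] for i, name in enumerate(COLUMN_NAMES)
}


def total_status_row_store(flat, width=len(COLUMN_NAMES), status_offset=1):
    """Sum STATUS by striding across every record in the flat row layout."""
    return sum(flat[i] for i in range(status_offset, len(flat), width))


def total_status_column_store(columns):
    """Sum STATUS by reading a single contiguous column and nothing else."""
    return sum(columns["STATUS"])


def insert_row_column_store(columns, row):
    """Insert one row: every column's storage has to be touched."""
    for name, value in zip(COLUMN_NAMES, row):
        columns[name].append(value)


if __name__ == "__main__":
    print(total_status_row_store(row_store))
    print(total_status_column_store(column_store))
```

Both functions return the same total, but the column version only looks at the STATUS values, while the row version has to step over SNO, CITY and SNAME for every record. On disk that difference becomes the amount of data read. The insert function shows the other side of the trade: one new supplier means one write per column.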

How do they differ from relational databases? A relational database is a logical concept. A columnar database, or column-store, is a physical concept. Thus the two terms are not comparable in any meaningful way. Column-oriented DBMSs may be relational or not, just as row-oriented DBMSs may adhere more or less to relational principles.

Paul Mansour
A: 

Thanks guys... especially to Paul Mansour - that explanation was just what was needed for a techie!

Azhar Chaudhary
@AzharChaudhary: Please put comments in the appropriate box for comments instead of the space for answers and upvote any answers that you like.
Raj More
+1  A: 

Also, columnar DBs have a built-in affinity for data compression, and the loading process is unique. Here's an article I wrote in 2008 that explains a bit more.
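As a rough illustration of why storing a column contiguously helps compression, here is a small Python sketch. It uses run-length encoding, which is my choice for the example and is not named above; real products may use this or other schemes. The CITY values come from the supplier table in the earlier answer, and the function names are hypothetical.

```python
from itertools import groupby


def run_length_encode(values):
    """Collapse consecutive repeats into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


def run_length_decode(runs):
    """Expand (value, count) pairs back into the original list."""
    return [value for value, count in runs for _ in range(count)]


# CITY column from the supplier table, stored contiguously.
city_column = ["London", "Paris", "Paris", "London", "Athens"]

# Sorting the column first (an assumption for this example) makes
# equal values adjacent, so the runs get longer.
sorted_city = sorted(city_column)

if __name__ == "__main__":
    print(run_length_encode(city_column))
    print(run_length_encode(sorted_city))
```

In a row layout the city values are separated by supplier numbers, statuses and names, so repeats like these are never adjacent and there are no runs to collapse.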

You may also be interested in a new report from IDC's Carl Olofson on 3rd generation DBMS technology. It discusses columnar, et al. If you're not an IDC client you can get it free on our site. He's doing a webinar on June 16th, too (also on our site).

(BTW, one comment above lists asterdata but I don't think they are columnar.)

kim stanick
You can get the IDC report at: http://paraccel.com/press/3rd_generation_database_technology/
kim stanick