I heard my team leader say that in some past projects they had to do away with normalization to make the queries faster.

I think it may have something to do with table unions.

Is having many lean tables really less efficient than having a few fat tables?

A: 

The reason normalization can hurt performance is that joins are fairly expensive. If there are N records in table X and M records in table Y, then a join of X and Y can produce an intermediate result with as many as N*M records. Although the database uses optimizations to avoid materializing the entire result when it isn't needed, it still has to do the work of matching the records.
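To make that concrete, here is a rough sketch with invented table and column names: answering a question that spans two normalized tables means the database has to match rows across both of them.

    -- Hypothetical normalized schema: the query has to match rows from both tables.
    SELECT o.order_id, o.order_date, c.customer_name, c.city
      FROM orders o
      JOIN customers c ON c.customer_id = o.customer_id
     WHERE o.order_date >= DATE '2009-01-01';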

Denormalization is the process whereby you put data that is often used together into a single table to increase performance, at the expense of some database purity. Most find it an acceptable trade-off; some even design the schema denormalized from the start to skip the intermediate step.
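Continuing the same invented example, the denormalized version copies the customer columns onto each order row so the question can be answered without a join, at the cost of duplicating the customer data:

    -- Hypothetical denormalized table: customer data is repeated on every order row.
    CREATE TABLE orders_denorm (
        order_id       NUMBER(10)     PRIMARY KEY,
        order_date     DATE           NOT NULL,
        customer_id    NUMBER(10)     NOT NULL,
        customer_name  VARCHAR2(100)  NOT NULL,
        city           VARCHAR2(100)
    );

    -- No join needed, but renaming a customer now means updating many order rows.
    SELECT order_id, order_date, customer_name, city
      FROM orders_denorm
     WHERE order_date >= DATE '2009-01-01';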

Kyle Cronin
A: 

Performance is inversely related to the amount of normalization done in an RDBMS. That being said, the more normalized the tables are, the less likely errors are to creep in. There is a point at which denormalization itself hurts an RDBMS's performance: the point at which all of the data is held in one table.

WolfmanDragon
Performance isn't inversely related to the amount of normalization; updating and inserting data is faster in a normalized system.
tuinstoel
+8  A: 

It depends ... joining tables is inherently slower than having one big table that is 'pre-joined', i.e. de-normalised. However, by denormalising you create data duplication and your tables grow larger. Normalisation is seen as a good thing because it creates databases that can answer 'any' question; if it is properly done, you can build a select to get at your data. That is not the case in some other forms of DB, but those are now (mostly) historical irrelevancies; the normalised/relational DB won that battle.

Back to your question: using de-normalisation to make things go faster is a well-accepted technique. It's normally best to run your DB for a while so you know what to de-normalise and what to leave alone, and it's also common to leave the data in its 'correct' normalised form and pull it into a set of de-normalised reporting tables on a regular basis. If that process is done as part of the report run itself, the data is always up to date too.
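As a rough sketch of that reporting pattern (all table and column names invented for illustration), the report run can rebuild the de-normalised table from the normalised source tables:

    -- Hypothetical refresh of a de-normalised reporting table from normalised sources.
    TRUNCATE TABLE sales_report;

    INSERT INTO sales_report (order_id, order_date, customer_name, product_name, quantity)
    SELECT o.order_id, o.order_date, c.customer_name, p.product_name, ol.quantity
      FROM orders o
      JOIN customers   c  ON c.customer_id = o.customer_id
      JOIN order_lines ol ON ol.order_id   = o.order_id
      JOIN products    p  ON p.product_id  = ol.product_id;

    COMMIT;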

As an example of over-normalisation, I've seen DBs in the past where the days of the week and the months of the year were pulled out into separate tables (the dates themselves were normalised). You can go too far.
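Purely for illustration (invented names), that kind of over-normalisation looks something like this; every table that mentions a date ends up needing extra foreign keys and joins:

    -- Hypothetical over-normalised design: even days and months get their own tables.
    CREATE TABLE day_of_week (
        day_id    NUMBER(1)    PRIMARY KEY,
        day_name  VARCHAR2(9)  NOT NULL
    );

    CREATE TABLE month_of_year (
        month_id    NUMBER(2)    PRIMARY KEY,
        month_name  VARCHAR2(9)  NOT NULL
    );

    -- Every date reference now costs two extra lookups.
    CREATE TABLE appointment (
        appointment_id  NUMBER(10)  PRIMARY KEY,
        day_id          NUMBER(1)   REFERENCES day_of_week,
        month_id        NUMBER(2)   REFERENCES month_of_year,
        year_number     NUMBER(4)
    );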

MrTelly
+7  A: 

You should do some research on the differences between OLTP (Online Transaction Processing) and OLAP (Online Analytical Processing) databases.

In short, database systems which are concerned primarily with recording transactions (OLTP) are usually structured in a more normalized fashion, reducing data duplication and easing the creation and updating of records at the expense of optimized data retrieval.

Database systems which are more concerned with data retrieval and analysis (OLAP) are usually structured in a less normalized fashion, sacrificing data storage optimization so as to maximize querying and analysis speed.
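As a loose, hypothetical illustration of the two styles (names invented), an OLTP schema keeps each fact in one place, while an OLAP-style reporting table repeats descriptive columns so analytical queries can scan a single wide table:

    -- OLTP style (normalized): each fact lives in exactly one place.
    CREATE TABLE customer (
        customer_id    NUMBER(10)     PRIMARY KEY,
        customer_name  VARCHAR2(100)  NOT NULL
    );

    CREATE TABLE sale (
        sale_id      NUMBER(10)    PRIMARY KEY,
        customer_id  NUMBER(10)    NOT NULL REFERENCES customer,
        sale_date    DATE          NOT NULL,
        amount       NUMBER(10,2)
    );

    -- OLAP style (denormalized): descriptive columns are repeated on each row.
    CREATE TABLE sale_analysis (
        sale_id        NUMBER(10),
        sale_date      DATE,
        customer_name  VARCHAR2(100),
        region_name    VARCHAR2(50),
        amount         NUMBER(10,2)
    );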

Database normalization and denormalization are at the heart of this trade-off.

Eric King
Just to emphasize: if you de-normalize, you will duplicate data, which means an update to that data has to touch more rows, which will be slower.
Gary
Yes, indeed. Good point, Gary.
Eric King
+1  A: 

This question recurs altogether too often. The primary reason is that SQL, the most popular database language by a huge margin, and all of its most popular implementations, conflate logical table design with physical table design.

The eternal answer is that you should always normalize your logical tables. The pragmatic answer is complicated by the fact that, under existing SQL implementations, the only way to implement certain optimizations is to denormalize your physical table design (not itself a bad thing), which in those implementations also requires denormalizing your logical table design.

In short, it depends. Sometimes denormalization is important for performance, but like everything else performance-related you should measure, measure, measure before you even consider going down this route.
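As a minimal sketch of what "measure" can look like in practice (Oracle syntax; the query and table names are hypothetical), compare the optimizer's plan and actual timings for the join-based query against the denormalized alternative before committing to either:

    -- Inspect the optimizer's plan for the join-based query before deciding
    -- whether denormalization is actually needed (Oracle syntax).
    EXPLAIN PLAN FOR
    SELECT o.order_id, c.customer_name
      FROM orders o
      JOIN customers c ON c.customer_id = o.customer_id
     WHERE o.order_date >= DATE '2009-01-01';

    SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);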

Doug McClean
+2  A: 

Jeff wrote about this, and a heated discussion followed. It is also the subject of much discussion on SO, e.g. "What's the better database design: more tables or more columns?". As others have pointed out, use common sense and do not over-normalize.

gimel
+2  A: 

In my long experience with Oracle OLTP databases, some of them very large and busy, I can honestly say I can't remember ever having come across a case where "denormalisation for performance" was truly required. I have, however, seen many cases where someone decided in advance that denormalisation should be applied out of fear, uncertainty and doubt about potential performance issues. This has usually been done without any benchmarking, and invariably I find that no performance improvement was actually achieved; instead, the data maintenance code became far more complex than it would otherwise have been.

OLAP is a very different animal, and I'm not in a position to comment about that.

Tony Andrews
A: 

On this topic: I have tables created by PowerDesigner and they are almost normalized. In many cases my tables have simple columns, but when I need to pass all the columns of, for example, a supplier, what I do is:

    create or replace function getAllSuppliersObjects RETURN FORNECTABLE AS
       suppliersList FORNECTABLE := FORNECTABLE();
    BEGIN
       FOR res IN (SELECT supplier, products FROM fornecprods) LOOP
          /* dbms_output.put_line('-----------------------');
             dbms_output.put_line('Existent suppliers: '
                                  || res.supplier.NOME_FORNECEDOR); */
          suppliersList.extend;
          suppliersList(suppliersList.last) := res.supplier;
       END LOOP;
       return suppliersList;
    END;

I also have a table with supplier_id and supplier_object, plus a table of products (of type prod_obj), to retrieve a list of objects in an easy way. But is this faster? Should I change all my tables from something like this:

    create table SUPPLIER
    (
       SUPPLIER_ID    NUMBER(6)      not null,
       SUPPLIER_NAME  VARCHAR2(100)  unique not null,
       ADDRESS        VARCHAR2(300),
       warehouse      VARCHAR2(300),
       EMAIL          VARCHAR2(30),
       TLF            NUMBER(30),
       TLM            NUMBER(30),
       FAX            NUMBER(30),
       constraint PK_SUPPLIER primary key (SUPPLIER_ID)
    );
to something like this:

    create or replace type SUPPLIER_OBJ as object (
       SUPPLIER_ID    NUMBER(6),
       SUPPLIER_NAME  VARCHAR2(100),
       ADDRESS        VARCHAR2(300),
       warehouse      VARCHAR2(300),
       EMAIL          VARCHAR2(30),
       TLF            NUMBER(30),
       TLM            NUMBER(30),
       FAX            NUMBER(30)
    );
    /

    create table SUPPLIER of SUPPLIER_OBJ (
       constraint PK_SUPPLIER primary key (SUPPLIER_ID),
       SUPPLIER_NAME unique not null
    );
    /

My CDM is very simple, with typical relationships between suppliers, employees, clients, products and services: the strong entities are services, products, employees, suppliers and clients, and some of the others are weak entities used, for example, to order products from a supplier. If this change from "normal" to "object or nested tables" doesn't pay off, how can I retrieve an object, or a list of them, each time I do a select and want to send the ResultSet to Java? Do I have to iterate over each result of the select query, create, let's say, a supplier object with all the column values, and add it to a table of those objects?

neverMind