Hi guys, I have a (big) doubt.

Let's take as an example a database for, dunno, some company's orders.

Let's say that this company makes around 2000 orders per month, so around 24K orders per year, and they don't want to delete any order, even if it's 5 years old (hey, this is an example, the numbers don't mean anything :)

For the sake of good database query speed, is it better to have just one table, or will it be faster to have a table for every year?

My idea was to create a new table for the orders each year, calling them orders_2008, orders_2009, etc.

Could this be a good idea to speed up db queries?

Usually the data that gets used is from the current year, so the fewer rows the better. Obviously, this would cause problems when I need to search across all the order tables at once, because I would have to run some complex UNION, but in normal activity this happens very rarely.
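
Just to show what I mean, a cross-year search would end up looking roughly like this (only a sketch; the column names id, customer_id, order_date and total are invented):

    -- One branch per year table; every extra year adds another branch.
    SELECT id, customer_id, order_date, total
      FROM orders_2008
     WHERE customer_id = 42
    UNION ALL
    SELECT id, customer_id, order_date, total
      FROM orders_2009
     WHERE customer_id = 42
    ORDER BY order_date;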

I think it's better to have an application where 95% of the queries are fast and the rest are somewhat slow, rather than an application that is always slow.

My current database has about 130 tables; the new version of my application should have about 200-220 tables, of which about 40% would be replicated annually.

Any suggestion?

EDIT: the RDBMS will probably be PostgreSQL, maybe (I hope not) MySQL.

P.S.: I hope my English is good enough to explain my needs.. and whoever edits this will have plenty to do ;)

A: 

I think it's better to have an application where 95% of the queries are fast and the rest are somewhat slow, rather than an application that is always slow.

Absolutely.

User
Why the downvote, without any explanation?
DaNieL
+7  A: 

Smaller tables are faster. Period.

If you have history that is rarely used, then moving the history into other tables will be faster.

This is what a data warehouse is about -- separate operational data from historical data.

You can run a periodic extract from operational and a load to historical. All the data is kept, it's just segregated.
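
A minimal sketch of such a periodic extract/load, assuming an operational orders table and an orders_history table with the same columns, including an order_date column (all names here are invented):

    -- Move closed periods from the operational table to the historical one.
    BEGIN;

    INSERT INTO orders_history
    SELECT * FROM orders
     WHERE order_date < DATE '2009-01-01';

    DELETE FROM orders
     WHERE order_date < DATE '2009-01-01';

    COMMIT;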

S.Lott
+1  A: 

I would not split tables by year.

Instead I would archive data to a reporting database every year, and use that when needed.

Alternatively you could partition the data amongst drives, thus maintaining performance, although I'm unsure whether this is possible in PostgreSQL.
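
In PostgreSQL, spreading tables across drives does appear to be possible with tablespaces; a rough sketch, with an invented path and invented table names:

    -- A tablespace on a second disk; the history table lives there.
    CREATE TABLESPACE history_disk LOCATION '/mnt/disk2/pgdata';

    CREATE TABLE orders_history (LIKE orders) TABLESPACE history_disk;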

Bravax
+2  A: 

If you use indexes properly, you probably need not split it into multiple tables. Most modern DBs will optimize access.

Another option you might consider is to have a table for the current year, and at the end of the year append its data to another table which holds all the previous years' data.
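
For the single-table route, an index on the date column is usually what keeps current-year queries fast; a minimal sketch (the column and index names are invented):

    -- Queries restricted to the current year can use the index instead of
    -- scanning the whole table.
    CREATE INDEX idx_orders_order_date ON orders (order_date);

    SELECT *
      FROM orders
     WHERE order_date >= DATE '2009-01-01';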

Abhinav
Yes, having just 2 tables (current year and history) was my first idea, but I thought that the per-year system would be faster - it just needs better code writing
DaNieL
A: 

I agree that smaller tables are faster. But it depends on your business logic whether it makes sense to split a single entity over multiple tables. If you need a lot of code to manage all the tables then it might not be a good idea.

It also depends on the database which tools you're able to use to tackle this problem. In Oracle a table can be partitioned (on year, for example). The data is stored physically in different tablespaces, which should make it faster to access (as I would assume that all the data for a single year is stored together).

An index will speed things up, but if the data is scattered across the disk then a lot of block reads are required, which can make it slow.
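
If the scattering itself turns out to be the problem, PostgreSQL has CLUSTER, which physically rewrites a table in index order (a sketch; the index name is invented, and the command locks the table and has to be re-run as new data arrives):

    -- Rewrite the table so rows from the same period sit in adjacent blocks.
    CLUSTER orders USING idx_orders_order_date;
    ANALYZE orders;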

Janco
I'll run on PostgreSQL or.. MySQL (probably PostgreSQL)
DaNieL
+4  A: 

Before you worry about query speed, consider the costs.

If you split the data into separate tables, you will need code that handles it. Every bit of code you write has a chance to be wrong. You are asking for your code to be buggy in exchange for some unmeasured and imagined performance win.

Also consider the cost of machine time vs. programmer time.

Andy Lester
Agreed, but - my application handles many companies - each one has its own database - so I prefer to spend more time writing the handler code, but have a beep-beep database afterwards :)
DaNieL
+1  A: 

For the volume of data you're looking at, splitting the data seems like a lot of trouble for little gain. Postgres can do partitioning, but the fine manual [1] says that as a rule of thumb you should probably only consider it for tables that exceed the physical memory of the server. In my experience, that's at least a million rows.

  1. http://www.postgresql.org/docs/current/static/ddl-partitioning.html
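
For reference, the partitioning described in that manual page looks roughly like this (a sketch of the inheritance-based approach; table and column names are invented, and inserts still need a trigger or rule to route rows into the right child):

    -- Parent table plus one child per year; constraint exclusion lets the
    -- planner skip children whose CHECK constraint rules them out.
    CREATE TABLE orders (
        id          serial,
        customer_id integer,
        order_date  date NOT NULL,
        total       numeric
    );

    CREATE TABLE orders_2008 (
        CHECK (order_date >= DATE '2008-01-01' AND order_date < DATE '2009-01-01')
    ) INHERITS (orders);

    CREATE TABLE orders_2009 (
        CHECK (order_date >= DATE '2009-01-01' AND order_date < DATE '2010-01-01')
    ) INHERITS (orders);

    -- Queries go against the parent; with constraint_exclusion enabled the
    -- planner only scans the children that can match.
    SELECT * FROM orders WHERE order_date >= DATE '2009-03-01';
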
In real life, the order table has 3 other sub-tables, arranged in a hierarchy. Every orders row has at least 3 rows in the first 'child', every child row has at least 4 rows in the child-of-the-child, and so on.. I also know that duplicating a structure like that will be harder - especially to maintain
DaNieL
A: 

Look into partitioning your tables into time slices. Partitioning is good for the log-like table case where no foreign keys point to the tables.