views:

114

answers:

4

At my new company, they keep all data associated with the data warehouse, including import, staging, audit, dimension and fact tables, together in the same physical database.

I've been a database developer for a number of years now and this consolidation of function and form seems counter to everything I know.

It seems to make security, backup/restore and performance management issues more manually intensive.

Is this something that is done in the industry? Are there substantial reasons for doing or not doing it?

The platform is Netezza. The size is in terabytes, hundreds of millions of rows.

What I'm looking to get from answers to this question is a solid understanding of how right or wrong this path is. From your experience, what are the issues I should be focused on arguing if this is a path that will cause trouble for us down the road. If it is no big deal, then I'd like to know that as well.

+1  A: 

In general I would recommend using separate databases. This is the configuration I have always seen used in production and it really makes a lot of sense since - as you mentioned - both databases have fundamentally different purposes / usage patterns / etc.

Justin Ethier
Thanks, Justin, for your fast feedback. I'm looking for specific and concrete arguments that I can use to evaluate this move towards physical consolidation. If there is not a strong reason either way, then it's not worth me fighting the battle. It seems, however, that there are or should be a number of performance, maintainability and security reasons for keeping our data partitioned by usage, functionality and security.
drrollins
A: 

Edit

If you're using one physical server, the fewer instances on that server the simpler the management and the more efficient the process.

If you put TWO instances on the same Physical Server you get:

Negatives:

  1. Half the memory to use
  2. Twice the count of database process

Positives:

  1. You could take the entire staging db down without affecting the DW

So which is more precious to you, outage windows or CPU and Memory?

On the same the physical server multiple instances make performance management issues MUCH more manual to solve. If you look at the health of one of the instances, it might look fine but users are reporting poor performance, so you have to look at the next instance to see if the problem may be coming from there... and so on per instance.

Security is also harder with more than one instance. At best it's just as hard as a single instance but it's never easier. You'll have two admin accounts (SYS or something), Duplicate process accounts, etc.

Tell us why you think it's better to have more than one instance.

ORIGINAL POST

Can we be clear on terms. When you say "in the same Database" do you mean to say the same instance, or the same physical server. If you did move the staging to a new instance would it reside on the same physical hardware?

I think people get a little too hung up on instances. If you're going to put two instances on the same piece of hardware, you're only doubling the number of everything to very little advantage. All the server processes will be running twice... all the memory pools will be cut in half.

so let's say you really did mean two separate physical boxes...

Let's say you buy 2 12-way boxes (just say). When you're staging db server is done for the day, those 12 CPU's are wasting away. When your users pack up and go home, your prod DW CPUs are wasting away. CPU cycles are perishable, you can't get them back. BUT, if you had one 24 way box... then the staging DB COULD use 20 CPUs at night for some excellent Parallel Execution for building summary tables and your users will have double the capacity for processes during the day.

so let's say you meant the same hardware.

"It seems to make security, backup/restore and performance management issues more manually intensive."

Guaranteed that performance issues are harder to solve the more instances that share the same hardware. Guaranteed.

Security

What security do you do at the instance level?

Backup

What DW are you backing up at the instance level? You're not backing up tablespaces, but rather whole instances? Seems like that pattern will fail at a certain size.

PLATFORM: NETEZZA

Not familiar with the tool specifically. So if it's a single instance on a single box, then the division would seem more logical than physical and therefore the reasons they exist is for management, not performance. You don't increase your CPUs or memory by adding a database, right? So it doesn't seem like there's no performance upside to it. Each DB may be adding separate processes (performance hit), or it might be completely logical like schemas in Oracle. If each database is managed by new processes than data going between them will mean IPC.

Maybe the addition of the Netezza tag will get some traction.

Stephanie Page
great answer, Stephanie. I'll update my question to address yours.
drrollins
Stephanie, Netezza is a single-instance appliance. All databases reside within the single instance. I'm talking about putting our staging and audit tables in the same database as the production data.
drrollins
A: 

You will gain speed in the load and the output if the tables are in the same schema (database). Obvious...but hey, I said it.

There is more overhead the more tables you put into one schema. Backups time, size of backups, ease of use.

Where I am, we have many multiple TB databases within one data-warehouse. Our rule of thumb is that a single loading process or a single report query should NOT have to span database. This keeps "like" tables together but gives some allowances for our backups and contingency processes. It also makes it a bit easier to "find" data.

For those processes that need to break this rule, we will either move data from one database to the other or allow the process to join across schemas.

I'm not as familiar with Netezza, so I'm not 100% sure what your options might be.

Markus
A: 

We use databases for every segment (INVENTORY, CRM, BILLING...). There are no performance downsides and maintenance and overview is much better.

hugo24