I am having a small problem trying to decide on a database schema for a current project. I am by no means a DBA.

The application parses through a file based on user input and enters that data into the database. The number of fields that can be parsed is currently between 1 and 42.

The current design of the database is entirely flat, with 42 columns; some are repeated columns such as address1, address2, address3, etc.
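Roughly this shape (table and column names invented for illustration):

```sql
-- Hypothetical sketch of the current flat design: one wide table,
-- one column per parsed field, repeated groups numbered by suffix.
CREATE TABLE parsed_record (
    record_id  INT PRIMARY KEY,
    address1   VARCHAR(255),
    address2   VARCHAR(255),
    address3   VARCHAR(255),
    phone1     VARCHAR(50),
    phone2     VARCHAR(50)
    -- ... and so on, up to 42 columns; most rows leave many NULL
);
```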

That suggests I should normalize the data. However, data integrity is not needed at the moment, and given the way the data is shaped, I'm looking at several joins. That's not a bad thing in itself, but the data is still in a one-to-one relationship and I still see a lot of empty fields per row.

So my concern is that this does not allow the database or the application to be very extensible. If they want to add more fields to be parsed (which they do), then I'd need to create another table and add another foreign key to the linking table.

The third option is a table where the fields are defined and a table for each record. What I was thinking is to make a table that stores the value and then links to those two tables. The problem is I can picture the size of that table growing large depending on the input size: if someone gives me a file with 300,000 records, then 300,000 x 40 = 12 million rows, so I have some reservations. However, I think if I get to that point then I should be happy it is being used. This option also allows more custom display of information; it's a bit more work, but there's little rework even when more fields are added.
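For illustration, a minimal sketch of what I mean by that third option (all names here are invented):

```sql
-- Hypothetical name-value pair sketch: one row in field_value per
-- populated field, so empty fields simply have no row at all.
CREATE TABLE field_def (
    field_id    INT PRIMARY KEY,
    field_name  VARCHAR(100) NOT NULL    -- e.g. 'address1', 'phone2'
);

CREATE TABLE record (
    record_id   INT PRIMARY KEY,
    created_on  DATETIME
);

CREATE TABLE field_value (
    record_id   INT NOT NULL REFERENCES record (record_id),
    field_id    INT NOT NULL REFERENCES field_def (field_id),
    value       VARCHAR(255),
    PRIMARY KEY (record_id, field_id)
);
```

Adding a new parseable field is then just an INSERT into field_def, with no schema change.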

So the problem boils down to:

1. The current design is a flat table, which is not normalized and makes extending it hard.
2. Normalize the tables, even though there are no real benefits at the moment; but requirements change.
3. Normalize it down into name-value pairs and hope the size doesn't hurt.

There are a large number of inserts, updates, and selects against that table, so performance is a worry, but I believe the saying is: design now, performance-test later?

I'm probably just missing something practical so any comments would be appreciated even if it’s a quick sanity check.

Thank you for your time.

A: 

It's always good to get a sanity check.

The nugget in your statement is this:

they want to add more fields to be parsed

I'd initially go with your name-value pair design in that case. Maintenance will be easier, and performance shouldn't be an issue.

Perhaps keep those attributes which are singular to the entity, i.e. "CreatedBy" or "CreatedOn", right on the entity. Those that repeat, i.e. "Address1" through "AddressX" or "Phone1" through "PhoneX", belong in the name-value pair table.
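As a rough sketch of that split (names invented, not your actual schema):

```sql
-- Hypothetical hybrid: attributes occurring once per entity stay as
-- ordinary columns; repeating groups move to a name-value table.
CREATE TABLE entity (
    entity_id   INT PRIMARY KEY,
    created_by  VARCHAR(100),
    created_on  DATETIME
);

CREATE TABLE entity_attribute (
    entity_id   INT NOT NULL REFERENCES entity (entity_id),
    attr_name   VARCHAR(100) NOT NULL,   -- e.g. 'address', 'phone'
    attr_seq    INT NOT NULL,            -- 1..X for Address1..AddressX
    attr_value  VARCHAR(255),
    PRIMARY KEY (entity_id, attr_name, attr_seq)
);
```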

To ease the pain of joins, consider creating a View for the typical SELECT patterns or needs that you might have.
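For example, against the hypothetical tables sketched above, a view could pivot the most commonly queried attributes back into columns:

```sql
-- Hypothetical view that flattens common attributes back into
-- columns so callers can SELECT without writing the joins themselves.
CREATE VIEW entity_summary AS
SELECT e.entity_id,
       e.created_by,
       e.created_on,
       MAX(CASE WHEN a.attr_name = 'address' AND a.attr_seq = 1
                THEN a.attr_value END) AS address1,
       MAX(CASE WHEN a.attr_name = 'phone' AND a.attr_seq = 1
                THEN a.attr_value END) AS phone1
FROM entity e
LEFT JOIN entity_attribute a ON a.entity_id = e.entity_id
GROUP BY e.entity_id, e.created_by, e.created_on;
```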

p.campbell
Thanks for the reply. After doing some testing, it appears the name-value pair design works great on large sets but seems to be slower on smaller statements due to having a few more joins. Even after adding indexes I still cannot come close: I get 4 seconds for the denormalized table and 20 seconds for the name-value pair. In the end I just need to move forward and abstract it out enough that this can be changed if need be.
Aur
A: 

Regarding this part of your question:

So my concern is that this does not allow the database or the application to be very extensible. If they want to add more fields to be parsed (which they do), then I'd need to create another table and add another foreign key to the linking table.

You wouldn't need to add another table if they just add additional fields. These would just go into the main table representing whatever entity it is.

You would only need to add new tables if the new fields were for a new type of repeating group, or weren't functionally dependent on the entity (so you ended up with the same information being repeated, a risk of update anomalies, etc.).
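As a sketch, assuming a main table like the hypothetical entity above: a new singular, functionally dependent field is just an added column, while a new repeating group gets its own child table rather than email1/email2/... columns.

```sql
-- Hypothetical: a new singular field goes straight onto the
-- existing entity table...
ALTER TABLE entity ADD COLUMN middle_name VARCHAR(100);

-- ...whereas a new repeating group gets its own child table keyed
-- back to the entity.
CREATE TABLE entity_email (
    entity_id  INT NOT NULL REFERENCES entity (entity_id),
    email_seq  INT NOT NULL,
    email      VARCHAR(255) NOT NULL,
    PRIMARY KEY (entity_id, email_seq)
);
```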

Martin Smith