views:

36

answers:

3

Here the deal: I've got a table Billing, which is basically a receipt (for different types of transaction). The app has a feature that you can create new charges (well, all charges except for tax and other constants). Since there would be a dynamic number of charges, we decided to store the charges for a billing on a single text field with a json structure. So the Charges column contains stuff like this:

{"CrateFee":50,"DeliveryFee":90,"PackagingFee":20}
{"DeliveyFee":90,"ServiceCharge":200}

Our alternative would be to create a separate table for these charges, with this structure:

Charges
BillingId | ChargeName |  ChargeValue
1           CrateFee      50
1           DeliveryFee   90
1           PackagingFee  20
2           DeliveryFee   90
2           ServiceCharge 200

But we decided to use the first one because if we use the second method, it will be populated by tens of thousands of rows in just a single day (estimate is about a thousand transactions in a day). I know we'll be limited with what we can do with the data if we use the first one, so I really want to push the separate table method. But I have no idea on scaling, optimizing, etc when it comes to databases, so I need some help with this.

So tell me, is it ok to use the second method? How about the performance hit? Are there other alternatives?

+4  A: 

The first implementation might make it easier to sstore the data in a single line, but you are opening yourself to a whole world of hurt.

By correctly using indexes on the fields, you should not have major issues, so i would recomend the second approach.

Also, at some later stage you can try implementing archiving, which should also help with the size of the second table.

astander
+1: Just say "no" to comma separated lists =)
OMG Ponies
Yeah, actually we were also discussing archiving. But no one here has any experience with archiving yet. Great links on the matter would help. :)
mives
Have a look at this link http://stackoverflow.com/questions/65093/best-way-to-archive-live-mysql-database
astander
+2  A: 

is it ok to use the second method?

I've yet to meet a DBA that would allow for the JSON format to be stored in their database, myself included.

The app has a feature that you can create new charges

In order to be properly normalized, I'd suggest a separate table containing the charge types. Users can still add charge types to it, and you'd use a foreign key to refer to the charge type - just like what you're doing with the billingid.

The biggest reason to break out the charges into rows would be for ease of accessing the data for reporting/etc. You could still do it with the JSON format, but you'd be looking at string manipulation and because it's freeform text there's a risk that you couldn't group by charge names. Nevermind the performance hit of doing that string manipulation. It's not worth the hassle - do it right, use option 2.

You might want to consider storing the tax (at least the percentage) in the table as well - taxes do change over time, so you'd want to know what tax was at the time of purchase in order to accurately reproduce a bill at a later date.

OMG Ponies
The Charge types are indeed saved on a separate table. But referencing them would not be advisable since the charges change over time. We need to save the charge value and name on the billing itself so the charges of the billing remains the same the time it's created. Tax and the other constants are saved on separate tables.
mives
A: 

If you use the first suggestion, your application will be horrible to work with and the performance will be something that will amaze you in it's awfulness and users will come to your cubical to strangle you. (Ok so there's a bit of hyberbole there).

The very first rule of database design is store only one piece of information in each field.

With the first design when you need to know how much each customer has spent by charge type, you won't be able to easily or quickly get this data.

What you need is a table called charge types that hold the type definitions. If the user adds a new one, it goes into this table. Then you have your charges table simliar to what you have in the second example. By storing standard types and making it painful but possible to add one, it becomes less likely that you will 4317 charge types that all mean the same thing. This is important later when you want to do reporting on the data in your database.

You will have significantly better performance on querying the second possiblity than the first as long as you index correctly.

If your organization honestly even considered using the first structure, then you need to hire an experienced dba immediately as there is a clear misunderstanding as to how to design a database. If you expect thousands of daily charges, you need to be considering performance inthe design not tacking it on later. Permature optimization does not mean don't consider optimization at all. In database performance is one of the three most critical things you need to design for (the other two are data integrity and security).

HLGEM