ansaurus

Question

How much of the data in a database is string/text data?

Answer 1

+3 A:

I don't know if it's exactly what you're looking for, but there are a few large datasets publicly available online that you could look at to analyze, and should be fairly representative.

For example, the entire wireless licensing data for the U.S. FCC is available here. That's probably fairly representative of a customer database, which tends a little heavier on the string side.

Another example is the Bureau of Labor Statistics Data. That's probably fairly representative of a statistical database, which tends a little heavier on the numerical side.

As far as how those stack up against every database in the world, you probably aren't going to be able to do any better than an educated guess.

Karl Bielefeldt 2010-07-13 03:59:42

Yes, this goes in the right direction. However, I'm preferably looking for a study that has already done analysis on such data sets.

inflagranti 2010-07-13 12:54:07

Answer 2

A:

Its important to realise that since Event is a VARCHAR field rather than CHAR it only takes up as much space as the number of characters actually in the fields (the VAR here stands for Variable). There may be a byte stuck onto the end as a flag.

Infact the VARCHAR field is probably more efficient than the int and date fields since they will always take up 4 or 6 bytes whereas the VARCHAR field can take up as little as 2 bytes.

I dont know of any studies though.

Toby Allen 2010-07-16 21:23:58

Answer 3

+1 A:

It really depends on the database. 80% of a record being text sounds like a good estimate for an average.

It would help to know the background as to why you are doing this research. I'm not so sure that finding a good answer to your question will really help you solve a problem.

If you really need some sort of answer you could collect the data yourself. Set-up a form where people can upload data and ask developers to upload their schemas (perhaps as .sql dump).

andyuk 2010-07-16 21:25:19

ansaurus

tags:

views:

answers:

How much of the data in a database is string/text data?

related questions