I was wondering if there has been any research on how much of the data stored in databases consists of string data. Also, how much of that string data is free text (i.e. completely unstructured), and how much of it consists of identifiers such as proper names? My intuition is that the size of a record is often dominated by a few large varchar fields. Take, for instance, a simple table containing an event:
Column | Type | Size
------------------------
ID | Integer | 4
Date | Date | 6
Event | Varchar(50) | 50
Even though only one column is a string column (i.e. 33% of the columns), this one column accounts for roughly 83% of a record's declared size, 50 of 60 bytes (the actual data stored may be smaller, of course). In my experience, a lot of tables have this form. It would be even more extreme if Event were a free-text field of up to 2000 characters, for instance.
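To make the arithmetic concrete, here is a minimal Python sketch that computes the share for the example table. The function name `string_share` and the rule for what counts as a string type are my own assumptions for illustration. It only looks at declared sizes, at one byte per character, not at what is actually stored.

```python
# Declared column sizes in bytes, copied from the example table.
columns = [
    ("ID", "Integer", 4),
    ("Date", "Date", 6),
    ("Event", "Varchar(50)", 50),
]

# Assumption: these type-name prefixes are treated as string types.
STRING_PREFIXES = ("varchar", "char", "text")


def string_share(cols):
    """Return (share of columns that are strings, share of declared bytes they occupy)."""
    string_cols = [c for c in cols if c[1].lower().startswith(STRING_PREFIXES)]
    total_bytes = sum(size for _, _, size in cols)
    string_bytes = sum(size for _, _, size in string_cols)
    return len(string_cols) / len(cols), string_bytes / total_bytes


col_share, byte_share = string_share(columns)
print(f"{col_share:.0%} of columns, {byte_share:.0%} of declared bytes")
# prints: 33% of columns, 83% of declared bytes
```

Swapping the Event column for a 2000-character field pushes the byte share to about 99.5% (2000 of 2010 bytes).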
So does anybody have hard facts about this for real-world databases, or at least a reliable estimate?