The entities to be stored have 25+ properties (table columns). The entities are pretty diverse, meaning that, most of the columns are empty. On average, I'd say, less than 20% (<5) properties have a value in any particular item. So, I have a lot of redundant empty columns for most of the table rows. Almost all of the columns are decimal numbers.
Given this scenario, would you suggest serializing the columns instead, or perhaps, create another table named "Property", which would contain all the possible properties and then creating yet another table "EntityProperty" which would map an property to an entity using foreign keys? Or would you leave it as it is?
An example scenario where this kind of redundancy might occur could be the following:
We have an imaginary universe with lots of planets. We are creating a space mining game and each planet has 30 different mineral contents. Most of the planets have only 2-3 minerals.
The simplest solution would be to create a single table 'Planets' with 30 columns, one for each mineral. The problem here is that most of rows in the 'Planets' table have 25+ columns, in which each of one the value is null or zero. Thus we have lot of redundant data. Say, we would have 500k-1M records. I would guess it costs a byte at most to save a null or zero decimal value. Thus, we waste space 500,000-1,000,000 bytes, ie. one megabyte at most.
The other solution would be to create two additional tables. Instead of storing all the minerals in the 'Planets' table, we take them all out and create a table for the minerals called 'Minerals'. This would contain only 30 rows, one for each different mineral type. Then, we create a table called 'PlanetMineral' which contains a reference to a planet row and to a mineral row, and additionally this table would have a column telling the amount of the mineral the planet has. Apparently, in many database systems this complicates queries since you have to do possible several joins. I'm using SQL server with LINQ to SQL, which scaffolds the foreign key constraint into class object property, accessible through code. (ie. I can simply access the minerals a planet has with planet.Minerals) So, from this perspective it doesn't add complexity. The redundancy is a small portion (like 1/15) of the first solution. The reason there is still some overhead is because of the foreign keys we need to store.
As for the data query efficiency, I really don't know how the costs of the queries would compare between these two solutions.