views:

15

answers:

1

We are using a table with a structure imposed upon us more than 10 years ago. We are allowed to add columns, but urged not to change existing columns.

Certain columns are meant to represent dates, but are put in different format. Amongst others:

 * CHAR(6): YYMMDD
 * CHAR(6): DDMMYY
 * CHAR(8): YYYYMMDD
 * CHAR(8): DDMMYYYY
 * DATE
 * DATETIME

Since we now would like to do some more complex queries, using advanced date functions, my manager proposed to duplicate those problem columns to a proper FORMATTED_OLDCOLUMNNAME column using a DATE or DATETIME format.

Is this the way to go? Couldn't we just use the STR_TO_DATE function each time we accessed the columns? To avoid every query having to copy-paste the function, I could still work with a view or a stored procedure, but duplicating data to avoid recalculation sounds wrong.

Solutions I see (I guess I prefer 2.2.1)

1. Physically duplicate columns
1.1 In the same table
1.1.1 Added by each script that does a modification (INSERT/UPDATE/REPLACE/...)
1.1.2 Maintained by a trigger on each modification
1.2 In a separate table
1.2.1 Added by each script that does a modification (INSERT/UPDATE/REPLACE/...)
1.2.2 Maintained by a trigger on each modification
2. On-demand transformation
2.1 Each query has to perform the transformation
2.1.1 Using copy-paste in the source code
2.1.2 Using a library
2.1.3 Using a STORED PROCEDURE
2.2 A view performs the transformation 
2.2.1 A separate table replacing the entire table
2.2.2 A separate table just adding the date-fields for the primary keys

Am I right to say it's better to recalculate than to store? And would a view be a good solution?

A: 

I've been testing this all morning. I've duplicated the table 2 times:

  • once for using a view which transforms the VARCHAR field into a DATE field on each query, using STR_TO_DATE
  • once for adding a column and doing an UPDATE using STR_TO_DATE

Afterwards I let loose a lot of different queries - the view performs a bit slower, as expected, but only tens of milliseconds.

Since it does take me 7Mb extra to store the extra column, (and that has its implications on disk usage and RAM usage), and our server has CPU power to spare but not RAM/io, I'm leaning towards the "transform the CHAR into DATE on each query" solution.

I'm not sure about using a VIEW, a Stored Procedure or just putting the STR_TO_DATE in each query I'll be writing - but that's more a "coding best practice" than an optimization.

Konerak