views:

220

answers:

2

I have a database which stores (among other things), the following pieces of information:

  • Hardware IDs BIGINTs
  • Storage Capacities BIGINTs
  • Hardware Names VARCHARs
  • World Wide Port Names VARCHARs

I'd like to be able to capture a more refined definition of these datatypes. For instance, the hardware IDs have no numerical significance, so I don't care how they are formatted when displayed. The Storage Capacities, however, are cardinal numbers and, at a user's request, I'd like to present them with thousands and decimal separators, e.g. 123,456.789. Thus, I'd like to refine BIGINT into, say ID_NUMBER and CARDINAL.

The same with Hardware Names, which are simple text and WWPNs, which are hexstrings, e.g. 24:68:AC:E0. Thus, I'd like to refine VARCHAR into ENGLISH_WORD and HEXSTRING.

The specific datatypes I made up are just for illustrative purposes.

I'd like to keep all this information in one place and I'm wondering if anybody knows of a good way to hold this all in my MySQL table definitions. I could use the Comment field of the table definition, but that smells fishy to me.

One approach would be to define the data structure elsewhere and use that definition to generate my CREATE TABLEs, but that would be a major rework of the code that I currently have, so I'm looking for alternatives.

Any suggestions? The application language in use is Perl, if that helps.

+5  A: 

A good way to do this is to use views. For example, to insert commas into a cardinal number, you can use:

mysql> create table foo (id int);
Query OK, 0 rows affected (0.12 sec)

mysql> insert into foo (id) values ( 123456789);
Query OK, 1 row affected (0.00 sec)

mysql> create view v_foo as select format(id, 0) as id from foo;
Query OK, 0 rows affected (0.10 sec)

mysql> select * from v_foo;
+---------------+
| id            |
+---------------+
| 123,456,789   |
+---------------+
1 row in set (0.02 sec)

You can use other string functions to format your other fields, and store them in the view definition.

friedo
+1  A: 

I'll propose an answer that questions the question.

One of the mantras that people who model the databases like to hum is the separation of the presentation layer (formatting) and data and I believe that relevant part from like goes something like:

'Thou shall not store formatted data in your databases, nor shall you discriminate against any formatting choice. Thou shall store the data in the native supported data types. Thou applications shall provide presentation layer and format your columns.'

Well, friedo's answer does not go directly against this - data is only presented through a view, the storage is still native.

Still, it depends how do you define presentation layer there - if the view and the server settings are considered part of presentation layer then all is fine, otherwise there is potential trouble as I, possible user of your system, will not be able to specify the fact that my thousand separator is a single quote (and it is, at least at the place of my current residence).

Also, once you go that road, how long do you think it will pass until you will have to deal with requests to re-parse the data back from text into a number and possibly end up in situations where this might be ambiguous (such as DD/MM/YY vs MM/DD/YY)?

The above rant is only regarding formatting, determining the number of decimal digits defines the domain of your data and is a good thing as it limits down possibility of inconsistent data entering your database.

EDIT: (entertaining the purist point of view a bit further, regarding number bases) Saying that hexadecimal number data has no meaning in other bases is generally a false statement. Number values have no base and can be represented in any base. Their domain (the set of allowed values) is the same.

The choice of hexadecimal for MAC address is a natural one due to historical reasons and the fact that it is for example easy to read the vendor part in that format. The choice of 'funny' format for IPv4 addresses is a historical with probably an anecdotal reason.

But both are only a choice and internally a good system will store them without bias (for example storing IPv4 as text is not a good thing). When RDBMS present you the results of a query (on a screen) it already takes a role of an application and format the results in some way.

This is not significant and the format you'll use in your application should not influence how you store the Storage Capacities or other entity properties.

So I am saying that this is application configuration data (metadata to the core date) and of course it can/should be stored in the database, but with MySQL (which is not so rich in defining custom types) it can't fit in the table definition and should be simply stored in another table that application will read and apply to your columns when presenting data to the user and not in some hackish way which will not be portable.

For example the view idea works, but can you query the view easily to get the formats that are applied to fields? Or lets say you want to change the formatting in all occurrences of field WWPN in all queries that use it (hexstring also sounds as already wrong), would that be easy? Or if there are other queries that transform the data and write it down in another table will you write it down with applied format or without it (re-parsing)? Etc...

Now if you had a table that stores application configuration data such as FieldFormatting: Table, Field, Format, CheckRules, LongFormat (or whatever makes most sense in your situation) then the above questions become a bit easier to deal with and you get to choose extra options for your application and business logic.

If you really (really, really) have to provide direct access to the database and the native types would make data unreadable for the users and you simply must preformat then you could even use the above table to generate and update the views/queries semi-automatically.

NOTE: I am taking a purist point of view here since I have a feeling that you are making design decisions here and not chasing last drop of performance or convenience (for example between application data types and database data types) when practical issues can be more important than modelling guidelines and rules. But the questions from the last paragraph still stand.

Unreason
Maybe I misunderstand what you're saying here, but I read this like we're in agreement.The problem I'm having is that the data types that MySQL provides are too broad. I have an element of 'aliasing' going on with the `VARCHAR` type, for instance. I need to represent text and I need to represent hexadecimal numbers. The hex numbers have no meaning in any other number base (they are akin to network MAC addresses).What I'm *attempting* to do is to separate the presentation of the data from the database, but providing enough information within the database to allow the right format choice.
Dancrumb
We are after the same goal I believe. Keep in mind, even as I am pursuing one point of view, that I think that friedo's answer is a good one on the level at which it was given - simple solution for a simple scenario, and with the awareness of its limitations will not do any harm. i was only trying to provide a bigger picture, which may or may not apply in a particular case.
Unreason
@goran; thanks for answer and your follow-up response. I think it's an interesting question... i wish there'd been a greater response from the SO community. That said, I really appreciate your input here.
Dancrumb