views: 360
answers: 10

My friend is building a product to be used by different independent medical units.

The database stores a vast collection of measurements taken at different times, such as temperature, blood pressure, etc.

Let us assume these are held in a table called exams with columns temperature, pressure, etc... (as well as id, patient_id and timestamp). Most of the measurements are stored as floats, but some are of other types (strings, integers...)

While many of these measurements are handled by his product, it needs to allow the different medical units to record and process other custom measurements. A very nifty UI allows the administrator to edit these custom fields and specify their name, type, possible range of values, etc.

He is unsure as to how to store these custom fields.

He is leaning towards a separate table (say a table custom_exam_data with fields like exam_id, custom_field_id, float_value, string_value, ...)

I worry that this will make searching both more difficult to achieve and less efficient.

I am leaning towards modifying the exam table directly (while avoiding conflicts on column names with some scheme like prefixing all custom fields with an underscore or naming them custom_1, ...)

He worries about modifying the database dynamically and having different schemas for each medical unit.
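For concreteness, here is a rough sketch of the two options we are weighing (table and column names are only illustrative):

-- Option 1 (his): one generic table holding a row per custom value
create table custom_exam_data (
  exam_id         int not null,      -- references exams(id)
  custom_field_id int not null,      -- references the admin-defined field
  float_value     float,             -- exactly one *_value column is filled,
  string_value    varchar(255),      -- depending on the field's declared type
  int_value       int
);

-- Option 2 (mine): add typed columns directly to exams as fields are defined
alter table exams add custom_1 float;        -- e.g. blood sugar level
alter table exams add custom_2 varchar(50);  -- e.g. a free-text note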

Hopefully some people with more experience can weigh in on this issue.

Notes:

  • he is using Ruby on Rails, but I think this question is pretty much framework agnostic, except for the fact that he is only looking for solutions using SQL databases.

  • I simplified the problem a bit since the custom fields need to be available for more than one table, but I believe this doesn't really impact the direction to take.

  • (added) A very generic reporting module will need to search, sort, generate stats, etc. on this data, so it is required that this data be stored in columns of the appropriate type

  • (added) User inputs will be filtered, for the standard fields as well as for the custom fields. For example, numbers will be checked within a given range (can't have a temperature of -12 or +444), etc... Thus, conversion to the appropriate SQL type is not a problem.

A: 

I have seen your friend's idea used in a commercial accounting package. The table was split into two: the first contained fields solely defined by the system, the second contained fields like USER_STRING1, USER_STRING2, USER_FLOAT1, etc. The tables were linked by identity value (when a record was inserted into the main table, a record with the same identity was inserted into the second one). Each table that needed user fields was split like that.
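Roughly, that layout might look like this (table and column names here are hypothetical):

-- System-defined fields only
create table exams (
  id          int primary key,
  patient_id  int not null,
  taken_at    timestamp not null,   -- the question's "timestamp" column
  temperature float,
  pressure    float
);

-- Pre-allocated, generically named user fields, linked 1:1 by the same id
create table exams_user (
  id           int primary key,     -- same identity value as exams.id
  user_string1 varchar(255),
  user_string2 varchar(255),
  user_float1  float,
  user_float2  float
);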

GSerg
+3  A: 

Microsoft Dynamics CRM achieves this by altering the database design each time a change is made. Nasty, I think.

I would say a better option would be to consider an attribute table. Even though these are often frowned upon, it gives you the flexibility you need, and you can always create views using dynamic SQL to pivot the data out again. Just make sure you always use LEFT JOINs and FKs when creating these views, so that the Query Optimizer can do its job better.
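For instance, assuming an attribute table along the lines of the custom_exam_data described in the question, a pivoting view (which this answer suggests regenerating with dynamic SQL whenever an administrator adds a field) might look something like this; the field ids and names are made up:

create view exams_with_custom as
select e.id,
       e.patient_id,
       e.temperature,
       bs.float_value  as blood_sugar,   -- custom_field_id = 1 (hypothetical)
       dh.string_value as dominant_hand  -- custom_field_id = 2 (hypothetical)
from exams e
left join custom_exam_data bs on bs.exam_id = e.id and bs.custom_field_id = 1
left join custom_exam_data dh on dh.exam_id = e.id and dh.custom_field_id = 2;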

Rob Farley
+2  A: 

Well, whenever I need to store some unknown type in a database field, I usually store it as String, serializing it as needed, and also store the type of the data.

This way, you can have any kind of data, working with any type of database.
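As a rough sketch, that could be as simple as a value/type column pair (names are only illustrative):

create table custom_exam_data (
  exam_id         int not null,   -- references exams(id)
  custom_field_id int not null,
  value           varchar(255),   -- everything serialized to a string
  value_type      varchar(20)     -- e.g. 'float', 'integer', 'string'
);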

Paulo Santos
I just got torn a new one this morning for recommending serialization. Glad to see that someone else sees its purpose.
Chuck Vose
@Chuck Vose, best way to store things as strings.
Paulo Santos
Nice suggestion, but sorting or other calculations won't perform well. I added this to my question.
Marc-André Lafortune
+1  A: 

I would be inclined to store the measurement in the database as a string (varchar) with another column identifying the measurement type. My reasoning is that it will, presumably, come from the UI as a string, and casting to any other datatype may introduce corruption before the user input gets stored.

The downside is that when you go to filter result sets by some measurement metric you will still have to perform a cast, but at least the storage and persistence mechanism is not introducing corruption.
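That query-time cast would look something like this (illustrative names; note the cast will usually prevent an index on the value column from being used):

-- Find exams whose custom "blood sugar" value exceeds a threshold
select exam_id
from custom_exam_data
where custom_field_id = 1                          -- hypothetical id for blood sugar
  and cast(string_value as decimal(10, 2)) > 10.0;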

grenade
Nice suggestion, but sorting or other calculations won't perform well. It is easy (and mandatory!) to filter the user inputs anyway, to make sure they are in an acceptable range (in the case of numbers). That filtering is already written for the standard fields anyway. I added this to my question.
Marc-André Lafortune
+1  A: 

I can't tell you the best way but I can tell you how Drupal achieves a sort of schemaless structure while still using the standard RDBMSs available today.

The general idea is that there's a schema table with a list of fields. Each row really only has two columns: a 'table' string and a 'column' string. For each of these fields it then defines a whole separate table with just an id and the actual data for that column.

The trick really is that when you are working with the data it's never more than one join away from the bundle table that lists all the possible columns, so you end up not losing as much speed as you might otherwise think. This will also allow you to expand much farther than just a few medical companies, unlike the custom_ prefix you were proposing.

MySQL is very fast at returning row data for short rows with few columns. In this way this scheme ends up fairly quick while allowing you lots of flexibility.
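Roughly, the structure described above might look like this (names are made up for illustration):

-- Registry of all fields, keyed by the table and column they extend
create table field_schema (
  field_table  varchar(64) not null,  -- e.g. 'exams'
  field_column varchar(64) not null,  -- e.g. 'blood_sugar'
  field_type   varchar(20) not null,  -- e.g. 'float'
  primary key (field_table, field_column)
);

-- One narrow table per field: just the owning row's id and that field's value
create table field_data_blood_sugar (
  exam_id int primary key,   -- references exams(id)
  value   float
);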

As to search, my suggestion would be to index the page content instead of the database content. Use Solr to parse through rendered pages and hold links to the actual page instead of trying to search through the database using clever SQL.

Chuck Vose
If I understand correctly, Drupal has a hybrid of both strategies? It modifies the DB's schema dynamically (by adding tables) and requires joins (since the data is split across different tables)... I fail to see the upside; this seems like the worst of both approaches, but maybe I'm missing something?
Marc-André Lafortune
Hah, you probably really aren't. There are some really nice performance boosts to only having one column per table. Anyway, it _is_ a totally bizarre structure and I wouldn't trade it for my redis key-store.
Chuck Vose
A: 

I would store those custom fields in a table where each record holds one (dataType, dataValue, dataUnit) triple per row. So there would be a one-to-many relation from one sample to its data. You can also create a table to record all the kinds of custom types you would use. For example:

create table DataType
(
  id int primary key,
  name varchar(100) not null unique,
  description text,
  uri varchar(255)            -- can be used for an ONTOLOGY
);


create table DataRecord
(
  id int primary key,
  sample_id int not null,     -- reference to the sample
  dataType_id int not null,   -- references DataType
  value varchar(100),         -- the value as a string
  unit varchar(50)            -- g, mg/ml, etc. (could also be a link to a table describing the units, just like DataType)
);
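A quick usage sketch of those two tables (the values are hypothetical):

-- Register a custom measurement type once...
insert into DataType (id, name, description, uri)
values (1, 'blood_sugar', 'Fasting blood sugar level', null);

-- ...then attach one value of that type to a sample/exam
insert into DataRecord (id, sample_id, dataType_id, value, unit)
values (100, 42, 1, '5.4', 'mmol/L');

-- Reading it back means joining and casting the string value
select r.sample_id, t.name, cast(r.value as decimal(10, 2)) as value, r.unit
from DataRecord r
join DataType t on t.id = r.dataType_id
where t.name = 'blood_sugar';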
Pierre
+2  A: 

I've had to deal with this situation many times over the years, and I agree with your initial idea of modifying the DB tables directly, and using dynamic SQL to generate statements.

Creating string UserAttribute or Key/Value columns sounds appealing at first, but it leads to the inner-platform effect where you end up having to re-implement foreign keys, data types, constraints, transactions, validation, sorting, grouping, calculations, et al. inside your RDBMS. You may as well just use flat files and not SQL at all.

SQL Server provides INFORMATION_SCHEMA views for inspecting table schemas at runtime, and ordinary DDL statements for creating and modifying them. This gives you full type checking, constraints, transactions, calculations, and everything you need already built in; don't reinvent it.
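As a sketch of what that run-time schema work might look like (the column and constraint names are made up):

-- Add a newly defined custom field as a real, typed column with a range check
alter table exams add blood_sugar decimal(10, 2) null;
alter table exams add constraint ck_exams_blood_sugar
  check (blood_sugar is null or blood_sugar between 0 and 50);

-- Later, discover at runtime which columns exist and what type they have
select column_name, data_type, is_nullable
from information_schema.columns
where table_name = 'exams';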

Dour High Arch
Thanks for pointing out that term (inner-platform effect). It is new to me, but I feel it applies very well here.
Marc-André Lafortune
A: 

Define two new tables: custom_exam_schema and custom_exam_data.

custom_exam_data has an exam_id column, plus an additional column for every custom attribute.

custom_exam_schema would have a row to describe how to interpret each of the columns of the custom_exam_data table. It would have columns like name, type, minValue, maxValue, etc.

So, for example, to create a custom field to track the number of fingers a person has, you would add ('fingerCount', 'number', 0, 10) to custom_exam_schema and then add a column named fingerCount to the custom_exam_data table.
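A minimal sketch of that pair of tables and the fingerCount example (illustrative DDL):

create table custom_exam_schema (
  name     varchar(64) primary key,   -- e.g. 'fingerCount'
  type     varchar(20) not null,      -- e.g. 'number', 'string'
  minValue float,
  maxValue float
);

create table custom_exam_data (
  exam_id int primary key   -- references exams(id); one column added per schema row
);

-- Administrator defines a new field: describe it, then add the matching column
insert into custom_exam_schema (name, type, minValue, maxValue)
values ('fingerCount', 'number', 0, 10);
alter table custom_exam_data add fingerCount int;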

Someone might say it's bad to change the database schema at run time, but I'd argue that configuring these custom fields is part of setup and won't happen too often. Still, this method lets you handle changes at any time and doesn't risk messing around with your core table schemas.

benzado
Thanks. I'll count this as a vote for the solution I am suggesting.
Marc-André Lafortune
+2  A: 

It's strange that so many people come up with ad-hoc solutions for this when there's a well-documented pattern for it:

Entity-Attribute-Value (EAV) Model

Two alternatives are XML and Nested Sets. XML is easier to manage but generally slow. Nested Sets usually require some type of proprietary database extension to do without making a mess, like CLR types in SQL Server 2005+. They violate first-normal form, but are nevertheless the fastest-performing solution.
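For instance, assuming SQL Server 2005+ and its xml column type, the custom values might be stored and queried like this (the table and element names are hypothetical):

create table exam_custom
(
  exam_id     int primary key,   -- references exams(id)
  custom_data xml not null       -- e.g. <fields><bloodSugar>5.4</bloodSugar></fields>
);

-- An XML index makes the XQuery below far cheaper on large tables
create primary xml index ix_exam_custom_data on exam_custom (custom_data);

select exam_id,
       custom_data.value('(/fields/bloodSugar)[1]', 'float') as blood_sugar
from exam_custom
where custom_data.exist('/fields/bloodSugar[. > 10]') = 1;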

Aaronaught
+1 for XML: flexible to the nth degree, and performance can be improved on SQL Server by using Typed-XML columns (although typing the column can constrain flexibility--but if a flexible XSD schema is used there is the potential for a fair balance)
STW
Thanks for the link to EAV. This doesn't apply here; if a medical unit decides that for their purpose it is important to add the level of sugar in the bloodstream, say, then they'll add that using the custom fields and this will be input for the vast majority of clients. If I were to serialize anything, I would use YAML, but serialization is a terrible idea here (see the last 2 notes)
Marc-André Lafortune
YAML is not supported by any database I know of (and has no schema spec), while XML can be strongly-typed and indexed in several of them. I also offered the possibility of nested sets which would have been more performant. I don't think the fact that you have conflicting requirements warranted a downvote.
Aaronaught
+1 for XML. I believe in SQL Server 2005 XML fields can be indexed also. I have seen a proprietary framework (built upon sql2005) that does exactly what is asked in the question, performing on hundreds of thousands of rows just as it would if they were plain relational data. It could also add drill-down searching capabilities, using UI components that were data-aware (i.e. a combo box with looked-up values, criteria drilling down the xpath, etc.).
daskd
I didn't mean any offense by the downvote. I do believe that the EAV pattern does not apply to this problem, although I may have presented it badly? I take my comment about YAML back; indeed that made no sense here.
Marc-André Lafortune
@Marc: Custom fields can be represented as custom attributes, and the EAV model is extremely prevalent in the health/medical field in particular, due to the frequency of changes required, so I think it definitely applies. It is not necessarily a "best practice", a fact which I highlighted; still, ad-hoc DDL is worse. It depends on lax security and relies on end users to make decisions that are already difficult for developers/DBAs (schema, indexes, constraints). One error could cause mass data corruption. Better to just have a dozen `CustomField` columns, but I wouldn't recommend that either.
Aaronaught
To add: One reason XML is a good alternative is that XSD specifically can help you *avoid* the inner-platform effect as well as the security/maintenance woes. In SQL Server at least, the fact that you can create an XML index makes a huge difference: http://msdn.microsoft.com/en-us/library/ms345121(SQL.90).aspx
Aaronaught
A: 

Let's say that your friend's database has to store data values from multiple sources such as demographic values, diagnoses, interventions, physiognomic values, physiologic exam values, hospitalisation values, etc.

He might as well have to define choices. Let's say his database is missing race and the unit staff need the race of the patient (some diseases are more or less likely depending on race); they might want to use a drop-down with several choices.

I would propose using another table to hold these choices, or you could just use a "Custom_field_choices" table, which at some point is exactly the same thing under a different name.

Considering that the database:
- needs to be flexible
- must allow data from multiple tables to be added and customized
- should keep the integrity of its main structure for distribution and uniformity purposes
- must support limits, alarms and warnings on the data
- must support units (10 kg or 10 pounds?)
- may offer a selection of choices for some data
- may carry different access rights (from simple user to admin)
- may be needed to generate reports without modifying the code (automation)
- may be needed for cross-reference analysis within the system without modifying the code

the custom table would be my solution; modifying each table would end up being too risky.

Greg
I'd like to hear about what kind of actual risk there is.
Marc-André Lafortune