ansaurus

Question

How to design a generic database whose layout may change over time?

Answer 1

+1 A:

This is for A) & B), and is not something I have done but thought it was an interesting idea that Reddit put to use, see this link (look at Lesson 3):

http://highscalability.com/blog/2010/5/17/7-lessons-learned-while-building-reddit-to-270-million-page.html

Kerry 2010-06-10 02:28:01

That may work great for low-importance social bookmarking type apps but for a payroll app it would be a complete disaster.

Aaronaught 2010-06-10 02:45:46

Mawg 2010-06-10 03:45:37

You also note that any mention about reports or analysis against the Reddit database are suspiciously omitted.

Thomas 2010-06-10 04:04:37

Answer 2

+1 A:

Not sure about the database but for charts instead of using PHP for the charts, I recommend looking into using javascript (http://www.reynoldsftw.com/2009/02/6-jquery-chart-plugins-reviewed/). Advantages to this are some of the processing is offloaded to the client side for chart displays and they can be interactive.

mOX 2010-06-10 02:29:40

Thanks. I am generally wary of client-side stuff as it can be turned off, hacked,etc, but I will look into this.

Mawg 2010-06-10 03:46:01

Answer 3

+6 A:

In essence, you are asking how to build an application without specifications. Relational databases were not designed so that you can do this effectively. The common approach to this problem is an Entity-Attribute-Value design and for the type of system in which you want to use it, the odds of failure are nearly 100%.

It makes no sense for example, that the column called "Name" could become "Salary". How would a report where you want the total salary work if the salary values could have "Fred", "Bob", 100K, 1000, "a lot"? Databases were not designed to let anyone put anything anywhere. Successful database schemas require structure which means effort with respect to specifications on what needs to be stored and why.

Therefore, to answer your question, I would rethink the problem. The entire approach of trying to make an app that can store anything in the universe is not a recipe for success.

Thomas 2010-06-10 02:29:52

"asking how to build an application without specifications". Well, the app has specs. And the db will too, when the input form is generated, that defines the database."It makes no sense for example, that the column called "Name" could become "Salary". That does seem unlikely; my examples are more realistic, but, of course I can't prevent the user for doing stupid things. That is why the column name cannot be the primary key. But I don't think it unreasonable for someone to decide to rename an input field on a form. +1 for taking the time to reply, Thanks

Mawg 2010-06-10 03:34:32

please see edit at end of original question. Thanks.

Mawg 2010-06-10 03:42:27

@mawg - You are lacking specifications about the most critical piece of any database driven application: the data itself. Where it will really unravel is reporting. In essence, you will need to ask for the schema before you can build the data that will drive a report. Storage is only a small part of the puzzle. What will kill the project is performance and maintenance. Here is a great article about one guy's experience with an EAV and note the great analogy he uses. http://www.simple-talk.com/opinion/opinion-pieces/bad-carma/

Thomas 2010-06-10 04:13:49

@mawg - If changes to the schema are expected to be infrequent, why can't specifications be gathered ahead of time? It would be far better and cheaper to build a normalized database with standard columns. In fact, it would be cheaper to custom build three different databases than to build one EAV to rule them all to store all three system's information.

Thomas 2010-06-10 04:26:17

Mawg 2010-06-10 05:38:00

@mawg - Then it sounds like management wants the same type of application mentioned in the article link I provided in an earlier post. It would make a good case study read for them to understand why such a system is untenable and more importantly, unprofitable.

Thomas 2010-06-10 14:01:38

If I were you, I run as fast as I can from this disaster project.

HLGEM 2010-06-10 15:35:17

Answer 4

+1 A:

This is almost always a bad thing. Check out this thread for more responses:

http://stackoverflow.com/questions/66385/dynamic-database-schema/

Matt Rogish 2010-06-10 02:31:16

Good link +1, thanks, but please also see edit at end of original question. Thanks.

Mawg 2010-06-10 03:42:59

Answer 5

+2 A:

Like Thomas said, rational database is not good at your problem. However, you may want to take a look at NoSQL dbs like MongoDB.

Codism 2010-06-10 02:40:22

+1 thanks, but it has to be ODBC compliant

Mawg 2010-06-10 03:43:17

Answer 6

+4 A:

"This all seems very tricky to me, but I am database n00b - maybe it is simple to you gurus?"

Nope, it really is tricky. Fundamentally what you're describing is not a database application, it is a database application builder. In fact, it sounds as if you want to code something like Google App Engine or a web version of MS Access. Writing such a tool will take a lot of time and expertise.

Google has implemented flexible schemas by using its BigTable platform. It allows you to flex the schema pretty much at will. The catch is, this flexibility makes it very hard to write queries like "position = 'senior salesman' and salary > 50k".

So I don't think the NoSQL approach is what you need. You want to build an application which generates and maintains RDBMS schemas. This means you need to design a metadata repository from which you can generate dynamic SQL to build and change the users' schemas and also generate the front end.

Things your metadata schema needs to store

For schema generation:

foreign key relationships (an EMPLOYEE works in a DEPARTMENT)
unique business keys (there can be only one DEPARTMENT called "Sales")
reference data (permitted values of EMPLOYEE.POSITION)
column data type, size, etc
whether column is optional (i.e NULL or NOT NULL)
complex business rules (employee bonuses cannot exceed 15% of their salary)
default value for columns

For front-end generation

display names or labels ("Wages", "Salary")
widget (drop down list, pop-up calendar)
hidden fields
derived fields
help text, tips
client-side validation (associated JavaScript, etc)

That last points to the potential complexity in your proposal: a regular form designer like Joe Soap is not going to be able to formulate the JS to (say) validate that an input value is between X and Y, so you're going to have to derive it using templated rules.

These are by no means exhaustive lists, it's just off the top of my head.

For primary keys I suggest you use a column of GUID datatype. Timestamps aren't guaranteed to be unique, although if you run your database on an OS which goes to six places (i.e. not Windows) it's unlikely you'll get clashes.

last word

'My first thought was to use the control name to identify each column, then I realized that the user can edit the form and rename, so that maybe "name" becomes "employee" or "wages" becomes ":salary". I am leaning towards a unique number for each.'

I have built database schema generators before. They are hard going. One thing which can be tough is debugging the dynamic SQL. So make it easier on yourself: use real names for tables and columns. Just because the app user now wants to see a form titled HEADCOUNT it doesn't mean you have to rename the EMPLOYEES table. Hence the need to separate the displayed label from the schema object name. Otherwise you'll find yourself trying to figure out why this generated SQL statement failed:

update table_11123
set col_55542 = 'HERRING'
where col_55569 = 'Bootle'
/

That way madness lies.

APC 2010-06-10 03:54:06

+1 Thanks for some great input,e specially the last word

Mawg 2010-06-10 05:39:30

Answer 7

+1 A:

The other respondents are correct that you should be very cautious with this approach because it is more complex and less performant than the traditional relational model - but I've done this type of thing to accommodate departmental differences at work, and it worked fine for the amount of use it got.

Basically I set it up like this, first - a table to store some information about the Form the user wants to create (obviously, adjust as you need):

--************************************************************************
-- Create the User_forms table
--************************************************************************
create table User_forms
    (
    form_id            integer identity,
    name               varchar(200),
    status             varchar(1),
    author             varchar(50),
    last_modifiedby    varchar(50),
    create_date        datetime,
    modified_date      datetime
    )

Then a table to define the fields to be presented on the form including any limits and the order and page they are to be presented (my app presented the fields as a multi-page wizard type of flow).

-

-************************************************************************
-- Create the field configuration table to hold the entry field configuration
--************************************************************************
create table field_configuration
    (
    field_id                integer identity,
    form_id                 SMALLINT,
    status                  varchar(1),
    fieldgroup              varchar(20),
    fieldpage               integer,
    fieldseq                integer,
    fieldname               varchar(40),
    fieldwidth              integer,
    description             varchar(50),
    minlength               integer,
    maxlength               integer,
    maxval                  varchar(13),
    minval                  varchar(13),
    valid_varchars             varchar(20),
    empty_ok                varchar(1),
    all_caps                varchar(1),
    value_list              varchar(200),
    ddl_queryfile           varchar(100),
    allownewentry           varchar(1),
    query_params            varchar(50),
    value_default           varchar(20)
    );

Then my perl code would loop through the fields in order for page 1 and put them on the "wizard form" ... and the "next" button would present the page 2 fields in order etc.

I had javascript functions to enforce the limits specified for each field as well ...

Then a table to hold the values entered by the users:

--************************************************************************
-- Field to contain the values
--************************************************************************
create table form_field_values
    (
    session_Id        integer identity,
    form_id           integer,
    field_id          integer,
    value             varchar(MAX)
    );

That would be a good starting point for what you want to do, but keep an eye on performance as it can really slow down any reports if they add 1000 custom fields. :-)

Ron Savage 2010-06-10 04:06:05

+1 Ron, this may well get awarded the answer. I really don't see the problem (maybe because I am a n00b) and suspect that I am just explaining it badly.

Mawg 2010-06-10 05:41:34

It's not so much of a problem - just that most "run time configurable" type of systems don't fit into a lot of developers object/data model mentality. As with any type of system, there are trade-offs. :-)

Ron Savage 2010-06-10 06:02:11

@Ron Savage - It is not that the system does not fit within developers' mentality it is that the schema does not fit within the primary designs of a relational model. The effect of that is that reporting or querying against the data, i.e., getting information from your data, is a bear and performs poorly. Now, if you never have to report against the data, that is entirely different.

Thomas 2010-06-10 15:57:58

Answer 8

+1 A:

Hi Mawg,

See this article: http://www.simple-talk.com/opinion/opinion-pieces/bad-carma/ for someone else's experience of your problem.

Mark Bannister 2010-06-10 10:56:59

ansaurus

tags:

views:

answers:

How to design a generic database whose layout may change over time?

Things your metadata schema needs to store

last word

related questions