Advice on data capture management system

Hi, I'm looking at a potential piece of work involving a fairly complex data capture and management site and am wondering how best to approach it. I would be using a LAMP architecture.

The backend will involve:

Mass user creation via user uploaded csv file - I have done this before so am confident about this part.

Once registered, users will submit confidential information on a regular basis. Although the data is confidential, I would simply be planning on storing this in a MySQL database, which would be backup on a regular basis.

There would be different roles involved:

An administrator, who would monitor user activity and create new roles and user accounts where required.
End users - who would simply be supplying information via forms, with the ability to edit their profiles, view previous submissions and other basic stuff like that
Management users - who can run a series of pre-defined reports on the data and display this information within the browser. They may also carry out "free-form queries" on the data as well as extract selected results as spreadsheets. A number of these standard reports would also be made available as web services/feeds. The free-form queries part is
slightly concerning, as I don't know yet what the data actually is, plus there is a strong likelihood that new questions would be dynamically added to the data capture forms over time. The database structure would have to be flexible to accommodate this, plus I would have to provide this capacity for free-form queries, which I've never done before. Can anyone suggest a sensible approach to this?

There would also be some form of versioning, so that if a user updates/revises certain data, the changes would be tracked and the previous record(s) would still be available. I would be planning to incorporate this into the database design, so that instead of updating and overwriting data in the database, a new record would always be created, with the "overwritten" one simply being flagged as archived. This way, I think I would always be able to retrieve the live data row, as well as the archived rows (organised by date). Does that make sense?

Thanks in advance for any pointers, this is a bit more complex than what I've worked on before (mainly standard CMSs) - I think i know how to handle the above, but would be grateful for any advice a more experienced developer could provide.

If you are going to use that versioning system, I recommend that you also create views to get all the records that are active and have the user interface always use the views (except where they are looking at changes) not directly access that table. I would also put a trigger on the tables to make sure one and only one record is marked as active. Then you have the issue of the unique key and the key to identify the record in the PK/FK relationships (you don't want to have to change this every time you add another record).

Alternatively, you could set up audit tables if you just need to go back to old data if you need to research a change. If you need to see history frequently audit tables can be hard to query as they don't usually have the whole record structure just the old and new values, the column name and some meta data like when the change was made and by whom.

Possibly you could have a history table that is exactly like the orgional table with a few extra columns (for when the change was made and by whom as well a new surrogate key) and taking the autonumbering and unique index off the orginal PK. Then the active records are all in one table, the inactive records are in another table (populated by trigger) and you use views only when you want to see both sets of data together.

The key to all of this is to use triggers to populate the history wherever you intend to store it. Databses are not just affected by the user application, so you need to make sure all changes to the database are recorded not just those done by the user interface or you will have gaps in your history. I can't speak for all databases but in SQl Server, triggers act once on each batch of data, don't write them assuming they will act one record at a time if someone updates a roup of prices in one query. It is generally critical to write triggers in a set-based fashion. Looping through records is slow and in a trigger can bring your entire system to a halt while you loop through the million records that hada 10% price increase.

Thanks for the detailed reply. I would be using MySql 5, which has support for triggers. I've not used triggers before but for this application they seem like a good way of handling changes to data. If I understand correctly, in my case the trigger would most likely be activated on an UPDATE to a record - handling a chain of db events exclusively at the db end, thereby reducing PHP application code to handle these events. Also, the idea of audit tables sounds reasonable, with extra columns for logging dates etc.

kenny99 2010-02-03 10:52:28

ansaurus

tags:

views:

answers:

Advice on data capture management system

related questions