views:

1267

answers:

3

Background

I am working on a legacy small-business automation system (inventory, sales, procurement, etc.) that has a single database hosted by SQL Server 2005 and a bunch of client applications. The main client (used by all users) is an MS Access 2003 application (ADP), and other clients include various VB/VBA applications like Excel add-ins and command-line utilities.

In addition to 60 or so tables (mostly in 3NF), the database contains about 200 views, about 170 UDFs (mostly scalar and table-valued inline ones), and about 50 stored procedures. As you might have guessed, some portion of so-called "business logic" is encapsulated in this mass of T-SQL code (and thus is shared by all clients).

Overall, the system's code (including the T-SQL code) is not very well organized and is very refactoring-resistant, so to speak. In particular, schemata of most of the tables cry for all kinds of refactorings, small (like column renamings) and large (like normalization).

FWIW, I have pretty long and decent application development experience (C/C++, Java, VB, and whatnot), but I am not a DBA. So, if the question will look silly to you, now you know why it is so. :-)

Question

While thinking about refactoring all this mess (in a peacemeal fashion of course), I've come up with the following idea:

  1. For each table, create a "wrapper" view that (a) has all the columns that the table has; and (b) in some cases, has some additional computed columns based on the table's "real" columns.

    A typical (albeit simplistic) example of such additional computed column would be sale price of a product derived from the product's regular price and discount.

  2. Reorganize all the code (both T-SQL and VB/VBA client code) so that only the "wrapper" views refer to tables directly.

    So, for example, even if an application or a stored procedure needed to insert/update/delete records from a table, they'd do that against the corresponding "table wrapper" view, not against the table directly.

So, essentially this is about isolating all the tables by views from the rest of the system.

This approach seems to provide a lot of benefits, especially from maintainability viewpoint. For example:

  • When a table column is to be renamed, it can be done without rewriting all the affected client code at once.

  • It is easier to implement derived attributes (easier than using computed columns).

  • You can effectively have aliases for column names.

Obviously, there must be some price for all these benefits, but I am not sure that I am seeing all the catches lurking out there.

Did anybody try this approach in practice? What are the major pitfalls?

One obvious disadvantage is the cost of maintaining "wrapper" views in sync with their corresponding tables (a new column in a table has to be added to a view too; a column deleted from a table has to be deleted from the view too; etc.). But this price seems to be small and fair for making the overall codebase more resilient.

Does anyone know any other, stronger drawbacks?

For example, usage of all those "wrapper" views instead of tables is very likely to have some adverse performance impact, but is this impact going to be substantial enough to worry about it? Also, while using ADODB, it is very easy to get a recordset that is not updateable even when it is based just on a few joined tables; so, are the "wrapper" views going to make things substantially worse? And so on, and so forth...

Any comments (especially shared real experience) would be greatly appreciated.

Thank you!


P.S. I stepped on the following old article that discusses the idea of "wrapper" views:

The Big View Myth

The article advises to avoid the approach described above. But... I do not really see any good reasons against this idea in the article. Quite the contrary, in its list of good reasons to create a view, almost each item is exactly why it is so tempting to create a "wrapper" view for each and every table (especially in a legacy system, as a part of refactoring process).

The article is really old (1999), so whatever reasons were good then may be no longer good now (and vice versa). It would be really interesting to hear from someone who considered or even tried this idea recently, with the latest versions of SQL Server and MS Access...

+8  A: 

When designing a database, I prefer the following:

  • no direct table access by the code (but is ok from stored procedures and views and functions)
  • a base view for each table that includes all columns
  • an extended view for each table that includes lookup columns (types, statuses, etc.)
  • stored procedures for all updates
  • functions for any complex queries

this allows the DBA to work directly with the table (to add columns, clean things up, inject data, etc.) without disturbing the code base, and it insulates the code base from any changes made to the table (temporary or otherwise)

there may be performance penalties for doing things this way, but so far they have not been significant - and the benefits of the layer of insulation have been life-savers several times

Steven A. Lowe
+1  A: 

I agree with Steven's comment--primarily because you are using Access. It's extremely important to keep the pros/cons of Access in focus when re-designing this database. I've been there, done that with the Access front-end/SQL Server back-end (although it wasn't an ADP project).

I would add that views are nice for ensuring that data is not changed outside of the Access forms in the project. The downside is that stored procedures be required for all updates--if you don't already have those, they'd have to be created too.

ranomore
+3  A: 

You won't notice any performance impact for one-table views; SQL Server will use the underlying table when building the execution plans for any code using those views. I recommend you schema-bind those views, to avoid accidentally changing the underlying table without changing the view (think of the poor next guy.)

When a table column is to be renamed

In my experience, this rarely happens. Adding columns, removing columns, changing indexes and changing data types are the usual alter table scripts that you'll run.

It is easier to implement derived attributes (easier than using computed columns).

I would dispute that. What's the difference between putting the calculation in a column definition and putting it in a view definition? Also, you'll see a performance hit for moving it into a view instead of a computed column. The only real advantage is that changing the calculation is easier in a view than by altering a table (due to indexes and data pages.)

You can effectively have aliases for column names.

That's the real reason to have views; aliasing tables and columns, and combining multiple tables. Best practice in my past few jobs has been to use views where I needed to denormalise the data (lookups and such, as you've already pointed out.)

As usual, the most truthful response to a DBA question is "it depends" - on your situation, skillset, etc. In your case, refactoring "everything" is going to break all the apps anyways. If you do fix the base tables correctly, the indirection you're trying to get from your views won't be required, and will only double your schema maintenance for any future changes. I'd say skip the wrapper views, fix the tables and stored procs (which provide enough information hiding already), and you'll be fine.

Rick
Regarding derived attributes: Perhaps the best example is when computation of derived attribute involves an UDF. With a computed column in a table, any change of the UDF is real pain in the ass.
Yarik
Regarding frequencies of changes, additions, and deletions of columns: It obviously depends on the system. In our case changes happen much more frequently than additions and deletions.
Yarik
Good point about UDFs in computed columns, I hadn't considered that. You're right, not being able to alter the function without altering the computed column first would be annoying. I still think views should be created only when needed, rather than as a standard.
Rick