views:

95

answers:

6

This requires a bit of background. I'm creating a web app (ASP.NET/VB.NET with SQL Server backend) that will accept an application to receive funding, and this form has contact info fields for three people. Because the information is the same for all three people (name, email, phone), I chose to store contact info in a table separate from the applications. In the application table, there is an ID field for each of these three contacts that points to a contact in the contact table.

However, this presents an interesting question/problem in doing my CRUD stuff. The only ways I can think of to create, update, and retrieve information for both the application and all three contacts in one SQL transaction require very complicated stored procedures. In other words, it would be much simpler (for me) to retrieve this information via multiple transactions. However, since this information is never needed independently, I would ALWAYS be doing multiple transactions to get information for one application.

So my questions:

  1. Is this design overkill? There will never be more than three contact people per application, and there MUST be exactly three. Am I over-complicating by removing this information to a separate table?
  2. What is the real cost of doing several SQL transactions vs. writing my complex stored procedures and only needing one transaction?
  3. Generally speaking, what is the cost of a SQL transaction to a web application using ADO.NET?

Thanks for bearing with that long-winded explanation.

*EDIT*

After reading some of your responses, it appears I am using the term "transaction" wrong. What I am actually curious about is the cost of executing multiple queries across a single connection, versus doing one query. Sorry for the misunderstanding.

A: 

You're using the term 'transaction' a bit incorrectly here. I think what you're referring to is a simple database query, ACID transactions aide.

That said, it's difficult to say. Connection pooling, enabled by default, will prevent the need to open three/four separate connections.

You'd likely not even need multiple queries to get the data, and can join the two tables to retrieve it, such as:

SELECT app.field1, app,field2, contact1.name, contact2.name, contact3.name
FROM Applications AS app
INNER JOIN Contacts AS contact1 ON app.contact1ID = contact1.id
INNER JOIN Contacts AS contact2 ON app.contact2ID = contact2.id
INNER JOIN Contacts AS contact3 ON app.contact3ID = contact3.id

or something similar, given your situation (adding new fields to the select list from app, returning more data from each contact table, etc. If the tables are indexed properly, the lookup are very quick, and will likely not impact performance at all.

If you're inserting/updating the data, then yes, you'll need multiple queries to do that, but the selecting of the data can occur in one.

Of course, the other option would be to put the data back into the original application table, removing the need for the second table in the first place. At which point, no extra joins or anything else is needed.

Brian M.
A: 

Use 1 stored procedure to do everything so you only need to open 1 sql connection.

Every variable in your web form maps to a parameter in the stored procedure (including the hidden contact and application keys which you default to 0 for new apps)

In the procedure, if the record keys are 0, the procedure adds records to the database and retrieves the keys quickly using scope_identity(). If you send non-zero keys to the procedure it updates the record. The procedure then returns all data back having all the stored procedure variables declared as output.

You can then wrap the procedure inside a sql begin/end transaction so it's an all or nothing sql command.

mike42
I don't understand - why would we need to bring stored procedures into this?
Jason Swett
Because the answer is bad.
TomTom
+1  A: 
  1. This is not overkill. This is a good normalization and a good application of the Don't-Repeat-Yourself principle. The requirements of the system may change and you may need 2 or 4 contacts in the future instead of 3.
  2. You don't need to do any complex stored procedures. You're not doing anything out of the ordinary here. You don't need stored procedures at all.
  3. Don't know.

When you say "transaction" I think you just mean "query." When I wanted to get the data for one application, I would use two separate, simple queries:

SELECT *
  FROM application

and

SELECT *
  FROM contact

It's two separate queries but big deal. The more important part is designing the database in a way that makes sense.

Jason Swett
Also, when you're inserting these records, you should definitely use a transaction. You can read about that here: http://jasonswett.net/why-you-should-use-database-transactions/
Jason Swett
+4  A: 

Transactions are needed to move the database from one consistent state into another consistent state. The 'consistent' here applies to the application view of the database. The typical example is the money transfer between two accounts: you have to debit one account and credit another account. In between these two operations the database is inconsistent, there is some money that have 'vanished' (the sum debited from one account is nowhere). But at the end of the transaction, the database is again consistent. You need a transaction to span these two operations in order to protect the readers from viewing an inconsistent database, and in order to ensure a consistent database in case of a crash.

You say that in order to process multiple items in your logic in a single transaction you require complicated procedures. That is not true, the transaction scope is orthogonal to the request scope: a client can start a transaction, do 3 operations by invoking 3 procedures in 3 calls, then commit the transaction. It is not required that all the operations be done in one single stored procedure.

So transaction should not induce significant procedure overhead. In fact, a procedure would be oblivious to transactions. A well written procedure should behave correctly when is invoked inside of a transaction as well as when is invoked w/o a transaction. See Exception handling and nested transactions for a template for procedures that behave correctly in presence of transactions and exceptions.

An finally, what is the real cost of a transaction? Transaction write data (read aren't really needing transactions) and as such they lock the rows modified. Reads normally block when attempting to read these blocked rows, and then the longer a transaction is, the longer it locks rows, and the more reads it blocks. But there is a really simple solution: snapshot reads. Snapshot reads are really a magic pixie dust, they allow applications to go ahead unhindered by blocked rows, because a reader can always read the row version prior to the update that is blocking the row. See Using Row Versioning-based Isolation Levels.

So the conclusion is simple: transactions have no cost. Simply because there is no alternative to transactions. Is not an issue of 'X is slower that Y', is an issue of 'X is the only correct alternative'.

Updated

After your edit: the cost oh having multiple requests vs. one requests can be significant. A round-trip to the server (ie. issuing a request on an open connection) has a fixed cost. If you do multiple round-trips then you pay this fixed cost on each request. If you do one single request with multiple procedure invocations then this fixed cost is only payed once. On very hot systems, this difference is measurable and has a cost on overall performance. But I'm talking about really hot systems, as in thousands of requests per second. The solution is usually not to complicate the procedures, but issue multiple procedure calls in one single request:

SqlCommand cmd = new SqlCommand(@"
exec usp_firstProc @arg1, @arg2;
exec usp_secondProc @arg2, @arg3;
exec usp_thirdProc @arg1, @arg3;", connection);
cmd.Parameters.Add("@arg1", someValue);
...
cmd.ExecuteNonQuery();

But I must say that this solution is not really required unless you have a really hot workload. As a back-of-the-envelop rule, for anything under 1000 requests/sec, I would consider the benefits of clear code to outweigh the performance benefits.

One thing different though if you have to open a new connection for each request. The login handshake is really expensive, can be measure din hundreds of ms. But the solution is trivial: connection pooling (is enabled by default in ADO.Net) and don't discard connections in the app prematurel, keep the connection and reuse it until the entire unit of work is finished.

Remus Rusanu
+1 and accepted! This is a great answer! You clearly know your stuff :)
Ender
A: 

Couple thoughts for you...

If your current design is two tables, first table (applciation) as

ID, contact_id1,contact_id2,contact_id3,otherinfo

second table as contact is simply

contact_id , info

Biggest advantage on this format is if you have 1 contact that is on multiple applications...Conatact 'Bob' is contact_1 on 8 different applcations means bob's info is only recorded once and only his ID is recorded on the application table. Biggest restriction is 3 contacts per application is permanently coded

Alternatively..application can be defined as application_ID, Info and conact as application_ID, id_sequence, info

The advantage of this setup is an infinate number of contacts can be stored for each application.

Although everything to meet your need...if you always have 3 contacts, the second option here is sorta unnessacary. If you're not planning to have more than 1000 records around, having all the info stored in one table isn't the worst idea (option one primarily eliminates redundant data). Answer to your question 1...it's overkill in some cases, but the answer will depend on your current requirements and future requirements. From a 'best practice' design, the 2 table setup is preferential.

Just a comment for Question # 2 - It'll be a simpler stored proc...I can't see it being too complicated with your setup. You could probably create an updateable view (Not sure what limitation or problems might arise there though).

M.E.
A: 

Every single query will cost you one client-database-server round trip time in addition to the pure processing cost. So, you can save on round trip times if you manage to batch simple single queries.

Peter G.