views:

1701

answers:

3

One of our clients has an application (FoxPro 9) running on top of a SQL Server 2005 backend. Intermittently, they are losing their ODBC connection with the SQL Server database. Below is the intial error information:

Err Msg: Connectivity error: [Microsoft][ODBC SQL Server Driver][DBNETLIB]ConnectionRead (recv()).

ODBC Err Msg: [Microsoft][ODBC SQL Server Driver][DBNETLIB]ConnectionRead (recv()).

SQL State: 01000

ODBC Err No: 10054

ODBC Handle: 1

FoxPro Error No: 1526

We cannot duplicate this error on command. We have tried any number of solutions to no avail. One such hardware base solution which we found was described in: http://support.microsoft.com/kb/942861/en-us

I mention this because it almost perfectly matches what we have been seeing. However, we have implemented all the workarounds listed in that posting (and in this one http://support.microsoft.com/kb/948496 ) - and the problem still continues.

This issue seems to show itself after the execution of long running queries, but we are not receiving any timeout errors, either from the application, or from SQL Server. I do not believe that this is the result of an idle timeout, because it sometimes occurs in the middle of an executing program.

I am not a hardware guy, but both the network, and the server (Windows Server 2003), appear to be fast and well designed. There are times however, when the database server is under significant stress.

If anyone has any suggestions on things we could try...please let us know!

+1  A: 

Just a shot in the dark, but have you tried running a trace and trying to capture error events as well as any tsql. This might provide some clues or help you to see a pattern.

Sam
We did try once...but of course it didn't happen while we had the trace on. There are too many users hitting the system for us to try a general trace. We have to have a very narrow scope, and hope that it occurs in that scope. We shoud probably try again.
Clinemi
Maybe just capture error events - I'm not sure if that would show you anything more than what you were getting. If you're not aware, 2005 profiler can correlate with perfmon data as well, so you could gather some stats and see if it happens during system stress.
Sam
I am going to give this a shot. It is a shot in the dark, but it may shine some light on what is happening.
Clinemi
A: 

Just a follow-up on this question...with a partial solution.

I did run a trace, in fact a number of them. What I found, is that there appears to be multiple causes for these errors. I was able find and fix one of them, but we still get this error in other places, and they do not show up on the traces that I have done.

So what was the deal with the one I did find? Well, from the trace I found that these ODBC errors appeared after another SQL Server error:

Error: 1203, Severity: 20, State: 1. Process ID 94 attempted to unlock a resource it does not own: OBJECT: 25:1699834390:0 . Retry the transaction, because this error may be caused by a timing condition. If the problem persists, contact the database administrator.

From the FoxPro code I found that an insert statement was causing this error... not always... but sometimes. In this insert, they were pulling all the fields from one table, and some of the fields from another table - into a third table. Every table in this database has an identity column called id_col, and the select statement that was populating the thrid table was returning two id_col fields.

insert into tablethree select a.*, b.price, b.item, id_col from tableone a, tabletwo b where a.item = ....

When we restructured the code so that only one id_col was being returned...the errors stopped.

To be honest there is one other possible contributor to this error that I fixed at the same time. There was another large/long query right before this one that was using Foxpro Rushmore syntax (e.g. "a.item+a.customer = lc_item+lc_customer") in a sql server query. We have had issues with this type of thing before, so it could be a contributor to the problem... but the evidence is highly in favor of the extra identity column being the cause.

Clinemi
A: 

Not sure if you have come across a complete solution, but have you looked into whether the network connection ever gets interrupted? One of the VFP programs I was developing started losing its SQL connection very frequently for users that were using laptops. It seemed that the laptops were temporary losing the network connection. Even if the connection is only gone for a couple seconds, the handle in VFP needs to be reset. Not sure if this is your exact issue, but sounded similar to me.

Yes we looked at the quality of the network connections, and as far as we can tell the network is pretty good...but I am still not entirely convinced.
Clinemi