views:

25

answers:

2

For some time now our flagship application has been having mysterious errors. The error message is the generic

[DBNETLIB][ConnectionWrite (send()).]General network error. Check your network documentation.

This is reliably reproduced by leaving the app open for the night and resuming work in the morning. Since it's a backend server app this is a normal scenario.

The funny thing is - we've migrated from SQL Server 7 to 2000 to 2008 and the issue is present on all of them. But what seems to matter is the OS on which we run the app. On WinXP it works fine, on Vista/7 it fails. So the problem is at the client end.

The results of Google on the error message cover a very wide spectrum of different causes (since this is a very generic error) and none of the scenarios found there are similar to ours.

So perhaps someone around here will know what the problem is in our case?

+1  A: 

Have you tried disabling SNP/TCP Chimneying?

Alex K.
No... but the error isn't intermittent. As long as you keep working with the server, all is fine. It appears **if and only if** there has been a prolonged (several hours at least) idle time.
Vilx-
+1  A: 

You should be able to reproduce this error condition on demand by:
1. Opening a database connection (in your client application)
2. Unplugging the network cable
3. Plugging network cable back in (wait until the network connection is restored)
4. Using the previously opened connection to query the database

As far as I can tell from experience, client side ADO code is not able to consistently determine if an underlying network connection is actually valid or not. Checking if the database connection is open (in the client code) returns true. However, performing any operations on that connection results in a General network error.

The connection pool appears to be able to determine when a connection goes 'bad' so it never returns a bad connection to the application. It simply opens a new connection instead.

So, if a database connection is kept alive for a long time (used or unused) by the application, the underlying TCP/IP connectivity can get broken.

The bottom line is that database connections should be closed and returned back to the the connection pool when not in use.

edit
Also, depending on the number of clients connecting to the db, not using the connection pool can cause another issue. You may hit the maximum number of sockets open on the server side. This is from memory. Once a connection is closed on the client side, the connection on the server goes into a TIME_WAIT state. By default, the server socket takes about 4 minutes to close, so it is not available to other clients during that time. The bottom line is that there is a limited number of available sockets on the server. Keeping too many connections open can create a problem.

One project I worked on easily hit this socket limit with around 120 users. A new 'feature' was added that absolutely hammered the server, and after a few hours of using the app, things would suddenly slow to a crawl for everyone. SQL server was not closing enough sockets in time for new connection requests. Although there are 65K sockets altogether, only the first 5000 are made available to the ADO (this is a default registry setting thing, so can be changed).

The number of sockets in TIME_WAIT state would slowly build up until the OS would not allocate any more. So clients had to wait until server side sockets closed and a new connection could then be created.

Chris Bednarski
The error message might be the same in both cases, but the underlying cause isn't. Why does the TCP/IP connection get broken? I thought that the SQL Server protocol even included an automatic ping functionality so that this doesn't happen. And why doesn't it happen on WinXP?
Vilx-
Nice addition, I'll keep it in mind, but unfortunately this isn't the case. :(
Vilx-
Intermittent network problems/errors do happen. To tell you the truth, I never got to the bottom cause of this problem either. Could be simple electrical interference, anything really.
Chris Bednarski
Pinging or polling the server periodically will not help in this case. The connection may have been broken and restored between the pings.
Chris Bednarski
That's just the thing - it **isn't intermittent**. It's completely reliably reproducible. I get it every morning when I come to work. And so do many of our clients. Note that just the first person to use the app gets the error. After that the connections get reset and everything works fine for the whole day. Until night comes again and the app stands idle for a couple of hours. It smells like a timeout somewhere but where and why?
Vilx-
@Vilx: OK. You got me convinced. You have two choices then. Either, keep looking for the timeout, or, update the software to use connection pooling. In my case, changing the way connections are handled by the software permanently fixed the problem.
Chris Bednarski