views:

821

answers:

3

I'm trying to track down possible causes of my app not being able to connect to a db server.

I have a windows service that connects to the database when it starts. The service runs on machines with a reliable wired network connection. It's installed with startup Automatic so normally it starts when windows does, and in almost all cases this works fine.

However, with one set of XP machines (that I don't have control of) the database connection fails when the service starts up with windows starting. The standard exception is raised:

System.Data.SqlClient.SqlException: An error has occurred while establishing a connection to the server. When connecting to SQL Server 2005, this failure may be caused by the fact that under the default settings SQL Server does not allow remote connections. (provider: SQL Network Interfaces, error: 26 - Error Locating Server/Instance Specified)

On these machines if a user is logged in and starts the service manually it connects to the database correctly, which is pretty weird. So I guess the problem at windows startup is either:

  • The service starts before the network is connected
  • The service starts before the machine can connect to DNS to resolve the server name (if that's how db server names are resolved)
  • There is a policy/firewall/etc on the machine that initially prevents outgoing connections
  • something else...

The problem doesn't occur when a user manually starts the service, so something must have happened to resolve the problem. I guess this is either a process that runs at startup but hasn't finished when my service starts, or the fact that a user has logged in - possibly something in their logon script.

I don't have direct access to the machines, so need to come up with a good idea of what the problem could be and a way to identify if that's correct. I can't repeatedly deploy diagnostic programs so need to be thorough the first time.

So first question is: does anyone know of desktop policies, network policies or software that could cause this situation? Second question: what can I do to diagnose exactly what is happening?

I'm thinking of creating a new diagnostic service which will also be installed to start automatically and will perform various actions to see what is going on, and log this info. eg:

  • run "ipconfig /all" (to see if there is a network connection)
  • ping the database server by IP address (to see if it can find the server)
  • ping the database server by name
  • check the HKLM registry key that contains the server name (in case the HKLM registry is later updated)
  • create sql connection to the database (to find out when this starts working)
  • repeat these steps every few seconds.

I'll have this service installed, restart the machine, then after a period have a user log in. The diagnostics should show some useful info... but anyone have a better idea or additional suggestions?

I have also tried changing the service to run as a user account with suitable privileges, but that didn't resolve the problem. Note that the Local System account does have sufficient priveleges to connect to the db server, since the service works fine when manually started. so it's not the same as this question.

UPDATE: troublesome machines are running Win XP.

UPDATE: this article gives a good discussion of error code 26

+2  A: 

Several hints:

  1. Maybe you should set a dependency from your service to another Windows service (like SQL Server or DTC) which would make sure that these services are started before your service starts.

  2. Another option (depending on the OS you're running) is delaying auto-start services, take a look at http://msdn.microsoft.com/en-us/magazine/cc164252.aspx

  3. Third option: simply put some Thread.Sleep (or polling loop) to your service thread before it connects to the DB for the first time.

Igor Brejc
Regarding #1: what service could i use on a normal pc (won't have SQL or DTC) to ensure the network is available first?
Rory
#2: unfortunately not possible as they're XP
Rory
#3: yes, but ideally i'd like to identify the cause of the problem and change that rather than my app. Or change both.
Rory
I'm not sure it's the networking issue, but if it is, I think you should first try with the #3 - you could simply to a ping (http://msdn.microsoft.com/en-us/library/system.net.networkinformation.ping.aspx) to the DB server in a loop, log that to a file + when the DB server responds, exit the loop.
Igor Brejc
Put the DNS Client as a dependency. this will catch both DNS and TCP/IP in one as the DNS Client is dependent on the TCP/IP service. or combine #1 and #3, and sleep for 60 seconds after waiting on the DNS Client as a dependency before trying.
Moose
BTW: are you sure you don't have DTC service on your XP? Look for "Distributed Transaction Coordinator". As for changing your app: #1 can only be done by changing (and reinstalling) the Windows service code. So #3 is still the best option.
Igor Brejc
+3  A: 

Is it possible that these machines have some kind of firewall installed on them that hasn't spun up yet by the time your service starts? I've run into some cases where a software firewall will completely lock down internet connectivity until the service is fully started and it can monitor traffic properly - if this is the case, and your service starts up faster than the firewall, you could see what appears to your application to be a disconnected network.

rwmnau
it's possible - I will ask their administrator
Rory
+2  A: 

We have encountered just this issue before. The way we solved the problem was to have a start thread that did not exit until a connection could be made to the database. We would attempt the connection every so often and log the fact that we could not connect. Once we established the connection we would continue the service.

In otherwords, let your service start normally, but don't let it attmept to perform any work until it knows if it can connect to the database. Do be careful if you decide to log every connection failure as you could fill up a machine with useless log entries. We would log the first connection failure and not again unless the service was restarted.

Of course if you database is always local to your service, the better solution is to create a dependency to the database windows service, but it doesn't sound like your database is local.

Daniel
thanks. did you work out at all what was the cause of the problem?
Rory
For us the issue was we just couldn't guarantee the database server was going to be ready when are service was. Our database could be local or not, but we didn't care. If we couldn't talk to the database we knew our service could not work. So we just waited until we could.
Daniel