I'm trying to track down possible causes of my app not being able to connect to a db server.
I have a Windows service that connects to the database when it starts. The service runs on machines with a reliable wired network connection. It's installed with startup type Automatic, so normally it starts when Windows does, and in almost all cases this works fine.
However, on one set of XP machines (which I don't control) the database connection fails when the service starts during Windows startup. The standard exception is raised:
System.Data.SqlClient.SqlException: An error has occurred while establishing a connection to the server. When connecting to SQL Server 2005, this failure may be caused by the fact that under the default settings SQL Server does not allow remote connections. (provider: SQL Network Interfaces, error: 26 - Error Locating Server/Instance Specified)
On these machines, if a user is logged in and starts the service manually, it connects to the database correctly, which is pretty weird. So I guess the problem at Windows startup is one of:
- The service starts before the network is connected
- The service starts before the machine can connect to DNS to resolve the server name (if that's how db server names are resolved)
- There is a policy/firewall/etc on the machine that initially prevents outgoing connections
- something else...
The problem doesn't occur when a user manually starts the service, so something must have happened to resolve the problem. I guess this is either a process that runs at startup but hasn't finished when my service starts, or the fact that a user has logged in - possibly something in their logon script.
I don't have direct access to the machines, so need to come up with a good idea of what the problem could be and a way to identify if that's correct. I can't repeatedly deploy diagnostic programs so need to be thorough the first time.
So first question is: does anyone know of desktop policies, network policies or software that could cause this situation? Second question: what can I do to diagnose exactly what is happening?
I'm thinking of creating a new diagnostic service, which will also be installed to start automatically, and which will perform various actions to see what is going on and log the results, e.g.:
- run "ipconfig /all" (to see if there is a network connection)
- ping the database server by IP address (to see if it can find the server)
- ping the database server by name
- check the HKLM registry key that contains the server name (in case the HKLM registry is later updated)
- create a SQL connection to the database (to find out when this starts working)
- repeat these steps every few seconds.
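For illustration, here is a minimal sketch of that diagnostic loop. It is written in Python only to keep it short; the real service would presumably be .NET, and the same steps would carry over. Everything named here is an assumption rather than something from the actual setup: `DB_HOST`, `DB_IP`, `DB_PORT`, `LOG_PATH` and `INTERVAL_SECONDS` are hypothetical placeholders. The sketch also swaps the real SQL connection for a plain TCP connect and leaves out the registry check. Port 1433 is only the default-instance port; error 26 is usually associated with locating a named instance, which goes through the SQL Server Browser service on UDP 1434, so this TCP check may not tell the whole story.

```python
import datetime
import socket
import subprocess
import time

# Hypothetical placeholders: substitute the real values for the target site.
DB_HOST = "dbserver.example.local"
DB_IP = "192.0.2.10"
DB_PORT = 1433            # assumption: default instance on the default port
LOG_PATH = "startup_diag.log"
INTERVAL_SECONDS = 5


def command_check(args, timeout=30):
    """Run an external command; return (ok, output) and never raise."""
    try:
        proc = subprocess.run(args, stdout=subprocess.PIPE,
                              stderr=subprocess.STDOUT, timeout=timeout,
                              universal_newlines=True)
    except (OSError, subprocess.TimeoutExpired) as exc:
        return False, "failed to run: %r" % (exc,)
    # Flatten the output so each check stays on one log line.
    return proc.returncode == 0, proc.stdout.strip().replace("\n", " | ")


def resolve_name(host):
    """Resolve a host name; return (ok, addresses or error text)."""
    try:
        infos = socket.getaddrinfo(host, None)
    except OSError as exc:  # socket.gaierror is a subclass of OSError
        return False, str(exc)
    return True, ", ".join(sorted({info[4][0] for info in infos}))


def tcp_connect(host, port, timeout=5.0):
    """Open and immediately close a TCP connection; return (ok, detail)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, "connected"
    except OSError as exc:
        return False, str(exc)


def run_checks(checks):
    """Run (label, callable) pairs and return one timestamped line per check."""
    stamp = datetime.datetime.now().isoformat()
    lines = []
    for label, check in checks:
        try:
            ok, detail = check()
        except Exception as exc:  # a diagnostic must never die on one check
            ok, detail = False, "unexpected error: %r" % (exc,)
        lines.append("%s %s %s %s" % (stamp, label, "OK" if ok else "FAIL", detail))
    return lines


if __name__ == "__main__":
    checks = [
        ("ipconfig", lambda: command_check(["ipconfig", "/all"])),
        ("ping-ip", lambda: command_check(["ping", "-n", "1", DB_IP])),
        ("ping-name", lambda: command_check(["ping", "-n", "1", DB_HOST])),
        ("resolve", lambda: resolve_name(DB_HOST)),
        ("tcp-ip", lambda: tcp_connect(DB_IP, DB_PORT)),
        ("tcp-name", lambda: tcp_connect(DB_HOST, DB_PORT)),
    ]
    while True:
        # Reopen the log on every pass so each round is flushed to disk
        # even if the machine is restarted mid-run.
        with open(LOG_PATH, "a") as log:
            for line in run_checks(checks):
                log.write(line + "\n")
        time.sleep(INTERVAL_SECONDS)
```

Keeping name resolution and the TCP connect as separate checks, and doing each by both IP and name, should make it possible to see from the log which step starts succeeding first and roughly when relative to the user logging in.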
I'll have this service installed, restart the machine, then after a period have a user log in. The diagnostics should show some useful info... but anyone have a better idea or additional suggestions?
I have also tried changing the service to run as a user account with suitable privileges, but that didn't resolve the problem. Note that the Local System account does have sufficient privileges to connect to the db server, since the service works fine when manually started, so it's not the same as this question.
UPDATE: troublesome machines are running Win XP.
UPDATE: this article gives a good discussion of error code 26