I have code that handles command acknowledgments from remote computers. Occasionally (roughly once every 14 days), the following line throws a NullReferenceException:

computer.ProcessCommandAcknowledgment( commandType );

What really bugs me is that I check for a null reference right before it, so I have no idea what's going on. Here's the full method, for what it's worth:

    public static void __CommandAck( PacketReader reader, SocketContext context )
    {
        string commandAck = reader.ReadString();

        Type commandType = Type.GetType( commandAck );

        Computer computer = context.Client as Computer;

        if (computer == null)
        {
            Console.WriteLine("Client already disposed. Couldn't complete operation");
        }
        else
        {
            computer.ProcessCommandAcknowledgment( commandType );
        }
    }

Any clues?

Edit: ProcessCommandAcknowledgment:

    public void ProcessCommandAcknowledgment( Type ackType )
    {
        if( m_CurrentCommand.GetType() == ackType )
        {
            m_CurrentCommand.Finish();
        }
    }
+2  A: 

Is it possible that ReadString() is returning null? This would cause GetType to fail. Perhaps you've received an empty packet? Alternatively, the string may not match a type and thus commandType would be null when used later.

EDIT: Have you checked that m_CurrentCommand is not null when you invoke ProcessCommandAcknowledgment?
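
To make those two suspicions concrete, here is a minimal sketch of a guarded version. The `Command`, `PingCommand`, and `AckParsing` types are hypothetical stand-ins, not the asker's real classes, and the field is allowed to be null here purely for illustration. It relies on two documented behaviours of `Type.GetType(string)`: a null name throws `ArgumentNullException`, and a name that matches no type returns null.

```csharp
using System;

// Hypothetical stand-ins for the types in the question.
public abstract class Command
{
    public bool Finished { get; private set; }
    public void Finish() { Finished = true; }
}

public sealed class PingCommand : Command { }

public sealed class Computer
{
    // Assumption: may be null in this sketch, unlike the asker's field.
    public Command CurrentCommand;

    // Returns true only when the acknowledgment matched the pending command.
    public bool ProcessCommandAcknowledgment(Type ackType)
    {
        Command current = CurrentCommand;   // read the field once
        if (current == null || ackType == null)
        {
            return false;                   // nothing to compare against
        }
        if (current.GetType() != ackType)
        {
            return false;                   // acknowledgment for another command
        }
        current.Finish();
        return true;
    }
}

public static class AckParsing
{
    // Folds "no name" and "unknown name" into a single null result,
    // so the caller never passes null to Type.GetType.
    public static Type ResolveCommandType(string name)
    {
        return string.IsNullOrEmpty(name) ? null : Type.GetType(name);
    }
}
```

With guards like these, a null `m_CurrentCommand` shows up as a `false` return value that can be logged, instead of an exception inside the call.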

tvanfosson
Nah, the nullref is thrown at the line I highlighted. The packets are 100% sure to be valid and complete (hash control, plus they're encrypted).
arul
Besides that, the command acknowledge method can never throw a nullref, as it's comparing the given type with the type bound to the computer, so a comparison to null is actually valid.
arul
No part of code ever sets it to null. In the ctor it defaults to Command.Invalid.
arul
+4  A: 

Based on the information you gave, it certainly appears impossible for a null ref to occur at that location. So the next question is "How do you know that the particular line is creating the NullReferenceException?" Are you using the debugger or stack trace information? Are you checking a retail or debug version of the code?

If it's the debugger, various combinations of settings can make it appear to report the NullRef in a different place. The main one that would do that is the Just My Code setting.

In my experience, I've found the most reliable way to determine the line an exception actually occurs on is to ...

  1. Turn off JMC
  2. Compile with Debug
  3. Debugger -> Settings -> Break on Throw CLR exceptions.
  4. Check the StackTrace property in the debugger window
JaredPar
I have a PDB detailed stack trace. The information is 100% correct.
arul
Oh well, worth a shot.
JaredPar
+2  A: 

I would bet money that there's a problem with your TCP framing code (if you have any!)

"PacketReader" perhaps suggests that you don't. Because, technically, it would be called "FrameReader" or something similar if you did.

If the two PCs involved are on a local LAN or something similar, that would probably explain the 14-day interval. If you tried this over the Internet, I bet the error would be much more frequent, especially if the WAN bandwidth was contended.

NathanE
Invalid packets are discarded and the clients who sent them are disconnected. First the CRC of the payload is checked, then the packets are decompressed and finally decrypted. If any of these steps fails, the client is disconnected.
arul
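
For readers wondering what "TCP framing" means in this answer: TCP delivers a byte stream, so a single read can return part of a message or several messages at once. Below is a minimal sketch of length-prefixed framing. The wire format (a 4-byte big-endian length followed by the payload) and the `FrameReader` name are assumptions for illustration, not the asker's actual protocol.

```csharp
using System;
using System.IO;

public static class FrameReader
{
    // Stream.Read may return fewer bytes than requested, so loop until
    // the buffer is full. Returns false if the stream ends first.
    private static bool TryReadExactly(Stream stream, byte[] buffer, int count)
    {
        int offset = 0;
        while (offset < count)
        {
            int read = stream.Read(buffer, offset, count - offset);
            if (read == 0)
            {
                return false;               // end of stream
            }
            offset += read;
        }
        return true;
    }

    // Assumed format: 4-byte big-endian length, then that many payload bytes.
    // Returns null when the stream ends before a complete frame arrives.
    public static byte[] ReadFrame(Stream stream, int maxLength)
    {
        byte[] header = new byte[4];
        if (!TryReadExactly(stream, header, 4))
        {
            return null;
        }
        int length = (header[0] << 24) | (header[1] << 16) | (header[2] << 8) | header[3];
        if (length < 0 || length > maxLength)
        {
            throw new InvalidDataException("Frame length out of range.");
        }
        byte[] payload = new byte[length];
        return TryReadExactly(stream, payload, length) ? payload : null;
    }
}
```

Checksums and encryption validate the bytes of a frame once it has been assembled; the loop in `TryReadExactly` is what guarantees the whole frame has been received in the first place.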
+1  A: 

If you have optimizations turned on, the reported line is likely a long way from where the exception actually happens.

Something similar happened to me a few years back.

leppie
Well, it's been working reliably with the very same compiler settings. It has always pointed to the location where the error was. Only in this particular case does there seem to be a problem.
arul
+1  A: 

Or else a possible thread race somewhere where context gets set to null by another thread. That would also explain the uncommonness of the error.

Joshua
Nope, only one thread accesses the netcode at a time.
arul
Well, so much for my idea.
Joshua
+1  A: 

Okay, there are really only a few possibilities.

  1. Somehow your computer reference is being tromped by the time you call that routine.

  2. Something under the call is throwing the null pointer dereference error but it's being detected at that line.

Looking at it, I'm very suspicious that the stack is getting corrupted, causing your computer automatic (local) variable to get mangled. Check the subroutine/method/function calls around the one you have trouble with; in particular, check that what you're making into a "Computer" item really is the type you expect.

Charlie Martin
Thank you, any clues how to debug it?
arul
These can be hard to debug. I'd try two things: (1) catch the null reference exception at that location, and get as much information as you can when it happens; (2) grep the code for calls to that routine, and examine the code at each one.
Charlie Martin
Oh, in that exception handler, have a look at what context.Client is really giving you.
Charlie Martin
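
The "catch it and collect as much information as you can" suggestion could look roughly like the sketch below. `AckDiagnostics`, `RunLogged`, and the `log` callback are hypothetical names introduced for illustration; nothing here comes from the asker's code base.

```csharp
using System;
using System.Text;

public static class AckDiagnostics
{
    // Names the runtime type of a value, or "<null>" when there is none.
    public static string Describe(string label, object value)
    {
        return label + " = " + (value == null ? "<null>" : value.GetType().FullName);
    }

    // Runs the call. On a NullReferenceException it logs the surrounding
    // state plus the full exception text, then rethrows with "throw;" so
    // the original stack trace is preserved.
    public static void RunLogged(Action call, object client, Type commandType, Action<string> log)
    {
        try
        {
            call();
        }
        catch (NullReferenceException ex)
        {
            StringBuilder report = new StringBuilder();
            report.AppendLine(Describe("context.Client", client));
            report.AppendLine("commandType = " + (commandType == null ? "<null>" : commandType.FullName));
            report.AppendLine(ex.ToString()); // message plus every stack frame
            log(report.ToString());
            throw;
        }
    }
}
```

In the handler from the question, the wrapped call would be `computer.ProcessCommandAcknowledgment( commandType )`, with `context.Client` passed as `client`. The first frame of the logged stack trace then shows whether the exception started at that line or somewhere underneath it.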
+1  A: 

What are the other thread(s) doing?

Edit: You mention that the server is single-threaded, but another comment suggests that only this portion is single-threaded. If that's the case, you could still have concurrency issues.

Bottom line here, I think, is that you either have a multi-thread issue or a CLR bug. You can guess which I think is more likely.

Mark Brackett
There are no other threads, the server is singlethreaded.
arul
Surely there are dozens of other threads doing something useful; the netcode, on the other hand, is single-threaded. Besides, I'm working with a local variable that can never be null at that point, unless it's GC'ed or whatever.
arul
@arul - Is your backing field declared volatile? Are you taking locks when you set that field? There can be slightly less obvious concurrency issues.
Mark Brackett
It's a local variable, which exists on stack and can't be declared as volatile.
arul
The backing field for Context.Client may need to be volatile.
Mark Brackett
Declaring it as volatile probably solved the problem. I mean, it hasn't happened since then.
arul
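
For reference, a minimal sketch of what that change might look like. `SocketContextSketch` and `ClientAccess` are hypothetical names, and `object` stands in for the real client type. One plausible explanation for why it helps, though the thread never confirms it, is that without `volatile` the JIT is allowed to re-read a field instead of reusing the local copy, so a null check on the local does not necessarily protect the later call.

```csharp
using System;

public sealed class SocketContextSketch
{
    // volatile gives reads acquire semantics and writes release semantics,
    // so the JIT may not merge, drop, or introduce extra reads of this field.
    private volatile object m_Client;

    public object Client
    {
        get { return m_Client; }
        set { m_Client = value; }
    }
}

public static class ClientAccess
{
    // Reads the property exactly once, then works only with the local copy.
    public static bool TryGetClient<T>(SocketContextSketch context, out T client) where T : class
    {
        client = context.Client as T;
        return client != null;
    }
}
```

Note that `volatile` only affects how individual reads and writes of the field behave. It does not make a check-then-use sequence atomic, which is why the sketch still copies the value into a local before testing it.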
+1  A: 

computer.ProcessCommandAcknowledgment( commandType );

Do you have debugging symbols to be able to step into this?

The null ref exception could be thrown inside ProcessCommandAcknowledgment and bubble up.

FlySwat
Yep. Besides that, ProcessCommandAcknowledgment accepts a null command.
arul