tags:

views:

294

answers:

3

I'm responsible for some embedded software that has to work with a customer's proprietary TCP interface (also embedded, but running under a well-known and well-regarded RTOS). The connection never gets through the three-way handshake, even though my HTTP interface, etc., all work fine, and I can communicate using the custom protocol with a program running on my PC.

Looking at the Wireshark captures, his side initiates by sending a SYN, I reply with a SYN-ACK, and then he immediately sends a RST, so it looks like the problem is on his end. Is my analysis correct?

Here's a typical three-packet example of the problem, with the MAC addresses anonymized (the real MAC addresses are valid). Sorry about pasting the raw hex; if anybody's got a better idea of how to put the Wireshark capture up, I'm certainly amenable.

63  2009-06-29 13:07:49.685057 10.13.91.2 10.13.92.3 TCP 1024 > 49151 [SYN] Seq=0 Win=8192 Len=0 MSS=1460 WS=0 TSV=194 TSER=0

0000   f1 f1 f1 00 03 09 ab ab ab 60 10 89 08 00 45 00  
0010   00 3c 00 68 40 00 40 06 6f 35 0a 0d 5b 02 0a 0d  
0020   5c 03 04 00 bf ff 7d b3 81 44 00 00 00 00 a0 02  
0030   20 00 9c 2f 00 00 02 04 05 b4 01 03 03 00 01 01  
0040   08 0a 00 00 00 c2 00 00 00 00  

64  2009-06-29 13:07:49.685375 10.13.92.3 10.13.91.2 TCP 49151 > 1024 [SYN, ACK] Seq=0 Ack=1 Win=1460 Len=0

0000   ab ab ab 60 10 89 f1 f1 f1 00 03 09 08 00 45 00  
0010   00 28 00 02 00 00 64 06 8b af 0a 0d 5c 03 0a 0d  
0020   5b 02 bf ff 04 00 d4 ff ff ff 7d b3 81 45 50 12  
0030   05 b4 47 07 00 00 00 00 00 00 00 00  

65  2009-06-29 13:07:49.685549 10.13.91.2 10.13.92.3 TCP 1024 > 49151 [RST] Seq=1 Win=0 Len=0

0000   f1 f1 f1 00 03 09 ab ab ab 60 10 89 08 00 45 00  
0010   00 28 00 6a 00 00 40 06 af 47 0a 0d 5b 02 0a 0d  
0020   5c 03 04 00 bf ff 7d b3 81 45 00 00 00 00 50 04  
0030   00 00 21 c9 00 00 00 00 00 00 00 00
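
For anyone who wants to double-check the flags and sequence numbers without loading a capture, here is a minimal Python sketch that decodes the TCP header fields straight from the three hex dumps above. The function name `parse_tcp` and the dictionary layout are only illustrative; it assumes plain Ethernet II framing carrying IPv4, which is what the dumps show.

```python
import struct

# The three frames exactly as pasted above, with the offset column stripped.
FRAME_63 = bytes.fromhex(
    "f1 f1 f1 00 03 09 ab ab ab 60 10 89 08 00 45 00 "
    "00 3c 00 68 40 00 40 06 6f 35 0a 0d 5b 02 0a 0d "
    "5c 03 04 00 bf ff 7d b3 81 44 00 00 00 00 a0 02 "
    "20 00 9c 2f 00 00 02 04 05 b4 01 03 03 00 01 01 "
    "08 0a 00 00 00 c2 00 00 00 00"
)
FRAME_64 = bytes.fromhex(
    "ab ab ab 60 10 89 f1 f1 f1 00 03 09 08 00 45 00 "
    "00 28 00 02 00 00 64 06 8b af 0a 0d 5c 03 0a 0d "
    "5b 02 bf ff 04 00 d4 ff ff ff 7d b3 81 45 50 12 "
    "05 b4 47 07 00 00 00 00 00 00 00 00"
)
FRAME_65 = bytes.fromhex(
    "f1 f1 f1 00 03 09 ab ab ab 60 10 89 08 00 45 00 "
    "00 28 00 6a 00 00 40 06 af 47 0a 0d 5b 02 0a 0d "
    "5c 03 04 00 bf ff 7d b3 81 45 00 00 00 00 50 04 "
    "00 00 21 c9 00 00 00 00 00 00 00 00"
)

FLAG_BITS = [(0x01, "FIN"), (0x02, "SYN"), (0x04, "RST"),
             (0x08, "PSH"), (0x10, "ACK"), (0x20, "URG")]

ETH_HEADER_LEN = 14  # Ethernet II: dst MAC, src MAC, EtherType


def parse_tcp(frame):
    """Return the basic TCP header fields of an Ethernet II / IPv4 frame."""
    ip_header_len = (frame[ETH_HEADER_LEN] & 0x0F) * 4  # IHL is in 32-bit words
    tcp = frame[ETH_HEADER_LEN + ip_header_len:]
    sport, dport, seq, ack, off_flags, window = struct.unpack("!HHIIHH", tcp[:16])
    return {
        "sport": sport,
        "dport": dport,
        "seq": seq,                      # raw value, not Wireshark's relative one
        "ack": ack,
        "header_len": (off_flags >> 12) * 4,
        "flags": [name for bit, name in FLAG_BITS if off_flags & bit],
        "window": window,
    }


if __name__ == "__main__":
    for number, frame in ((63, FRAME_63), (64, FRAME_64), (65, FRAME_65)):
        print(number, parse_tcp(frame))
```

Decoded this way, frame 64 acknowledges the raw sequence number of frame 63 plus one, and the RST in frame 65 carries exactly that acknowledged sequence number, so the sequence arithmetic of the exchange is at least self-consistent.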
A: 

First of all, those aren't valid station (unicast) MAC addresses: if the least significant bit of the first octet is set (first octet & 0x01 is nonzero), the address is a multicast MAC. See http://en.wikipedia.org/wiki/MAC_address
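
The multicast test is a single bit check on the first octet. A minimal Python sketch, assuming colon-separated notation (the helper name `is_multicast_mac` is made up for illustration):

```python
def is_multicast_mac(mac):
    """True if the group bit (least significant bit of the first octet) is set."""
    first_octet = int(mac.split(":")[0], 16)
    return bool(first_octet & 0x01)


if __name__ == "__main__":
    # The two addresses from the capture in the question, plus an arbitrary
    # address whose first octet is 0x00 for comparison.
    for mac in ("f1:f1:f1:00:03:09", "ab:ab:ab:60:10:89", "00:11:22:33:44:55"):
        print(mac, "multicast" if is_multicast_mac(mac) else "unicast")
```

Both addresses in the posted capture have the group bit set (0xf1 and 0xab are odd), which is why they stand out.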

I suspect sskuce has scrubbed the MACs.
nik
In an embedded environment, I think it's possible that he's arbitrarily chosen "random" MACs that are causing him problems. At least, that's happened here, and led to the rule that the high-order byte in made-up MACs shall be 0 :)
Whoops, I did forget to mention that I'd scrubbed the MACs. Both sides have their own OUIs, so I didn't want them to be identifiable.
sskuce
A: 

If you're not using fancy stuff on your side, like a custom TCP stack or raw sockets, I'd suspect the "proprietary TCP interface".

Has this ever worked with that client? Does it work with other clients?

Eric
I am running a web server that works fine, at least with all the browsers I've tried it with, and some .NET programs that directly communicate with port 80.
sskuce
+1  A: 

If both of you are using standard RTOS implementations, it is unlikely that the TCP stack has a problem. Or did you say the TCP stack is locally implemented?

If his client sends a SYN properly and you reply with a SYN+ACK, then either your SYN+ACK is not well formed (though I could not see anything wrong with it yet) or, as you suspect, his TCP stack did not accept the SYN+ACK properly. However, if these are standard implementations, that is unlikely.

So, what more can you do?

  • Since it is the TCP handshake we are checking, you can just make him connect to any other machine at your end that is listening on the desired port

    • This will check his implementation (it's good if the 3-way handshake completes).
  • You can check your TCP stack with a TELNET connect to the port from another local machine

    • This will check your implementation (good if 3-way completes).
  • If both these things are fine, we need to suspect the network path

    • For example, could there be some firewall not allowing the communication and actively sending a RST to you?
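
For the first check, something as small as the following Python sketch would do as the "other machine" listener. It is only an illustration, not a stand-in for your protocol: the helper names are made up, and port 49151 is taken from the capture in the question, so adjust it if the real target port differs.

```python
import socket


def make_listener(port, host="0.0.0.0"):
    """Create a TCP socket listening on host:port."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind((host, port))
    listener.listen(1)
    return listener


def serve_once(listener):
    """Accept one connection and echo back whatever arrives until the peer closes."""
    # accept() only returns a connection whose 3-way handshake has completed.
    conn, peer = listener.accept()
    print("handshake completed with", peer)
    with conn:
        while True:
            data = conn.recv(1024)
            if not data:
                break
            conn.sendall(data)


if __name__ == "__main__":
    listener = make_listener(49151)  # port the SYN in the capture was aimed at
    while True:
        serve_once(listener)
```

If his device gets "handshake completed" out of this but still resets against your device, compare the two SYN+ACK packets in the sniffer to see what differs.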
nik
My side is a "bare metal" device with no RTOS, but we're using an open source TCP stack that's been around a while, and like I said, it does work with other machines and other protocols, just not this machine with this protocol. I just connected to my device with telnet, and it worked. There should be nothing in the network that's screwing with us - the only thing between him and me is a Windows PC with two NICs bridged so we can use it to sniff the traffic, which doesn't give us issues with any other devices. I'll see about writing a PC-based implementation of the protocol for him to test.
sskuce
Well, if the TCP 3-way handshake is not working for him, he is far from protocol testing. You should just run a TELNET server on the expected protocol port on some PC for him. Or maybe pick up a TCP echo server sample from somewhere and rig it up on the PC at the desired port.
nik
Writing an implementation of the protocol on a PC using, e.g. the .NET TCP sockets API will allow him to test his connection as well as testing the protocol. It's more difficult for him to get under the hood of his device - he didn't develop the firmware for his device, he just writes applications that run on it, using a proprietary language. That means I have to think of everything I can to leverage my limited ability to change/eliminate the variables in this puzzle.
sskuce
Once you reach the protocol testing level, that is the agreeable way to go. But, don't stop him from debugging this 3-way with simpler things while you emulate for him.
nik
Just a thought: your description of an open source stack on bare metal sounds suspiciously like the Microchip TCP/IP stack on a PIC. If so, that stack, especially in older versions, had some issues of this type. Check out the TCP/IP forum on www.microchip.com to determine whether this is a related issue.
Mark
Try eliminating the two-NIC Windows PC. Connect one NIC of the Windows PC, your system, and the other system to an old 10BASE-T hub (not a switch, a HUB) and sniff that way. It's less intrusive and less likely to accidentally cause problems.
Michael Kohne