Is there a problem using VMware on Windows to host a virtual Linux box running iptables? I have a configuration that seems to work on physical hardware but is flaky under VMware.

I'm using VMware to run a virtual Linux 2.6.24 machine on a Windows 2003 Server host. The Linux application is essentially a NATting router that runs iptables. The rules in the nat table include:

Chain foo_pre
 target     prot opt in  out  source      destination
 LOG        all  --  *   *    0.0.0.0/0   0.0.0.0/0     [options here]
 LOG        all  --  *   *    0.0.0.0/0   10.10.1.33    [options here]
 DNAT       all  --  *   *    0.0.0.0/0   10.10.1.33    tcp dpt:80 to:192.168.0.33:8080

Chain PREROUTING
 target     prot opt in  out  source      destination
 foo_pre    all  --  *   *    0.0.0.0/0   0.0.0.0/0

I can see the incoming packets to 10.10.1.33:80 with tcpdump, and the first LOG rule generates messages. However, neither the second LOG nor the DNAT rule registers the packets on its packet counter, the second LOG generates no messages, and tcpdump shows no packets to 192.168.0.33.
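
As a sanity check on what the listing above should do, here is a toy Python model of the foo_pre chain. It is only a sketch: it assumes the usual first-match semantics (LOG is non-terminating, DNAT ends traversal of the chain), and the `Packet`, `Rule`, and `traverse` names are made up for illustration rather than taken from any iptables API.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import Optional, Tuple


@dataclass
class Packet:
    proto: str
    dst: str
    dport: int


@dataclass
class Rule:
    target: str                          # "LOG" or "DNAT"
    dst: str = "0.0.0.0/0"               # destination match, as in the listing
    proto: Optional[str] = None          # None means "any protocol"
    dport: Optional[int] = None          # None means "any port"
    to: Optional[Tuple[str, int]] = None # rewrite target for DNAT
    hits: int = 0                        # stand-in for the packet counter

    def matches(self, pkt: Packet) -> bool:
        if ip_address(pkt.dst) not in ip_network(self.dst):
            return False
        if self.proto is not None and pkt.proto != self.proto:
            return False
        if self.dport is not None and pkt.dport != self.dport:
            return False
        return True


def traverse(chain, pkt: Packet) -> Packet:
    """Walk the chain in order; LOG falls through, DNAT rewrites and stops."""
    for rule in chain:
        if not rule.matches(pkt):
            continue
        rule.hits += 1
        if rule.target == "DNAT":
            new_dst, new_port = rule.to
            return Packet(pkt.proto, new_dst, new_port)
    return pkt


foo_pre = [
    Rule("LOG"),
    Rule("LOG", dst="10.10.1.33"),
    Rule("DNAT", dst="10.10.1.33", proto="tcp", dport=80,
         to=("192.168.0.33", 8080)),
]

result = traverse(foo_pre, Packet("tcp", "10.10.1.33", 80))
print(result, [r.hits for r in foo_pre])
```

In this model a TCP packet to 10.10.1.33:80 bumps all three counters and leaves with destination 192.168.0.33:8080, which is the behavior I expected from the real rules and not what I observe.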

The eth0 adapter is on the 10.10.0.0/16 network with a default gateway of 10.10.1.1; it has a secondary address of 10.10.1.33/32. /proc/sys/net/ipv4/conf/eth0/forwarding is set to 1.
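
For anyone who wants to double-check the forwarding flag on a similar setup, a minimal sketch follows. It assumes the standard Linux layout, where the per-interface flag lives at /proc/sys/net/ipv4/conf/<iface>/forwarding; the `forwarding_enabled` helper and its `root` parameter are hypothetical names used here only so the function can be pointed at a different directory.

```python
from pathlib import Path


def forwarding_enabled(iface, root="/proc/sys/net/ipv4"):
    """Return True/False for the interface's forwarding flag, or None if unreadable."""
    path = Path(root) / "conf" / iface / "forwarding"
    try:
        return path.read_text().strip() == "1"
    except OSError:
        # File missing or unreadable (e.g. not Linux, or no such interface).
        return None


if __name__ == "__main__":
    # "all" is the global entry; "eth0" is the interface from the question.
    for name in ("all", "eth0"):
        print(name, forwarding_enabled(name))
```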

Is VMware the culprit, or am I missing something? Thanks!


Update: we've simplified the test environment. No NAT rules at all, just a Linux VM running under a Win2k3 Server host. Test steps:

  1. VM is bridged to host NIC. VM and host are on the same subnet, with the same default gateway as above.

  2. VM communicates with devices both on and off its subnet: ICMP, TCP, UDP. Communication is bidirectional: it doesn't matter which equipment initiates it.

  3. Engineer power-cycled the default gateway while poking at the system.

  4. VM now communicates only with devices on its subnet. Any attempt to communicate through the gateway to the same equipment from Step 2 fails to put packets on the wire. tcpdump on eth0 on the VM shows outgoing packets with no response; Wireshark on the host shows nothing on the physical NIC.

  5. Stopping and restarting the VM does not change its behavior. Stopping the VM and replacing it with a different VM with appropriate IP address, etc. does not change the behavior.

  6. The Win2k3 host continues to communicate normally, both on and off its subnet.

I can only conclude from this that "something happens" between the VM and the host: in the VMware drivers, or in the host's network stack. I'm off to scour the web again... It's hard to imagine we're the first to observe this.
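
The routing decision behind steps 2 and 4 can be sketched in a few lines of Python. This is only an illustration, assuming the 10.10.0.0/16 subnet and 10.10.1.1 gateway described above; `next_hop` is a hypothetical helper and 10.10.2.7 is an invented on-subnet address.

```python
from ipaddress import ip_address, ip_network

SUBNET = ip_network("10.10.0.0/16")   # subnet of the VM and the host
GATEWAY = ip_address("10.10.1.1")     # default gateway that was power-cycled


def next_hop(dst):
    """Return the address the VM must reach at layer 2 to send to dst."""
    addr = ip_address(dst)
    # On-subnet destinations are reached directly; everything else
    # is handed to the default gateway.
    return addr if addr in SUBNET else GATEWAY


for dst in ("10.10.2.7", "192.168.0.33"):
    print(dst, "->", next_hop(dst))
```

What this makes visible is that every flow that fails in step 4 shares the same next hop, 10.10.1.1, while the flows that still work never touch it.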

Updates as they come. Thanks for your thoughts and discussion.

A: 

Your second log line is trying to match packets sent to 10.10.1.33, but you changed the destination address to 192.168.0.33 on the line above it.

I'm not sure why you don't see the outgoing packets in tcpdump yet. I assume you're running tcpdump on the Linux VM itself. Is the VM sending packets on the same interface it's receiving on, or is there a second virtual Ethernet adapter? Which machines are the various IP addresses (other than 10.10.1.33) assigned to?

Regarding the update: I gather you're not using DHCP (people usually don't bother when using static IP addresses). Also, it sounds like the gateway sees one NIC using two IP addresses. Normally that should be OK, but it's always the details that get you.

Is it possible the gateway will only assign one IP address to the NIC and is ignoring traffic from the VM?

Magus
Sorry, typo in the post, not the actual code. I've corrected it. Additional info in the edit above. Thanks for your interest!
Adam Liss
A: 

After your edit, I suggest an experiment: on your physical machine, configure your NIC to disable all hardware acceleration.

Windows programmer
Not sure I follow. Do you mean prevent it from offloading the checksums, or something else? What's the suspicion you want to test? Thanks!
Adam Liss
The hardware might be too smart for its own good, maybe recreating some settings for communication between the physical host and the gateway and assuming that it's done. If the hardware is told to do nothing then maybe drivers in both physical and guest machines can control it.
Windows programmer
If checksums are the only hardware optimization available then that's probably not the problem and my experiment is useless.
Windows programmer