Hi,

I have a low-level application (really low-level: it's basically all IOCTL calls plus several calls to enumeration APIs) that crashes sporadically on Windows Vista/7 on clients' machines. Unfortunately, I have not been able to obtain any crash dumps, but one helpful user did mention that running the program in XP Compatibility Mode solved the problem.

The application is always launched with full admin rights (it's launched from another program that requires admin authorization) so it's not a UAC issue. I don't use any deprecated APIs and I'm not relying on any registry hacks, etc. I'm just issuing calls to enumerate disks, then using IOCTL commands to get some more low-level info about all attached devices.

What happens in XP Compatibility mode? What does Windows inject into my application or otherwise sandbox it with that prevents it from crashing on Vista/7? I had originally suspected heap corruption (though I've pulled my hair out attempting to replicate or to track down the issue) before being told that it runs fine in XP Compatibility Mode.

Can anyone suggest any possible issues that would be avoided in XP Compat Mode that I should look into to try to address this issue? Thanks!

EDIT:

One more thing that's probably very important to mention: I'm calling DDK/Kernel functions from userspace in order to get at certain features not exposed via the WIN32 API.

I'm using ZwReadFile, ZwCreateFile, ZwWriteFile, RtlInitUnicodeString, ZwQueryVolumeInformationFile, ZwDeviceIoControlFile, ZwSetInformationFile, ZwClose.

The IOCTLs I'm calling include IOCTL_DISK_GET_PARTITION_INFO_EX, IOCTL_STORAGE_GET_DEVICE_NUMBER, IOCTL_DISK_GET_LENGTH_INFO, and IOCTL_DISK_GET_DRIVE_LAYOUT_EX.

A: 

There have been many changes in the low-level drivers from XP to Vista. I suspect you are using an IOCTL that is affected by them.

Am
Am, this code is working on over a dozen test machines running various permutations of Windows Vista/7 on widely varying hardware. Only certain hardware/software combinations cause it to crash. All the IOCTLs I'm using are supported on Vista/7.
Computer Guru
@Computer Guru: Which hardware/software configurations differ from the rest and cause the crash?
tommieb75
Primarily the partition table. Win32 code still has really iffy support for GUID-partitioned hard disks (OS X-style), and that's one of the reasons I'm manually doing things from the DDK.
Computer Guru
+1  A: 

This is very odd, but I was calling ZwQueryVolumeInformationFile with FsInformationClass set to FileFsVolumeInformation.

I had passed in a FILE_FS_VOLUME_INFORMATION buffer, first allocated normally, then overallocated to (sizeof(FILE_FS_VOLUME_INFORMATION) + sizeof(TCHAR)*FILE_FS_VOLUME_INFORMATION->VolumeLabelLength).

Then I executed FILE_FS_VOLUME_INFORMATION->VolumeLabel[FILE_FS_VOLUME_INFORMATION->VolumeLabelLength/2] = _T('\0'); to null-terminate the label in place, and only on some machines this would result in memory corruption.

Regardless of the size of the overallocation (even tried allocating a full 256 chars extra!), this would reliably result in heap corruption even when using a vector<unsigned char> as the FILE_FS_VOLUME_INFORMATION buffer.

It seems that the kernel somehow places some sort of write protection on the buffer, which was resulting in corruption regardless of the size. Copying the first VolumeLabelLength bytes to a second buffer, then appending _T('\0') to that copy, solved the problem. I'm not sure how or why Windows was making the buffer that I allocated and passed in as a parameter read-only, or whether it was storing something after the FILE_FS_VOLUME_INFORMATION struct (which should end with the character array!), but simply not modifying any data in the buffer that I passed did the trick... which is crazy, because it only happens (consistently and 100% reproducibly) on certain machines.
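For anyone who wants to see the copy-out approach in code, here is a minimal sketch rather than my actual code. It assumes a hypothetical `MockVolumeInfo` struct that mirrors the documented field order of FILE_FS_VOLUME_INFORMATION (with `char16_t` standing in for WCHAR) so that it builds without the DDK headers. `copy_volume_label` and `make_mock_buffer` are names made up for illustration, and VolumeLabelLength is treated as a byte count, as documented.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical stand-in for FILE_FS_VOLUME_INFORMATION (assumption: same
// field order as the documented struct; char16_t stands in for WCHAR).
struct MockVolumeInfo {
    std::int64_t  VolumeCreationTime;
    std::uint32_t VolumeSerialNumber;
    std::uint32_t VolumeLabelLength;  // label length in BYTES, not characters
    std::uint8_t  SupportsObjects;    // BOOLEAN in the real struct
    char16_t      VolumeLabel[1];     // first element of a variable-length label
};

// Copies the label out of the raw query buffer into a separate string.
// The source buffer is only read, never written; the returned string
// manages its own null terminator.
std::u16string copy_volume_label(const std::vector<unsigned char>& buffer) {
    const std::size_t label_offset = offsetof(MockVolumeInfo, VolumeLabel);
    if (buffer.size() < label_offset) {
        return std::u16string();  // too small to hold even the fixed header
    }

    std::uint32_t label_bytes = 0;
    std::memcpy(&label_bytes,
                buffer.data() + offsetof(MockVolumeInfo, VolumeLabelLength),
                sizeof label_bytes);

    // Never trust the reported length beyond what the buffer really holds.
    const std::size_t available = buffer.size() - label_offset;
    if (label_bytes > available) {
        label_bytes = static_cast<std::uint32_t>(available);
    }

    std::u16string label(label_bytes / sizeof(char16_t), u'\0');
    if (!label.empty()) {
        std::memcpy(&label[0], buffer.data() + label_offset,
                    label.size() * sizeof(char16_t));
    }
    return label;
}

// Illustration-only helper: builds a buffer shaped like a filled-in query
// result, with no terminator stored after the label.
std::vector<unsigned char> make_mock_buffer(const std::u16string& label) {
    const std::size_t label_bytes = label.size() * sizeof(char16_t);
    assert(label_bytes <= 0xFFFFFFFFu);

    std::vector<unsigned char> buffer(
        offsetof(MockVolumeInfo, VolumeLabel) + label_bytes, 0);
    const std::uint32_t length = static_cast<std::uint32_t>(label_bytes);
    std::memcpy(buffer.data() + offsetof(MockVolumeInfo, VolumeLabelLength),
                &length, sizeof length);
    if (label_bytes != 0) {
        std::memcpy(buffer.data() + offsetof(MockVolumeInfo, VolumeLabel),
                    label.data(), label_bytes);
    }
    return buffer;
}
```

In real code the buffer would be whatever was passed to ZwQueryVolumeInformationFile; the point is only that the terminator lives in the copy, so the buffer handed to the kernel is never modified afterwards.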

At any rate: problem solved *phew*!

Computer Guru
looks like a bug to me: you're using sizeof(TCHAR) but VolumeLabel is WCHAR, there is no ANSI version. Then your assignment writes off the end of the array.
Ben Voigt
Nope. They're both wide characters (UNICODE is defined), and the NULL *is* placed in the correct location (verified via logging and memory dumps).
Computer Guru