Hi, I'll give you a little bit of background first as to why I'm asking this question:

I am currently working in a strictly regulated industry, and as such our code is quite carefully looked over by official test houses. These test houses expect to be able to build the code and generate an .exe or .dll which is EXACTLY the same each and every time (without changing any code, obviously!). They check the MD5 and the SHA1 of the executables that they create to ensure this.
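For reference, the check the test houses perform can be reproduced locally with a short script. This is only a sketch: the function names `file_digests` and `builds_match` are made up for illustration, and it assumes the comparison is a plain MD5/SHA1 over the whole file.

```python
import hashlib


def file_digests(path, chunk_size=65536):
    """Return (md5_hex, sha1_hex) for the file at `path`, read in chunks."""
    md5 = hashlib.md5()
    sha1 = hashlib.sha1()
    with open(path, "rb") as f:
        # iter() with a sentinel stops when read() returns an empty bytes object
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5.update(chunk)
            sha1.update(chunk)
    return md5.hexdigest(), sha1.hexdigest()


def builds_match(path_a, path_b):
    """True when both files have identical MD5 and SHA1 digests."""
    return file_digests(path_a) == file_digests(path_b)
```

Running `builds_match` on the outputs of two clean rebuilds gives a quick pass/fail before anything is sent to the test house.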

Up until this point I have predominantly been coding in C++, where (after a few project setting tweaks) I managed to get the projects to rebuild consistently to the same MD5/SHA1. I am now using C# in a project and am having great difficulty getting the MD5s to match after a rebuild. I am aware that there are "Time-Stamps" in the PE header of the file, and they have been cleared to 0. I am also aware that there is a GUID for the .exe, which again has been cleared to 00 00 00... etc. However, the files still don't match.
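Clearing the PE header time stamp does not have to be done by hand. The sketch below zeroes the TimeDateStamp field of the COFF file header, which sits 8 bytes after the "PE\0\0" signature. The function name `zero_pe_timestamp` is hypothetical. It touches only that one field, so any other time stamps elsewhere in the file (for example in the debug directory) and the optional-header checksum are left as they are.

```python
import struct


def zero_pe_timestamp(data: bytes) -> bytes:
    """Return a copy of a PE image with the COFF TimeDateStamp set to 0."""
    buf = bytearray(data)
    if buf[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    # e_lfanew: file offset of the PE signature, stored at 0x3C in the DOS header
    (pe_offset,) = struct.unpack_from("<I", buf, 0x3C)
    if buf[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    # COFF header layout after the signature:
    #   Machine (2 bytes), NumberOfSections (2 bytes), TimeDateStamp (4 bytes)
    struct.pack_into("<I", buf, pe_offset + 8, 0)
    return bytes(buf)
```

A build script could read the .exe, pass the bytes through this function, and write the result back before hashing.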

I'm using CFF Explorer to view and edit the PE Header to remove the time and date stamps. After using a binary comparison tool there are only 2 blocks of bytes in the .exe's that are different (both very small).
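The binary comparison itself is easy to script, which helps when checking many rebuilds. A minimal sketch, assuming both files have the same length (the function name `differing_ranges` is made up for illustration):

```python
def differing_ranges(a: bytes, b: bytes):
    """Return a list of half-open (start, end) offsets where a and b differ."""
    if len(a) != len(b):
        raise ValueError("files differ in length")
    ranges = []
    start = None  # offset where the current run of differing bytes began
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            if start is None:
                start = i
        elif start is not None:
            ranges.append((start, i))
            start = None
    if start is not None:
        # the last run extends to the end of the file
        ranges.append((start, len(a)))
    return ranges
```

Feeding it the contents of two rebuilt executables lists the offsets of each differing block, which can then be looked up in a hex viewer.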

One of the inconsistent blocks appears just before some binary code, which in ASCII details the path of the *Project*\obj\Release\xxx.pdb file.

EDIT: This is now known to be the GUID of the *.pdb file, however I still don't know if I can modify it without causing any errors!?

The other block appears in the middle of what looks to be function names, i.e. (a typical section) AssemblyName.GetName.Version.get_Version.System.IO.Ports.SerialPort.Parity.Byte.<PrivateImplementationDetails>{

then the different code block:

4A134ACE-D6A0-461B-A47C-3A4232D90816

followed by:

"}.ValueType.__StaticArrayInitTypeSize=7.$$method0x60000ab-1.RuntimeFieldHandle.InitializeArray`... etc..

Any ideas or suggestions would be most welcome!

+2  A: 

I'm not sure about this, but just a thought: are you using any anonymous types for which the compiler might generate names behind the scenes, which might be different each time the compiler runs? Just a possibility which occurred to me. Probably one for Jon Skeet ;-)

Update: You could perhaps also use Reflector addins for comparison and disassembly.

Vinay Sajip
Nope, not using any anonymous types in the application, although it was a good thought! ;)
Siyfion
As for Reflector for comparison, unfortunately it's not me that chooses the tool they use to compare, it has to be an exact MD5 match :(
Siyfion
+1  A: 

Take a look at the answers from this question. Especially on the external link provided in the 3rd one.

EDIT:

I actually wanted to link to this article.

Frank Bollack
That's a link to a diff tool for comparing binaries.
Vinay Sajip
I can't actually find a version of Dumpbin.exe to use anywhere, but aside from that it seems as though the only differences should be the Date and Time (which I've cleared to 0), the GUID (which I've cleared to 00 00.. etc), the assembly version (which should be the same?) and a strong hash (which should be the same if everything else is!). So I think the next step is for me to use Ildasm.exe to try to figure out if any of the MSIL code differs!?
Siyfion
Sorry for the confusion. I edited my post to point to the right article. Please look there for more information.
Frank Bollack
Ah ha, ok well from that link it seems as though one of the blocks that is different is the *.pdb file's GUID. Although I still can't find a way to set that to a specific value? The other difference, I'm still looking at.
Siyfion
+2  A: 

Regarding the PDB GUID problem, if you specify that a PDB shouldn't be generated at compilation for Release builds, does the binary still contain the PDB's file system GUID?

To disable PDB generation:

  1. Right-click your project in Solution Explorer and select Properties.
  2. From the menu along the left, select Build.
  3. Ensure that the Configuration selection is Release (you'll still want a PDB for debugging).
  4. Click the Advanced button in the bottom right.
  5. Under Output / Debug Info, select None.

If you're building from the console, use /debug- to get the same result.

fatcat1111
I'll give that a go tomorrow...
Siyfion
I am only using Visual C# Express at the moment for evaluation purposes, do you know if I can switch the *.pdb generation off in this version?
Siyfion
You can. I'll add instructions, since the option is kind of buried.
fatcat1111
+3  A: 

You should be able to get rid of the debug GUID by disabling PDB generation. If not, setting the GUID to zeroes is fine - only debuggers look at that section (you won't be able to debug the assembly anymore, but it should still run fine).

The PrivateImplementationDetails are a bit more difficult - these are internal helper classes generated by the compiler for certain language constructs (array initializers, switch statements using strings, etc.). Because they are only used internally, the class name doesn't really matter, so you could just assign a running number to them.

I would do this by going through the #Strings metadata stream and replacing all strings of the form "<PrivateImplementationDetails>{GUID}" with "<PrivateImplementationDetails>{running number, padded to same length as a GUID}".

The #Strings metadata stream is simply the list of strings used by the metadata, encoded in UTF-8 and separated by \0; so finding and replacing the names should be easy once you know where the #Strings stream is inside the executable file.
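The replacement step might look roughly like the sketch below. It is an illustration under stated assumptions, not a tested patching tool: the function name `normalize_private_impl_names` is hypothetical, and for simplicity it scans the whole byte buffer for the pattern instead of restricting itself to the #Strings stream. Each distinct GUID is replaced by a running number padded to the 36 characters a GUID occupies, so no offsets in the file move.

```python
import re

# Matches "<PrivateImplementationDetails>{XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX}"
_PRIVATE_IMPL = re.compile(
    rb"<PrivateImplementationDetails>\{"
    rb"[0-9A-Fa-f]{8}-[0-9A-Fa-f]{4}-[0-9A-Fa-f]{4}-"
    rb"[0-9A-Fa-f]{4}-[0-9A-Fa-f]{12}\}"
)

GUID_TEXT_LENGTH = 36  # 32 hex digits plus 4 hyphens


def normalize_private_impl_names(data: bytes) -> bytes:
    """Replace each compiler-generated GUID with a zero-padded running number."""
    numbers = {}  # original name -> running number, in order of first appearance

    def replace(match):
        name = match.group(0)
        if name not in numbers:
            numbers[name] = len(numbers) + 1
        padded = str(numbers[name]).encode("ascii").rjust(GUID_TEXT_LENGTH, b"0")
        return b"<PrivateImplementationDetails>{" + padded + b"}"

    return _PRIVATE_IMPL.sub(replace, data)
```

Because the replacement has exactly the same length as the original name, the output buffer is the same size as the input. If the assembly is strong-name signed, changing these bytes would invalidate the signature, so this only suits unsigned builds or builds that are signed afterwards.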

Unfortunately the "metadata stream headers" containing this information are quite buried inside the file format. You'll have to start at the NT Optional Header, find the pointer to the CLI Runtime Header, resolve it to a file position using the PE section table (it's an RVA, but you need a position inside the file), then go to the metadata root and read the stream headers.
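The walk through the headers described above can be sketched as follows. This is not the tool mentioned in this answer, only a minimal illustration; the function names `rva_to_offset` and `find_metadata_streams` are made up. It assumes a well-formed managed PE file and relies on the CLI Runtime Header being data directory entry 14 and on the metadata root starting with the "BSJB" signature.

```python
import struct


def rva_to_offset(sections, rva):
    """Translate an RVA to a file offset using the PE section table."""
    for vaddr, vsize, raw_ptr, raw_size in sections:
        if vaddr <= rva < vaddr + max(vsize, raw_size):
            return rva - vaddr + raw_ptr
    raise ValueError("RVA 0x%x is not inside any section" % rva)


def find_metadata_streams(data: bytes):
    """Return {stream name: (file offset, size)} for a managed PE image."""
    (pe,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe:pe + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    (num_sections,) = struct.unpack_from("<H", data, pe + 6)
    (opt_size,) = struct.unpack_from("<H", data, pe + 20)
    opt = pe + 24  # optional header follows the 20-byte COFF header
    (magic,) = struct.unpack_from("<H", data, opt)
    # Data directories start at offset 96 (PE32) or 112 (PE32+)
    dirs = opt + (96 if magic == 0x10B else 112)
    cli_rva, _cli_size = struct.unpack_from("<II", data, dirs + 14 * 8)
    if cli_rva == 0:
        raise ValueError("no CLI header: not a managed assembly")

    # Section headers are 40 bytes each; the four sizes/addresses start at +8
    sections = []
    table = opt + opt_size
    for i in range(num_sections):
        vsize, vaddr, raw_size, raw_ptr = struct.unpack_from(
            "<IIII", data, table + 40 * i + 8)
        sections.append((vaddr, vsize, raw_ptr, raw_size))

    cli = rva_to_offset(sections, cli_rva)
    # CLI header: cb (4), runtime version (2 + 2), then metadata RVA and size
    md_rva, _md_size = struct.unpack_from("<II", data, cli + 8)
    root = rva_to_offset(sections, md_rva)
    if data[root:root + 4] != b"BSJB":
        raise ValueError("missing metadata signature")

    # Root: signature (4), versions (2 + 2), reserved (4), version length (4)
    (version_length,) = struct.unpack_from("<I", data, root + 12)
    pos = root + 16 + version_length
    _flags, stream_count = struct.unpack_from("<HH", data, pos)
    pos += 4

    streams = {}
    for _ in range(stream_count):
        offset, size = struct.unpack_from("<II", data, pos)
        pos += 8
        end = data.index(b"\0", pos)
        name = data[pos:end].decode("ascii")
        # The name, including its terminator, is padded to a 4-byte boundary
        pos += (end - pos + 1 + 3) & ~3
        streams[name] = (root + offset, size)  # offsets are relative to the root
    return streams
```

With the result in hand, `streams["#Strings"]` gives the file position and length of the string heap, so a search and replace can be limited to that region.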

I have code that can find the file positions for all of this (I'm writing a tool to inject new managed resources in an assembly), but it's not ready to be published yet.

Daniel
Ok, well if the GUID can be got rid of by disabling PDB generation (or clearing it to all 0's) that's difference number 1 solved. Difference number 2 seems to be a lot harder to solve; are you saying that I have to go through the IL and change the value in there? Or access the compiled *.exe directly and manually set the bytes?
Siyfion
Well, due to my work on the resource injection tool, I would have chosen the *.exe patching solution. Doing an ILDASM/ILASM roundtrip to replace the class name should also be possible.
Daniel
Any chance of you releasing this patching tool in the near future? ;)
Siyfion
Hi Daniel, I was just wondering if you have any news on your little tool that might make this easier?
Siyfion
No news. I didn't have time to continue writing that tool. All I have currently is a proof-of-concept that sometimes produces corrupt assemblies.
Daniel
A: 

You said that after a few project tweaks you were able to get C++ apps to compile repeatably to the same SHA1/MD5 values. I'm in the same boat as you in being in an industry with a third party test lab that needs to rebuild exactly the same executables repeatably.

In researching how to make this happen in VS2005, I came across your post here. Could you share the project tweaks you did to make the C++ apps build to the same SHA1/MD5 values consistently? It would be of great help to myself and perhaps any others that share this requirement.

Tom
Of course, although this is off the top of my head! In release mode do the following:
- Disable the generation of the Manifest file (Solution Properties->Linker->Manifest File)
OR
- Change the manifest settings (Solution Properties->Manifest Tool->Input and Output) so that "Embed Manifest" is set to "No".
Also make sure all debugging info is turned off for the release build. Then you just need to remove the TimeAndDateStamp from the PE file header. (Try googling "CFF Explorer")
Siyfion
Ack.. manual manipulation of the file header? Talk about a human error disaster waiting to happen. Do you know of a command line utility that can do this so that it can be automated, reliable and repeatable?
Tom