Hi, I need to implement MD5 checksum verification for an XML file that we receive from our client. The checksum covers the whole file, including all XML tags, and the received MD5 checksum is 32 hexadecimal digits long.

The MD5 checksum field in the received XML file must be set to 0 prior to the checksum calculation, and we have to independently calculate and verify the MD5 checksum value of the received XML file.

Our application is implemented in C. Please assist me on how to implement this.

Thanks

+1  A: 

There are public-domain implementations of MD5 that you should use, instead of writing your own. I hear that Colin Plumb's version is widely used.

Chris Jester-Young
+3  A: 

This directly depends on the library used for XML parsing. This is tricky however, because you can't embed the MD5 in the XML file itself: embedding the checksum changes the file, and therefore its MD5, unless you compute the checksum only from specific elements. As I understand it, you receive the MD5 independently? Is it calculated from the whole file, or only the tags/content?

Exact solutions depend on the code used.

Based on your comment you need to do the following steps:

  • load the XML file (possibly even as plain text) and read the MD5
  • substitute the MD5 in the file with zero and write the result back to disk (or, better, keep it in memory)
  • run MD5 on the pure file data and compare it with the value stored before
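A minimal C sketch of those three steps, treating the file as plain text already loaded into memory. The names `take_md5_field` and `md5_matches` are made up for illustration, the marker string (for example `MD5='`) depends on your file layout, and I am assuming "zero" means 32 '0' characters; check that against the client's specification. The MD5 computation itself is left to whichever library you pick.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define MD5_HEX_LEN 32

/* Steps 1 and 2: find `marker` in buf, copy the 32 hex digits that follow it
   into `stored` (lowercased, NUL-terminated), then overwrite them with '0'.
   Returns 0 on success, -1 if no well-formed checksum follows the marker. */
int take_md5_field(char *buf, size_t len, const char *marker,
                   char stored[MD5_HEX_LEN + 1])
{
    size_t mlen = strlen(marker), pos, i;
    char *field = NULL;

    for (pos = 0; mlen > 0 && pos + mlen + MD5_HEX_LEN <= len; pos++) {
        if (memcmp(buf + pos, marker, mlen) == 0) {
            field = buf + pos + mlen;
            break;
        }
    }
    if (field == NULL)
        return -1;

    for (i = 0; i < MD5_HEX_LEN; i++) {
        if (!isxdigit((unsigned char)field[i]))
            return -1;
        stored[i] = (char)tolower((unsigned char)field[i]);
    }
    stored[MD5_HEX_LEN] = '\0';

    /* Assumption: the field is zeroed as 32 '0' characters, in place. */
    memset(field, '0', MD5_HEX_LEN);
    return 0;
}

/* Step 3, after hashing: compare a raw 16-byte digest with the stored hex
   string. Returns 1 if they match, 0 otherwise. */
int md5_matches(const unsigned char digest[16], const char *stored)
{
    char computed[MD5_HEX_LEN + 1];
    size_t i;

    for (i = 0; i < 16; i++)
        sprintf(computed + 2 * i, "%02x", (unsigned)digest[i]);
    return strcmp(computed, stored) == 0;
}
```

Between the two calls you would hash the modified buffer (all `len` bytes) with your MD5 library and pass the resulting 16-byte digest to `md5_matches`.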
Kornel Kisielewicz
Yes, I want to treat the entire XML file as one big file. The client's MD5 checksum is also available inside that XML file. I need to replace that MD5 checksum with 0 prior to my calculation.
Thi
"This is tricky however, because you can't embed the MD5 in the XML file itself". Sounds to me as though the procedure to construct is "build document with checksum field set to zero; MD5sum it; replace 0 with the value". To validate is "find checksum value and replace with 0; MD5sum it; compare result with the value removed".
Steve Jessop
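For the construction side of that procedure, the last step ("replace 0 with the value") could look roughly like this in C. This is only a sketch: `embed_md5` is a hypothetical helper, and it assumes the placeholder is exactly 32 characters wide and that the digest should be written as lowercase hex.

```c
#include <stddef.h>

/* Overwrites a 32-character placeholder with the lowercase hex form of a
   16-byte digest. No terminating NUL is written, so the document text that
   follows the placeholder is left untouched. */
void embed_md5(char *placeholder, const unsigned char digest[16])
{
    static const char hex[] = "0123456789abcdef";
    size_t i;

    for (i = 0; i < 16; i++) {
        placeholder[2 * i]     = hex[digest[i] >> 4];
        placeholder[2 * i + 1] = hex[digest[i] & 0x0f];
    }
}
```

Because the placeholder and the final value have the same width, the document length does not change between hashing and embedding.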
A: 

Don't reinvent the wheel, use a proven existing solution: http://userpages.umbc.edu/~mabzug1/cs/md5/md5.html

Incidentally that was the first link that came up when I googled "md5 c implementation".

j_random_hacker
A: 

This is rather nasty. The approach suggested seems to imply you need to parse the XML document into something like a DOM tree, find the MD5 checksum and store it for future reference. Then you would replace the checksum with 0 before re-serializing the document and calculating its MD5 hash. This all sounds doable but potentially tricky. The major difficulty I see is that your new serialization of the document may not be the same as the original one, and differences that are irrelevant to XML, like the use of single or double quotes around attribute values, added line breaks or even a different encoding, will cause the hashes to differ. If you go down this route you'll need to make sure your app and the procedure used to create the document in the first place make the same choices. For this sort of problem canonical XML is the standard solution (http://www.w3.org/TR/xml-c14n).

However, I would do something different. With any luck it should be quite easy to write a regular expression to locate the MD5 hash in the file. You can then use this to grab the hash and replace it with 0 in the XML file before recalculating the hash. This sidesteps all the possible issues with parsing, changing and re-serializing the XML document. To illustrate I'm going to assume the hash '33d4046bea07e89134aecfcaf7e73015' lives in the XML file like this:

<docRoot xmlns='some-irrelevant-uri'>
  <myData>Blar blar</myData>
  <myExtraData number='1'/>
  <docHash MD5='33d4046bea07e89134aecfcaf7e73015' />
  <evenMoreOfMyData number='34'/>
</docRoot>

(which I've called hash.xml), that the MD5 should be replaced by 32 zeros (so the hash is correct) and illustrate the procedure on a shell command line using perl, md5 and bash. (Hopefully translating this into C won't be too hard given the existence of regular expression and hashing libraries.)

Breaking down the problem, you first need to be able to find the hash that is in the file:

perl -p -e'if (m#<docHash.+MD5=["\x27]([a-fA-F0-9]{32})#) {$_ = "$1\n"} else {$_ = ""}' hash.xml

(this works by looking for the start of the MD5 attribute of the docHash element, allowing for possible other attributes, and then grabbing the next 32 hex characters. If it finds them it bungs them in the magic $_ variable, if not it sets $_ to be empty, then the value of $_ gets printed for each line. This results in the string "33d4046bea07e89134aecfcaf7e73015" being printed.)
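In C, a rough equivalent of that search can be written with the POSIX `regcomp`/`regexec` functions from `<regex.h>`. This is a sketch under a few assumptions: `find_md5_attr` is a made-up name, the pattern is my adaptation of the one above (it accepts either quote style and matches against the whole NUL-terminated buffer rather than line by line), and the element and attribute names come from the sample document.

```c
#include <regex.h>
#include <stddef.h>

/* Finds the 32 hex digits of the MD5 attribute on a docHash element.
   On success returns 0 and stores their byte offset within xml in *offset;
   returns -1 if nothing matches or the pattern fails to compile.
   xml must be NUL-terminated. */
int find_md5_attr(const char *xml, size_t *offset)
{
    /* [^>]* allows other attributes before MD5; either quote style is accepted. */
    static const char pattern[] =
        "<docHash[^>]*[[:space:]]MD5=[\"']([a-fA-F0-9]{32})[\"']";
    regex_t re;
    regmatch_t m[2];   /* m[0] is the whole match, m[1] the hex digits */
    int rc = -1;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;
    if (regexec(&re, xml, 2, m, 0) == 0) {
        *offset = (size_t)m[1].rm_so;
        rc = 0;
    }
    regfree(&re);
    return rc;
}
```

With the offset in hand, the 32 characters at `xml + offset` can be copied out for later comparison and then overwritten with zeros in a writable copy of the buffer.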

Then you need to calculate the hash of the file with the hash replaced with zeros:

perl -p -e's#(<docHash.+MD5=["\x27])[a-fA-F0-9]{32}#${1}00000000000000000000000000000000#' hash.xml | md5

(where the regular expression is almost the same, but this time the hex characters are replaced by zeros and the whole file is printed. Then the MD5 of this is calculated by piping the result through an md5 hashing program.) Putting this together with a bit of bash gives:

if [ `perl -p -e'if (m#<docHash.+MD5=["\x27]([a-fA-F0-9]{32})#) {$_ = "$1\n"} else {$_ = ""}' hash.xml` = `perl -p -e's#(<docHash.+MD5=["\x27])[a-fA-F0-9]{32}#${1}00000000000000000000000000000000#' hash.xml | md5` ] ; then echo OK; else echo ERROR; fi

which executes those two small commands, compares the output and prints "OK" if the outputs match or "ERROR" if they don't. Obviously this is just a simple prototype, and is in the wrong language, but I think it illustrates the most straightforward solution.

Incidentally, why do you put the hash inside the XML document? As far as I can see it doesn't have any advantage compared to passing the hash along on a side channel (even something as simple as in a second file called documentname.md5) and makes the hash validation more difficult.

Andrew Walker