I need to find a way to compute the hash of the base64-encoded data in the XML node //note/resource/data, or otherwise match it to the hash value in the attribute //note/content/en-note//en-media/@hash

See below for the full XML file

Please suggest a way to {obtain|match} using XSLT

4aaafc3e14314027bb1d89cf7d59a06c

{from|with}

R0lGODlhEAAQAPMAMcDAwP/crv/erbigfVdLOyslHQAAAAECAwECAwECAwECAwECAwECAwECAwEC
AwECAyH/C01TT0ZGSUNFOS4wGAAAAAxtc09QTVNPRkZJQ0U5LjAHgfNAGQAh/wtNU09GRklDRTku
MBUAAAAJcEhZcwAACxMAAAsTAQCanBgAIf8LTVNPRkZJQ0U5LjATAAAAB3RJTUUH1AkWBTYSQXe8
fQAh+QQBAAAAACwAAAAAEAAQAAADSQhgpv7OlDGYstCIMqsZAXYJJEdRQRWRrHk2I9t28CLfX63d
ZEXovJ7htwr6dIQB7/hgJGXMzFApOBYgl6n1il0Mv5xuhBEGJAAAOw==

This sample XML file has obviously been trimmed for brevity/simplicity. The actual file may contain more than one image per note, hence the need to obtain/match hashes.

The XML file:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE en-export SYSTEM "http://xml.evernote.com/pub/evernote-export.dtd">
<en-export export-date="20091029T063411Z" application="Evernote/Windows" version="3.0">

<note>
    <title>A title here</title>
    <content><![CDATA[
     <?xml version="1.0" encoding="UTF-8"?>
     <!DOCTYPE en-note SYSTEM "http://xml.evernote.com/pub/enml.dtd">
     <en-note bgcolor="#FFFFFF">
      <p>Some text here (followed by the picture)
      <p><en-media hash="4aaafc3e14314027bb1d89cf7d59a06c" type="image/gif" border="0" width="16" height="16" alt="A picture"/></p>
      <p>Some more text here (preceded by the picture)
     </en-note>
    ]]></content>
    <created>20090925T063154Z</created>
    <note-attributes>
     <author/>
    </note-attributes>
    <resource>
     <data encoding="base64">
R0lGODlhEAAQAPMAMcDAwP/crv/erbigfVdLOyslHQAAAAECAwECAwECAwECAwECAwECAwECAwEC
AwECAyH/C01TT0ZGSUNFOS4wGAAAAAxtc09QTVNPRkZJQ0U5LjAHgfNAGQAh/wtNU09GRklDRTku
MBUAAAAJcEhZcwAACxMAAAsTAQCanBgAIf8LTVNPRkZJQ0U5LjATAAAAB3RJTUUH1AkWBTYSQXe8
fQAh+QQBAAAAACwAAAAAEAAQAAADSQhgpv7OlDGYstCIMqsZAXYJJEdRQRWRrHk2I9t28CLfX63d
ZEXovJ7htwr6dIQB7/hgJGXMzFApOBYgl6n1il0Mv5xuhBEGJAAAOw==
     </data>
     <mime>image/gif</mime>
     <resource-attributes>
      <file-name>clip_image001.gif</file-name>
     </resource-attributes>
    </resource>
</note>

</en-export>


Implemented solution

This uses the concept from the solution suggested by Jackem. The main difference is that I avoid creating my own Java class (and an extra dependency). I do the processing within the XSLT, since it's straightforward enough, referencing only classes that come with the standard Java libraries.
Jackem's solution is more correct because it doesn't lose the leading zero in some hashes; however, I found it much easier to take care of this elsewhere using li'l basic hackery.

<xsl:stylesheet version="2.0" 
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    ...
    xmlns:md5="java.security.MessageDigest"
    xmlns:bigint="java.math.BigInteger"
    exclude-result-prefixes="md5 bigint">
...
<xsl:for-each select="resource">
    <xsl:variable name="md5inst" select="md5:getInstance('MD5')" />
    <xsl:value-of select="md5:update($md5inst, $b64bin)" />
    <xsl:variable name="imgmd5bytes" select="md5:digest($md5inst)" />
    <xsl:variable name="imgmd5bigint" select="bigint:new(1, $imgmd5bytes)" />
    <xsl:variable name="imgmd5str" select="bigint:toString($imgmd5bigint, 16)" />
    <!-- NOTE: $imgmd5str loses the leading zero from imgmd5bytes (if there is one) -->
</xsl:for-each>
...
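
The leading-zero issue mentioned in the comment can be reproduced outside XSLT. The following is only a small Java sketch, not part of the stylesheet: the class name LeadingZeroDemo and its helper methods are made up for illustration, and the input "a" is chosen because its MD5 digest happens to start with a zero nibble. It contrasts BigInteger.toString(16) with the zero-padded String.format("%032x", ...) used in the Java extension approach.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical demo class; the name and helpers are illustrative only.
public class LeadingZeroDemo {

    // Hex via BigInteger.toString(16): leading zero digits are dropped.
    public static String unpadded(byte[] digest) {
        return new BigInteger(1, digest).toString(16);
    }

    // Hex via "%032x": always 32 digits for a 16-byte MD5 digest.
    public static String padded(byte[] digest) {
        return String.format("%032x", new BigInteger(1, digest));
    }

    public static byte[] md5(byte[] data) throws NoSuchAlgorithmException {
        return MessageDigest.getInstance("MD5").digest(data);
    }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        // The MD5 of the single byte 'a' starts with a zero nibble.
        byte[] digest = md5("a".getBytes(StandardCharsets.US_ASCII));
        System.out.println(unpadded(digest)); // 31 hex digits, leading 0 lost
        System.out.println(padded(digest));   // 32 hex digits
    }
}
```

In a stylesheet, the same effect could presumably be had by left-padding $imgmd5str with zeros to 32 characters before comparing it with the hash attribute.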

P.S. See the sibling question for my implementation of the base64 --> image file conversion.


This question is a subquestion of another question I have asked previously.

+1  A: 
  • Download some freeware Base64 decoder like this one or use some source code from the web for this
  • Output file is some_file.gif, 268 bytes, a folder icon
  • Generate the MD5 checksum of that file using md5sum or again some source code from the web

Output for me:

4aaafc3e14314027bb1d89cf7d59a06c

That's what you wanted, isn't it? It will be tricky (if not impossible, and if you ask me, definitely not worth the effort) to do all this in XSLT, but at least you now have got the information that this hash was created using MD5 on the GIF file.
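
The manual steps above (decode the base64, then take the MD5 of the resulting bytes) can be sketched in a few lines of Java. This is an illustration under assumptions rather than a definitive tool: it assumes Java 8 or later for java.util.Base64, and the class name Base64Md5 and method md5HexOfBase64 are hypothetical. The MIME decoder is used because the <data> content is wrapped across lines and surrounded by whitespace.

```java
import java.math.BigInteger;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

// Hypothetical helper; the names are illustrative only.
public class Base64Md5 {

    // Decodes base64 text and returns the MD5 of the decoded bytes
    // as 32 lowercase hex digits.
    public static String md5HexOfBase64(String base64Text)
            throws NoSuchAlgorithmException {
        // The MIME decoder ignores line breaks and other characters
        // that are not part of the base64 alphabet.
        byte[] raw = Base64.getMimeDecoder().decode(base64Text);
        byte[] digest = MessageDigest.getInstance("MD5").digest(raw);
        return String.format("%032x", new BigInteger(1, digest));
    }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        // "YWJj" is the base64 encoding of the three bytes "abc".
        System.out.println(md5HexOfBase64("\n  YWJj\n"));
    }
}
```

Passing the text content of the <data> element to md5HexOfBase64 should give a string that can be compared directly with the hash attribute of en-media.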

schnaader
I do acknowledge that it would be easier to do without XSLT, but I want to do other things as well with XSLT, ergo this question. I have already been able to find out how to decode base64 in XSLT, but now need to find a way to obtain the md5sum, using XSLT of course.
bguiz
The only thing I found is a message in a mailing list of exslt from 2004 where someone tried to develop a crypto namespace that could generate MD5 and other checksums, but it seems this is a dead-end - see here: http://osdir.com/ml/text.xml.xslt.extensions/2004-05/msg00002.html
schnaader
+1  A: 

The 4aaaf... value is the MD5 of the binary data you get when you decode the base64-encoded data. I don't think you have any choice but to decode the contents of the <data> element and run them through an MD5 implementation, which is obviously outside the scope of an XSL transformation. Presumably, the result of the XSLT will be processed by some other code, which can extract and verify the images.
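
As a rough sketch of what such post-processing code might look like when a note has several resources, the Java below indexes decoded resources by their MD5 hex string so that each en-media hash can be looked up. The class ResourceMatcher and method indexByHash are hypothetical names, the inputs are toy base64 strings rather than real Evernote data, and extracting the <data> text from the XML is left out.

```java
import java.math.BigInteger;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.Base64;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper; the names are illustrative only.
public class ResourceMatcher {

    // Maps the MD5 hex string of each decoded resource to its raw bytes.
    public static Map<String, byte[]> indexByHash(List<String> base64Resources)
            throws NoSuchAlgorithmException {
        Map<String, byte[]> byHash = new HashMap<>();
        for (String b64 : base64Resources) {
            // MIME decoder: tolerates the line breaks found in <data>.
            byte[] raw = Base64.getMimeDecoder().decode(b64);
            byte[] digest = MessageDigest.getInstance("MD5").digest(raw);
            // Zero-padded, lowercase hex, the same form as the hash attribute.
            byHash.put(String.format("%032x", new BigInteger(1, digest)), raw);
        }
        return byHash;
    }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        // Toy inputs: base64 of "a" and of "abc".
        Map<String, byte[]> byHash = indexByHash(Arrays.asList("YQ==", "YWJj"));
        // Look up by the hash an en-media element would carry.
        byte[] match = byHash.get("900150983cd24fb0d6963f7d28e17f72");
        System.out.println(match == null ? "no match" : match.length + " bytes");
    }
}
```

A lookup that returns null would indicate an en-media element with no matching resource, which is one way to verify the images.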

Tim Sylvester
+2  A: 

For your related question about doing the base64 decoding in XSLT, you have accepted an answer which uses Saxon and Java extensions. So I assume you are OK with using those.

In that case, you can create an extension in Java for computing the MD5 sum:

package com.stackoverflow.q1684963;

import java.math.BigInteger;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class MD5Sum {
    public static String calc(byte[] data) throws NoSuchAlgorithmException {
        MessageDigest md5 = MessageDigest.getInstance("MD5");
        byte[] digest = md5.digest(data);
        BigInteger digestValue = new BigInteger(1, digest);
        return String.format("%032x", digestValue);
    } 
}

From your XSLT 2.0 stylesheet, which you run with Saxon, you can then just call that extension. This assumes you already have the base64-decoded data (for example from the extension function saxon:base64Binary-to-octets, as in the linked answer) in the variable $data:

<xsl:value-of xmlns:md5sum="com.stackoverflow.q1684963.MD5Sum"
              select="md5sum:calc($data)"/>
Jukka Matilainen
Thank you! I have managed to get it to work using this same concept, except that I don't create a class myself, but just call these methods from within the XSLT. I'll post my impl. soln shortly...
bguiz
Credit where credit is due: The Java code is adapted from various answers to question http://stackoverflow.com/questions/332079/in-java-how-do-i-convert-a-byte-array-to-a-string-of-hex-digits-while-keeping-le
Jukka Matilainen