I'm trying to write a simple RTF document pretty much from scratch in Java, and I'm trying to embed JPEGs in the document. Here's an example of a JPEG (a 2x2-pixel JPEG consisting of three white pixels and a black pixel in the upper left, if you're curious) embedded in an RTF document (generated by WordPad, which converted the JPEG to WMF):

{\pict\wmetafile8\picw53\pich53\picwgoal30\pichgoal30 
0100090000036e00000000004500000000000400000003010800050000000b0200000000050000
000c0202000200030000001e000400000007010400040000000701040045000000410b2000cc00
020002000000000002000200000000002800000002000000020000000100040000000000000000
000000000000000000000000000000000000000000ffffff00fefefe0000000000000000000000
000000000000000000000000000000000000000000000000000000000000000000000000000000
0000001202af0801010000040000002701ffff030000000000
}

I've been reading the RTF specification, and it looks like you can specify that the image is a JPEG, but since WordPad always converts images to WMF, I can't see an example of an embedded JPEG. So I may also end up needing to transcode from JPEG to WMF or something....

But basically, I'm looking for how to generate the binary or hexadecimal (Spec, p.148: "These pictures can be in hexadecimal (the default) or binary format.") form of a JPEG given a file URL.

Thanks!


EDIT: I have the stream stuff working all right, I think, but still don't understand exactly how to encode it, because whatever I'm doing, it's not RTF-readable. E.g., the above picture instead comes out as:

ffd8ffe00104a464946011106006000ffdb0430211211222222223533333644357677767789b988a877adaabcccc79efdcebcccffdb04312223336336c878ccccccccccccccccccccccccccccccccccccccccccccccccccffc0011802023122021113111ffc401f001511111100000000123456789abffc40b5100213324355440017d123041151221314161351617227114328191a182342b1c11552d1f024336272829a161718191a25262728292a3435363738393a434445464748494a535455565758595a636465666768696a737475767778797a838485868788898a92939495969798999aa2a3a4a5a6a7a8a9aab2b3b4b5b6b7b8b9bac2c3c4c5c6c7c8c9cad2d3d4d5d6d7d8d9dae1e2e3e4e5e6e7e8e9eaf1f2f3f4f5f6f7f8f9faffc401f103111111111000000123456789abffc40b51102124434754401277012311452131612415176171132232818144291a1b1c19233352f0156272d1a162434e125f11718191a262728292a35363738393a434445464748494a535455565758595a636465666768696a737475767778797a82838485868788898a92939495969798999aa2a3a4a5a6a7a8a9aab2b3b4b5b6b7b8b9bac2c3c4c5c6c7c8c9cad2d3d4d5d6d7d8d9dae2e3e4e5e6e7e8e9eaf2f3f4f5f6f7f8f9faffda0c31021131103f0fdecf09f84f4af178574cd0b42d334fd1744d16d22bd3f4fb0b74b6b5bb78902450c512091c688aaaa8a0500014514507ffd9

This PHP library would do the trick, so I'm trying to port the relevant portion to Java. Here it is:

$imageData = file_get_contents($this->_file);
$size = filesize($this->_file);

$hexString = '';

for ($i = 0; $i < $size; $i++) {
    $hex = dechex(ord($imageData{$i}));

    if (strlen($hex) == 1) {
        $hex = '0' . $hex;
    }

    $hexString .= $hex;
}

return $hexString;

But I don't know what the Java analogue to dechex(ord($imageData{$i})) is. :( I got only as far as the Integer.toHexString() function, which takes care of the dechex part....

Thanks all. :)

+1  A: 

Given a file URL for any file you can get the corresponding bytes by doing (exception handling omitted for brevity)...

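import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.net.URL;

int BUF_SIZE = 512;
URL fileURL = new URL("http://www.somewhere.com/someurl.jpg");
InputStream inputStream = fileURL.openStream();
byte[] smallBuffer = new byte[BUF_SIZE];
ByteArrayOutputStream largeBuffer = new ByteArrayOutputStream();
int numRead;
// read() may return fewer than BUF_SIZE bytes before the stream ends,
// so keep reading until it returns -1 (end of stream).
while ((numRead = inputStream.read(smallBuffer, 0, BUF_SIZE)) != -1) {
    // Write only the bytes actually read, not the whole buffer.
    largeBuffer.write(smallBuffer, 0, numRead);
}
inputStream.close();
byte[] bytes = largeBuffer.toByteArray();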

I'm looking at your PHP snippet now and realizing that RTF is a bizarre specification! It looks like each byte of the image is encoded as 2 hex digits (which doubles the size of the image for no apparent reason). Then the entire thing is stored in raw ASCII encoding. So, you'll want to do...

StringBuilder hexStringBuilder = new StringBuilder(bytes.length * 2);
for(byte imageByte : bytes) {
    String hexByteString = Integer.toHexString(0x000000FF & (int)imageByte);
    if(hexByteString.length() == 1) {
        hexByteString = "0" + hexByteString;
    }
    hexStringBuilder.append(hexByteString);
}
String hexString = hexStringBuilder.toString();
byte [] hexBytes = hexString.getBytes("UTF-8"); //Could also use US-ASCII
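
For reference, here is a minimal self-contained sketch of the same hex encoding that also wraps the result in a picture group. It assumes the \jpegblip control word from the RTF spec is what marks the data as JPEG. The class and method names (RtfJpeg, toHex, pictGroup) are made up for illustration, and the sizing control words (\picw, \pich, \picwgoal, \pichgoal) are left out even though some readers may need them before they display anything.

```java
final class RtfJpeg {
    private static final char[] HEX = "0123456789abcdef".toCharArray();

    /** Encodes each byte as exactly two lowercase hex digits. */
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            int v = b & 0xFF;          // mask to 0..255 so negative bytes are not sign-extended
            sb.append(HEX[v >>> 4]);   // high nibble
            sb.append(HEX[v & 0x0F]);  // low nibble
        }
        return sb.toString();
    }

    /** Wraps the hex data in a \pict group tagged as JPEG (assumption: no sizing control words). */
    static String pictGroup(byte[] jpegBytes) {
        return "{\\pict\\jpegblip\n" + toHex(jpegBytes) + "\n}";
    }
}
```

Looking up each nibble in a table avoids the zero-padding step entirely, since every byte always yields two characters.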

EDIT: Updated code sample to pad 0's on the hex bytes

EDIT: negative bytes were getting sign-extended when converted to ints >_<
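
To see what the masking fixes: Java bytes are signed, so the byte 0xFF is -1, and widening it to an int fills the upper 24 bits with ones. A small sketch of the difference (the class name SignExtensionDemo and its helper methods are just for illustration):

```java
final class SignExtensionDemo {
    /** Hex of the byte after plain widening to int (the sign is kept). */
    static String unmasked(byte b) {
        return Integer.toHexString(b);
    }

    /** Hex of the byte after masking to the low 8 bits. */
    static String masked(byte b) {
        return Integer.toHexString(b & 0xFF);
    }

    public static void main(String[] args) {
        byte b = (byte) 0xFF;             // -1 as a signed byte
        System.out.println(unmasked(b));  // ffffffff
        System.out.println(masked(b));    // ff
    }
}
```

Note that the masked result can still be a single digit for values below 0x10, which is why the zero-padding step is needed as well.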

Pace
Thanks Pace, I'm so close now - any idea about the dec/hex/ASCII/etc. encoding bit?
Toph
Hi again Pace - thanks so much for your updates. The *2 in (bytes.length * 2) doesn't seem to change the hexString output, oddly. And the 0-padding seems overeager. I put together a separate solution (just reading locally for now) -- FileInputStream oStream = new FileInputStream(imagePath); for(int i = 0; i<oStream.available(); i++) { Integer imageBytes = (Integer)oStream.read(); String imageHex = imageBytes.toHexString(imageBytes); if(imageHex.length() == 1) imageHex = "0" + imageHex; imageHexString += imageHex; } oStream.close(); -- and it actually sort of works, but...
Toph
OK, I didn't realize comments didn't preserve line breaks, haha. Point is, the images are only readable in Word, not, say, Win7 WordPad or OS X TextEdit, which is too bad. As for "over-eager" padding, check out the preponderance of f's in yours: http://pastebin.com/TdZajtVq (does that make sense? I haven't got a clue, haha...) Anyway, I'm going to check out other options (e.g. http://www.xmlmind.com/foconverter/what_is_xfc.html) since compatibility seems to be an issue.... Thanks again!
Toph
The preponderance of f's was because -1 as a byte is FF but -1 as an int is FFFFFFFF so every negative byte value was getting 6 extra F's. I edited the above code to fix that. Note the line Integer.toHexString(0x000000FF
Pace
However, I would definitely recommend using an external library if you plan to use this conversion for something. If you were doing it as a learning exercise then I think you're close.
Pace
Huh, interesting. All right, thanks!
Toph