I have XML data stored as a BLOB in an Oracle DB. I retrieve this column, convert it to a byte[] and then to a String, perform some string operations, convert the result to UTF-8, and insert it back into the DB. Some special characters end up stored as junk characters, and I don't know what I am doing wrong. Any ideas or help would be highly appreciated.

Here is the source code.

    // DB connection
    // Get the result set

    Blob data = (Blob) rs.getBlob(3);
    byte[] bdata = data.getBytes(1, (int) data.length());

    // Converting to String and doing operation
    s = new String(bdata);
    // String operation

    // Before inserting into DB, converting to UTF-8 format.
    byte[] dataAsByteArray = s.getBytes("UTF8");
    updateStmt.setBinaryStream(1,
            new ByteArrayInputStream(dataAsByteArray), dataAsByteArray.length);

    commit();
+2  A: 

Is the BLOB actually UTF-8 encoded to begin with? If it really is XML, I would use the XML APIs to read it and interpret the header. The XML declaration (if present) may specify the character encoding. See DocumentBuilder, and read the BLOB as an InputStream to avoid any byte-to-char conversions on your side.
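As a rough sketch of that approach, the snippet below hands raw bytes to a DocumentBuilder so the parser picks the encoding from the XML declaration. The class name BlobXmlParse and the sample document are made up for illustration; in the real code the stream would presumably come from something like data.getBinaryStream() on the Blob.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.Charset;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical helper class; the name is illustrative only.
class BlobXmlParse {
    // Parses XML from a byte stream. No String is created here, so the
    // parser decides the encoding from the XML declaration (or a BOM).
    static Document parse(InputStream in) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();
        return builder.parse(in);
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for the BLOB content: a document that declares ISO-8859-1
        // and is really encoded that way.
        Charset latin1 = Charset.forName("ISO-8859-1");
        byte[] xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><name>Jos\u00e9</name>"
                .getBytes(latin1);

        Document doc = parse(new ByteArrayInputStream(xml));
        String text = doc.getDocumentElement().getTextContent();

        // Compare against the expected value instead of printing it, since
        // console output has its own encoding pitfalls.
        System.out.println("Jos\u00e9".equals(text));
    }
}
```

The point of the design is that the bytes never pass through new String(...), so the JVM default charset plays no part in reading the document.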

Note that in the code above, when you convert the bytes to a String, you don't specify the byte-to-char encoding:

 // Converting to String and doing operation
 s = new String(bdata);

The above uses the default charset that the JVM is running with (see the documentation for the String(byte[]) constructor), so there is room for error here. I would confirm the character encoding of the BLOB and enforce that encoding explicitly in the byte-to-String conversion.
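A minimal sketch of enforcing the encoding in both directions, assuming the BLOB really does contain UTF-8 (that still needs to be confirmed against the actual data). The class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.util.Arrays;

// Hypothetical helper class; the name and methods are illustrative only.
class ExplicitCharsetRoundTrip {
    // Assumption: the BLOB content is UTF-8. Confirm this before relying on it.
    static final Charset UTF_8 = Charset.forName("UTF-8");

    // Decode with a fixed charset instead of the JVM default.
    static String decode(byte[] raw) {
        return new String(raw, UTF_8);
    }

    // Encode with the same charset before writing back to the DB.
    static byte[] encode(String text) {
        return text.getBytes(UTF_8);
    }

    public static void main(String[] args) {
        // U+00E9 (e with acute accent) encoded as UTF-8 is the two bytes C3 A9.
        byte[] stored = { (byte) 0xC3, (byte) 0xA9 };

        // Decoding and re-encoding with the same charset preserves the bytes.
        String ok = decode(stored);
        System.out.println(Arrays.equals(encode(ok), stored)); // prints "true"

        // Decoding the same bytes as ISO-8859-1 yields two characters instead
        // of one, which is the kind of junk described in the question.
        String wrong = new String(stored, Charset.forName("ISO-8859-1"));
        System.out.println(wrong.length()); // prints "2"
    }
}
```

In the question's code this would amount to replacing new String(bdata) with a call that names the charset, and keeping the same charset for getBytes.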

Additionally, I can't remember whether UTF8 is accepted as an encoding name alongside UTF-8. I guess it is, since I'd expect an exception otherwise, but it may be worth checking.
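One quick way to check is to ask the JVM what it resolves the name to. This is only a sketch, with a made-up class name. As far as I know, standard JDKs treat UTF8 as an alias of UTF-8, but the result depends on the runtime in use.

```java
import java.nio.charset.Charset;

// Hypothetical helper class; the name is illustrative only.
class CharsetAliasCheck {
    // Returns the canonical name the JVM resolves an encoding name to.
    // Charset.forName throws UnsupportedCharsetException for unknown names.
    static String canonicalName(String name) {
        return Charset.forName(name).name();
    }

    public static void main(String[] args) {
        // Whether the running JVM recognises the name at all.
        System.out.println(Charset.isSupported("UTF8"));
        // The canonical name behind the alias.
        System.out.println(canonicalName("UTF8"));
    }
}
```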

Brian Agnew
You are right. I should use the XML API to read and interpret the header. I am using the Javolution library, which has a class called UTF8StreamReader. I had hoped to use the code above as a quicker way to populate some dummy data for testing, but it seems I have to stick with the original way :)
Anand
@Brian Agnew - canonical names for Sun Java 6 encodings are listed here: http://java.sun.com/javase/6/docs/technotes/guides/intl/encoding.doc.html
McDowell
@McDowell - thanks for that. I couldn't find that last night.
Brian Agnew