I want to truncate an error string so that it is guaranteed to fit into an Oracle table column of type VARCHAR2(2000 BYTE)

Design forces:

  1. The main goal is to fit the string into the table column.

  2. 90-95% of the string text is exception messages and stack traces. But it could contain a customer name with French or Turkish characters, which I am willing to disregard and see as ? or whatever.

  3. I want the code to be dead simple. The database encoding can change, and Chinese characters can be introduced, but I want the code to work anyway.

It should be "dead simple", but it got me pondering for a while.

Any suggestions?

Probably the best option is to convert to ASCII. I came up with a variant which is not nice but probably works.

import java.nio.charset.StandardCharsets;

public static String trimStringToBytes(StringBuilder builder, int maximumBytes)
{
    // US-ASCII encodes to at most one byte per char, so cutting to maximumBytes chars first bounds the byte count
    String truncatedString = builder.length() > maximumBytes ? builder.substring(0, maximumBytes) : builder.toString();

    // characters outside US-ASCII are replaced with '?'; StandardCharsets avoids the checked UnsupportedEncodingException
    byte[] bytes = truncatedString.getBytes(StandardCharsets.US_ASCII);

    // decode with the same charset instead of the platform default
    return new String(bytes, 0, Math.min(bytes.length, maximumBytes), StandardCharsets.US_ASCII);
}
A: 

I think your method should work, but intentionally losing all non-ASCII characters is pretty nasty. If you ever have messages in Chinese, they will be replaced completely with ???.

IMO the best thing would be to use SQL functions in the insert query to do the trimming. That makes sure that you never exceed the column size AND lose as little data as possible. It's also much less error-prone than trying to do encoding-aware trimming in the Java code.
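
For comparison, here is a rough sketch of what encoding-aware trimming in Java could look like if it had to be done on the application side. The class and method names (`ByteTruncation`, `truncateToBytes`) are made up for illustration, and it assumes the `Charset` passed in really matches the database character set, which is exactly the part that is easy to get wrong. It lets a `CharsetEncoder` fill a fixed-size buffer and keeps only the characters that fit completely.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of any library.
final class ByteTruncation {

    // Returns the longest prefix of text whose encoding in charset
    // fits into maxBytes, without cutting a character in half.
    static String truncateToBytes(String text, Charset charset, int maxBytes) {
        CharsetEncoder encoder = charset.newEncoder()
                // count unencodable characters as the encoder's replacement bytes
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);

        CharBuffer in = CharBuffer.wrap(text);
        ByteBuffer out = ByteBuffer.allocate(maxBytes);

        // Encoding stops once the next character no longer fits into out;
        // in.position() is then the number of chars that were fully encoded.
        encoder.encode(in, out, true);

        return text.substring(0, in.position());
    }

    public static void main(String[] args) {
        // 'h' takes 1 byte and '\u00e9' takes 2 bytes in UTF-8
        System.out.println(truncateToBytes("h\u00e9llo", StandardCharsets.UTF_8, 2)); // h
        System.out.println(truncateToBytes("h\u00e9llo", StandardCharsets.UTF_8, 3));
    }
}
```

The returned value is a prefix of the original string, so the byte guarantee only holds for the charset given here. If the database converts to a different character set, the stored size can differ, which is one reason the SQL-side trimming is less fragile.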

Michael Borgwardt
A: 

Is it possible to change the column to VARCHAR2(2000 CHAR)? That would eliminate the encoding issue entirely.

Adam Hawkes
+1  A: 

You won't need to truncate the string if you use a CLOB.

Chris B
CLOB is *not* a substitute for VARCHAR2; the two types have wildly different characteristics, and are handled differently by both Oracle and JDBC.
skaffman
Very true, but it sounds like the OP wants to do something (storage of a large chunk of text, with nothing said about indexability) that is better suited to a CLOB than a VARCHAR. +1
kdgregory
A: 

I would recommend not doing this in Java, but instead in SQL when you perform the INSERT.

For example, in Oracle you could use the SUBSTR function to trim, using connection.prepareStatement:

insert into mytable (col1, col2) values (?, substr(?, 1, 2000))

Then set your col1 and col2 values on the PreparedStatement, and Oracle will take the first 2000 characters of the value (SUBSTR counts characters, not bytes, and needs the start position 1 followed by the length) and store that.
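
A minimal sketch of that JDBC call might look as follows. The table and column names are the placeholders from the statement above, and `ErrorLogDao` and `insertError` are invented names. It uses SUBSTRB rather than SUBSTR on the assumption that the limit is counted in bytes, as with the VARCHAR2(2000 BYTE) column in the question. Whether very long values can be bound this way depends on the driver and database settings, which this sketch does not check.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

// Hypothetical DAO; mytable, col1 and col2 are placeholder names.
final class ErrorLogDao {

    // SUBSTRB(value, 1, 2000) keeps at most the first 2000 bytes.
    // No trailing semicolon: it is not part of the statement sent via JDBC.
    static final String INSERT_SQL =
            "insert into mytable (col1, col2) values (?, substrb(?, 1, 2000))";

    static void insertError(Connection connection, String col1, String errorText)
            throws SQLException {
        // try-with-resources closes the statement even if the insert fails
        try (PreparedStatement statement = connection.prepareStatement(INSERT_SQL)) {
            statement.setString(1, col1);
            statement.setString(2, errorText); // trimmed by the database, not here
            statement.executeUpdate();
        }
    }
}
```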

It might even be worth doing this with a stored procedure, passing in the entire String as a VARCHAR2 argument to the procedure, which then trims it and inserts the row. No need for the application to get involved with the underlying storage semantics.

skaffman
If the limit is counted in bytes, use `SUBSTRB` instead.
Carlos Heuberger