I have input data coming from a flat file that has English, Japanese, and Chinese characters in one column. I load these values into a staging table column defined as VARCHAR2(250 CHAR); the main table column is defined as VARCHAR2(250), which I cannot change. So I am doing a SUBSTR on this column. After loading the table, when I run

SELECT * FROM TABLE

...I get this error:

ORA-29275: partial multibyte character

If I select the other columns, there are no issues.

A: 

SUBSTR behaves differently depending on the database character set. I assume from your description that your DB character set is not one of the Unicode variants and that you must truncate the VARCHAR2(250 CHAR) data to 250 bytes or less. This is dangerous because the cut can land in the middle of a 2-byte character, resulting in the message you got. You should look at the documentation for substrc(), which calculates its length based on characters rather than bytes.
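To make the byte-versus-character problem concrete, here is a minimal Python sketch (not Oracle code). It assumes a UTF-8 character set such as AL32UTF8, where each of these CJK characters takes 3 bytes; the same thing happens with 2-byte characters in other multibyte character sets. It cuts the encoded string at a fixed byte count, as a byte-based truncation would, and checks whether the result ends in an incomplete character, which is the condition ORA-29275 reports. The helpers `cut_bytes` and `is_complete` are hypothetical names used only for illustration.

```python
# Illustration only: assumes UTF-8 (e.g. AL32UTF8); this is not Oracle's code path.


def cut_bytes(text: str, max_bytes: int) -> bytes:
    """Hypothetical helper: keep only the first max_bytes bytes of the UTF-8 encoding."""
    return text.encode("utf-8")[:max_bytes]


def is_complete(raw: bytes) -> bool:
    """True if raw decodes as UTF-8 without a partial multibyte character."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


value = "abc中华人"  # 3 ASCII characters (1 byte each) + 3 CJK characters (3 bytes each)
print(len(value), "characters,", len(value.encode("utf-8")), "bytes")

# Byte cuts that land inside a CJK character leave a partial character behind.
for n in range(4, 13):
    print(n, cut_bytes(value, n), is_complete(cut_bytes(value, n)))
```

Only cuts at 6, 9 and 12 bytes fall on character boundaries here; every other cut leaves a fragment that a UTF-8 decoder rejects.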

It might help if you explained why you are required to throw away part of the data.

Jim Garrison
The description column in the source system allowed very long strings, and the data file is extracted from the source system, but the target system does not allow more than 250 bytes. Chinese characters take more than one byte, which is causing the problem. The ETL tool I am using to load is Informatica, which does not support all Oracle functions. I will check whether substrc() is available.
Manish
A:

Hi Manish,

You should use SUBSTRB when you copy your data from your 250 CHAR column to your 250-byte column. This function will only output whole characters (you won't get incomplete Unicode characters):

SQL> select substrb('中华人', 1, 9) ch9,
            substrb('中华人', 1, 8) ch8,
            substrb('中华人', 1, 7) ch7,
            substrb('中华人', 1, 6) ch6,
            substrb('中华人', 1, 5) ch5
       FROM dual;

CH9       CH8      CH7     CH6    CH5
--------- -------- ------- ------ -----
中华人       中华       中华      中华     中
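If the truncation has to happen outside the database (for example in an ETL step that lacks the Oracle function), the visible result above, keeping as many whole characters as fit in the byte limit, can be mimicked. This is a minimal Python sketch assuming UTF-8; `truncate_to_bytes` is a hypothetical helper, and it reproduces only the characters shown in the output above, not Oracle's internal SUBSTRB implementation.

```python
# Sketch only: assumes UTF-8; truncate_to_bytes is a hypothetical helper, not an Oracle API.


def truncate_to_bytes(text: str, max_bytes: int) -> str:
    """Return the longest prefix of text whose UTF-8 encoding fits in max_bytes.

    Characters are added whole, so the result never ends in a partial
    multibyte character.
    """
    kept = []
    used = 0
    for ch in text:
        size = len(ch.encode("utf-8"))  # 1 to 4 bytes per character in UTF-8
        if used + size > max_bytes:
            break  # this character would overflow the byte budget
        kept.append(ch)
        used += size
    return "".join(kept)


# Same limits as the query above.
for n in (9, 8, 7, 6, 5):
    print(n, truncate_to_bytes("中华人", n))
```

For the VARCHAR2(250) target the limit would be 250; because it is a byte limit, the result can hold far fewer than 250 characters when the text is mostly CJK.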
Vincent Malgrat
Shouldn't that be substrc? substrb copies in bytes and can truncate an extended character.
Jim Garrison
@Jim: the column will have to be truncated. A `250 CHAR` column may hold as much as 1000 bytes of data, which won't fit in a `250 byte` column. The function will not cut a UTF-8 character in half, however: the result will always be a legal UTF-8 string.
Vincent Malgrat
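The 1000-byte figure follows from UTF-8 using at most 4 bytes per character: 250 × 4 = 1000. A quick Python check of per-character sizes, assuming a UTF-8 database character set such as AL32UTF8 (the sample characters are arbitrary; 𠀀 is U+20000, a CJK Extension B character outside the Basic Multilingual Plane):

```python
# Assumes UTF-8; the sample characters are arbitrary examples.
samples = ["A", "é", "中", "日", "𠀀"]

for ch in samples:
    print(repr(ch), len(ch.encode("utf-8")), "byte(s)")

MAX_UTF8_BYTES_PER_CHAR = 4  # longest UTF-8 sequence for a single character
worst_case = 250 * MAX_UTF8_BYTES_PER_CHAR
print("250 CHAR can need up to", worst_case, "bytes")
```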