I am changing all varchar columns in our Firebird database to UTF8, but I don't understand the difference in varchar size.

For example, with the charset and collation left unset, we can set the varchar size to 255. If we set the charset and collation to UTF8 and then set the varchar to 255, it reads different values.

What would be the equivalent varchar size for varchar(255) in UTF8?

+2  A: 

A VARCHAR(N) field that uses the UTF8 character set has to reserve enough space for any N UTF8 characters. One such character can take between 1 and 4 bytes, so the only safe choice is to allow for N characters of 4 bytes each. For a VARCHAR(50) column, for example, that means reserving 200 bytes to store the 50 characters (worst case).

You could use the FlameRobin tool to have a look at the internals. Let's assume you have a table

CREATE TABLE "TableÅÄÖåäö"
(
  "ColÅÄÖåäö" Varchar(50)
);

in a database with default character set UTF8. (Note that you need at least Firebird 2.0 for this.)

The system tables store information about all relations and their fields. In the system table RDB$RELATION_FIELDS there is a record for this field, which has (for example) RDB$1 as the RDB$FIELD_SOURCE. Looking into RDB$FIELDS there is one record for RDB$1, and its value of RDB$FIELD_LENGTH is 200.
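The same lookup can be done with a query instead of browsing in FlameRobin. This is only a sketch, assuming Firebird 2.0 or later. It lists every user-defined VARCHAR column together with its declared length in characters and its reserved length in bytes:

```sql
-- List user VARCHAR columns with their character and byte lengths.
SELECT
  TRIM(rf.RDB$RELATION_NAME)      AS table_name,
  TRIM(rf.RDB$FIELD_NAME)         AS column_name,
  TRIM(cs.RDB$CHARACTER_SET_NAME) AS charset_name,
  f.RDB$CHARACTER_LENGTH          AS length_in_chars,  -- the N in VARCHAR(N)
  f.RDB$FIELD_LENGTH              AS length_in_bytes,  -- N * bytes per character
  cs.RDB$BYTES_PER_CHARACTER      AS bytes_per_char
FROM RDB$RELATION_FIELDS rf
JOIN RDB$FIELDS f
  ON f.RDB$FIELD_NAME = rf.RDB$FIELD_SOURCE
LEFT JOIN RDB$CHARACTER_SETS cs
  ON cs.RDB$CHARACTER_SET_ID = f.RDB$CHARACTER_SET_ID
WHERE f.RDB$FIELD_TYPE = 37                  -- 37 = VARCHAR
  AND COALESCE(rf.RDB$SYSTEM_FLAG, 0) = 0    -- skip system tables
ORDER BY 1, 2;
```

For a UTF8 column the byte length should come out as four times the character length.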

So to answer your question: to have a UTF8 column with space for 255 characters you declare it as VARCHAR(255), but in the database it will have a size of 1020 bytes (255 × 4).
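To see the difference between characters and bytes on actual data, you can compare CHAR_LENGTH and OCTET_LENGTH. This is a minimal sketch, assuming Firebird 2.0 or later and a connection character set that matches your client encoding. The table name demo_utf8 and its column are made up for illustration:

```sql
-- Hypothetical demo table; the declared size is in characters, not bytes.
CREATE TABLE demo_utf8
(
  name VARCHAR(255) CHARACTER SET UTF8
);
COMMIT;

INSERT INTO demo_utf8 (name) VALUES ('åäö');
COMMIT;

-- CHAR_LENGTH counts characters, OCTET_LENGTH counts the bytes actually used.
SELECT
  CHAR_LENGTH(name)  AS chars_used,
  OCTET_LENGTH(name) AS bytes_used
FROM demo_utf8;
```

The declared limit of 255 applies to chars_used. The value of bytes_used depends on which characters are stored and only reaches the worst case when every character needs 4 bytes.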

mghie
A: 

The VARCHAR(n) datatype contains text of varying length, up to a maximum of n characters. The maximum size is 32,765 bytes, which corresponds to between 8,191 and 32,765 characters, depending on the character size (1 to 4 bytes per character). You must supply n; it does not default to 1.

Firebird converts from variable-length character data to fixed-length character data by adding spaces to the value in the varying column until the column reaches its maximum length n. In the reverse conversion, trailing blanks are removed from the text.

The main advantage of using the VARCHAR(n) datatype is that it saves memory space during the execution of PSQL programs.

msi