I'm running an experiment on Berkeley DBs. I'm simply removing the contents from DB a and reinserting the key-value pairs into DB b. However, I am getting "Wide character" errors when inserting the key-value pairs into DB b. Help?

+6  A: 

BerkeleyDB stores bytes ("octets"). Perl strings are made of Perl characters. In order to store Perl characters in the octet-based store, you have to convert the characters to bytes. This is called encoding, as in character-encoding.

The warning you get indicates that Perl is doing the conversion for you, and is guessing which character encoding you want to use. Since it will probably guess wrong, it's best to say explicitly which encoding you want. The Encode module allows you to do that.

Instead of writing:

$db->store( key => $value );

You should write:

use Encode qw(encode);

$db->store( key => encode('utf-8', $value) );

And on the way out:

use Encode qw(decode);

$db->get($key, $octets); # BDB returns the result via the arg list.  C programmers...
my $value = decode('utf-8', $octets);
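Putting the two directions together, a round trip might look like the sketch below. It is only an illustration: the `%store` hash is a hypothetical stand-in for the database (so the example runs with core modules only), and `put_chars` / `get_chars` are made-up helper names, not part of any BDB API.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical stand-in for the database: a plain hash that only ever
# receives octets, as an octet-based store would require.
my %store;

# Encode characters to UTF-8 octets on the way in.
sub put_chars {
    my ($key, $value) = @_;
    $store{ encode('UTF-8', $key) } = encode('UTF-8', $value);
    return;
}

# Decode UTF-8 octets back to characters on the way out.
sub get_chars {
    my ($key) = @_;
    my $octets = $store{ encode('UTF-8', $key) };
    return defined $octets ? decode('UTF-8', $octets) : undef;
}

# Six characters; the last one is above 0xFF, so it cannot fit in one octet.
my $original = "caf\x{e9} \x{263A}";

put_chars('greeting', $original);
my $round_trip = get_chars('greeting');

print "round trip ok\n" if $round_trip eq $original;
```

The stored value is longer than the original string (more octets than characters), which is exactly why the store must never see unencoded characters.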

This is true of more than just BDB; whenever you are communicating across the network, via files, via the terminal, or pretty much anything, you must be sure to encode characters to octets on the way out, and decode octets to characters on the way in. Otherwise you risk warnings like this one, garbled text, or double-encoded data.
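For files, one way to follow the same rule is to let an I/O layer do the encoding and decoding. The sketch below assumes UTF-8 is the encoding you want and uses a temporary scratch file purely for illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Characters, including one above 0xFF.
my $text = "na\x{ef}ve \x{2603}\n";

# Hypothetical scratch file, used only for this illustration.
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
close $tmp_fh;

# On the way out: the :encoding layer turns characters into UTF-8 octets.
open my $out, '>:encoding(UTF-8)', $path or die "open for write: $!";
print {$out} $text;
close $out or die "close: $!";

# On the way in: the same layer turns the octets back into characters.
open my $in, '<:encoding(UTF-8)', $path or die "open for read: $!";
my $read_back = do { local $/; <$in> };
close $in;

print "file round trip ok\n" if $read_back eq $text;
```

Because the layer is declared when the handle is opened, the `print` call produces no "Wide character" warning, and the code that reads the file back works with characters rather than raw octets.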

jrockway