ansaurus

Question

Answer 1

+3 A:

Try this:

# select length(E'aa\\000aa'::bytea);
 length
--------
      5

Updated: Why the original didn't work? First, understand the difference between one slash and two:

pg=# select E'aa\055aa', length(E'aa\055aa') ;
 ?column? | length
----------+--------
 aa-aa    |      5
(1 row)

pg=# select E'aa\\055aa', length(E'aa\\055aa') ;
 ?column? | length
----------+--------
 aa\055aa |      8

In the first case, I'm writing a literal string, 4 characters unescaped('a') and one escaped. The slash is consumed by the parser in a first pass, which converts the full \055 to a single char ('-' in this case).

In the second case, the first slash just escapes the second, the pair \\ is translated by the parser to a single \ and the 055 is seen as three characters.

Now, when converting a text to a bytea, escape characters (in a already parsed or produced text) are parsed/interpreted again! (Yes, this is confusing).

So, when I write

 select E'aa\000aa'::bytea;

in the first parsing, the literal E'aa\000aa' is converted to an internal text with a null character in the third position (and depending on your postgresql version, the null character is interpreted as an EOS, and the text is assumed to be of length two - or in other versions an illegal string error is thrown).

Instead, when I write

 select E'aa\\000aa'::bytea;

in the first parsing, the literal string "aa\000aa" (eight characters) is seen, and is asigned to a text; then in the casting to bytea, it is parsed again, and the sequence of characters '\000' is interpreted as a null byte.

IMO postgresql kind of sucks here.

leonbloy 2010-06-07 17:55:57

Ok. Now why does this work?

Thanatos 2010-06-07 18:50:38

Ah - I think I see: Postgres strings can't represent binary data, so they use an escaped form, where a null is \000, which we must type as '\\000'. I see now the PQescapeBytea and PQunescapeBytea - I'm assuming that if I select a bytea column, I'll get this intermediate escaped form back, which I must then pass through PQunescapeBytea. This all makes more sense now.

Thanatos 2010-06-07 19:18:09

Yes, thanks for your update. That confirms what I was thinking, and where you marked "(Yes, this is confusing.)" I wholeheartedly agree!

Thanatos 2010-06-07 19:42:54

The explanatory update makes this a great answer.

araqnid 2010-06-07 20:19:29

Sometimes it's simpler to specify base64 input for a binary literal, e.g. `decode('AAAA','base64')` instead of `E'\\000\\000\\000'::bytea`. 9.0 supports a simpler hex encoding for bytea values which is rather clearer (E'\\x000000' and no slavish ::bytea suffix needed)

araqnid 2010-06-07 20:22:47

ansaurus

tags:

views:

answers:

bytea type & nulls, Postgres

related questions