How can I find out whether Python was compiled with UCS-2 or UCS-4?

$ ./configure --help | grep -i ucs
  --enable-unicode[=ucs[24]]

Searching the official documentation, I found this:

sys.maxunicode: An integer giving the largest supported code point for a Unicode character. The value of this depends on the configuration option that specifies whether Unicode characters are stored as UCS-2 or UCS-4.

What is not clear here is which value corresponds to UCS-2 and which to UCS-4.

The code is expected to work on Python 2.6 and above.

A: 

I'm guessing that 65535 is for UCS-2 and 4294967295 for UCS-4

mjv
A: 

I'm currently using Python 2.6, compiled with UCS-2, and my value of sys.maxunicode is 65535. This makes sense -- UCS-2 is 2 bytes or 16 bits, and 65535 = 2^16-1. UCS-4 is 4 bytes or 32 bits, so I imagine the value of sys.maxunicode in Python UCS-4 is 2^32-1.

mipadi
+5  A: 

It's 0xFFFF (or 65535) for UCS-2, and 0x10FFFF (or 1114111) for UCS-4:

Py_UNICODE
PyUnicode_GetMax(void)
{
#ifdef Py_UNICODE_WIDE
    return 0x10FFFF;
#else
    /* This is actually an illegal character, so it should
       not be passed to unichr. */
    return 0xFFFF;
#endif
}

The maximum character in UCS-4 mode is defined by the maximum value representable in UTF-16.

Martin v. Löwis
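
To see where 0x10FFFF comes from, here is a small sketch of the UTF-16 surrogate-pair arithmetic. The function name decode_surrogate_pair and the two constants are only illustrative and are not part of Python or of the C source quoted above. Combining the largest high surrogate with the largest low surrogate gives the largest code point UTF-16 can represent:

```python
# Largest values of the UTF-16 surrogate ranges
# (high: 0xD800-0xDBFF, low: 0xDC00-0xDFFF).
HIGH_SURROGATE_MAX = 0xDBFF
LOW_SURROGATE_MAX = 0xDFFF


def decode_surrogate_pair(high, low):
    """Combine a UTF-16 high/low surrogate pair into one code point.

    Illustrative helper, not a standard library function.
    """
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


max_code_point = decode_surrogate_pair(HIGH_SURROGATE_MAX, LOW_SURROGATE_MAX)
print(hex(max_code_point))  # 0x10ffff
```

That is the same value, 1114111, that sys.maxunicode reports on a UCS-4 build.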
+4  A: 

When built with --enable-unicode=ucs4:

>>> import sys
>>> print sys.maxunicode
1114111

When built with --enable-unicode=ucs2:

>>> import sys
>>> print sys.maxunicode
65535
Stef
A: 

I had this same issue once. I documented it for myself on my wiki at

http://arcoleo.org/dsawiki/Wiki.jsp?page=Python%20UTF%20-%20UCS2%20or%20UCS4

I wrote:

import sys
# sys.maxunicode is 0xFFFF (65535) on UCS-2 builds and 0x10FFFF (1114111) on UCS-4 builds
'UCS4' if sys.maxunicode > 0xFFFF else 'UCS2'
Dave
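
Building on the answers above, the check could be wrapped in a small function, as a sketch. The name unicode_build_width and its optional maxunicode parameter are hypothetical and exist only to make the logic easy to try with both values. It should run on Python 2.6 and later:

```python
import sys


def unicode_build_width(maxunicode=None):
    """Return 'UCS2' or 'UCS4' based on sys.maxunicode.

    Hypothetical helper; the optional argument lets you pass a value
    other than the running interpreter's own sys.maxunicode.
    """
    if maxunicode is None:
        maxunicode = sys.maxunicode
    # 0xFFFF (65535) means a UCS-2 build; anything larger means UCS-4.
    return 'UCS4' if maxunicode > 0xFFFF else 'UCS2'


print(unicode_build_width())
```

Note that on Python 3.3 and later sys.maxunicode is always 1114111, so this reports 'UCS4' there regardless of how the interpreter was built. The distinction only matters for older versions.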