The following does not work in this particular case: SBCL complains that whatever value I pass to USE-VALUE is not a character.

(handler-bind ((sb-int:character-coding-error
                 #'(lambda (c)
                     (invoke-restart 'use-value #\?))))
  (sb-ext:octets-to-string *euc-jp* :external-format :euc-jp))

Here *euc-jp* is a variable holding the raw octets of EUC-JP encoded text.

I have also tried #\KATAKANA_LETTER_NI instead of #\?, and even just "". Nothing has worked so far.

Any help would be greatly appreciated!

EDIT: To reproduce *EUC-JP*, fetch http://blogs.yahoo.co.jp/akira_w0325/27287392.html using drakma.

A: 

It works for me:

CL-USER> (handler-bind ((sb-int:character-coding-error
                         #'(lambda (c)
                             (declare (ignore c))
                             (invoke-restart 'use-value #\?))))
           (sb-ext:octets-to-string (make-array '(16)
                                                :element-type '(unsigned-byte 8)
                                                :initial-contents '#(181 65 217 66 164 67 181 217 164 223 164 222 164 185 161 163))
                                    :external-format :euc-jp))
"?A?B?C休みます。"

Might *euc-jp* be something other than a (vector (unsigned-byte 8))?
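
One way to rule that out is to normalize the data before decoding. The following is only a minimal sketch in portable Common Lisp; ENSURE-OCTETS is a hypothetical helper name, not part of SBCL or drakma, and it assumes the input is either already an octet vector or a sequence of integers in the range 0-255.

```lisp
;; Hypothetical helper (not an SBCL or drakma function): make sure the data
;; handed to the decoder really is a specialized octet vector.
(defun ensure-octets (data)
  "Return DATA as a (simple-array (unsigned-byte 8) (*)), copying if needed."
  (etypecase data
    ;; Already the specialized vector type: pass it through untouched.
    ((simple-array (unsigned-byte 8) (*)) data)
    ;; A string has already been decoded, so there is nothing left to decode.
    (string (error "DATA is already a decoded string, not octets."))
    ;; Lists and general vectors of small integers get copied into an octet
    ;; vector; COERCE signals an error if an element is not an octet.
    (sequence (coerce data '(simple-array (unsigned-byte 8) (*))))))
```

The result could then be passed to sb-ext:octets-to-string in place of *euc-jp*; if ENSURE-OCTETS signals an error, the input was not raw octets to begin with.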

Matthias Benkard
That works for me too, but unfortunately not on the *euc-jp* sequence: decoding goes fine until it actually has to insert a "?", at which point it dies. You can tell because most of the website shows up correctly in the debugger =|
Tapio Saarinen
A: 

There's an expression in SBCL 1.0.18's mb-util.lisp that looks like this:

(if code
    (code-char code)
    (decoding-error array pos (+ pos bytes) ,format
                    ',malformed pos))

I'm not very familiar with SBCL's internals, but this looks like a bug. The consequent returns a character, while the alternative returns a string (no matter what you give to it via USE-VALUE, it's always converted into a string by way of the STRING function; see the definition of DECODING-ERROR in octets.lisp).
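
The type mismatch can be illustrated without touching SBCL's internals. This is just a sketch of the standard STRING function's behavior, which is what the replacement value is described above as passing through; SHOW-REPLACEMENT-TYPES is a hypothetical name introduced for illustration only.

```lisp
;; Hypothetical demonstration function: shows what the standard STRING
;; function does to a replacement value such as #\? or "".
(defun show-replacement-types (replacement)
  "Return two values: whether (STRING REPLACEMENT) is a character, and
whether it is a string."
  (let ((converted (string replacement)))
    (values (characterp converted)
            (stringp converted))))

;; A character goes in, but a one-element string comes out, so code that
;; expects the same type as CODE-CHAR's result would reject it.
(show-replacement-types #\?)
```

Under that reading, supplying a different character (or a string) through USE-VALUE would not help, since the converted value is a string either way.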

Matthias Benkard
I've reported it as a bug (https://bugs.launchpad.net/sbcl/+bug/314939) and it's been accepted. Thanks for pointing me at the correct source file, I might tinker around and hopefully not break things :-)
Tapio Saarinen