Hi all,
Short version: When using emacs' xterm-mouse-mode, Somebody (emacs? bash? xterm?) intercepts xterm's control sequences and replaces them with \0. This is a pain on wide monitors because the mouse only works in the first 223 columns.
What is the culprit, and how can I work around it?
From what I can tell this has something to do with Unicode/UTF-8 support, because it wasn't a problem 5-6 years ago when I last had a big monitor.
Gory details follow...
Thanks!
Emacs xterm-mouse-mode has a well-known weakness handling mouse clicks starting around x=95. A workaround, adopted by recent versions of emacs, pushes the problem off to x=223.
Several years ago I figured out that xterm encodes positions in 7-bit octets. Given position 'x' to encode, with X=x-96, send:
\40+x (x < 96)
\300+X/64 \200+X%64 (otherwise)
We have to add one to the x position that emacs reports, because positions in xterm start at one, not zero. That is where the magic number x=95 comes from: it is coded as "\300\200", the first escaped number. Somebody (emacs? bash? xterm?) treats those like "C0" control sequences from ISO 2022. Starting at x=159, we change to "C1" sequences (\301\200), which are also part of ISO 2022.
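To make the arithmetic concrete, here is a minimal C sketch of the encoding exactly as I described it above. The function name encode_column is made up for illustration, and this is my reading of the observed byte sequences, not code taken from xterm.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not from xterm): encode a 0-based emacs column
 * with the scheme described above. Writes 1 or 2 bytes into out and
 * returns how many were written. */
static size_t encode_column(int emacs_x, unsigned char out[2])
{
    /* Arbitrary upper bound so both bytes stay within 8 bits. */
    assert(emacs_x >= 0 && emacs_x < 2048);

    int x = emacs_x + 1;              /* xterm positions start at one */
    if (x < 96) {
        out[0] = (unsigned char)(040 + x);
        return 1;
    }

    int X = x - 96;                   /* offset into the escaped range */
    out[0] = (unsigned char)(0300 + X / 64);
    out[1] = (unsigned char)(0200 + X % 64);
    return 2;
}
```

With this, emacs column 95 comes out as \300\200, column 159 as \301\200, and column 223 as \302\200, which are the three thresholds mentioned in this post.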
Trouble hits with \302 sequences, which correspond to the current x=223 limit. Several years ago I was able to extend the hack to intercept \302 and \303 sequences manually, which got past the problem. Fast forward a few years, and today I find that I'm stuck back at x=223 because Somebody is replacing those sequences with \0.
So, where I'd expect clicking at line 1, col 250 to produce
ESC [ M SPC \303\207 ! ESC [ M # \303\207 !
Instead emacs reports (for any col > 223)
ESC [ M SPC C-@ ! ESC [ M # C-@ !
I suspect that Unicode/UTF-8 support is the culprit. Some digging shows that the Unicode standard allowed C0 and C1 sequences as part of UTF-8 until Nov 2000, and I guess Somebody didn't get the memo (fortunately). However, \302\200 - \302\237 are Unicode control sequences, so Somebody slurps them up (doing who-knows-what with them!) and returns \0 instead.
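For reference, this is roughly what a strict UTF-8 decoder does with these byte pairs. It is only a sketch with made-up function names, and I am not claiming that any of emacs, bash or xterm decodes this way. It does show why \300 and \301 leads are a special case (a strict decoder rejects them as overlong encodings) and why \302\200 through \302\237 land on U+0080..U+009F, the Unicode control range.

```c
#include <assert.h>
#include <stdbool.h>

/* Decode a two-byte UTF-8 sequence into a code point.
 * Returns -1 if the pair is malformed or an overlong encoding. */
static int decode_utf8_pair(unsigned char lead, unsigned char trail)
{
    if ((lead & 0xE0) != 0xC0 || (trail & 0xC0) != 0x80)
        return -1;                    /* not a 110xxxxx 10xxxxxx pair */

    int cp = ((lead & 0x1F) << 6) | (trail & 0x3F);

    /* Lead bytes 0xC0 and 0xC1 can only produce values below 0x80,
     * which must be encoded as a single byte instead. */
    return cp < 0x80 ? -1 : cp;
}

/* True for the Unicode control range U+0080..U+009F. */
static bool is_c1_control(int cp)
{
    return cp >= 0x80 && cp <= 0x9F;
}
```

Here \302\200 decodes to U+0080 and \302\237 to U+009F (both controls), \302\240 decodes to U+00A0 (not a control), and \303\202 decodes to U+00C2, the "Â" mentioned in the questions below.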
Some more detailed questions:
- Who is this Somebody that intercepts the codes before they reach emacs' lossage buffer?
- If it's really just about control sequences, how come characters after \302\237, which are UTF-8 encodings of printable Unicode, also come back as \0?
- What makes emacs decide whether to display lossage as unicode characters or octal escape sequences, and why don't the two match? For example, my self-built cygwin emacs 23.2.1 (xterm 229) reports \301\202 for column 161, but my rhel5.5-supplied emacs 22.3.1 (xterm 215) reports "Â" (latin A with circumflex), which is actually \303\202 in UTF-8!
Update:
Here's a patch against xterm-261 which makes it emit mouse positions in utf-8 format:
diff -r button.c button.utf-8-fix.c
--- a/button.c Sat Aug 14 08:23:00 2010 +0200
+++ b/button.c Thu Aug 26 16:16:48 2010 +0200
@@ -3994,1 +3994,27 @@
-#define MOUSE_LIMIT (255 - 32)
+#define MOUSE_LIMIT (2047 - 32)
+#define MOUSE_UTF_8_START (127 - 32)
+
+static unsigned
+EmitMousePosition(Char line[], unsigned count, int value)
+{
+    /* Add pointer position to key sequence
+     *
+     * Encode large positions as two-byte UTF-8
+     *
+     * NOTE: historically, it was possible to emit 256, which became
+     * zero by truncation to 8 bits. While this was arguably a bug,
+     * it's also somewhat useful as a past-end marker so we keep it.
+     */
+    if(value == MOUSE_LIMIT) {
+        line[count++] = CharOf(0);
+    }
+    else if(value < MOUSE_UTF_8_START) {
+        line[count++] = CharOf(' ' + value + 1);
+    }
+    else {
+        value += ' ' + 1;
+        line[count++] = CharOf(0xC0 + (value >> 6));
+        line[count++] = CharOf(0x80 + (value & 0x3F));
+    }
+    return count;
+}
@@ -4001,1 +4027,1 @@
- Char line[6];
+ Char line[9]; /* \e [ > M Pb Pxh Pxl Pyh Pyl */
@@ -4021,2 +4047,0 @@
- else if (row > MOUSE_LIMIT)
- row = MOUSE_LIMIT;
@@ -4028,1 +4052,5 @@
- else if (col > MOUSE_LIMIT)
+
+ /* Limit to representable mouse dimensions */
+ if (row > MOUSE_LIMIT)
+ row = MOUSE_LIMIT;
+ if (col > MOUSE_LIMIT)
@@ -4090,2 +4118,2 @@
- line[count++] = CharOf(' ' + col + 1);
- line[count++] = CharOf(' ' + row + 1);
+ count = EmitMousePosition(line, count, col);
+ count = EmitMousePosition(line, count, row);
Hopefully this (or something like it) will appear in a future version of xterm. The patch makes xterm work out of the box with emacs-23 (which assumes utf-8 input) and also fixes the existing problems with xt-mouse.el. Using it with emacs-22 requires redefining, via advice, the function emacs uses to decode mouse positions (the new definition works fine with emacs-23 also):
(defadvice xterm-mouse-event-read (around utf-8 compile activate)
  (setq ad-return-value
        (let ((c (read-char)))
          (cond
           ;; mouse clicks outside the encodable range produce 0
           ((= c 0) #x800)
           ;; must convert UTF-8 to unicode ourselves
           ((and (>= c #xC2) (< emacs-major-version 23))
            (logior (lsh (logand c #x1F) 6) (logand (read-char) #x3F)))
           ;; normal case
           (c)))))
Distribute the advice as part of the .emacs on all machines you log into, and patch the xterm on any machines you work from. Voila!
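If you want to convince yourself that the two halves agree, here is a standalone C sketch that round-trips every encodable position. emit_position follows the logic of EmitMousePosition from the patch, with plain unsigned char in place of xterm's Char and CharOf. read_position is a hypothetical byte-oriented mirror of the emacs-22 branch of the advice. Neither is part of xterm or emacs.

```c
#include <assert.h>
#include <stddef.h>

#define MOUSE_LIMIT        (2047 - 32)
#define MOUSE_UTF_8_START  (127 - 32)

/* Same logic as EmitMousePosition in the patch. */
static size_t emit_position(unsigned char line[], size_t count, int value)
{
    assert(value >= 0 && value <= MOUSE_LIMIT);

    if (value == MOUSE_LIMIT) {
        line[count++] = 0;                          /* past-end marker */
    } else if (value < MOUSE_UTF_8_START) {
        line[count++] = (unsigned char)(' ' + value + 1);
    } else {
        value += ' ' + 1;
        line[count++] = (unsigned char)(0xC0 + (value >> 6));
        line[count++] = (unsigned char)(0x80 + (value & 0x3F));
    }
    return count;
}

/* Hypothetical mirror of the advice for a reader that sees raw bytes.
 * Returns the position still offset by ' ' + 1 and advances *pos. */
static int read_position(const unsigned char line[], size_t *pos)
{
    int c = line[(*pos)++];

    if (c == 0)
        return 0x800;                               /* out of range */
    if (c >= 0xC2)                                  /* two-byte UTF-8 */
        return ((c & 0x1F) << 6) | (line[(*pos)++] & 0x3F);
    return c;                                       /* normal case */
}
```

For every value from 0 through MOUSE_LIMIT, reading back what was emitted yields value + 33, including the past-end marker, since 0x800 is MOUSE_LIMIT + 33.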
WARNING: Applications which use xterm's mouse modes but do not treat their input as utf-8 will get confused by this patch because the mouse escape sequences get longer. However, those applications break horribly with the current xterm because mouse positions with x > 95 look like utf-8 codes but aren't. I'd create a new mouse mode for xterm, but certain applications (gnu screen!) filter out unknown escape sequences. Emacs is the only terminal-mouse app I use, so I consider the patch a net win, but YMMV.