I have a document A in encoding A displayed in tool A and a document B in encoding B displayed in tool B. If I cut and paste (part of) B into A what might be the resultant character encoding? I realise this depends on tool A and tool B and the information held in the paste buffer (which presumably can contain an encoding?) and the operating system.

What should high-quality tools do? And in practice, how many of the common tools (e.g. Word, TextPad, various IDEs, etc.) do a good job?

+2  A: 

First of all, a text editor's internal representation of text has no bearing on how the text is encoded (serialized) when you save the file. So a document is not "in" an encoding; it's a sequence of abstract characters. When the document is saved to a file (or transmitted over the network) then it gets encoded.
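To make that distinction concrete, here is a minimal Python sketch. The sample string and the three encodings are just my picks for illustration; the point is only that the same characters serialize to different bytes, and that decoding with the matching codec gets the same characters back.

```python
# The same abstract characters, serialized three different ways.
text = "naïve café"  # 10 characters, two of them non-ASCII

utf8_bytes = text.encode("utf-8")        # ï and é take 2 bytes each
latin1_bytes = text.encode("latin-1")    # every character takes 1 byte
utf16_bytes = text.encode("utf-16-le")   # every character here takes 2 bytes

print(len(text), len(utf8_bytes), len(latin1_bytes), len(utf16_bytes))

# Decoding with the matching codec recovers the identical string each time.
roundtrips = [
    utf8_bytes.decode("utf-8"),
    latin1_bytes.decode("latin-1"),
    utf16_bytes.decode("utf-16-le"),
]
print(all(r == text for r in roundtrips))
```

The encoding only exists at the byte boundary (file, network, clipboard blob); in memory the editor is working with characters.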

It's up to each application to decide what it puts on the clipboard. Typically, a Windows app that knows what it's doing will put a number of different representations on the clipboard. When you paste in the other app, the app will look for the representation that best suits its needs.

In your case, a text editor (that knows what it's doing) will put a Unicode representation of a selected string onto the clipboard (where Unicode, in Windows, is typically moved around as UTF-16, but that's not important). When you paste in the other app, it will insert that sequence of Unicode characters into the document at the selection point.
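The flow can be simulated roughly like this in Python. Nothing here touches a real clipboard: `copy_to_clipboard` and `paste_from_clipboard` are hypothetical helpers, and the choice of cp1252 for document B and UTF-8 for document A is an assumption made for the example.

```python
def copy_to_clipboard(selection: str) -> bytes:
    """Simulate the source app publishing text as NUL-terminated UTF-16LE."""
    return selection.encode("utf-16-le") + b"\x00\x00"


def paste_from_clipboard(blob: bytes) -> str:
    """Simulate the target app turning that blob back into characters."""
    return blob.decode("utf-16-le").rstrip("\x00")


# Files on disk: B is stored as cp1252, A as UTF-8 (assumed for this sketch).
doc_b_on_disk = "prix: 5 €".encode("cp1252")
doc_a_on_disk = "Total ".encode("utf-8")

# Each tool decodes its own file into characters when it opens it.
doc_b = doc_b_on_disk.decode("cp1252")
doc_a = doc_a_on_disk.decode("utf-8")

# Cut part of B, paste at the end of A.
blob = copy_to_clipboard(doc_b[6:])
doc_a = doc_a + paste_from_clipboard(blob)

# Tool A alone decides the encoding when it saves.
saved = doc_a.encode("utf-8")
print(doc_a)
print(saved)
```

In this sketch the euro sign is the single byte 0x80 in B's file but three bytes in A's saved file; B's original encoding never reaches A. If tool A saved in an encoding that cannot represent a pasted character, it would have to substitute, warn, or fail at save time.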

There's an app floating around called "ClipSpy" that will help you see what I'm talking about, interactively.

Jonathan Feinberg
+1 So a good clipboard will try to normalize one version of the characters to UTF-16/Unicode.
peter.murray.rust
I'm not sure what you mean by that. The clipboard doesn't *do* anything, other than hold onto some bytes that the application put there. The clipboard acts like a key-value store, where the key is something like a mimetype, and the value is a blob (that you then interpret according to the type).
Jonathan Feinberg
@peter.murray.rust that's somewhat Windows specific. But I'm no expert.
Adriano Varoli Piazza
Actually, on Windows, CF_TEXT, CF_OEMTEXT, and CF_UNICODETEXT are always present. Add one to the clipboard, and the clipboard will convert and add the others. So the clipboard *does* something.
Mihai Nita
+1 for educating me.
Jonathan Feinberg