views:

197

answers:

2

My application needs to display "orphaned" combining characters. I would like to use the same format as the "official" unicode charts, using the dotted circle placeholder. See, for example:

A quick scan through the charts and I came up with U+25CC "DOTTED CIRCLE". That looks good, but the note on this character reads:

note that the reference glyph for this character is intentionally larger than the dotted circle glyph used to indicate combining characters in this standard; see, for example, 0300

Which says (I think) that U+25CC is not the correct character. (Or, if it is, perhaps just a poorly worded note.)

So: if the dotted circle used on the "Combining Diacritical Marks" is not U+25CC, what is the correct code for that little booger?

I have tried:

  • Copying the text from the PDF and inspecting it, but the copy is disabled in the PDF.
  • Emailing it to myself in Gmail and then viewing the attachment as HTML, but there is gets converted to U+0024 ("DOLLAR SIGN"). Which means that either the conversion failed or they are just playing some font rendering games in the PDF.

[Clarification] I realize that the U+25CC looks OK (assuming one's font supports it), but it sounds like the spec says that this is the wrong character. Many unicode characters have similar glyphs but are different characters, semantically speaking. "Latin Capital Letter A" (U+0041) and "Greek Capital Letter Alpha" (U+0391) will look identical for most fonts, but they have different semantic meanings and are not interchangable.

A: 

Just tried this: create a blank .html file, copy the text, and load in Firefox. Displays as expected (although I really didn't expect space+combining character to display correctly):

<html>
<body>
<font size="24pt">
&#x25CC;&#x0300;
&#x25CC;&#x0301;
&#x25CC;&#x0302;
&#x25CC;&#x0303;
<br/>
&#x0041;&#x0300;
&#x0041;&#x0301;
&#x0041;&#x0302;
&#x0041;&#x0303;
<br/>
&#x0020;&#x0300;
&#x0020;&#x0301;
&#x0020;&#x0302;
&#x0020;&#x0303;
</font>
</body>
</html>
devio
I added a clarification to my original question. I realize that U+25CC looks correct, but it sounds like it is not the correct character, semantically speaking, according to the spec.
Dave
+1  A: 

I don't think there is an official placeholder character. The way I read that note, they chose U+25CC arbitrarily, purely for display purposes. Then, in the chart where the "real" dotted circle is listed, they made it a little larger to emphasize that it's not being used as a placeholder there. (Or maybe they shrunk it in the other charts; as you said, the note's poorly worded.)

Whatever the case, I don't see any reason not to use U+25CC as your placeholder.

Alan Moore
Sounds reasonable. Thanks!
Dave