When writing interpreters for PDF, HTML and other document formats, we need to deal with a variety of white-space characters and additional non-printing characters. The ASCII ones are well defined, but how many others are likely to be found in practice? A typical example is this cluster from ISO/IEC 10646 (Unicode), I think:
    U+2002 en space
    U+2003 em space
    U+2009 thin space
    U+200C zero width non-joiner
    U+200D zero width joiner
    U+200E left-to-right mark
    U+200F right-to-left mark
(For obvious reasons, the characters themselves are not visible above!)
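
To make the question concrete, here is a minimal Python sketch of how an interpreter might sort such characters into buckets. It assumes the Unicode general categories are a reasonable proxy: `Zs`/`Zl`/`Zp` for spacing characters and `Cf` for invisible formatting characters. The labels `"space"`, `"format"` and `"other"` and the function name `classify` are made up for illustration, and the results depend on the Unicode database bundled with the Python version in use.

```python
import unicodedata

# Code points of the seven characters listed in the question.
LISTED = [0x2002, 0x2003, 0x2009, 0x200C, 0x200D, 0x200E, 0x200F]


def classify(ch: str) -> str:
    """Return a hypothetical label: 'space', 'format', or 'other'."""
    cat = unicodedata.category(ch)
    # str.isspace() also covers ASCII controls such as tab and newline.
    if ch.isspace() or cat in ("Zs", "Zl", "Zp"):
        return "space"
    # Cf holds invisible formatting characters (joiners, direction marks).
    if cat == "Cf":
        return "format"
    return "other"


for cp in LISTED:
    ch = chr(cp)
    print(f"U+{cp:04X} {unicodedata.name(ch)}: {classify(ch)}")
```

Under these assumptions the three spaces come out as `"space"` and the joiners and direction marks as `"format"`. Whether a `"format"` character should be dropped, preserved, or treated as a word break is a separate decision for each document format.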