While searching for a proper way to trim non-breaking spaces from parsed HTML, I first stumbled on Java's spartan definition of String.trim(), which is at least properly documented. I wanted to avoid explicitly listing the characters eligible for trimming, so I assumed that the Unicode-backed methods on the Character class would do the job for me.
That's when I discovered that Character.isWhitespace(char) explicitly excludes non-breaking spaces:
It is a Unicode space character (SPACE_SEPARATOR, LINE_SEPARATOR, or PARAGRAPH_SEPARATOR) but is not also a non-breaking space ('\u00A0', '\u2007', '\u202F').
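To make the behaviour concrete, here is a minimal sketch of what I am seeing, together with one possible workaround that combines Character.isWhitespace with Character.isSpaceChar so that no characters have to be listed by hand. The class name NbspTrim and the helpers isTrimmable and trimUnicode are hypothetical names made up for this illustration; they are not part of the JDK.

```java
class NbspTrim {

    // Hypothetical helper: a char is trimmable if either Unicode-backed
    // predicate accepts it. isWhitespace covers tabs and line breaks;
    // isSpaceChar covers the non-breaking spaces that isWhitespace excludes.
    static boolean isTrimmable(char c) {
        return Character.isWhitespace(c) || Character.isSpaceChar(c);
    }

    // Hypothetical helper: strips trimmable characters from both ends.
    static String trimUnicode(String s) {
        int start = 0;
        int end = s.length();
        while (start < end && isTrimmable(s.charAt(start))) {
            start++;
        }
        while (end > start && isTrimmable(s.charAt(end - 1))) {
            end--;
        }
        return s.substring(start, end);
    }

    public static void main(String[] args) {
        char nbsp = '\u00A0';
        System.out.println(Character.isWhitespace(nbsp)); // false
        System.out.println(Character.isSpaceChar(nbsp));  // true

        String parsed = "\u00A0 text\t\u202F";
        // String.trim() only strips chars up to U+0020, so both ends survive.
        System.out.println(parsed.trim().equals(parsed)); // true
        System.out.println("[" + trimUnicode(parsed) + "]"); // [text]
    }
}
```

This works around the problem, but it does not explain why isWhitespace was specified this way in the first place.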
Why is that?
The implementation of the .NET equivalent is less discriminating.