Hi All

How can I detect the presence of extended ASCII values (128 to 255) in a C++ character array?

Thank you

+2  A: 

Iterate over the array and check that no character falls in the 128 to 255 range?

Alex Reitbort
+1 for being 36 seconds faster than me ;-).
Gamecat
Ascii stupid question get a stupid Ansi. +1
Tim Matthews
-1, char is often signed. In that case char(130) < 129 !
MSalters
A: 

Check that none of the values is negative. This only works where char is signed.

Riho
+11  A: 

Please remember that there is no such thing as extended ASCII. ASCII was and is only defined between 0 and 127. Everything above that is either invalid or needs to be in a defined encoding other than ASCII (for example ISO-8859-1).

Please read The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!).

Other than that: what's wrong with iterating over it and checking for any value > 127 (or < 0 when using signed chars)?
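
A minimal sketch of that loop, assuming 8-bit chars and a buffer whose length is known. The name has_non_ascii is made up for illustration. Converting each element to unsigned char first folds the "> 127" and "< 0" cases into a single comparison:

```cpp
#include <cstddef>

// Hypothetical helper: returns true if any of the first `len` bytes of `buf`
// lies outside the 7-bit ASCII range 0..127. Assumes 8-bit chars.
bool has_non_ascii(const char* buf, std::size_t len) {
    for (std::size_t i = 0; i < len; ++i) {
        // Converting to unsigned char maps a negative signed char to 128..255,
        // so the comparison behaves the same whether char is signed or not.
        if (static_cast<unsigned char>(buf[i]) > 127) {
            return true;
        }
    }
    return false;
}
```

Passing the length explicitly means the check also works on buffers that are not null-terminated.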

Joachim Sauer
-1 for being completely wrong: http://en.wikipedia.org/wiki/Extended_ASCII
shoosh
@shoosh: read your link again: "The use of the term is sometimes criticized, because it can be mistakenly interpreted that the ASCII standard has been updated to include more than 128 characters or that the term unambiguously identifies a single encoding, both of which are untrue"
Mehrdad Afshari
@shoosh: I'm aware that some encodings can be collectively referred to as "extended ASCII", but whenever I see someone use that term they usually don't know this. So I discourage its use and try to clarify where I see it used.
Joachim Sauer
-1, char is often signed.
MSalters
@MSalters that is the lamest excuse for -1. @shoosh extended ascii is not standard.
Tim Matthews
@MSalters: fixed
Joachim Sauer
Ok, new code works. @CtrlAltDel: The bug was quite real; "char c = 128; std::cout << bool(c>127)" will print "false" on those systems where char(128) < 0. Hence, all chars would appear to be ASCII.
MSalters
I like to think the term "extended ASCII" refers to characters that are in a standard which in some way "extends" ASCII; in this sense the term "extended ASCII" to refer to character sets which are supersets of ASCII is correct.
Beau Martínez
@Beau: even if that definition could be argued to be correct it is highly misleading, because, as you pointed out, it doesn't refer to a single encoding, but to a class of encodings instead. Also: "extended ASCII" implies that it is some-kind-of ASCII, which is wrong. "ASCII-based encoding", "ASCII-compatible encoding" or something like that would be a more correct term.
Joachim Sauer
A: 
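// Returns true if the null-terminated string contains a byte with the high bit set.
bool detect(const signed char* x) {
  while (*x > 0) ++x;  // skip 7-bit characters; stop at '\0' or at a negative (high-bit) byte
  return *x != 0;      // stopped before the terminator, so a byte >= 128 was found
}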
Mehrdad Afshari
+2  A: 

Just check the highest bit of each byte with a bitwise AND mask. Endianness does not come into it, because the test looks at a single byte:

if (ch & 128) {
  // high bit is set
} else {
  // looks like a 7-bit value
}

But there are probably locale functions you should be using for this. Better yet, KNOW what character encoding data is coming in as. Trying to guess it is like trying to guess the format of data going into your database fields. It might go in, but garbage in, garbage out.
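
The mask test can be wrapped in a loop that reports where the first suspicious byte sits, which may help when tracking down where junk data enters. This is only a sketch: the name first_high_bit_byte is made up for illustration, and 8-bit chars are assumed.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: index of the first byte whose high bit is set,
// or std::string::npos if every byte is a 7-bit value. Assumes 8-bit chars.
std::size_t first_high_bit_byte(const std::string& s) {
    for (std::size_t i = 0; i < s.size(); ++i) {
        // Convert to unsigned char before masking so the result does not
        // depend on whether plain char is signed.
        if (static_cast<unsigned char>(s[i]) & 0x80u) {
            return i;
        }
    }
    return std::string::npos;
}
```

A caller could compare the result against std::string::npos and log the offending offset before deciding whether to reject or replace the byte.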

Lee B
Hi Lee B. My application acts as middleware between a Java front end and a DCE back-end application. The DCE server puts some junk characters in the out parameter of the middleware. I have to send the out parameter's content to the front end, and when the junk characters are sent, the middleware dumps core.
ilan
I have to convert the char* from the DCE to a string and then pass it on to the front end.
ilan
You'd better cast 128 to char, or else ch will be converted to int, in which case 128 isn't the high bit anymore.
MSalters
+6  A: 

Char can be signed or unsigned. This doesn't really matter, though. You actually want to check if each character is valid ASCII. This is a positive, non-ambiguous check. You simply check if each char is both >=0 and <= 127. Anything else (whether positive or negative, "Extended ASCII" or UTF-8) is invalid.
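
One way to write that positive check, as a sketch only. The name is_valid_ascii is made up for illustration, and C++11 or later is assumed for the lambda. The range test is applied to plain char, so it does not need to know whether char is signed:

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper: true only if every char is in the ASCII range 0..127.
bool is_valid_ascii(const std::string& s) {
    return std::all_of(s.begin(), s.end(), [](char c) {
        // Where char is signed, bytes 128..255 appear negative and fail c >= 0.
        // Where char is unsigned, they fail c <= 127 instead.
        return c >= 0 && c <= 127;
    });
}
```

Some compilers may warn that one of the two comparisons is always true on a given platform. Both are kept so the same source works under either signedness.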

MSalters