ansaurus

Question

C++ Extended Ascii characters

Answer 1

+2 A:

Iterate over the array and check that no character falls in the 128 to 255 range?

Alex Reitbort 2009-03-06 10:17:43

+1 for being 36 seconds faster than me ;-).

Gamecat 2009-03-06 10:19:04

Ascii stupid question get a stupid Ansi. +1

Tim Matthews 2009-03-06 10:25:37

-1, char is often signed. In that case char(130) < 129 !

MSalters 2009-03-06 11:06:10

Answer 2

A:

Check that the values are not negative.

Riho 2009-03-06 10:18:18

Answer 3

+11 A:

Please remember that there is no such thing as extended ASCII. ASCII was and is only defined between 0 and 127. Everything above that is either invalid or needs to be in a defined encoding other than ASCII (for example ISO-8859-1).

Please read The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!).

Other than that: what's wrong with iterating over it and checking for any value > 127 (or < 0 when using signed chars)?

Joachim Sauer 2009-03-06 10:18:32

-1 for being completely wrong: http://en.wikipedia.org/wiki/Extended_ASCII

shoosh 2009-03-06 10:29:03

@shoosh: read your link again: "The use of the term is sometimes criticized, because it can be mistakenly interpreted that the ASCII standard has been updated to include more than 128 characters or that the term unambiguously identifies a single encoding, both of which are untrue"

Mehrdad Afshari 2009-03-06 10:35:34

@shoosh: I'm aware that some encodings can be collectively referred to as "extended ASCII", but whenever I see someone use that term they usually don't know this. So I discourage its use and try to clarify where I see it used.

Joachim Sauer 2009-03-06 10:58:55

-1, char is often signed.

MSalters 2009-03-06 11:05:06

@MSalters that is the lamest excuse for -1. @shoosh extended ascii is not standard.

Tim Matthews 2009-03-06 11:18:55

@MSalters: fixed

Joachim Sauer 2009-03-06 12:15:02

Ok, new code works. @CtrlAltDel: The bug was quite real; "char c = 128; std::cout << bool(c>127)" will print "false" on those systems where char(128) < 0. Hence, all chars would appear to be ASCII.
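
The pitfall described in this comment can be reproduced with a small sketch. It uses `signed char` explicitly so the behaviour does not depend on the platform's choice for plain `char`, and it assumes an 8-bit, two's-complement `signed char`. The function names `naive_is_high` and `is_high` are made up for illustration.

```cpp
// Naive check: compares the value directly against 127.
// An 8-bit signed char can never exceed 127, so this always returns false.
bool naive_is_high(signed char c) {
    return c > 127;
}

// Safer check: reinterpret the byte as unsigned before comparing,
// so bytes 128..255 are seen as such instead of as negative values.
bool is_high(signed char c) {
    return static_cast<unsigned char>(c) > 127;
}
```

With a byte such as 0x80 stored in a `signed char`, the naive version reports "not high" while the cast version reports "high".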

MSalters 2009-04-10 11:37:18

I like to think the term "extended ASCII" refers to characters that are in a standard which in some way "extends" ASCII; in this sense, using the term "extended ASCII" to refer to character sets which are supersets of ASCII is correct.

Beau Martínez 2010-08-27 18:36:28

@Beau: even if that definition could be argued to be correct it is highly misleading, because, as you pointed out, it doesn't refer to a single encoding, but to a class of encodings instead. Also: "extended ASCII" implies that it is some-kind-of ASCII, which is wrong. "ASCII-based encoding", "ASCII-compatible encoding" or something like that would be a more correct term.

Joachim Sauer 2010-08-28 15:08:25

Answer 4

A:

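// Returns true if the NUL-terminated string contains a byte outside 1..127.
bool detect(const signed char* x) {
  while (*x > 0) ++x;  // skip plain ASCII bytes (1..127)
  return *x < 0;       // stopped at NUL (all ASCII) or at a negative, non-ASCII byte
}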

Mehrdad Afshari 2009-03-06 10:18:41

Answer 5

+2 A:

Make sure you know the endianness of the machine in question, and just check the highest bit with a bitwise AND mask:

if (ch & 128) {
  // high bit is set
} else {
  // looks like a 7-bit value
}

But there are probably locale functions you should be using for this. Better yet, KNOW what character encoding data is coming in as. Trying to guess it is like trying to guess the format of data going into your database fields. It might go in, but garbage in, garbage out.
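
As a hedged sketch of the mask test above: converting to `unsigned char` first keeps the result independent of whether plain `char` is signed, and since only a single byte is examined, byte order should not come into play. The helper name `high_bit_set` is hypothetical, and the sketch assumes 8-bit bytes.

```cpp
// Returns true if the most significant bit of the byte is set.
// The cast to unsigned char avoids sign extension when ch is promoted to int.
bool high_bit_set(char ch) {
    return (static_cast<unsigned char>(ch) & 0x80u) != 0;
}
```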

Lee B 2009-03-06 10:42:22

Hi Lee B. My application acts as middleware between a Java front end and a DCE back-end application. The DCE server puts some junk characters in the out-param of the middleware. I have to send the out-param content to the front end. On sending the junk characters, the middleware dumps core.

ilan 2009-03-06 10:51:50

I have to convert the char* from the DCE to a string and then pass it on to the front end.
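
One possible way to do such a conversion, offered only as a sketch: copy the buffer into a `std::string` and replace every byte outside 0..127 with a placeholder. The helper name `to_ascii_string` and the `'?'` placeholder are assumptions made for illustration, the buffer is assumed to be NUL-terminated, and nothing here is known to address the actual cause of the core dump.

```cpp
#include <string>

// Hypothetical helper: copies a NUL-terminated buffer into a std::string,
// replacing every byte outside 0..127 with a placeholder character.
std::string to_ascii_string(const char* raw, char placeholder = '?') {
    std::string out;
    if (raw == nullptr) {
        return out;  // treat a missing buffer as an empty string
    }
    for (const char* p = raw; *p != '\0'; ++p) {
        // Compare as unsigned so the test works whether char is signed or not.
        const unsigned char byte = static_cast<unsigned char>(*p);
        out.push_back(byte <= 127 ? *p : placeholder);
    }
    return out;
}
```

Whether replacing, dropping, or rejecting such bytes is appropriate depends on what the front end expects.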

ilan 2009-03-06 10:52:38

You'd better cast 128 to char, or else ch will be converted to int, in which case 128 isn't the high bit anymore.

MSalters 2009-03-06 11:08:51

Answer 6

+6 A:

Char can be signed or unsigned. This doesn't really matter, though. You actually want to check if each character is valid ASCII. This is a positive, non-ambiguous check. You simply check if each char is both >=0 and <= 127. Anything else (whether positive or negative, "Extended ASCII" or UTF-8) is invalid.
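
A minimal sketch of that positive check, assuming the data is held in a `std::string` (the function name `is_ascii` is made up for illustration). Comparing the byte as `unsigned char` against 127 is equivalent to testing `>= 0 && <= 127` on a signed `char`, and it also works where plain `char` is unsigned.

```cpp
#include <string>

// Returns true only if every byte of s is in the ASCII range 0..127.
// Casting to unsigned char makes the test independent of whether
// plain char is signed or unsigned on the target platform.
bool is_ascii(const std::string& s) {
    for (char c : s) {
        if (static_cast<unsigned char>(c) > 127) {
            return false;
        }
    }
    return true;
}
```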

MSalters 2009-03-06 11:04:35
