ansaurus

Question

Answer 1

A:

As Wikipedia says, ASCII covers only 0-127. "Extended ASCII" is a misnomer that should be avoided; it is used loosely to mean "some other character set based on ASCII which only uses single bytes" (meaning not multibyte like UTF-8). Sometimes the term means the 128-255 code points of one specific character set, but again, it's vague and you shouldn't count on it meaning anything specific.

The use of the term is sometimes criticized, because it can be mistakenly interpreted that the ASCII standard has been updated to include more than 128 characters or that the term unambiguously identifies a single encoding, both of which are untrue.

Source: http://en.wikipedia.org/wiki/Extended_ASCII
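
To make the ambiguity concrete, here is a minimal sketch (not part of the original answer) that decodes the same byte under two single-byte code pages. The class and method names are hypothetical, and it assumes .NET Core 3.0 or later, where legacy code pages have to be registered through `CodePagesEncodingProvider` before `Encoding.GetEncoding` can find them.

```csharp
using System;
using System.Text;

public static class CodePageDemo
{
    static CodePageDemo()
    {
        // Assumption: .NET Core 3.0+ / .NET 5+, where legacy code pages
        // are only available after the provider is registered.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }

    // Decode a single byte under the named code page (hypothetical helper).
    public static string Decode(byte value, string encodingName)
    {
        return Encoding.GetEncoding(encodingName).GetString(new[] { value });
    }

    public static void Main()
    {
        foreach (string name in new[] { "windows-1252", "ibm437" })
        {
            string low = Decode(0x41, name);   // inside the ASCII range
            string high = Decode(0xE9, name);  // inside the 128-255 range
            // Print code points rather than glyphs so the result does not
            // depend on the console font.
            Console.WriteLine($"{name}: 0x41 -> U+{(int)low[0]:X4}, 0xE9 -> U+{(int)high[0]:X4}");
        }
    }
}
```

The byte 0x41 should come out as the same code point in both cases, while 0xE9 should not, which is why "extended ASCII" on its own does not identify an encoding.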

Roger Pate 2010-10-17 12:22:33

Answer 2

+3 A:

ASCII only covers the characters with value 0-127, and those are the same on all computers. (Well, almost, although this is mostly a matter of glyphs rather than semantics.)

Extended ASCII is a term for various single-byte code pages that assign various characters to the range 128-255. There is no single "extended ASCII" set of characters.

In C# and VB.NET, all strings are Unicode, so by default, there's no need to worry about this - whether or not a character can be displayed in a console app is a matter of the fonts being used, not the limitation of any specific single-byte code page.
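
If you want to confirm that a string stays inside the ASCII range, something like the following might do. It is only a sketch: the class and method names are made up for illustration, and it simply checks each UTF-16 code unit against 127.

```csharp
using System;
using System.Linq;

public static class AsciiCheck
{
    // True when every UTF-16 code unit in the string is in the range 0-127.
    public static bool IsAscii(string text) => text.All(c => c <= 127);

    public static void Main()
    {
        Console.WriteLine(IsAscii("plain text"));  // True
        Console.WriteLine(IsAscii("caf\u00E9"));   // False: U+00E9 is outside ASCII
    }
}
```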

Michael Madsen 2010-10-17 12:27:57

You don't write software that runs on EBCDIC systems?! :P

Roger Pate 2010-10-17 12:29:13

Congrats on 10k!

Roger Pate 2010-10-17 12:32:36

@Roger: No, no I don't. And I don't think the OP will do that either :) (Also, thanks.)

Michael Madsen 2010-10-17 12:35:11

Thanks, also if I use only the first 127, I can be sure they will be displayed well, right?

Mojmi 2010-10-17 12:37:48

@Mojmir: Assuming you don't use any non-printable characters, and we ignore the issue about the glyph used for a backslash on a Japanese or Korean system, then yes.

Michael Madsen 2010-10-17 12:41:43

Come on everyone, let's get him down below 10k again so he doesn't get too cocky. :)

bzlm 2010-10-17 12:42:07

@bzlm: He'd need posts worth downvoting for that. When I saw he was just shy of 10k I looked through his answers, and didn't see any meriting that (but I was looking for ones worth upvoting instead ;).

Roger Pate 2010-10-17 12:45:15

@Michael Madsen, and when using non-printable chars? I just do not know what is bad about it, if I make a simple app and want to display some of the standard ASCII non-printable chars.

Mojmi 2010-10-17 12:53:54

@Mojmir It's hard to understand what your question is. Are you asking whether what you see in your console output will work for any user anywhere regardless of which ASCII character you use?

bzlm 2010-10-17 12:59:42

Well, they're *non-printable*, therefore, you can't really count on anything sensible happening if you try to print them. If you're thinking of the glyphs you could usually show for those in the old DOS days, there are equivalent Unicode characters for those, but depending on the console font, you may not be able to display all of them - you'll have to try for yourself. See [Wikipedia's page on code page 850](http://en.wikipedia.org/wiki/Code_page_850) for an example of this mapping.

Michael Madsen 2010-10-17 13:03:15

@bzlm, yes, basically. If I use only the first 127 chars - \u0002 etc.

Mojmi 2010-10-17 13:05:50

@Michael Madsen Thanks. So, if I am able to display, say, this \u0002 (smile) in the console app, then (as C# .NET uses Unicode) everyone will. I was only confused about whether this char is ASCII or not.

Mojmi 2010-10-17 13:31:50

@Mojmir: No, Unicode character 0002 is not a smile. Look at the hexadecimal number below it; *that's* the Unicode value of that character.

Michael Madsen 2010-10-17 14:19:59

@Michael Madsen Well, so what is the reason \u0002 prints that smile? I do not get it :(

Mojmi 2010-10-17 16:23:35

@Mojmir: Because that's how those really old code pages were defined, so the glyphs are in that particular font for legacy reasons - it's not something you should depend on, because it's officially a [control character with no formally defined glyph](http://www.fileformat.info/info/unicode/char/0002/index.htm). Use the proper smile glyph from Unicode if you really need it.

Michael Madsen 2010-10-17 16:32:06

Ok, thanks. That was what I needed to know.

Mojmi 2010-10-17 16:36:17
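
For readers following the comment thread above, a rough sketch of the distinction: `\u0002` is a control character, whereas U+263B (BLACK SMILING FACE) is the code point that code page 437 tables list for position 2. The class and helper names below are hypothetical, and whether the glyph actually renders still depends on the console font and output encoding.

```csharp
using System;
using System.Text;

public static class SmileDemo
{
    // Printable ASCII is the range 32 (space) through 126 (~).
    public static bool IsPrintableAscii(char c) => c >= ' ' && c <= '~';

    public static void Main()
    {
        char control = '\u0002'; // START OF TEXT, a control character
        char smile = '\u263B';   // BLACK SMILING FACE, a real symbol

        Console.WriteLine(char.IsControl(control));    // True
        Console.WriteLine(char.IsControl(smile));      // False
        Console.WriteLine(IsPrintableAscii(control));  // False
        Console.WriteLine(IsPrintableAscii('A'));      // True

        // Ask the console for UTF-8 output; the font still has to
        // contain the glyph for it to show up.
        Console.OutputEncoding = Encoding.UTF8;
        Console.WriteLine(smile);
    }
}
```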

Answer 3

A:

As others have said, true ASCII is always the lower 7 bits of each byte. Before the advent (and ubiquity) of Unicode standards, various extensions to the ASCII character set that utilized the eighth bit were released. The most common in the Windows world is Windows code page 1252.

If you're looking to use this encoding in .NET, you can get it like this:

Encoding windows1252 = Encoding.GetEncoding("windows-1252");
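
As a hedged illustration of using that encoding object, the sketch below round-trips a short string through Windows-1252. The class and helper names are hypothetical. One assumption to flag: on .NET Core and .NET 5+ the lookup fails unless `CodePagesEncodingProvider` is registered first, whereas on the classic .NET Framework the line above works on its own.

```csharp
using System;
using System.Text;

public static class Windows1252Demo
{
    public static Encoding GetWindows1252()
    {
        // Assumption: needed on .NET Core 3.0+ / .NET 5+ only.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        return Encoding.GetEncoding("windows-1252");
    }

    public static void Main()
    {
        Encoding windows1252 = GetWindows1252();
        string text = "caf\u00E9 \u20AC"; // "café €"

        // Each character becomes exactly one byte in this code page.
        byte[] bytes = windows1252.GetBytes(text);
        Console.WriteLine(BitConverter.ToString(bytes)); // 63-61-66-E9-20-80

        // Decoding with the same code page restores the original string.
        Console.WriteLine(windows1252.GetString(bytes) == text); // True
    }
}
```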

Adam Robinson 2010-10-17 12:28:13


Extended ASCII question
