views: 104

answers: 4
It seems to me that if UTF-8 were the only encoding ever used anywhere, there would be a lot fewer issues with code:

  • Don't even need to think about encoding issues.
  • No issues with mixed 1-2-byte character streaming, because everything uses 2 bytes.
  • Browsers don't need to wait for the <meta> tag specifying encoding before they can do anything. StackOverflow doesn't even have the meta tag, making browsers download the full page first, slowing page rendering.
  • You would never see ? and other random symbols on old web pages (e.g. in place of Microsoft Word's special [read: horrible] quotes).
  • More characters can be represented in UTF-8.
  • Other things I can't think of right now.

So why haven't the inferior encodings been nuked from space?

A: 

I don't think UTF-8 uses "2 bits"; it's variable-length. Also, a lot of OS-level code is UTF-16 and UTF-32 respectively, which means the choice for single-byte Latin encodings is between ASCII and ISO-8859-1.

Novikov
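
To make the variable-length point concrete, here is a minimal Python sketch (illustrative only; any language with a UTF-8 codec would show the same thing) in which different code points take 1, 2, 3, and 4 bytes:

    # Each character below occupies a different number of bytes in UTF-8.
    samples = [
        ("A", "U+0041, ASCII letter"),                  # 1 byte
        ("é", "U+00E9, Latin-1 range"),                 # 2 bytes
        ("€", "U+20AC, Basic Multilingual Plane"),      # 3 bytes
        ("\U0001F600", "U+1F600, outside the BMP"),     # 4 bytes
    ]

    for char, description in samples:
        encoded = char.encode("utf-8")
        print(f"{description}: {len(encoded)} byte(s) -> {encoded!r}")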
2 bits was meant to be 2 bytes. Edited the question.
Coronatus
Yes, but it still stands that UTF-8 is anywhere from 1 to 4 bytes per character.
Novikov
@Coronatus but the point is, UTF-8 is *NOT* a 2-byte encoding. It's a variable-length encoding that uses 1 to 4 bytes per character. That's one of its disadvantages compared to single-byte encodings: you have to worry about splitting a string in the middle of a character, you can't tell how long a string is (in characters) without parsing each byte, and so forth.
David Gelhar
It's common to need to know how many *bytes* are in a string for memory allocation purposes. Or, less commonly, to know how many *terminal columns* a string takes for text-wrapping purposes. But how often do you need to know the number of *characters*?
dan04
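
The two comments above can be illustrated with a short Python sketch (the sample string is made up): character count and byte count differ, and cutting the byte stream at an arbitrary offset can land in the middle of a character:

    text = "naïve café"              # 10 characters
    data = text.encode("utf-8")      # 12 bytes: "ï" and "é" take 2 bytes each

    print(len(text))                 # 10 -> characters
    print(len(data))                 # 12 -> bytes, what you allocate memory for

    # Slicing the bytes at an arbitrary position can split a character in two:
    truncated = data[:3]             # cuts "ï" (0xC3 0xAF) after its first byte
    try:
        truncated.decode("utf-8")
    except UnicodeDecodeError as err:
        print("split mid-character:", err)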
+6  A: 

Why are EBCDIC, Baudot, and Morse still not nuked from orbit? Why did the buggy-whip manufacturers not close their doors the day after Gottlieb Daimler shipped his first automobile?

Relegating a technology to history takes non-zero time.

msw
True, but Unicode has been around for almost 20 years.
dan04
But Baudot has been around for more than 100 years and occupies only 70% of the space of wasteful ASCII!
msw
+8  A: 
  • Don't even need to think about encoding issues.

True. Except for all the data that's still in the old ASCII format.

  • No issues with mixed 1-2-byte character streaming, because everything uses 2 bytes.

Incorrect. UTF-8 is variable-length: 1 to 4 bytes per character (the original design allowed sequences of up to 6 bytes).

  • Browsers don't need to wait for the <meta> tag specifying encoding before they can do anything. StackOverflow doesn't even have the meta tag, making browsers download the full page first, slowing page rendering.

Browsers don't generally wait for the full page; they make a guess based on the first part of the page data.

  • You would never see ? and other random symbols on old web pages (e.g. in place of Microsoft Word's special [read: horrible] quotes).

Except for all those other old web pages that use other non-UTF-8 encodings (the non-English speaking world is pretty big).

  • More characters can be represented in UTF-8.

True. Your problems of data validation just got harder, too.

Greg Hewgill
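
As a rough illustration of that last point (the sample strings here are made up), validation rules written with only ASCII in mind become too narrow once the full Unicode repertoire is in play:

    import re

    names = ["Smith", "Łódź", "名前"]

    # A validation rule written for the ASCII-only world:
    ascii_only = re.compile(r"^[A-Za-z]+$")

    for name in names:
        # The regex rejects legitimate names that str.isalpha() accepts.
        print(name, bool(ascii_only.fullmatch(name)), name.isalpha())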
Good answer, except for the first point: existing ASCII text is already perfectly valid UTF-8, which is not true of ISO-8859-1 text.
Avi
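
To illustrate Avi's point with a quick Python sketch: every ASCII byte sequence is already valid UTF-8 and decodes to the same text, whereas ISO-8859-1 data that uses the high byte range is not valid UTF-8:

    ascii_text = "plain ASCII text".encode("ascii")
    latin1_text = "café".encode("latin-1")     # 'é' becomes the single byte 0xE9

    # ASCII bytes are, byte for byte, valid UTF-8:
    print(ascii_text.decode("utf-8"))          # identical text back

    # The lone Latin-1 byte 0xE9 is not a valid UTF-8 sequence:
    try:
        latin1_text.decode("utf-8")
    except UnicodeDecodeError as err:
        print("not valid UTF-8:", err)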
+1  A: 
dan04