ansaurus

Question

What encoding/code page is cmd.exe using

Answer 1

+2 A:

To answer your second query re. how encoding works, Joel Spolsky wrote a great introductory article on this. Strongly recommended.

Brian Agnew 2009-08-11 08:39:47

I've read it and I know it. However, on Windows I always feel lost because the OS and most applications seem totally ignorant of encoding.

danglund 2009-08-11 13:59:23

Answer 2

+1 A:

The CHCP command shows the current code page. It usually has 3 digits (8xx), which is different from the Windows code pages (12xx). So if you type English-only text you won't see any difference, but text from an extended code page (like Cyrillic) will be printed wrongly.
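If a script needs the number that CHCP reports, a minimal sketch along these lines could work. The helper name `parse_chcp` is hypothetical, and it assumes only that the output ends with the code page number, possibly followed by punctuation, since localized systems may word the message differently.

```python
import re

def parse_chcp(output: str) -> int:
    """Extract the code page number from the text printed by CHCP.

    Assumption: the number is the last run of digits in the output,
    e.g. "Active code page: 850".
    """
    match = re.search(r"(\d+)\D*$", output.strip())
    if match is None:
        raise ValueError(f"no code page number found in {output!r}")
    return int(match.group(1))

def is_oem_style(codepage: int) -> bool:
    """True for the 3-digit code pages, False for e.g. the 12xx ones."""
    return codepage < 1000

print(parse_chcp("Active code page: 850"))
print(is_oem_style(850), is_oem_style(1251))
```

On Windows the input string could come from something like `subprocess.run("chcp", shell=True, capture_output=True, text=True).stdout`, which is not exercised here.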

Dewfy 2009-08-11 08:42:11

Answer 3

+1 A:

Type `chcp` to see your current code page (as Dewfy already said).

Use `nlsinfo` to see all installed code pages and find out what your code page number means.

Edited: You need to have the Windows Server 2003 Resource Kit installed (works on Windows XP) to use `nlsinfo`.

Cagdas Altinkaya 2009-08-11 08:47:24

Interestingly, `nlsinfo` doesn't appear to exist on my Windows 7.

Joey 2009-08-11 10:29:51

`nlsinfo` also doesn't exist on my Windows XP SP3 machine.

Thomas Owens 2009-08-11 11:41:27

Oh, I'm sorry. I think it comes with Windows Server Resource Kit tools. I've used it a couple of times on my Windows XP SP3 machine earlier and didn't know it wasn't installed by default.

Cagdas Altinkaya 2009-08-11 11:52:05

Ah, that explains why it's there on my Vista machine, where I installed those.

Joey 2009-08-11 14:05:42

Answer 4

+11 A:

While chcp does indeed show the current code page cmd uses, it is of little to no relevance depending on your settings and how you started cmd.

First of all: Your console font determines what the console window is capable of displaying. More on that below.

Secondly, for Unicode files the current codepage only determines what gets displayed, depending on the font used (again, see below). For non-Unicode files the interpretation of the bytes is left to the current codepage, indeed:

```
> chcp 850
Active code page: 850

> type 1251.txt
abcde xyz
ÓßÔÒõ ²■ 

> chcp 1251
Active code page: 1251

> type 1251.txt
abcde xyz
абвгд эюя
```

(Will show up garbled if you have raster fonts enabled, but will copy fine.)
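The same effect can be reproduced outside the console. The following is a small sketch in Python (not something cmd itself does), using Python's built-in `cp850` and `cp1251` codecs to interpret the same bytes in two ways; the helper name `interpret` is made up for illustration.

```python
# Bytes of the Russian sample line as they would be stored in a CP1251 file.
raw = "абвгд эюя".encode("cp1251")

def interpret(data: bytes, codepage: str) -> str:
    """Decode the same bytes under a given code page."""
    return data.decode(codepage)

as_850 = interpret(raw, "cp850")    # wrong code page: mojibake
as_1251 = interpret(raw, "cp1251")  # matching code page: readable text

print(raw.hex(" "))
print(repr(as_850))
print(as_1251)
```

The bytes never change; only the table used to map them to characters does. Note that the last byte, 0xFF, is a no-break space in CP850, which is why the garbled line appears to end one character early.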

For the following I prepared a little test file, containing letters from different cultures:

```
ASCII     abcde xyz
German    äöü ÄÖÜ ß
Polish    ąęźżńł
Russian   абвгдеж эюя
CJK       你好
```

If you use "Raster Fonts" then the console window is confined to the code page that chcp shows. Unfortunately this is still the default in Windows 7, and I wish they wouldn't stick to such stupid defaults. Which characters are displayable also depends on the system you have; in my case the raster fonts cover only Latin and won't display Russian or other scripts.
```
> chcp 850
Active code page: 850


> type uc-test.txt
ASCII     abcde xyz
German    äöü ÄÖÜ ß
Polish    aezznl
Russian   ??????? ???
CJK       ??


> chcp 437
Active code page: 437


> type uc-test.txt
ASCII     abcde xyz
German    äöü ÄÖÜ ß
Polish    aezznl
Russian   ??????? ???
CJK       ??
```
Note that in both CP850 and CP437 the German umlauts and ß work fine. The Polish letters ąęźżńł get converted as well as possible to their closest ASCII equivalents, whereas for Russian letters or CJK ideographs there is no such easy replacement, which is why they become question marks.
```
> chcp 1251
Active code page: 1251


> type uc-test.txt
ASCII     abcde xyz
German    aou AOU ?
Polish    aezznl
Russian   абвгдеж эюя
CJK       ??
```
1251 is the ANSI code page for Cyrillic. As you can see, it lacks both the umlauts and the Polish letters, but those can be converted to their closest equivalents in that code page, unlike ß, which just becomes a question mark again. Russian, however, now works correctly.
```
> chcp 1250
Active code page: 1250


> type uc-test.txt
ASCII     abcde xyz
German    äöü ÄÖÜ ß
Polish    ąęźżńł
Russian   ??????? ???
CJK       ??
```
1250 is the ANSI code page for Central European languages, which includes Polish. The German special letters are included as well, which is nice when talking to German-speaking Poles. However, Russian and Chinese are not there and thus just get question marks again.
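Which characters have a slot in which code page can be checked with a short Python sketch. This is only an approximation of what the console does: Python's codecs do not perform the Windows "closest fit" substitution, so letters that Windows would turn into `a` or `o` come out as `?` here. The helper name `through_codepage` is hypothetical.

```python
SAMPLES = {
    "German": "äöü ÄÖÜ ß",
    "Polish": "ąęźżńł",
    "Russian": "абвгдеж эюя",
    "CJK": "你好",
}

def through_codepage(text: str, codepage: str) -> str:
    """Round-trip text through a code page; unencodable characters become '?'."""
    return text.encode(codepage, errors="replace").decode(codepage)

for codepage in ("cp850", "cp437", "cp1251", "cp1250"):
    print(codepage)
    for name, text in SAMPLES.items():
        print(f"  {name:<8} {through_codepage(text, codepage)}")
```

A line that survives unchanged means every character in it exists in that code page; a question mark means the character has no exact counterpart there.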

Interestingly, when using raster fonts, text copied from the console window is tied to the selected code page (it's probably copied as Unicode anyway). So even when one isn't able to see Russian due to font issues, it copies fine, as long as one is in CP1251, or 866, or 855 (well, there are many of them :-)).

If you select a Unicode font, such as Lucida Console or Consolas, then you will be able to see and type Unicode characters on the console, regardless of what chcp says:

```
> chcp 850
Active code page: 850


> type uc-test.txt
ASCII     abcde xyz
German    äöü ÄÖÜ ß
Polish    ąęźżńł
Russian   абвгдеж эюя
CJK       你好


> chcp 437
Active code page: 437


> type uc-test.txt
ASCII     abcde xyz
German    äöü ÄÖÜ ß
Polish    ąęźżńł
Russian   абвгдеж эюя
CJK       你好


> chcp 1251
Active code page: 1251


> type uc-test.txt
ASCII     abcde xyz
German    äöü ÄÖÜ ß
Polish    ąęźżńł
Russian   абвгдеж эюя
CJK       你好
```

(Note that the CJK characters probably only show up as boxes in your console, as they do here. The characters are still correct; it's the font that lacks the glyphs.)

Then there is the encoding that is used when cmd is redirecting stuff to a file. This closely follows chcp, regardless of the font used for the console window. You can start cmd with

cmd /u

to cause it to redirect to files in Unicode (UTF-16, Little Endian in this case, as usual on Windows).
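A file produced this way has to be read back as UTF-16 Little Endian. Here is a minimal Python sketch; it assumes the redirected file carries no byte order mark, which is why the byte order is named explicitly. The helper name `read_cmd_u_output` and the file name are made up for illustration, and the file is simulated rather than produced by cmd.

```python
import os
import tempfile

def read_cmd_u_output(path: str) -> str:
    """Read a file assumed to contain UTF-16 LE text without a BOM."""
    with open(path, encoding="utf-16-le") as handle:
        return handle.read()

# Simulate what a redirect under "cmd /u" might leave on disk.
sample = "abcde xyz\r\nабвгд эюя\r\n"
path = os.path.join(tempfile.mkdtemp(), "redirected.txt")
with open(path, "wb") as handle:
    handle.write(sample.encode("utf-16-le"))

text = read_cmd_u_output(path)
print(text)
```

Reading the same file with a single-byte code page would instead show a NUL character after every ASCII letter, which is a quick way to recognize UTF-16 output.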

Joey 2009-08-11 10:07:50

Thanks a lot for this detailed description. As always there is no short and easy answer when it comes to encoding, but this explains it beautifully. Thanks!

danglund 2009-08-11 14:01:03

Answer 5

A:

We have a problem with a DOS-application under Windows 7 (32bit). The DOS application runs just fine under code page 852. But when we call a DOS Shell from running DOS-app to execute an external program/application, the CP 852 changes to CP 437 as soon as DOS execution (DOS Shell) is finished. We have tried many tricks to resolve the bug but we're completely stuck now... :-( We really need help, please. Thank you very much for your time.

Vladimir Cvajniga 2009-12-28 11:20:10

You might have better luck asking this on http://superuser.com/ . In any case, you should ask it as a standalone question, not as an answer to someone else's question.

Alan Moore 2009-12-29 14:50:33

This is not an answer

Vertis 2010-06-08 23:45:10
