I ran this regex in Git Bash (I am on a Windows machine with Git installed).

perl -pe 's/\[(?:xx_)?([^]]+)\]/\[\u$1\]/g'

The file now looks as if it is written in Chinese (it is a .sql schema file).

example:

嵛 嬀] IDENTITY(1,1) NOT NULL,

Is there some encoding issue going on?

+2  A: 

Isn't that similar to issue 358?

Windows command line and GUI programs use different codepages by default.
For historical DOS compatibility, the command line ("OEM") codepage is 437, while the GUI ("ANSI") codepage is 1252. See the interesting reading here.

The console uses the OEM codepage (437 on my system) while the GUI uses the ANSI codepage (1252 on my system).
When launching a program from the console, cmd.exe usually does not modify the arguments to that program, except if the program happens to be a .bat or .cmd file in which case cmd.exe performs a codepage conversion on the arguments (see "Codepage Conversions").
So git.exe already receives "Daniël" in 1252 encoding, which is why it looks fine when looking at .git/config using Notepad.
When reading user.name, however, no codepage conversion takes place and "Daniël" in 1252 encoding is printed to the console, resulting in "Daniδl" on my system.
Obviously, the situation is different when using MSYS / Git Bash.
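
The mismatch described above can be reproduced outside of Git. Here is a minimal Python sketch (Python is only my choice for illustration; the issue does not use it). It assumes the name was stored as codepage 1252 bytes, then shows how a console set to codepage 437 would render those same bytes. The variable names are made up for the example.

```python
# Bytes a GUI ("ANSI") program would store for the name, assuming codepage 1252.
name = "Daniël"
stored = name.encode("cp1252")

# The same bytes interpreted by a console that uses the OEM codepage 437.
shown_in_console = stored.decode("cp437")

# The same bytes interpreted after the console codepage is switched to 1252.
shown_after_switch = stored.decode("cp1252")

print(stored)              # raw bytes, 0xEB is the "ë" in codepage 1252
print(shown_in_console)    # 0xEB maps to a different character in codepage 437
print(shown_after_switch)  # matches the original name again
```

The bytes never change; only the codepage used to display them does, which is why making the console codepage match the Windows one should make the name readable again.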

So, to sum up, the solution when running Git from cmd.exe (via the .cmd wrappers) seems to be to:

1) change the console font from a raster font to a TrueType font,
2) change the console codepage via "chcp" to match the Windows ("ANSI") codepage (whatever that may be).


In short, a fix is coming: could you try this beta Git installer and see if you still have the encoding issue?

VonC