I've got a post-commit hook script that performs an SVN update of a working copy whenever a commit is made to the repository.

When users commit to the repository from their Windows machines using TortoiseSVN, they get the following error:

post-commit hook failed (exit code 1) with output:
svn: Error converting entry in directory '/home/websites/devel/website/guides/Images' to UTF-8
svn: Can't convert string from native encoding to 'UTF-8':
svn: Teneriffa-S?\195?\188d.jpg

The file in question is Teneriffa-Süd.jpg (note the u with an umlaut). The site is German, so the files have German names.

When I run an update on the working copy from the Linux command line, no errors occur. The error only appears when the post-commit hook is triggered by a commit from a Windows SVN client.

Questions:

  1. Why would SVN try to change the encoding of a file?
  2. Are filenames allowed to contain chars that are outside the Windows standard ASCII ones?

Update:

It turns out that the filename displays correctly as Teneriffa-Süd.jpg when viewed from a Windows machine (via Samba), but when I view it on the Linux server where the file resides (using SSH and PuTTY) I get Teneriffa-SÃ¼d.jpg

+2  A: 
  1. It changes the encoding to a location-neutral encoding in case someone with a different encoding checks it out.

  2. Of course. But it's not "Windows" ASCII (Windows actually uses a legacy code page, such as CP1252 for Western European locales).

The best way to fix this is to make sure that your system uses UTF-8 whenever possible (check $LANG).

Ignacio Vazquez-Abrams
I echoed that variable on the Linux server and it returned `en_GB.UTF-8`, which implies that it is using UTF-8.
Camsoft
I meant that it should be echoed on your local system, but it doesn't apply if you're running Windows, so never mind.
Ignacio Vazquez-Abrams
+1  A: 
  1. It does not change the encoding of the file. It changes the encoding of the filename (to something that every client can hopefully understand).
  2. Allowed by whom? NTFS stores file names as 16-bit code units, and Windows can expose them in various encodings depending on how you ask (it will try to convert them to the encoding you request). How it asks depends on the specific svn client you use. It sounds to me like a bug in TortoiseSVN.

Edit to add:

Ugh. I misunderstood the symptoms. The svn server stores everything in UTF-8 (and it seems that it did that successfully).

The post-commit hook is the part that fails to convert from UTF-8. If I understand you correctly, the post-commit hook on the server triggers an svn update of a shared drive (so the svn server starts an svn client against itself)? That means the configuration that needs fixing is that of the client on the server. Check LANG / LC_ALL in the environment that runs the svn server. As it happens, hooks are run in an empty environment (see Tip), so you should set the variable in the hook itself.

See also this page for info on how svn handles localisation
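
A minimal sketch of what setting the locale inside the hook could look like. The locale name `en_GB.UTF-8` is the value the asker reported for `$LANG`; the working-copy path and the location of the `svn` binary are placeholders I am assuming, not details from the question.

```shell
#!/bin/sh
# post-commit hook sketch. Subversion calls it as: post-commit REPOS REV

# Hooks start with an empty environment, so set the locale explicitly.
# Assumption: en_GB.UTF-8 has been generated on the server.
LANG=en_GB.UTF-8
LC_ALL=en_GB.UTF-8
export LANG LC_ALL

# Hypothetical working-copy path; replace it with the real one.
WORKING_COPY=/path/to/working-copy

# Only run the update when Subversion passed its usual arguments.
if [ -n "$1" ] && [ -n "$2" ]; then
    # Assumption: svn is installed at /usr/bin/svn (hooks have no PATH either).
    /usr/bin/svn update --non-interactive "$WORKING_COPY"
fi
```

Using an absolute path for `svn` matters for the same reason as the locale: nothing from the login shell is inherited by the hook.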

Bahbar
The file name `Teneriffa-Süd.jpg` is displayed correctly in my working copy on my Windows machine, and also in the working copy that the post-commit hook is trying to update (on a Linux server, the same one that hosts the repositories) when the folder is viewed from Windows through a Samba share. But when I run `ls` in the folder at the Linux command line I get: `Teneriffa-SÃ¼d.jpg`
Camsoft
That probably just means the filename holds data that is already UTF-8 encoded (not surprising, since the conversion failed). Windows parses that fine, while your Linux box is not configured to display UTF-8 filenames, so it reads the bytes in whatever code page it wants.
Bahbar
Yes, you are correct that the SVN client that fails is the client on the server itself. I'll have a look at the links you sent me and get back to you.
Camsoft
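
To illustrate the display mismatch discussed in the comments above, here is a small sketch that takes the two bytes from the error message (`\195\188` in decimal, i.e. `0xC3 0xBC`, the UTF-8 encoding of ü) and reinterprets them as ISO-8859-1. That the terminal in question was using ISO-8859-1 is my assumption; `name_bytes` is a helper made up for this example.

```shell
#!/bin/sh
# Emit the filename as raw UTF-8 bytes (octal 303 274 = decimal 195 188).
name_bytes() {
    printf 'Teneriffa-S\303\274d.jpg'
}

# Show the raw bytes in hex; "c3 bc" is the single character ü in UTF-8.
name_bytes | od -An -tx1

# Pretend the bytes are ISO-8859-1 and re-encode them as UTF-8 for display.
# Each byte becomes its own character, so ü turns into two characters.
name_bytes | iconv -f ISO-8859-1 -t UTF-8
echo
```

On a UTF-8 terminal the last command shows the two-character garbling in place of ü, which is what a misconfigured terminal or locale would display for a perfectly valid UTF-8 filename.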
A: 

Don't forget to generate those locales on your system (as root).

Example for Russian:

locale-gen ru_RU.CP1251
locale-gen ru_RU.UTF-8
dpkg-reconfigure locales
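
Before (or after) generating locales, it can help to check whether the one you need is already installed. The sketch below assumes the `locale` utility is available; `has_locale` is a hypothetical helper, and `en_GB.UTF-8` is simply the locale mentioned earlier in this thread.

```shell
#!/bin/sh
# Succeeds if the named locale appears in `locale -a`.
# `locale -a` often prints the codeset as "utf8", so both sides are
# lower-cased and stripped of hyphens before comparing.
has_locale() {
    want=$(printf '%s' "$1" | tr 'A-Z' 'a-z' | tr -d '-')
    locale -a 2>/dev/null | tr 'A-Z' 'a-z' | tr -d '-' | grep -qxF "$want"
}

if has_locale en_GB.UTF-8; then
    echo "en_GB.UTF-8 is available"
else
    echo "en_GB.UTF-8 is missing; generate it first"
fi
```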
n-sw-bit