I have a whole bunch of files with filenames using our lovely Swedish letters å, ä and ö. For various reasons I now need to convert these to an [a-zA-Z] range. Just removing anything outside this range is fairly easy. The thing that's causing me trouble is that I'd like to replace å with a, ö with o and so on.

This is charset trouble at its worst.

I have a set of test files:

files\Copy of New Text Documen åäö t.txt
files\fofo.txt
files\New Text Document.txt
files\worstcase åäöÅÄÖéÉ.txt

I'm basing my script on this line, piping its results into various commands:

for %%X in (files\*.txt) do (echo %%X)

The weird thing is that if I redirect the output of the plain for-loop into a file, I get this:

files\Copy of New Text Documen †„” t.txt
files\fofo.txt
files\New Text Document.txt
files\worstcase †„”Ž™‚.txt

So something weird is happening to my filenames before they even reach the other tools. I've been trying to do this using a sed port for Windows from something called GnuWin32, but no luck so far, and doing the replace on these garbled characters doesn't help either.

How would you solve this problem? I'm open to any type of tool, command-line or otherwise.

EDIT: This is a one time problem, so I'm looking for a quick 'n ugly fix

A: 

I would write this in C++, C#, or Java -- environments where I know for certain that you can get the Unicode characters out of a path properly. It's always uncertain with command-line tools, especially out of Cygwin.

Then the code is a simple find/replace or regex/replace. If you can name a language it would be easy to write the code.
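As a rough illustration of the find/replace idea, here is a minimal sketch in Python (my choice of language, since the asker did not name one). The mapping table and the function name `to_ascii_name` are assumptions made up for this example, and it expects the filenames to arrive as proper Unicode strings with precomposed letters.

```python
import re

# Assumed mapping for illustration; extend it for other accented letters.
REPLACEMENTS = {
    "å": "a", "ä": "a", "ö": "o", "é": "e",
    "Å": "A", "Ä": "A", "Ö": "O", "É": "E",
}

# One regex character class that matches any key of the mapping.
_PATTERN = re.compile("[" + "".join(REPLACEMENTS) + "]")


def to_ascii_name(name):
    """Replace each mapped letter in name with its ASCII stand-in."""
    return _PATTERN.sub(lambda match: REPLACEMENTS[match.group(0)], name)


print(to_ascii_name("worstcase åäöÅÄÖéÉ.txt"))  # worstcase aaoAAOeE.txt
```

Names without any mapped letter come back unchanged, so the function can be applied to every file without checking first.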

Jason Cohen
A: 

I'd write a VBScript (WSH) to scan the directories, then send each filename to a function that breaks it up into individual letters, does a SELECT CASE on the Swedish ones and replaces them with the ones you want. Or, instead of doing that, the function could just run the name through a series of REPLACE() calls, reassigning the output to the input string each time. At the end it renames the file with the new value.
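The same scan, map and rename steps can be sketched in Python instead of VBScript (a swap on my part, not what this answer proposes). The names `LETTERS`, `replace_letters` and `rename_tree` are hypothetical, and the letter table is an assumption based on the question's test files. It only reports what it would do unless `dry_run` is set to False.

```python
import os

# Assumed letter table: the Swedish letters from the question plus é.
LETTERS = {"å": "a", "ä": "a", "ö": "o", "é": "e"}


def replace_letters(name):
    """Go through name letter by letter, swapping the mapped ones."""
    out = []
    for ch in name:
        lower = ch.lower()
        if lower in LETTERS:
            repl = LETTERS[lower]
            # Keep the case of the original letter.
            out.append(repl.upper() if ch != lower else repl)
        else:
            out.append(ch)
    return "".join(out)


def rename_tree(root, dry_run=True):
    """Return (old_path, new_path) pairs; rename only if dry_run is False."""
    changes = []
    for folder, _dirs, files in os.walk(root):
        for name in files:
            new_name = replace_letters(name)
            if new_name == name:
                continue  # nothing to replace in this name
            old_path = os.path.join(folder, name)
            new_path = os.path.join(folder, new_name)
            if os.path.exists(new_path):
                # å and ä both become a, so the target may already be
                # taken; skip the file rather than overwrite it.
                continue
            changes.append((old_path, new_path))
            if not dry_run:
                os.rename(old_path, new_path)
    return changes
```

Calling `rename_tree("files")` first and reading the returned pairs is a cheap way to check the result before running it again with `dry_run=False`.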

busse
+1  A: 

You might have more luck in cmd.exe if you opened it in Unicode mode. Use "cmd /U", which makes the output of internal commands that goes to a pipe or a file Unicode.
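For what it's worth, the garbled names in the question look like a code page mismatch, not damaged filenames. The short Python sketch below reproduces them under two assumptions of mine that the question does not confirm: that the console wrote the redirected output in OEM code page 850, and that the file was then viewed as Windows-1252.

```python
# The accented letters from the question's worst-case filename.
original = "åäöÅÄÖéÉ"

# Assumption: cmd.exe wrote the redirected output in OEM code page 850.
oem_bytes = original.encode("cp850")

# Assumption: the file was then read as Windows-1252. Bytes with no
# character in that code page (here the ones for Å and É) are dropped.
garbled = oem_bytes.decode("cp1252", errors="ignore")

print(garbled)  # †„”Ž™‚
```

That is the same sequence as in the question's output, which would also explain why a replace on å, ä and ö never matched anything: the tool was looking at different bytes.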

Others have proposed using a real programming language. That's fine, especially if you have a language you are very comfortable with. My friend on the C# team says that C# 3.0 (with Linq) is well-suited to whipping up quick, small programs like this. He has stopped writing batch files most of the time.

Personally, I would choose PowerShell. This problem can be solved right on the command line, and in a single line. I'll

EDIT: it's not one line, but it's not a lot of code, either. Also, it looks like StackOverflow doesn't like the syntax "$_.Name", and renders the _ as &#95.

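# Map each accented lowercase letter to its ASCII replacement.
$mapping = @{
    "å" = "a"
    "ä" = "a"
    "ö" = "o"
    "é" = "e"
}

Get-ChildItem -Recurse . *.txt | Foreach-Object {
    $newname = $_.Name
    foreach ($l in $mapping.Keys) {
        # Replace the lowercase letter, then its uppercase form.
        $newname = $newname.Replace( $l, $mapping[$l] )
        $newname = $newname.Replace( $l.ToUpper(), $mapping[$l].ToUpper() )
    }
    # Skip files whose names need no change.
    if ($newname -cne $_.Name) {
        Rename-Item -WhatIf $_.FullName $newname # remove the -WhatIf when you're ready to do it for real.
    }
}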
Jay Bazuzi
+1  A: 

You can use this code (Python)

Rename international files

# -*- coding: cp1252 -*-
# Written for Python 3; save this file in the encoding declared above.

import os, shutil

base_dir = "g:\\awk\\"    # Base directory (subdirectories are included)
char_table_1 = "áéíóúñ"   # Characters to replace
char_table_2 = "aeioun"   # Replacements, in the same order

for dirpath, dirnames, filenames in os.walk(base_dir):
    for name in filenames:
        old_path = os.path.join(dirpath, name)
        if os.access(old_path, os.R_OK):
            new_name = name
            for src, dst in zip(char_table_1, char_table_2):
                new_name = new_name.replace(src, dst)

            if new_name != name:
                # Different, rename
                print(old_path, " => ", new_name)
                shutil.move(old_path, os.path.join(dirpath, new_name))

###

You have to change the encoding and the char tables to suit your files (I tested this script with Spanish filenames and it works fine). You can comment out the "move" line to check that it's working OK, then remove the comment later to do the renaming.

PabloG