Hi

I have to create an app that drills into a specific drive, reads all file names, and replaces illegal SharePoint characters with underscores. The illegal characters I am referring to are: ~ # % & * {} / \ | : <> ? - ""

Can someone provide either a link to code or the code itself for how to do this? I am VERY new to C# and need all the help I can possibly get. I have researched code for recursively drilling through a drive, but I am not sure how to put the character replacement and the recursive looping together. Please help!

+6  A: 

The advice for removing illegal characters is here:

http://stackoverflow.com/questions/146134/how-to-remove-illegal-characters-from-path-and-filenames/146162#146162

You just have to change the character set to your set of characters that you want to remove.

If you have figured out how to recurse the folders, you can get all of the files in each folder with:

var files = System.IO.Directory.EnumerateFiles(currentPath);

and then

foreach (string file in files)
{
    System.IO.File.Move(file, ConvertFileName(file));
}

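ConvertFileName is a method you will need to write yourself: it accepts a file path as a string and returns it with the bad characters removed from the file name. Apply the change only to the file name portion (for example, via Path.GetFileName), because the directory part of a path legitimately contains \ and :.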
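A minimal sketch of what such a ConvertFileName method might look like. The class name FileNameSanitizer and the IllegalChars constant are made up for illustration, the character set is simply the list from the question, and it replaces with underscores (as the question asks) rather than stripping:

```csharp
using System;
using System.IO;
using System.Text;

public static class FileNameSanitizer
{
    // Assumption: this is the character list from the question, hyphen included.
    private const string IllegalChars = "~#%&*{}/\\|:<>?-\"";

    // Takes a file path and returns the same path with every illegal
    // character in the file name (not the directory part) replaced by '_'.
    public static string ConvertFileName(string path)
    {
        string directory = Path.GetDirectoryName(path) ?? string.Empty;
        string name = Path.GetFileName(path);

        // Build the result in a StringBuilder to avoid one new string per replacement.
        var cleaned = new StringBuilder(name.Length);
        foreach (char c in name)
        {
            cleaned.Append(IllegalChars.IndexOf(c) >= 0 ? '_' : c);
        }

        return Path.Combine(directory, cleaned.ToString());
    }
}
```

With this sketch, ConvertFileName("My~Project#1.doc") returns "My_Project_1.doc". Because the hyphen is in the list, a name like "my-file.txt" would also be changed; drop it from IllegalChars if that is not wanted.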

Note that EnumerateFiles() requires .NET 4; if you are using .NET 3.5, GetFiles() works too. According to MSDN:

The EnumerateFiles and GetFiles methods differ as follows: When you use EnumerateFiles, you can start enumerating the collection of names before the whole collection is returned; when you use GetFiles, you must wait for the whole array of names to be returned before you can access the array. Therefore, when you are working with many files and directories, EnumerateFiles can be more efficient.
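If you have not yet worked out the folder recursion, the same "hand back names as you go" idea can be applied across subfolders. The following is only a sketch: FileWalker and EnumerateAllFiles are hypothetical names, it uses GetFiles per folder so it does not depend on .NET 4, and it assumes that silently skipping folders you are not allowed to read is acceptable.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class FileWalker
{
    // Lazily yields every file path under root, one folder at a time.
    public static IEnumerable<string> EnumerateAllFiles(string root)
    {
        var pending = new Stack<string>();
        pending.Push(root);

        while (pending.Count > 0)
        {
            string current = pending.Pop();
            string[] files;
            string[] subfolders;
            try
            {
                files = Directory.GetFiles(current);
                subfolders = Directory.GetDirectories(current);
            }
            catch (UnauthorizedAccessException)
            {
                continue; // assumption: unreadable folders are skipped, not reported
            }

            foreach (string file in files)
            {
                yield return file;
            }
            foreach (string folder in subfolders)
            {
                pending.Push(folder);
            }
        }
    }
}
```

A caller can then write foreach (string file in FileWalker.EnumerateAllFiles(rootPath)) and rename each file inside the loop. Each folder's listing is read in full before its files are yielded, so renaming a file does not disturb the enumeration.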

Robert Harvey
+1: Didn't know about Directory.EnumerateFiles(x)
Jared Updike
Use SPUrlUtility.IsLegalCharInUrl(char character) to determine an illegal "SharePoint" file char.
Jason
@Jared: Those Enumerate* methods are new, I think in .NET 4. As you can imagine, they return IEnumerable instead of some sort of list.
Nelson
+5  A: 

Not really an answer, but consider both of the following:

The following characters are not valid in Windows file names anyway, so you don't have to worry about them: / \ : * ? " < > |

Make sure your algorithm handles duplicate names appropriately. For example, My~Project.doc and My#Project.doc would both be renamed to My_Project.doc.
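One possible way to handle the duplicate case, sketched under assumptions: UniqueNames and MakeUnique are hypothetical names, and the "_1", "_2" suffix scheme is just one choice among many. The existence check is passed in as a delegate so the logic can be tried without touching the disk; the one-argument overload uses File.Exists.

```csharp
using System;
using System.IO;

public static class UniqueNames
{
    // Convenience overload that checks the real file system.
    public static string MakeUnique(string desiredPath)
    {
        return MakeUnique(desiredPath, File.Exists);
    }

    // Returns desiredPath if it is free; otherwise inserts _1, _2, ...
    // before the extension until a name is found that does not exist.
    public static string MakeUnique(string desiredPath, Func<string, bool> exists)
    {
        if (!exists(desiredPath))
        {
            return desiredPath;
        }

        string directory = Path.GetDirectoryName(desiredPath) ?? string.Empty;
        string stem = Path.GetFileNameWithoutExtension(desiredPath);
        string extension = Path.GetExtension(desiredPath);

        for (int i = 1; ; i++)
        {
            string candidate = Path.Combine(
                directory, string.Format("{0}_{1}{2}", stem, i, extension));
            if (!exists(candidate))
            {
                return candidate;
            }
        }
    }
}
```

With this in place, if My~Project.doc has already been renamed to My_Project.doc, passing the sanitized name for My#Project.doc through MakeUnique would give My_Project_1.doc. Another process could still create the same name between the check and the move, so treat this as a sketch rather than a guarantee.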

Steven
+1  A: 

A recursive method to rename files in folders is what you want. Just pass it the root folder and it will call itself for all subfolders found.

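private void SharePointSanitize(string _folder)
{
    // Process files in the directory
    string[] files = Directory.GetFiles(_folder);
    foreach (string fileName in files)
    {
        // Sanitize only the file name, not the directory part of the path
        string newName = SharePointRename(Path.GetFileName(fileName));
        string newPath = Path.Combine(_folder, newName);
        if (newPath != fileName)
        {
            File.Move(fileName, newPath);
        }
    }
    // Recurse into each subfolder
    string[] folders = Directory.GetDirectories(_folder);
    foreach (string folderName in folders)
    {
        SharePointSanitize(folderName);
    }
}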

private string SharePointRename(string _name)
{
    // '' is not a valid char literal in C#, so replace with an underscore
    string newName = _name;
    newName = newName.Replace('~', '_');
    newName = newName.Replace('#', '_');
    newName = newName.Replace('%', '_');
    newName = newName.Replace('&', '_');
    newName = newName.Replace('*', '_');
    newName = newName.Replace('{', '_');
    newName = newName.Replace('}', '_');
    // .. and so on
    return newName;
}

Notes:

  1. You can change the '_' in the SharePointRename() method to whatever character you want to replace with.
  2. This does not check whether two files would end up with the same name, like thing~ and thing%
JYelton
Props to Steven (+1) for noting the duplicate file issue in my note #2
JYelton
Or create an array: `char[] invalidList = new char[] { '~', '#', ... }` and use a loop to replace: `foreach (char invalid in invalidList) { newName = newName.Replace(invalid, '_'); }` However, this has to create a new string every time since it's immutable. Maybe a regex would be faster because of that reason?
Nelson
A: 
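using System.IO;
using System.Text.RegularExpressions;

class Program
{
    // Matches a run of one or more illegal characters; each run becomes a single underscore.
    // Verbatim string: \\ is a literal backslash in the regex, "" is a double quote.
    private static readonly Regex _pattern = new Regex(@"[~#%&*{}/\\|:<>?""-]+");

    static void Main(string[] args)
    {
        DirectoryInfo di = new DirectoryInfo("C:\\");
        RecursivelyRenameFilesIn(di);
    }

    public static void RecursivelyRenameFilesIn(DirectoryInfo root)
    {
        // Rename only the files whose names contain an illegal character
        foreach (FileInfo fi in root.GetFiles())
        {
            if (_pattern.IsMatch(fi.Name))
                fi.MoveTo(Path.Combine(fi.DirectoryName, _pattern.Replace(fi.Name, "_")));
        }

        foreach (DirectoryInfo di in root.GetDirectories())
            RecursivelyRenameFilesIn(di);
    }
}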

Though this will not handle duplicate names, as Steven pointed out.

hoang