Hello!

I have an application that requires me to "clean" "dirty" filenames.

I was wondering if anybody knew how to handle files that are named like:

1.0.1.21 -- Confidential...doc or Accounting.Files.doc

Basically, there's no guarantee that the periods will be in the same place in every file name. I was hoping to recurse through a drive, search for periods in the file name itself (minus the extension), remove those periods, and then append the extension back on.

Does anybody know either a better way to do this or how to perform what I'm hoping to do? As a note, regex is a REQUIREMENT for this project.

EDIT: Instead of seeing 1.0.1.21 -- Confidential...doc, I'd like to see: 10121 -- Confidential.doc
For the other filename, instead of Accounting.Files.doc, I'd like to see AccountingFiles.doc

+3  A: 

You could do it with a regular expression:

string s = "1.0.1.21 -- Confidential...doc";
s = Regex.Replace(s, @"\.(?=.*\.)", "");
Console.WriteLine(s);

Result:

10121 -- Confidential.doc

The regular expression can be broken down as follows:

\.    match a literal dot
(?=   start a lookahead 
.*    any characters
\.    another dot
)     close the lookahead

Or in plain English: remove every dot that has at least one dot after it.
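
To check this against both filenames from the question, and to show one way it could be applied while recursing through a folder, here is a minimal sketch. The names DotCleaner, CleanFileName and CleanTree are made up for illustration, and skipping a rename when the target name already exists is an assumption about the desired behaviour. The pattern is applied to the file name only, because running it on a full path would also strip dots from directory names.

```csharp
using System;
using System.IO;
using System.Text.RegularExpressions;

public static class DotCleaner
{
    // Matches any dot that still has another dot somewhere after it,
    // i.e. every dot except the last one.
    private static readonly Regex NonFinalDot = new Regex(@"\.(?=.*\.)");

    // Expects a bare file name (no directory part).
    public static string CleanFileName(string fileName)
    {
        return NonFinalDot.Replace(fileName, "");
    }

    // Hypothetical helper: renames every file under root whose name changes.
    public static void CleanTree(string root)
    {
        // GetFiles returns a snapshot array, so renaming while looping is safe.
        foreach (string path in Directory.GetFiles(root, "*", SearchOption.AllDirectories))
        {
            string dir = Path.GetDirectoryName(path);
            string oldName = Path.GetFileName(path);
            string newName = CleanFileName(oldName);
            if (newName == oldName) continue;

            string target = Path.Combine(dir, newName);
            // Assumption: skip on a name collision rather than overwrite.
            if (!File.Exists(target))
            {
                File.Move(path, target);
            }
        }
    }
}
```

With this, DotCleaner.CleanFileName("Accounting.Files.doc") gives AccountingFiles.doc, and a name with a single dot or no dot at all comes back unchanged.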

It would be cleaner to use the built-in methods for handling file names and extensions, so if you can drop the requirement to use regular expressions, I think the solution would be better still.

Mark Byers
or simply just `@"\.+"`
Chad
@Chad: No, that will only remove dots in direct succession.
Tim Pietzcker
@Tim Pietzcker, the requirements changed when they were clarified... I deleted my answer, couldn't delete that comment though.
Chad
+2  A: 

Here is an alternate solution that doesn't use regular expressions -- perhaps it is more readable:

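string s = "1.0.1.21 -- Confidential...doc";
// The last dot marks where the extension starts.
int extensionPoint = s.LastIndexOf('.');
if (extensionPoint < 0) {
    // No dot at all: treat the whole string as the name, with an empty extension.
    extensionPoint = s.Length;
}
// Strip every dot from the part before the extension.
string nameWithoutDots = s.Substring(0, extensionPoint).Replace(".", "");
// Includes the leading dot, or is empty when there is no extension.
string extension = s.Substring(extensionPoint);
Console.WriteLine(nameWithoutDots + extension);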
Richard Walters
I like this solution. I have one question: If (extensionPoint < 0) evaluates to true, then doesn't that mean there is no '.' in s, thus we can simply return s?
Ed Gonzalez
Yes, that's true. I had the check in there to avoid having String.Substring raise an exception, and in case you wanted to use "extension" elsewhere and really wanted to get an empty string for extension when there is none.
Richard Walters
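
For reference, the early-return variant discussed in the comments above might look roughly like this. It is only a sketch: the class and method names (NameHelper, StripDots) are placeholders, and it assumes you don't need the extension as a separate value afterwards.

```csharp
using System;

public static class NameHelper
{
    public static string StripDots(string name)
    {
        int extensionPoint = name.LastIndexOf('.');
        // No dot anywhere: nothing to clean, return the input unchanged.
        if (extensionPoint < 0) return name;

        // Remove dots before the last one, then re-attach the extension.
        return name.Substring(0, extensionPoint).Replace(".", "")
             + name.Substring(extensionPoint);
    }
}
```

Calling NameHelper.StripDots("1.0.1.21 -- Confidential...doc") gives 10121 -- Confidential.doc.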
+1  A: 

I'd do this without regular expressions*. (Disclaimer: I'm not good with regular expressions, so that might be why.)

Consider this option.

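string RemovePeriodsFromFilename(string fullPath)
{
    // GetDirectoryName returns null for a root path, and Path.Combine rejects null.
    string dir = Path.GetDirectoryName(fullPath) ?? string.Empty;
    // Everything before the last dot of the file name.
    string filename = Path.GetFileNameWithoutExtension(fullPath);
    string sanitized = filename.Replace(".", string.Empty);
    // The extension, including its leading dot (empty if there is none).
    string ext = Path.GetExtension(fullPath);

    return Path.Combine(dir, sanitized + ext);
}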

* Whoops, looks like you said using regular expressions was a requirement. Never mind! (Though I have to ask: why?)

Dan Tao
Perhaps it's homework on RegEx 101.
ChrisF