I am new to regex and am looking to trim a known number of characters off the end of a string. The string represents a filepath, so instead of c:\test\test1\test2, I would like to strip off the trailing characters leaving c:\test.

The trouble that I am having is with the backslashes.

What sort of regex would I use to do this?

+14  A: 

Some people, when confronted with a problem, think "I know, I'll use regular expressions."
Now they have two problems.

Since you're dealing with file paths, you can use the methods of the Path class to calculate the result:

string GetPathFirstLevel(string path)
{
    while (Path.GetDirectoryName(path) != Path.GetPathRoot(path))
    {
        path = Path.GetDirectoryName(path);
    }

    return path;
}

This will return the following values:

GetPathFirstLevel(@"c:\test\test1\test2")  //  @"c:\test"
GetPathFirstLevel(@"c:\test")              //  @"c:\test"
GetPathFirstLevel(@"c:")                   //  null
dtb
Using the Path classes is the right way to go.
LBushkin
Good answer, and I agree that this is the right approach. However, it doesn't answer the question, which is 'using regex, how do I do this?'
BenAlabaster
@BenAlabaster: That's why I upvoted your answer :-)
dtb
@dtb - I shared the love also
BenAlabaster
+2  A: 

Note: I'd like to point out that Regex isn't really the most appropriate tool for this job. It's more appropriate to use the System.IO API to work with paths, for which I'll point to @dtb's answer.

However, to answer your question directly, without debating the merits of other approaches:

The regular expression string used to extract C:\Test from C:\Test\Test\Test\Test, where you want the [Drive]:\RootFolder from any given path, is:

"[a-zA-Z]:\\\\[^\\\\]+"

[a-zA-Z] matches any single character in the range a-z or A-Z, thus covering upper and lower case.

followed by a :

followed by \ (\ is an escape character in a regex, so it must be escaped to use it: where you want a literal \ you write \\. In a regular C# string literal each of those backslashes has to be escaped again, which is why it shows up as \\\\ above.)

[^\\]+ matches one or more characters that are not a \, so the match stops just before the next \.

Also, you can skip the C# string-level escaping by preceding the string with an @ symbol outside the quotes. The regex-level escaping is still needed, like so:

@"[a-zA-Z]:\\[^\\]+"
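
To show the pattern in use, here is a minimal sketch. The class and method names (RootFolderRegex, Extract) are made up for illustration and are not part of any library. It assumes a verbatim string, in which the backslash still has to be doubled for the regex engine, and it returns an empty string when nothing matches:

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
static class RootFolderRegex
{
    // Verbatim string: each \\ reaches the regex engine as one escaped backslash.
    private const string Pattern = @"[a-zA-Z]:\\[^\\]+";

    // Returns the drive plus first folder, or an empty string if the input
    // contains nothing of that shape.
    public static string Extract(string path)
    {
        Match match = Regex.Match(path, Pattern);
        return match.Success ? match.Value : string.Empty;
    }
}
```

Calling RootFolderRegex.Extract(@"C:\Test\Test\Test\Test") should give C:\Test. The pattern is not anchored, so it picks up the first drive-and-folder pair anywhere in the input. Add a leading ^ if the path must start the string.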
BenAlabaster
+1  A: 

No regex needed. If you know how many characters you wish to remove, you can simply use Substring:

// numberOfChars is known: the count of characters to drop from the end
string result = inputString.Substring(0, inputString.Length - numberOfChars);
Jason Punyon
Correct, but using the Path class is much cleaner.
Steven Sudit
+1  A: 

Personally I would use string.Split() and Path.DirectorySeparatorChar to figure out what to split on.
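
A rough sketch of what that could look like, under a few assumptions: the class and method names are invented for this example, "first level" means the drive plus one folder, and the separator can be passed explicitly because Path.DirectorySeparatorChar is '/' on non-Windows platforms:

```csharp
using System.IO;

// Hypothetical helper, for illustration only.
static class SplitFirstLevel
{
    // Keeps the first two segments of a path (drive and first folder).
    // When no separator is given, the current platform's separator is used.
    public static string Get(string path, char? separator = null)
    {
        char sep = separator ?? Path.DirectorySeparatorChar;
        string[] parts = path.Split(sep);

        // Fewer than two segments: there is nothing to trim.
        if (parts.Length < 2)
        {
            return path;
        }

        return parts[0] + sep + parts[1];
    }
}
```

With this, SplitFirstLevel.Get(@"c:\test\test1\test2", '\\') should come back as c:\test.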

jeffamaphone
+1  A: 

The Path solutions are better, but if you still want the regex (for learning reasons), here it is:

Regex.Replace(@"c:\aaa\bb\c", @"^([^\\]*\\[^\\]*)\\.*", @"$1")

To break it down:

^      // begins with
(      // start capturing what you want to save
[^\\]* // zero or more characters that are _not_ backslash
\\     // followed by a backslash
[^\\]* // again zero or more characters that are _not_ backslash
)      // stop capturing
\\     // a backslash
.*     // followed by anything

Then the $1 gives the value of the capture (i.e. the text that matched what was in the first parentheses).
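
If you want to try it on a few inputs, a small wrapper along these lines may help. The names FirstLevelByReplace and Trim are just placeholders for this sketch. It also shows what happens when the path is already short enough:

```csharp
using System.Text.RegularExpressions;

// Hypothetical wrapper, for illustration only.
static class FirstLevelByReplace
{
    // Same pattern as above: capture "drive\folder", then match the rest.
    private const string Pattern = @"^([^\\]*\\[^\\]*)\\.*";

    public static string Trim(string path)
    {
        // If the pattern does not match (no second backslash in the input),
        // Regex.Replace returns the input unchanged.
        return Regex.Replace(path, Pattern, "$1");
    }
}
```

So FirstLevelByReplace.Trim(@"c:\aaa\bb\c") should give c:\aaa, and an input such as c:\aaa passes through untouched because there is no second backslash for the pattern to match.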

Motti