I'm trying to write a regex that will parse out the directory and filename of a fully qualified path using matching groups.

So, given...

/var/log/xyz/10032008.log

it would capture "/var/log/xyz" as group 1 and "10032008.log" as group 2.

Seems simple but I can't get the matching groups to work for the life of me.

NOTE: As some of the respondents pointed out, this is probably not a good use of regular expressions. Generally I'd prefer to use the file API of whatever language I was using. What I'm actually trying to do is a little more complicated than this and would have been much harder to explain, so I chose a domain that everyone is familiar with in order to describe the root problem succinctly.

+1  A: 

What language? And why use a regex for this simple task?

If you must:

^(.*)/([^/]*)$

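gives you the two parts you wanted. You might need to escape the parentheses with backslashes:

^\(.*\)/\([^/]*\)$

depending on the regex syntax of your language or tool (POSIX basic regular expressions, as used by sed and grep, treat unescaped parentheses as literal characters).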
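As a minimal sketch of the first pattern in use, assuming Python's re module (the asker has not named a language, and split_path is a hypothetical helper name):

```python
import re

# The unescaped form of the pattern; Python's re module uses bare
# parentheses for capture groups.
PATH_RE = re.compile(r"^(.*)/([^/]*)$")


def split_path(path):
    """Return (directory, filename), or None if the path contains no slash."""
    match = PATH_RE.match(path)
    if match is None:
        return None
    return match.group(1), match.group(2)


print(split_path("/var/log/xyz/10032008.log"))
# ('/var/log/xyz', '10032008.log')
```

The greedy .* runs to the end of the string and then backtracks to the last slash, so group 1 ends up without a trailing slash. A string with no slash at all does not match.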

But I suggest you just use your language's string search function that finds the last "/" character, and split the string on that index.
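The non-regex approach could look something like this in Python, using str.rfind to locate the last slash (split_at_last_slash is a hypothetical name, and returning an empty directory for a bare filename is an assumption, not something the question specifies):

```python
def split_at_last_slash(path):
    """Split a path at its last '/' into (directory, filename)."""
    index = path.rfind("/")  # -1 when the string has no slash
    if index == -1:
        return "", path
    return path[:index], path[index + 1:]


print(split_at_last_slash("/var/log/xyz/10032008.log"))
# ('/var/log/xyz', '10032008.log')
```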

ΤΖΩΤΖΙΟΥ
Many frameworks (e.g. .NET/Python) have methods for separating file names from paths without needing to manually search for the '/' character. This is great because the tools are typically platform-independent.
j0rd4n
Yes, but he hasn't specified a language yet. If it were Python, I would suggest os.path.dirname and os.path.basename.
ΤΖΩΤΖΙΟΥ
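For reference, the Python functions mentioned in the comment above behave roughly as follows. This sketch imports posixpath directly so that "/" is treated as the separator on every platform; on POSIX systems os.path is the same module.

```python
import posixpath

path = "/var/log/xyz/10032008.log"

directory = posixpath.dirname(path)   # everything before the last slash
filename = posixpath.basename(path)   # everything after the last slash

print(directory)  # /var/log/xyz
print(filename)   # 10032008.log
```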
+1  A: 

Try this:

^(.+)/([^/]+)$
yjerem
Don't you want to make that non-greedy (if this unspecified regex flavor can handle that) so that it doesn't have to backtrack all the way to the slash?
Axeman
This one assumes that there is a path and not just a filename.
Travis Illig
It also runs into problems with current directory (.) and root directory (/). The former isn't an issue (fully-qualified pathnames don't start with a dot); the latter might be. The regex also does not handle .. back-traversals - that might be OK because fully-qualified might mean no dot-dot bits.
Jonathan Leffler
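The limitations raised in the comments above can be checked quickly. This is a sketch assuming Python's re module; strict_split is a hypothetical helper name.

```python
import re

# The pattern from this answer: both groups require at least one character.
STRICT_RE = re.compile(r"^(.+)/([^/]+)$")


def strict_split(path):
    """Return the two captured groups, or None when the pattern fails."""
    match = STRICT_RE.match(path)
    return match.groups() if match else None


for candidate in [
    "/var/log/xyz/10032008.log",  # normal case
    "10032008.log",               # bare filename, no directory part
    "/10032008.log",              # file directly under the root directory
    "/var/log/../log/x.log",      # dot-dot is kept verbatim, not resolved
]:
    print(repr(candidate), "->", strict_split(candidate))
```

The bare filename and the file under the root directory both fail to match, because (.+) needs at least one character before the final slash.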
+1  A: 

Regular Expression Library is a good regex website, with lots of regexes already there and good community support.

Adam Neal
A: 

Try this:

/^(\/([^\/]+\/)*)(.*)$/

It will leave the trailing slash on the path, though.
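A quick sketch of how this pattern's groups come out, assuming Python's re module (which takes the pattern as a string, so the slashes need no escaping). Because of the nested parentheses the filename lands in group 3 rather than group 2. The non-capturing variant and the trailing-slash cleanup shown here are additions for illustration, not part of the answer.

```python
import re

# Same pattern as above, without the /.../ delimiters.
NESTED_RE = re.compile(r"^(/([^/]+/)*)(.*)$")

match = NESTED_RE.match("/var/log/xyz/10032008.log")
print(match.group(1))  # /var/log/xyz/
print(match.group(2))  # xyz/  (last repetition of the inner group)
print(match.group(3))  # 10032008.log

# Variant with a non-capturing inner group, so the filename is group 2.
TWO_GROUP_RE = re.compile(r"^(/(?:[^/]+/)*)(.*)$")
directory, filename = TWO_GROUP_RE.match("/var/log/xyz/10032008.log").groups()

# Drop the trailing slash, but keep a lone "/" for the root directory.
directory = directory.rstrip("/") or "/"
print(directory, filename)
```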

Lucas Oman
A: 

Most languages have path parsing functions that will give you this already. If you have the ability, I'd recommend using what comes to you for free out-of-the-box.

Assuming / is the path delimiter...

^(.*/)([^/]*)$

The first group will be whatever the directory/path info is, the second will be the filename. For example:

  • /foo/bar/baz.log: "/foo/bar/" is the path, "baz.log" is the file
  • foo/bar.log: "foo/" is the path, "bar.log" is the file
  • /foo/bar: "/foo/" is the path, "bar" is the file
  • /foo/bar/: "/foo/bar/" is the path and there is no file.
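The four examples above can be reproduced with a short script. This is a sketch assuming Python's re module; split_keep_slash is a hypothetical helper name.

```python
import re

# Group 1 keeps the trailing slash; group 2 may be empty.
KEEP_SLASH_RE = re.compile(r"^(.*/)([^/]*)$")


def split_keep_slash(path):
    """Return (path_part, file_part), or None if there is no slash at all."""
    match = KEEP_SLASH_RE.match(path)
    return match.groups() if match else None


for example in ["/foo/bar/baz.log", "foo/bar.log", "/foo/bar", "/foo/bar/"]:
    print(example, "->", split_keep_slash(example))
```

Since the pattern requires at least one slash, a bare filename such as "bar.log" does not match and the helper returns None.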
Travis Illig