Say I request

parent/child/child/page-name

in my browser. I want to extract the parent, the children, and the page name. There should be no limit on how many children the URL can contain. For the time being, the page name will always be at the end and will never be omitted. Here are the regular expressions I am currently using:

^([\w-]{1,}){1} -> Match parent (returns 'parent')
(/(?:(?!/).)*[a-z]){1,}/ -> Match children (returns /child/child/)
[\w-]{1,}(?!.*[\w-]{1,}) -> Match page name (returns 'page-name')

The more I play with this, the clunkier this solution feels. This is for a small CMS I am developing in ASP Classic (:(). It is similar to MVC routing paths, but instead of calling controllers and functions based on the URL request, I would travel down the hierarchy and find the appropriate page in the database. The database uses the nested set model, and each child is linked by a unique page name.

I have tried using the Split function with a / delimiter, but I found I was nesting so many Split calls together that it became very unreadable.

All said, I need an efficient way to parse out the parent, children as well as page name from a string. Could someone please provide an alternative solution?

To be honest, I'm not even sure if a regular expression is the best solution to my problem.

Thank you.

A:

You could try using:

^([\w-]+)(/.*/)([\w-]+)$

You can then read the three captured groups from Match.SubMatches, which is a zero-based collection: SubMatches(0) is the parent, SubMatches(1) the children, and SubMatches(2) the page name.
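As a rough illustration of reading those groups in VBScript, the sketch below runs the pattern against the sample path from the question. The function name ParseRoute and the array it returns are made up for this example; RegExp, Execute, and SubMatches are the standard VBScript regular expression members. Response.Write assumes the code runs inside an ASP page.

```vbscript
' ParseRoute is a hypothetical helper name, not part of any library.
' Returns Array(parent, children, pageName), or Empty when the path does not match.
Function ParseRoute(path)
    Dim re, matches, m

    Set re = New RegExp
    re.Pattern = "^([\w-]+)(/.*/)([\w-]+)$"

    Set matches = re.Execute(path)
    If matches.Count = 0 Then
        ParseRoute = Empty
        Exit Function
    End If

    ' SubMatches is zero-based: 0 = parent, 1 = children, 2 = page name.
    Set m = matches(0)
    ParseRoute = Array(m.SubMatches(0), m.SubMatches(1), m.SubMatches(2))
End Function

Dim parts
parts = ParseRoute("parent/child/child/page-name")
If IsArray(parts) Then
    ' Writes: parent | /child/child/ | page-name
    Response.Write parts(0) & " | " & parts(1) & " | " & parts(2)
End If
```

Note that the children group keeps its leading and trailing slashes, so it would still need trimming or splitting before each child can be looked up on its own.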

EDIT

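Actually, assuming that you know that [\w-] is all that is used in the names of the parts, you can use ^([\w-]+)(.*/)([\w-]+)$ instead, and it will handle the no-child case by itself as well: for parent/page-name the middle group is just /. The slash has to stay inside the middle group; with a bare (.*) the greedy match would swallow everything except the last character of the page name.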
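If a regular expression turns out to be more than is needed, a single Split call may be enough, since the parent is always the first segment and the page name is always the last. The following is only a sketch under those assumptions: SplitRoute is a hypothetical helper name, the path is assumed to contain at least parent/page-name with no leading or trailing slash, and Response.Write assumes an ASP page.

```vbscript
' SplitRoute is a hypothetical helper name used only for this sketch.
' Assumes the path has at least "parent/page-name" and no leading or trailing slash.
' Returns Array(parent, childrenArray, pageName).
Function SplitRoute(path)
    Dim segments, last, children(), i

    segments = Split(path, "/")
    last = UBound(segments)

    If last >= 2 Then
        ' Segments 1 .. last-1 are the children, in order of depth.
        ReDim children(last - 2)
        For i = 1 To last - 1
            children(i - 1) = segments(i)
        Next
        SplitRoute = Array(segments(0), children, segments(last))
    Else
        ' No children: use an empty array so callers can still loop over it.
        SplitRoute = Array(segments(0), Array(), segments(last))
    End If
End Function

Dim route, child
route = SplitRoute("parent/child/child/page-name")
Response.Write "Parent: " & route(0) & "<br>"
For Each child In route(1)
    Response.Write "Child: " & child & "<br>"
Next
Response.Write "Page: " & route(2)
```

Because the children come back as an ordered array, each name could be looked up in turn while walking down the hierarchy, with no nested Split calls.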

VeeArr
Thank you very much.
Mike