Looking for a regular expression that matches the following relative URLs:

All URLs start with /abc/, but we don't know where they end, and they have no file extension at the end.

/abc/xyz

/abc/part1/part2

/abc/red/blue/white/green/black

and so on.

Any help?

A: 
/^\/abc\//

However my guess is you are missing something in the question....

RoToRa
Thanks for your effort, but it is not working. The issue we are facing is that the length of the relative URL is not known, and it doesn't end with a dot (.) followed by a file extension. We have created a regex for relative URLs that end with a dot and a file extension, "/abc/.*?[^\x00-\x1F\"<>\|:\*\?/]+\.[a-zA-Z0-9]{3,4}", but we are not able to do the same for URLs of unknown length.
Malik
Maybe you should explain your scenario a bit better. I gather you are searching a (basically arbitrary) text document for anything that is a URL starting with abc? And what language/regex engine are you using?
RoToRa
@RoToRa: Yes. Right now we are using http://www.radsoftware.com.au/regexdesigner/ and http://regexhero.net/tester/ to build the regex. We want to get all the relative URLs from files that start with /abc/ through to their end, but we have no idea where or how to stop. In the other case, mentioned in my previous comment, we used the file extension to stop, but that is not available when the relative URLs don't have a file extension at the end.
Malik
Then I can only point you to Gumbo's answer, which seems fine to me. Maybe your tools use a different regexp syntax?
RoToRa
@RoToRa: Thanks a lot for your help.
Malik
A: 

Try this regular expression:

/abc/(?:[A-Za-z0-9-._~!$&'()*+,;=:@]|%[0-9a-fA-F]{2})*(?:/(?:[A-Za-z0-9-._~!$&'()*+,;=:@]|%[0-9a-fA-F]{2})*)*

This is derived from the ABNF of a generic URI.
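As a rough illustration of how this pattern behaves when scanning arbitrary text, here is a minimal Python sketch. The thread never settles on a language, so Python is an assumption, as are the names `PCHAR`, `ABC_PATH`, and `find_abc_paths` and the sample text. The hyphen in the character class is escaped here only to avoid any ambiguity between engines; the set of allowed characters is the same as above.

```python
import re

# One path character: an allowed literal or a percent-encoded byte.
# Same character set as the pattern above; the hyphen is escaped for safety.
PCHAR = r"(?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9a-fA-F]{2})"

# /abc/ followed by a first segment, then any number of /segment parts.
ABC_PATH = re.compile(r"/abc/" + PCHAR + r"*(?:/" + PCHAR + r"*)*")


def find_abc_paths(text):
    """Return every substring of text that looks like a path under /abc/."""
    # All groups are non-capturing, so findall returns the full matches.
    return ABC_PATH.findall(text)


# Hypothetical sample input, not taken from the question.
sample = 'see <a href="/abc/xyz">x</a> and /abc/part1/part2 plus /abc/part%20Red/ end'
print(find_abc_paths(sample))
```

The match stops at the first character that is not valid in a path segment (a double quote or a plain space in the sample), which is what lets it find the end of a URL without relying on a file extension.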

Gumbo
Thanks for your effort, but it is giving me the error "Error parsing regular expression" at http://regexhero.net/tester/
Malik
@Malik: It works for me.
Gumbo
@Gumbo: Let me try again
Malik
@Malik: I guess that the error you see depends on the programming language.
kiamlaluno
@Gumbo: Thanks a lot for your help and support.
Malik
@Malik: You’re welcome!
Gumbo
@Gumbo: We worked on the regex you provided, with its excellent concept of "backreference". We tried to tweak it and came up with the following regex, which might help others: /abc(?:/(?:[^\x00-\x1F\"<>\\|:\\*\\?/#])*)* Thanks again. P.S. The concept is that instead of looking for valid characters, we check for characters that are not valid.
Malik
@Malik: A plain space, `[`, `]`, and `^` are not valid characters in a URL path segment, and `%` is only allowed when followed by two hexadecimal digits. However, `:`, `*`, and `~` are allowed as plain characters.
Gumbo
@Gumbo: Yes, indeed, these are the issues with valid and invalid characters, and that's why I added the note at the end of my previous comment. Any other user can modify the list of invalid characters. This regex doesn't guarantee perfect results for URL characters, but for file system paths it will hopefully work most of the time :). Thanks again.
Malik
@Malik: File system paths may have other restrictions than the URI path has. Remember: A URI is not a file name.
Gumbo
@Gumbo: Rightly so. We are working right now to make sure that both conditions are met :). Thanks again.
Malik
@Gumbo: We are facing a slight problem with spaces. We have two scenarios: /abc/part%20Red/ and /abc/part Red/. The first relative URL is retrieved, but the second is returned as /abc/part instead of /abc/part Red/. We can't add \s to our allowed or disallowed character list.
Malik
A: 

The answer I write here is for PHP; at the time of writing, you haven't said which programming language you want an answer for.

  • If you want a regular expression to check if the path starts with /abc/

    preg_match('|^/abc/.*$|i', $path);

  • If you want a regular expression that returns the part after /abc/

    preg_match('|^/abc/(.*)$|i', $path, $matches);

In both cases, regular expressions can be avoided; you can use string functions instead.
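For comparison, a minimal sketch of the string-function approach, written in Python rather than PHP (an assumption, since the question does not name a language). The names `PREFIX` and `rest_after_prefix` are hypothetical. The prefix comparison is case-insensitive to mirror the `i` flag used in the patterns above.

```python
PREFIX = "/abc/"


def rest_after_prefix(path):
    """Return the part of path after /abc/, or None if the prefix is absent."""
    # Compare only the leading characters, ignoring case like the `i` flag.
    if path[:len(PREFIX)].lower() == PREFIX:
        return path[len(PREFIX):]
    return None


print(rest_after_prefix("/abc/part1/part2"))
print(rest_after_prefix("/other/part1"))
```

This only covers the case where the whole string is already known to be a path; it does not locate paths embedded in a larger document.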

kiamlaluno
Thanks for your effort. We can use string functions, but this will take a lot more time. We have to parse thousands of files :(
Malik