I have a bunch of URLs to parse in PHP, like this:

www.example.com/shopping
shopping.example.com
example.com/pages/shopping

for about 100 different pages (not just shopping - some are contact, some are directions, etc.). I have a seed set of data, which tells me where to look for the page names, like this:

www.example.com/[pagename]
[pagename].example.com
example.com/users/[pagename]

My question is, how do I get the page name from a URL using the seed data to tell me where it is?

So given the URL www.example.com/shopping, I want to compare it against www.example.com/[pagename] and extract the page name "shopping" from the string.

+1  A: 

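Replace each [xyz] placeholder in the second list with (.+) and you have a regular expression you can use to match the URLs in the first list and extract the values. You will also need to escape characters like . in the rest of the line (for example with preg_quote) so that they don't get interpreted as special characters. Escape only the literal parts, not the placeholder or the group: preg_quote would also escape the square brackets and the parentheses in (.+).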

To make the replacement, use:

preg_replace('/\[[^]]++\]/', '(.+)', $target)

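where ++ is a possessive quantifier: it matches one or more characters and never backtracks. The ] immediately after [^ is treated as a literal character, so [^]] means "any character except a closing square bracket".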
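
Putting the two steps together, here is a minimal sketch of how the escaping and the replacement could be combined. The helper names templateToRegex and extractPageName are hypothetical, not from the question, and the ^...$ anchoring and the # delimiter are assumptions on my part. It splits each seed template on the placeholder, escapes the literal pieces with preg_quote, and rejoins them with (.+):

```php
<?php
// Hypothetical helper: turn a seed template such as
// "www.example.com/[pagename]" into an anchored regular expression.
function templateToRegex(string $template): string
{
    // Split on [placeholder] tokens so only the literal parts get escaped.
    $literals = preg_split('/\[[^]]++\]/', $template);

    // Escape ".", "#" and other special characters in each literal part.
    $escaped = array_map(function ($part) {
        return preg_quote($part, '#');
    }, $literals);

    // Rejoin with a capture group where each placeholder used to be.
    return '#^' . implode('(.+)', $escaped) . '$#';
}

// Hypothetical helper: return the page name from the first template
// that matches the URL, or null if none matches.
function extractPageName(string $url, array $templates): ?string
{
    foreach ($templates as $template) {
        if (preg_match(templateToRegex($template), $url, $m)) {
            return $m[1];
        }
    }
    return null;
}

// Seed templates as given in the question.
$templates = [
    'www.example.com/[pagename]',
    '[pagename].example.com',
    'example.com/users/[pagename]',
];

echo extractPageName('www.example.com/shopping', $templates), "\n"; // shopping
echo extractPageName('shopping.example.com', $templates), "\n";     // shopping
```

Templates are tried in order and the first match wins, so put the more specific ones first. Note that (.+) is greedy and will also match across "/" and "."; if page names never contain those characters, a tighter group such as ([^/.]+) may be a safer choice.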

Mark Byers
Hi, thank you! Could you please paste the regex code needed to do that replacement?
alexia
Post updated with regex for replacement.
Mark Byers
I think you've got an extra closing square bracket in there.
Lukman
@Lukman: the first closing square bracket is interpreted as a literal character because of its position right after `[^`. You could escape it if you want, though that is not necessary.
Geert
@Mark Byers: I'd probably make the `[^]]+` part possessive: `[^]]++`. That prevents needless backtracking in case the target string doesn't contain a closing square bracket.
Geert
@Mark Byers: Also, I'd opt for `.+` instead of `.*`, since empty page names are most likely not considered valid.
Geert
Thanks for your comments, Geert. I've updated the post.
Mark Byers