I want to retrieve the following URLs with a regex:

 HREF="http://www.getty.edu/vow/TGNFullDisplay?find=&place=&nation=&english=Y&subjectid=7009830"

 HREF="http://www.getty.edu/vow/TGNFullDisplay?find=&place=&nation=&english=Y&subjectid=7009830&ptype=PF"

The difference is the ending: the first one omits &ptype=PF and the second one includes it.

At the moment I'm using this pattern:

 protected $uriPattern = '/http:\/\/www\.getty\.edu\/vow\/.*?\?find=&place=&nation=&english=Y&subjectid=......./i';

But that only works for the first one.

What would the regex pattern need to look like for preg_match_all to match both of them? Thanks for the help.

+1  A: 

Try this:

 protected $uriPattern = '/http:\/\/www\.getty\.edu\/vow\/.*?\?find=&place=&nation=&english=Y&subjectid=.......(&ptype=PF){0,1}/i';
Pete McKinney
+3  A: 

If part of the string you are matching is optional, wrap it in a group followed by a question mark: (optional)?. In your case that is (&ptype=PF)?.
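
A minimal sketch of that optional group appended to the pattern from the question. The `$html` sample string is made up for illustration; everything else is taken from the question as posted:

```php
<?php
// Hypothetical sample input containing both URL variants from the question.
$html = ' HREF="http://www.getty.edu/vow/TGNFullDisplay?find=&place=&nation=&english=Y&subjectid=7009830"' . "\n"
      . ' HREF="http://www.getty.edu/vow/TGNFullDisplay?find=&place=&nation=&english=Y&subjectid=7009830&ptype=PF"';

// The question's pattern with "(&ptype=PF)?" appended as an optional group.
$uriPattern = '/http:\/\/www\.getty\.edu\/vow\/.*?\?find=&place=&nation=&english=Y&subjectid=.......(&ptype=PF)?/i';

preg_match_all($uriPattern, $html, $matches);

// $matches[0] holds the full matches, one entry per URL found.
print_r($matches[0]);
```

Note that the seven dots still assume the ID is always exactly seven characters long.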

Otto Allmendinger
A: 

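I was going to suggest the more succinct

"#http://www\.getty\.edu/vow/TGNFullDisplay\?find=&place=&nation=&english=Y&subjectid=.+(&ptype=PF)?#i"

Forward slashes are not special in regex syntax itself. In PHP's preg functions they only need escaping when / is also the pattern delimiter, so choosing a different delimiter such as # removes the need to escape them. Using .+ instead of a fixed number of dots also allows the ID to be a different length.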

Synetech inc.
I thought that too, but I don't get why using .+ won't work! I tried .+?, (.)+? and (,)+ too, but none of them work.
weng
How about using a (reasonable) range for the ID then: `subjectid=[0-9]{1,10}`
Synetech inc.
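
A likely reason `.+` fails here: it is greedy, so it runs on to the end of the line and swallows the closing quote and everything after it. The lazy `.+?` has the opposite problem and stops after a single character, because the optional group that follows is allowed to match nothing. Restricting the ID to digits avoids both. A small sketch comparing the two, where the `$html` sample string and the `$prefix` variable are made up for illustration and `#` is used as the delimiter so the slashes need no escaping:

```php
<?php
// Hypothetical sample input: both URL variants on a single line.
$html = '<A HREF="http://www.getty.edu/vow/TGNFullDisplay?find=&place=&nation=&english=Y&subjectid=7009830">a</A> '
      . '<A HREF="http://www.getty.edu/vow/TGNFullDisplay?find=&place=&nation=&english=Y&subjectid=7009830&ptype=PF">b</A>';

// Shared start of the pattern (dots and the question mark escaped).
$prefix = 'http://www\.getty\.edu/vow/TGNFullDisplay\?find=&place=&nation=&english=Y&subjectid=';

// Greedy ".+" runs to the end of the line, so both URLs land in one oversized match.
preg_match_all('#' . $prefix . '.+(&ptype=PF)?#i', $html, $greedy);

// Digits only: the ID stops at the first non-digit, then "&ptype=PF" is matched if present.
preg_match_all('#' . $prefix . '[0-9]{1,10}(&ptype=PF)?#i', $html, $digits);

echo count($greedy[0]), "\n"; // 1: a single match spanning both URLs
print_r($digits[0]);          // two entries, one per URL
```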