You can use a lookahead instead of making ['\"\< >] part of your match, i.e.:
(http:\/\/.*?)(?=['\"\< >])
Generally speaking, whereas ab matches ab, a(?=b) matches only a (and only if it's followed by b).
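As a rough illustration of the difference, here is a minimal Python sketch. The sample text and the example.com URL are made up for the demonstration, and the character class is written without the redundant escapes; other regex flavors may need slightly different escaping.

```python
import re

# Hypothetical sample input; the URL is only for illustration.
text = "<a href='http://example.com/page'>link</a>"

# Terminator is part of the pattern, so it ends up inside the match.
consumed = re.search(r"http://.*?['\"< >]", text).group(0)

# Terminator is only asserted by the lookahead, so it stays out of the match.
asserted = re.search(r"http://.*?(?=['\"< >])", text).group(0)

# The generic case: a(?=b) matches just "a", and only when "b" follows.
a_before_b = re.search(r"a(?=b)", "ab").group(0)
a_before_c = re.search(r"a(?=b)", "ac")  # no match, so this is None

print(consumed)
print(asserted)
print(a_before_b, a_before_c)
```

The lookahead version leaves the quote character unconsumed, so a subsequent match could still start at it.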
Capturing group option
Lookarounds are not supported by all flavors. More widely supported are capturing groups.
Generally speaking, whereas (a)b still matches ab, it also captures a in group 1.
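A small Python sketch of the capturing-group approach might look like the following. The sample text and URL are invented for the example; the point is only that the overall match includes the terminator while group 1 does not.

```python
import re

# Hypothetical sample input; the URL is only for illustration.
text = "<a href='http://example.com/page'>link</a>"

# The terminator is matched (and consumed), but only the URL is captured.
m = re.search(r"(http://.*?)['\"< >]", text)
whole_match = m.group(0)  # URL plus the terminating quote
url = m.group(1)          # just the URL

# The generic case: (a)b matches "ab" and captures "a" in group 1.
g = re.search(r"(a)b", "ab")
generic_whole, generic_group = g.group(0), g.group(1)

print(whole_match)
print(url)
print(generic_whole, generic_group)
```

Unlike the lookahead version, the terminator is consumed here, which matters if the next match needs to start at that character.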
Negated character class option
Depending on the need, using a negated character class is often much better than using a reluctant .*? (followed by a lookahead to assert the terminator pattern in this case).
Let's consider the problem of matching "everything between A and ZZ". As it turns out, this specification is ambiguous: we will come up with 3 patterns that do this, and they will yield different matches. Which one is "correct" depends on the expectation, which the original statement does not properly convey.
We use the following as input:
eeAiiZooAuuZZeeeZZfff
We use 3 different patterns:
A(.*)ZZ yields 1 match: AiiZooAuuZZeeeZZ (as seen on ideone.com)
- This is the greedy variant; group 1 matched and captured iiZooAuuZZeee
A(.*?)ZZ yields 1 match: AiiZooAuuZZ (as seen on ideone.com)
- This is the reluctant variant; group 1 matched and captured iiZooAuu
A([^Z]*)ZZ yields 1 match: AuuZZ (as seen on ideone.com)
- This is the negated character class variant; group 1 matched and captured uu
Here's a visual representation of what they matched:
         ___n
        /   \              n = negated character class
eeAiiZooAuuZZeeeZZfff      r = reluctant
  \_________/r   /         g = greedy
   \____________/g
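The three results above can be reproduced with a short script. This is a sketch in Python rather than whatever language the ideone.com snippets used, but the three patterns behave the same way in any backtracking regex flavor that supports reluctant quantifiers.

```python
import re

text = "eeAiiZooAuuZZeeeZZfff"

patterns = {
    "greedy": r"A(.*)ZZ",
    "reluctant": r"A(.*?)ZZ",
    "negated": r"A([^Z]*)ZZ",
}

# For each variant, collect (whole match, group 1) for every match found.
results = {
    name: [(m.group(0), m.group(1)) for m in re.finditer(pattern, text)]
    for name, pattern in patterns.items()
}

for name, matches in results.items():
    print(name, matches)
```

Note that the negated character class variant skips the first A entirely: [^Z]* cannot cross the single Z in iiZoo, so the attempt starting there fails and the match begins at the second A.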