Using [XXX]+ means: any one of the characters listed between [ and ], one or more times.
Here, you didn't put any character between [ and ], hence the problem.
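To make that concrete, here is a minimal sketch of a character class followed by +. It uses Python's re module rather than PCRE (an assumption on my part; this particular syntax behaves the same way in both), and the sample strings are made up for illustration.

```python
import re

# [abc]+ : one or more characters, each of which is a, b, or c
pattern = re.compile(r"[abc]+")

# The first run made only of a, b and c is matched
first = pattern.search("xxabcabxx")
print(first.group(0))

# None of the listed characters is present, so nothing matches
missing = pattern.search("xyz")
print(missing)
```

The class only matches characters you actually list in it, which is why it needs at least one character between the brackets.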
If you want to say "any possible character", you can use a . (a dot).
Note: by default, it will not match newlines; you'll have to use the s (PCRE_DOTALL) pattern modifier if you want it to.
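A small sketch of the newline behaviour, again in Python as a stand-in: re.DOTALL is Python's counterpart of the s modifier, and the sample text is hypothetical.

```python
import re

text = "first line\nsecond line"

# Without DOTALL, . does not match the newline, so the match stops there
without_flag = re.match(r".+", text).group(0)

# With DOTALL (Python's counterpart of the PCRE s modifier), . matches \n too
with_flag = re.match(r".+", text, re.DOTALL).group(0)

print(repr(without_flag))
print(repr(with_flag))
```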
If you want to say any letter, you can use:
- for lower-case: [a-z]
- for upper-case: [A-Z]
- for both: [a-zA-Z]
And, for digits:
- [0-9]: any digit
- [a-zA-Z0-9]: any lower-case or upper-case letter, or any digit
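Here is a rough sketch of how those ranges behave, assuming Python's re module (the range syntax is the same as in PCRE); the sample string is invented for the example.

```python
import re

sample = "Route 66, exit 12B"

# Runs of letters only (both cases)
letters = re.findall(r"[a-zA-Z]+", sample)

# Runs of digits only
digits = re.findall(r"[0-9]+", sample)

# Runs of letters and digits mixed
alnum = re.findall(r"[a-zA-Z0-9]+", sample)

print(letters)
print(digits)
print(alnum)
```

Note how "12B" is split by the first two patterns but kept whole by the third, since that class accepts both letters and digits.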
At that point, you will probably want to take a look at:
- The Backslash section of the PCRE manual
- And, especially, the \w meta-character, which means "any word character" (a letter, a digit, or an underscore)
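As a quick illustration of \w, here is a sketch in Python. One assumption to flag: Python's \w is Unicode-aware by default, so re.ASCII is passed to get the plain "letters, digits, underscore" meaning; the sample string is made up.

```python
import re

sample = "user_name-42!"

# \w+ : runs of word characters (letters, digits, underscore)
words = re.findall(r"\w+", sample, re.ASCII)

# The same thing spelled out as an explicit character class
explicit = re.findall(r"[a-zA-Z0-9_]+", sample)

print(words)
print(explicit)
```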
After that, when you begin using a regex such as /.+/s, which should match:
- Any possible character
- One or more times
You'll see that it doesn't "stop" when you expect it to. That's because matching is greedy by default: you'll have to use a ? after the +, or use the U modifier. See the Repetition section for more information.
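The greedy versus non-greedy difference can be sketched like this, once more in Python as a stand-in. The +? syntax is shared with PCRE, but the U modifier is PCRE-specific and has no direct equivalent in Python's re, so only the ? form is shown. The sample text is hypothetical.

```python
import re

text = 'say "one" and "two"'

# Greedy: .+ grabs as much as it can, up to the LAST closing quote
greedy = re.search(r'".+"', text, re.DOTALL).group(0)

# Lazy: .+? grabs as little as it can, up to the FIRST closing quote
lazy = re.search(r'".+?"', text, re.DOTALL).group(0)

print(greedy)
print(lazy)
```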
Well, actually, the best thing to do would be to invest some time, carefully reading everything in the PCRE Patterns section of the manual, if you want to start working with regexes ;-)
Oh, and, BTW: using regex to parse HTML is a bad idea...
It's generally much better to use a DOM parser, such as: