In my regex, I want to say that within the sample text, any characters are allowed, including a-z in upper and lower case, numbers and special characters.

For example, my regular expression may be checking that a document is HTML. Therefore:

"/\n<html>[]+</html>\n/"

I have tried []+ but it does not seem to work.

+1  A: 

The dot . is the metacharacter for "any character".

knittl
Note that by default, the `.` does not match line breaks.
Bart Kiers
Don't forget the "s" (PCRE_DOTALL) modifier if you want it to match newlines as well.
Matti Virkkunen
How does that work?
Mith
You add the modifiers after your ending delimiter. http://fi2.php.net/manual/en/regexp.reference.delimiters.php
Matti Virkkunen
+1  A: 

Using [XXX]+ means any one of the characters listed between [ and ], one or more times.

Here, you didn't put any character between [ and ] -- hence the problem.
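
To see the failure concretely, here is a minimal sketch. It uses Python's `re` module as a stand-in for PCRE (an assumption on my part, since the question is about PHP); both engines treat a `]` placed right after `[` as a literal character, so the class in the question is never closed and the pattern is rejected when it is compiled.

```python
import re

error_message = None
try:
    # "[]" does not mean "empty class": the "]" is taken as a literal member,
    # so the engine keeps looking for a closing "]" and never finds one.
    re.compile(r"<html>[]+</html>")
except re.error as exc:
    error_message = str(exc)

# The exact wording of the message depends on the engine and version.
print(error_message)
```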


If you want to say "any possible character", you can use a .
Note: by default, it will not match newlines; you'll have to use a Pattern Modifier (the s modifier, PCRE_DOTALL) if you want it to.
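
A small sketch of that difference, again using Python's `re` module as a stand-in (an assumption; in PHP you would append `s` after the closing delimiter instead). Python's `re.DOTALL` flag plays the same role as PCRE's `s` modifier. The sample text is made up for illustration.

```python
import re

text = "<html>\nhello\n</html>"

# Without DOTALL, "." refuses to match "\n", so the pattern cannot span lines.
no_flag = re.search(r"<html>.+</html>", text)

# With DOTALL, "." matches every character, line breaks included.
with_flag = re.search(r"<html>.+</html>", text, re.DOTALL)

print(no_flag)                # None
print(with_flag is not None)  # True
```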

If you want to say any letter, you can use :

  • for lower case : [a-z]
  • for upper-case : [A-Z]
  • for both : [a-zA-Z]

And, for numbers :

  • [0-9] : any digit
  • [a-zA-Z0-9] : any lower-case or upper-case letter, and any number.
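
The classes above can be tried out with a quick sketch like this one. It is written in Python rather than PHP (my assumption, for the sake of a runnable example); the character-class syntax is the same in both. The sample string is invented.

```python
import re

sample = "Abc123 xyz-9"

lower = re.findall(r"[a-z]+", sample)        # runs of lower-case letters
upper = re.findall(r"[A-Z]+", sample)        # runs of upper-case letters
digits = re.findall(r"[0-9]+", sample)       # runs of digits
alnum = re.findall(r"[a-zA-Z0-9]+", sample)  # runs of letters or digits

print(lower)   # ['bc', 'xyz']
print(upper)   # ['A']
print(digits)  # ['123', '9']
print(alnum)   # ['Abc123', 'xyz', '9']
```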


At that point, you will probably want to take a look at :

  • The Backslash section of the PCRE manual
  • And, especially, the \w meta-character, which means "any word character"
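
As a rough illustration of \w, here is a sketch in Python (an assumption; the thread itself is about PCRE). With the `re.ASCII` flag, Python's `\w` covers letters, digits and the underscore, which is close to PCRE's default behaviour. The sample string is hypothetical.

```python
import re

sample = "user_name42 = 'x-y'"

# \w+ grabs runs of "word" characters; spaces, quotes, "=" and "-" split them.
words = re.findall(r"\w+", sample, re.ASCII)

print(words)  # ['user_name42', 'x', 'y']
```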


After that, when you begin using a regex such as

/.+/s

which should match :

  • Any possible character
    • Including newlines
  • One or more times

You'll see that it doesn't "stop" where you expect it to. That's because matching is greedy by default: you'll have to use a ? after the +, or use the U modifier; see the Repetition section for more information.
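
Here is a minimal sketch of greedy versus lazy matching, in Python as a stand-in for PCRE (an assumption). The `+?` form behaves the same way in both engines; note that Python has no equivalent of PCRE's U modifier, so only the `?` approach is shown. The sample text is made up.

```python
import re

text = "<b>one</b> and <b>two</b>"

# Greedy: ".+" runs to the end, then backs off only as far as the LAST "</b>".
greedy = re.findall(r"<b>.+</b>", text, re.DOTALL)

# Lazy: ".+?" expands one character at a time and stops at the FIRST "</b>".
lazy = re.findall(r"<b>.+?</b>", text, re.DOTALL)

print(greedy)  # ['<b>one</b> and <b>two</b>']
print(lazy)    # ['<b>one</b>', '<b>two</b>']
```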


Well, actually, the best thing to do would be to invest some time, carefully reading everything in the PCRE Patterns section of the manual, if you want to start working with regexes ;-)


Oh, and, BTW : using regex to parse HTML is a bad idea...

It's generally much better to use a DOM parser.

Pascal MARTIN
Thank you for the tips. I am rather new to this! :-) I am not actually using it for HTML though. :-D
Mith
You're welcome :-) *(And glad to hear that, about HTML ;-) )*
Pascal MARTIN