I'm trying to match everything except what is between [ and ].

Everything between [ and ] is matched by

\[.+\]

My attempt at everything except what is between [ and ] is

[^(\[.+\])]+

The search text is

valid[REGEX_EMAIL|REGEX_PASSWORD|REGEX_TEST]

It matches "valid" and "REGEX_EMAIL|REGEX_PASSWORD|REGEX_TEST".

It is supposed to match "valid", but not "REGEX_EMAIL|REGEX_PASSWORD|REGEX_TEST".

How can I solve this?

I want my PHP validation class to work like the CodeIgniter one...

+2  A: 

I believe you'll find your answer in what is called negative lookahead. It allows you to include patterns in your search without actually including them in your match.

/^.*(?!\[.+\])$/

(?! ... ) being the negative lookahead part.

James Maroney
I tested your suggestion in Regex Tester's regexpal.com and in Komodo IDE's Rx Toolkit, but it didn't work :-(
Delirium tremens
Even so, thanks for introducing me to negative lookahead!
Delirium tremens
My apologies. This is the first time I've needed to use it, but this was actually a case for positive lookahead, (?= ... ). I think you'll find that now matches :)
James Maroney
To get it to work in regex pal, here was the actual regex: .*(?=\[.+\])
James Maroney
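
To see why the first suggestion failed and the corrected one works, here is a minimal sketch in Python (the thread is about PHP, but both patterns use only syntax the two regex engines share). The variable names are illustrative and not taken from the thread.

```python
import re

text = "valid[REGEX_EMAIL|REGEX_PASSWORD|REGEX_TEST]"

# Negative lookahead placed after ".*": the greedy ".*" has already consumed
# the whole line, so no bracket group follows and the assertion passes.
negative = re.search(r"^.*(?!\[.+\])$", text)

# Positive lookahead: ".*" backtracks until a bracketed group lies directly
# ahead. The lookahead checks the text but does not add it to the match.
positive = re.search(r".*(?=\[.+\])", text)

print(negative.group(0))  # the whole input, brackets included
print(positive.group(0))  # valid
```

The negative-lookahead version matches the entire input, which is why it appeared not to work in the testers.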
+5  A: 

[^(\[.+\])]+ doesn't mean what you think it means.

Literally, it means "match any character except any one of these ()[.+] one or more times."

[] denotes a character set: it matches any one of the characters inside the set (or any one character not in the set, if the set starts with a ^).

R. Bemrose
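
A quick way to confirm this reading is to run the pattern and look at what comes back. The following is a small sketch in Python; the second test string is made up for illustration.

```python
import re

text = "valid[REGEX_EMAIL|REGEX_PASSWORD|REGEX_TEST]"

# The whole expression is ONE negated character class followed by "+":
# "one or more characters that are none of ( [ . + ] )".
char_class = re.compile(r"[^(\[.+\])]+")

runs = char_class.findall(text)
print(runs)  # ['valid', 'REGEX_EMAIL|REGEX_PASSWORD|REGEX_TEST']

# A made-up string showing that ".", "+", "(" and ")" also end a run.
other_runs = char_class.findall("a.b+c(d)")
print(other_runs)  # ['a', 'b', 'c', 'd']
```

The brackets in the input simply act as separators between two runs, which is exactly the result reported in the question.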
+1  A: 
$string = 'valid[REGEX_EMAIL|REGEX_PASSWORD|REGEX_TEST]';

preg_match('#(\S+)\[.+?\]#', $string, $match);

echo $match[1];
kemp
You'd have to make it ungreedy, or to replace the `.` with `[^\]]`
troelskn
You're right, although it might not be needed depending on the input text.
kemp
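
The greedy versus ungreedy point only shows up when the input holds more than one bracketed group. Below is a minimal sketch in Python with a made-up input (the thread's own sample has a single group, so all three variants agree on it).

```python
import re

# Hypothetical input with two bracketed groups on one line.
text = "valid[A|B] other[C]"

greedy = re.search(r"(\S+)\[.+\]", text)        # ".+" runs to the LAST "]"
lazy = re.search(r"(\S+)\[.+?\]", text)         # ".+?" stops at the first "]"
negated = re.search(r"(\S+)\[[^\]]+\]", text)   # "[^\]]+" cannot cross a "]"

print(greedy.group(0))   # valid[A|B] other[C]
print(lazy.group(0))     # valid[A|B]
print(negated.group(0))  # valid[A|B]
print(lazy.group(1))     # valid
```

The captured name is "valid" in all three cases here; what changes is how much text the overall match swallows.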
+1  A: 

Try this based on the data you're using 'valid[REGEX_EMAIL|REGEX_PASSWORD|REGEX_TEST]'

^[^[]+   

It will return "valid".

Depending on the language you're using, you might need to escape the [, so write it like this: ^[^\[]+

This regex assumes that there will never be a "[" in the text preceding [REGEX_EMAIL|REGEX_PASSWORD|REGEX_TEST]

I tested this using Eric Gunnerson's RegexWorkbench for .NET

nickyt
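
As a rough illustration of the anchored pattern and its stated assumption, here is a short Python sketch. The helper name text_before_bracket is hypothetical, and the escaped form of the pattern is used.

```python
import re

# One or more non-"[" characters, anchored at the start of the string.
PREFIX = re.compile(r"^[^\[]+")

def text_before_bracket(s):
    """Return the text before the first "[", or None if s starts with "["."""
    m = PREFIX.match(s)
    return m.group(0) if m else None

print(text_before_bracket("valid[REGEX_EMAIL|REGEX_PASSWORD|REGEX_TEST]"))  # valid
print(text_before_bracket("required"))  # required (no bracket at all)
print(text_before_bracket("[x]"))       # None (nothing precedes the bracket)
```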
A: 

This is three different kinds of text, if I understand correctly:

  1. From the beginning up to the first [
  2. Between a ] and a [
  3. From the last ] to the end

Given this, there is a regex for each:

  1. ^([^\x5B]*)\x5B
  2. \x5D([^\x5B\x5D]*)\x5B
  3. \x5D([^\x5D]*)$

(x5B and x5D are the hex escapes for left and right bracket.) Note that the match of the entire expression will include the brackets that mark the boundaries; sub-expression 1 gives the match excluding the bracket.

silverpie
Probably the most unreadable way to write it, no idea why you would use the hex code for a simple square bracket.
kemp
Understood--you could equally use \[ and \] as the bracket characters if that's your preference. I find it easier than trying to distinguish whether a given bracket is really closing the character class or serving as a literal. With a regex-aware syntax highlighter, that wouldn't be an issue, and using \[ and \] would be superior.
silverpie
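
The three cases can be tried out as follows. This is a sketch in Python with a made-up input, and with every literal bracket written as a hex escape so that none is mistaken for the start of a character class.

```python
import re

# Hypothetical input with text before, between, and after bracketed groups.
text = "a[b]c[d]e"

# 1. From the beginning up to the first "[".
head = re.search(r"^([^\x5B]*)\x5B", text).group(1)

# 2. Between a "]" and the next "[" (there may be several such pieces).
middles = re.findall(r"\x5D([^\x5B\x5D]*)\x5B", text)

# 3. From the last "]" to the end.
tail = re.search(r"\x5D([^\x5D]*)$", text).group(1)

print(head, middles, tail)  # a ['c'] e
```

In each case the capture group holds the text without the delimiting brackets, while the full match includes them.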
A: 

Try a positive look-ahead assertion (which itself is not captured)

^.*(?=\[.+\])
A. Blanc
A: 

You don't have to be a master of regular expressions when there is Regex Tester. Just type in some test data and play around with a regex until you get the desired result. There is also a Quick Reference on the right-hand side to help you out.

Well, at least that's how I deal with them.

Cinnamon
+1  A: 

I've checked the source code of CI's Validation class now.

They allow rules to be set like

array('field' => "valid|length[5]|foo|callback_bar")

I didn't see any nested square brackets or pipes inside the square brackets. The docs clearly say you may have only one param. The string is set internally to $_rules. When validating, the string is first exploded into an array, so the above would evaluate to four $rules.

'field' => array('valid', 'length[5]', 'foo', 'callback_bar')

They then loop through the array, checking whether the $rule is a callback with substr(). Then they check whether there are square brackets in the $rule with the pattern "/(.*?)\[(.*?)\]/" and, if so, take them off the $rule and store the inner part of the brackets as $param. Finally, they just execute the $rule as a variable function with the detected param, e.g. $rule($_POST[$field], 5);

As you can see, they are not splitting everything in one go. This does not answer your question, but shedding some light on CI's internal logic to get their Validator running might help you rethink your approach.

Opinion: I'd like to add that their approach is terrible. Validator chains are prime candidates for the Command Pattern. Sure, it's nice to specify validators with small, compact strings, but you pay for this with a lot of ugly string juggling when it comes to actually running the chain. Have a look at how Zend Framework does it, or look at PHP's native filter functions.

Gordon
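
The two-step flow described in this last answer (split the rule string on the pipe, then peel an optional bracketed parameter off each rule) can be sketched roughly as below. This is an illustration in Python, not CodeIgniter's actual code: the function name parse_rules and the returned tuple layout are assumptions, and only the pattern and the sample rule string come from the answer.

```python
import re

# Same pattern as quoted in the answer: rule name, then one bracketed param.
RULE_WITH_PARAM = re.compile(r"(.*?)\[(.*?)\]")

def parse_rules(rule_string):
    """Split "a|b[5]|callback_c" into (name, param, is_callback) tuples."""
    parsed = []
    for rule in rule_string.split("|"):          # step 1: explode on "|"
        is_callback = rule.startswith("callback_")
        param = None
        m = RULE_WITH_PARAM.match(rule)          # step 2: look for "[...]"
        if m:
            rule, param = m.group(1), m.group(2)
        parsed.append((rule, param, is_callback))
    return parsed

rules = parse_rules("valid|length[5]|foo|callback_bar")
print(rules)
# [('valid', None, False), ('length', '5', False),
#  ('foo', None, False), ('callback_bar', None, True)]
```

Because the split on "|" happens first, a rule such as valid[REGEX_EMAIL|REGEX_PASSWORD|REGEX_TEST] would be cut into three fragments before the bracket pattern is ever applied, which is consistent with the observation that pipes are not used inside the brackets.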