hello, i am trying to split a string by some keywords that aren't contained in parentheses...

so, let's say i have the string "i will meet you where we talked (but not where he said)". i want to split the string into 2 pieces. one containing: "i will meet you", the other containing "we talked (but not where he said)". in other words, i want a regex to match the "where" that isn't in parentheses and ignore the one that is.

thank you very much!
all the best

p.s. i asked a related question yesterday, but i realised that i had thought the whole thing through wrong... so now i am asking the right question...

+1  A: 

Depends on how variable the input can be. If you are certain that it will always be in that format (a non-parenthetical sentence followed by a parenthetical one), you can keep it simple and just search for the first "where".

Can you be more specific about what can be expected from the input? Can it have more than one parenthetical section? Can the sections be in any order?

Edit: It sounds like you can safely assume your input will be valid SQL. I would just search for all instances of "where" and then iterate through each one and count the number of open and closing parentheses before it. If the number of open parentheses before a given "where" string is equal to the number of close parentheses before it, it is the one you are looking for.
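
The counting idea above could be sketched in PHP roughly as follows. The function name findTopLevelWhere and the sample query are made up for illustration, and the sketch assumes the parentheses are balanced and that no parentheses appear inside string literals.

```php
<?php
// Hypothetical helper: returns the offset of the first "where" that sits
// outside all parentheses, or null if there is none.
function findTopLevelWhere(string $sql): ?int
{
    $offset = 0;
    // stripos() is case-insensitive, so it finds "where" and "WHERE" alike.
    while (($pos = stripos($sql, 'where', $offset)) !== false) {
        $before = substr($sql, 0, $pos);
        // Equal counts of "(" and ")" before this point mean nesting depth zero.
        if (substr_count($before, '(') === substr_count($before, ')')) {
            return $pos;
        }
        $offset = $pos + 1;
    }
    return null;
}

// Illustrative query (an assumption, not the asker's real data).
$sql = "SELECT (SELECT COUNT(a.id) FROM a WHERE a.x = '1') AS n FROM b WHERE b.id = '2'";
$pos = findTopLevelWhere($sql);
$before = substr($sql, 0, $pos);               // everything up to the outer WHERE
$after = substr($sql, $pos + strlen('where')); // everything after it
```

Note that this also finds "where" inside longer words such as "somewhere", so a word-boundary check may be needed for free text.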

Brian Schroth
thank you for your answer. the string i need this to work on can vary a lot; the only rule is that it must match the word "where" outside of any parentheses... there can be any number of the word "where", but only one will be outside parentheses and the order is unknown... here is an example of a string i want to apply it to: SELECT (SELECT COUNT(audioMelodii.id) FROM audioMelodii WHERE a='1') AS nrMelodii, audioArtisti.nume AS artist, audioAlbume.* FROM audioAlbume WHERE audioAlbume.idArtist='$idArtist' thank you!
ant
thank you! i will try that!
ant
Yes, this sounds like a better idea than to incorporate some complex regex you have no idea will work for future input. +1
Bart Kiers
A: 

If you know that your "where" of interest will always happen first and that there will only be one "where" of interest, then this will work:

$array = explode('where', $string, 2);
dnagirl
A: 

Try the following regex:

/(?<!\([^\)]*)where/

The (?<!...) is a negative lookbehind, which asserts that the string matched (where) must not be preceded by an open paren \( followed by any number of characters that aren't a close paren [^\)]*.

JSBangs
PHP does not support that kind of variable-length lookbehind. The number of characters to look behind has to be fixed.
Bart Kiers
? This got accepted as the correct answer, yet it does not work...
Bart Kiers
i changed it a little, but it works... i hope it will apply in any case... (?<!\()[^)]*\bWHERE\b(?![^)]*\)) thank you very much!
ant
this is just my personal opinion, but using a regex where you're not sure if it will work or not is a very dangerous place to be. If you're using a regex you should know what it does so you can figure out if it will work, or do very extensive unit testing so you can be sure it works for every potential convoluted input. If you find yourself saying "I hope it will work" when dealing with a regex, be careful.
Brian Schroth
@ant: `(?<!\()[^)]*\bWHERE\b(?![^)]*\))` does not match either of the two `WHERE`'s in your example string. Your regex will only match `WHERE`'s that do not have a `)` in front of them, regardless of whether these `WHERE`'s are inside or outside parentheses. I don't recommend using that regex: it is apparent you know very little regex, so why use them at all?
Bart Kiers
Brian > thank you... i will be careful... i tested it a little and i will do more testing along the way to be sure it is alright.
ant
I offered an alternate solution that does not use regex, that might be more suitable.
Brian Schroth
it seems i talked too soon... i apologise! and it's true! i know very little regex... that's why i asked for your help... i need it in a php application and it seems i can't find another solution other than regex... Brian Schroth offered one and i will try that. thank you all again! if i find a good solution i'll post it here... all the best!
ant
A: 

It was proved way back in the 60s that traditional regexes can't solve your specific problem, because they cannot keep track of nesting depth. Fortunately many new features have been added over the years. The feature you need is called "balancing groups" (available in the .NET regex engine), and it allows a kind of counter to be used inside a regex.

Here is a simple solution to your problem:

^
(?'BeforeWhere'
  ((?'Open'\()|(?'Close-Open'\))|[^()])*  # count balanced parens
  (?(Open)^|)  # make sure parens actually balanced
)
(?'Where'\ where\ )
(?'AfterWhere'
  ((?'Open'\()|(?'Close-Open'\))|[^()])*  # count balanced parens
  (?(Open)^|)  # make sure parens actually balanced
)
$

This will result in BeforeWhere, Where and AfterWhere named groups. If you want to find multiple 'Where' instances outside of parentheses, you can add a ( before (?'Where' and a ) just before the final $.

The only tricky part of this regex is making sure parens are actually balanced. In that line, if Open is still defined the parens are unbalanced so I match ^ which of course fails. Otherwise I match nothing, and the match proceeds.
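
The (?'Close-Open'...) balancing syntax is specific to .NET and is not accepted by PHP's PCRE engine. For PHP, a different technique can give a similar result: match each balanced parenthesised block with a recursive group and throw it away with (*SKIP)(*FAIL), so that only a "where" outside parentheses remains matchable. The following is a minimal sketch under that assumption, not a translation of the pattern above; the variable names are illustrative.

```php
<?php
// Assumption: PCRE as bundled with PHP; this is not the .NET pattern.
// Group 1 matches one balanced "( ... )" block, recursing via (?1) for nesting.
// (*SKIP)(*FAIL) discards that block, so only a "where" outside it can match.
$pattern = '/(\((?:[^()]++|(?1))*+\))(*SKIP)(*FAIL)|\bwhere\b/i';

$input = 'i will meet you where we talked (but not where he said)';

// Limit 2: split only at the first top-level "where".
$parts = preg_split($pattern, $input, 2);
// $parts[0] is the text before that "where", $parts[1] the text after it.
```

If a "(" is never closed, the recursive group cannot match it, and a "where" after it will be treated as top-level, so this relies on the parentheses being balanced.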

Ray Burns
Agreed, PHP's regex flavor can cope with nesting, but good luck maintaining that monster. This especially is not an option for ant (the OP), who is not so fluent in regex, to put it mildly.
Bart Kiers
I simplified my solution quite a bit, so maybe it isn't a "monster" any more. I hope.
Ray Burns
:) To some it may not, but I'm sure it will haunt the OP for the rest of his life if s/he adds it to his/her code base!
Bart Kiers
Ray Burns - thank you very much! i'll try it right now! Bart - you're right that i don't know very much regex... but you accentuate this too much :)
ant
I accentuate this because I want to emphasize the danger of blindly copy-pasting something you have no idea will work, and which you are therefore not able to adjust when things stop working. But by all means, don't mind me: copy-paste all the regexes you think work. I'll leave you be.
Bart Kiers
i understand what you mean... but i want to assure you that this will not remain a copy / paste thing... i will test it and try to understand it as much as i can... i talked too soon before and that's why i left the impression you got. sorry for that!
ant
No hard feelings ant.
Bart Kiers
thank you! i'm reading about balancing groups to understand the solution Ray Burns gave me...
ant
Okay, will you account for cases like this: `SELECT FROM table WHERE column = 'text WHERE and )))';` (note the `where` and `)` inside string literals!)
Bart Kiers
The RegEx I wrote only accounts for parentheses. Extending it to ignore 'WHERE' in string literals would require changing the ((?'Open'\()|(?'Close-Open'\))|[^()])* line to exclude ' and " as well, e.g. ((?'Open'\()|(?'Close-Open'\))|[^()'"])*, then surrounding it with code to handle quotes.
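
For comparison, the same quote handling without a regex could be a single pass over the string that tracks both the parenthesis depth and whether the scan is inside a string literal. This is only a sketch: findWhereOutsideQuotes is a made-up name, and it assumes quotes inside literals are escaped by doubling them ('' or ""), not with backslashes.

```php
<?php
// Hypothetical helper: offset of the first "where" that is outside both
// parentheses and quoted string literals, or null if none is found.
// Assumption: quotes are escaped by doubling ('' or ""), not by backslashes.
function findWhereOutsideQuotes(string $sql): ?int
{
    $depth = 0;     // current parenthesis nesting level
    $quote = null;  // the quote character we are inside, or null
    $length = strlen($sql);

    for ($i = 0; $i < $length; $i++) {
        $char = $sql[$i];
        if ($quote !== null) {
            // Inside a literal: only the matching quote character ends it,
            // so parentheses and keywords in here are ignored.
            if ($char === $quote) {
                $quote = null;
            }
        } elseif ($char === "'" || $char === '"') {
            $quote = $char;
        } elseif ($char === '(') {
            $depth++;
        } elseif ($char === ')') {
            $depth--;
        } elseif ($depth === 0 && strcasecmp(substr($sql, $i, 5), 'where') === 0) {
            return $i;
        }
    }
    return null;
}

// The test string from the comment above.
$tricky = "SELECT FROM table WHERE column = 'text WHERE and )))';";
$pos = findWhereOutsideQuotes($tricky);
```

Here $pos points at the first WHERE; the second one and the stray ")))" are skipped because they sit inside the quoted literal.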
Ray Burns
A: 

i finally got it solved with Brian Schroth's solution... i made a php function that returns every main part of a sql select statement (SELECT, FROM, JOINS, WHERE, GROUP BY, HAVING, ORDER BY, LIMIT), including any number of subqueries contained in that part... if anyone wants this function, i'll post it...
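
The function itself was not posted in this thread. Purely as an illustration of the idea, a function of that kind might look like the sketch below. It is not ant's code: the name splitSelectClauses is invented, and it assumes balanced parentheses, no clause keywords or parentheses inside string literals, and it leaves JOINs as part of the FROM clause.

```php
<?php
// Hypothetical sketch, not the function mentioned above: splits a SELECT
// statement into its top-level clauses, leaving subqueries in parentheses intact.
function splitSelectClauses(string $sql): array
{
    // \G anchors the match at the offset passed to preg_match().
    $pattern = '/\G\b(SELECT|FROM|WHERE|GROUP\s+BY|HAVING|ORDER\s+BY|LIMIT)\b/i';
    $clauses = [];
    $current = null;  // keyword of the clause being collected
    $start = 0;       // offset where that clause's body begins
    $depth = 0;       // parenthesis nesting level
    $length = strlen($sql);

    for ($i = 0; $i < $length; $i++) {
        $char = $sql[$i];
        if ($char === '(') {
            $depth++;
        } elseif ($char === ')') {
            $depth--;
        } elseif ($depth === 0 && preg_match($pattern, $sql, $m, 0, $i)) {
            // A top-level keyword starts here: close the previous clause.
            if ($current !== null) {
                $clauses[$current] = trim(substr($sql, $start, $i - $start));
            }
            // Normalise e.g. "order   by" to "ORDER BY" for the array key.
            $current = strtoupper(preg_replace('/\s+/', ' ', $m[1]));
            $start = $i + strlen($m[0]);
            $i = $start - 1;  // the loop's $i++ resumes right after the keyword
        }
    }
    if ($current !== null) {
        $clauses[$current] = trim(substr($sql, $start));
    }
    return $clauses;
}

// Illustrative query (table and column names are assumptions).
$query = "SELECT (SELECT COUNT(m.id) FROM m WHERE m.a = '1') AS n, t.* "
       . "FROM t WHERE t.id = '5' ORDER BY t.name LIMIT 10";
$clauses = splitSelectClauses($query);
```

The result is an array keyed by clause keyword, so $clauses['WHERE'] holds only the outer condition while the subquery stays inside $clauses['SELECT'].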

thank you all very much!

ant