I want to match a String which looks like this:

[lang_de]Hallo![/lang_de][lang_en]Hello![/lang_en]HeyHo[lang_es]Hola![/lang_es]

I want the matching to return true if there is text which is not enclosed by lang_tags (in this example: HeyHo). It could also be positioned at the beginning or end of the string.

Whitespace should NOT match, e.g. [lang_de]Tisch[/lang_de] [lang_en]Table[/lang_en] should not cause a match.

I can't use lookahead or lookbehind, because MySQL doesn't seem to support them.

Any suggestions?

+2  A: 

Try this regex:

'^ *[^[ ]|\\[/lang_[a-z]{2}\\] *[^[ ]'

This is how you can use it:

select * 
from <table> 
where <field> regexp '^ *[^[ ]|\\[/lang_[a-z]{2}\\] *[^[ ]'

It should handle all cases:

  • Before
  • After
  • Middle
  • Not whitespace
Senseful
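The four cases listed above can be sanity-checked outside MySQL. The following is a minimal sketch that uses Python's re module as a stand-in for MySQL's REGEXP, which is an assumption: the two engines are not identical, and the helper name has_untagged_text is made up for illustration. The '[' inside the bracket expression is escaped only because Python warns about an unescaped one.

```python
import re

# The pattern from the answer, transcribed for Python's re module.
# Assumption: Python's engine is close enough to MySQL's REGEXP for this check.
PATTERN = re.compile(r"^ *[^\[ ]|\[/lang_[a-z]{2}\] *[^\[ ]")


def has_untagged_text(s: str) -> bool:
    """Return True if s contains text outside the lang tags (hypothetical helper)."""
    return PATTERN.search(s) is not None


# Text before, in the middle of, and after the tags.
before = "HeyHo[lang_de]Hallo![/lang_de]"
middle = "[lang_de]Hallo![/lang_de][lang_en]Hello![/lang_en]HeyHo[lang_es]Hola![/lang_es]"
after = "[lang_de]Hallo![/lang_de]HeyHo"
# Only a space between the tags: this must not count as text.
spaces_only = "[lang_de]Tisch[/lang_de] [lang_en]Table[/lang_en]"

for sample in (before, middle, after, spaces_only):
    print(has_untagged_text(sample), sample)
```

Note that the pattern only skips the space character, so a tab or a line break between two tags would still count as text.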
I can't fully understand your regex (bear with me, I'm a noob), but it seems to work ;) You are my rockstar!
jostey
If this is the solution you chose and it worked for you, you should mark it as accepted. The option should be right by the vote up/down indicator.
Raegx
I can't upvote yet, but I've marked the solution.
jostey
I'll try to explain it: The first part of the regex (everything up to the '|' character) skips any leading spaces and then tries to match a character that is neither a space nor '[' at the start of the string. So if your string starts with 'A[lang...', it returns true because of the 'A'. The second part looks for a closing [/lang_xx] tag and then applies the same check as before (optional spaces, then a character that is neither a space nor '['). This makes '...[/lang_en]A[lang...' match, because there is an 'A' after the closing lang tag. It also handles text at the end of the string, e.g. '...[/lang_en]A'.
Senseful
A: 

My best attempt is:

SELECT * FROM tbl1 WHERE ('[lang_de]Hallo![/lang_de][lang_en]Hello![/lang_en]HeyHo[lang_es]Hola![/lang_es]' NOT REGEXP '^(\\[lang_[a-z]{2}\\][^\[]*\\[\\/lang_[a-z]{2}\\])*$')

Let us know how you got on.

Question Mark
The problem with this regex is that the * operator is greedy, so it doesn't work with the test data provided. Your first [lang_*] will match [lang_de], and the [/lang_*] will then match all the way to the final [/lang_es]. The group therefore never repeats, and the regex thinks it matched successfully. It will, however, solve the problem of text at the beginning or end of the string.
Senseful
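The pitfall discussed in this comment can be reproduced with a short sketch. It assumes that an earlier version of the answer used .* between the tags (that version is not shown in the thread) and uses Python's re module as a stand-in for MySQL's REGEXP. The names GREEDY, LAZY, STRICT, WITH_SPACES and only_tagged are made up for illustration, and WITH_SPACES is a possible tweak, not something proposed in the thread.

```python
import re

TEXT = "[lang_de]Hallo![/lang_de][lang_en]Hello![/lang_en]HeyHo[lang_es]Hola![/lang_es]"
SPACED = "[lang_de]Tisch[/lang_de] [lang_en]Table[/lang_en]"

# Assumed earlier variant: '.' may run across tag boundaries.
GREEDY = re.compile(r"^(\[lang_[a-z]{2}\].*\[/lang_[a-z]{2}\])*$")
# A lazy quantifier does not help, because '$' forces the match to extend.
LAZY = re.compile(r"^(\[lang_[a-z]{2}\].*?\[/lang_[a-z]{2}\])*$")
# Variant shown in the answer: tag content may not contain '['.
STRICT = re.compile(r"^(\[lang_[a-z]{2}\][^\[]*\[/lang_[a-z]{2}\])*$")
# Hypothetical tweak: also allow spaces around the tag pairs.
WITH_SPACES = re.compile(r"^ *(\[lang_[a-z]{2}\][^\[]*\[/lang_[a-z]{2}\] *)*$")


def only_tagged(pattern: re.Pattern, s: str) -> bool:
    """True if the whole string consists of tag pairs (the NOT REGEXP is applied by the caller)."""
    return pattern.search(s) is not None


for name, pattern in [("GREEDY", GREEDY), ("LAZY", LAZY),
                      ("STRICT", STRICT), ("WITH_SPACES", WITH_SPACES)]:
    # 'not only_tagged' mirrors the NOT REGEXP in the query.
    print(name, "flags stray text in TEXT:", not only_tagged(pattern, TEXT))
```

In this sketch the .* variants accept the whole test string as one big tag pair, so the stray HeyHo goes unnoticed, while the [^\[]* variant rejects it. The [^\[]* variant on its own would still flag a plain space between two tags, which is why the last pattern adds optional spaces.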
Yes, true. I'm always bitten by .*
Question Mark
True, that happens to me quite often... learning regular expressions is quite hard.
jostey