Hello, I'm trying to create a regular expression that matches a certain sequence of characters, unless it is enclosed between a pair of another character.

For example, I would want to match abc or xxabcxx but not tabct or txxabcxt.
With something like tabctxxabcxxtabcxt, though, I'd want to match the middle abc and not the other two.

I'm currently doing this in Java, in case that changes anything.

A: 

EDITED! It was way wrong before.

Oooh, this one's tougher than I thought. Awesome. Using fairly standard syntax:

[^t]{2,}abc[^t]{2,}

That will catch xxabcxx but not abc, xabc, abcx, xabcx, xxabc, xxabcx, abcxx, or xabcxx. Maybe the best thing to do would be:

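import re

def has_unenclosed_abc(string):
    # No abc at all: nothing to match.
    if 'abc' not in string:
        return False
    # No t anywhere: nothing can enclose the abc, so it counts as a match.
    if 't' not in string:
        return True
    # Otherwise fall back to the neighbour-based regex above.
    return re.search(r'[^t]{2,}abc[^t]{2,}', string) is not None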

Is that sufficient for your intention?
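
To make the limits of that pattern concrete, here is a small Java check. The class and method names are mine, purely for illustration. It runs [^t]{2,}abc[^t]{2,} with find() over a handful of inputs, including one (txxabcxxt) that is not from the question but that I added to show what happens when an enclosed abc has two non-t characters on each side.

```java
import java.util.regex.Pattern;

class NeighbourCheck {
    // The pattern from this answer: abc with at least two non-t characters on each side.
    static final Pattern NEIGHBOURS = Pattern.compile("[^t]{2,}abc[^t]{2,}");

    // True if the pattern matches anywhere in the input.
    static boolean found(String s) {
        return NEIGHBOURS.matcher(s).find();
    }

    public static void main(String[] args) {
        String[] inputs = {"xxabcxx", "abc", "tabct", "txxabcxt", "txxabcxxt"};
        for (String s : inputs) {
            System.out.println(s + " -> " + found(s));
        }
    }
}
```

Only xxabcxx and txxabcxxt come back true. The pattern looks at the immediate neighbours of abc, not at whether abc sits between a pair of ts, so it misses a bare abc and accepts an enclosed one as soon as there is enough padding.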

Asker
I'm not sure. I want to catch every abc except the ones enclosed between two ts. And t always comes in pairs if it's in there at all.
CrisisSDK
I just edited my answer to more correctly answer what I had thought your question was. But your comment now confuses me. Can you rephrase?
Asker
My comment confuses me too... Well, pretty much I have a string, and somewhere in it there will be abc in more than one place, but some of these will be surrounded by t, and I want to ignore those matches. The t is kind of an "ignore everything in this" marker for the purposes of this match (the enclosed parts are used by a different part of the program)... I'm really bad at explaining things.
CrisisSDK
... are you trying to parse xml or html?
Asker
Actually I'm not. It's an expression parser: I have a character that you can put around the parts of the expression you don't want it to change. So it might be !abc!abc or "abc"abc or something, depending on what character t is representing. And I want it to only pick out the abc that isn't enclosed. The problem is that there are lots of other random characters surrounding them throughout the string, which I only care about after this match is done.
CrisisSDK
I don't know. Sorry.
Asker
+1  A: 

Try this:

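import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Test {
  public static void main(String[] args) {
    String s = "tabctxxabcxxtabcxt";
    // Alternative 1 swallows a whole t...t run; alternative 2 captures a free abc.
    Pattern p = Pattern.compile("t[^t]*t|(abc)");
    Matcher m = p.matcher(s);
    while (m.find())
    {
      String group1 = m.group(1);
      // group 1 is null whenever the t...t alternative was the one that matched
      if (group1 != null)
      {
        System.out.printf("Found '%s' at index %d%n", group1, m.start(1));
      }
    }
  }
}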

output:

Found 'abc' at index 7

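t[^t]*t consumes anything that's enclosed in ts, so an abc inside such a run is swallowed before the second alternative ever gets a chance at it. If the (abc) in the second alternative matches, you know it's the one you want. When the first alternative is the one that matched, group 1 is null, which is why the loop checks for that.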
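
Since the comments mention that the enclosing character might really be ! or " rather than t, here is a rough sketch of the same idea with the delimiter and the target passed in. The names UnenclosedFinder and findUnenclosed are made up for this example. It assumes delimiters always come in pairs and that the target never contains the delimiter. I swapped [^t]* for a reluctant .*? (with DOTALL) so the delimiter can go through Pattern.quote without having to build a character class by hand; with paired delimiters it stops at the same closing delimiter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UnenclosedFinder {
    // Returns the start index of every occurrence of target that is not
    // inside a delimiter...delimiter pair.
    // Assumption: delimiters are always paired and target does not contain one.
    static List<Integer> findUnenclosed(String input, char delimiter, String target) {
        String d = Pattern.quote(String.valueOf(delimiter));
        // First alternative swallows an enclosed run up to the next delimiter;
        // second alternative captures the target in group 1.
        Pattern p = Pattern.compile(
                d + ".*?" + d + "|(" + Pattern.quote(target) + ")",
                Pattern.DOTALL);
        List<Integer> starts = new ArrayList<>();
        Matcher m = p.matcher(input);
        while (m.find()) {
            // group 1 is only set when the target alternative matched
            if (m.group(1) != null) {
                starts.add(m.start(1));
            }
        }
        return starts;
    }

    public static void main(String[] args) {
        System.out.println(findUnenclosed("tabctxxabcxxtabcxt", 't', "abc")); // [7]
        System.out.println(findUnenclosed("!abc!abc", '!', "abc"));           // [5]
        System.out.println(findUnenclosed("\"abc\"abc", '"', "abc"));         // [5]
    }
}
```

Quoting both the delimiter and the target means characters that are special in a regex get treated literally.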

Alan Moore
Thanks, it looks good.
CrisisSDK