<?php
    $str = "word <a href=\"word\">word</word>word word";
    $str = preg_replace("/word(?!([^<]+)?>)/i","repl",$str);
    echo $str;
    # repl <a href="word">repl</word>repl repl
?>

source: http://pureform.wordpress.com/2008/01/04/matching-a-word-characters-outside-of-html-tags/

Unfortunately, my project needs semantic libraries that are available only for Java...

// Thanks Celso

+1  A: 

Before I answer further: are you trying to parse an HTML document? If so, don't use regexes; use an HTML parser.

Zak
My tool "generates" XHTML by replacing terms in a text file with new tags that use the term as a value inside the tag. I am using the replaceAll approach because some terms can be composite, like "Celso Araujo Fontes". For example, how do I replaceAll myTerm in this situation: myTerm is <friends='z myTerm w'> cool friend </friend>
celsowm
+1  A: 

Use the String.replaceAll() method:

class Test {
  public static void main(String[] args) {
    String str = "word <a href=\"word\">word</word>word word";
    str = str.replaceAll("word(?!([^<]+)?>)", "repl");
    System.out.println(str);
  }
}

Hope this helps.

kolrie
Thanks! And here is the working case-insensitive version: "(?i)word(?!([^<]+)?>)"
celsowm
+1  A: 

To translate that regex for use in Java, all you have to do is get rid of the / delimiters and change the trailing i to an inline modifier, (?i). But it's not a very good regex; I would use this instead:

(?i)word(?![^<>]++>)

According to RegexBuddy's Debug feature, when it tries to match the word in <a href="word">, the original regex requires 23 steps to reject it, while this one takes only seven steps. The actual Java code is

str = str.replaceAll("(?i)word(?![^<>]++>)", "repl");
Alan Moore
Thanks for the explanation, Alan!
celsowm
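
For terms that contain spaces or regex metacharacters (such as "Celso Araujo Fontes"), the replaceAll approach from the answers above can be wrapped in a small helper. This is only a sketch: the class name OutsideTagsReplacer and the method replaceOutsideTags are hypothetical, not from the thread. It also assumes that a term directly followed by > (as in </word>) should count as inside a tag, so it uses *+ in the lookahead where the last answer uses ++; with ++ the lookahead needs at least one character before the >, so that occurrence would be replaced.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OutsideTagsReplacer {

    // Hypothetical helper (not from the thread): replaces every occurrence of
    // `term` that is not inside a tag, ignoring case.
    static String replaceOutsideTags(String text, String term, String replacement) {
        // Pattern.quote treats the term as a literal, so spaces and regex
        // metacharacters in it are matched as-is.
        // Assumption: "*+" (zero or more, possessive) is used instead of "++"
        // so that a term immediately followed by '>' is also left alone.
        Pattern p = Pattern.compile(
                Pattern.quote(term) + "(?![^<>]*+>)",
                Pattern.CASE_INSENSITIVE);
        // quoteReplacement keeps '$' and '\' in the replacement literal.
        return p.matcher(text).replaceAll(Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        String str = "word <a href=\"word\">word</word>word word";
        System.out.println(replaceOutsideTags(str, "word", "repl"));

        String doc = "myTerm is <friends='z myTerm w'> cool friend </friend>";
        System.out.println(replaceOutsideTags(doc, "myterm", "<b>myTerm</b>"));
    }
}
```

Like the regexes above, this only looks ahead for a closing > and does not parse the markup, so it assumes the text contains no stray < or > characters outside of tags.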