Hi

Consider this string

hello awesome <a href="" rel="external" title="so awesome is cool"> stuff stuff

What regex could I use to match any occurrence of awesome that doesn't appear within the title attribute of the anchor?

So far, this is what I've come up with (sadly, it doesn't work):

/[^."]*(awesome)[^."]*/i

Edit

I took Alan M's advice and used a regex to capture every word and send it to a callback. Thanks Alan M for your advice. Here is my final code.

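    // end() takes its argument by reference, so store the result in a variable first.
    $plants = $this->_model->getPlantById($plantId);
    $plantDetails = end($plants);

    $botany = new Botany_Model();
    $this->_botanyWords = $botany->getArray();
    foreach ($plantDetails as $key => $detail) {
        // Match every word once and let the callback decide whether to link it.
        $plantDetails[$key] = preg_replace_callback('/\b[a-z]+\b/i', array($this, '_processBotanyWords'), $detail);
    }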

And the _processBotanyWords()...

    private function _processBotanyWords($match) {
        $botanyWords = $this->_botanyWords;
        $word = $match[0];
        if (array_key_exists($word, $botanyWords)) {
            // Link the word to its definition and use the definition as the tooltip.
            $anchor = 'botany-word-' . str_replace(' ', '-', strtolower($word));
            return '<a href="' . PATH_BASE . 'articles/botany-words/#' . $anchor . '" title="' . trim($botanyWords[$word]) . '">' . $word . '</a>';
        } else {
            return $word;
        }
    }

Hope this will help someone else some day! Thanks again for all your answers.

+3  A: 

This subject comes up pretty much every day here and basically the issue is this: you shouldn't be using regular expressions to parse or alter HTML (or XML). That's what HTML/XML parsers are for. The above problem is just one of the issues you'll face. You may get something that mostly works but there'll still be corner cases where it doesn't.

Just use an HTML parser.
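
For what it's worth, here is a minimal sketch of the parser route using PHP's bundled DOM extension. The helper name replaceInTextNodes and the wrapper div are made up for illustration, and the sketch assumes plain ASCII input (other encodings need extra handling when loading). It only rewrites text nodes, so attribute values such as title are never touched.

```php
<?php
// Hypothetical helper: replace $search with $replace in text nodes only,
// leaving attribute values (e.g. title="...") untouched.
function replaceInTextNodes(string $html, string $search, string $replace): string
{
    // Tolerate sloppy markup (such as an unclosed <a>) without emitting warnings.
    libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    // Wrap the fragment so it can be found again after the parser adds <html>/<body>.
    $doc->loadHTML('<div id="wrapper">' . $html . '</div>');
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    foreach ($xpath->query('//div[@id="wrapper"]//text()') as $node) {
        // Case-insensitive replacement inside the visible text only.
        $node->nodeValue = str_ireplace($search, $replace, $node->nodeValue);
    }

    // Serialize only the children of the wrapper, not the whole document.
    $wrapper = $xpath->query('//div[@id="wrapper"]')->item(0);
    $out = '';
    foreach ($wrapper->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

$input = 'hello awesome <a href="" rel="external" title="so awesome is cool"> stuff stuff';
$result = replaceInTextNodes($input, 'awesome', 'wonderful');
echo $result, "\n";
```

Note that the parser repairs the markup as it goes (the unclosed anchor gets a closing tag), so the output is not guaranteed to be byte-for-byte identical to the input outside the replaced words.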

cletus
Ah I was hoping it wouldn't come to using a parser..
alex
+1  A: 

Sure, using a parsing library is the industrial-strength solution, but we all have times when we just want to write something in 10 seconds and be done. Next time you want to process the meaty text of a page, ignoring tags, try running your input through strip_tags first. That way you get only the plain, visible text, and your regex powers will again reign supreme.
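
A quick sketch of that idea, using the string from the question. The variable names are just for illustration. Because strip_tags throws the markup away, this suits finding or counting words in the visible text rather than rewriting the HTML in place.

```php
<?php
$html = 'hello awesome <a href="" rel="external" title="so awesome is cool"> stuff stuff';

// strip_tags() removes the whole <a ...> tag, title attribute included.
$text = strip_tags($html);

// Count case-insensitive whole-word matches in the visible text only.
$count = preg_match_all('/\bawesome\b/i', $text, $matches);

echo $text, "\n";
echo $count, "\n"; // prints 1: the occurrence inside title="..." is gone
```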

rndmcnlly
+2  A: 

Assuming this is related to the question you posted and deleted a little while ago (that was you, wasn't it?), it's your fundamental approach that's wrong. You said you were generating these HTML links yourself by replacing words from a list of keywords. The trouble is that keywords farther down the list sometimes appear in the generated title attributes and get replaced by mistake--and now you're trying to fix the mistakes.

The underlying problem is that you're replacing each keyword using a separate call to preg_replace, effectively processing the entire text over and over again. What you should do is process the text once, matching every single word and looking it up in your list of keywords; if it's on the list, replace it. I'm not set up to write/test PHP code, but you probably want to use preg_replace_callback:

$text = preg_replace_callback('/\b[A-Za-z]+\b/', "the_callback", $text);

"the_callback" is the name of a function that looks up the word and, if it's in the list, generates the appropriate link; otherwise it returns the matched word. It may sound inefficient, processing every word like this, but in fact it's a great deal more efficient than your original approach.
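
A minimal sketch of the single-pass idea, assuming the input is plain text with no existing markup. The keyword list, the linkKeywords name and the "#word" link targets are invented for the example. The point is that text returned by the callback is never scanned again, so "cool" inside the generated title stays as it is.

```php
<?php
// Hypothetical keyword list: word => definition used for the title attribute.
$keywords = [
    'awesome' => 'so awesome is cool',
    'cool'    => 'pleasantly cold',
];

function linkKeywords(string $text, array $keywords): string
{
    return preg_replace_callback(
        '/\b[A-Za-z]+\b/',
        function (array $match) use ($keywords): string {
            $word = $match[0];
            $key = strtolower($word); // look up case-insensitively
            if (!isset($keywords[$key])) {
                return $word; // not a keyword: leave the word unchanged
            }
            // Escape the definition so quotes in it cannot break the attribute.
            return '<a href="#' . $key . '" title="'
                . htmlspecialchars($keywords[$key]) . '">' . $word . '</a>';
        },
        $text
    );
}

$linked = linkKeywords('hello awesome cool stuff', $keywords);
echo $linked, "\n";
```

Only two links are produced here, one for each keyword in the original text, even though "cool" and "awesome" both appear again inside the first title.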

Alan Moore
Alan, yes that was me. I figured I'd probably worded it wrong, so I reposted in a simplified manner. I hope that's not against etiquette, and I apologise if it is. No one gave me a working solution anyway, nor was anyone voted up. I have used preg_replace_callback() before, but I had forgotten how handy it is. Thanks for your answer +1
alex
Some extra info... I'm processing a lot of strings, and trying to find if values in an array (more specifically, keys) exist in those strings. Then I'd like to provide a link to their definition, and a title tag to boot. I'll give it some more thought tomorrow.
alex
Okay, I've got this working now. Thanks Alan M, I'll post my solution.
alex
+1  A: 

This is so horrible I hesitate to post it, but if you want a quick hack, reverse the problem--instead of finding the stuff that isn't X, find the stuff that IS, change it, do the thing and change it back.

This is assuming you're trying to change awesome (to "wonderful"). If you're doing something else, adjust accordingly.

$string = 'Awesome is the man who <b>awesome</b> does and <a href="awesome.php" title="awesome">awesome</a> is.';

$string = preg_replace('#(title\s*=\s*\"[^"]*?)awesome#is', "$1PIGDOG", $string);

$string = preg_replace('#awesome#is', 'wonderful', $string);

$string = preg_replace('#pigdog#is', 'awesome', $string);

Don't vote me down. I know it's a hack.

LibraryThingTim
+1 -- it works...
nickf