ansaurus

Question

Answer 1

+3 A:

This subject comes up pretty much every day here and basically the issue is this: you shouldn't be using regular expressions to parse or alter HTML (or XML). That's what HTML/XML parsers are for. The above problem is just one of the issues you'll face. You may get something that mostly works but there'll still be corner cases where it doesn't.

Just use an HTML parser.
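For what it's worth, here is a minimal sketch of the parser route using PHP's bundled DOM extension. It assumes the goal is to swap one word for another in the visible text only, leaving attributes such as title and href alone. The function name replace_in_text_nodes and the wrapper id fragment-root are made up for illustration.

```php
<?php
// Hypothetical helper: replace $search with $replace in text nodes only,
// so attribute values (title, href, ...) are never touched.
function replace_in_text_nodes(string $html, string $search, string $replace): string
{
    $doc = new DOMDocument();

    // Wrap the fragment in a known element so it can be pulled back out later.
    // libxml warnings about sloppy markup are collected instead of printed.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML('<div id="fragment-root">' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $root  = $xpath->query('//div[@id="fragment-root"]')->item(0);

    // Visit every text node under the wrapper; attributes are not text nodes.
    foreach ($xpath->query('.//text()', $root) as $node) {
        $node->nodeValue = str_ireplace($search, $replace, $node->nodeValue);
    }

    // Serialize the wrapper's children to get the fragment back.
    $out = '';
    foreach ($root->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

$html   = 'Awesome is the man who <b>awesome</b> does and '
        . '<a href="awesome.php" title="awesome">awesome</a> is.';
$result = replace_in_text_nodes($html, 'awesome', 'wonderful');
echo $result, "\n";
```

The text between the tags changes while the href and title values stay as they were. It is more typing than a regex, but the parser decides what is an attribute and what is text.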

cletus 2009-04-30 06:41:22

Ah I was hoping it wouldn't come to using a parser..

alex 2009-04-30 06:41:52

Answer 2

+1 A:

Sure, using a parsing library is the industrial-strength solution, but we all have times when we just want to write something in 10 seconds and be done. Next time you want to process the meaty text of a page, ignoring tags, try running your input through strip_tags first. This way you will get only the plain, visible text and your regex powers will again reign supreme.
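A quick sketch of that idea, assuming the same kind of "awesome" to "wonderful" replacement and a made-up sample string:

```php
<?php
$html = 'Awesome is the man who <b>awesome</b> does and '
      . '<a href="awesome.php" title="awesome">awesome</a> is.';

// strip_tags() removes the markup, attributes included, leaving visible text.
$text = strip_tags($html);

// With the tags gone, a plain case-insensitive word match is safe.
$result = preg_replace('/\bawesome\b/i', 'wonderful', $text);

echo $result, "\n";
```

Bear in mind the tags are gone for good: the result is plain text, so this only helps when you don't need the markup back afterwards.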

rndmcnlly 2009-04-30 06:48:29

Answer 3

+2 A:

Assuming this is related to the question you posted and deleted a little while ago (that was you, wasn't it?), it's your fundamental approach that's wrong. You said you were generating these HTML links yourself by replacing words from a list of keywords. The trouble is that keywords farther down the list sometimes appear in the generated title attributes and get replaced by mistake--and now you're trying to fix the mistakes.

The underlying problem is that you're replacing each keyword using a separate call to preg_replace, effectively processing the entire text over and over again. What you should do is process the text once, matching every single word and looking it up in your list of keywords; if it's on the list, replace it. I'm not set up to write/test PHP code, but you probably want to use preg_replace_callback:

$text = preg_replace_callback('/\b[A-Za-z]+\b/', "the_callback", $text);

"the_callback" is the name of a function that looks up the word and, if it's in the list, generates the appropriate link; otherwise it returns the matched word. It may sound inefficient, processing every word like this, but in fact it's a great deal more efficient than your original approach.
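Since the answer above comes without tested code, here is a rough sketch of how the single-pass version might look. The keyword list, the define.php URL, and the function name link_keywords are all invented for illustration, and the input is assumed to be plain text with no markup in it yet.

```php
<?php
// Hypothetical keyword list: lowercase keyword => definition for the title.
// Note that the definition of "parser" itself contains the keyword "regex".
$keywords = [
    'regex'  => 'A pattern language',
    'parser' => 'Reads structured text without regex',
];

// Walk the text once; each matched word is looked up in the list exactly once,
// so text generated inside title attributes is never scanned again.
function link_keywords(string $text, array $keywords): string
{
    return preg_replace_callback(
        '/\b[A-Za-z]+\b/',
        function (array $m) use ($keywords): string {
            $word = $m[0];
            $key  = strtolower($word);
            if (!isset($keywords[$key])) {
                return $word; // not a keyword: give the word back unchanged
            }
            return sprintf(
                '<a href="define.php?term=%s" title="%s">%s</a>',
                urlencode($key),
                htmlspecialchars($keywords[$key], ENT_QUOTES),
                $word
            );
        },
        $text
    );
}

$result = link_keywords('A parser beats a regex here.', $keywords);
echo $result, "\n";
```

The "regex" inside the generated title for "parser" is left alone, because the callback's output is never fed back through the pattern. That is the difference from calling preg_replace once per keyword.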

Alan Moore 2009-04-30 07:32:38

Alan, yes that was me. I figured I'd probably worded it wrong, so I reposted in a simplified manner. I hope that's not against etiquette, and I apologise if it is. No one gave me a working solution anyway, nor was anyone voted up. I have used preg_replace_callback() before, but I had forgotten how handy it is. Thanks for your answer +1

alex 2009-04-30 07:36:27

Some extra info... I'm processing a lot of strings, and trying to find if values in an array (more specifically, keys) exist in those strings. Then I'd like to provide a link to their definition, and a title tag to boot. I'll give it some more thought tomorrow.

alex 2009-04-30 07:43:00

Okay, I've got this working now. Thanks Alan M, I'll post my solution.

alex 2009-04-30 23:32:29

Answer 4

+1 A:

This is so horrible I hesitate to post it, but if you want a quick hack, reverse the problem--instead of finding the stuff that isn't X, find the stuff that IS, change it, do the thing and change it back.

This is assuming you're trying to change awesome (to "wonderful"). If you're doing something else, adjust accordingly.

$string = 'Awesome is the man who <b>awesome</b> does and <a href="awesome.php" title="awesome">awesome</a> is.';

$string = preg_replace('#(title\s*=\s*\"[^"]*?)awesome#is', "$1PIGDOG", $string);

$string = preg_replace('#awesome#is', 'wonderful', $string);

$string = preg_replace('#pigdog#is', 'awesome', $string);

Don't vote me down. I know it's a hack.

LibraryThingTim 2009-04-30 09:12:15

+1 -- it works...

nickf 2009-04-30 14:29:56


preg_replace() help in PHP
