views:

267

answers:

4

Would it be possible to do a little more 'advanced' string replacing in a MySQL query? I read about the REPLACE() method, but as far as I know, it cannot be used for fancy stuff.

Basically, what I'm trying to do is remove all HTML tags from a specific column in a large dataset. Being able to do this in a single query would be a lot cooler than having to use PHP to do the processing in between. Can I do something elegant in order to remove everything encapsulated in < >?

I just realized there's another small catch to it... due to some improper data conversions before, some of the tags now contain < and > themselves... Worst cases look like <(><<)>b>. Don't even ask how, but something seems to have gone terribly wrong. Anyhow, great minds with great ideas on this?

+2  A: 

Short answer: no, this cannot be done in SQL. This is the sort of work you'd need to do in a programming language — any programming language you have handy, really.

Given that this isn't proper HTML, parsing it with traditional methods may even be out of the question. Perhaps some horribly hackish regular expressions would help, or perhaps you'll get some mileage out of finding particular "buggy" strings (e.g., if (><<)> appears commonly) and deleting them witha traditional REPLACE().

The details, of course, would depend on your data and what tools you have available, but MySQL won't be able to do the bulk of that work.

VoteyDisciple
A: 

Sorry, you're out of luck. MySQL has no regex or comparable replacement faculty.

chaos
+1  A: 

You can hack up a function, but I don't think that's the way to go.

CREATE FUNCTION strip_tags( String VARCHAR(9999) )
RETURNS VARCHAR(9999)
DETERMINISTIC 
BEGIN
  DECLARE start, end, len INT;
    SET start = LOCATE('<', String);
    SET end = LOCATE('>', String, start);
    WHILE start> 0 AND end > 0 DO
      BEGIN
        SET String = INSERT(String, start, end, '');
        SET start = LOCATE('<', String);
        SET end = LOCATE('>', String, start);
      END;
    END WHILE;
  RETURN String;
END;

Disclaimer: the code is untested.

Zed
Interesting... but I guess it won't help me with the messed-up tags. Creating a function that _will_ do this probably isn't a considerable option. Thanks anyway!
JorenB
You can implement anything you want to. It's just a matter of taste. For such a complex logic SQL would not be my choice :)
Zed
A: 

mysql has regex and pattern matching functions

ssss