views:

18

answers:

1

I want to remove some rogue HTML from a DB field that is supposed to contain a simple filename. Example of ok field:

myfile.pdf

Example of not ok field:

myfile2.pdf<input type="hidden" id="gwProxy" />...

Does anyone know a query I can run that can remove the HTML part but leave the filename? i.e. remove everything from the first < character onwards.

Lets assume the field is called myattachment and is defined as a varchar(250) and the table is called mytable in a MySQL database.


Background info (not necessary to read):

The field in our database is supposed to contain filenames however, due to a issue (documented here) some of the fields now contain a filename and some rogue HTML. We have fixed the root issue and now need to fix the corrupt fields. In the past I have replaced text using this kind of query:

UPDATE mytable SET myattachment = replace(myattachment, 'JPG', 'jpg') WHERE myattachment LIKE '%JPG';
+1  A: 

This query seems to work ok, can anyone see any issues with it?

UPDATE mytable
   SET myattachment = SUBSTRING_INDEX(myattachment, '<', 1) 
 WHERE `myattachment` LIKE '%<%';

For docs on SUBSTRING_INDEX see the mysql manual page.

Tom