I recently tried to import a bunch of blog posts from an old blog (SharePoint) to my current blog (WordPress). When the import completed, a lot of nasty <div> tags and other HTML made it into the content of the posts, which screwed up the way my site was rendering.
I'm able to view the offending rows in the MySQL database and want to know if there's a way to selectively remove the HTML text that may be causing problems. I could probably hack this in C# by parsing through the text, but I'd like to figure out how I can do this using SQL if I can.
If you want to see a full text sample of what one of these files looks like as it exists in the database text field, I uploaded a full sample file to my web site.
Here's what I want to do:
- Remove <![CDATA[<div><b>Body:</b> from the beginning of every file
- Remove the meta information at the end of every file, which might look like this: <div><b>Category:</b> SharePoint</div> <div><b>Published:</b> 11/12/2007 11:26 AM</div> ]]>
- Remove every <div> and closing </div> tag, which might have a class attribute like: <div class=ExternalClass6BE1B643F13346DF8EFC6E53ECF9043A>

Note: The hex string at the end of the ExternalClass can be different.
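To pin down what these three rules mean, here is a minimal sketch in Python rather than the SQL I'm ultimately after. The function name clean_post and the three regular expressions are assumptions based only on the samples above; they have not been run against the full export, and the trailing-metadata pattern assumes the Category and Published blocks are the last thing before the closing ]]>.

```python
import re

# Rule 1: the CDATA opener plus the "Body:" label at the very start.
LEADING = re.compile(r"^\s*<!\[CDATA\[\s*<div>\s*<b>Body:</b>", re.IGNORECASE)

# Rule 2: one or more Category/Published blocks followed by the CDATA closer
# at the very end. Assumes only these two labels appear in the metadata.
TRAILING_META = re.compile(
    r"(?:\s*<div>\s*<b>(?:Category|Published):</b>.*?</div>)+\s*\]\]>\s*$",
    re.IGNORECASE | re.DOTALL,
)

# Rule 3: any opening or closing div tag, with or without attributes,
# so the ExternalClass hex string does not need to be known in advance.
DIV_TAG = re.compile(r"</?div\b[^>]*>", re.IGNORECASE)


def clean_post(raw: str) -> str:
    """Apply the three removal rules in order and trim leftover whitespace."""
    text = LEADING.sub("", raw)
    text = TRAILING_META.sub("", text)
    text = DIV_TAG.sub("", text)
    return text.strip()
```

The order matters in this sketch: the metadata block is matched while its <div> tags are still present, and only afterwards are the remaining div tags stripped. Other tags, such as <p>, are left alone.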
I haven't used an Update statement in MySQL before and I'm at a loss for where to begin to selectively replace text within a text field. Would I use regex from within a SQL statement to help? How would I execute a statement against the remote DB?