I'm trying to split an HTML string by a token in order to create a blog preview without displaying the full post. It's a little harder than I first thought. Here are the problems:
- A user will be creating the HTML through a WYSIWYG editor (CKEditor). The markup isn't guaranteed to be pretty or consistent.
- The token,
read_more()
, can be placed anywhere in the string, including being nested within a paragraph tag. - The resulting first split string needs to be valid HTML for all reasonable uses of the token.
Examples of possible uses:
<p>Some text here. read_more()</p>
<p>Some text read more() here.</p>
<p>read_more()</p>
<p> read_more()</p>
read_more()
So far, I've tried just splitting the string on the token, but it leaves invalid HTML. Regex is perhaps another option. What strategy would you use to solve this and make it as bulletproof as possible? Any code snippets or hints would also be appreciated (I'm using PHP).