I'm currently writing a function that parses some HTML and adds tags where necessary. Basically, I have a piece of HTML like this:
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Suspendisse feugiat, nunc at vestibulum egestas.
<script type="c">
#include <stdio.h>
#define debug(var) printf(#var " = %d\n", var)
int main(void)
{
int x = 12;
debug(x);
return 0;
}
</script>
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Suspendisse feugiat, nunc at vestibulum egestas.
<h3>Test Heading</h3>
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Cras ultricies luctus metus ut cursus.
<ol>
<li>One</li>
<li>Two</li>
<li>Three</li>
</ol>
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Cras ultricies luctus metus ut cursus.
Notice that there are no <p> tags around the paragraphs. I would like to parse this HTML and wrap each paragraph of loose text in the correct tags. Whatever parser is used must not touch any of the other valid HTML. For example, the headings and the list should not be altered.
I've hacked together a solution in PHP, and although it works, it's neither fast nor pretty to look at.
What is the best way to accomplish this?
Is there a nice PHP or JavaScript based parser I could use for this?
As far as I can tell, I need to break the HTML down into elements, add the tags, and write the reassembled HTML back to the page. Is that the right approach?
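To make the "break down, add tags, reassemble" idea concrete, here is a minimal JavaScript sketch of what I have in mind. It is not a real HTML parser. The names `BLOCK_TAGS`, `BLOCK_RE`, `wrapLooseText` and `addParagraphTags` are made up for illustration, the list of block-level tags is an assumption, and it assumes that every non-empty line of loose text is one paragraph. Block elements are copied through verbatim, so the `<script>` body (including `#include <stdio.h>`, which looks like a tag) is never touched.

```javascript
// Block-level elements that are copied through untouched.
// Assumed list for this sketch; extend it as needed.
const BLOCK_TAGS = "script|style|pre|h[1-6]|ol|ul|dl|table|blockquote|div|p";

// Matches one whole block element, from its opening tag to the first
// closing tag with the same name. Limitation: same-name nesting
// (e.g. a <ul> inside a <ul>) is not handled by this pattern.
const BLOCK_RE = new RegExp(
  `<(${BLOCK_TAGS})\\b[^>]*>[\\s\\S]*?<\\/\\1\\s*>`,
  "gi"
);

// Wraps every non-empty line of loose text in <p>...</p>.
function wrapLooseText(text) {
  return text
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line !== "")
    .map((line) => `<p>${line}</p>`);
}

// Walks the input, copying block elements verbatim and wrapping
// whatever text sits between them.
function addParagraphTags(html) {
  const out = [];
  let last = 0;
  for (const match of html.matchAll(BLOCK_RE)) {
    out.push(...wrapLooseText(html.slice(last, match.index)));
    out.push(match[0]); // the block element, unchanged
    last = match.index + match[0].length;
  }
  out.push(...wrapLooseText(html.slice(last)));
  return out.join("\n");
}
```

Calling `addParagraphTags(html)` on the sample above should wrap each Lorem ipsum line in `<p>` tags while leaving the `<script>`, `<h3>` and `<ol>` blocks exactly as they are. Inline markup such as `<em>` inside a text line stays inside its paragraph. A regex like this will break on nested or malformed markup, which is part of why I am asking about a proper parser.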