I'm looking to write an algorithm to compress the HTML output of a CMS I'm building in PHP with the CodeIgniter framework.
I was thinking of removing whitespace between angle brackets everywhere except inside <script>, <pre>, and <style> elements, which I would simply skip to keep things simple. To clarify: I mean only whitespace between consecutive tags, with no text between them.
How should I go about parsing the HTML to find the whitespace I want to remove?
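A minimal sketch of the idea above. It assumes a single regular expression is acceptable instead of a full HTML parser, and the function name stripInterTagWhitespace is hypothetical. One pattern does two jobs. It matches whole <pre>, <script>, and <style> elements, which the callback returns unchanged. It also matches whitespace sitting directly between a > and a <, which the callback deletes. It requires PHP 7 or later for the return types and the ?? operator.

```php
<?php

/**
 * Hypothetical helper: remove whitespace-only runs that sit directly
 * between a '>' and a '<', leaving <pre>, <script> and <style> untouched.
 */
function stripInterTagWhitespace(string $html): string
{
    // Alternative 1 matches a whole protected element (case-insensitive,
    // spanning newlines thanks to the "s" modifier).
    // Alternative 2 matches whitespace preceded by '>' and followed by '<',
    // using lookarounds so neither bracket is consumed.
    $pattern = '#<(pre|script|style)\b.*?</\1\s*>|(?<=>)\s+(?=<)#is';

    $result = preg_replace_callback(
        $pattern,
        function (array $m): string {
            // Protected elements start with '<' and are kept verbatim;
            // whitespace matches are replaced with nothing.
            return $m[0][0] === '<' ? $m[0] : '';
        },
        $html
    );

    // preg_replace_callback() returns null on a PCRE error (for example,
    // hitting the backtrack limit); fall back to the original markup.
    return $result ?? $html;
}

$html = "<ul>\n  <li>One</li>\n  <li>Two</li>\n</ul>\n<pre>\n  keep  this\n</pre>";
echo stripInterTagWhitespace($html), PHP_EOL;
```

Because each protected element is matched as a single unit, a < or > inside inline JavaScript or CSS is never touched. This is still not a real parser, though. A > inside an attribute value or an HTML comment can fool it. Dropping the space between inline tags (for example </b> <i>) also removes a visible space from the rendered page. <textarea> preserves whitespace too and could be added to the alternation if the CMS outputs forms.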
Edit:
As a first step, I want to remove all tab characters that are not inside <pre> tags. I'm sure this can be done with a regex, but what are the alternatives?
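For the tab-only first step, a keep-or-delete regex sketch works with a small pattern. Whole <pre> elements are matched and kept, and any run of tab characters elsewhere is deleted. The name stripTabsOutsidePre is hypothetical, and this is a regex approach, not a parser.

```php
<?php

/**
 * Hypothetical helper: delete every tab character except those
 * inside <pre> elements.
 */
function stripTabsOutsidePre(string $html): string
{
    // Alternative 1: a whole <pre> element, kept verbatim.
    // Alternative 2: one or more tab characters anywhere else.
    $result = preg_replace_callback(
        '#<pre\b.*?</pre\s*>|\t+#is',
        function (array $m): string {
            // A match starting with '<' is a <pre> element; anything else is tabs.
            return $m[0][0] === '<' ? $m[0] : '';
        },
        $html
    );

    // Fall back to the unmodified markup if PCRE reports an error.
    return $result ?? $html;
}

echo stripTabsOutsidePre("<p>\tIndented</p>\n<pre>\tcode</pre>"), PHP_EOL;
```

Deleting a tab outright will join two words that were separated only by that tab. Returning ' ' instead of '' for the tab branch avoids this. If regular expressions prove too fragile, PHP's DOMDocument (loadHTML(), saveHTML()) together with DOMXPath is the usual parser-based alternative. The trade-off is that libxml re-serializes the markup rather than preserving it byte for byte.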