Imagine the following pipeline:
- HTML is parsed into a DOM tree.
- DOM nodes become available programmatically.
- DOM nodes may or may not be augmented programmatically.
- Augmented nodes are reserialised to HTML.
My question is primarily about how one would want the "script" tag to behave in such a pipeline.
my $tree = someparser( $source );
....
print $somenode->text();
$somenode->text('arbitraryjavascript');
....
print $tree->serialize();
Or something to that effect.
The difficulty is deciding how to treat the contents of the script element so that the API stays easy to use and the emitted HTML stays portable and usable.
What I want to do myself is this:
$somenode->text("verbatim");
-->
<script>
// <!-- <![CDATA[
verbatim
// ]]> -->
</script>
That way, what I produce is both reasonably safe and validation-friendly.
But I can't decide whether doing this magically is a good idea, or whether I should have code that tries to detect existing 'safety blocks' and strip or replace them during the parse phase.
If I don't strip them from the input, I'm likely to double them up in the output phase, which is especially problematic if the output of this code is later re-parsed.
If I do strip them from the input, there is a beneficial side effect: code that programmatically fetches the content of the script element won't see the safety blocks at either end.
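To make the trade-off concrete, here is a minimal sketch of the "strip on parse, wrap on serialise" option. It is written in Python rather than Perl purely for illustration, and the names strip_guards, wrap_guards, OPEN_GUARD and CLOSE_GUARD are hypothetical, not part of any existing parser. It assumes the guards sit on their own lines at the very start and end of the script content.

```python
import re

# Hypothetical names throughout: none of this belongs to a real parser's API.
OPEN_GUARD = "// <!-- <![CDATA["
CLOSE_GUARD = "// ]]> -->"

# Tolerate whitespace variations in guards that already exist in the input.
_OPEN_RE = re.compile(r"\A\s*//\s*<!--\s*<!\[CDATA\[[ \t]*\r?\n?")
_CLOSE_RE = re.compile(r"\r?\n?[ \t]*//\s*\]\]>\s*-->\s*\Z")


def strip_guards(text):
    """Parse phase: remove one leading and one trailing guard, if both exist."""
    opening = _OPEN_RE.search(text)
    closing = _CLOSE_RE.search(text)
    if opening and closing and opening.end() <= closing.start():
        return text[opening.end():closing.start()]
    return text  # no complete guard pair: leave the content untouched


def wrap_guards(text):
    """Serialise phase: put the guards around the bare script content."""
    return f"{OPEN_GUARD}\n{text}\n{CLOSE_GUARD}"


if __name__ == "__main__":
    emitted = wrap_guards("verbatim")
    print(emitted)
    # Re-parsing our own output and serialising again does not double up.
    print(wrap_guards(strip_guards(emitted)) == emitted)
```

With this arrangement the text accessor only ever sees the bare script, and re-parsing previously emitted output does not accumulate guards. The cost is that whatever guard formatting the original author used is normalised away.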
Ultimately there will be a way of toggling some of this behaviour off, but the question is what the /default/ way of handling this should be, and why.
It's possible my entire reasoning is flawed here, and the text contents should go completely unprocessed unless processing is explicitly requested.
What behaviour do you look for in such a tool? Please point out anything in my reasoning that I may have overlooked.
TL;DR summary:
How should I programmatically handle the escaping mechanism in these scripts, namely the '// <!-- <![CDATA[' safety padding at either end, with respect to input/output?