I have a string that contains HTML. I need to escape only the text, not the tags. For example, the string contains:

<ul class="main_nav">
<li>
<a class="className1" id="idValue1" tabindex="2">Test & Sample</a>
</li>
<li>
<a class="className2" id="idValue2" tabindex="2">Test & Sample2</a>
</li>
</ul>

How can I escape just the text, to get:

<ul class="main_nav">
<li>
<a class="className1" id="idValue1" tabindex="2">Test &amp; Sample</a>
</li>
<li>
<a class="className2" id="idValue2" tabindex="2">Test &amp; Sample2</a>
</li>
</ul>

without modifying the tags?

Can this be handled with the HTML DOM and JavaScript?

Thanks

+5  A: 
T.J. Crowder
@T.J. Crowder: Thanks for the explanation. Let me give the details of my exact problem. I want to save the HTML source file to disk. I tried to get the content using `document.childNodes[i].outerHTML;` and also `document.getElementsByTagName('html')[0].innerHTML;`. Some of the entities are decoded by the DOM, but I want to retain the entities as they were in the original. I do not know of an alternative approach that gets the HTML content (in its current state) and retains the entities. I am using Qt WebKit and JavaScript. Any help with this would be appreciated. Thanks
kokila
T.J. Crowder
kokila
@kokila: Ah, now, that completely changes the nature of the question. Entirely possible to deal with those, I'll update the answer to show you how (it's tricky).
T.J. Crowder
@T.J. Crowder: Hopefully I will get a solution. I guess `innerHTML` and `outerHTML` do not belong to the DOM standards. I even tried using `d.firstChild.nodeValue`, which is supposed to be part of the DOM standard, but it decodes the entities too.
kokila
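The DOM-walking idea discussed in this thread can be sketched roughly as follows. This is only an illustration, not code from any answer here: `serialize` and `escapeText` are hypothetical names, and the sketch assumes a reasonably modern JavaScript engine and ignores void elements, comments, doctypes, and raw-text elements such as `<script>`. It reads `nodeValue` (which is already decoded) and re-escapes it while writing the markup back out.

```javascript
// Hypothetical sketch: serialize a DOM-like tree, re-escaping text nodes.
// Only element and text nodes are handled; everything else is skipped.
const ELEMENT_NODE = 1;
const TEXT_NODE = 3;

// Escape the characters that are significant in HTML text.
function escapeText(s) {
  return s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

function serialize(node) {
  if (node.nodeType === TEXT_NODE) {
    return escapeText(node.nodeValue);
  }
  if (node.nodeType !== ELEMENT_NODE) {
    return ""; // comments, doctypes, etc. are not handled in this sketch
  }
  const tag = node.nodeName.toLowerCase();
  let out = "<" + tag;
  for (const attr of Array.from(node.attributes)) {
    out += " " + attr.name + '="' + escapeText(attr.value).replace(/"/g, "&quot;") + '"';
  }
  out += ">";
  for (const child of Array.from(node.childNodes)) {
    out += serialize(child);
  }
  return out + "</" + tag + ">";
}
```

In a browser this would be called as `serialize(document.documentElement)`. Because it only relies on `nodeType`, `nodeName`, `nodeValue`, `attributes`, and `childNodes`, it also accepts plain objects with the same shape.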
@kokila: Okay, I've updated the answer. The problem isn't with `innerHTML`, it's doing its job. You're trying to get entities for perfectly valid literal characters. See my update for more. (`innerHTML` and `outerHTML` have been added to the most recent HTML specification, btw: http://www.w3.org/TR/html5/dom.html#htmlelement)
T.J. Crowder
@kokila: ...and I've updated it again to handle surrogate pairs (I hope). It was easier than I thought, thanks to the intelligence of the people defining UTF-16.
T.J. Crowder
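For readers wondering what handling surrogate pairs involves when turning literal characters back into entities, here is a minimal sketch. It is not T.J. Crowder's code: `toEntities` is a hypothetical name, the cutoff of 126 is an arbitrary choice for illustration, and a modern engine with `for...of` and `codePointAt` is assumed. Iterating a string with `for...of` yields whole code points, so a UTF-16 surrogate pair arrives as one unit and becomes a single numeric entity.

```javascript
// Hypothetical sketch: replace '&', '<', '>' and every character above
// code point 126 with an entity. Surrogate pairs become one entity.
function toEntities(text) {
  let out = "";
  for (const ch of text) {            // iterates by code point, not by UTF-16 unit
    const cp = ch.codePointAt(0);
    if (ch === "&") {
      out += "&amp;";
    } else if (ch === "<") {
      out += "&lt;";
    } else if (ch === ">") {
      out += "&gt;";
    } else if (cp > 126) {
      out += "&#" + cp + ";";         // decimal numeric character reference
    } else {
      out += ch;
    }
  }
  return out;
}
```

With this sketch, `toEntities("Test & Sample")` gives `Test &amp; Sample`, and the pair `"\uD83D\uDE00"` (U+1F600) gives the single entity `&#128512;`. An unpaired surrogate is not treated specially and would produce an invalid reference.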
@T.J. Crowder: Thank you very much for sharing your insights and giving the exact solution. The information and links you provided are very useful. I had spent a lot of time searching for a solution but had found nothing this valuable about HTML and the DOM anywhere. Thanks again for your time.
kokila
A: 

What server-side language are you using?

If you're using PHP, you could use `htmlentities`.

Example:

<?php

$myHTML = "<h1>Some HTML Tags</h1><br />";
echo htmlentities($myHTML);

?>
MrSoundless
No, that will escape the tags as well. The OP said he/she wants to escape only the bits *between* the tags (a very tricky and error-prone operation).
T.J. Crowder
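To show why escaping only the bits between tags is tricky when working on a raw string, here is a deliberately naive sketch. `escapeTextBetweenTags` is a hypothetical name, and the approach assumes that every `<...>` run is a tag: it breaks on a `>` inside an attribute value, on comments, on `<script>` content, and on a bare `<` in text. It works for simple markup like the example in the question, but it is not a general solution.

```javascript
// Hypothetical, naive sketch: escape bare '&' in text between tags.
// Assumes every "<...>" run is a tag, which is not true for all HTML.
function escapeTextBetweenTags(html) {
  return html
    .split(/(<[^>]*>)/)               // capturing group keeps tags at odd indices
    .map((part, i) =>
      i % 2 === 1
        ? part                        // a tag: leave untouched
        : part.replace(/&(?!#?\w+;)/g, "&amp;")) // skip '&' that already starts an entity
    .join("");
}
```

Applied to `<a class="className1">Test & Sample</a>`, this returns `<a class="className1">Test &amp; Sample</a>`, and text that already contains `&amp;` is left as it is.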
A: 

Have you tried the escape() function in JavaScript? See "JavaScript escape() Function".

Anuraj
This function was made for escaping URL parameters, not HTML. And even in URL parameters it is the wrong thing to use.
Tomalak
@Tomalak: `escape` wasn't for escaping URL parameters specifically (it doesn't do URL-encoding); it was just for escaping data to be sent as part of a GET or POST into a specific, well-controlled format (different from URL-encoding). And yes, it has absolutely nothing to do with escaping HTML.
T.J. Crowder
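The difference is easy to see by running both functions on the same input. The variable names below are just for illustration. The legacy `escape` function encodes non-ASCII characters by code unit (`%XX` or `%uXXXX`), `encodeURIComponent` percent-encodes the UTF-8 bytes, and neither one produces HTML entities.

```javascript
// Compare the legacy escape() with encodeURIComponent() on the same text.
const sample = "Test & Sample \u00E9";            // ends with "é" (U+00E9)

const legacy = escape(sample);                    // "Test%20%26%20Sample%20%E9"
const urlEncoded = encodeURIComponent(sample);    // "Test%20%26%20Sample%20%C3%A9"

// Neither result contains the HTML entity the question asks for.
const producesHtmlEntity =
  legacy.includes("&amp;") || urlEncoded.includes("&amp;");
```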