Hi,

Ultimately, my application does pattern matching based on a text selection within a web browser, generating regex code so that a portion of a page can be revisited and read programmatically.

I'm currently pulling a portion of the text by walking up the DOM and then returning innerHTML.

The issue I have is that the value of the innerHTML property is not consistent, and does not faithfully represent the literal source text.

Tags are capitalised, quotes are removed, and so on, and it varies between browsers.

Is there a way to deal with this client side? I can already read in the literal page text using HttpRequest, but matching against the whole page may be less accurate.

Is this a common issue in JavaScript and is there a way around it?

A: 

Don't use innerHTML; it is a source of bugs and terrible code! Just Google it and you will see that using it is considered very bad practice.

fuzzy lollipop
+2  A: 

innerHTML is indeed bad practice. Apparently it was introduced by Microsoft in IE and then became popular. The thing is, HTML in the browser is a DOM; it is not a string. Because innerHTML is not a standard, there is no standard way of converting the DOM into a string, and therefore you will get inconsistent results.

The HTML DOM is very extensive: you can do everything you might want to do with innerHTML using the standardized DOM instead. If you really need to get the text value of a node, then use the nodeValue property of that node (on an element nodeValue is null, so read it from the element's text-node children).
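
As a rough illustration of reading text through nodeValue rather than innerHTML, here is a minimal sketch. It assumes only the standard `nodeType`, `nodeValue` and `childNodes` properties; the helper name `collectText` is made up for this example and is not a DOM API.

```javascript
// Node type constant from the DOM specification: 3 is a text node.
const TEXT_NODE = 3;

// Recursively concatenate the nodeValue of every text node under `node`.
// `collectText` is a hypothetical helper name, not part of the DOM.
function collectText(node) {
  if (node.nodeType === TEXT_NODE) {
    return node.nodeValue;
  }
  let text = "";
  // Elements expose their children through childNodes; anything without
  // children (comments, for example) simply contributes nothing.
  const children = node.childNodes || [];
  for (let i = 0; i < children.length; i++) {
    text += collectText(children[i]);
  }
  return text;
}
```

In a browser this could be called as `collectText(document.getElementById("content"))`, where the id `content` is only a placeholder. Note that this returns the text only; the markup itself is not reproduced.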

Most of what I am saying comes from an article about innerHTML alternatives that is a little outdated but still accurate.

Ani B
Thanks for the link. How can I achieve what I need to do here, then? After all, what I really want is the raw HTML text inside an element, not a proprietary version translated through the DOM. Is there a way to achieve this?
BombDefused
Just to clarify - that's raw HTML text, angle brackets and all
BombDefused
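
One possible direction, sketched here under assumptions rather than taken from the thread: since the literal page source is already available from the HTTP request mentioned in the question, a regex could be generated from the selected text and run against that source instead of against innerHTML. The function names `escapeRegExp` and `selectionToRegExp` are hypothetical.

```javascript
// Escape characters that have special meaning in a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Build a pattern from the selected text. Each run of whitespace in the
// selection is allowed to match any run of whitespace in the source,
// because line breaks and indentation in the raw HTML rarely match what
// the browser shows.
function selectionToRegExp(selection) {
  const parts = selection.trim().split(/\s+/).map(escapeRegExp);
  return new RegExp(parts.join("\\s+"));
}
```

The resulting pattern can then be tested against the raw source string, for example `selectionToRegExp(selectedText).exec(rawSource)`, where both variable names are placeholders. This sketch does not tolerate markup inside the selection: if the selected words are separated by tags in the source, the pattern will not match.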