My current project requires locating an array of strings within an element's text content, then wrapping those matching strings in <a> elements using JavaScript (requirements simplified here for clarity). I need to avoid jQuery if at all possible, or at least avoid including the full library.
For example, given this block of HTML:
<div>
<p>This is a paragraph of text used as an example in this Stack Overflow
question.</p>
</div>
and this array of strings to match:
['paragraph', 'example']
I would need to arrive at this:
<div>
<p>This is a <a href="http://www.example.com/">paragraph</a> of text used
as an <a href="http://www.example.com/">example</a> in this Stack
Overflow question.</p>
</div>
I've arrived at a solution to this by using the innerHTML property and some string manipulation: basically, using the offsets (via indexOf()) and lengths of the strings in the array to break the HTML string apart at the appropriate character offsets and insert <a href="http://www.example.com/"> and </a> tags where needed.
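To make the string-based approach concrete, here is a minimal sketch of that kind of offset-based wrapping. It is not the actual project code: the function name wrapTermsInHtml and its parameters are made up for illustration, and matching here is case-sensitive. It scans the original string once, so text inside the inserted href attributes is never re-matched.

```javascript
// Hypothetical helper (name and signature are illustrative only).
// Wraps every occurrence of any term in an <a> tag using plain string
// offsets. It has no knowledge of HTML structure, so it will happily
// match text inside tags, attributes, headings, or existing links.
function wrapTermsInHtml(html, terms, href) {
  let output = '';
  let cursor = 0;

  while (cursor < html.length) {
    // Find the earliest occurrence of any term at or after the cursor.
    let bestIndex = -1;
    let bestTerm = '';
    for (const term of terms) {
      if (term.length === 0) continue; // an empty term would never advance
      const index = html.indexOf(term, cursor);
      if (index !== -1 && (bestIndex === -1 || index < bestIndex)) {
        bestIndex = index;
        bestTerm = term;
      }
    }
    if (bestIndex === -1) break; // no more matches

    // Copy the text before the match, then the wrapped match itself.
    output += html.slice(cursor, bestIndex) +
      '<a href="' + href + '">' + bestTerm + '</a>';
    cursor = bestIndex + bestTerm.length;
  }

  // Append whatever follows the last match.
  return output + html.slice(cursor);
}

// In a browser this would be applied along the lines of:
//   element.innerHTML = wrapTermsInHtml(element.innerHTML, terms, href);
```

Because it works purely on character offsets, this sketch cannot tell where in the document a match sits.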
However, an additional requirement has me stumped. I'm not allowed to wrap any matched strings in <a> elements if they're already in one, or if they're a descendant of a heading element (<h1> to <h6>).
So, given the same array of strings above and this block of HTML (the term matching has to be case-insensitive, by the way):
<div>
<h1>Example</h1>
<p>This is a <a href="http://www.example.com/">paragraph of text</a> used
as an example in this Stack Overflow question.</p>
</div>
I would need to disregard both the occurrence of "Example" in the <h1> element, and the "paragraph" in <a href="http://www.example.com/">paragraph of text</a>.
This suggests to me that I have to determine which node each matched string is in, and then traverse its ancestors until I hit <body>, checking to see if I encounter an <a> or a heading (<h1> to <h6>) node along the way.
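The ancestor check on its own could be sketched roughly as follows. This is only an illustration under some assumptions: hasBlockedAncestor and BLOCKED_NAMES are hypothetical names, the node passed in is the text node containing the match, and the walk relies only on the standard parentNode and nodeName properties.

```javascript
// Element names that disqualify a match. nodeName is upper case for
// HTML elements in HTML documents; it is normalised below in case it is not.
const BLOCKED_NAMES = new Set(['A', 'H1', 'H2', 'H3', 'H4', 'H5', 'H6']);

// Hypothetical helper: returns true if any ancestor of `node`, up to but
// not including `root` (e.g. document.body), is a link or a heading.
function hasBlockedAncestor(node, root) {
  for (
    let current = node.parentNode;
    current && current !== root;
    current = current.parentNode
  ) {
    if (BLOCKED_NAMES.has(String(current.nodeName).toUpperCase())) {
      return true;
    }
  }
  return false;
}
```

A match would then be wrapped only when hasBlockedAncestor(textNode, document.body) returns false. Finding the text node that contains each match is a separate step that this sketch does not cover.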
Firstly, does this sound reasonable? Is there a simpler or more obvious approach that I've failed to consider? It doesn't seem like regular expressions or another string-based comparison to find bounding tags would be robust - I'm thinking of issues like self-closing elements, irregularly nested tags, etc. There's also this...
Secondly, is this possible, and if so, how would I approach it?