I want to implement, in a Java desktop application, searching for and highlighting multiple phrases in HTML files, the way it is done in web browsers. HTML tags (everything between < and >) are not part of the searched text: inline tags like <b> are ignored, while some other tags break a match. For example, when searching for "each table", the text "...each <b>table</b> has name..." will be highlighted, but the text "...has each</p><p> Table is..." will not be highlighted, because the <p> tag interrupts the meaning of the text.
Web browsers implement this somehow. How can I get to that implementation, or is there some source on the net? I tried Google, but without success :(
Instead of searching inside the actual HTML file, browsers search the rendered output of that HTML.
Get a suitable HTML renderer and get its output as text. Then search on that text output using appropriate string searching algorithms.
The example that you highlighted in your question would result in a newline character in the rendered HTML output and hence a normal string searching algorithm will behave as you expect.
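To make the idea concrete, here is a minimal sketch in Java. It does not use a real HTML renderer; as a stand-in it approximates the rendered text with regular expressions, turning a few flow-breaking tags into newlines and dropping every other tag. The class name RenderedTextSearch, its method names, and the list of block tags are my own assumptions for illustration, and HTML entities, comments, scripts and styles are not handled.

```java
import java.util.Locale;
import java.util.regex.Pattern;

class RenderedTextSearch {
    // Tags assumed to interrupt the text flow (illustrative, not a complete list).
    private static final Pattern BLOCK_TAG = Pattern.compile(
            "(?i)</?(p|div|br|li|ul|ol|tr|td|th|table|h[1-6])\\b[^>]*>");
    // Any remaining tag, e.g. <b>, <i>, <span>.
    private static final Pattern ANY_TAG = Pattern.compile("<[^>]*>");

    // Crude approximation of the rendered text of an HTML fragment.
    static String toSearchableText(String html) {
        // Flow-breaking tags become newlines, so a phrase cannot match across them.
        String text = BLOCK_TAG.matcher(html).replaceAll("\n");
        // All other tags are removed without leaving a trace.
        text = ANY_TAG.matcher(text).replaceAll("");
        // Collapse runs of spaces and tabs, but keep the newlines.
        return text.replaceAll("[ \\t]+", " ");
    }

    // Case-insensitive plain substring search on the approximated text.
    static boolean containsPhrase(String html, String phrase) {
        String haystack = toSearchableText(html).toLowerCase(Locale.ROOT);
        return haystack.contains(phrase.toLowerCase(Locale.ROOT));
    }
}
```

With the two texts from the question, containsPhrase returns true for "...each <b>table</b> has name..." and false for "...has each</p><p> Table is..." when searching for "each table". Note that this only tells you whether a phrase occurs; for highlighting you would also have to map the match offsets back to positions in the original HTML.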
This seems pretty easy.
1) Search for the last word of the search string.
2) Look at what's before the last word.
3) Decide if what's before the last word constitutes an interruption (<p>, <br />, <div>).
4) If it is an interruption, continue with the next occurrence.
5) Else evaluate the previous word against the search query.
I don't know if this is how browsers perform this operation, but this approach should work.
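The steps above could be sketched in Java roughly as follows. This is only one way to read them, under a few assumptions of mine: the HTML is first split into lower-cased words and tags, the interrupting tags are just <p>, <br> and <div>, all other tags are skipped, and the query is split on whitespace. The class name BackwardPhraseMatcher and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BackwardPhraseMatcher {
    // A token is either a whole tag or a run of letters/digits (a word).
    private static final Pattern TOKEN =
            Pattern.compile("<[^>]*>|[\\p{L}\\p{N}]+");
    // Tags assumed to interrupt the text (taken from step 3).
    private static final Pattern INTERRUPTION =
            Pattern.compile("(?i)</?(p|br|div)\\b[^>]*>");
    // Marker stored in the token list for an interrupting tag.
    private static final String BREAK = "\n";

    // Splits HTML into lower-cased words plus BREAK markers; other tags are dropped.
    static List<String> tokenize(String html) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(html);
        while (m.find()) {
            String t = m.group();
            if (t.startsWith("<")) {
                if (INTERRUPTION.matcher(t).matches()) tokens.add(BREAK);
            } else {
                tokens.add(t.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    static boolean matches(String html, String phrase) {
        List<String> tokens = tokenize(html);
        String[] query = phrase.toLowerCase(Locale.ROOT).trim().split("\\s+");
        String last = query[query.length - 1];
        for (int end = 0; end < tokens.size(); end++) {
            if (!last.equals(tokens.get(end))) continue;  // step 1: find the last word
            int t = end - 1;
            int q = query.length - 2;
            while (q >= 0 && t >= 0) {
                String prev = tokens.get(t);              // step 2: what comes before
                if (BREAK.equals(prev)) break;            // steps 3-4: interruption
                if (!query[q].equals(prev)) break;        // step 5: compare previous word
                t--;
                q--;
            }
            if (q < 0) return true;                       // every query word was matched
        }
        return false;
    }
}
```

For the texts in the question, matches("...each <b>table</b> has name...", "each table") returns true, while matches("...has each</p><p> Table is...", "each table") returns false because the walk backwards from "table" hits the paragraph tags first.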