



I'm programming a spell checker in JavaScript that uses an OpenOffice dictionary, and I have a serious problem.

I can find whole words using a regex, but if a word is marked up like prog<b>ram</b>ing, I can only find it after removing all HTML tags with jQuery's .text() method. But how can I then replace this word and rebuild the original HTML structure? does it very smartly - the spell check recognizes even words like prog<b>ram</b>ing if they are misspelled!


sorry the word programing

I've edited the post. This space is meant for answers. Please delete this answer. You can make use of comments. Welcome to SO :)
There's an edit link under your question; use that instead if you need to clarify your question.
Tatu Ulmanen
I have done it ;-) thanks

I would use something to strip out any HTML so that you are dealing with plain text. I can't speak for any tools like this in JavaScript, but I'm sure they exist. If you can find something to 'scrub' the HTML out of your .text(), you can run the search that way.

Try something like this:


will match your example

So roughly, the following regex will find all instances of the word, even those broken up by HTML tags:

 // (?:<[^>]+>)* allows zero or more tags between any two letters of the word
 var regExp = new RegExp('([\\s>"\'])' + word.split('').join('(?:<[^>]+>)*') + '([\\s.,:;"\'<])', 'g');
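
For what it's worth, here is a minimal sketch of how a pattern like this could be built and run against some markup. The helper names `escapeRegExp` and `buildTagTolerantRegExp` are made up for illustration, and the boundary characters are only a rough guess at what counts as a word edge, so treat it as a starting point rather than a finished solution:

```javascript
// Hypothetical helper: escape characters that have a special meaning in a regex,
// so that a "word" containing e.g. a dot is matched literally.
function escapeRegExp(ch) {
  return ch.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: build a regex that matches `word` even when
// HTML tags sit between its letters.
function buildTagTolerantRegExp(word) {
  var tags = '(?:<[^>]+>)*'; // zero or more tags between two letters
  var letters = word.split('').map(escapeRegExp).join(tags);
  // Group 1: start of string or a boundary character before the word.
  // Group 2: the word itself, including any tags inside it.
  // The lookahead checks the boundary after the word without consuming it.
  return new RegExp('(^|[\\s>"\'])(' + letters + ')(?=$|[\\s.,:;"\'<])', 'g');
}

var html = 'I like prog<b>ram</b>ing and programing.';
var re = buildTagTolerantRegExp('programing');
var matches = [];
var m;
while ((m = re.exec(html)) !== null) {
  matches.push(m[2]); // keep the matched word with its embedded tags
}
```

With this input, `matches` ends up holding both `prog<b>ram</b>ing` and `programing`. Because the match keeps the embedded tags, you know exactly which slice of the original markup belongs to the misspelled word, which is the part you would need when replacing it.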

God knows how that'll help you build a spellchecker, though. I suspect the approach used in spellcheckers is more like this: do a spellcheck assuming no HTML, and if there is HTML in a word, strip it out using something like the method below and spellcheck the resulting string as normal:

String.prototype.stripHtml = function() {
  return this.replace(/<[^>]+>/g, ''); // g flag: strip every tag, not just the first