I'm writing a spell checker in JavaScript that uses an OpenOffice dictionary, and I've hit a serious problem.

I can find whole words using a regular expression. If a word is split by markup, as in prog<b>ram</b>ing, I can still find it by first removing all HTML tags with jQuery's .text() method. But how can I then replace the word and rebuild the original HTML structure?

Spellchecker.com handles this well: its spell check recognizes even words like prog<b>ram</b>ing when they are misspelled!

A: 

Sorry, I meant the word "programing".

yas
I've edited the post. This space is meant for answers. Please delete this answer. You can make use of comments. Welcome to SO :)
codaddict
There's an edit link under your question; use that instead if you need to clarify your question.
Tatu Ulmanen
I have done it ;-) thanks
yas
A: 

I would use something to strip out the HTML so that you are dealing with plain text. I can't vouch for any particular tool for this in JavaScript, but I'm sure they exist. If you can find something to 'scrub' the HTML out of your markup, you can run the search on the resulting text.

Try something like this: http://search.cpan.org/~podmaster/HTML-Scrubber-0.08/Scrubber.pm
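
For what it's worth, here is a minimal sketch of that idea in plain JavaScript. It assumes the dictionary is already loaded into a Set of lower-case words; stripTags and findMisspelled are made-up helper names, and the tag-stripping regex is a rough approximation, not a real HTML parser.

```javascript
// Hypothetical helper: removes anything that looks like a tag.
function stripTags(html) {
  return html.replace(/<[^>]+>/g, '');
}

// Returns the words of `html` (tags removed) that are not in `dictionary`.
// `dictionary` is assumed to be a Set of lower-case words.
function findMisspelled(html, dictionary) {
  const words = stripTags(html).match(/[A-Za-z']+/g) || [];
  return words.filter(function (w) {
    return !dictionary.has(w.toLowerCase());
  });
}

// Tiny stand-in dictionary, for illustration only.
const dictionary = new Set(['i', 'like', 'programming']);
const misspelled = findMisspelled('I like prog<b>ram</b>ing', dictionary);
console.log(misspelled); // the word "programing" is reported
```

One caveat: block-level tags with no whitespace between them (for example two adjacent paragraphs) would be glued into a single word by this stripping, so real markup probably needs a bit more care.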

Rabbott
+1  A: 
/([\s>"'])prog(<[^>]+>)ram(<[^>]+>)ing([\s\.,:;"'<])/g 

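will match your example.

So, roughly, the following builds a regex that finds every instance of a word, even when HTML tags break it up:

 // Backslashes are doubled inside the string, tags between letters are optional, and the flag is the string 'g'.
 var regExp = new RegExp('([\\s>"\'])' + word.split('').join('(?:<[^>]+>)*') + '([\\s.,:;"\'<])', 'g');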
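
A slightly fuller sketch of the same idea, in case it helps. The names escapeRegExp and buildTagTolerantRegExp are my own, and this version also lets the word sit at the very start or end of the string, which is an assumption about what you want.

```javascript
// Escape characters that have a special meaning in a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Build a pattern that matches `word` even when tags sit between its letters.
function buildTagTolerantRegExp(word) {
  const letters = word.split('').map(escapeRegExp);
  // Zero or more tags are allowed between any two consecutive letters.
  const body = letters.join('(?:<[^>]+>)*');
  // Group 1 is the leading delimiter, group 2 is the word with its embedded tags.
  return new RegExp('(^|[\\s>"\'])(' + body + ')(?=$|[\\s.,:;"\'<])', 'g');
}

const re = buildTagTolerantRegExp('programing');
const match = re.exec('I like prog<b>ram</b>ing a lot');
console.log(match[2]); // prog<b>ram</b>ing
```

Because the pattern has the g flag, it keeps its lastIndex between exec calls, so build a fresh pattern (or reset lastIndex) before reusing it on another string.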

God knows how that'll help you build a spellchecker, though. I suspect the approach used in spellcheckers is more like this: do a spellcheck assuming there is no HTML, and if a word does contain HTML, strip it out using something like the method below and spellcheck the resulting string as normal:

// Remove every HTML tag from the string (the g flag strips all tags, not just the first).
String.prototype.stripHtml = function() {
  return this.replace(/<[^>]+>/g, '');
};
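
As for putting a correction back without breaking the markup, one possible sketch (not necessarily what Spellchecker.com does) is to capture the tags found inside the word and re-emit them right after the corrected word. replaceAcrossTags is a hypothetical name, and the letter-based word boundary is a simplifying assumption.

```javascript
// Replace `wrong` with `right`, even when tags interrupt `wrong` in the HTML.
// Tags found inside the word are kept and moved to just after the correction.
function replaceAcrossTags(html, wrong, right) {
  const letters = wrong.split('').map(function (ch) {
    return ch.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  });
  // One capture group per gap between letters, each holding a run of tags.
  const body = letters.join('((?:<[^>]+>)*)');
  const pattern = new RegExp('(^|[^A-Za-z])' + body + '(?![A-Za-z])', 'g');
  return html.replace(pattern, function (match, lead, ...rest) {
    // The first letters.length - 1 entries of `rest` are the tag groups.
    const tags = rest.slice(0, letters.length - 1).join('');
    return lead + right + tags;
  });
}

const fixed = replaceAcrossTags('I like prog<b>ram</b>ing', 'programing', 'programming');
console.log(fixed); // I like programming<b></b>
```

The tags stay balanced, but the formatting on the middle letters is lost, because it is ambiguous which letters of the corrected word should carry it.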
wheresrhys