ansaurus

Question

Answer 1

A:

I'm not sure I understand everything you said, but using regular expressions seems a good way to overcome the problem you're describing.

claferri 2009-02-19 14:39:36

I would need an example of that.

sirrocco 2009-02-19 15:07:32

Ah, so now he has two problems.

Albert 2009-02-19 16:02:12
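For the regular-expression route, here is a minimal Python sketch (the function names are hypothetical). It builds a pattern from the plain search text that tolerates any run of tags between characters, then runs that pattern over the raw HTML. It assumes tags never contain a literal `>` inside attribute values, and it ignores entities such as `&amp;`.

```python
import re


def tag_tolerant_pattern(needle):
    # Allow any run of HTML tags between consecutive characters of the needle.
    tags = r"(?:<[^>]*>)*"
    return re.compile(tags.join(re.escape(ch) for ch in needle))


def find_in_html(html, needle):
    # (start, end) offsets of every match in the raw HTML string.
    return [m.span() for m in tag_tolerant_pattern(needle).finditer(html)]


html = 'some <b>te<i>xt</i></b> and text'
print(find_in_html(html, "text"))  # [(8, 15), (28, 32)]
```

The spans are offsets into the raw HTML, so a span can start or end inside formatting (here `te<i>xt`). Replacing such a span blindly can leave unbalanced tags.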

Answer 2

+1 A:

Strip the HTML tags from the search text and do a plain text search first. Then, part by part (i.e., text node by text node), take the element path of the search text's parts, and compare these with their counterparts in the found text. If the paths for all parts match, you're done.

Edit: By path, I meant something similar to XPath, or the path notion of the TinyMCE editor. Example: the plain-text part of the search text is "colortext®". The path of this text node in the search text is /. Search for the same plain text in the text body (trivial), and take its path, which is also /. (Compare this with the path of "other text..", which is /, and of "text text", which is .) The two paths are the same, so this is a real match. If you have a DOM tree representation, determining the path shouldn't be difficult.

David Hanak 2009-02-19 15:33:48

"take the element path of the search text's parts": I don't really understand what you were trying to say. Can you give a short example?

sirrocco 2009-02-19 16:02:27
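Here is one way to read the "element path" idea, as a minimal Python sketch using the standard-library `html.parser`. It flattens both the body and the search text into plain characters and records the path of open elements for each character. It then does the plain-text search and keeps only the hits whose paths agree character by character. The names `flatten` and `find_with_paths` are hypothetical. The paths are compared exactly, which assumes both fragments are measured from the same root.

```python
from html.parser import HTMLParser

VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}  # never closed, so never pushed


class PathTextParser(HTMLParser):
    """Flatten HTML into plain text plus the element path of every character."""

    def __init__(self):
        super().__init__()
        self.stack = []   # names of currently open elements
        self.chars = []   # plain-text characters, tags stripped
        self.paths = []   # element path of each character, e.g. "/strong/font"

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open element; ignore stray closing tags.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        path = "/" + "/".join(self.stack)
        for ch in data:
            self.chars.append(ch)
            self.paths.append(path)


def flatten(html):
    parser = PathTextParser()
    parser.feed(html)
    parser.close()
    return "".join(parser.chars), parser.paths


def find_with_paths(body_html, search_html):
    """Plain-text search first, then keep only hits whose element paths match."""
    body_text, body_paths = flatten(body_html)
    needle, needle_paths = flatten(search_html)
    if not needle:
        return []
    hits = []
    start = body_text.find(needle)
    while start != -1:
        if body_paths[start:start + len(needle)] == needle_paths:
            hits.append(start)
        start = body_text.find(needle, start + 1)
    return hits


body = 'other text.. <strong><font>colortext</font></strong> and <em>colortext</em>'
print(find_with_paths(body, '<strong><font>colortext</font></strong>'))  # [13]
```

The returned offsets point into the tag-stripped text. Mapping them back to the HTML source would need extra bookkeeping.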

Answer 3

A:

If it's valid XML, an XSLT would be trivial for this kind of exercise. Use an identity template, and then add an XPath to find the specific node you want:

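<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:template match="@*|node()"> <!-- identity template: copy everything else unchanged -->
        <xsl:copy><xsl:apply-templates select="@*|node()"/></xsl:copy>
    </xsl:template>
    <xsl:template match="//strong/font">
        <xsl:copy>
            <!-- Insert the replacement text here -->
        </xsl:copy>
    </xsl:template>
</xsl:stylesheet>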

When working with XML, this would be a maintainable, extensible solution.

Jweede 2009-02-19 15:45:57
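If no XSLT processor is at hand, the same XPath-targeting idea can be sketched with Python's standard-library `xml.etree.ElementTree`, which supports a limited XPath subset that includes `.//strong/font`. This assumes the content parses as well-formed XML (XHTML). Unlike the template above, it keeps the `font` element's attributes and replaces only its content. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def replace_strong_font(xhtml, replacement):
    root = ET.fromstring(xhtml)
    for font in root.findall(".//strong/font"):
        # Drop any child elements, then set the new text content.
        for child in list(font):
            font.remove(child)
        font.text = replacement
    return ET.tostring(root, encoding="unicode")


doc = '<div><p>a <strong><font color="red">old</font></strong> b</p></div>'
print(replace_strong_font(doc, "new"))
```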

Answer 4

+1 A:

You're asking for several related, but discrete, abilities:

Search and Replace content
Search and Replace formatting
Search and Replace similar (i.e., ignore trivial differences in whitespace)

You should take this in steps; otherwise it becomes overwhelming, and a single search algorithm won't be able to do all three without intense effort and difficult-to-maintain code.

First, look at the similarity problem. Make a search that ignores spaces and case. You might want to get into Lucene or another search engine technology if you also need to deal with "bowl" vs "bowls" and "intelligent" vs "smart", though I expect this is beyond your current needs.
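Here is a minimal Python sketch of that first layer, assuming plain-text input. It normalizes both strings by dropping whitespace and lower-casing. It then searches the normalized text and maps each hit back to a span of the original through a position table. The function names are hypothetical. Stemming or synonyms would need a real search engine such as Lucene, as mentioned above.

```python
def normalize(text):
    """Drop whitespace and lower-case the rest, remembering where each kept char came from."""
    chars, origin = [], []
    for i, ch in enumerate(text):
        if ch.isspace():
            continue
        for low in ch.lower():   # lower() can occasionally yield more than one char
            chars.append(low)
            origin.append(i)
    return "".join(chars), origin


def find_similar(haystack, needle):
    """(start, end) spans in haystack that match needle, ignoring whitespace and case."""
    hay, origin = normalize(haystack)
    pat, _ = normalize(needle)
    if not pat:
        return []
    spans = []
    start = hay.find(pat)
    while start != -1:
        end = start + len(pat) - 1
        spans.append((origin[start], origin[end] + 1))
        start = hay.find(pat, start + 1)
    return spans


print(find_similar("Some  Colour   Text here", "colour text"))  # [(6, 19)]
```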

Once you have that working, it becomes one layer in your stack of searches.

Second, look at formatting search. This is typically done using tokens or tags, which you already have in the form of HTML. However, you have to be able to deal with things out of sequence: text whose tags are nested in one order needs to be caught by a search for the same text with the tags nested in the other order, and also by the malformed representation where tags aren't nested properly.

One way to do this is to pre-parse the string and apply the formatting styles to each character, so you'd have a "t" that's bold and italic, an "e" that's bold and italic, etc. To make this easier and faster, use a hash to represent each style combination. Read the first character and figure out what style it has (keep track of this by turning styles on and off as you find tags). If that style combination already exists in the hash, assign its hash number to the letter; if it doesn't, get a new hash number and assign that.
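Here is a minimal Python sketch of the pre-parse step, using `html.parser` from the standard library. The "hash number" is modelled as a small integer assigned to each distinct set of active styles. The table is shared between the body and the search text so that the ids are comparable. Which tags count as styles is an assumption, as is treating `b`/`strong` and `i`/`em` as the same. Attribute-based formatting such as `font color` is not handled.

```python
from html.parser import HTMLParser

# Assumption: only these tags count as styles; b/strong and i/em are treated as equivalent.
STYLE_OF = {"b": "bold", "strong": "bold", "i": "italic", "em": "italic", "u": "underline"}


class StyledChars(HTMLParser):
    """Turn HTML into a list of (character, style_id) pairs."""

    def __init__(self, style_ids):
        super().__init__()
        self.active = []             # styles switched on, in the order their tags opened
        self.style_ids = style_ids   # frozenset of styles -> small integer id (shared table)
        self.chars = []

    def handle_starttag(self, tag, attrs):
        if tag in STYLE_OF:
            self.active.append(STYLE_OF[tag])

    def handle_endtag(self, tag):
        style = STYLE_OF.get(tag)
        if style in self.active:
            # Switch off the most recently opened instance, even if nesting is malformed.
            last = len(self.active) - 1 - self.active[::-1].index(style)
            del self.active[last]

    def handle_data(self, data):
        combo = frozenset(self.active)   # order-independent: <b><i> == <i><b>
        sid = self.style_ids.setdefault(combo, len(self.style_ids))
        self.chars.extend((ch, sid) for ch in data)


def styled_chars(html, style_ids):
    parser = StyledChars(style_ids)
    parser.feed(html)
    parser.close()
    return parser.chars


ids = {}
print(styled_chars("<b><i>te</i></b>", ids))   # [('t', 0), ('e', 0)]
print(styled_chars("<b><i>te</b></i>", ids))   # malformed nesting, same style id
```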

Now you can compare the letter and its style hash against your search and get format and content matches. Stack that on top of your similar match and you have what you need.
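The comparison step can be sketched as follows. It assumes the body and the search text have already been turned into `(character, style_id)` pairs with a shared style table, as in the pre-parse step. It layers the similarity rules on top by skipping whitespace and ignoring case, so a hit requires the same letters with the same formatting. The function names and the example ids are hypothetical.

```python
def normalize_styled(seq):
    """Drop whitespace and lower-case letters, keeping each char's style id and position."""
    return [(ch.lower(), sid, pos) for pos, (ch, sid) in enumerate(seq) if not ch.isspace()]


def find_styled(body, needle):
    """Spans of body where letters and style ids both match needle (whitespace/case ignored)."""
    hay, pat = normalize_styled(body), normalize_styled(needle)
    if not pat:
        return []
    spans = []
    for start in range(len(hay) - len(pat) + 1):
        window = hay[start:start + len(pat)]
        # Compare (letter, style_id); the third field is only the original position.
        if all(h[:2] == p[:2] for h, p in zip(window, pat)):
            spans.append((window[0][2], window[-1][2] + 1))
    return spans


BOLD_ITALIC, PLAIN = 0, 1   # hypothetical ids from a shared style table
body = [(c, PLAIN) for c in "a "] + [(c, BOLD_ITALIC) for c in "Te xt"] + [(c, PLAIN) for c in " text"]
needle = [(c, BOLD_ITALIC) for c in "text"]
print(find_styled(body, needle))  # [(2, 7)]
```

The spans index into the styled-character sequence, so the plain " text" at the end is rejected because its style id differs, even though its letters match.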

Adam Davis 2009-02-19 15:58:56


Find and Replace and a WYSIWYG Editor
