
My problem is as follows:

I have a column, ProductName. The text in it is entered through TinyMCE, so it contains all kinds of HTML tags. The user wants to be able to do a find-and-replace across all products, and it has to support coloring.

For example, let's say this is part of a ProductName:

other text.. <strong>text text <font color="#ff6600">colortext&reg;</font></strong> ..other text

Now, the user wants to replace:

<font color="#ff6600">colortext&reg;</font>

The original name has the <strong> tag around it, so it appears bold. So the user makes the search text bold as well, and now the text he is searching for is:

<strong><font color="#ff6600">colortext&reg;</font></strong>

Obviously I'm not going to find it, because "text text " sits between <strong> and <font> in the stored name. Plus there's the matter of whitespace: in one place there is a space, in another there isn't.

Is there a way to overcome this? Any ideas?

A: 

I'm not sure I understand everything you said, but regular expressions seem a good way to overcome the problem you're describing.
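As an illustration of the regular-expression idea (a sketch, not the answerer's code), the hypothetical helper tolerant_pattern below turns the search HTML into a Python regex that tolerates whitespace differences between tags, inside tags, and in the text, and ignores letter case. It assumes no literal ">" appears inside attribute values. Note that it does not solve the extra-<strong> problem: the bold search text still fails to match because "text text " sits in between.

```python
import re


def tolerant_pattern(search_html):
    """Hypothetical helper: compile a regex for search_html that ignores
    whitespace differences and letter case. Assumes no '>' inside attribute values."""
    # Split into tags and the text between them, keeping the tags as tokens.
    tokens = [t for t in re.split(r"(<[^>]+>)", search_html) if t.strip()]
    parts = []
    for tok in tokens:
        if tok.startswith("<"):
            # Inside a tag, the amount of whitespace between words may vary.
            words = tok[1:-1].split()
            parts.append("<" + r"\s+".join(re.escape(w) for w in words) + r"\s*>")
        else:
            # In text, any run of whitespace matches any other run.
            parts.append(r"\s+".join(re.escape(w) for w in tok.split()))
    # Whitespace between consecutive tag/text chunks is optional.
    return re.compile(r"\s*".join(parts), re.IGNORECASE)
```

A replace would then be tolerant_pattern(search).sub(lambda m: replacement, product_name); passing a function keeps backslashes in the replacement from being interpreted.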

claferri
I would need an example of that.
sirrocco
Ah, so now he has two problems.
Albert
A:

Strip the HTML tags from the search text and do a plain text search first. Then, part by part (i.e., text node by text node), take the element path of the search text's parts, and compare these with their counterparts in the found text. If the paths for all parts match, you're done.

Edit: By path, I meant something similar to XPath, or the path notion of the TinyMCE editor. Example: the plain-text part of the search text is "colortext&reg;". The path of this text node in the search text is <strong>/<font color="#ff6600">. Search for the same plain text in the text body (trivial), and take its path, which is also <strong>/<font color="#ff6600">. (Compare this with the path of "other text..", which is /, and of "text text", which is <strong>.) The two paths are the same, so this is a real match. If you have a DOM tree representation, determining the path shouldn't be difficult.
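A minimal Python sketch of the path comparison, using the standard library's html.parser. PathAnnotator and find_formatted are hypothetical names. Each character is tagged with the tuple of open elements (with their attributes), so "colortext®" gets the path ('strong', 'font[color=#ff6600]'). Paths are compared exactly, which assumes the stored name is not wrapped in an extra element such as <p>; with a wrapper you would compare path suffixes instead.

```python
from html.parser import HTMLParser

VOID_TAGS = {"br", "hr", "img", "input", "meta", "link", "wbr"}  # partial list; never closed


class PathAnnotator(HTMLParser):
    """Hypothetical helper: records each text character together with its
    element path, e.g. ('strong', 'font[color=#ff6600]')."""

    def __init__(self):
        super().__init__()  # default convert_charrefs=True decodes &reg; to a real character
        self.stack = []  # open elements as (tag, label), outermost first
        self.chars = []  # (character, path) pairs in document order

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        label = tag + "".join(f"[{k}={v}]" for k, v in sorted(attrs))
        self.stack.append((tag, label))

    def handle_endtag(self, tag):
        # Close the innermost open element with this name (tolerates sloppy nesting).
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        path = tuple(label for _, label in self.stack)
        self.chars.extend((ch, path) for ch in data)


def annotate(html):
    parser = PathAnnotator()
    parser.feed(html)
    parser.close()
    return parser.chars


def find_formatted(body_html, search_html):
    """Return plain-text offsets where search_html's text occurs in body_html
    with exactly the same element path for every character."""
    body, search = annotate(body_html), annotate(search_html)
    body_text = "".join(ch for ch, _ in body)
    search_text = "".join(ch for ch, _ in search)
    hits = []
    if not search_text:
        return hits
    start = body_text.find(search_text)  # step 1: plain-text search
    while start != -1:
        # Step 2: the text matched; now compare the paths part by part.
        if all(body[start + i][1] == path for i, (_, path) in enumerate(search)):
            hits.append(start)
        start = body_text.find(search_text, start + 1)
    return hits
```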

David Hanak
"take the element path of the search text's parts"I don't really understand what you were trying to say. Can you give a short example ?
sirrocco
A: 

If it's valid XML, an XSLT stylesheet would be trivial for this kind of exercise. Use an identity template, and then add a template whose XPath match pattern selects the specific node you want:

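<!-- Identity template: copies every node and attribute unchanged -->
<xsl:template match="@*|node()">
    <xsl:copy><xsl:apply-templates select="@*|node()"/></xsl:copy>
</xsl:template>

<xsl:template match="//strong/font">
    <xsl:copy>
        <!-- Insert the replacement text here -->
    </xsl:copy>
</xsl:template>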

When working with XML, this would be a maintainable, extensible solution.
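If you would rather stay in application code than run an XSLT processor, Python's xml.etree.ElementTree supports a limited XPath subset that is enough for the same selection. This is a swapped-in technique, not the answerer's code, and replace_font_text is a hypothetical name. The fragment must be well-formed XML, so &reg; has to become &#174; (or a literal ®) first, because XML does not define that named entity.

```python
import xml.etree.ElementTree as ET


def replace_font_text(fragment, color, new_text):
    """Hypothetical helper: replace the content of every <font color=...>
    that sits directly inside a <strong>, leaving everything else unchanged."""
    # Wrap the fragment so it has a single root element (assumes well-formed XML).
    root = ET.fromstring(f"<root>{fragment}</root>")
    for font in root.iterfind(f".//strong/font[@color='{color}']"):
        # Keep the <font> element and its attributes (a bare xsl:copy would
        # drop the attributes); replace only its children and text.
        for child in list(font):
            font.remove(child)
        font.text = new_text
    # Serialize and strip the artificial <root> wrapper again.
    out = ET.tostring(root, encoding="unicode")
    return out[len("<root>"):-len("</root>")]
```

Like the XSLT pattern, a <font> that is not inside a <strong> is left alone.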

Jweede
A:

You're asking for several related, but discrete, abilities:

  • Search and Replace content
  • Search and Replace formatting
  • Search and Replace similar text (i.e., ignoring trivial differences in whitespace)

You should take this in steps; otherwise it becomes overwhelming, and a single search algorithm won't be able to do all three without intense effort, resulting in code that is difficult to maintain.

First, look at the similarity problem: make a search that ignores spaces and case. You might want to look into Lucene or another search engine technology if you also need to deal with "bowl" vs. "bowls" and "intelligent" vs. "smart", though I expect this is beyond your current needs.
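One possible shape for this "similar" layer, as a sketch: drop whitespace and lower-case both strings, keeping a map from each normalized character back to its position in the original, so a hit can be replaced in the stored text. squeeze and find_similar are hypothetical names, and removing whitespace entirely (rather than collapsing it) is an assumption chosen to cover the "space here, no space there" case from the question.

```python
def squeeze(text):
    """Hypothetical helper: drop all whitespace and lower-case the rest,
    remembering the original index of every output character."""
    chars, positions = [], []
    for i, ch in enumerate(text):
        if ch.isspace():
            continue  # ignore whitespace completely
        low = ch.lower()
        chars.append(low)
        positions.extend([i] * len(low))  # lower() can expand one char into several
    return "".join(chars), positions


def find_similar(body, needle):
    """Return (start, end) spans in the ORIGINAL body that match needle
    when whitespace and case are ignored."""
    norm_body, positions = squeeze(body)
    norm_needle, _ = squeeze(needle)
    spans = []
    if not norm_needle:
        return spans
    pos = norm_body.find(norm_needle)
    while pos != -1:
        last = pos + len(norm_needle) - 1
        spans.append((positions[pos], positions[last] + 1))
        pos = norm_body.find(norm_needle, pos + 1)
    return spans
```

Because the spans refer to the original string, body[start:end] is exactly the stored text that would be replaced.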

Once you have that working, it becomes one layer in your stack of searches.

Second, look at formatting search. This is typically done using tokens or tags, which you already have in the form of HTML. However, you have to be able to deal with tags out of sequence, so <b><i>text</i></b> needs to be caught in a search for <i><b>text</b></i>, as does the malformed representation where tags aren't nested properly, such as <b><i>text</b></i>.

One way to do this is to pre-parse the string and apply the formatting styles to each character. So you'd have a t that's bold and italic, an e that's bold and italic, and so on. To make this easier and faster, use a hash to represent the style combination: read the first character and figure out what style it is (keep track of this by turning styles on and off as you find tags). If that style combination already exists in the hash, assign its hash number to the letter; if it doesn't, get a new hash number and assign that.

Now you can compare each letter and its style hash against your search to get format and content matches. Stack that on top of your similar match and you have what you need.
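A rough Python sketch of the per-character style idea, under some stated assumptions: a style is the set of currently open tags with their attributes, so nesting order is irrelevant and an end tag simply switches its tag off even when mis-nested; the "hash" is a dict from style set to a small integer. StyleAnnotator and find_styled are hypothetical names, and the whitespace layer is left out.

```python
from html.parser import HTMLParser

VOID_TAGS = {"br", "hr", "img", "input", "meta", "link", "wbr"}  # partial list; never closed


class StyleAnnotator(HTMLParser):
    """Hypothetical helper: tags each text character with a style number.
    A style is the set of open tags (with attributes), so <b><i> and <i><b>
    are the same style, and an end tag switches its tag off even if mis-nested."""

    def __init__(self, style_ids):
        super().__init__()
        self.style_ids = style_ids  # the "hash": frozenset of styles -> int
        self.active = {}            # style key -> how many times it is open
        self.open_by_tag = {}       # tag name -> stack of its open style keys
        self.chars = []             # (character, style number) pairs

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        key = (tag, tuple(sorted(attrs)))
        self.active[key] = self.active.get(key, 0) + 1
        self.open_by_tag.setdefault(tag, []).append(key)

    def handle_endtag(self, tag):
        stack = self.open_by_tag.get(tag)
        if stack:  # stray end tags are ignored
            key = stack.pop()
            self.active[key] -= 1
            if self.active[key] == 0:
                del self.active[key]

    def handle_data(self, data):
        style = frozenset(self.active)
        # Reuse the number of a known combination, else assign the next one.
        number = self.style_ids.setdefault(style, len(self.style_ids))
        self.chars.extend((ch, number) for ch in data)


def annotate(html, style_ids):
    parser = StyleAnnotator(style_ids)
    parser.feed(html)
    parser.close()
    return parser.chars


def find_styled(body_html, search_html):
    """Plain-text offsets where both the letters and their style numbers match."""
    style_ids = {}  # shared, so equal styles get equal numbers in both inputs
    body = annotate(body_html, style_ids)
    needle = annotate(search_html, style_ids)
    n = len(needle)
    if n == 0:
        return []
    return [i for i in range(len(body) - n + 1) if body[i:i + n] == needle]
```

To stack this on the similar-match layer as described, one option is to drop whitespace characters from both annotated lists before comparing.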

Adam Davis