Hey guys, I've managed to strip HTML from content when indexing data in Solr.

But is it possible to strip HTML from the stored value as well, not just from the indexed terms?

This is my field:

<field name="Content" type="textNoHTML" indexed="true" stored="true"/>

And the field type "textNoHTML" uses solr.HTMLStripCharFilterFactory in its analyzer:

<charFilter class="solr.HTMLStripCharFilterFactory" />
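For context, a full definition of such a field type in schema.xml might look roughly like the sketch below. Only the charFilter line comes from the question; the tokenizer and lowercase filter are assumptions added to make the analyzer complete, and the real schema may use different ones.

```xml
<!-- Sketch of a fieldType that strips HTML during analysis (index side only). -->
<fieldType name="textNoHTML" class="solr.TextField">
  <analyzer>
    <!-- Removes markup from the character stream before tokenizing. -->
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <!-- Assumed tokenizer and filter; not given in the original question. -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Everything inside the analyzer only affects the indexed terms, which is why the stored value still comes back with its HTML intact.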

As I said, this works fine for indexing, but is it possible to apply a similar filter for storing?

cheers!

+1  A: 

If you're using the DataImportHandler you can use the HTMLStripTransformer.
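As a rough illustration, a DIH data-config.xml using that transformer could look like the sketch below. The data source, table, query, and column names are all hypothetical placeholders; only the transformer name and the stripHTML attribute are the parts that matter here.

```xml
<dataConfig>
  <!-- Hypothetical JDBC source; replace driver, url, and credentials with your own. -->
  <dataSource driver="org.hsqldb.jdbcDriver" url="jdbc:hsqldb:/path/to/db" user="sa"/>
  <document>
    <!-- HTMLStripTransformer runs on each row before it is sent to the index. -->
    <entity name="article"
            query="SELECT id, content FROM articles"
            transformer="HTMLStripTransformer">
      <field column="id" name="id"/>
      <!-- stripHTML="true" marks the column whose markup should be removed. -->
      <field column="content" name="Content" stripHTML="true"/>
    </entity>
  </document>
</dataConfig>
```

Because the markup is removed before the document reaches Solr, the stored value of Content is already plain text.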

Otherwise, you'll have to implement this client-side on your own. If your client is .NET you could use HtmlAgilityPack.

Mauricio Scheffer
+1 I see. So, if I'm importing data from a data store using the DataImportHandler, I can use that transformer... but if I'm adding via the XML commands, I can't? Why's that? Anyway, cool, I'll check out HtmlAgilityPack. Cheers!
andy
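AFAIK stored fields are always stored verbatim; the analysis chain (char filters, tokenizers, filters) only affects the indexed terms. The DIH acts as a client, so it can run transformers on the data before sending it to Solr.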
Mauricio Scheffer
Ahh, I see. Cheers, Mauricio.
andy