The documentation states for sphinx-0.9.9-rc2:

The data to be indexed can generally come from very different sources: SQL databases, plain text files, HTML files, mailboxes, and so on.

However, I can't find any documentation on setting up a source other than SQL. The config file doesn't seem to indicate that the source can be anything but a database. Does anyone have any helpful links for setting up Sphinx with an HTML source?

+1  A: 

Are you looking for the xmlpipe (now called xmlpipe2) data source in Sphinx? I've tried it out with XML files and indexing works just like it does for SQL.

I haven't tried out Sphinx with vanilla HTML files, so I'm guessing you'll need to parse your HTML files, generate XML containing the fields/attributes that you want indexed, and feed that to Sphinx using xmlpipe.
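Here's a rough sketch of what that conversion step could look like, assuming Python is available and your pages are already on disk. It pulls the `<title>` and the visible text out of each page and wraps them in an xmlpipe2 docset. The names `TextExtractor` and `html_to_docset`, and the choice of `title` and `content` as fields, are mine for illustration and not anything Sphinx prescribes; I haven't run this against a real index.

```python
from html.parser import HTMLParser
from xml.sax.saxutils import escape


class TextExtractor(HTMLParser):
    """Collects the <title> text and the visible text of one HTML page."""

    SKIP = {"script", "style"}  # contents of these tags are not indexed

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.title_parts = []
        self.text_parts = []
        self._in_title = False
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth:
            return
        target = self.title_parts if self._in_title else self.text_parts
        target.append(data)


def html_to_docset(pages):
    """Build an xmlpipe2 stream from (doc_id, html_string) pairs.

    doc_id must be a unique positive integer for each page.
    """
    out = [
        '<?xml version="1.0" encoding="utf-8"?>',
        "<sphinx:docset>",
        "<sphinx:schema>",
        '<sphinx:field name="title"/>',    # hypothetical field names
        '<sphinx:field name="content"/>',
        "</sphinx:schema>",
    ]
    for doc_id, html in pages:
        parser = TextExtractor()
        parser.feed(html)
        parser.close()
        # Collapse runs of whitespace; text chunks are joined with a space,
        # which is fine for indexing but may split words around inline tags.
        title = " ".join("".join(parser.title_parts).split())
        content = " ".join(" ".join(parser.text_parts).split())
        out.append('<sphinx:document id="%d">' % doc_id)
        out.append("<title>%s</title>" % escape(title))
        out.append("<content>%s</content>" % escape(content))
        out.append("</sphinx:document>")
    out.append("</sphinx:docset>")
    return "\n".join(out) + "\n"
```

To use it, you would wrap this in a small script that reads your HTML files, numbers them, and prints the result to stdout, then point a source with `type = xmlpipe2` at that script through `xmlpipe_command` in sphinx.conf.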

You can see here and here for more.

HTH

Arun
No, I specifically wanted to read in HTML files, index them, and then use that to build a search engine for my site. I've given up on trying to use Sphinx and have approached the problem from another way. Here is the most information I was able to find, for anyone else looking: http://www.sphinxsearch.com/forum/view.html?id=3867
Tyler K