views:

681

answers:

1

I have a website where users upload documents in .doc and .pdf format. I am using Sphinx to conduct full text searches on my SQL database (MySQL). What is the best way to index these file formats with Sphinx?

+1  A: 

Unfortunately, Sphinx can't index those file types directly. You'll need to extract the text from each file first, then either store it in a database that Sphinx indexes or emit it in an XML format that Sphinx can understand (xmlpipe2).
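
To make the XML route more concrete, here is a minimal sketch in Python of what producing xmlpipe2 output could look like. It assumes the text has already been pulled out of the .doc/.pdf files by some external tool (for example pdftotext or antiword; that choice is an assumption, not something Sphinx requires). The function name `build_xmlpipe2` and the field names `filename` and `content` are hypothetical and only for illustration.

```python
import re
from xml.sax.saxutils import escape

# Control characters that are not allowed in XML 1.0 but often show up
# in text extracted from binary documents.
_INVALID_XML_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")


def build_xmlpipe2(docs):
    """Build an xmlpipe2 document set as a string.

    `docs` is an iterable of (doc_id, filename, text) tuples, where `text`
    is the already-extracted plain text of an uploaded file (hypothetical
    input shape; adapt it to however you store uploads).
    """
    parts = [
        '<?xml version="1.0" encoding="utf-8"?>',
        "<sphinx:docset>",
        "<sphinx:schema>",
        # Field names are illustrative assumptions.
        '<sphinx:field name="filename"/>',
        '<sphinx:field name="content"/>',
        "</sphinx:schema>",
    ]
    for doc_id, filename, text in docs:
        clean = _INVALID_XML_CHARS.sub("", text)
        # Sphinx document ids must be unique positive integers.
        parts.append('<sphinx:document id="%d">' % int(doc_id))
        parts.append("<filename>%s</filename>" % escape(filename))
        parts.append("<content>%s</content>" % escape(clean))
        parts.append("</sphinx:document>")
    parts.append("</sphinx:docset>")
    return "\n".join(parts)
```

A script that prints this string to standard output could then be referenced from a source with `type = xmlpipe2` via `xmlpipe_command` in sphinx.conf. If you go the database route instead, the same extracted text would simply be written to a text column that your existing MySQL source already selects.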

pat
Would you recommend one method over another?
Jared Brown
Depends on what server-side language you're using. If it's Ruby/Rails, as far as I know none of the libraries support XML out of the box, unless you're building a system from scratch (instead of, say, using ActiveRecord), so I'd use the database. Otherwise, it's completely up to you. If you're not using Ruby, have a look at what libraries are out there for your language of choice and see what they can and can't do.
pat