I have a website where users upload documents in .doc and .pdf format. I am using Sphinx to conduct full text searches on my SQL database (MySQL). What is the best way to index these file formats with Sphinx?
A:
Unfortunately, Sphinx can't index those file types directly. You'll need to extract the text from each file first, then either import it into a database table that Sphinx indexes, or write it out in an XML format that Sphinx can understand (the xmlpipe2 source type).
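For the XML route, here is a minimal sketch in Python of what that could look like. It assumes the `pdftotext` (poppler-utils) and `antiword` command-line tools are installed for text extraction. The function names, the field names `filename` and `content`, and the sequential document IDs are all made up for illustration. In a real setup you would probably use the primary key of your uploads table as the document ID.

```python
import subprocess
import sys
from xml.sax.saxutils import escape


def extract_text(path):
    """Return the plain text of a .pdf or .doc file.

    Assumes the external tools pdftotext and antiword are on PATH.
    """
    lower = path.lower()
    if lower.endswith(".pdf"):
        cmd = ["pdftotext", path, "-"]  # "-" sends the text to stdout
    elif lower.endswith(".doc"):
        cmd = ["antiword", path]        # antiword prints text to stdout
    else:
        raise ValueError("unsupported file type: %s" % path)
    result = subprocess.run(cmd, capture_output=True, check=True)
    return result.stdout.decode("utf-8", errors="replace")


def xmlpipe2_docset(docs):
    """Build an xmlpipe2 document stream.

    docs is an iterable of (doc_id, filename, text) tuples; doc_id must be
    a unique positive integer. Field names here are illustrative only.
    """
    parts = [
        '<?xml version="1.0" encoding="utf-8"?>',
        "<sphinx:docset>",
        "<sphinx:schema>",
        '<sphinx:field name="filename"/>',
        '<sphinx:field name="content"/>',
        "</sphinx:schema>",
    ]
    for doc_id, filename, text in docs:
        parts.append('<sphinx:document id="%d">' % doc_id)
        # escape() protects &, < and > so the stream stays well-formed
        parts.append("<filename>%s</filename>" % escape(filename))
        parts.append("<content>%s</content>" % escape(text))
        parts.append("</sphinx:document>")
    parts.append("</sphinx:docset>")
    return "\n".join(parts)


def main(paths):
    """Extract every file in paths and print one docset to stdout."""
    docs = ((i, p, extract_text(p)) for i, p in enumerate(paths, start=1))
    sys.stdout.write(xmlpipe2_docset(docs) + "\n")
```

To hook this up, you would call `main(sys.argv[1:])` from a small script and point a Sphinx source with `type = xmlpipe2` at it through `xmlpipe_command`. Extracted text can contain control characters that are not valid in XML, so you may need to strip those before building the stream.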
pat
2009-07-30 21:16:12
Would you recommend one method over another?
Jared Brown
2009-07-31 04:04:59
Depends what server-side language you're using. If it's Ruby/Rails, as far as I know none of the libraries support XML out of the box, unless you're building a system from scratch (instead of, say, using ActiveRecord). So I'd use the database. Otherwise, it's completely up to you. If you're not using Ruby, have a look at what libraries are out there for your language of choice and see what they can and can't do.
pat
2009-08-02 20:30:04