views:

47

answers:

0

I have recently added search capabilities to my django-powered site to allow employers to search for employees using keywords. When the user initially uploads their resume, I turn it into text, get rid of stop words, and then add the text to a TextField for that user. I used Django-Haystack with the Whoosh search back engine.

Three things-

1) Aside from extra features which I'll probably not use, is there any concrete advantage to switching to Solr or Xapian?

2) In turning the resume into text, I essentially index the pdf myself. I know both Xapian and Solr support .pdf indexing, however, from the looks of it Haystack does not. Any tips on how to get around this? Or should I keep indexing it myself? If so, should I be doing more than simply providing a text file of keywords?

3) Whoosh only return a result if the keyword matches itself exactly. If a user has 'mathematics' as his keyword, and I search 'math', I want that user to appear. I couldn't definitively tell whether Xapian or Solr support this. Thoughts?

Thanks for any suggestion. I'm going to continue digging into this myself for the time being.