I'm working on a system that will need to store a lot of documents (PDFs, Word files, etc.). I'm using Solr/Lucene to search the relevant information extracted from those documents, but I also need a place to store the original files so that users can open or download them.
I was thinking about several possibilities:
- file system - probably not a good idea for storing 1m documents
- sql database - I won't need most of its relational features, since I only need to store the binary document and its id, so this might not be the fastest solution
- no-sql database - I don't have any experience with them, so I'm not sure whether they are a good fit either, and there are so many of them that I don't know which one to pick
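
To make the access pattern concrete: all I need is "store these bytes under this id" and "give me the bytes for this id". Below is a minimal sketch of that interface, using the file system option purely as an illustration. The `BlobStore` class, its method names, and the hashed two-level directory layout are my own hypothetical choices, not an existing library; the layout is just one way to avoid putting every file into a single directory.

```python
import hashlib
from pathlib import Path


class BlobStore:
    """Hypothetical id -> bytes store backed by a directory tree."""

    def __init__(self, root):
        self.root = Path(root)

    def _path(self, doc_id: str) -> Path:
        # Hash the id and use the first four hex digits as two directory
        # levels, so files are spread over up to 256 * 256 subdirectories.
        digest = hashlib.sha1(doc_id.encode("utf-8")).hexdigest()
        return self.root / digest[:2] / digest[2:4] / digest

    def put(self, doc_id: str, data: bytes) -> None:
        # Create the subdirectories on demand, then write the raw bytes.
        path = self._path(doc_id)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(data)

    def get(self, doc_id: str) -> bytes:
        # Raises FileNotFoundError if nothing was stored under this id.
        return self._path(doc_id).read_bytes()
```

Whatever storage I end up with only has to support these two operations; the id would be the same one I keep in the Solr index.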
The storage I'm looking for should be:
- fast
- scalable
- open-source (not crucial but nice to have)
What would you recommend as the best way of storing those files?