
I'm working on a system that will need to store a lot of documents (PDFs, Word files, etc.). I'm using Solr/Lucene to search for relevant information extracted from those documents, but I also need a place to store the original files so that users can open or download them.

I was thinking about several possibilities:

  • file system - probably not a good idea for storing 1M documents
  • SQL database - but I won't need most of its relational features, since I only need to store the binary document and its ID, so this might not be the fastest solution
  • NoSQL database - I don't have any experience with them, so I'm not sure whether they are any good either; there are also many of them, so I don't know which one to pick

The storage I'm looking for should be:

  • fast
  • scalable
  • open-source (not crucial but nice to have)

In your opinion, what is the best way to store these files?

A:

A filesystem -- as the name suggests -- is designed and optimised to store large numbers of files in an efficient and scalable way.

LukeH
Make sure it's a good filesystem. FAT probably isn't up to this. ReiserFS or XFS is. YMMV with extN, NTFS, etc. See also http://serverfault.com/questions/43133/filesystem-large-number-of-files-in-a-single-directory .
Tom Anderson
@Tom: Indeed, good point, but it (almost) goes without saying... If you went for an RDBMS you'd want to make sure that it's a good one too; likewise if you went for a nosql database. Whatever system you choose, make sure it does its job well.
LukeH
@LukeH: True! But sometimes things that go without saying need saying.
Tom Anderson
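Tom Anderson's comment links to a discussion about putting very many files in a single directory. One common way to sidestep that with a plain filesystem is to spread files across nested subdirectories derived from a hash of the document ID. The following is a minimal sketch, assuming string document IDs and a caller-chosen storage root; the function names and the two-level, two-hex-character layout are illustrative choices, not something the answer prescribes.

```python
import hashlib
import tempfile
from pathlib import Path


def shard_path(root: Path, doc_id: str) -> Path:
    """Map a document ID to root/<h[0:2]>/<h[2:4]>/<h>, where h is SHA-256(doc_id).

    Two levels of two hex characters give 256 * 256 = 65,536 leaf directories,
    so 1M evenly hashed files average roughly 15 files per directory.
    Using the hash as the file name also avoids trouble with IDs that
    contain characters such as '/'.
    """
    h = hashlib.sha256(doc_id.encode("utf-8")).hexdigest()
    return root / h[:2] / h[2:4] / h


def save_document(root: Path, doc_id: str, data: bytes) -> Path:
    """Write the document bytes to its sharded location, creating directories as needed."""
    path = shard_path(root, doc_id)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(data)
    return path


def load_document(root: Path, doc_id: str) -> bytes:
    """Read the document bytes back; raises FileNotFoundError if it was never stored."""
    return shard_path(root, doc_id).read_bytes()


if __name__ == "__main__":
    # Demo in a throwaway directory so nothing is left behind.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        path = save_document(root, "invoice-2009-001", b"%PDF-1.4 example bytes")
        print(path.relative_to(root))
        print(load_document(root, "invoice-2009-001"))
```

Because the file name is a hash, the original file name still has to be kept somewhere else (for example alongside the ID in Solr or a small database) if downloads should get a sensible name.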
A: 

In my opinion...

I would store the files compressed on disk (in the file system) and use a database to keep track of them, possibly SQLite if that is its only job.

Mark Redman
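A minimal sketch of this approach using only Python's standard library, assuming gzip for compression and a single SQLite table as the index. The table name `documents`, its columns, and the helper functions are hypothetical. Files are written to one flat directory for brevity; with around a million documents you would likely split them across subdirectories. Also note that formats which are already compressed, such as .docx (a ZIP container) and many PDFs, may shrink very little under gzip.

```python
import gzip
import hashlib
import sqlite3
import tempfile
from pathlib import Path
from typing import Tuple

SCHEMA = """
CREATE TABLE IF NOT EXISTS documents (
    id       TEXT PRIMARY KEY,  -- document ID (e.g. the one indexed in Solr)
    filename TEXT NOT NULL,     -- original name, used when the user downloads it
    path     TEXT NOT NULL,     -- location of the compressed file on disk
    size     INTEGER NOT NULL   -- uncompressed size in bytes
)
"""


def open_index(db_path: Path) -> sqlite3.Connection:
    """Open (or create) the SQLite index that tracks the files on disk."""
    conn = sqlite3.connect(str(db_path))
    conn.execute(SCHEMA)
    conn.commit()
    return conn


def store(conn: sqlite3.Connection, root: Path, doc_id: str,
          filename: str, data: bytes) -> None:
    """Gzip the bytes to disk, then record (or replace) the document's row."""
    path = root / (hashlib.sha256(doc_id.encode("utf-8")).hexdigest() + ".gz")
    path.write_bytes(gzip.compress(data))
    with conn:  # commits on success, rolls back on an exception
        conn.execute(
            "INSERT OR REPLACE INTO documents (id, filename, path, size) "
            "VALUES (?, ?, ?, ?)",
            (doc_id, filename, str(path), len(data)),
        )


def fetch(conn: sqlite3.Connection, doc_id: str) -> Tuple[str, bytes]:
    """Look up the document by ID and return (original filename, decompressed bytes)."""
    row = conn.execute(
        "SELECT filename, path FROM documents WHERE id = ?", (doc_id,)
    ).fetchone()
    if row is None:
        raise KeyError(doc_id)
    filename, path = row
    return filename, gzip.decompress(Path(path).read_bytes())


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        conn = open_index(root / "index.db")
        store(conn, root, "doc-42", "report.pdf", b"%PDF-1.4 example bytes")
        name, data = fetch(conn, "doc-42")
        print(name, len(data))  # report.pdf 22
        conn.close()
```

Keeping only metadata in the database keeps it small, while the bulk bytes live where the file system can serve them directly.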
A: 

File system: looking at the big picture, a DBMS stores its own data on the file system anyway. The file system is dedicated to keeping files, so you benefit from its optimizations (as LukeH mentioned).

Chathuranga Chandrasekara