A customer needs a document management system, and I'm gathering information about how to build one.
I know about SharePoint and Alfresco, but in this case I'm evaluating what it would take to build one from scratch, so please refrain from suggesting either of them (we are evaluating them separately; this question is about developing a system, not deploying an existing solution).
These are the requirements:
- There is one very specific requirement for the legal management of the documents, imposed by our local government. Apart from that:
- Operation similar to Google Docs from the end user's point of view
- Must store documents for 200+ end users (UPDATE: it is actually 700+ end users)
- Mainly office documents, PDF, and text. I already have plain-text extraction working for these binary files.
- No wiki and no portal creation; at most a very simple workflow. It is only file management
- Central repository, shared across the company, integrated with Active Directory
- Fast searching
- Transparent desktop integration
- Web interface
- Multiplatform, if possible
So, these are the things I have off the top of my head:
- Storage: I know that SharePoint saves everything in the DB (Alfresco too?). That is a nightmare, IMHO. I prefer to put the metadata in a DB and the files on disk.
I'm thinking about requiring ZFS in this case and leveraging its capabilities for versioning, snapshots, and scaling. Or maybe using git as the storage backend (will git work fine?)
So, where can I learn more about how to handle a large pool of documents, on ZFS or any regular file system? For example, how to lay out the folder structure for easy management, fast responses, easy backup, etc.
- Metadata: I'm thinking of a regular DB here, but I wonder whether there is more merit in saving everything in Lucene (I have some experience with Lucene, but I worry because Lucene can't be federated, right?).
If I use a search engine as the metadata database I can save some work (no need for a second pass for indexing), but a regular database engine is more standard.
- Tech: I will probably build this with Django, PyLucene, and PostgreSQL, and do the shell integration for Windows myself (I have no problem doing that).
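
To make the storage idea concrete, here is a minimal sketch of the "metadata in a DB, files on disk" layout. It is only an illustration under assumptions that are not part of the requirements: files are named by the SHA-256 hash of their content and spread over two levels of subdirectories so that no single folder grows huge, and SQLite (from the Python standard library) stands in for PostgreSQL. The names `shard_path`, `store_document`, `versions`, `open_db`, and the `document` table are all hypothetical.

```python
import hashlib
import sqlite3
from pathlib import Path


def shard_path(root: Path, digest: str) -> Path:
    """Map a hex digest to root/ab/cd/<digest> to keep directories small."""
    return root / digest[:2] / digest[2:4] / digest


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open the metadata database (SQLite here as a stand-in for PostgreSQL)."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS document ("
        " id INTEGER PRIMARY KEY,"
        " name TEXT NOT NULL,"
        " sha256 TEXT NOT NULL,"
        " size INTEGER NOT NULL)"
    )
    return db


def store_document(root: Path, db: sqlite3.Connection, name: str, data: bytes) -> str:
    """Write the bytes to disk under their hash and record one metadata row."""
    digest = hashlib.sha256(data).hexdigest()
    target = shard_path(root, digest)
    if not target.exists():  # identical content is stored only once
        target.parent.mkdir(parents=True, exist_ok=True)
        tmp = target.with_suffix(".tmp")
        tmp.write_bytes(data)
        tmp.replace(target)  # rename within the same directory
    db.execute(
        "INSERT INTO document (name, sha256, size) VALUES (?, ?, ?)",
        (name, digest, len(data)),
    )
    db.commit()
    return digest


def versions(db: sqlite3.Connection, name: str) -> list[str]:
    """Return the content hashes saved under a name, oldest first."""
    rows = db.execute(
        "SELECT sha256 FROM document WHERE name = ? ORDER BY id", (name,)
    )
    return [row[0] for row in rows]
```

In this sketch every save of a document adds a metadata row, so the version history lives in the database while the file tree only holds unique content. Saving the same bytes twice costs no extra disk space, and an incremental backup only has to copy files it has not seen before. Whether this is better than relying on ZFS snapshots or git is exactly what I am unsure about.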
I would appreciate any hints or information on how to properly implement this solution.