I am planning to build a simple document management system, preferably on the Java platform. Are there any best practices around this? The requirements are:

  1. Ability to upload documents
  2. Ability to Tag documents
  3. Version the documents
  4. Comment on documents

There are a couple of options that I am currently considering. The first option would be a simple API on top of SVN or CVS, with a DB backend to track tags, uploader, comments, etc.

Another option is to use the filesystem: keep each version of a document as a copy in a versions folder and encode the version in the filename.
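
To make the filesystem option concrete, here is a minimal sketch of what "copies in a versions folder" could look like. The class name `FileVersionStore`, the `versions/<name>/v<N>` layout, and the method names are all hypothetical, chosen only for illustration. Tags, uploader, and comments would still need a separate store.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

/** Hypothetical helper: keeps every upload as a numbered copy under <root>/versions/<name>/v<N>. */
public class FileVersionStore {
    private final Path root;

    public FileVersionStore(Path root) {
        this.root = root;
    }

    /** Copies the upload in as the next version and returns its 1-based version number. */
    public synchronized int store(String name, Path upload) throws IOException {
        Path dir = root.resolve("versions").resolve(name);
        Files.createDirectories(dir);
        int next = latestVersion(name) + 1;
        // No REPLACE_EXISTING: an existing version is never overwritten.
        Files.copy(upload, dir.resolve("v" + next));
        return next;
    }

    /** Returns the highest stored version number, or 0 if the document is unknown. */
    public int latestVersion(String name) throws IOException {
        Path dir = root.resolve("versions").resolve(name);
        if (!Files.isDirectory(dir)) {
            return 0;
        }
        try (Stream<Path> files = Files.list(dir)) {
            return files.map(p -> p.getFileName().toString())
                        .filter(f -> f.matches("v\\d+"))
                        .mapToInt(f -> Integer.parseInt(f.substring(1)))
                        .max()
                        .orElse(0);
        }
    }

    /** Path of a specific stored version (it may not exist). */
    public Path path(String name, int version) {
        return root.resolve("versions").resolve(name).resolve("v" + version);
    }
}
```

The sketch only synchronizes within one JVM. If several processes write to the same folder, the version number would need to be allocated some other way.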

Or, if there is an open-source, non-GPL document management system, we could customize it to our needs and package it in our application. Does anybody have any experience building something like this?

A: 

The best way is to reuse the efforts of others. This particular wheel has been invented quite a few times.

Who will use this and for what purpose?

Thorbjørn Ravn Andersen
It will be part of a larger collaboration stack and will serve as a platform for knowledge management.
Ritesh M Nayak
Have you considered just using a wiki?
Thorbjørn Ravn Andersen
Well, we already have a wiki. This is to address legacy document store needs.
Ritesh M Nayak
A: 

Take a look at the many document-oriented database systems out there. I can't speak about MongoDB or any of the others, but my experience with CouchDB has been fantastic.

http://couchdb.apache.org/

The best part of it is that you communicate with it over a REST (HTTP) API.

WeNeedAnswers
Can I actually store documents on it, I mean Word documents, PDFs, text files, etc.? Isn't it just a document-oriented database?
Ritesh M Nayak
You can store anything in it. The database then holds the metadata that you're after.
WeNeedAnswers
http://blog.couchone.com/post/632718824/simple-document-versioning-with-couchdb
WeNeedAnswers
I read through the article. CouchDB functions well as a document-oriented datastore. A document in Couch terms is structured JSON stored with versioning information. What I am talking about is storage of files: PDFs, PPTs, Word documents, etc., which cannot be JSON-ized and stored.
Ritesh M Nayak
Yes, you can store these documents in CouchDB. They get stored as binary attachments alongside the other JSON elements.
WeNeedAnswers
http://wiki.apache.org/couchdb/HTTP_Document_API
WeNeedAnswers
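
As a rough illustration of the attachment approach mentioned above, the sketch below only builds (it does not send) a `PUT` request of the form `/{db}/{docid}/{attachment}?rev=...` using the JDK's `java.net.http` classes. The class name `CouchAttachmentRequest`, the database name, document ID, and revision value are made up for the example, and the IDs are assumed to be URL-safe, so no encoding is done.

```java
import java.net.URI;
import java.net.http.HttpRequest;

/** Hypothetical helper that builds a CouchDB-style attachment upload request. */
public class CouchAttachmentRequest {

    /**
     * Builds a PUT for {baseUrl}/{db}/{docId}/{attachmentName}.
     * rev is the current document revision, or null when the document does not exist yet.
     * Assumes db, docId and attachmentName need no URL encoding.
     */
    public static HttpRequest build(String baseUrl, String db, String docId,
                                    String attachmentName, String rev,
                                    String contentType, byte[] content) {
        String url = baseUrl + "/" + db + "/" + docId + "/" + attachmentName
                + (rev == null ? "" : "?rev=" + rev);
        return HttpRequest.newBuilder(URI.create(url))
                .header("Content-Type", contentType)
                .PUT(HttpRequest.BodyPublishers.ofByteArray(content))
                .build();
    }
}
```

Sending it would be a matter of passing the request to `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())` against a running server, for example one at `http://localhost:5984`, the default CouchDB port.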
The documents get stored as binary alongside the string-formatted JSON. Where it is really nifty, though, is that the binary data is attached to the JSON document itself, which means that you get replication, REST access, and fast B-tree-based queries.
WeNeedAnswers
One thing I will add, mind, is that the version control they speak of has nothing to do with version control as we speak of it for code. It's there to solve the problem of CRUD without actually overwriting data. It's very clever. I don't think it would be too difficult to create a versioning app in CouchDB, though: just create a new JSON structure with the attachment, and change the JSON metadata to reflect what has changed in the document since the last version. You could then do the delta compare using some embedded diff tool, but my experience of this with PDF and Word documents has not been good.
WeNeedAnswers
+1  A: 

You may want to take a look at the Content Repository API for Java (JCR) and its several implementations (some of them free).

renick
We are looking at Jackrabbit for this purpose. Would you suggest something else that is open but can be packaged with a commercial solution?
Ritesh M Nayak
For commercial packaging it's better to stay with the Apache License, so you can look at projects using Jackrabbit (such as Hippo Repository).
renick