I have an application that generates around 10,000 printed pages per month. Each report (around 2,000 per month) is archived as a PDF on a simple network file share. I am looking for a document management system that meets the following requirements:

  • watch the archive folder and update the index either on a regular basis or when changes are detected
  • provide an intranet web page where users can search documents by filename, timespan and other relevant file attributes
  • full-text search
  • can handle large/substantially growing archives

To be clear, I am looking for a pre-built solution here; commercial products are acceptable.

+1  A: 

I can suggest Google Docs. AFAIK it can handle all your requirements.

whoi
I thought Google Docs was only some kind of "online Word"? Could you please elaborate?
Johannes Rudolph
It is a late response, but yes: it can import many document types supported by MS Office and OpenOffice and create online versions of them. You can also create new ones.
whoi
+1  A: 

This is a very vague question and I'm not quite sure how to respond.

It looks like you want a way to index all your files and ensure that the information is kept up to date in the database. I suggest you look into search servers such as:

Sphinx

Solr

Both take some setup, but they handle all your requirements: they can easily be set up to watch a folder and keep your index up to date, they provide great full-text search, they can be accessed via an intranet web page if you set up a page to search your database, and they are used for enormous operations, so large archives shouldn't be a problem.
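Neither Sphinx nor Solr is shown here. As a rough illustration of what "watch a folder and keep the index up to date" involves, here is a minimal polling sketch in Python. It assumes change detection by comparing file modification times between scans; `snapshot`, `diff_snapshots`, `poll` and the `reindex` callback are hypothetical names, and a real setup would pass the changed paths on to the search server's own indexing mechanism.

```python
import time
from pathlib import Path
from typing import Callable, Dict, List, Optional, Tuple


def snapshot(folder: str, suffix: str = ".pdf") -> Dict[str, float]:
    """Map every file under `folder` whose suffix matches (any case) to its mtime."""
    result: Dict[str, float] = {}
    for path in Path(folder).rglob("*"):
        # Compare suffixes case-insensitively so "REPORT.PDF" is picked up too.
        if path.is_file() and path.suffix.lower() == suffix:
            result[str(path)] = path.stat().st_mtime
    return result


def diff_snapshots(
    old: Dict[str, float], new: Dict[str, float]
) -> Tuple[List[str], List[str], List[str]]:
    """Return (added, changed, removed) paths between two snapshots."""
    added = sorted(p for p in new if p not in old)
    changed = sorted(p for p in new if p in old and new[p] != old[p])
    removed = sorted(p for p in old if p not in new)
    return added, changed, removed


def poll(
    folder: str,
    reindex: Callable[[List[str], List[str]], None],
    interval: float = 300.0,
    rounds: Optional[int] = None,
) -> None:
    """Rescan `folder` every `interval` seconds and report differences to `reindex`.

    The first scan reports every existing file as new. `rounds` limits the
    number of scans; None means run forever.
    """
    previous: Dict[str, float] = {}
    done = 0
    while rounds is None or done < rounds:
        current = snapshot(folder)
        added, changed, removed = diff_snapshots(previous, current)
        if added or changed or removed:
            # Hypothetical hook: (re)index new/changed files, drop removed ones.
            reindex(added + changed, removed)
        previous = current
        done += 1
        if rounds is None or done < rounds:
            time.sleep(interval)
```

Polling is the simplest portable option for a network share, where file system change notifications are often unreliable; the trade-off is that updates appear only after the next scan.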

If you're looking for a pre-built solution, I'm not sure what to mention.

Bartek
Yes I am looking for a pre-built solution. Thanks for your hints and the suggestions, I have edited my question accordingly.
Johannes Rudolph
+2  A: 

Sounds like Microsoft Search Server 2008 Express would be a good candidate. Free and installs in a couple of minutes.

Jesper Palm
thanks, I will evaluate that.
Johannes Rudolph
After evaluating it, I will settle on it. It's free, support options are available, it's easy to install, and it integrates well with the Microsoft stack of the company it will be deployed to.
Johannes Rudolph
+1  A: 

Plone could work pretty well for your needs. It has plugins for indexing PDF content, and you can customize the metadata. It also has a fantastic web interface with built-in search. The best part is that it's free and easy to use, and if your needs grow, you can pay for support.

My only recommendation (at first glance) is that you store your content on the file system and not in the Zope object database (ZODB). You should store only your metadata and index data in the database. This is a pretty common way of storing large amounts of content in the document management world.
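The pattern described above (documents stay on the file system, only metadata and index data go into a database) is not specific to Plone. As a minimal sketch of it, assuming Python's built-in `sqlite3` instead of the ZODB, the table below stores just a path, a filename and a creation time per PDF, which is enough for the filename and timespan searches from the question. The table layout and the functions `open_index`, `add_document` and `search` are hypothetical.

```python
import sqlite3
from datetime import datetime
from typing import List


def open_index(db_path: str = ":memory:") -> sqlite3.Connection:
    """Create (if needed) a metadata-only table; the PDFs stay on the share."""
    con = sqlite3.connect(db_path)
    con.execute(
        """CREATE TABLE IF NOT EXISTS documents (
               path     TEXT PRIMARY KEY,  -- location of the PDF on the file share
               filename TEXT NOT NULL,
               created  TEXT NOT NULL      -- ISO 8601, so string order = time order
           )"""
    )
    con.execute("CREATE INDEX IF NOT EXISTS idx_created ON documents(created)")
    return con


def add_document(con: sqlite3.Connection, path: str, filename: str, created: datetime) -> None:
    """Insert or update the metadata row for one archived file."""
    con.execute(
        "INSERT OR REPLACE INTO documents (path, filename, created) VALUES (?, ?, ?)",
        (path, filename, created.isoformat(timespec="seconds")),
    )
    con.commit()


def search(con: sqlite3.Connection, name_part: str, start: datetime, end: datetime) -> List[str]:
    """Return paths whose filename contains `name_part`, created within [start, end]."""
    rows = con.execute(
        # Note: '%' or '_' inside name_part act as LIKE wildcards in this sketch.
        "SELECT path FROM documents "
        "WHERE filename LIKE ? AND created BETWEEN ? AND ? ORDER BY created",
        (f"%{name_part}%", start.isoformat(timespec="seconds"), end.isoformat(timespec="seconds")),
    )
    return [row[0] for row in rows]
```

Because the database holds only small rows pointing at files, it stays compact even as the PDF archive itself grows.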

Hope that helps!

Tom Purl
Indeed it helps. Could you please point me in the right direction for the PDF plugins?
Johannes Rudolph
Check out the following link: http://plone.org/documentation/how-to/integrating-office-files
Tom Purl
A: 

As Tom said, Plone does what you describe. It has built-in full-text search, which for PDFs relies on the command-line program pdftotext being on the path. There are several extensions you may be interested in:

(Sorry, the links are missing due to Stack Overflow's new-user policy.)
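To check the pdftotext requirement, or to reuse the same tool outside Plone, a minimal Python sketch could look like this. It assumes a pdftotext binary from Xpdf or Poppler; the function name `extract_pdf_text` is hypothetical, and this is not how Plone itself invokes the tool.

```python
import shutil
import subprocess


def extract_pdf_text(pdf_path: str, exe: str = "pdftotext") -> str:
    """Return the text of `pdf_path` using the pdftotext command-line tool."""
    # Resolve the program against PATH first, so a missing tool fails clearly.
    tool = shutil.which(exe)
    if tool is None:
        raise FileNotFoundError(f"{exe!r} was not found on the PATH")
    # "-enc UTF-8" fixes the output encoding; "-" as the output file name
    # makes pdftotext write the extracted text to stdout.
    completed = subprocess.run(
        [tool, "-enc", "UTF-8", pdf_path, "-"],
        capture_output=True,
        check=True,  # raise CalledProcessError if pdftotext reports an error
    )
    return completed.stdout.decode("utf-8", errors="replace")
```

The extracted string is what a full-text index would store for each PDF; the file itself can stay on the share.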

Carsten Senger