views:

130

answers:

3

I need to develop a basic .NET document management system with the following specifications:

  1. The data should be portable and self contained, so I will serialize the documents (typical formats include Word, PDF, Excel and Powerpoint) into binary data. I will then store said binary data in a SQL Server 2005 database. When a user needs to download a document, the system will deserialize the binary data and will present it in is original format.

  2. The average row size cannot be bigger than 200k.

  3. We expect a maximum of 500 documents will be uploaded monthly for a period of three years.

  4. We don't expect the size of the database to ever go over 6 GB

  5. We have maximum target of 20,000 people that potentially would access the system at the same time.

My question is: How robust does the technology need to be in order to offer solid performance, prevent site downtime, etc?

I am a novice developer and am not familiar with this kind of architecture and design.

+4  A: 

What's the reason for needing to store the files in the database, instead of just storing the path of the document on a file server or CDN? Would be a lot less load on your DB server, and give you more flexible options for document storage.

If you're having issues with moved/deleted files in a system like the one I suggested, then perhaps also consider other options, such as:

  • Locking permissions on the underlying file system to everyone except the role the application is running under (easiest option)
  • Running a background service which listens for changes to folders etc and updates the database accordingly

In the end, a database-only solution may be simpler, but I wouldn't underestimate the load you may hit upon by storing large files for tens of thousands of users.

Marcus
+1, but I would add that both Oracle 11 and SQL Server offer features where the files are "in" the database, but actually live on a file server. That is, they are managed by the DB. Oracle 11 even goes as far as to offer WebDAV access to the files, which is pretty neat. We've been using SecureFiles for Oracle 11 with great success for very large files (500mb+).
sixlettervariables
+1 for the info - good tip.
Marcus
Hi Marcus. We want to replace precisely that kind of system. We've had many examples of broken links because of documents being removed from the directory and the path not being updated or deleted from DB.
aspNetAficionado
Hi sixlettervariables, is SQL Server 2008 that offers that feature you describe? I'm using 2005 which I'm sure it doesnt. We've looked at that possiblity of upgrading to 2008.
aspNetAficionado
@aspNetAficionado - Edited my response to include these concerns.
Marcus
+4  A: 

This is more than just a "basic" system. So here would be my concerns right off the bat:

  • 500 documents a month for 3 years seems like it could well exceed 6 GB for database size. You may want to determine the maximum document size and see if that calculation holds up.
  • 20,000 users is a lot. How many can you expect at once? If it is more than 100 concurrent users I would start to investigate server clustering/web farms to be able to handle the load
  • just a nit pick, but you won't be "serializing" in the .NET "Serializable" sense. You'll just be storing the raw document bytes in the DB
  • if you need high availability you will need to look at DB replication to another DB instance just in case your DB Server goes down

Lastly. I have to believe that there are off-the-shelf systems that do what you want, and also include more advanced functionality like permission-based access and document revision.

Mike

mjmarsh
Hi Mike for your input. Can you explain ""just a nit pick, but you won't be "serializing" in the .NET "Serializable" sense. You'll just be storing the raw document bytes in the DB "" Thanks
aspNetAficionado
Serializing tends to mean representing a constructed object in another format, such as storing an object class in XML. What you're doing is enveloping one file within another.
overslacked
Oops - that should read "object instance in XML" - not object class.
overslacked
Thanks for this.
aspNetAficionado
Correct. Serializing usually means storing structured object data in a persistent format so that you can load the objects from that format at some later time. If you store your docs in the DB you'll just be reading the raw bytes of the document and storing them in a DB field.
mjmarsh
A: 

An important part of programming is knowing when you're in over your head. If the CTQ's you've posted are real, specifically the concurrent access requirement, then you're in for a world of hurt. Even those of us with quite a lot of time in the trenches are going to be in a world of hurt with that sort of requirement. I'd tackle the problem with the following mindset:

I'm going to get this wrong in more ways that I can currently imagine.

Knowing this much, the simpler you keep this architecture, the more likely it is to scale. However, the company I work for is absolutely massive and I doubt even we have any systems that truely have 20,000 concurrent users. So don't bite off more than you can chew.

Design your architecture to be simple and robust (a tall order) and you'll find it will scale naturally until you eventually need to call in the big guns.

I can suggest that you should at least spend money on access to SQL Server 2008. With that version your problem should be fairly elementary for starters. Use the FILESTREAM storage for the files. No serialization necessary. This will store the files on an NTFS file system and will maximize your ease of programming, maintenance, and scalability.

If you for some reason only have SQL Server 2005, you'll have to deal with BLOBs which isn't exactly difficult, but is somewhat messy. I suggest you read To BLOB or Not to BLOB from Microsoft Research to make the decision if storing the data in SQL Server 2005 is the best bet for you. If so, there are plenty of articles detailing how to put files into SQL Server BLOBs. Just be aware this is rarely the most efficient or scalable solution.

sixlettervariables
Hi, many thanks for this.
aspNetAficionado