I have an interesting problem which I've been looking into and would appreciate some advice:
I'm trying to create a tool which mimics the basic capabilities of a requirements management tool as part of a company project.
The basic design is a Windows Explorer-like setting of folders and documents. Documents can be opened in a GUI, editted, and saved.
The document itself contains a hierarchical spreadsheet (think of Excel with Chapters, if that makes any sense). Each chapter contains rows, which are really just some requirements text + some other values which complement it. When displayed, the requirement text and attribute values show up as independent columns (much like Excel), with filtering capabilities.
Representing the user/permissions/folder hierarchy/etc for this type of program is pretty straightforward, but where I get hung up on is the document content itself...
My biggest concern is size and how it relates to performance: As part of this tool, I not only intended to store the current state of each document, but also the entire list of changes that have been made since day 1 (much like SVN), and then provide fast access to the history of changes.
On average, I expect ~500 documents in the repo; Each document will probably have ~20,000 active rows; Over the course of a year, it's not unreasonable to assume ~20,000 edits (meaning each document itself will acquire an additional 20,000 rows year-in and year-out).
Multiplied by the number of documents, that amounts to nearly 10,000,000 rows (with an additional 10,000,000 the next year, and the next year, and so on). Old histories can be purged, but it would only be performed by an admin (and it's not preferable that he/she do so).
As I see it, there are two ways for me to handle this situation:
I can try to represent a list all rows of all documents in a single table (much like how phpBB stores all posts of all forums in a single table), or...
I can try to store the rows of each document in a uniquely named table (meaning each document has it's own table); The table would have to be given a unique name, and a master table would contain the list of all documents and the table names that correspond to each.
So my question: Which really is preferable? Are neither really good options? Can anyone offer advice on which approach you would find more appropriate, given the needs?