I need to hold a representation of a document in memory, and am looking for the most efficient way to do this.
Assumptions
- The documents can be pretty large, up to 100MB.
- More often than not the document will remain unchanged (i.e. I don't want to do unnecessary up-front processing).
- Changes will typically be quite close to each other in the document (i.e. as the user types).
- It should be possible to apply changes fast (without copying the whole document).
- Changes will be applied in terms of offsets and new/deleted text (not as line/col).
- It needs to work in C#.
Current considerations
- Storing the data as a string. Easy to code, fast to set, very slow to update.
- Array of lines. Moderately easy to code, slower to set (as we have to parse the string into lines), faster to update (as we can insert and remove lines easily, but finding offsets requires summing line lengths).
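To make the second option concrete, here is a rough sketch of what I have in mind. The class and member names (`LineDocument`, `Locate`, and so on) are made up for illustration, and it assumes edits stay within a single line and never add or remove a newline, so it is nowhere near a complete implementation.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch of the "array of lines" option; all names are illustrative.
public sealed class LineDocument
{
    // Each entry keeps its trailing '\n' (if any) so that line lengths sum to the document length.
    private readonly List<string> _lines = new List<string>();

    public LineDocument(string text)
    {
        // Up-front cost: one pass over the text to split it into lines.
        int start = 0;
        for (int i = 0; i < text.Length; i++)
        {
            if (text[i] == '\n')
            {
                _lines.Add(text.Substring(start, i - start + 1));
                start = i + 1;
            }
        }
        _lines.Add(text.Substring(start)); // last line, possibly empty
    }

    // O(number of lines): sum line lengths until the offset falls inside a line.
    private (int Line, int Column) Locate(int offset)
    {
        if (offset < 0) throw new ArgumentOutOfRangeException(nameof(offset));
        int remaining = offset;
        for (int line = 0; line < _lines.Count; line++)
        {
            int length = _lines[line].Length;
            bool isLast = line == _lines.Count - 1;
            // The last line also accepts an offset equal to the end of the document.
            if (remaining < length || isLast)
            {
                if (remaining > length) throw new ArgumentOutOfRangeException(nameof(offset));
                return (line, remaining);
            }
            remaining -= length;
        }
        throw new InvalidOperationException("Unreachable: the list always holds at least one line.");
    }

    // Only the affected line is rebuilt. Assumes 'text' contains no newline.
    public void Insert(int offset, string text)
    {
        var (line, column) = Locate(offset);
        _lines[line] = _lines[line].Insert(column, text);
    }

    // Assumes the deleted range lies within one line and excludes its newline.
    public void Delete(int offset, int count)
    {
        var (line, column) = Locate(offset);
        _lines[line] = _lines[line].Remove(column, count);
    }

    public override string ToString() => string.Concat(_lines);
}
```

Each edit only copies the one line it touches, but `Locate` still walks the list from the start every time, which is the offset-summing cost mentioned above.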
There must be a load of standard algorithms for this kind of thing (it's not a million miles from disk allocation and fragmentation).
Thanks for your thoughts.