Ok, this is a bit of a cheeky question. I want to build a simple text editor (using my own text mode screen handling). I just want a good example of data structures that can be used to represent the text buffer, and some simple examples of char/text insertion/deletion. I can handle all the rest of the code myself (file i/o, console i/o etc). A link to a nice simple editor source would be great (C or C++).
The "Gang of Four" book (Design Patterns) has a GUI-based text editor as it's main source of examples and is a worthwhile book to own.
The general "pure text" editor probably uses ropes, which SGI's STL has an implementation of. Basically, they are a linked list of character buffers. That way, inserting/deleting characters involves changing smaller buffers and a few pointers, instead of storing the entire document in a single buffer and having to shift everything.
check out vim, it's open-source. poke around in it to see how it handles what you want.
A simple approach would be line oriented -- represent the file as an array/vector of char/wchar_t arrays/vectors, one per line. Insertions and deletions work the way you'd expect, although end of line is a special case.
I'd start with that and possibly replace the line data structure with something more efficiently supporting inserts/deletes on long lines after you have everything else working.
This really depends on your design. Couple of years back, I wrote a small editor using curses. I used doubly linked list where each node was a character (quite a wasteful design.. but it makes formatting and screen refresh routines really easy).
Other data structures used by my friends were (this was a homework project): 1)linked list of arrays with each array representing a line. 2)a 2D linked list (just made up that name).. it was a linked list of characters but each character was linked to the character above and below. 3)Array of linked list
However, I would suggest you to go through the source code of some simple editors like pico to see what ds they are using.
Your primary data structure is one to contain the text. Rather than using a long buffer to contain the text, you'll probably want an array of lines because it's faster to insert a character into the middle of a line then it is to insert a character into the middle of a large buffer.
You'll need to decide if your text editor should support embedded formatting. If, for example, you need to use fonts, bolding, underlining, etc, then your data structure will need to include ways of embedding formatting codes within your text. In the good old days of 8-bit characters we could use the upper 8-bits of an integer to store any formatting flags and the lower 8-bits to store the character itself.
The actual code will depend on the language you're using. In C# or C++ you'll probably use an array of strings for the lines. In C you'll have an array of heap-based character arrays.
Separate out the display code from the text handling code as much as possible. The center of your code will be a tight loop something like:
while (editing) {
GetCharacter();
ProcessCharacter();
UpdateDisplay();
}
A more sophisticated editor will use separate threads for the character getting/processing and the display updating.
I used to work for a company whose main product was a text editor. While I mainly worked on the scripting language for it, the internal design of the editor itself was naturally a major topic of discussion.
It seemed like it broke down into two general trains of thought. One was that you stored each line by itself, and then link them together in a linked list or other overall data structure that you were happy with. The advantage was that any line-oriented editing actions (such as deleting an entire line, or moving a line block within a file) were beyond trivial to implement and therefore lightning fast. The down side was that loading and saving the file took a bit more work, because you'd have to traverse the entire file and build these data structures.
The other train of thought at that time was to try to keep hunks of text together regardless of line breaks when they hadn't been changed, breaking them up only as required by editing. The advantage was that an unedited hunk of the file could be blasted out to a file very easily. So simple edits where you load a file, change one line, and save the file, were super fast. The disadvantage was that line-oriented or column-block operations were very time consuming to execute because you would have to parse through these hunks of text and move alot of data around.
We always stuck with the line-oriented design, for whatever that is worth, and our product was considered one of the fastest editors at the time.
This is 2008. Don't write a text editor; you're reinventing fire.
Still here? I'm not sure if this applies or what platforms you plan to support, but the Neatpad series of tutorials is a great place to start thinking about writing a text editor. They focus on Win32 as the basic platform, but many of the lessons learned will apply anywhere.
My favorite solution is the gap buffer, because it's pretty easy to implement and has good amortized efficiency. Just use a single array of characters, with a region designated as the gap. Once you understand the concept, the code follows almost naturally.
You also need an auxilliary array [vector<int>] to track the index of the beginning of each line--so that you can easily extract a particular line of text. The auxilliary array only needs to be updated when the gap moves, or when a newline is inserted/removed.
These two online documents present a small, but useful cornucopia of "well-known" data structures/techniques for text editors.
- Data Structures for Text Sequences describes, and experimentally analyses a few data structures, leaning towards piece tables as the data structure of choice. Net.wisdom however seems to lean towards gap buffers as being more than adequate for text editing, and simpler to implement/debug.
- "The craft of text editing" (www.finseth.com/craft/) is an older work, and addresses more than just data structures, and is oriented towards Emacs-style editors; but the concepts are generally useful.