I've been thinking a lot about writing an editor whose core functionality is compatible with vim, similar to yzis.

The biggest question is what buffer type to use.

Requirements are:

  • the possibility to implement fast syntax highlighting, with regex on top of it.
  • the possibility to have multiple syntax highlightings in a single file, similar to TextMate's scopes.
  • marks that move properly on deletes and inserts, so that they adjust correctly in column, unlike in vim.
  • handling and highlighting files of at least 100 MB without major slowdowns or memory overhead.

Possible buffer types:

  • gap buffers
  • line based editing

I read that gap buffers can cause rather heavy memory fragmentation over long sessions. Also, the emacs syntax highlighting engine is very slow (dunno why, maybe not really related to the buffer type).

So the questions:

  1. What buffer type would be best for a fast programming editor?
  2. What's a fast and complete regex engine? (Maybe this includes the next point.) TextMate uses oniguruma; is that a wise choice?
  3. What's a fast syntax highlighting engine?
  4. About marks and syntax highlighting: how do emacs overlays work, and would they help?

Thanks, Reza

A: 

You can use the Scintilla Class. Your view can be derived from the Scintilla View and it provides the syntax highlighting.

Vinay
+1  A: 

A good text editor should be useful for all kinds of work a programmer might do, and that includes opening files that may sometimes be several gigabytes in size. Therefore I would not recommend a mindset where everything is buffered in RAM.

I would recommend setting up a search tree of slices representing the file, where a single slice may be:

  1. A reference to a range of bytes in the actual file on disk, or
  2. A reference to an edited "page".

When you open a file you start by inserting a single item into the tree, which is simply a range representing the whole file, e.g. for a 10-MiB file:

// key = offset at which the slice starts
std::map<size_t, slice_info> slices;
slices[0].size = 10*1024*1024;  // one slice covering the whole file

When the user edits the file, create a "page" which is some reasonable size, say 4 KiB, around the edit point. The tree is spliced at that point. In the example, the edit point is at 5 MiB:

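size_t const PAGE_SIZE = 4*1024;
// the head slice shrinks to cover everything before the edit point
slices[0].size = 5*1024*1024;
// the new page holds an editable copy of PAGE_SIZE bytes from the file
slices[5*1024*1024].size = PAGE_SIZE;
slices[5*1024*1024].buffer = create_buffer(file, 5*1024*1024, PAGE_SIZE);
// the tail slice covers the rest of the file
slices[5*1024*1024 + PAGE_SIZE].size = 5*1024*1024 - PAGE_SIZE;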
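
To make the splice a bit more concrete, here is a rough sketch of how it could be written as a general function. The fields of slice_info and the helpers find_slice and make_page are made up for illustration, and the page buffer is only filled with placeholder bytes where a real editor would copy the data from the file. It also ignores the fact that keys after the edit point would have to shift once a page grows or shrinks.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <map>
#include <string>

// Hypothetical slice record: either a range of the file on disk or an edited page.
struct slice_info {
    std::size_t size = 0;         // number of bytes covered by this slice
    std::size_t file_offset = 0;  // where those bytes start in the original file
    bool edited = false;          // true once the bytes live in `buffer`
    std::string buffer;           // editable copy (empty for file-backed slices)
};

using slice_map = std::map<std::size_t, slice_info>;  // key = start offset

std::size_t const PAGE_SIZE = 4 * 1024;

// Find the slice containing `pos`. Assumes the map is non-empty and starts at 0.
slice_map::iterator find_slice(slice_map& slices, std::size_t pos) {
    auto it = slices.upper_bound(pos);  // first slice starting after pos
    assert(it != slices.begin());
    return std::prev(it);
}

// Carve an editable page of up to PAGE_SIZE bytes, starting at `pos`,
// out of the file-backed slice that contains `pos`.
slice_map::iterator make_page(slice_map& slices, std::size_t pos) {
    auto it = find_slice(slices, pos);
    if (it->second.edited) return it;  // already an editable page

    std::size_t const start = it->first;
    slice_info const whole = it->second;
    assert(pos < start + whole.size);
    std::size_t const page_len = std::min(PAGE_SIZE, start + whole.size - pos);
    std::size_t const tail_len = start + whole.size - pos - page_len;

    if (pos > start) {
        it->second.size = pos - start;  // head keeps the part before the edit
    } else {
        slices.erase(it);               // the page starts where the slice did
    }

    slice_info page;
    page.size = page_len;
    page.file_offset = whole.file_offset + (pos - start);
    page.edited = true;
    page.buffer.assign(page_len, '\0');  // placeholder for the copied file bytes
    auto page_it = slices.emplace(pos, page).first;

    if (tail_len > 0) {
        slice_info tail;
        tail.size = tail_len;
        tail.file_offset = page.file_offset + page_len;
        slices[pos + page_len] = tail;   // remainder stays file-backed
    }
    return page_it;
}
```

Calling make_page(slices, 5*1024*1024) on the single 10-MiB slice produces the same three entries as the hand-written example above.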

You can use memory-mapped files both for the read-only buffer (the source file) and for the copied editable buffers (the latter would be placed in a temp directory). This also allows recovery should the editor crash.

Using fixed-size pages will reduce fragmentation of the memory heap a lot since all blocks have the same size, and inserting text will never require moving more than 4 KiB of data ahead of you.

This is a simplified description to give the general idea without getting into too many gritty details. A real implementation would most likely need to be more sophisticated, e.g. allow for a variable amount of data in a page to cope with pages that overflow, and merge together many small slices so that running a regex substitution across a large file does not create too many small buffers. There probably needs to be a limit for the number of slices you should have in the tree simultaneously, but a key point is that when you start inserting somewhere you should make sure that you are working with a slice that isn't too big.
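
As an illustration of the merging step only, here is a minimal sketch that coalesces neighbouring pages whenever their combined contents still fit in one page. Pages are modelled as plain std::string values, and PAGE_CAPACITY and merge_small_pages are names invented for this example; a real editor would operate on its slice tree instead.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

std::size_t const PAGE_CAPACITY = 4 * 1024;  // assumed upper bound per page

// Merge neighbouring pages whenever their combined contents fit in one page.
// The order of the text is preserved; only the page boundaries change.
std::vector<std::string> merge_small_pages(std::vector<std::string> const& pages) {
    std::vector<std::string> merged;
    for (std::string const& page : pages) {
        if (!merged.empty() &&
            merged.back().size() + page.size() <= PAGE_CAPACITY) {
            merged.back() += page;   // small enough: fold into the previous page
        } else {
            merged.push_back(page);  // would overflow: start a new page
        }
    }
    return merged;
}
```

Running something like this after a large substitution would keep the number of tiny buffers down, at the cost of copying a few KiB per merge.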

For regex, I don't think performance is much of a problem as long as the whole editor doesn't hang while the regex is running. Try Boost.Regex; it will most likely be fast enough for your needs, and because it works on iterators rather than on one contiguous string, it is generic enough to sit on top of whatever buffering strategy you choose.
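
To show what plugging in a buffering strategy could look like, here is a small sketch that searches a buffer which is not one contiguous array. It uses std::regex from the standard library as a stand-in so that it compiles without Boost; boost::regex_search has an analogous overload taking a pair of bidirectional iterators. A std::deque<char> plays the role of the sliced buffer, and find_first is a name made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <regex>
#include <string>

// std::deque keeps its elements in separate chunks, so it stands in here
// for an editor buffer that is not a single contiguous array.
using text_buffer = std::deque<char>;

// Return the offset of the first match of `pattern`, or -1 if there is none.
std::ptrdiff_t find_first(text_buffer const& text, std::string const& pattern) {
    std::regex const re(pattern);
    // match_results is parameterised on the iterator type of the buffer
    std::match_results<text_buffer::const_iterator> m;
    if (std::regex_search(text.cbegin(), text.cend(), m, re)) {
        return m.position(0);
    }
    return -1;
}
```

For the slice tree, the same idea would need a custom bidirectional iterator that walks from one slice into the next.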

The same applies to syntax highlighting: if you run it in the background, it won't disturb the user much while they are typing. You can use the slice approach to your benefit here:

  • Each slice can have a mutex that can be locked during an editing operation, allowing syntax highlighting or "intellisense" type analysis to run in a background thread.
  • You can store the state of the syntax highlighting engine so that whenever you make edits in a slice you can restart the syntax highlighting from the beginning of that slice, rather than from the beginning of the file.
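
The second point can be sketched roughly as follows, under heavy simplification: the only highlighter state is "am I inside a /* ... */ comment", each slice remembers that state on entry, and a comment delimiter that straddles a slice boundary is not handled. hl_slice, scan and rehighlight_from are hypothetical names for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical slice: its text plus the saved highlighter state on entry.
struct hl_slice {
    std::string text;
    bool starts_in_comment = false;  // inside /* ... */ when the slice begins?
};

// Scan one slice and report whether a block comment is still open at its end.
bool scan(std::string const& text, bool in_comment) {
    for (std::size_t i = 0; i + 1 < text.size(); ++i) {
        if (!in_comment && text[i] == '/' && text[i + 1] == '*') {
            in_comment = true;
            ++i;  // skip the second character of the delimiter
        } else if (in_comment && text[i] == '*' && text[i + 1] == '/') {
            in_comment = false;
            ++i;
        }
    }
    return in_comment;
}

// Re-run the scan starting at slice `first`, using its saved entry state.
// Stops as soon as the state entering the next slice is unchanged, because
// everything after that point would be highlighted exactly as before.
// Returns the number of slices that had to be scanned.
std::size_t rehighlight_from(std::vector<hl_slice>& slices, std::size_t first) {
    std::size_t scanned = 0;
    for (std::size_t i = first; i < slices.size(); ++i) {
        bool const out = scan(slices[i].text, slices[i].starts_in_comment);
        ++scanned;
        if (i + 1 == slices.size()) break;
        if (slices[i + 1].starts_in_comment == out) break;  // later slices unaffected
        slices[i + 1].starts_in_comment = out;
    }
    return scanned;
}
```

An ordinary edit then costs one slice scan, and only an edit that opens or closes a comment ripples into the following slices.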

I am not aware of any freestanding syntax highlighting engines, but they are usually based on regex matching (see e.g. the syntax highlighting files in vim).

flodin
Hi flodin, very nice answer. I never thought of running the regex in the background (and also the syntax highlighting, unless I implement lexers). However, in the end it seems like you're describing the piece table method. My concern is that if you have a lot of uncommitted work, won't the regex get hard to manage?
Reza Jelveh