views:

286

answers:

4

While editing this and that in Vim, I often find that its syntax highlighting (for some filetypes) has some defects. I can't remember any examples at the moment, but someone surely will. Usually it comes down to strings being highlighted badly in some cases, trouble around arithmetic and boolean operators, and a few other small things.

Now, Vim uses regexes (its own flavour) for that kind of thing.

However, I've started to come across editors which, at first glance, have syntax highlighting better taken care of. I've always thought that regexes are the way to go for that kind of stuff.

So I'm wondering: do those editors just have better-written regexes, or do they handle it in some other way? If so, what? How is syntax highlighting done when you want it to be "stable"? And in your opinion, which editor handles it best (your editor of choice), and how does it do it (language-wise)?

Edit-1: For example, editors like Emacs, Notepad2, Notepad++, Visual Studio: do you perchance know what mechanism they use for syntax highlighting?

+1  A: 

I suggest the use of REs for syntax highlighting. If it's not working properly, then your RE isn't powerful or complicated enough :-) This is one of those areas where REs shine.

But given that you couldn't supply any examples of failure (so we can tell you what the problem is) or the names of the editors that do it better (so we can tell you how they do it), there's not a lot more we'll be able to give you in an answer.

I've never had any trouble with Vim with the mainstream languages and I've never had a need to use weird esoteric languages, so it suits my purposes fine.

paxdiablo
@Pax, you would really use REs over a full-blown parser for syntax highlighting? I would have thought this would be one of those cases where you want to use a parser.
Simucal
Parsers are better but they generally have to process more of the source, and are more complex to write. REs (if done right) can be faster and work in the vast majority of circumstances because source tends to have natural checkpoints (e.g., semicolon for C, assuming it's not inside quotes). Keep in mind this is colored by my experience - I've never had to write Forth code in Vim so, for all I know, REs might be crap at that. The languages I do use seem to work fine although I could probably break them if I made my source code look ugly enough.
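As a rough sketch of the "natural checkpoints" idea, assuming a C-like language: the pattern below tries the string alternative before the semicolon, so a `;` inside quotes is swallowed by the string and never counted as a statement end. The names `TOKEN_RE` and `regex_tokens`, and the tiny keyword list, are made up for illustration and are not taken from Vim or any other editor.

```python
import re

# Order matters: the string alternative comes first, so a ';' inside
# quotes is consumed as part of the string, not as a terminator.
TOKEN_RE = re.compile(r'''
    (?P<string>"(?:\\.|[^"\\])*")              # double-quoted string with escapes
  | (?P<number>\b\d+\b)                        # integer literal
  | (?P<keyword>\b(?:static|int|char|return)\b)
  | (?P<terminator>;)                          # statement checkpoint
''', re.VERBOSE)

def regex_tokens(source):
    """Return (kind, text) pairs for every match; unmatched text is skipped."""
    return [(m.lastgroup, m.group()) for m in TOKEN_RE.finditer(source)]

print(regex_tokens('char *s = "a;b";'))
```

Only the final `;` is reported as a terminator here. An unterminated quote would still throw this off, which is the kind of ugly source where a regex-only approach starts to break.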
paxdiablo
+3  A: 

The thought that immediately comes to mind for what you'd want to use instead of regexes for syntax highlighting is parsing. Regexes have a lot of advantages, but as we see with vim's highlighting, there are limits. (If you look for threads about using regexes to analyze XML, you'll find extensive material on why regexes can't do what parsers do.)

Since what we want from syntax highlighting is for it to follow the syntactic structure of the language, which regexes can only approximate, you need to perform some level of real parsing to go beyond what regexes can do. A simple recursive descent lexer will probably do great for most languages, I'm thinking.
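To make "real parsing" a bit more concrete, here is a minimal recursive descent sketch that tags each parenthesis with its nesting depth, something a classic regular expression cannot track for arbitrary depth. The function name `bracket_depths` and the convention of marking a stray `)` with depth -1 are assumptions made for this example only.

```python
def bracket_depths(text):
    """Return (index, bracket, depth) for every parenthesis in text.

    Tiny grammar, parsed by recursive descent:
        items ::= (group | any-other-char)*
        group ::= "(" items ")"
    """
    marks = []

    def items(pos, depth):
        # Consume characters until a ")" or the end of the input.
        while pos < len(text) and text[pos] != ")":
            if text[pos] == "(":
                pos = group(pos, depth)
            else:
                pos += 1
        return pos

    def group(pos, depth):
        marks.append((pos, "(", depth))
        pos = items(pos + 1, depth + 1)
        if pos < len(text):          # items() stopped on a ")"
            marks.append((pos, ")", depth))
            pos += 1
        return pos                   # an unclosed "(" simply runs to the end

    pos = items(0, 0)
    while pos < len(text):           # stray ")" at top level: mark and carry on
        marks.append((pos, ")", -1))
        pos = items(pos + 1, 0)
    return marks

print(bracket_depths("f(a, (b))"))
```

A highlighter could map each depth to a colour and the -1 marks to an error style. Because the sketch recurses once per nesting level, very deeply nested input would hit Python's recursion limit.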

chaos
+1  A: 

If you want accurate highlighting, you need real programming, not regular expressions. Regexes are rarely the answer for anything but trivial tasks. To do highlighting in a better way you need to write a simple parser. Parsers basically have separate components, each of which can do something like identify and consume a quoted string or a number literal. If a component, looking at its given cursor position, can't consume what's underneath, it does nothing. From that you can parse or highlight fairly simply.

Given something like

static int field = 123;

• The first matcher would skip any whitespace before "static". The keyword, literal, etc. matchers would do nothing because handling whitespace is not their thing.

• The keyword matcher, when positioned over "static", would consume that. Because "s" is not a digit, the literal matcher does nothing. The whitespace skipper does nothing as well because "s" is not a whitespace character.

Naturally your loop continues to advance the cursor over the input string until the end is reached. The ordering of your matchers is of course important.

This approach is flexible in that it handles syntactically incorrect fragments, and it is also easy to extend and to reuse individual matchers to support highlighting other languages...
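The matcher loop described above could be sketched roughly like this. It is only an illustration: the matcher names, the `KEYWORDS` set, and the fallback "other" category are assumptions, not a fixed design. Each matcher takes the text and a cursor position and returns the new position, which equals the old one when it "does nothing".

```python
KEYWORDS = {"static", "int", "return"}  # assumed keyword set, for illustration

def match_whitespace(text, pos):
    end = pos
    while end < len(text) and text[end].isspace():
        end += 1
    return end

def match_keyword(text, pos):
    end = pos
    while end < len(text) and (text[end].isalnum() or text[end] == "_"):
        end += 1
    return end if text[pos:end] in KEYWORDS else pos

def match_identifier(text, pos):
    if not (text[pos].isalpha() or text[pos] == "_"):
        return pos
    end = pos
    while end < len(text) and (text[end].isalnum() or text[end] == "_"):
        end += 1
    return end

def match_number(text, pos):
    end = pos
    while end < len(text) and text[end].isdigit():
        end += 1
    return end

def match_string(text, pos):
    if text[pos] != '"':
        return pos
    end = pos + 1
    while end < len(text) and text[end] != '"':
        end += 2 if text[end] == "\\" else 1   # skip escaped characters
    return min(end + 1, len(text))             # unterminated: run to end of input

# Order matters: keywords must be tried before plain identifiers.
MATCHERS = [
    ("whitespace", match_whitespace),
    ("keyword", match_keyword),
    ("identifier", match_identifier),
    ("number", match_number),
    ("string", match_string),
]

def highlight(text):
    """Advance a cursor over text, letting the first matcher that consumes win."""
    tokens, pos = [], 0
    while pos < len(text):
        for kind, matcher in MATCHERS:
            end = matcher(text, pos)
            if end > pos:
                break
        else:
            kind, end = "other", pos + 1       # nothing matched: take one character
        tokens.append((kind, text[pos:end]))
        pos = end
    return tokens

for kind, lexeme in highlight("static int field = 123;"):
    print(kind, repr(lexeme))
```

The single-character fallback is what lets the loop keep going over broken input, and supporting another language mostly means swapping the keyword set or adding a matcher to the list.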

mP
+2  A: 

Some programming languages have a formal definition/specification written in Backus-Naur Form. All*) programming languages can be described in it. All you then need is some kind of parser for the notation.

*) not verified

For instance, C's BNF definition is "only five pages long".

Henrik Paul