I am embarking on some learning and I want to write my own syntax highlighting for files in C++.

Can anyone give me ideas on how to go about doing this?

To me it seems that when a file is opened:

  1. The file would need to be inspected to decide what type of source file it is. Trusting the extension might not be foolproof.

  2. A way to know which keywords/commands apply to which language

  3. A way to decide what color each keyword/command gets

I want to do this on OS X, using C++ or Objective-C.

Can anyone provide pointers on how I might get started with this?

A: 

I think (1) isn't possible, since the only way to tell if a file is valid C++ is to run it through a C++ parser and see if it parses... but if you used that as your standard, you couldn't operate on code that doesn't compile because it is a work-in-progress, which you probably want to do. It's probably best just to trust the extension, as I don't think any other method will work better than that.

You can get a list of C++ keywords here: http://www.cppreference.com/wiki/keywords/start

The colors are up to you (or if you want, you can make them configurable and leave the choice to the user)
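To make the "trust the extension, then look up keywords" idea concrete, here is a minimal sketch. The `Language` enum, `languageForPath`, and `isCppKeyword` are names invented for this example, the extension table is an assumption (`.h` is left out on purpose because it is ambiguous between C, C++ and Objective-C), and the keyword set is deliberately incomplete.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <unordered_map>
#include <unordered_set>

// Hypothetical language tags for this sketch.
enum class Language { Unknown, C, Cpp };

// Guess the language from the file extension (case-insensitive).
Language languageForPath(const std::string& path) {
    // Assumed mapping; extend it to whatever extensions you care about.
    static const std::unordered_map<std::string, Language> byExtension = {
        {"c", Language::C},     {"cpp", Language::Cpp}, {"cc", Language::Cpp},
        {"cxx", Language::Cpp}, {"hpp", Language::Cpp},
    };
    const auto dot = path.find_last_of('.');
    const auto slash = path.find_last_of('/');
    // No dot at all, or the last dot belongs to a directory name.
    if (dot == std::string::npos ||
        (slash != std::string::npos && dot < slash)) {
        return Language::Unknown;
    }
    std::string ext = path.substr(dot + 1);
    std::transform(ext.begin(), ext.end(), ext.begin(), [](unsigned char ch) {
        return static_cast<char>(std::tolower(ch));
    });
    const auto it = byExtension.find(ext);
    return it == byExtension.end() ? Language::Unknown : it->second;
}

// A partial C++ keyword set; a real highlighter would list all of them.
bool isCppKeyword(const std::string& word) {
    static const std::unordered_set<std::string> keywords = {
        "bool",  "break", "case",   "char",   "class",  "const",
        "else",  "for",   "if",     "int",    "return", "struct",
        "void",  "while", "namespace", "template",
    };
    return keywords.count(word) != 0;
}
```

With this, `languageForPath("src/main.CPP")` gives `Language::Cpp`, and anything without a recognized extension falls back to `Language::Unknown`, where you could simply skip highlighting.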

Jeremy Friesner
+3  A: 

Syntax highlighters typically don't go beyond lexical analysis, which means you don't have to parse the whole language into statements and declarations and expressions and whatnot. You only have to write a lexer, which is fairly easy with regular expressions. I recommend you start by learning regular expressions, if you haven't already. It'll take all of 30 minutes.
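As a rough illustration of "a lexer built from regular expressions", here is a small sketch that uses `std::regex` from the C++ standard library to turn source text into HTML. The function names, the CSS class names, and the tiny keyword list are all assumptions made for this example; it only knows about comments, string literals, a few keywords, and integers.

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Escape the characters that are special in HTML text.
std::string escapeHtml(const std::string& text) {
    std::string out;
    for (char ch : text) {
        switch (ch) {
            case '&': out += "&amp;"; break;
            case '<': out += "&lt;"; break;
            case '>': out += "&gt;"; break;
            default:  out += ch;
        }
    }
    return out;
}

// Wrap recognized tokens in <span class="..."> and escape everything else.
std::string highlightToHtml(const std::string& source) {
    // One alternative per token class; the capture group that matched
    // tells us which class the token belongs to.
    static const std::regex token(
        R"re((//[^\n]*|/\*[\s\S]*?\*/))re"                       // 1: comments
        R"re(|("(?:\\.|[^"\\\n])*"))re"                          // 2: strings
        R"re(|\b(int|return|if|else|for|while|class|void)\b)re"  // 3: keywords
        R"re(|\b(\d+)\b)re");                                    // 4: integers
    static const char* const cssClass[] = {"", "comment", "string",
                                           "keyword", "number"};
    std::string html;
    std::size_t last = 0;
    for (std::sregex_iterator it(source.begin(), source.end(), token), end;
         it != end; ++it) {
        const std::smatch& m = *it;
        const std::size_t pos = static_cast<std::size_t>(m.position(0));
        // Text between tokens is copied through unstyled.
        html += escapeHtml(source.substr(last, pos - last));
        for (int group = 1; group <= 4; ++group) {
            if (m[group].matched) {
                html += std::string("<span class=\"") + cssClass[group] +
                        "\">" + escapeHtml(m.str(0)) + "</span>";
                break;
            }
        }
        last = pos + static_cast<std::size_t>(m.length(0));
    }
    html += escapeHtml(source.substr(last));
    return html;
}
```

Because comments and strings come first in the alternation, a `//` inside a string literal is not mistaken for a comment. Character literals, raw strings, and preprocessor lines are not handled here.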

You may want to consider toying with Flex (the lexical analyzer generator; http://flex.sourceforge.net/) as a learning exercise. It should be quite easy to implement a basic syntax highlighter in Flex that outputs highlighted HTML or something.

In short, you would give Flex a set of regular expressions and what to do with matching text, and the generator will greedily match against your expressions. You can make your lexer transition among exclusive states (e.g. in and out of string literals, comments, etc.) as shown here: http://flex.sourceforge.net/manual/How-can-I-match-C_002dstyle-comments_003f.html . Here's a canonical example of a lexer for C written in Flex: http://www.lysator.liu.se/c/ANSI-C-grammar-l.html .
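The state idea is not specific to Flex. As a sketch of what "transitioning among exclusive states" means, here is a hand-written scanner in plain C++ (not Flex syntax, and not generated by it) that records which state each character was consumed in. `LexState` and `scanStates` are names made up for this example, and character literals such as `'"'` are not handled.

```cpp
#include <cstddef>
#include <string>
#include <vector>

enum class LexState { Code, LineComment, BlockComment, String };

// For every character of `src`, report the state it belongs to.
std::vector<LexState> scanStates(const std::string& src) {
    std::vector<LexState> out(src.size(), LexState::Code);
    LexState state = LexState::Code;
    for (std::size_t i = 0; i < src.size(); ++i) {
        const char c = src[i];
        const char next = (i + 1 < src.size()) ? src[i + 1] : '\0';
        switch (state) {
        case LexState::Code:
            if (c == '/' && next == '*') {
                // Consume both characters so "/*/" is not opened and closed.
                state = LexState::BlockComment;
                out[i] = out[i + 1] = state;
                ++i;
                continue;
            }
            if (c == '/' && next == '/') state = LexState::LineComment;
            else if (c == '"') state = LexState::String;
            out[i] = state;
            break;
        case LexState::LineComment:
            if (c == '\n') state = LexState::Code;  // newline ends the comment
            out[i] = state;
            break;
        case LexState::BlockComment:
            out[i] = state;
            if (c == '*' && next == '/') {
                out[i + 1] = state;  // the closing '/' is part of the comment
                ++i;
                state = LexState::Code;
            }
            break;
        case LexState::String:
            out[i] = state;
            if (c == '\\' && next != '\0') {
                out[i + 1] = state;  // skip the escaped character
                ++i;
            } else if (c == '"') {
                state = LexState::Code;
            }
            break;
        }
    }
    return out;
}
```

The states are exclusive in the same sense as in the Flex manual: while the scanner is in `BlockComment`, a `"` does not start a string, and while it is in `String`, `//` does not start a comment.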

Making an extensible syntax highlighter would be the next part of your journey. Although I am by no means a fan of XML, take a look at how Kate syntax highlighting files are defined, such as this one for C++. Your task would be to figure out how you want to define syntax highlighters, then make a program that uses those definitions to generate HTML or whatever you please.
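One possible shape for "definitions as data" is a list of rules that the highlighter walks over, so that adding a language means adding rules rather than code. This is only a sketch under that assumption, and it is not how Kate does it: `HighlightRule`, `StyledSpan`, and `applyRules` are hypothetical names, and in a real tool the rules would be loaded from a definition file instead of being built in code.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical rule record: a style name plus the pattern it applies to.
struct HighlightRule {
    std::string style;   // e.g. a CSS class or a color name
    std::regex pattern;  // what the rule matches
};

// A styled region of the input, as an offset and a length.
struct StyledSpan {
    std::size_t start;
    std::size_t length;
    std::string style;
};

// Walk the text once; at each position the first rule that matches there wins.
std::vector<StyledSpan> applyRules(const std::string& text,
                                   const std::vector<HighlightRule>& rules) {
    std::vector<StyledSpan> spans;
    auto pos = text.cbegin();
    while (pos != text.cend()) {
        // Only accept matches that start exactly at `pos`.
        auto flags = std::regex_constants::match_continuous;
        // Let \b look at the previous character when we are mid-string.
        if (pos != text.cbegin()) flags |= std::regex_constants::match_prev_avail;
        bool matched = false;
        for (const HighlightRule& rule : rules) {
            std::smatch m;
            if (std::regex_search(pos, text.cend(), m, rule.pattern, flags) &&
                m.length(0) > 0) {
                spans.push_back({static_cast<std::size_t>(pos - text.cbegin()),
                                 static_cast<std::size_t>(m.length(0)),
                                 rule.style});
                pos += m.length(0);
                matched = true;
                break;
            }
        }
        if (!matched) ++pos;  // no rule applies: leave this character unstyled
    }
    return spans;
}
```

For example, with a `keyword` rule for `\b(?:int|return)\b` and a `number` rule for `\b\d+\b`, the text `print int 7` yields a `keyword` span for `int` and a `number` span for `7`, while the `int` inside `print` is left alone. The spans can then be rendered as HTML or as text attributes in an editor view.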

Joey Adams
+1  A: 

You may want to look at how GeSHi implements highlighting. In addition, it has a whole bunch of language packs that contain all the keywords you'll ever want.

Dave DeLong
+1  A: 

Assuming that you are using the Cocoa frameworks, you can use Uniform Type Identifiers (UTIs) to determine the file type.

For an overview of the API:

http://developer.apple.com/mac/library/documentation/FileManagement/Conceptual/understanding_utis/understand_utis_intro/understand_utis_intro.html#//apple_ref/doc/uid/TP40001319-CH201-SW1

For a list of known UTIs:

http://developer.apple.com/mac/library/documentation/Miscellaneous/Reference/UTIRef/Articles/System-DeclaredUniformTypeIdentifiers.html#//apple_ref/doc/uid/TP40009259-SW1

The two keys you are probably most interested in would be kUTTypeObjectiveCPlusPlusSource and kUTTypeCPlusPlusHeader.

For the highlighting you might find the information on this page helpful as it discusses syntax highlighting with an NSView and temporary attributes:

http://www.cocoadev.com/index.pl?ImplementSyntaxHighlightingUsingTemporaryAttributes

sosborn