I am writing a simple syntax highlighter in JavaScript, and I need to find a way to test with multiple regular expressions at the same time.

The idea is to find out which comes first, so I can determine the new set of expressions to look for.

The expressions could be something like:

/<%@/, /<%--/, /<!--/ and /<[a-z:-]/

First I tried a strategy where I combined the expressions in groups like:

/(<%@)|(<%--)|(<!--)|(<[a-z:-])/

That way I could find out which alternative matched by checking which group was not undefined. The problem is that this breaks down when some of the subexpressions contain groups or backreferences of their own.
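To make the grouping strategy concrete, here is a minimal sketch of it in JavaScript. The helper name `firstAlternative` and the shape of its return value are made up for illustration; only the combined pattern comes from the question.

```javascript
// Combined pattern from the question: one capturing group per alternative.
const combined = /(<%@)|(<%--)|(<!--)|(<[a-z:-])/;

// Hypothetical helper: reports which alternative matched first in `text`.
function firstAlternative(text) {
  const m = combined.exec(text);
  if (m === null) return null; // none of the alternatives occur
  // m[0] is the whole match; m[1]..m[4] are the per-alternative groups.
  for (let i = 1; i < m.length; i++) {
    if (m[i] !== undefined) {
      return { alternative: i, index: m.index, text: m[i] };
    }
  }
  return null;
}
```

This works only while each alternative is exactly one group. As soon as a subexpression contains a group of its own, every later alternative shifts to a higher group number, and a backreference such as `\1` inside a subexpression no longer points at the group it was written for.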

So my question is this:

Does anyone know a good and reasonable way to look for matches with multiple regular expressions in a string?

+5  A: 

Is there any particular reason why you can't tokenize the input and then test the beginning of each token to see what type it is for the purposes of highlighting? I think you're overthinking this one. A simple cascade of if-elseifs will cover this just fine:

if (token.startsWith("<%@")) {
  // paint it red
}
else if (token.startsWith("<%--")) {
  // paint it green
}
else if (token.startsWith("<!--")) {
  // paint it blue
}
else if (token.matches("^<[a-z:-]")) {
  // paint it black
}

The above is pseudocode and needs to be magically translated into JavaScript. I leave this as an exercise for the reader.
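For reference, one possible JavaScript translation of that cascade is sketched below. The function name `classifyToken` and the idea of returning the colour name as a string are assumptions made for the example; how tokens are produced is left out, as in the pseudocode.

```javascript
// Hypothetical classifier: maps a token to the colour named in the pseudocode.
function classifyToken(token) {
  if (token.startsWith("<%@")) {
    return "red";
  } else if (token.startsWith("<%--")) {
    return "green";
  } else if (token.startsWith("<!--")) {
    return "blue";
  } else if (/^<[a-z:-]/.test(token)) {
    // JavaScript strings have no matches(); RegExp.prototype.test does the job.
    return "black";
  }
  return null; // no rule applies to this token
}
```

The order of the branches matters in the same way as in the pseudocode: the first test that succeeds decides the result.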

Welbog
Though this example is a bit simple, the syntax highlighter is a bit more complex, with dynamic grammars and scopes with the possibility of injecting other grammar rules... and stuff! But yeah, I might be overthinking it a bit. Your solution is a step in the right direction. Thanks
Otey
+2  A: 

ANTLR is an excellent grammar development system. There's a project to build a JavaScript back-end for it at http://code.google.com/p/antlr-javascript/

I agree with Welbog's answer to your regex question, but you can probably learn a lot about implementing JavaScript grammars by looking at the ANTLR-generated ones.

Ken Fox