views:

2573

answers:

5

I've recently come across this piece of JavaScript code:

if (",>=,<=,<>,".indexOf("," + sCompOp + ",") != -1)

I was intrigued, because to write this test I would have done:

if (/(>=|<=|<>)/.test(sCompOp))

Is this just a stylistic difference, or does the author of the other code know something about optimization that I don't? Or perhaps there is a different good reason to do this, or to not use regexes...?

It seems to me that using String.indexOf() for this is a little more difficult to read (but then, I'm quite comfortable with regular expressions), but are there instances where it might be "better" than writing an equivalent regex?

By "better" that might be quicker or more efficient, (although obviously that depends on the browser's JavaScript engine), or some other reason I'm not aware of. Can anyone enlighten me?

+1  A: 

Is it a really old piece of code? It may have been written before regexes were widely used in javascript. Overall it looks like someone was trying to be too clever and "optimized" that statement or came from a C background and wasn't used to regexes. A regular expression can be expensive to use but string concatenation can be too and if it's not in a loop I would go with whichever one is easier to understand (for me, the regex).

sk
+1  A: 

There might have been a noticeable speed difference once upon a time, but it's not the case anymore. I think this is either:

  1. Legacy code from (godless) The Land Before REGEX.
  2. Written by somebody who doesn't know about REGEX or is afraid of it.
Oli
+4  A: 

I ran some tests. The first method is slightly faster, but not by enough to make any real difference even under heavy use... except when sCompOp could potentially be a very long string. Because the first method searches a fixed-length string, its execution time is very stable no matter how long sCompOp gets, while the second method will potentially iterate through the entire length of sCompOp.

Also, the second method will potentially match invalid strings - "blah blah blah <= blah blah" satisfies the test...

Given that you're likely doing the work of parsing out the operator elsewhere, i doubt either edge case would be a problem. But even if this were not the case, a small modification to the expression would resolve both issues:

/^(>=|<=|<>)$/


Testing code:

function Time(fn, iter)
{
   var start = new Date();
   for (var i=0; i<iter; ++i)
      fn();
   var end = new Date();
   console.log(fn.toString().replace(/[\r|\n]/g, ' '), "\n : " + (end-start));
}

function IndexMethod(op)
{
   return (",>=,<=,<>,".indexOf("," + op + ",") != -1);
}

function RegexMethod(op)
{
   return /(>=|<=|<>)/.test(op);
}

function timeTests()
{
   var loopCount = 50000;

   Time(function(){IndexMethod(">=");}, loopCount);
   Time(function(){IndexMethod("<=");}, loopCount);
   Time(function(){IndexMethod("<>");}, loopCount);
   Time(function(){IndexMethod("!!");}, loopCount);
   Time(function(){IndexMethod("the quick brown foxes jumped over the lazy dogs");}, loopCount);
   Time(function(){IndexMethod("<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<");}, loopCount);

   Time(function(){RegexMethod(">=");}, loopCount);
   Time(function(){RegexMethod("<=");}, loopCount);
   Time(function(){RegexMethod("<>");}, loopCount);
   Time(function(){RegexMethod("!!");}, loopCount);
   Time(function(){RegexMethod("the quick brown foxes jumped over the lazy dogs");}, loopCount);
   Time(function(){RegexMethod("<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<");}, loopCount);
}

timeTests();

Tested in IE6, FF3, Chrome 0.2.149.30

Shog9
+1  A: 

I doubt it is a question of performance or optimization. I would suspect the author of that code was simply not comfortable or familiar with regular expressions. Also notice how the comma-separated string isn't split apart in order to leverage object properties - possibly also a case of lack of familiarity with the language.

For example, another way of testing for an operator in a comma-separated list of allowable operators would be to split the comma-separated list of allowable operators and do a one-time initialization of an object with the operators as properties:

var aOps = ">=,<=,<>".split(",");
var allowableOps = {};
for (var iLoop = 0; iLoop < aOps.length; iLoop++) {
  allowableOps[aOps[iLoop]] = true;
} //for

This small initialization overhead would likely be offset by the ability to do speedy lookups:

if (allowableOps[sCompOp]) { ... }

Of course, this could end up being slower overall, but it is arguably a cleaner approach.

J c
I implemented your suggestion as follows: var allowedOperators = {'>=' : true, '<=' : true, '<>' : true};function OpTest(op) { return !!allowedOperators[op]; }...it's roughly twice as fast as either of the other two methods in FF3, but slower in IE6 (which is already 10x slower than FF3).
Shog9
It is cleaner though. I like it. And since it could double as a dispatch table, it might well be an overall improvement regardless of browser.
Shog9
A: 

That reminds me of some early javascript-based getElementsByClassName implementations.

indexOf was much faster than using a regular expression, but the code that made use of indexOf started from the assumption that the developer would separate class names with spaces (and not tabs and line feeds). To be fair, some regexp-based implementations were using \b (word boundary), which is incompatible with the CSS spec (because CSS allows hyphens in class names).

Using indexOf to support getElementsByClassName in IE can really make a difference, because there are no faster alternatives, and the className property's underlying getter/setter conveniently replace tabs and linefeeds with spaces.

Leo