ansaurus

Question

why the third option is better than regex?

Answer 1

+2 A:

The first option involves potentially two string compares.
The second option involves a parse each time.
The third option does a simple hash of the string and then a hash table look up, which is the most efficient in this case, in terms of the amount of work that needs to be done.

The third option also scales better than the other two as more alternative strings are added, because the first two are O(n) and the third is O(1) in the average case.

If we want to talk about which option is prettier / more maintainable, that's a whole separate conversation.

Michael Goldshteyn 2010-10-15 18:40:21

The third option is O(n) also: you have to create hash first. (unless you want cache it manually, of course)

Nikita Rybak 2010-10-15 18:43:53

Of course, it also involves a completely unnecessary memory allocation and still requires the string comparisons for the true case (the hashes *can* be sufficient for the false case, but not the true). It's also difficult for maintainers to read and any performance benefit almost certainly implementation-dependent and not worth the trouble in any but edge cases.

T.J. Crowder 2010-10-15 18:45:11

Yes, but presumably the hash is created once and then reused with a good javascript interpreter, because the values are constant. I do agree with T.J. Crowder on it being more difficult to read, though.

Michael Goldshteyn 2010-10-15 18:45:21

@Michael: So, not the one in the most popular browser on the planet, then.

T.J. Crowder 2010-10-15 18:46:06

@TJ, There's always IE9 :) .

Michael Goldshteyn 2010-10-15 18:46:49

@Michael: We're all hoping, yeah. :-)

T.J. Crowder 2010-10-15 21:22:22

Answer 2

A:

The object literal lookup is optimized with a hash lookup which only requires one logical check instead of n. In a longer list you also won't have to repeat "type == " a zillion times.

Peter DeWeese 2010-10-15 18:41:28

but you have to allocate and create the object

olliej 2010-10-15 19:15:20

Yes, but with a well implemented JS interpreter it doesn't have to be created multiple times. I'm not saying that it is necessarily implemented well.

Peter DeWeese 2010-10-15 20:13:02

You can't not allocate the object every time, as the object can escape. For example:

olliej 2010-10-16 01:05:37

erk, hitting enter in comments submits why?

olliej 2010-10-16 01:05:57

Object.prototype.__defineGetter__("wibble", function(){this.x = old.x+1; old=this;}); then if ({....}[someUnknownValue]) { ... } can escape and you have to create a new object. A lot of the slowness in JS is caused by the _possibility_ of someone doing something stupid.

olliej 2010-10-16 01:08:43

Answer 3

+6 A:

I'll humbly disagree with Rebecca Murphey and vote for simplicity, for the first option.

I think regex is pretty fast
Machine code is even faster, but we don't use it.

the third option is confusing
It's only confusing if you're unfamiliar with the trick. (And for people not used to seeing regex to compare two strings, second option will be even more confusing.)

Nikita Rybak 2010-10-15 18:41:42

Answer 4

A:

For simplicity and readability the first will win every time. It might not be as fast, but who cares unless it is in a heavily run loop.

Good compilers should optimize things like this away.

Patrick 2010-10-15 18:44:43

what are they optimising away? This is javascript is surprisingly difficult to optimise things out.

olliej 2010-10-15 19:14:55

When you are comparing two items you could argue the readability is better in the first one but if you have 4 or more it is going to start to fall onto more than one line and readability becomes an issue.

Toby 2010-10-15 19:27:34

@olliej: Lots of things are difficult to optimize in dynamic languages like JavaScript, but not `if (type == 'foo' || type == 'bar')` :-)

T.J. Crowder 2010-10-15 21:23:45

@T. J. Crowder: what are you going to optimise out? You have to do the checks (note i am assuming that strings are interned so you can typically get away with a pointer compare)

olliej 2010-10-16 01:10:18

I took a graduate level CS course where I optimized the heck out of a C++ program. My optimized program faired worse than what the compiler could do with "standard" code. Everyone in the class was the same way. Algorithm design is far more important then agonising over a single line or two. C++ is not Javascript, but the more vanilla your code is the easier for machine and human to work with.

Patrick 2010-10-18 19:48:32

Answer 5

+1 A:

The first case should really be done with === to avoid any type coercions, but depending on the number of alternatives you need to check it can become O(N), however depending on your code most JS engines will be able to a simple pointer check for the comparison.

In the second case you use a RegExp, and while RegExps are very fast, they tend to be slower for simple equality decisions than more direct equality comparisons. Simple string comparisons like yours are likely to be a pointer compare in a modern JS engine, but if you use a regexp the regexp must read every character.

The third case is more tricky -- if you do have a lot of values to check it may be faster, especially if you cache the object rather than repeatedly recreating it as it will simply be a hash lookup -- the exact performance of the lookup depends on the engine though.

I suspect a switch statement would beat the object literal case though.

Out of curiosity I made a test (which you can see here), the fastest approach (in a webkit nightly at least) seems to be a switch statement, followed by if, followed by the object, with regexp's last.

olliej 2010-10-15 19:13:43

+1, I honestly don't understand how she can call case 2 or 3 "best practices" and I really hope we won't start seeing everyone doing simple comparisons with `regexp`

David Titarenco 2010-10-15 19:43:44

Answer 6

+5 A:

I just made a rudimentary benchmark and I'm honestly not sure how she got those results... http://jsbin.com/uzuxi4/2/edit

Regex seems to scale the best, but the first is by far the fastest on all modern browsers. The last is excruciatingly slow. I understand the complexity theory between the three, but in practice, it doesn't seem that she's correct.

Let alone the fact that the first also has the best readability, it also seems to be the fastest. I even nested loops to take advantage of any browser caching of literal tables or constants (to no avail).

Edit: It appears that when an object is explicitly created, she is indeed correct, however: http://jsbin.com/uzuxi4/4/edit

function __hash() {
  ...

  var type = 'bar';
  var testobj = { foo : 1, bar : 1 };
  var c = 0;
  for (i = 0; i < 1000; i++) {
    if (testobj[type]) {
      for (j = 0; j < 10000; j++) {
          if (testobj[type]) { c++; }
      }
    }
  }

  ...
}

We see that once the object has an internal reference, the seek time drops to about 500 ms which is probably the plateau. Object key lookup may be the best for larger data-sets, but in practice I don't really see it as a viable option for every-day use.

David Titarenco 2010-10-15 19:16:12

She's ignoring the cost of what she's doing -- a [] access to an object is not cached in any modern engine, and of course you have to allocate that object (and then pay gc cost later).

olliej 2010-10-15 19:37:25

My brief testing showed switch to be the fastest :D

olliej 2010-10-15 19:37:54

Yes, but of course! Switch ftw :)

David Titarenco 2010-10-15 19:40:34

Answer 7

A:

Just wanted to weigh in here and remind everyone that this is an open-source book with contributions from many people! The section being discussed, indeed, is based on content provided by a community member. If you have suggestions for improving the section, by all means, please open an issue on the repository, or better, fork the repo and send me a pull request :)

That said, I have just set up a jsPerf test (http://jsperf.com/string-tests), and at least in Chrome, the results are the opposite of what the book says. I've opened an issue on the book, and will try to deal with this in the near future.

Finally, two things:

I want to echo what another commenter said: perf optimizations are fun to talk about, and while there are some that really do matter, many don't. It's important to keep perspective on how much -- or little -- of a difference stuff like this makes.
I also want to echo the commenter who said, essentially, that readability is in the eyes of the beholder. Something confusing to one person may be perfectly clear to another. I do believe we should strive for readability, but I think there's a happy medium. Reading code that was a bit perplexing to me at first opened my eyes to a lot of great techniques; I'd have hated if it had been written so the complete newb that I was at the time could understand it.

rmurphey 2010-10-16 12:38:50

ansaurus

tags:

views:

answers:

why the third option is better than regex?

related questions