ansaurus

Question

High-performance way to compile regular expressions in AS3?

Answer 1

+2 A:

For the given scenario, I wrote a test class which gives me all the info I need on which type of regular expression to use:

package {
    import flash.utils.getTimer;
    import flash.text.TextFormat;
    import flash.text.TextField;
    import flash.display.Sprite;
    public class RegExpTest extends Sprite {

        private var textfield:TextField;

        public function RegExpTest() {
            this.textfield = new TextField();
            this.textfield.x = this.textfield.y = 10;
            this.textfield.width = stage.stageWidth - 20;
            this.textfield.height = stage.stageHeight - 20;
            this.textfield.defaultTextFormat = new TextFormat("Courier New");

            this.addChild(textfield);

            this.runtests();
        }

        private function runtests():void {
            output("Running Tests");
            output("-------------------------------");
            test("test1", 50000);
            test("test2", 50000);
            test("test3", 50000);
            test("test4", 50000);
            test("test5", 50000);
            test("test6", 50000);
        }

        private function test(methodName:String, iterations:int = 100):void {
            output("Testing method: " + methodName + ", " + iterations + " iterations...");    

            var wholeTimeStart:Number = getTimer();
            var iterationTimes:Array = [];

            for (var i:uint = 0; i < iterations; i++) {
                var iterationTimeStart:Number = getTimer();

                var tester:RegExpTester = new RegExpTester();
                // run method.
                (tester[methodName] as Function).apply();

                var iterationTimeEnd:Number = getTimer(); 
                iterationTimes.push(iterationTimeEnd - iterationTimeStart);
            }

            var wholeTimeEnd:Number = getTimer();

            var wholeTime:Number = wholeTimeEnd - wholeTimeStart;

            var average:Number = 0;
            var longest:Number = 0;
                var shortest:Number = int.MAX_VALUE;

            for each (var iteration:int in iterationTimes) {
                average += iteration;

                if (iteration > longest)
                    longest = iteration;

                if (iteration < shortest)
                    shortest = iteration;
            }

            average /= iterationTimes.length;

            output("Test Complete.");
            output("\tAverage Iteration Time: " + average + "ms");
            output("\tLongest Iteration Time: " + longest + "ms");
            output("\tShortest Iteration Time: " + shortest + "ms");
            output("\tTotal Test Time: " + wholeTime + "ms");
            output("-------------------------------");
        }

        private function output(message:String):void {
            this.textfield.appendText(message + "\n");
        }

    }
}

class RegExpTester {

    private static const expression4:RegExp = /.*a$/;

    private static const expression3:RegExp = new RegExp(".*a$");

    private var value:String = "There is a wonderful man which is quite smelly.";

    private var expression1:RegExp = new RegExp(".*a$");

    private var expression2:RegExp = /.*a$/;

    public function RegExpTester() {

    }

    public function test1():void {
        var result:Array = value.split(expression1);
    }

    public function test2():void {
        var result:Array = value.split(expression2);
    }

    public function test3():void {
        var result:Array = value.split(new RegExp(".*a$"));
    }

    public function test4():void {
        var result:Array = value.split(/.*a$/);
    }

    public function test5():void {
        var result:Array = value.split(expression3);
    }

    public function test6():void {
        var result:Array = value.split(expression4);
    }

}

The results retrieved by running this example are as follows:

Running Tests
-------------------------------
Testing method: test1, 50000 iterations...
Test Complete.
Average Iteration Time: 0.0272ms
Longest Iteration Time: 23ms
Shortest Iteration Time: 0ms
Total Test Time: 1431ms
-------------------------------
Testing method: test2, 50000 iterations...
Test Complete.
Average Iteration Time: 0.02588ms
Longest Iteration Time: 5ms
Shortest Iteration Time: 0ms
Total Test Time: 1367ms
-------------------------------
Testing method: test3, 50000 iterations...
Test Complete.
Average Iteration Time: 0.0288ms
Longest Iteration Time: 5ms
Shortest Iteration Time: 0ms
Total Test Time: 1498ms
-------------------------------
Testing method: test4, 50000 iterations...
Test Complete.
Average Iteration Time: 0.0291ms
Longest Iteration Time: 5ms
Shortest Iteration Time: 0ms
Total Test Time: 1495ms
-------------------------------
Testing method: test5, 50000 iterations...
Test Complete.
Average Iteration Time: 0.02638ms
Longest Iteration Time: 5ms
Shortest Iteration Time: 0ms
Total Test Time: 1381ms
-------------------------------
Testing method: test6, 50000 iterations...
Test Complete.
Average Iteration Time: 0.02666ms
Longest Iteration Time: 10ms
Shortest Iteration Time: 0ms
Total Test Time: 1382ms
-------------------------------

Interesting to say the least. It seems that our spread really isn't too big and that the compiler is probably doing something behind the scenes to statically compile regular expressions. Food for thought.

TK Kocheran 2010-10-12 15:02:14

Answer 2

+3 A:

Your test is not very good. Here's why:

getTimer measures time, but cpu-time actually matters. If at some moment, for some reason the scheduler decides not to run the flash player, then you have less cpu-time within the same time frame. This is why results vary. It is the best you can use, but actually not much of a help if you're trying to track deviations of a few %.
The deviation is really small. About 8%. Part of it stems from the effect described in point 1. When I ran the tests, results did vary in fact. 8% can come from anywhere. They may even simply depend on your machine, OS or minor player version or whatever. The result is just not signifficant enough to rely on. Also 8% speedup is not really worth consideration unless you find, there's a horrible bottleneck in your string processing that can be fixed by this or that trick with RegExps
Most imporantly: the difference you measure has nothing to do with regular expressions, only with everything else in the test.

Let my explain this in detail.
Try public function test7():void{}. On my machine it takes about 30%-40% of the other tests. Let's have some numbers:

    Running Tests
    -------------------------------
    Testing method: test1, 50000 iterations...
    Test Complete.
        Average Iteration Time: 0.01716ms
        Longest Iteration Time: 5ms
        Shortest Iteration Time: 0ms
        Total Test Time: 901ms
    -------------------------------
    Testing method: test2, 50000 iterations...
    Test Complete.
        Average Iteration Time: 0.01706ms
        Longest Iteration Time: 5ms
        Shortest Iteration Time: 0ms
        Total Test Time: 892ms
    -------------------------------
    Testing method: test3, 50000 iterations...
    Test Complete.
        Average Iteration Time: 0.01868ms
        Longest Iteration Time: 5ms
        Shortest Iteration Time: 0ms
        Total Test Time: 969ms
    -------------------------------
    Testing method: test4, 50000 iterations...
    Test Complete.
        Average Iteration Time: 0.01846ms
        Longest Iteration Time: 5ms
        Shortest Iteration Time: 0ms
        Total Test Time: 966ms
    -------------------------------
    Testing method: test5, 50000 iterations...
    Test Complete.
        Average Iteration Time: 0.01696ms
        Longest Iteration Time: 5ms
        Shortest Iteration Time: 0ms
        Total Test Time: 890ms
    -------------------------------
    Testing method: test6, 50000 iterations...
    Test Complete.
        Average Iteration Time: 0.01696ms
        Longest Iteration Time: 5ms
        Shortest Iteration Time: 0ms
        Total Test Time: 893ms
    -------------------------------
    Testing method: test7, 50000 iterations...
    Test Complete.
        Average Iteration Time: 0.00572ms
        Longest Iteration Time: 1ms
        Shortest Iteration Time: 0ms
        Total Test Time: 306ms
    -------------------------------

But Why? The following few things are expensive:

getTimer() call to global functions (and static methods of other classes) is slow
(tester[methodName] as Function).apply(); - this is expensive as fuck. A dynamic property access requiring a closure creation, then a cast to an anonymous function and then invocation through apply. I couldn't think of a slower way to call a function.
var tester:RegExpTester = new RegExpTester(); - instantiation is expensive, since it requires allocation and initialization.

the following code will run signifficantly better. The overhead measured by test7 is reduced by factor 20 on my machine:

    private function test(methodName:String, iterations:int = 100):void {
        output("Testing method: " + methodName + ", " + iterations + " iterations...");    
        var start:Number = getTimer();
        var tester:RegExpTester = new RegExpTester();
        var f:Function = tester[methodName];
        for (var i:uint = 0; i < iterations; i++) f();//this call to f still is slower than the direct method call would be
        var wholeTime:Number = getTimer() - start;
        output("Test Complete.");
        output("\tAverage Iteration Time: " + (wholeTime / iterations) + "ms");
        output("\tTotal Test Time: " + wholeTime + "ms");
        output("-------------------------------");
    }

again, some numbers:

    Running Tests
    -------------------------------
    Testing method: test1, 50000 iterations...
    Test Complete.
        Average Iteration Time: 0.01094ms
        Total Test Time: 547ms
    -------------------------------
    Testing method: test2, 50000 iterations...
    Test Complete.
        Average Iteration Time: 0.01094ms
        Total Test Time: 547ms
    -------------------------------
    Testing method: test3, 50000 iterations...
    Test Complete.
        Average Iteration Time: 0.01296ms
        Total Test Time: 648ms
    -------------------------------
    Testing method: test4, 50000 iterations...
    Test Complete.
        Average Iteration Time: 0.01288ms
        Total Test Time: 644ms
    -------------------------------
    Testing method: test5, 50000 iterations...
    Test Complete.
        Average Iteration Time: 0.01086ms
        Total Test Time: 543ms
    -------------------------------
    Testing method: test6, 50000 iterations...
    Test Complete.
        Average Iteration Time: 0.01086ms
        Total Test Time: 543ms
    -------------------------------
    Testing method: test7, 50000 iterations...
    Test Complete.
        Average Iteration Time: 0.00028ms
        Total Test Time: 14ms
    -------------------------------

so now the overhead is reduced to less than 1%, which makes it insignifficant (although in fact it can be reduced a lot more). However the deviation is now 16%. That's twice as much. And it's starting to look a little clearer. It is still insignifficant, IMHO, but actually it points to the two slowest methods: test3 and test4.

Why would that be? Simple: Both methods create a new RegExp object (one using a literal, the other using the constructor). This consumes the time difference we can measure. The difference is bigger now, since before, per iteration you created 3 regular expressions (the two instance variables are initialized every time you instantiate a RegExpTester). But the difference that is left now is that of creating 50000 RegExp instances. Anything else is about equally fast.

If there's a conclusion to be drawn in answer to your question: There is no difference between literals or constructed RegExps. So I am afraid, the answer is: "It doesn't really matter, as long as you keep general performance optimization rules in mind.". Hope that helps.

back2dos 2010-10-14 12:52:58

Excellent answer! Thanks a bunch! Could you recommend a guide of some sort to general AS3 performance optimization? I'm writing a library that does a lot of work with strings and regular expressions, and I want to optimize the heck out of it, if you know what I mean.

TK Kocheran 2010-10-15 19:34:06

ansaurus

tags:

views:

answers:

High-performance way to compile regular expressions in AS3?

related questions