views:

519

answers:

3

I am working to improve performance of a function that takes an XML string and replaces certain characters (encoding) before returning the string. The function gets pounded, so it is important to run as quickly as possible. The USUAL case is that none of the characters are present - so I would like to especially optimize for that. As you will see in the sample code, the strings to be replaced are short, and relatively few. The source strings will often be short (e.g. 10-20 characters) but could be longer (e.g. 200 characters or so).

So far I have ensured that the regexes are precompiled and I've eliminated nested functions which slowed down operation (partial milliseconds matter at this point.)

var objXMLToString = new XMLToStringClass();
function XMLToStringClass(){
    this.tester= /\\34|\\39|\\62|\\60|\\13\\10|\\09|\\92|&amp/;
    this.replacements=[];
    var self=this;
    function init(){  
     var re = new regexReplacePair(/\\34/g,'"');
     self.replacements.push(re);
     re = new regexReplacePair(/\\39/g,"'");
     self.replacements.push(re);
     re = new regexReplacePair(/\\62/g,">");
     self.replacements.push(re);
     re = new regexReplacePair(/\\60/g,"<");
     self.replacements.push(re);
     re = new regexReplacePair(/\\13\\10/g,"\n");
     self.replacements.push(re);
     re = new regexReplacePair(/\\09/g,"\t");
     self.replacements.push(re);
     re = new regexReplacePair(/\\92/g,"\\");
     self.replacements.push(re);
     re = new regexReplacePair(/\&amp;/g,"&");  
     self.replacements.push(re);  
    }
    init();
}


function regexReplacePair(regex,replacementString){
    this.regex = regex;
    this.replacement = replacementString;
}

String.prototype.XMLToString = function() {
     newString=this;
     if(objXMLToString.tester.test(this)){
      for (var x = 0;x<objXMLToString.replacements.length;x++){
       newString=newString.replace(objXMLToString.replacements[x].regex,objXMLToString.replacements[x].replacement);
      }
      return newString;
     }
     return this;
}
  • Is it possible that in this scenario String.replace functions will be better?
  • Currently I am replacing all characters if a single character matches - is it possible that testing and then conditionally replacing would be better? If so, might indexOf be quicker than a regex for this dataset?
+6  A: 

You can use a hash lookup:

str.replace(
    /(\\34|\\39|\\62|\\60|\\13\\10|\\09|\\92|&amp)/g,
    function () {
        return {
            "\\34": "\"",
            "\\39": "'",
            //...
            "&amp": "&"
        }[arguments(1)];
    }
);

Or you insists on extending the prototype:

var hash = {
    "\\34": "\"",
    "\\39": "'",
    //...
    "&amp": "&"
};

String.prototype.XMLToString = function () {
    return this.replace(
        /(\\34|\\39|\\62|\\60|\\13\\10|\\09|\\92|&amp)/g,
        function () { return hash[arguments(1)]; }
    }
);

This might be faster (need to benchmark):

String.prototype.XMLToString = function () {
    var s = this;

    for (var r in hash) {
        s = s.split(r).join(hash[r]);
    }

    return s;
);


Update

You can generate your regex from the hash:

var arr = [];

for (var r in hash) {
    arr.push(r);
}

var re = new RegExp("(" + arr.join("|") + ")", "g");

and then use it as:

s = s.replace(re, function () { ... });
Ates Goral
Elegant solution - I forgot the newer versions of javascript let you use a function to determine the replace value.
Peter Bailey
Elegant solution, unfortunately it is slower then the original.
some
+2  A: 

Here's my stab at refactoring your code

  • Made objXMLToString a static object - didn't need to be instantiable
  • Got rid of inner init() function - used array literal instead
  • Converted regexReplacePair object with array literals
  • Converted for loop to while loop (usually faster)
  • Single return point
  • Scoped newString variable (now it's no longer global)

Here's the code

var objXMLToString = {
     tester: /\\34|\\39|\\62|\\60|\\13\\10|\\09|\\92|&amp/
    ,replacements: [
         [/\\34/g,'"']
        ,[/\\39/g,"'"]
        ,[/\\62/g,">"]
        ,[/\\60/g,"<"]
        ,[/\\13\\10/g,"\n"]
        ,[/\\09/g,"\t"]
        ,[/\\92/g,"\\"]
        ,[/\&amp;/g,"&"]
    ]
}

String.prototype.XMLToString = function()
{
        var newString = this;
        if ( objXMLToString.tester.test( this ) )
        {
             var x = 0, replacer;
                while ( replacer = objXMLToString.replacements[x++] )
                {
                        newString = newString.replace( replacer[0], replacer[1] );
                }
        }
        return newString;
}
Peter Bailey
+6  A: 

I have benchmarked your original version, Ates Gorals hash, my improved hash, a version using switch, and the simple solution. The winner? The simple solution!

With matching data (string of 85 chars)

        original  simple  hash  switch  ag-hash
FF3          194     188   250     240     424
IE7          188     172   641     906    2203
Chrome1      161     156   165     165     225
Opera9       640     625   531     515     734

With non matching data (string of 85 chars):

        original  simple  hash  switch  ag-hash
FF3           39       4    34      34      39
IE7          125      15   125     125     156
Chrome1       45       2    54      54      57
Opera9       156      15   156     156     156

(tested on my window xp laptop, 1.7GHz, ymmv)

The simple solution:

function XMLToString(str) {
    return (str.indexOf("\\")<0 && str.indexOf("&")<0) ? str :
    str
    .replace(/\\34/g,'"')
    .replace(/\\39/g,"'")
    .replace(/\\62/g,">")
    .replace(/\\60/g,"<")
    .replace(/\\13\\10/g,"\n")
    .replace(/\\09/g,"\t")
    .replace(/\\92/g,"\\")
    .replace(/\&amp;/g,"&");               
}

Explanation:

First there is a check if there is a backslash or ampersand (it was faster to use indexOf instead of a regexp in all browsers). If there isn't the string is returned untouched, otherwise all the replacements are executed. In this case it don't makes mush difference to cache the regexps. I tried with two arrays, one with the regexps and one with the replacements, but it was not a big difference.

In your original version you have a test for all combinations just to detect if you should do the replacements or not. That is expensive since the regexp engine has to compare every position with every combination. I simplified it.

I improved Ates Gorals by moving the hash object outside (so it wasn't created and thrown away for every call to the inner function), moving the inner function outside so it can be reused instead of thrown away.

UPDATE 1 Bugfix: moved a parenthesis in the ampersand test.

UPDATE 2

One of your comments made me believe that you encode the string yourself, and if that is the case I suggest that you switch encoding to a standard one, so you can use the built in functions.

Instead of "\dd" where dd is a decimal number, use "%xx" where xx is the hexadecimal number. Then you can use the built in decodeURIComponent that is much faster and as a bonus can decode any characters, including unicode.

          matching    non match
FF3          44           3
IE7          93          16
Chrome1     132           1
Opera9      109          16

.

function XMLToString_S1(str) {
    return (str.indexOf("%")<0) ? str : decodeURIComponent(str).replace(/\x0D\x0A/g,"\n")
}

So instead of having a string like "\09test \60\34string\34\62\13\10\" you have a string like "%09test %3c%22string%22%3e%0d%0a".

some
That's great! I am surprised -- my understanding is that compiling regexes can be expensive, and in your final solution this happens everytime the function is called.
pc1oad1etter
Compiling regexpes is expensive, but the browser caches it.
some
pc1oad1etter
Good idea on the naive test up front.
pc1oad1etter