As a follow-up to this question, I need to compare two strings in JavaScript case-insensitively, ignoring every non-alphanumeric character except the comma and the semicolon. For example,

Times New Roman, Times, Sans-Serif

matches

Times New Roman,Times,SansSerif

Can somebody get me started with the right function/approach? Is there something ready-made to do this in JS, or do I have to strip the clutter from both strings and then compare them?

+5  A: 

Normalize both strings and compare them:

str1.toLowerCase().replace(/[^a-z0-9,;]+/g, "") == str2.toLowerCase().replace(/[^a-z0-9,;]+/g, "")

Here the strings are converted to lowercase and then all characters except alphanumeric characters, the comma and semicolon are removed before comparison.
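To make the normalization step concrete, here is a minimal sketch that applies the same expression to the two strings from the question and prints the intermediate results. The helper name `normalize` is only an illustration and does not come from the answer.

```javascript
// Hypothetical helper name; the pattern is the one used in the answer above.
function normalize(str) {
  // Lowercase first, then drop everything except a-z, 0-9, comma and semicolon.
  return str.toLowerCase().replace(/[^a-z0-9,;]+/g, "");
}

var a = normalize("Times New Roman, Times, Sans-Serif");
var b = normalize("Times New Roman,Times,SansSerif");

console.log(a);       // timesnewroman,times,sansserif
console.log(b);       // timesnewroman,times,sansserif
console.log(a === b); // true
```

Note that the spaces inside "Times New Roman" are removed as well, so the comparison only looks at letters, digits, commas and semicolons.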

Gumbo
A: 

First do a search-and-replace on both strings:

str.replace(/[^a-zA-Z0-9,;]+/g, "")

Then compare the results, lowercasing both first so that the comparison is case-insensitive.

ennuikiller
A: 

Or this:

var s1='Times New Roman, Times, Sans-Serif';
var s2='Times New Roman,Times,SansSerif';


/^(.+){2}$/i.test((s1+s2).replace(/[^\da-zA-Z,;]/g,''));
kennebec
Doesn't match the OP's requirements: specifically, the RE will not keep semicolons or numbers. Also, the `/^(.+){2}$/i` might be really clever, but it is more computationally expensive than `s1 == s2`.
gnarf
@kennebec You meant `/^(.+)\1$/`; I realized that `(.+){2}` will match any string with 2 or more characters. With that change added, I did some performance testing. Your regexp performs slightly worse than Gumbo's and occasionally beats mine. For the matching case you provided (10000 iterations, 10 loops, throwing out variance and rounding): Gumbo 108, me 131, you 132. Quadrupling the string size: Gumbo 293, me 311, you 415. Quadrupled string plus 1 char added to the start of s2: yours increases to 503, ours stay the same. Also, yours would still match on `var s1='Times New Roman, Times, Times New Roman'; var s2=',Times,';`
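The false positive mentioned in this comment can be reproduced with a short sketch. The helper `strip` and the variable names are illustrative assumptions; the patterns are the ones quoted in the thread.

```javascript
// Hypothetical helper: keeps letters, digits, commas and semicolons.
function strip(str) {
  return str.replace(/[^\da-zA-Z,;]/g, "");
}

var s1 = "Times New Roman, Times, Times New Roman";
var s2 = ",Times,";

// Concatenate, strip, then test whether the result is some substring repeated twice.
// The stripped concatenation is "TimesNewRoman,Times," twice, so this is true
// even though s1 and s2 are clearly different.
var viaBackreference = /^(.+)\1$/i.test(strip(s1 + s2));

// Comparing the two normalized strings directly gives the expected answer.
var viaEquality = strip(s1).toLowerCase() === strip(s2).toLowerCase();

// Without the backreference, the pattern accepts any input of two or more characters.
var looseForm = /^(.+){2}$/i.test("ab");

console.log(viaBackreference); // true
console.log(viaEquality);      // false
console.log(looseForm);        // true
```

The concatenation trick loses the boundary between the two inputs, which is why a plain equality check on the normalized strings is the safer choice here.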
gnarf
+3  A: 

Gumbo's method - but cleaner to read.

function compareStripped(str1, str2) {
  function strip(str) {
    // lowercases, then removes everything except letters, numbers, commas and semicolons
    return str.toLowerCase().replace(/[^a-z0-9,;]+/g, '');
  }
  // both sides are strings, so strict equality is safe
  return strip(str1) === strip(str2);
}
gnarf
A: 

First, the suggested regex can be shortened with built-in character classes. Second, the normalization should be kept separate from the test for equality. Here's something that might not pass code review where I work, but it's so short that I thought I'd post it.

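String.prototype.normalized = function() {
  // \w already covers digits but also "_", so strip that too; lowercase for a case-insensitive compare
  return this.toLowerCase().replace(/[^\w,;]|_/g, "");
};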

var s1='Times New Roman, Times, Sans-Serif';
var s2='Times New Roman,Times,SansSerif';

if(s1.normalized() == s2.normalized()) document.write("equality!");
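One detail worth checking when relying on `\w`: in JavaScript it is shorthand for `[A-Za-z0-9_]`, so it already includes the digits, it also lets the underscore through, and it does nothing about case. The sketch below illustrates this; `normalizeStrict` is a hypothetical name, not something from the answer.

```javascript
// \w is shorthand for [A-Za-z0-9_], so adding \d to the same class is redundant.
var withWordClass = /[^\w,;]/g;

console.log("sans_serif".replace(withWordClass, "")); // sans_serif (underscore kept)
console.log("Sans-Serif".replace(withWordClass, "")); // SansSerif (case untouched)

// Hypothetical variant: lowercase first, then strip the underscore as well.
function normalizeStrict(str) {
  return str.toLowerCase().replace(/[^\w,;]|_/g, "");
}

console.log(normalizeStrict("Sans_Serif; TIMES")); // sansserif;times
```

Whether the underscore should count as clutter depends on the input; for font-family lists like the ones in the question it probably makes no difference.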
billw