I would like to craft a case-insensitive regex (for JavaScript) that matches street names, even if each word has been abbreviated. For example:

n univ av should match N University Ave

king blv should match Martin Luther King Jr. Blvd

ne 9th should match both NE 9th St and 9th St NE

Bonus points (JK) for a "replace" regex that wraps the matched text with <b> tags.

A: 

Simple:

var pattern = "n univ av".replace(/\s+/g, "|");   // "n|univ|av"
var rx      = new RegExp(pattern, "gi");
var matches = "N University Ave".match(rx);      // ["N", "Univ", "Av"]

Or something along these lines.

Paulo Santos
A: 

You got:

"n univ av"

You want:

"\bn.*\buniv.*\bav.*"

So you do:

var regex = new RegExp("n univ av".replace(/(\S+)/g, function(s) { return "\\b" + s + ".*" }).replace(/\s+/g, ''), "gi");

Voilà!
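A self-contained sketch of the same construction, folding in two suggestions from the comments below: each term is regex-escaped before it goes into the pattern, and the gaps use non-greedy `.*?` instead of `.*`. The function name `buildAbbrevRegex` is hypothetical. It deliberately uses only the `i` flag: with `g`, `RegExp.prototype.test` remembers `lastIndex` between calls, which can make repeated tests on different strings give inconsistent results. Like the original, it still requires the terms in order, so `ne 9th` matches `NE 9th St` but not `9th St NE`.

```javascript
// Hypothetical helper (name is illustrative): turns "n univ av" into
// /\bn.*?\buniv.*?\bav/i. Terms must appear in the given order.
function buildAbbrevRegex(search) {
  var terms = search.trim().split(/\s+/).filter(Boolean);
  var pieces = terms.map(function (t) {
    // Escape regex metacharacters, then require the term at a word start.
    return "\\b" + t.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  });
  // Non-greedy gaps between terms; only the "i" flag, so test() keeps no state.
  return new RegExp(pieces.join(".*?"), "i");
}

var re = buildAbbrevRegex("king blv");
re.test("Martin Luther King Jr. Blvd"); // true
re.test("Blvd King");                   // false (wrong order)
```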

But I'm not done; I want my bonus points. So we change the pattern to:

var regex = new RegExp("n univ av".replace(/(\S+)/g, function(s) { return "\\b(" + s + ")(.*)" }).replace(/\s+/g, ''), "gi");

And then:

var matches = regex.exec("N University Ave");

Now we have:

  • matches[0] => the entire match (useless here)
  • matches[odd] => one of our search terms, as it appears in the input
  • matches[even] => the text following that term, which was not in the original search string

So, we can write:

var result = '';
for (var i = 1; i < matches.length; i++)
{
  // odd indices hold the search terms, even indices the text after them
  if (i % 2 === 1)
    result += '<b>' + matches[i] + '</b>';
  else
    result += matches[i];
}
Fábio Batista
@fabio: those `.*` should be non-greedy - `.*?`. Also, your regex does not fulfil the matching requirement of *ne 9th*.
Andy E
You're right, it does not... I didn't notice he also wanted matches out of order.
Fábio Batista
@Fábio: You should regex-quote the input before you start. Otherwise +1, that's better than my approach.
Tomalak
@Fábio: So I've been tinkering with this for quite some time now. I'm struggling with the last bit. If I execute the code exactly as shown, I get this `matches` array: `["N University Ave", " ", "ersity ", "e"]`. Is this right?
nw
Sorry, the replace function was wrong. I've fixed it now: `var regex = new RegExp("n univ av".replace(/(\S+)/g, function(s) { return "\\b(" + s + ")(.*)" }).replace(/\s+/g, ''), "gi");`
Fábio Batista
I'm on the right track now, but I'm a bit hung up on this: http://stackoverflow.com/questions/2669861/regular-expression-test-cant-decide-between-true-and-false-javascript
nw
Yes, it worked! The only change I made was to prepend `(.*?)` to the 2nd regex, to capture any text preceding the first match. I then changed `var result = ''` to `var result = matches[1]`, and adapted the loop accordingly.
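A sketch of the ordered highlighter with nw's change, assuming the goal is to keep the whole input: a leading `(.*?)` captures the text before the first term, the gaps between terms are non-greedy (as Andy E suggested), and the pattern is anchored with `^...$` (an addition not in the thread) so the final `(.*)` always captures the rest of the string. The names `escapeRegExp` and `highlightInOrder` are hypothetical. Because of the extra leading group, the parity flips compared with Fábio's loop: even-numbered groups are now the terms.

```javascript
// Hypothetical helper name; escapes regex metacharacters in a search term.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Ordered highlighter: ^(.*?)\b(t1)(.*?)\b(t2)...(.*)$
// Group 1 is the text before the first term; after that, even-numbered
// groups are the terms and odd-numbered groups the text that follows them.
function highlightInOrder(subject, search) {
  var terms = search.trim().split(/\s+/).filter(Boolean);
  if (terms.length === 0) return subject;
  var body = terms.map(function (t) {
    return "\\b(" + escapeRegExp(t) + ")";
  }).join("(.*?)");
  var re = new RegExp("^(.*?)" + body + "(.*)$", "i");
  var m = re.exec(subject);
  if (m === null) return null; // a term is missing or out of order
  var result = m[1];
  for (var i = 2; i < m.length; i++) {
    result += (i % 2 === 0) ? "<b>" + m[i] + "</b>" : m[i];
  }
  return result;
}

highlightInOrder("Martin Luther King Jr. Blvd", "king blv");
// "Martin Luther <b>King</b> Jr. <b>Blv</b>d"
```

Returning `null` when the terms are not found in order is a design choice of this sketch; it makes it easy to drop non-matching street names from a result list.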
nw
A: 

If these are your search terms:

  1. n univ av
  2. king blv
  3. ne 9th

It sounds like your algorithm should be something like this:

  1. split the search string on whitespace (resulting in an array of search terms): search.split(/\s+/)
  2. attempt to match each term within your input: /term/i
  3. for each matched input, replace each term with the term wrapped in <b> tags: input.replace(/(term)/gi, "<b>$1</b>")

Note: You'll probably want to take care to escape regex metacharacters in the search terms.
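One way to do that escaping, sketched with a widely used character class that covers every JavaScript regex metacharacter. Both function names are hypothetical; `termRegExp` is simply step 2 from the list above with the term escaped first.

```javascript
// Hypothetical helper: backslash-escapes every character that is special
// in a JavaScript regular expression, so a search term is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Step 2, done safely: a case-insensitive regex for one user-supplied term.
function termRegExp(term) {
  return new RegExp(escapeRegExp(term), "i");
}

termRegExp("jr.").test("Martin Luther King Jr. Blvd"); // true
termRegExp("jr.").test("jrx");                         // false: "." is literal
```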

macek
@macek: Point #3 fails because `string.replace()` always begins at the start of the string, potentially leading to invalid tag nesting, etc., when the second search term is part of the first. Or the second search term is `"b"`. ;)
Tomalak
@Tomalak, thanks for catching this. Your method is highly superior.
macek
@macek: Fábio's is even better. ;) It addresses the issues mine has, and it's shorter, too.
Tomalak
A: 
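function highlightPartial(subject, search) {
  // Characters with a special meaning in a regex; escaped so terms match literally
  var special = /[.*+?^${}()|[\]\\]/g;
  var parts   = search.trim().split(/\s+/).filter(Boolean).map(function(s) {
    // "\\b" yields the regex word boundary \b; a bare "\b" would be a backspace
    return "\\b" + s.replace(special, "\\$&");
  });
  if (parts.length === 0) return subject;   // nothing to highlight
  var re = new RegExp("(" + parts.join("|") + ")", "gi");
  subject = subject.replace(re, function(match, text) {
    return "<b>" + text + "</b>";
  });
  return subject;
}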

var result = highlightPartial("N University Ave", "n univ av");
// ==> "<b>N</b> <b>Univ</b>ersity <b>Av</b>e"

Side note - this implementation does not pay attention to match order, so:

var result = highlightPartial("N University Ave", "av univ n");
// ==> "<b>N</b> <b>Univ</b>ersity <b>Av</b>e"

If that's a problem, a more elaborate sequential approach would become necessary, something that I have avoided here by using a replace() callback function.

Tomalak
@Tomalak, great response. +1. How would you only return results that matched all the terms?
macek
@macek: Well, that would require some work. I think writing a loop over all the parts and matching them individually against `subject`, incrementing a counter as you go would do the trick. If the counter matches the number of parts, all of them are in the input.
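A sketch of the counting approach described above, assuming each term should match at the start of a word (as in `highlightPartial`). The name `matchesAllTerms` is hypothetical. Since the terms are tested independently, order does not matter, so `ne 9th` accepts both `NE 9th St` and `9th St NE`. One limitation of this sketch: two terms can be satisfied by the same word (e.g. `n ne` against `NE 9th St`).

```javascript
// Hypothetical helper: true only if every search term occurs in subject,
// each at the start of a word, in any order.
function matchesAllTerms(subject, search) {
  var parts = search.trim().split(/\s+/).filter(Boolean);
  var found = 0;
  for (var i = 0; i < parts.length; i++) {
    var escaped = parts[i].replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
    if (new RegExp("\\b" + escaped, "i").test(subject)) {
      found++; // count each term that occurs somewhere in the subject
    }
  }
  return parts.length > 0 && found === parts.length;
}

// Keep only the street names that contain every term:
["NE 9th St", "9th St NE", "N University Ave"].filter(function (s) {
  return matchesAllTerms(s, "ne 9th");
}); // ["NE 9th St", "9th St NE"]
```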
Tomalak