I am building a keyword-highlighting script and need a regular expression that parses URLs like the ones below to find the searched-for keywords to highlight.

I am REALLY bad with regex, so I was hoping someone who rocks at it could help.

Our search string uses "skw" as the parameter and "%2c" (comma) to separate terms, with "+" for spaces.

Example URLs:

http://[url].com/Search.aspx?skw=term1
http://[url].com/Search.aspx?skw=term1%2c+term2

Is there a single RegExp that I can use that will give me a collection that looks like this?

var matches = "http://[url].com/Search.aspx?skw=term+one%2c+term2".match([Expression]);

matches[0] = "term one"
matches[1] = "term2"

Thanks! Any help is greatly appreciated.

+1  A: 

This task doesn't really lend itself to a single regular expression. Check out Parsing Query Strings in JavaScript for a script to assist you.

laz
A more compact implementation: http://magnetiq.com/2009/07/08/parsing-query-string-parameters-into-a-collection/
Ates Goral
+1  A: 
https?://[^\?]+\?skw=([^%]*)(?:%2c\+*(.*))?

In JavaScript (this version also allows one other parameter before skw), this is:

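var subject = "http://[url].com/Search.aspx?skw=term1%2c+term2";
var myregexp = /https?:\/\/[^\?]+(?:\?|\?[^&]*&)skw=([^%]*)(?:%2c\+*(.*))?/;
var match = myregexp.exec(subject);
if (match !== null) {
    var term1 = match[1]; // text before the first %2c
    var term2 = match[2]; // everything after the first %2c; undefined if there is no second term
}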

EDIT:

Sorry, I re-read your question. To handle multiple terms, you need to combine this with a split:

var subject = "http://[url].com/Search.aspx?skw=term1+one%2c+term2";
var myregexp = /https?:\/\/[^\?]+(?:\?|\?[^&]*&)skw=([^&]*)?/;
var match = myregexp.exec(subject);
if (match != null && match.length > 1) {
  // Turn "+" back into spaces, decode %2c to ",", then split into terms.
  var terms = unescape(match[1].replace(/\+/g, " ")).split(",");
}
cdm9002
Thank you! This worked beautifully!!
JasonW
+2  A: 

You can't do this with a single match, but you can do it with a match, a replace, and a split:

url.match(/\?(?:.*&)?skw=([^&]*)/)[1].replace(/\+/g, " ").split('%2c')

You may want to do the match separately and bail out if the match fails (which it could if your URL didn't have an skw parameter).
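As a rough sketch of that "match separately and bail out" idea: the helper name extractKeywords and the example.com URL are made up for illustration, and returning an empty array on failure is just one reasonable choice.

```javascript
// Hypothetical helper (name is an assumption, not from the thread):
// returns the terms from the skw parameter, or [] when skw is absent.
function extractKeywords(url) {
  var match = url.match(/\?(?:.*&)?skw=([^&]*)/);
  if (match === null) {
    return []; // no skw parameter, so there is nothing to highlight
  }
  // "+" encodes a space; "%2c" separates the terms.
  return match[1].replace(/\+/g, " ").split("%2c");
}

var terms = extractKeywords("http://example.com/Search.aspx?skw=term+one%2c+term2");
// terms is ["term one", " term2"] (the second term keeps its leading space)
```

Note that this splits on the lowercase "%2c" only, as in the one-liner above.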

You probably really want to do an unescape too, to handle other escaped characters in the query:

unescape(url.match(/\?(?:.*&)?skw=([^&]*)/)[1].replace(/\+/g, " ")).split(',')
Laurence Gonsalves
Great answer. It looks like he also has spaces following commas that I'm guessing he doesn't want to capture. (i.e. for "term1, term2" we get ["term1", " term2"]) You could split on /,\s*/ if you want to remove the extra space.
Prestaul
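Putting the suggestions above together, something like the following might work. It is only a sketch: the function name and example.com URL are invented, and it swaps in decodeURIComponent for unescape (an assumption on my part; it behaves the same for this input but throws a URIError on malformed escape sequences).

```javascript
// Hypothetical helper (name is an assumption, not from the thread):
// decodes the skw parameter and splits it on commas, dropping the
// whitespace that follows each comma.
function extractTrimmedKeywords(url) {
  var match = url.match(/\?(?:.*&)?skw=([^&]*)/);
  if (match === null) {
    return []; // no skw parameter in the URL
  }
  // "+" encodes a space; decodeURIComponent turns "%2c" into ",".
  var decoded = decodeURIComponent(match[1].replace(/\+/g, " "));
  return decoded.split(/,\s*/);
}

var terms = extractTrimmedKeywords("http://example.com/Search.aspx?skw=term+one%2c+term2");
// terms is ["term one", "term2"]
```

Because the split happens after decoding, this handles both "%2c" and "%2C".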