I am using JavaScript and trying to separate query string variables from their values. I wrote a regex that works fine as long as the only ampersands are the ones separating variables; otherwise the value is cut off at the first ampersand inside it.

example: ajax=10&a=test&b=cats & dogs returns a = "test", b = "cats "

I cannot encode the ampersands before the string is made due to the nature of this project and the inefficiency with encoding/replacing characters in hundreds of locations upon entry.

What this piece of code should ultimately do is turn the querystring ajax=10&a=cats & dogs into ajax=10&a=cats%20%26%20dogs

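var str = 'ajax=10&a=test&b=cats & dogs';  // sample input from the example above
var list = [ 'ajax','&obj','&a','&b','&c','&d','&e','&f','&g','&h','&m' ];
var ajax_string = '';
for (var i=0, li=list.length; i<li; i++) {
    // The character class stops the match at the first "&", so "cats & dogs" is cut to "cats "
    var variables = new RegExp(list[i] +"=([^&(.+)=]*)");
    var query_string = variables.exec(str);
    if (query_string != null) {
        alert(query_string);
    }
}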
+2  A: 

The query string should be split on ampersands. Any ampersands in the values of actual arguments should be converted to %26.

This is what the query string you posted should look like:

ajax=10&a=test&b=cats+%26+dogs
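
For comparison, here is a minimal sketch of encoding each value at the point where the string is built. The helper name `buildQuery` and the object passed to it are hypothetical, not taken from the question; `encodeURIComponent` is the standard built-in. It emits `%20` for spaces rather than `+`, and both decode to a space on the server side.

```javascript
// Hypothetical helper: build a query string from a plain object,
// encoding every name and value so a literal "&" or "=" in a value is safe.
function buildQuery(params) {
  var pairs = [];
  for (var name in params) {
    if (Object.prototype.hasOwnProperty.call(params, name)) {
      pairs.push(encodeURIComponent(name) + '=' + encodeURIComponent(params[name]));
    }
  }
  return pairs.join('&');
}

var query = buildQuery({ ajax: 10, a: 'test', b: 'cats & dogs' });
// query is "ajax=10&a=test&b=cats%20%26%20dogs"
```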

Parsed as posted, though, it gives you this:

'ajax':  '10'
'a':     'test'
'b':     'cats '
' dogs': ''

Edit

It looks like you actually want to sanitize a query string that other developers have built lazily. If we assume that: a) every argument name matches /[a-zA-Z0-9]+/; and b) it is always followed by an equals sign, then this code will work:

var queryString = 'ajax=10&a=test&b=cats & dogs';
// Split only on an "&" that is followed by an argument name and "="
var parts = queryString.split(/&(?=[a-zA-Z0-9]+\=)/);
for(var i = 0; i < parts.length; i++)
{
  var index = parts[i].indexOf('=') + 1;
  if(index > 0)
    // encodeURIComponent replaces the deprecated escape() and also encodes "+"
    parts[i] = parts[i].substring(0, index) + encodeURIComponent(parts[i].substring(index));
  //else: error?
}
queryString = parts.join("&");
alert("queryString: " + queryString);
Kip
The problem is that with multiple people working on a project, it is easier to encode the ampersands (which is what this little loop will ultimately do for each variable) in one location before sending, instead of individually encoding/replacing in the 120+ other places where data is entered and manipulated.
I commented before you edited :) I know what the current query string's data looks like when it reaches the backend; that is what I am trying to get around. It just isn't possible to encode ampersands in all the places data is entered; it's not clean or efficient.
Updated my post to make it a little more clear, I hope. Too early in the morning -_-
@Kip: you're making it real easy for Clint to do the wrong thing here ;) Hope he/she appreciates!
Roatin Marth
Thanks Kip this is the solution I am looking for. Got too focused on regex grouping, just facepalmed that I didn't try splitting with it.
Matthew Lock
+1  A: 

> I cannot encode the ampersands before the string is made due to the nature of this project

Then you won't have a foolproof answer.

Ampersands ("&") separate query parameters in URL query strings. You can't have it both ways: if some of your query parameter values contain unescaped "&", a parser based on this simple rule cannot know the difference.

If you can't escape "&" as "%26" in each value component beforehand, then you can never know that the values you get are correct. The best you could do is: If the value to the right of an "&" and before the next "&" does not contain an equal sign "=", you append the value to the previous value read, or the empty string if this is the first value read.

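This is easiest with a small hand-written parser. JavaScript regular expressions do support lookahead (it is lookbehind that older engines lack), so splitting on an "&" that is followed by a name and "=" is another option.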
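
The merge rule described above could be sketched roughly as follows. The function name `repairQueryString` is made up for illustration, and one detail is an assumption on my part: a first piece with no "=" is kept as a bare name, since there is no previous value to append it to. The sketch still misreads a value that itself contains something like "&x=y".

```javascript
// Sketch of the merge rule: a piece without "=" is treated as the
// continuation of the previous value (its "&" was literal data).
function repairQueryString(raw) {
  var pieces = raw.split('&');
  var pairs = []; // each entry: { name: string, value: string or null }
  for (var i = 0; i < pieces.length; i++) {
    var eq = pieces[i].indexOf('=');
    if (eq !== -1) {
      // Normal "name=value" piece
      pairs.push({ name: pieces[i].substring(0, eq), value: pieces[i].substring(eq + 1) });
    } else if (pairs.length > 0 && pairs[pairs.length - 1].value !== null) {
      // No "=": glue it back onto the previous value, restoring the "&"
      pairs[pairs.length - 1].value += '&' + pieces[i];
    } else {
      // Assumption: nothing to append to, so keep it as a bare name
      pairs.push({ name: pieces[i], value: null });
    }
  }
  // Re-emit with every value properly encoded
  return pairs.map(function (p) {
    return p.value === null ? p.name : p.name + '=' + encodeURIComponent(p.value);
  }).join('&');
}

var repaired = repairQueryString('a=test&b=cats & dogs&c=test');
// repaired is "a=test&b=cats%20%26%20dogs&c=test"
```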

Note however that an algorithm like that completely bypasses the spec. Presuming for a moment that the query string:

a=test&b=cats & dogs&c=test

is valid, technically that string contains 4 parameters: "a" (with a value of "test"), "b" (with a value of "cats "), " dogs" (with no value), and "c" (with a value of "test").
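
One way to see this, assuming an environment that provides the standard `URLSearchParams` class (modern browsers and Node.js do), is to let a spec-following parser read the string:

```javascript
// A standards-based parser splits on every "&", so " dogs" becomes its own name.
var params = new URLSearchParams('a=test&b=cats & dogs&c=test');
var entries = Array.from(params.entries());
// entries is [["a","test"], ["b","cats "], [" dogs",""], ["c","test"]]
```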

If you don't change the query string at the source (and properly escape the value component), you're just hacking in the wrong solution.

Good luck.

Roatin Marth
I understand how it all works; I've been doing this for many years. The issue that has come up is that we have pages upon pages that all have their own functions before/after input, and adding encoding on top of each of those hundreds of functions is unreasonable and unmanageable. Performing the action in one location by default on all data seems the most reasonable at the moment. Thanks for your input.
"Ampersands ("'."
Matthew Lock
@Matthew Lock: Not period, right you are.
Roatin Marth