Background (question further down)

I've been Googling this back and forth, reading RFCs and SO questions trying to crack it, but I still haven't got jack.

So I guess we just vote for the "best" answer and that's it, or?

Basically it boils down to this.

3.4. Query Component

The query component is a string of information to be interpreted by the resource.

query = *uric

Within a query component, the characters ";", "/", "?", ":", "@", "&", "=", "+", ",", and "$" are reserved.

The first thing that boggles me is that *uric is defined like this

uric = reserved | unreserved | escaped

reserved = ";" | "/" | "?" | ":" | "@" | "&" | "=" | "+" | "$" | ","

This is however somewhat clarified by paragraphs such as

The "reserved" syntax class above refers to those characters that are allowed within a URI, but which may not be allowed within a particular component of the generic URI syntax; they are used as delimiters of the components described in Section 3.

Characters in the "reserved" set are not reserved in all contexts. The set of characters actually reserved within any given URI component is defined by that component. In general, a character is reserved if the semantics of the URI changes if the character is replaced with its escaped US-ASCII encoding.

This last excerpt feels somewhat backwards, but it clearly states that the reserved character set depends on context. Yet Section 3.4 states that all of the reserved characters are reserved within a query component. However, the only thing that would change the semantics here is escaping the question mark (?), as URIs do not define the concept of a query string.
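To make the "semantics change when escaped" test concrete, here is a minimal sketch. It assumes the query is read with HTML-form conventions (`key=value` pairs joined by `&`), which is a convention layered on top of the URI RFCs rather than something they define. Under that assumption, escaping `&` changes the meaning, while escaping `,` does not.

```python
from urllib.parse import parse_qs

# "&" acts as a delimiter under form conventions, so escaping it changes
# what the query means: two parameters collapse into one.
plain = parse_qs("a=1&b=2")          # {'a': ['1'], 'b': ['2']}
escaped_amp = parse_qs("a=1%26b=2")  # {'a': ['1&b=2']}

# "," has no special role for this parser, so the escaped and unescaped
# forms decode to the same value.
comma_plain = parse_qs("q=1,2,3")
comma_escaped = parse_qs("q=1%2C2%2C3")
```

By that test, the comma only behaves as reserved if whatever consumes the query gives it a delimiter role.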

At this point I've given up on the RFCs entirely but found RFC 1738 particularly interesting.

An HTTP URL takes the form:

http://<host>:<port>/<path>?<searchpart>

Within the <path> and <searchpart> components, "/", ";", "?" are reserved. The "/" character may be used within HTTP to designate a hierarchical structure.

I interpret this, at least with regard to HTTP URLs, as RFC 1738 superseding RFC 2396. Because the URI query has no notion of a query string, its interpretation of reserved characters doesn't really allow me to define query strings the way I'm used to doing.

Question

This all started when I wanted to pass a list of numbers together with the request for another resource. I didn't think much of it and just passed it as comma-separated values. To my surprise, though, the comma was escaped. The query page.html?q=1,2,3 turned into page.html?q=1%2C2%2C3 when encoded. It works, but it's ugly and I didn't expect it. That's when I started going through the RFCs.
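The same escaping can be reproduced outside ASP.NET. This is only a sketch using Python's standard `urllib.parse.quote`, not the encoder that produced the URL above. It shows that whether the comma gets escaped depends on which characters the encoder is told to treat as safe, and that both forms decode to the same value.

```python
from urllib.parse import quote, unquote

raw = "1,2,3"

# With no characters marked safe, the comma is percent-encoded.
strict = quote(raw, safe="")     # '1%2C2%2C3'

# Marking the comma as safe leaves the value readable.
lenient = quote(raw, safe=",")   # '1,2,3'

# Either form decodes back to the original list.
round_trip = (unquote(strict), unquote(lenient))
```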

My first question is simply, is encoding commas really necessary?

My answer, according to RFC 2396: yes, according to RFC 1738: no

Later I found related posts about passing lists between requests, where the CSV approach was portrayed as bad. This showed up instead (I haven't seen it before):

page.html?q=1;q=2;q=3

My second question, is this a valid URL?

My answer, according to RFC 2396: no, according to RFC 1738: no (; is reserved)

I don't have any issues with passing csv as long as it's numbers, but yes you do run into the risk of having to encode and decode values back and forth if the comma suddenly is needed for something else. Anyway I tried the semi-colon query string thing with ASP.NET and the result was not what I expected.

Default.aspx?a=1;a=2&b=1&a=3

Request.QueryString["a"] = "1;a=2,3"
Request.QueryString["b"] = "1"

I fail to see how this greatly differs from a csv approach as when I ask for "a" I get a string with commas in it. ASP.NET certainly is not a reference implementation but it hasn't let me down yet.
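The result above is consistent with a parser that splits pairs on `&` only and then joins repeated keys with commas. The sketch below imitates that observed behavior; it is not ASP.NET's actual implementation, and `parse_query` is a hypothetical helper written for illustration. It also shows what a parser that accepts `;` as a pair separator would return instead.

```python
import re
from urllib.parse import unquote_plus


def parse_query(query, separators="&"):
    """Group values by key, splitting pairs on any character in `separators`."""
    values = {}
    pattern = "[" + re.escape(separators) + "]"
    for pair in re.split(pattern, query):
        if not pair:
            continue  # skip empty segments such as a trailing "&"
        key, _, value = pair.partition("=")
        values.setdefault(unquote_plus(key), []).append(unquote_plus(value))
    return values


query = "a=1;a=2&b=1&a=3"

# "&" only: the ";a=2" part stays inside the first value of "a".
amp_only = parse_query(query)

# "&" and ";" both treated as separators: three distinct values for "a".
amp_and_semicolon = parse_query(query, separators="&;")

# Joining repeated keys with commas reproduces the strings shown above.
joined = {key: ",".join(vals) for key, vals in amp_only.items()}
```

Seen this way, the semicolon form is only useful when the receiving parser is known to split on `;`. Otherwise it ends up embedded in the value, much like a comma would.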

But most importantly -- my third question -- where is the specification for this? And what would you do or, for that matter, not do?

+3  A: 
outis