My issue is the following. I have an XHTML 1.1 page that has a form with input fields. One of the input fields contains a value which is a URI. This URI contains key-value pairs with the ampersand (&) as the argument separator, and it will be passed on as a GET request by another web application in the browser.

Usually I would use the entity &amp; to write the ampersands so that the code validates as XHTML 1.1. My problem here is that the application does not receive the GET request, since (as expected) the browser does not understand how to handle &amp; in the URI.

So my question is really how to write an ampersand without using the HTML entity, so the browser still recognises it as the argument separator and the GET request is passed on properly to the web app.

I tried Hex (%26) encoding the ampersand but the browser still does not "translate" it back to a proper & character.
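
A minimal sketch of why %26 does not help, using Python's standard urllib.parse (Python is an assumption here; the thread itself does not use it). A percent-encoded ampersand is treated as data inside a value, not as an argument separator, so the receiving side sees one parameter instead of two. The host and parameter names are hypothetical.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical return URLs; host and parameter names are made up for illustration.
plain = "http://example.com/app?foo=1&bar=2"
encoded = "http://example.com/app?foo=1%26bar=2"


def query_params(url):
    """Return the query string of `url` parsed into a dict of lists."""
    return parse_qs(urlsplit(url).query)


# With a literal "&" the query splits into two parameters.
plain_params = query_params(plain)

# With "%26" the ampersand is decoded only after splitting,
# so it ends up inside the value of a single parameter.
encoded_params = query_params(encoded)

print(plain_params)
print(encoded_params)
```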

A related question, but it does not provide the exact answer to the question I am asking:

http://stackoverflow.com/questions/275150/xhtml-and-ampersand-encoding

+1  A: 

There is no way to include an ampersand character in an attribute value without using an entity.

There is no way to include an ampersand character as a textNode without using an entity or CDATA markers (but I bet you are serving as text/html so you can't use those).

That said — any browser which fails to decode the entity is broken. No mainstream browser fails there. You are either using an obscure and broken browser, or are misdiagnosing the problem.
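
As a rough illustration of that decoding step, here is a sketch using Python's standard html.parser as a stand-in for a browser's parser (an assumption for demonstration only; real browsers use their own parsers). The field name and URL are hypothetical. The value that comes out of the attribute contains a plain &, even though the markup contains &amp;.

```python
from html.parser import HTMLParser


class InputValueCollector(HTMLParser):
    """Collect the decoded `value` attribute of every <input> element."""

    def __init__(self):
        super().__init__()
        self.values = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as (name, value) pairs with character
        # references such as &amp; already decoded by the parser.
        if tag == "input":
            for name, value in attrs:
                if name == "value":
                    self.values.append(value)


# Hypothetical markup: the field name and URL are made up for illustration.
markup = (
    '<input type="hidden" name="return_to" '
    'value="http://example.com/app?foo=1&amp;bar=2" />'
)

collector = InputValueCollector()
collector.feed(markup)
collector.close()

decoded_values = collector.values
print(decoded_values)
```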

David Dorward
Any major browser (IE or FF will do). The browser handles the decoding properly inside the HTML. I am referring to actually using the HTML entity in the address bar. Try it...
mr-euro
Well don't do that! You type plain URLs into the address bar, not HTML encoded URLs. That's like opening a Microsoft Word document in Notepad.
David Dorward
The browser is redirected to that URI as it was typed directly into the address bar, including HTML entities. That is my issue.
mr-euro
Either the browser is redirected to the URI, or the URI is typed into the address bar. It can't be both.
David Dorward
Obviously the former. What I am saying is that the effect is the same as if the URI was typed directly into the address bar (vs. being e.g. an anchor being clicked on).
mr-euro
So either your input is wrong (it still isn't clear what the input actually is, but it sounds like it should be an HTML encoded URI with any URI encoding of the ampersands being handled by the browser) or the server side form processor is broken.
David Dorward
A: 

Without the code it's difficult to tell where you are trying to keep this information. If you could post the code, we could do a better job of understanding the problem.

One possible solution (if this is in fact what you are facing) is to move the items in the query string into other form elements, changing:

<form action="example.com/?foo=1&amp;bar=2">
    <!-- ... -->
</form>

to:

<form action="example.com">
    <input type="hidden" name="foo" value="1" />
    <input type="hidden" name="bar" value="2" />
    <!-- ... -->
</form>
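
A small sketch of what a GET submission of those two hidden fields would produce, using Python's urllib.parse.urlencode to stand in for the browser's form encoding (an assumption for illustration; the base URL is hypothetical). The ampersand in the resulting query string is generated at submission time, so it never has to appear in the markup.

```python
from urllib.parse import urlencode

# The two hidden fields from the form above, in document order.
hidden_fields = [("foo", "1"), ("bar", "2")]

# Hypothetical base URL standing in for the form's action.
action = "http://example.com/"

# The separator "&" is inserted by the encoder, not written in the markup.
query = urlencode(hidden_fields)
submitted_url = action + "?" + query

print(submitted_url)
```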
mynameiscoffey
The query string is not in the actual form action, but inside an input field's value attribute. It is a value which gets passed to a web app, which later returns the user's browser to that same URI (with the query string in it). This is where it fails, since the browser cannot understand the HTML entity in the address bar.
mr-euro
Gotcha, my bad. If that is the case can't you just escape it when you stick it in the input field (which is probably best to do anyway to avoid any XSS attacks) and then un-escape it before you do the redirect server-side?
mynameiscoffey
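
The escape/un-escape round trip suggested in that comment could look roughly like this, sketched with Python's standard html module (an assumption; the thread does not say what language the server side uses). The URL is hypothetical.

```python
import html

# Hypothetical return URL with a plain "&" as the argument separator.
return_url = "http://example.com/app?foo=1&bar=2"

# Escape before writing the URL into the value attribute of the input field.
# quote=True also escapes quotation marks, which matters inside attributes.
attribute_value = html.escape(return_url, quote=True)

# Un-escape on the receiving side before using the URL in a redirect.
restored_url = html.unescape(attribute_value)

print(attribute_value)
print(restored_url)
```

This only helps if whoever performs the redirect does the un-escaping, which is the sticking point when the redirect comes from a third party.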
Unfortunately the redirect comes from a 3rd party. So I need to send the URI exactly as I need it to come back... the 3rd party simply receives it and returns the user's browser to it after it has done other work.
mr-euro
+1  A: 

As mentioned in the other question you referenced, the browser converts the &amp; to & when the page is processed, so the "&" (not &amp;) should be sent to the server in the GET request. Perhaps you are using Ajax to make the GET request, in which case you may need to decode the HTML yourself. The entity is required for XHTML; there is no alternative encoding, so just make sure it is properly decoded.

Reference: The & changes to &amp; in a hyperlink

Doug D
The issue is that the browser receives the HTML entity directly into the address bar (as if it was typed directly). I am not referring to the decoding that happens automatically e.g. if you use the ampersand equivalent entity inside an anchor.
mr-euro
How is the URI being put in the input field? If the value is part of the HTML, then it should be the entity name, if set using JavaScript, then it should not.
Doug D
A: 

I could not be bothered to spend more time on this. I simply changed the argument separator to also include the semicolon (;), so I can use it instead of the ampersand:

# cat .htaccess
php_value arg_separator.input "&;"
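
To illustrate the effect of accepting both separators, here is a minimal Python sketch of a query-string parser that splits on either & or ;. It is not PHP's implementation, only an approximation of the behaviour the setting above asks for; the function name parse_query is hypothetical.

```python
import re
from urllib.parse import unquote_plus


def parse_query(query, separators="&;"):
    """Parse a query string, treating every character in `separators` as a pair separator."""
    pattern = "[" + re.escape(separators) + "]"
    pairs = {}
    for chunk in re.split(pattern, query):
        if not chunk:
            continue  # skip empty pieces such as those from "a=1&&b=2"
        name, _, value = chunk.partition("=")
        pairs[unquote_plus(name)] = unquote_plus(value)
    return pairs


# Both spellings yield the same parameters, so the URI placed in the
# XHTML page can use ";" and needs no entity at all.
with_semicolon = parse_query("foo=1;bar=2")
with_ampersand = parse_query("foo=1&bar=2")

print(with_semicolon)
print(with_ampersand)
```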
mr-euro
+1  A: 

The escaped & should be converted by the client (browser) everywhere in the XHTML document, so you should escape every & as &amp;.

KARASZI István