I'm writing a web application that dynamically creates URLs based on some input, to be consumed by a client at a later time. For the sake of discussion, these URLs can contain certain characters, such as a forward slash ('/'), which should not be interpreted as part of the actual URL path, but only as an argument. For example:

http://mycompany.com/PartOfUrl1/PartOfUrl2/ArgumentTo/Url/GoesHere

As you can see, the ArgumentTo/Url/GoesHere does indeed have forward slashes but these should be ignored or escaped.

This may be a bad example, but the question at hand is more general and applies to other special characters.

So, if there are pieces of a URL that are just arguments and should not be used to resolve the actual web request, what's a good way of handling this?

Update:

Given some of the answers I realized that I failed to point out a few pieces that hopefully will help clarify.

I would like to keep this fairly language agnostic, as it would be great if the client could just make a request. For example, if the client knew that it wanted to pass ArgumentTo/Url/GoesHere, it would be great if that could be encoded into a unique string that the server could then decode and use.

Can we assume that functions similar to HttpUtility.HtmlEncode/HtmlDecode in the .NET Framework are available on other systems/platforms? The URL does not have to be pretty by any means, so having real words in the path does not really matter.

Would something like a base64 encoding of the argument work?

It seems that base64 encoding/decoding is fairly readily available on any platform/language.

+4  A: 

You didn't say which language you're using, but PHP has the useful urlencode function, and C# has HttpUtility.UrlEncode and Server.UrlEncode, which should encode parts of your URL nicely.

In case you need another way, this page has a list of encoded values. For example, / is encoded as %2f.
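As a language-neutral illustration of the same idea, here is a minimal Python sketch using the standard library's urllib.parse. The BASE prefix comes from the question; the helper names build_url and extract_argument are made up for this example. Note that some servers may refuse an encoded slash in the path (Apache does so under its default AllowEncodedSlashes setting), so treat this as a sketch to test against your own server rather than a guaranteed fix.

```python
from urllib.parse import quote, unquote

# Fixed part of the URL, taken from the question's example.
BASE = "http://mycompany.com/PartOfUrl1/PartOfUrl2/"


def build_url(argument: str) -> str:
    """Append the argument as ONE percent-encoded path segment."""
    # safe="" forces quote() to escape "/" as well; by default it leaves "/" alone.
    return BASE + quote(argument, safe="")


def extract_argument(url: str) -> str:
    """Recover the original argument from a URL produced by build_url()."""
    # Everything after the fixed prefix is the single encoded segment.
    return unquote(url[len(BASE):])


url = build_url("ArgumentTo/Url/GoesHere")
print(url)
print(extract_argument(url))
```

The key detail is safe="": without it the slashes would pass through unescaped and the argument would again look like three path segments.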

update

From what you've updated, I'd say use Voyagerfan's idea of URL rewriting to map something like:

http://www.example.com/([A-Za-z0-9/]+) http://www.example.com/?page=$1

And then use the application's GET parser to read the argument out.
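To make the "GET parser" step concrete, here is a small Python sketch of what the application would do once the rewrite has turned the path into a query string. It assumes the parameter is called page, as in the rule above; the function name page_argument is invented for the example.

```python
from urllib.parse import urlsplit, parse_qs


def page_argument(rewritten_url: str) -> str:
    """Return the value of the 'page' query parameter, or '' if it is absent."""
    query = urlsplit(rewritten_url).query
    # parse_qs maps each parameter name to a list of its values.
    values = parse_qs(query).get("page", [])
    return values[0] if values else ""


print(page_argument("http://www.example.com/?page=ArgumentTo/Url/GoesHere"))
```

Inside a query string the slashes are just data, so the argument arrives in one piece.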

Ross
A: 

I believe what you're looking for, if you're using .NET, is the HttpUtility.UrlEncode() method, which has several overloads. Look here: http://msdn.microsoft.com/en-us/library/system.web.httputility.urlencode.aspx

codewright
A: 

Use the HtmlEncode and HtmlDecode methods on the Server object. I believe that will escape most characters that should not appear as-is, and that it takes care of other things such as spaces.

Here's the MSDN Article: http://msdn.microsoft.com/en-us/library/ms525347.aspx

Adron
+3  A: 

You could use Apache rewrites to rewrite http://mycompany.com/PartOfUrl1/PartOfUrl2 to http://mycompany.com/path/to/program.php and then pass in ArgumentTo/Url/GoesHere as a standard GET parameter. What the server actually sends back is then the response for http://mycompany.com/path/to/program.php?arg=ArgumentTo/Url/GoesHere

Rewriting is a good way to guard against technology changes (so switching from PHP to ASP, for example, won't change your URLs) and provide friendly URLs to your users at the same time.

Update

Using your example URLs and building on what I said before, I'd say to use this code in your httpd.conf or .htaccess:

RewriteEngine On

RewriteRule ^/?PartOfUrl1/PartOfUrl2/([A-Za-z0-9/]+)$ /path/to/program.php?arg=$1 [L]

(BTW, the pattern is matched against the URL path only, so it carries no http:// or host name, and the rule must sit on a single line with no line breaks.)

Changing the paths, the filenames, the name of the arg, etc. is fine; the critical parts here are the regex (([A-Za-z0-9/]+)) and the $1. The character class has to include / and be followed by +, otherwise only a single character would be captured.
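If you want to check what a rule of this kind would capture before touching the server config, a rough Python simulation can help. This is only a sketch: it assumes a character class that also admits / and a + quantifier so the whole argument lands in group 1, and the function name rewrite is made up. Python's re module is not identical to Apache's regex engine, but for a pattern this simple the behaviour should match.

```python
import re
from typing import Optional

# Path-only pattern: optional leading slash, fixed prefix, then the argument.
PATTERN = re.compile(r"^/?PartOfUrl1/PartOfUrl2/([A-Za-z0-9/]+)$")


def rewrite(path: str) -> Optional[str]:
    """Mimic the rewrite: return the internal target, or None if no match."""
    match = PATTERN.match(path)
    if match is None:
        return None
    # group(1) plays the role of $1 in the rewrite target.
    return "/path/to/program.php?arg=" + match.group(1)


print(rewrite("/PartOfUrl1/PartOfUrl2/ArgumentTo/Url/GoesHere"))
print(rewrite("/SomethingElse/ArgumentTo"))
```

Paths that do not start with the fixed prefix come back as None, which corresponds to the rule simply not firing.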

Voyagerfan5761
+1  A: 

Yes, Base64-encoding your argument will work for you; however, you'll need to make sure your entire URL stays under the length limit of your target browser (2083 characters for IE 4 - 7, according to this page).
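One caveat worth adding: the standard Base64 alphabet itself contains / and +, so plain Base64 can reintroduce the very character you are trying to hide. The URL-safe variant replaces them with _ and -. Here is a minimal Python sketch of that variant; the function names and the decision to strip the = padding are choices made for this example, and the 2083 figure is the IE limit quoted above.

```python
import base64

MAX_URL_LENGTH = 2083  # IE 4 - 7 limit quoted in this answer
BASE = "http://mycompany.com/PartOfUrl1/PartOfUrl2/"


def encode_argument(argument: str) -> str:
    """Encode the argument with the URL-safe Base64 alphabet ('-' and '_')."""
    token = base64.urlsafe_b64encode(argument.encode("utf-8")).decode("ascii")
    # Dropping '=' padding keeps the token free of characters that need escaping.
    return token.rstrip("=")


def decode_argument(token: str) -> str:
    """Restore the padding that was stripped and decode back to text."""
    padding = "=" * (-len(token) % 4)
    return base64.urlsafe_b64decode(token + padding).decode("utf-8")


token = encode_argument("ArgumentTo/Url/GoesHere")
url = BASE + token
if len(url) > MAX_URL_LENGTH:
    raise ValueError("URL too long for the target browser")
print(url)
print(decode_argument(token))
```

Base64 grows the argument by roughly a third, so the length check matters more here than with percent-encoding of mostly alphanumeric input.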

Erik Forbes