I am implementing URL rewriting in ASP.NET, and my URLs are causing me a world of problems.

The URL is generated from a database of departments & categories. I want employees to be able to add items to the database with whatever special characters are appropriate without it breaking the site.

I am encoding the data before I construct the URLs.

There are several problems...

  1. IIS decodes the URL before it reaches .NET, making it impossible to properly parse anything with a "/" in it.
  2. ASP.NET gets confused by the URL, making "~" useless within certain pages.
  3. I migrated from the built-in test server to my local IIS server (an XP machine), and any URL containing an encoded ampersand (%26) gives me a "Bad Request" error.
  4. UrlEncode leaves some breaking characters untouched, such as '.'.

I had two other related posts on this subject; at the time I only saw the small problems, not the big problem upstream. I've found some registry tricks to solve the "Bad Request" issue, but I'm going to be deploying to a shared hosting environment, which makes those useless. I also know this check exists for a security reason, so I don't want to bypass it without knowing what can of worms I'd be opening.

Rather than trying to force .NET to pass me the raw URL, or override IIS settings, I'd like to make truly safe URLs in the first place.

I'll note I've tried AntiXss.UrlEncode, HttpUtility.UrlEncode, and Uri.EscapeDataString. I've even tried stupid things like double URL encoding. Is there a utility that does what I need, or do I really need to roll my own? I'm even considering doing something hacky like replacing the % with an unusual string of characters. The end result should be at least readable, which was the point of using URL rewriting in the first place.

Sorry for the long post. I just wanted to make sure that I've included all the necessary details. I can't seem to find any relevant information on this, and it seems like it would be a common problem, so maybe I'm missing something big. Thanks for your help, and patience with the long explanation!


Edit for clarity:

When I say the URLs are being built from a database, what I mean is that the directory structure is constructed from the departments and categories in my database.

Some example URLs:

Mystore/Refrigeration/Bar+Fridge.aspx
Mystore/Cooking+Equipment.aspx
Mystore/Kitchen/Cutting+Boards.aspx

The problems come in when I use a department like "Beverage & Bar" or "Pastry/Decorating" to construct my URL. Despite being encoded first, these cause the aforementioned issues.

My handlers are already implemented and working fine except for the special character encoding issues.

+1  A: 

I have a URL rewrite that I implement in the Global.asax file, in the authenticate-request event, as I have some security. That is where I take the raw URL and do the DB lookup; it then rewrites the path to the .aspx page, and all the parameters are passed through the query string. No encoding is necessary.
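A minimal sketch of that approach (my illustration, with a hypothetical LookupCategoryId helper standing in for the database lookup):

// In Global.asax.cs: map the friendly URL to the real .aspx page.
// LookupCategoryId is hypothetical; it resolves a path such as
// "/Mystore/Refrigeration/Bar+Fridge.aspx" against the category tables.
protected void Application_AuthenticateRequest(object sender, EventArgs e)
{
    string path = Request.Url.AbsolutePath;

    int? categoryId = LookupCategoryId(path);
    if (categoryId.HasValue)
    {
        // The browser keeps the friendly URL; the query string carries the data.
        Context.RewritePath("~/Category.aspx?id=" + categoryId.Value);
    }
}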

However, if you are using the URL to actually change data, then I can see that you will have huge problems, as you are effectively using an HTTP GET to change the database. That is usually considered a bad idea, and not something I do.

I only use a POST request to do any database manipulation. This keeps the URL clean, as all the data is in the page form.

The only issue I had was setting the correct URL on the form's action attribute, which in most cases is the raw URL.

If it's the category names that are causing the issue, then perhaps you should restrict the names to alphanumeric characters only and swap spaces for "-". IIS will throw a wobbly with periods ".", as it looks for file names.

P.S. IIS does not understand the tilde "~"; that is something ASP.NET resolves server-side. So if you use it in an anchor tag it will not work as expected, and you should use the application root instead of the tilde.

Edit:

OK, it looks like IIS has issues with certain characters such as ".", "/", and "&". Even if you URL-encode them, IIS will still try to apply its own meanings. As such, consider removing them, so:

Beverage & bar becomes BeverageBar

Pastry / decorating becomes PastryDecorating.

This will keep your URLs clean, but it does mean an extra column in the database so you can check the URL against this shortened category name.
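A quick sketch of that reduction (my example, not part of the original answer):

// Reduce a category name to letters and digits only, e.g.
// "Beverage & Bar" -> "BeverageBar", "Pastry / Decorating" -> "PastryDecorating".
// Store the result in its own column and match incoming URLs against it.
static string ToSlug(string name)
{
    return System.Text.RegularExpressions.Regex.Replace(name, "[^A-Za-z0-9]", "");
}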

Daisy Moon
Sorry, I should have been clearer: I am not doing any database manipulation with my URLs. My store is broken down into departments and categories. Rather than being hard-coded, the directory structure is built from the database. The various menus have links of the form Mystore/Department or Mystore/Department/Category that, while encoded and technically correct, are being broken by IIS before the request even makes it back to my HttpHandler.
apocalypse9
That could be the best solution. I may have just been massively over-complicating things. My only concern is that I'm going to need to be able to look up items from the URL, which could be complicated by a non-reversible method of encoding. My only other idea was to use Uri.EscapeDataString(b).Replace("%", "_"), which I'm fairly sure would condemn me to programmer hell. Thank you very much for your fast responses and help on this. I'm taking another look at my code to see if this will work.
apocalypse9
Thank you very much for your help. This is one of those times where I am profoundly frustrated that I can't accept multiple answers. You pointed me in the right direction and got me back on track with this. Thank you!!
apocalypse9
+1  A: 

You should consider having a table off of your category/department table which holds a unique URL for each category. Then you can use a special routine to generate the URLs. This can be a SQL scalar function or a CLR function, but one of the things it would do is normalize the URL for the web. You can convert "Beverage & Bar" to "Beverage-And-Bar" and "Pastry / Decorating" to "Pastry-Decorating". Mainly, the routine needs to replace all characters that are invalid in an HTTP URL with something else. An example is this:

using System.Text.RegularExpressions;

public static class URL
{
    // Units and symbols that commonly appear in product/category names.
    static readonly Regex feet = new Regex(@"([0-9]\s?)'([^'])", RegexOptions.Compiled);   // 6' -> 6-ft-
    static readonly Regex inch1 = new Regex(@"([0-9]\s?)''", RegexOptions.Compiled);       // 6'' -> 6-in-
    static readonly Regex inch2 = new Regex(@"([0-9]\s?)""", RegexOptions.Compiled);       // 6" -> 6-in-
    static readonly Regex num = new Regex(@"#([0-9]+)", RegexOptions.Compiled);            // #10 -> num-10
    static readonly Regex dollar = new Regex(@"[$]([0-9]+)", RegexOptions.Compiled);       // $5 -> 5-dollar-
    static readonly Regex percent = new Regex(@"([0-9]+)%", RegexOptions.Compiled);        // 20% -> 20-percent-
    static readonly Regex sep = new Regex(@"[\s_/\\+:.]", RegexOptions.Compiled);          // separators -> hyphen
    static readonly Regex empty = new Regex(@"[^-A-Za-z0-9]", RegexOptions.Compiled);      // drop everything else
    static readonly Regex extra = new Regex(@"[-]+", RegexOptions.Compiled);               // collapse hyphen runs

    public static string PrepareURL(string str)
    {
        str = str.Trim().ToLower();
        str = str.Replace("&", "and");

        // Spell out units; "$2" preserves the character the feet pattern
        // captures after the quote, which would otherwise be lost.
        str = feet.Replace(str, "$1-ft-$2");
        str = inch1.Replace(str, "$1-in-");
        str = inch2.Replace(str, "$1-in-");
        str = num.Replace(str, "num-$1");

        str = dollar.Replace(str, "$1-dollar-");
        str = percent.Replace(str, "$1-percent-");

        // Turn separators into hyphens, strip remaining invalid characters,
        // collapse runs of hyphens, and trim leading/trailing hyphens.
        str = sep.Replace(str, "-");
        str = empty.Replace(str, string.Empty);
        str = extra.Replace(str, "-");

        str = str.Trim('-');
        return str;
    }
}

You could make this a SQL CLR function, or run URL generation as a separate process. Then, to implement mapping, you would map the entire URL directly to a category ID. This approach is better in the long run for several reasons. First, you are not constantly generating URLs: you do this once and they stay static, so you don't have to worry about your procedure changing and GoogleBot no longer being able to find old URLs. Also, if you get a collision, you may have spotted a potential duplicate category name, because a collision can only differ by special characters. Finally, you can always view your URLs in the database without having to run the mapping function.
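For example, tracing PrepareURL by hand on the names from the question:

// URL.PrepareURL("Beverage & Bar")      -> "beverage-and-bar"
// URL.PrepareURL("Pastry / Decorating") -> "pastry-decorating"
// URL.PrepareURL("6\" Cutting Board")   -> "6-in-cutting-board"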

eulerfx
That is absolutely perfect. Thank you very much, you saved me more time than I care to admit.
apocalypse9