When the .NET System.Uri class parses strings, it performs some normalization on the input, such as lower-casing the scheme and hostname. It also trims trailing periods from each path segment. This latter feature is fatal to OpenID applications because some OpenIDs (like those issued by Yahoo) include base64-encoded path segments which may end with a period.
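A quick way to check whether a given runtime shows the trimming is to parse a URL and compare the resulting path with the one passed in. The sketch below is only a diagnostic: `DotTrimCheck` and `PathWasAltered` are hypothetical names, and the result may differ between .NET versions, so no particular output is assumed.

```csharp
using System;

public static class DotTrimCheck
{
    // Returns true if Uri changed the path while parsing it,
    // for example by trimming trailing periods from a segment.
    public static bool PathWasAltered(string url, string expectedPath)
    {
        var uri = new Uri(url);
        return !string.Equals(uri.AbsolutePath, expectedPath, StringComparison.Ordinal);
    }

    public static void Main()
    {
        const string url = "http://me.yahoo.com/b./c.#adf";

        // Print what the parser kept, then whether it differs from the input path.
        Console.WriteLine(new Uri(url).AbsolutePath);
        Console.WriteLine(PathWasAltered(url, "/b./c."));
    }
}
```

If the second line printed is `True`, the runtime in use has altered the path and a workaround is needed.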

How can I disable this period-trimming behavior of the Uri class?

Registering my own scheme using UriParser.Register with a parser initialized with GenericUriParserOptions.DontCompressPath avoids the period trimming, and some other operations that are also undesirable for OpenID. But I cannot register a new parser for existing schemes like HTTP and HTTPS, which I must do for OpenIDs.

Another approach I tried was registering my own new scheme, and programming the custom parser to change the scheme back to the standard HTTP(s) schemes as part of parsing:

using System;
using System.Net;

public class MyUriParser : GenericUriParser
{
    private string actualScheme;

    public MyUriParser(string actualScheme)
        : base(GenericUriParserOptions.DontCompressPath)
    {
        this.actualScheme = actualScheme.ToLowerInvariant();
    }

    protected override string GetComponents(Uri uri, UriComponents components, UriFormat format)
    {
        string result = base.GetComponents(uri, components, format);

        // Substitute our actual desired scheme in the string if it's in there.
        if ((components & UriComponents.Scheme) != 0)
        {
            string registeredScheme = base.GetComponents(uri, UriComponents.Scheme, format);
            result = this.actualScheme + result.Substring(registeredScheme.Length);
        }

        return result;
    }
}

class Program
{
    static void Main(string[] args)
    {
        UriParser.Register(new MyUriParser("http"), "httpx", 80);
        UriParser.Register(new MyUriParser("https"), "httpsx", 443);
        Uri z = new Uri("httpsx://me.yahoo.com/b./c.#adf");
        var req = (HttpWebRequest)WebRequest.Create(z);
        req.GetResponse();
    }
}

This actually almost works. The Uri instance reports https instead of httpsx everywhere, except in the Uri.Scheme property itself. That's a problem when you pass this Uri instance to HttpWebRequest to send a request to this address. Apparently it checks the Scheme property and doesn't recognize it as 'https', because it just sends plaintext to port 443 instead of using SSL.

I'm happy for any solution that:

  1. Preserves trailing periods in path segments in Uri.Path
  2. Includes these periods in outgoing HTTP requests.
  3. Ideally works under ASP.NET medium trust (but not absolutely necessary).
+1  A: 

You should be able to percent-escape the '.' using '%2E', but that's the cheap and dirty way out.
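As a rough sketch of that percent-escaping idea, the helper below rewrites the trailing periods of each path segment as `%2E` before the string is handed to Uri. `TrailingDotEscaper` and `EscapeTrailingDots` are hypothetical names, not part of any library, and whether Uri then leaves the escaped form alone depends on the parser's unescaping options.

```csharp
using System;
using System.Text;

public static class TrailingDotEscaper
{
    // Replaces every trailing '.' of each '/'-separated segment with "%2E".
    public static string EscapeTrailingDots(string path)
    {
        if (path == null) throw new ArgumentNullException("path");

        string[] segments = path.Split('/');
        for (int i = 0; i < segments.Length; i++)
        {
            string segment = segments[i];
            string stem = segment.TrimEnd('.');
            int dotCount = segment.Length - stem.Length;

            // Skip segments without trailing periods, and segments made up only
            // of periods ("." and ".." are relative references, not data).
            if (dotCount == 0 || stem.Length == 0) continue;

            var escaped = new StringBuilder(stem);
            for (int d = 0; d < dotCount; d++) escaped.Append("%2E");
            segments[i] = escaped.ToString();
        }

        return string.Join("/", segments);
    }
}
```

For example, `TrailingDotEscaper.EscapeTrailingDots("/b./c.")` returns `/b%2E/c%2E`.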

You might also try experimenting with the dontEscape constructor parameter; it may change how Uri treats those characters.

More info here: http://msdn.microsoft.com/en-us/library/system.uri.aspx

Also check out the following (see DontUnescapePathDotsAndSlashes): http://msdn.microsoft.com/en-us/library/system.genericuriparseroptions.aspx

Brandon Black
Thanks, Brandon. The `DontUnescapePathDotsAndSlashes` option is one possible workaround, although to work effectively it needs to be applied to the existing HTTP and HTTPS parsers, which is only possible in .NET 4.0 (unless you use reflection as has been suggested in other answers here).
Andrew Arnott
+3  A: 

Microsoft says it will be fixed in .NET 4.0 (though it appears from the comments that it has not been fixed yet):

https://connect.microsoft.com/VisualStudio/feedback/details/386695/system-uri-incorrectly-strips-trailing-dots?wa=wsignin1.0#tabs

There is a workaround on that page, however. It involves using reflection to change the options though, so it may not meet the medium trust requirement. Just scroll to the bottom and click on the "Workarounds" tab.
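A reflection-based workaround of this kind might look roughly like the sketch below. It assumes the private `UriParser.GetSyntax` method and `m_Flags` field of the .NET Framework implementation, and that `CanonicalizeAsFilePath` (0x1000000 in the internal `UriSyntaxFlags` enum) is the flag responsible for the trimming. None of these are public API, so they may be missing or renamed on other runtimes, and `UriDotFix` is a hypothetical name.

```csharp
using System;
using System.Reflection;

public static class UriDotFix
{
    // Value of the internal UriSyntaxFlags.CanonicalizeAsFilePath flag (assumed).
    private const int CanonicalizeAsFilePath = 0x1000000;

    // Clears the flag on the built-in http/https parsers.
    // Returns the number of parsers that were actually modified.
    public static int ClearCanonicalizeAsFilePath()
    {
        const BindingFlags nonPublicStatic = BindingFlags.Static | BindingFlags.NonPublic;
        const BindingFlags nonPublicInstance = BindingFlags.Instance | BindingFlags.NonPublic;

        // ASSUMPTION: these private members exist under these names.
        MethodInfo getSyntax = typeof(UriParser).GetMethod(
            "GetSyntax", nonPublicStatic, null, new[] { typeof(string) }, null);
        FieldInfo flagsField = typeof(UriParser).GetField("m_Flags", nonPublicInstance);

        // Give up quietly if the internals look different on this runtime.
        if (getSyntax == null || flagsField == null || !flagsField.FieldType.IsEnum)
            return 0;

        int patched = 0;
        foreach (string scheme in new[] { "http", "https" })
        {
            object parser = getSyntax.Invoke(null, new object[] { scheme });
            if (parser == null) continue;

            int flags = Convert.ToInt32(flagsField.GetValue(parser));
            if ((flags & CanonicalizeAsFilePath) == 0) continue;

            // Write the flags back with CanonicalizeAsFilePath removed.
            object cleared = Enum.ToObject(flagsField.FieldType, flags & ~CanonicalizeAsFilePath);
            flagsField.SetValue(parser, cleared);
            patched++;
        }

        return patched;
    }
}
```

It would need to be called once at startup, before any affected Uri is constructed. Because it reflects over non-public members, it is unlikely to run under medium trust.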

Thanks to jxdavis and Google for this answer:

http://social.msdn.microsoft.com/Forums/en-US/netfxbcl/thread/5206beca-071f-485d-a2bd-657d635239c9

Maxx Daymon
The MS Connect bug is out of date, unfortunately. The .NET team has told me directly that .NET 4.0 does not fix the dot bug. But the workaround is interesting. Thanks.
Andrew Arnott
+2  A: 

I'm curious whether part of the problem is that you are only accounting for "don't compress path" rather than all the defaults of the base HTTP parser (including UnEscapeDotsAndSlashes):

  private const UriSyntaxFlags HttpSyntaxFlags = (
      UriSyntaxFlags.AllowIriParsing
      | UriSyntaxFlags.AllowIdn
      | UriSyntaxFlags.UnEscapeDotsAndSlashes
      | UriSyntaxFlags.CanonicalizeAsFilePath
      | UriSyntaxFlags.CompressPath
      | UriSyntaxFlags.ConvertPathSlashes
      | UriSyntaxFlags.PathIsRooted
      | UriSyntaxFlags.AllowAnInternetHost
      | UriSyntaxFlags.AllowUncHost
      | UriSyntaxFlags.MayHaveFragment
      | UriSyntaxFlags.MayHaveQuery
      | UriSyntaxFlags.MayHavePath
      | UriSyntaxFlags.MayHavePort
      | UriSyntaxFlags.MayHaveUserInfo
      | UriSyntaxFlags.MustHaveAuthority);

That's as opposed to, for instance, the news scheme, which has only these flags:

 private const UriSyntaxFlags NewsSyntaxFlags = (UriSyntaxFlags.AllowIriParsing | UriSyntaxFlags.MayHaveFragment | UriSyntaxFlags.MayHavePath);

Dang, Brandon Black beat me to it while I was working on typing things up...

This may help with code readability:

namespace System 
{
    [Flags]
    internal enum UriSyntaxFlags
    {
        AllowAnInternetHost = 0xe00,
        AllowAnyOtherHost = 0x1000,
        AllowDnsHost = 0x200,
        AllowDOSPath = 0x100000,
        AllowEmptyHost = 0x80,
        AllowIdn = 0x4000000,
        AllowIPv4Host = 0x400,
        AllowIPv6Host = 0x800,
        AllowIriParsing = 0x10000000,
        AllowUncHost = 0x100,
        BuiltInSyntax = 0x40000,
        CanonicalizeAsFilePath = 0x1000000,
        CompressPath = 0x800000,
        ConvertPathSlashes = 0x400000,
        FileLikeUri = 0x2000,
        MailToLikeUri = 0x4000,
        MayHaveFragment = 0x40,
        MayHavePath = 0x10,
        MayHavePort = 8,
        MayHaveQuery = 0x20,
        MayHaveUserInfo = 4,
        MustHaveAuthority = 1,
        OptionalAuthority = 2,
        ParserSchemeOnly = 0x80000,
        PathIsRooted = 0x200000,
        SimpleUserSyntax = 0x20000,
        UnEscapeDotsAndSlashes = 0x2000000,
        V1_UnknownUri = 0x10000
    }
}
drachenstern
+1  A: 

Does this work?

using System;
using System.Reflection;

public class MyUriParser : UriParser
{
    private string actualScheme;

    public MyUriParser(string actualScheme)
    {
        // Set the private m_Flags field of the base UriParser via reflection.
        Type type = this.GetType();
        FieldInfo fInfo = type.BaseType.GetField("m_Flags", BindingFlags.Instance | BindingFlags.NonPublic);
        fInfo.SetValue(this, GenericUriParserOptions.DontCompressPath);
        this.actualScheme = actualScheme.ToLowerInvariant();
    }

    protected override string GetComponents(Uri uri, UriComponents components, UriFormat format)
    {
        string result = base.GetComponents(uri, components, format);

        // Substitute our actual desired scheme in the string if it's in there.
        if ((components & UriComponents.Scheme) != 0)
        {
            string registeredScheme = base.GetComponents(uri, UriComponents.Scheme, format);
            result = this.actualScheme + result.Substring(registeredScheme.Length);
        }

        return result;
    }
}
Raj Kaimal
Sorry, should be reflecting on m_Table and removing existing entries.
Raj Kaimal
Why use reflection to set a flag that can be easily set in the base constructor if you derive from `GenericUriParser`?
Andrew Arnott
That's exactly why I added the comment above :-) I meant to set m_Table and not m_Flags.
Raj Kaimal