Hi there,

I am trying to extract just the domain name from a URL string. I almost have it, using the Uri class.

I have a string like the one below. My first thought was to use a regex, but then I decided to use the Uri class:

http://www.google.com/url?sa=t&source=web&ct=res&cd=1&ved=0CAgQFjAA&url=http://www.test.com/&rct=j&q=test&ei=G2phS-HdJJWTjAfckvHJDA&usg=AFQjCNFSEAztaqtkaIvEzxmRm2uOARn1kQ

I need to convert the above to google.com and also to google, both without the www.

I did the following:

      Uri test = new Uri(referrer);
      log.Info("Domain part : " + test.Host);

This returns www.google.com, but I would like to get the two forms mentioned above if possible:

google.com and google

Is this possible with Uri?

Thanks in advance

+4  A: 

Yes, it is possible. Use:

Uri.GetLeftPart( UriPartial.Authority )
Dewfy
Thanks Dewfy, but this actually returns http://www.google.com, so it still includes the http:// and the www, which I don't need.
mark smith
I actually got it to return the same thing without the http:// by using Uri.Host, but it still has the www.
mark smith
+3  A: 

google.com is not guaranteed to be the same as www.google.com (well, for this example it technically is, but it may be otherwise).

Maybe what you actually need is to remove the "top level" domain and the "www" subdomain? Then just split('.') and take the part before the last part!
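
A minimal sketch of that split approach, assuming the host ends in a single-label top-level domain such as .com. The names HostParts, SecondLevelLabel and LastTwoLabels are made up for illustration and are not part of .NET. For a host like www.example.co.uk the same logic would return "co", so treat this as a starting point rather than a general solution.

```csharp
using System;

static class HostParts
{
    // Hypothetical helper: returns the label just before the last one,
    // e.g. "google" for a host of "www.google.com".
    public static string SecondLevelLabel(string url)
    {
        string host = new Uri(url).Host;
        string[] labels = host.Split('.');
        // Hosts with a single label (e.g. "localhost") are returned unchanged.
        return labels.Length >= 2 ? labels[labels.Length - 2] : host;
    }

    // Hypothetical helper: returns the last two labels joined by a dot,
    // e.g. "google.com" for a host of "www.google.com".
    public static string LastTwoLabels(string url)
    {
        string host = new Uri(url).Host;
        string[] labels = host.Split('.');
        return labels.Length >= 2
            ? labels[labels.Length - 2] + "." + labels[labels.Length - 1]
            : host;
    }
}
```

With the URL from the question, SecondLevelLabel would give google and LastTwoLabels would give google.com. Hosts that are IP addresses are not handled here.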

naivists
"google.com is not guaranteed to be the same as www.google.com" -- and in fact it isn't the same :)
Igor Korkhov
omg, really. www.google.com=209.85.129.104, google.com=209.85.129.147 :-)
naivists
+1  A: 

I think you are misunderstanding what constitutes a "domain name". There is no such thing as a "pure domain name" in common usage, so this is something you will need to define if you want consistent results.
Do you just want to strip off the "www" part, and then have another version which strips off the top-level domain (e.g. the ".com" or ".co.uk" part)? Another answer mentions split("."). You will need something like this if you want to exclude specific parts of the hostname manually. There is nothing within the .NET Framework that meets your requirements exactly, so you will need to implement these things yourself.

David_001
A: 

Because of the numerous variations in domain names and the non-existence of any real authoritative list of what constitutes a "pure domain name" as you describe, I've just resorted to using Uri.Host in the past. To avoid cases where www.google.com and google.com show up as two different domains, I've often resorted to stripping the www. from all domains that contain it, since it's almost guaranteed (ALMOST) to point to the same site. It's really the only simple way to do it without risking losing some data.
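
For illustration, stripping a leading www. from Uri.Host might look roughly like this. HostNormalizer and StripWww are hypothetical names, and the sketch assumes the input is an absolute URL that the Uri constructor accepts.

```csharp
using System;

static class HostNormalizer
{
    // Hypothetical helper: returns Uri.Host with a leading "www." removed.
    // Hosts without that prefix are returned unchanged.
    public static string StripWww(string url)
    {
        string host = new Uri(url).Host;
        const string prefix = "www.";
        return host.StartsWith(prefix, StringComparison.OrdinalIgnoreCase)
            ? host.Substring(prefix.Length)
            : host;
    }
}
```

Only the exact prefix www. is removed, so a host such as wwwexample.com is left alone.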

Chris
A: 

See Rick Strahl's recent blog post as a reference for a C#- and .NET-centric treatment:

Making Sense of ASP.NET paths

Mark Schultheiss