On my website anybody can submit links to other nice websites. All links in my database must be unique, but some links have a 'www.' prefix and some do not. Some end with '/', some do not. For example:

|http://www.example.com

|http://example.com

|http://example.com

|http://example.com/

Other problems can arise with https versus http.

I know that I should normalize the address before saving it to the database, but which standard should I use?

A: 

I would use http://domain.com. Whichever standard you choose, just stick with it throughout your code.
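A minimal sketch of what sticking to one form could look like, assuming Python's standard `urllib.parse`. The function name `normalize_url` and the specific rules (drop a leading `www.`, drop trailing slashes, lowercase the host, leave the scheme as given, discard fragments and any user info) are illustrative assumptions, not a recognized standard:

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url: str) -> str:
    """Reduce a URL to one hypothetical canonical form, e.g. http://domain.com."""
    parts = urlsplit(url.strip())
    # urlsplit lowercases the scheme; .hostname is the lowercased host.
    host = parts.hostname or ""
    # Assumption: 'www.' carries no meaning for the sites being stored.
    if host.startswith("www."):
        host = host[len("www."):]
    # Keep an explicit port if one was given.
    netloc = host if parts.port is None else f"{host}:{parts.port}"
    # Assumption: a trailing slash does not change the resource.
    path = parts.path.rstrip("/")
    # The fragment is dropped; the query string is kept unchanged.
    return urlunsplit((parts.scheme, netloc, path, parts.query, ""))


for raw in ("http://www.example.com", "http://example.com/"):
    print(raw, "->", normalize_url(raw))
```

With these rules both example inputs collapse to the same string, so a uniqueness check on the normalized value would catch them as duplicates.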

Femaref
+2  A: 

Well, you can't necessarily treat http://www.example.com and http://example.com as the same site, because they could serve up different content (although that would break a lot of people's expectations).

Similarly, http:// and https:// addresses shouldn't be assumed to point to the same content. If the server is set up correctly, duplicate URLs will have a canonical redirect pointing one to the other. If the server isn't set up correctly, it is very difficult to tell whether the duplication is by design or by accident.

The best approach would be to follow any URL you're given and see if it redirects to another. Whatever happens, use the URL you end up at after any redirects.
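As a rough sketch of following redirects, assuming Python's standard `urllib.request` (whose `urlopen` follows HTTP redirects by default). The name `resolve_final_url` and the injectable `opener` parameter are hypothetical and exist only so the function can be exercised without network access:

```python
import urllib.request


def resolve_final_url(url, opener=urllib.request.urlopen, timeout=10):
    """Return the URL reached after any redirects have been followed."""
    # urlopen follows 3xx redirects itself; the response's .url attribute
    # holds the address of the resource that was finally retrieved.
    with opener(url, timeout=timeout) as response:
        return response.url
```

Calling `resolve_final_url("http://example.com")` performs a real request, so in practice it should be wrapped in error handling for timeouts and unreachable hosts. A site without a canonical redirect will simply return the URL it was given.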

Gareth
So, should I keep the prefix if it is in the address?
Thomas
A: 

I would store the shortest form, for example domain.com. But if you also have addresses like ftp://domain.com, you need to add a separate protocol column to your database.
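One way this could look, sketched with Python's built-in `sqlite3` and `urllib.parse`. The table name `links`, the column names, the helper `add_link`, and the choice to make the pair (protocol, address) unique are all assumptions made for illustration:

```python
import sqlite3
from urllib.parse import urlsplit


def split_for_storage(url):
    """Split a URL into (protocol, address-without-protocol)."""
    parts = urlsplit(url)
    return parts.scheme, parts.netloc + parts.path


def add_link(conn, url):
    """Insert the link; return False if it is already stored."""
    protocol, address = split_for_storage(url)
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO links (protocol, address) VALUES (?, ?)",
                (protocol, address),
            )
        return True
    except sqlite3.IntegrityError:
        return False


conn = sqlite3.connect(":memory:")
# Assumption: the same address under a different protocol is a different link.
conn.execute(
    "CREATE TABLE links ("
    "protocol TEXT NOT NULL, address TEXT NOT NULL, "
    "UNIQUE (protocol, address))"
)
```

If http and ftp entries for the same host should count as duplicates instead, the constraint could be declared on `address` alone.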

mosg
Yes, but I think it is too short :) Will there be problems if I display the address of a popular website (www.domain.com) without the 'www.' prefix?
Thomas
In my opinion, good practice for system administrators, when they set up domain names for their sites, is to make www.domain.com and domain.com equivalent.
mosg