So I've been noticing some strange results in how Google crawls our site. One issue is that a URL such as this:

http://example.com/randomstring

is showing up on Google with all of the data of

http://example.com/

So in my mind there are two solutions. One is to add a 301 redirect that sends anyone who visits a sub-URL of the main one to the parent URL. The other is to just give a 404, with a nice message saying, "Maybe you meant parent-url".

Thoughts? I'm pretty sure I know where I want to send them, but what is the proper web-etiquette? 404 or 301?

A: 

If you know what URL they should go to, that's exactly what 301 is for.

Alex Martelli
+6  A: 

The correct HTTP way would be a 404, as long as the request is for something that doesn't exist.

301 is for something that is moved, which is not the case here.

However, 100% correct HTTP convention is rarely followed today. Depending on the context, it could be useful to redirect the user to the home page with a notification that the page wasn't found and that they were redirected. In that case, though, you should use a 303 See Other code.

You should never redirect without letting the user know that a redirect happened, though. Otherwise the user may think that something is wrong.
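
As a rough sketch of the two options described here (not a definitive implementation), the choice can be written as a small pure function. The `KNOWN_PATHS` set, the `redirect_unknown` flag, and the `?notfound=1` query parameter are all assumptions made up for illustration; the query parameter is just one possible way for the home page to know it should show a "you were redirected" notice.

```python
from http import HTTPStatus

# Hypothetical set of paths that really exist on the site.
KNOWN_PATHS = {"/", "/index.html"}


def respond(path, redirect_unknown=False):
    """Return (status, headers, body) for a GET request to `path`."""
    if path in KNOWN_PATHS:
        return HTTPStatus.OK, {}, "<h1>Home</h1>"
    if redirect_unknown:
        # Redirect variant: 303 See Other, with a marker so the target
        # page can tell the user that a redirect happened.
        return HTTPStatus.SEE_OTHER, {"Location": "/?notfound=1"}, ""
    # Default: a real 404 status with a friendly body.
    body = '<h1>Page not found</h1><p>Maybe you meant <a href="/">the home page</a>?</p>'
    return HTTPStatus.NOT_FOUND, {}, body
```

With these assumptions, `respond("/randomstring")` yields a 404 with the friendly message, and `respond("/randomstring", redirect_unknown=True)` yields a 303 pointing at `/?notfound=1`.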

Tor Valamo
+1 for 303. Not utilized enough today.
womp
@womp - I agree. Correct HTTP response codes can contain a lot of information. I really dislike it when people send a 200 OK response and then an HTML page showing a 404.
Tor Valamo
Though there are sites that do a redirect-to-home for a resource that doesn't exist and never existed, this is really bad practice and causes many problems. Don't do it. For example the user agent might not be a browser asking for a page; it might be a tool trying to get a ‘special’ file like robots.txt, favicon.ico, crossdomain.xml or any number of other more- or less-well-known fixed addresses. Give that tool an HTML page in response (either directly via 200 or indirectly via 30x) and you have one messed up tool trying to treat HTML as some other type.
bobince
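
To illustrate the point about tools that request fixed addresses such as robots.txt, here is a minimal sketch of a hypothetical tool that only trusts the response body when the status is 200. The function name `robots_rules` and the sample HTML are invented for this example; real crawlers are more involved.

```python
def robots_rules(status, body):
    """Return the non-empty lines a naive tool would treat as robots.txt rules."""
    if status != 200:
        # A 404 tells the tool there is no robots.txt, so nothing is parsed.
        return []
    return [line.strip() for line in body.splitlines() if line.strip()]


HOME_PAGE = "<html>\n<h1>Welcome</h1>\n</html>"

# Site answers honestly with 404: the tool ignores the HTML error page.
honest = robots_rules(404, HOME_PAGE)

# Site serves the home page with 200 (directly or after following a 30x):
# the tool now tries to treat HTML as robots.txt rules.
misleading = robots_rules(200, HOME_PAGE)
```

In the second case the tool ends up holding lines of HTML where it expected rules, which is the "messed up tool" situation described above.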
+2  A: 

I'd say a 404 is the right thing to do, as there never was a meaningful resource at the location, so nothing has "moved permanently" (which is the meaning of 301) and the client needs to know their URL was faulty and has not just changed in the meantime.

But I don't quite understand yet what the issue is. Is Google hitting your site with random URL requests? That would be odd. Or is it that your site is showing the same results for domain.com/randomstring as for domain.com/index.html? That you should change, methinks with a 404.

Pekka
Sorry for the misunderstanding. The bug I'm looking at here is the latter case you mentioned. How that URL is initially being arrived at is something I still need to look into.
icco
Then definitely a 404. A 301 would be plain wrong.
Pekka
+1 301 is wrong. Just because some sites in the wild do it doesn't make it right. 404 is not just correct, it is better in every way than 301 for a nonexistent resource. There is no downside to 404-with-pretty-error-page-in-response-body.
bobince
A: 

So are you saying that your site is doing redirects without your control?

The time to use a 301 (permanent redirect) is when the page originally existed but has moved somewhere else. It's a "Change of Address Card", and a huge lifesaver when restructuring a site. If the page is just some wacky random URL, then passing a 404 tells spiders (and humans too, though people request such URLs less often) that this page never existed, so don't keep coming back and wasting my web server's time. Some people disagree with this because they never want their users to see a 404 page. I think these codes were developed for good reason and are used pretty well by search engines.

Passing either of these status codes does not prevent you from serving "friendly pages" (although a 301 will typically just redirect you if the browser allows).
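
A minimal sketch of that idea using Python's standard `http.server` module: pages that moved get a 301 with a `Location` header, anything else that is unknown gets a 404, and both carry a friendly HTML body. The `PAGES` and `MOVED` tables and their contents are hypothetical placeholders, not anything from the site in the question.

```python
from http.server import BaseHTTPRequestHandler

# Hypothetical content: pages that exist, and old paths that have moved.
PAGES = {"/": "<h1>Home</h1>", "/about": "<h1>About</h1>"}
MOVED = {"/old-about": "/about"}


def resolve(path):
    """Return (status, location, html) for a requested path."""
    if path in PAGES:
        return 200, None, PAGES[path]
    if path in MOVED:
        target = MOVED[path]
        note = f'<p>This page has moved to <a href="{target}">{target}</a>.</p>'
        return 301, target, note
    friendly = '<h1>Page not found</h1><p>Try the <a href="/">home page</a>.</p>'
    return 404, None, friendly


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Note: self.path includes any query string; this sketch ignores that.
        status, location, html = resolve(self.path)
        body = html.encode("utf-8")
        self.send_response(status)
        if location:
            self.send_header("Location", location)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
```

To try it locally, one could run `HTTPServer(("127.0.0.1", 8000), Handler).serve_forever()` after importing `HTTPServer` from `http.server`. A request for `/randomstring` then gets the friendly page with a real 404 status rather than the home page with a 200.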

The thing to remember is that Google doesn't like duplicate content, so you want to make sure that your site does not appear to be serving the same content at different URLs.

zenWeasel
+3  A: 

The already-posted answers cover your question nicely, but I thought there might be some value in going to the source: RFC 2616

10.3.2 301 Moved Permanently

The requested resource has been assigned a new permanent URI and any future references to this resource SHOULD use one of the returned URIs. Clients with link editing capabilities ought to automatically re-link references to the Request-URI to one or more of the new references returned by the server, where possible. This response is cacheable unless indicated otherwise.

The new permanent URI SHOULD be given by the Location field in the response. Unless the request method was HEAD, the entity of the response SHOULD contain a short hypertext note with a hyperlink to the new URI(s).

If the 301 status code is received in response to a request other than GET or HEAD, the user agent MUST NOT automatically redirect the request unless it can be confirmed by the user, since this might change the conditions under which the request was issued.

Note: When automatically redirecting a POST request after receiving a 301 status code, some existing HTTP/1.0 user agents will erroneously change it into a GET request.

10.4.5 404 Not Found

The server has not found anything matching the Request-URI. No indication is given of whether the condition is temporary or permanent. The 410 (Gone) status code SHOULD be used if the server knows, through some internally configurable mechanism, that an old resource is permanently unavailable and has no forwarding address. This status code is commonly used when the server does not wish to reveal exactly why the request has been refused, or when no other response is applicable.

Of course, with these things it tends to be that the common usage takes precedence over the actual text of the RFC. If the entire world is doing it one way, pointing at a document doesn't help much.
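
Read literally, the two quoted sections suggest a simple decision for a path the server cannot serve: 301 if it has a forwarding address, 410 if it is known to be permanently gone with no forwarding address, and 404 otherwise. A hedged sketch of that reading, where the `FORWARDING` and `RETIRED` tables are hypothetical examples:

```python
from http import HTTPStatus

# Hypothetical server-side knowledge about old URLs.
FORWARDING = {"/old-about": "/about"}   # moved, new permanent URI known
RETIRED = {"/summer-sale-2009"}         # removed for good, no forwarding address


def status_for_missing(path):
    """Return (status, location) for a path with no current resource."""
    if path in FORWARDING:
        return HTTPStatus.MOVED_PERMANENTLY, FORWARDING[path]
    if path in RETIRED:
        return HTTPStatus.GONE, None
    # Nothing known about this path: plain 404.
    return HTTPStatus.NOT_FOUND, None
```

Under these assumptions, a random string such as `/randomstring` falls through to 404, since the server has no record of it ever existing or moving.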

David Hall
Cool, thanks! This definitely clarifies that I need to use a 404.
icco