I'm guessing a site like Stack Overflow doesn't keep an HTML file around for every question ever asked. Instead, server-side code creates the page every time a question is clicked on (I think). Is it possible for search engines to index every question on Stack Overflow, or would a page per question need to be kept in a directory so the search engine can crawl it?

A: 

Yes, it's perfectly possible. When a link is followed, the server returns HTML just like any other web page; the only difference is that a program on the server generated it, rather than a person writing it by hand.

Mark B
+4  A: 

Yes. Search engines can index dynamically generated pages no problem. In fact, from the search engine bot's perspective, it can't really even distinguish between a dynamically generated page and a static one.

Asaph
A: 

As far as the client (be it a browser or a search engine) is concerned, there is no difference between a server-generated page and a static file. They're virtually indistinguishable (depending on how the page is generated, it might be missing Last-Modified headers, etc.). As such, yes, search engines can index generated pages without a problem.
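To make the Last-Modified point concrete, here is a minimal sketch of a server that builds each question page on request instead of reading it from disk. The `/questions/<id>` URL scheme, the `QUESTIONS` dict standing in for a database, and the `build_response` helper are all hypothetical; only `http.server`, `html.escape` and `email.utils.formatdate` are real standard-library calls. A static file server would typically send Last-Modified automatically from the file's mtime, whereas generated pages have to set it themselves.

```python
import html
from email.utils import formatdate
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical stand-in for a database: id -> (title, last-modified Unix time).
QUESTIONS = {
    1: ("Can search engines index dynamic pages?", 1254355200.0),
}

NOT_FOUND = (404, {"Content-Type": "text/plain; charset=utf-8"}, b"Not found")


def build_response(path):
    """Return (status, headers, body) for a path like '/questions/1'."""
    parts = path.split("?", 1)[0].strip("/").split("/")
    if len(parts) != 2 or parts[0] != "questions" or not parts[1].isdigit():
        return NOT_FOUND
    qid = int(parts[1])
    if qid not in QUESTIONS:
        return NOT_FOUND
    title, modified = QUESTIONS[qid]
    # The HTML is assembled in memory; no file for this page exists on disk.
    body = (
        "<!DOCTYPE html><html><head><title>{t}</title></head>"
        "<body><h1>{t}</h1></body></html>"
    ).format(t=html.escape(title)).encode("utf-8")
    headers = {
        "Content-Type": "text/html; charset=utf-8",
        # Generated pages must set this explicitly if they want it sent.
        "Last-Modified": formatdate(modified, usegmt=True),
        "Content-Length": str(len(body)),
    }
    return 200, headers, body


class QuestionHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        status, headers, body = build_response(self.path)
        self.send_response(status)
        for name, value in headers.items():
            self.send_header(name, value)
        self.end_headers()
        self.wfile.write(body)


if __name__ == "__main__":
    HTTPServer(("localhost", 8000), QuestionHandler).serve_forever()
```

A crawler requesting `http://localhost:8000/questions/1` just receives an HTTP response with HTML in it, exactly as it would for a file on disk.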

That said, there is something to be said for giving them a hint. Using sitemaps, for example, gives a search engine a nice listing of all your pages, so it's less likely to miss them. More importantly, it can summarize last modified times, to focus the search engine's attention on what has changed recently. This isn't mandatory, but it does help - regardless of whether the pages are static HTML or generated.
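As a rough illustration of the sitemap idea, the sketch below writes a sitemap in the sitemaps.org XML format (a `urlset` of `url` entries, each with `loc` and `lastmod`) using the standard library's ElementTree. The `build_sitemap` name and the example.com URLs are made up for illustration; on a real site the page list would come from the same database that generates the pages.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """pages: iterable of (url, date_last_modified) pairs (hypothetical input)."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url, modified in pages:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
        # lastmod uses W3C datetime format; a plain YYYY-MM-DD date is allowed.
        ET.SubElement(entry, "lastmod").text = modified.isoformat()
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    print(build_sitemap([
        ("https://example.com/questions/1", date(2009, 10, 1)),
        ("https://example.com/questions/2", date(2009, 10, 2)),
    ]))
```

The output would typically be served at a fixed URL such as `/sitemap.xml` and can itself be generated on the fly, just like the question pages.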

bdonlan
A: 

Any link that uses a GET can be followed by most crawlers. Anything that requires a POST will generally be ignored.

The mechanism for generating the page is irrelevant.
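The GET-versus-POST distinction can be sketched with a simplified link extractor built on the standard library's `html.parser`. It collects `<a href>` targets (plain GET links a crawler can follow) and merely records `<form method="post">` actions to show they are skipped. The `LinkExtractor` class and the example URLs are hypothetical; real crawlers are far more elaborate, but the basic rule is the same.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects URLs a simple crawler could follow with a GET request."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.get_links = []      # <a href> targets: followed with GET
        self.skipped_forms = []  # POST form actions: recorded, not followed

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.get_links.append(urljoin(self.base_url, attrs["href"]))
        elif tag == "form" and (attrs.get("method") or "get").lower() == "post":
            self.skipped_forms.append(urljoin(self.base_url, attrs.get("action") or ""))


if __name__ == "__main__":
    page = ('<a href="/questions/1">Q1</a>'
            '<form method="POST" action="/questions/1/vote"><button>Vote</button></form>')
    parser = LinkExtractor("https://example.com/questions")
    parser.feed(page)
    print(parser.get_links)
    print(parser.skipped_forms)
```

Nothing in the extractor knows or cares whether the page it is parsing came from a file or from server-side code.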

chris
A: 

Yes, as long as crawling isn't restricted by robots.txt or robots meta tags. A search engine requests a web page just like a normal user does; nobody has access to your server-side code (unless your site has been hacked).
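A simplified check of both restrictions might look like the sketch below. It uses the real `urllib.robotparser.RobotFileParser` (`parse` and `can_fetch`) for robots.txt and an `html.parser` subclass to spot `<meta name="robots" content="noindex">`. The `may_index` helper, the `ExampleBot` user agent and the sample robots.txt are hypothetical, and real crawlers apply further rules (for example the X-Robots-Tag HTTP header) that are not modelled here.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
"""


class RobotsMetaParser(HTMLParser):
    """Sets .noindex if the page has a robots meta tag forbidding indexing."""

    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            directives = [d.strip().lower() for d in (attrs.get("content") or "").split(",")]
            if "noindex" in directives or "none" in directives:
                self.noindex = True


def may_index(robots_txt, user_agent, url, page_html):
    """Simplified: robots.txt decides fetching, the meta tag decides indexing."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    if not rp.can_fetch(user_agent, url):
        return False  # a polite crawler never requests the page at all
    meta = RobotsMetaParser()
    meta.feed(page_html)
    return not meta.noindex


if __name__ == "__main__":
    print(may_index(ROBOTS_TXT, "ExampleBot", "https://example.com/questions/1", "<p>Hi</p>"))
    print(may_index(ROBOTS_TXT, "ExampleBot", "https://example.com/admin/users", ""))
```

Note that neither check looks at how the page was produced; both operate only on what the server sends back.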

SomeUser
A: 

Search engines can see pretty much anything on a given Web page that isn't hidden behind client-side code (i.e., JavaScript).

So, if there's a URL that you can enter into your browser's address bar to get this page, and this page is linked to from somewhere, a search engine will find it and "see" the same content that you do. The fact that the page was generated dynamically by a server is irrelevant to a search engine, since what is sent to a browser upon requesting a URL is still just an HTML file.

In other words, that HTML file doesn't exist in the same form on the server (it's actually server-side code that generates the HTML, not a static HTML file), but that's not what a search engine crawls and indexes. It follows links to document URLs, which are exactly what you see in your browser's address bar.
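The "if it's linked to from somewhere, it will be found" point can be sketched as a breadth-first crawl over a tiny simulated site. Here `fetch` is a hypothetical stand-in for an HTTP GET: the home page comes from a dict (like a static file) while question pages are built by a function (like server-side code). The crawler cannot tell them apart, and a page that exists but is never linked to (question 99 below) is never discovered. The `crawl` and `fetch` names and the example.com URLs are illustrative only.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class HrefCollector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def crawl(seed, fetch):
    """Breadth-first crawl from seed; fetch(url) returns HTML or None."""
    seen = {seed}
    queue = deque([seed])
    indexed = {}
    while queue:
        url = queue.popleft()
        page = fetch(url)
        if page is None:
            continue  # URL doesn't resolve (think HTTP 404)
        indexed[url] = page
        collector = HrefCollector()
        collector.feed(page)
        for href in collector.hrefs:
            link, _fragment = urldefrag(urljoin(url, href))
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return indexed


# Hypothetical site: one "static" page plus generated question pages.
STATIC = {"https://example.com/": '<a href="/questions/1">First question</a>'}
PREFIX = "https://example.com/questions/"


def fetch(url):
    if url in STATIC:
        return STATIC[url]
    if url.startswith(PREFIX) and url[len(PREFIX):].isdigit():
        qid = int(url[len(PREFIX):])
        # Questions 1-3 link to the next one; any id exists, but 99 is never linked.
        nxt = '<a href="/questions/{}">next</a>'.format(qid + 1) if qid < 3 else ""
        return "<h1>Question {}</h1>{}".format(qid, nxt)
    return None


if __name__ == "__main__":
    for url in crawl("https://example.com/", fetch):
        print(url)
```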

Bungle
+1  A: 

You might be interested in the "Dynamic URLs vs. static URLs" post on the Official Google Webmaster Central Blog.

Pascal Thivent
A: 

I don't want to repeat the same answer the fourth time, so I'm saying: NO. That's not possible. Search engines can sniff server generated HTML by detecting inline whitespace caused by server side tags with Bayesian approximation and forcefully ignore them.

The reason search engines ignore server-side generated content is simple: anything a software program generates is the mere output of a mathematical algorithm, which has no informative value and therefore does not deserve to be served to the human eye. Besides, a server distributes such content without charging clients anything and without differentiating among them, which means it is a byproduct of communism. Any civilized search engine would treat it as unwelcome.

ssg
I like the downvotes; I can hear the cracking sounds of the Stack Overflow community's rigidity and stiffness.
ssg