views:

144

answers:

8

Sites like cnn.com or foxnews.com.
Where do they store all the articles? In html files? In database?
More logically to store everything in DB but how to generate a static link to something that is inside DB?

It's not that they have a a dynamic page load like: LoadArticle.aspx?ArticleID=123, every article has it's own address.

Please explain how this is done.

+4  A: 

They use a special content management library called VoodooLib.dll.

Seriously, when you write something to a database, you normally generate some kind of unique identifier - 123, for example. It gets permanently associated with that record (article content). After that it is used to generate the same id as part of an Url at any time later.

As for the static link, it is a simple matter of Url Rewriting.

You generate static links to display on a page because they work much better for SEO. When a request for that static Url hits the server, it gets substituted for something "server friendly" and then gets to be processed.

User
A: 

Look at the URl of this page, it doesn't have xxx.aspx?some-query-string

You are refering to using friendly URLs.

To do something like that, one common way is to use URL Rewrite and/or some custom HTTPModule

Here's a good reference: http://weblogs.asp.net/scottgu/archive/2007/02/26/tip-trick-url-rewriting-with-asp-net.aspx

o.k.w
A: 

Just because a page has a normal URL does not mean that it isn't serving dynamic content. With the Apache mod_rewrite module, it is possible to manipulate URLs. So, for example, a page like http://www.domain.tld/permalink/12345/message-title-slug can be converted internally to http://www.domain.tld/permalink/index.php?id=12345&slug=message-title-slug.

I do not know exactly what cnn.com and foxnews.com use, but I would bet that they use a Content Management System (CMS) which serves all pages dynamically, with the content stored either in a database or on the filesystem, and with authoring/publishing all being performed through the particular CMS.

Michael Aaron Safyan
+1  A: 

They probably use some form of Content Management System (CMS). There are many different ones out there - most store the actual content in a database or as XML (some store XML in a database). They will the either publish that content as static HTML pages or, more commonly now, as dynamic pages that are cached. Many use what are known as "friendly URLs" that are virtual addresses that are mapped to the actual physical file path using URL-rewriting techniques.

Note you can't tell whether a page is dynamic or static simply from the extension. It is quite possible to have dynamic pages that end in the .html extension.

Dan Diplo
+1  A: 

Just because the URL looks "static" doesn't mean it is; they could be using something like mod_rewrite or an IIS ISAPI to make the URLs more search engine friendly.

For the high-volume news sites that you mention, however, they may very well generate the pages statically in order to prevent overloading the database with repeated requests for the same article.

Ken Keenan
A: 

Just checking cnn.com, the article links have in them

  • Year
  • Location (US or WORLD/specificlocationid)
  • Month
  • Day
  • Article name.

All of this information together can be used to uniquely identify any article (even less of it is probably actually needed). The dynamic content loading page address could easily be hidden by some method of URL rewriting, and then the information in the requested URL is used to determine which article in the DB is to be served up.

Sean Nyman
A: 

I don't know why all the other answerers seem to assume that some form of URL rewriting is necessary to create friendly URLs. It's not true at all.

It's perfectly possible to write web serving code that splits a URL into parameters - eg year, month, title - and pass that directly to the code that gets the content from the database, without any need to rewrite the URL. Most modern web frameworks such as Django and Rails include this functionality out of the box.

Daniel Roseman
A: 

This is done through mod-rewrite techniques.

Here's an article about the mod rewriting engine: http://httpd.apache.org/docs/1.3/mod/mod_rewrite.html

And here's their "guide": http://httpd.apache.org/docs/2.0/misc/rewriteguide.html

I hope that helps. It should make for a good starting point. Goodluck.