Hi all,
I'm wondering how "text-heavy" websites like stackoverflow.com, news.yahoo.com and bbc.co.uk store their text internally.
Is the text stored as text files, or in a database?
How do they cope with the ever-increasing content?
Usually in a database. With MySQL, the text typically goes in a TEXT, MEDIUMTEXT or LONGTEXT column, in a table alongside columns like date, rating and tags. Tags might live in a separate table, but could also be stored comma-separated in a single column. It can differ per site.
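To make that layout concrete, here is a minimal sketch of the "tags in another table" variant. It uses Python's built-in sqlite3 module as a stand-in for MySQL so it runs without a server; the table names, column names and helper functions (posts, tags, post_tags, add_post, posts_with_tag) are all made up for illustration and not taken from any real site.

```python
import sqlite3


def create_schema(conn):
    # Hypothetical schema: one row per article/post, tags normalized
    # into their own table and linked through a join table.
    conn.executescript("""
        CREATE TABLE posts (
            id         INTEGER PRIMARY KEY,
            body       TEXT    NOT NULL,   -- the article text itself
            created_at TEXT    NOT NULL,   -- ISO date string
            rating     INTEGER NOT NULL DEFAULT 0
        );
        CREATE TABLE tags (
            id   INTEGER PRIMARY KEY,
            name TEXT NOT NULL UNIQUE
        );
        CREATE TABLE post_tags (
            post_id INTEGER NOT NULL REFERENCES posts(id),
            tag_id  INTEGER NOT NULL REFERENCES tags(id),
            PRIMARY KEY (post_id, tag_id)
        );
    """)


def add_post(conn, body, created_at, tags):
    # Insert the post, then create any missing tags and link them.
    cur = conn.execute(
        "INSERT INTO posts (body, created_at) VALUES (?, ?)",
        (body, created_at),
    )
    post_id = cur.lastrowid
    for name in tags:
        conn.execute("INSERT OR IGNORE INTO tags (name) VALUES (?)", (name,))
        tag_id = conn.execute(
            "SELECT id FROM tags WHERE name = ?", (name,)
        ).fetchone()[0]
        conn.execute(
            "INSERT OR IGNORE INTO post_tags (post_id, tag_id) VALUES (?, ?)",
            (post_id, tag_id),
        )
    conn.commit()
    return post_id


def posts_with_tag(conn, name):
    # Return (id, body) for every post carrying the given tag.
    rows = conn.execute(
        """
        SELECT p.id, p.body
        FROM posts p
        JOIN post_tags pt ON pt.post_id = p.id
        JOIN tags t       ON t.id = pt.tag_id
        WHERE t.name = ?
        ORDER BY p.id
        """,
        (name,),
    )
    return rows.fetchall()
```

The comma-separated variant would just be one extra TEXT column on posts. That is simpler to write, but finding all posts for a tag then means string matching instead of a join.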
It is always some sort of database behind those sites.
For relatively small ones (Stack Overflow, BBC, etc.) a normal relational database should be enough - like MySQL, PostgreSQL or Oracle.
The really big ones (eBay, Amazon, Google) usually have some sort of proprietary database solution, because standard databases can't handle that kind of load.
Most of these sites will use a content management system that will store the text in a database. Stackoverflow lets us all edit the content, while sites like the BBC only allow their journalists to actually add and edit content on their system.
Most of these sites will use some kind of markup language to encode the styling into the text. You can read up on the markup system used by Stack Overflow here: http://stackoverflow.com/editing-help
Why use a markup language instead of just storing the HTML? The markup allows the text to be converted in different ways for different outputs and devices. You might convert to HTML for display on a web page, but use a different conversion for emails and another one for certain mobile devices.
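As a rough illustration of "one stored source, several outputs", here is a toy sketch. It handles only two constructs (`**bold**` and `[text](url)`) and is not a real Markdown or Textile implementation; the function names to_html and to_plain are invented for this example. A real site would use a proper library and would also sanitize URLs.

```python
import html
import re

# Toy patterns for a tiny subset of Markdown-like markup.
BOLD = re.compile(r"\*\*(.+?)\*\*")
LINK = re.compile(r"\[([^\]]+)\]\(([^)\s]+)\)")


def to_html(source):
    # Escape first so raw "<" or "&" in the stored text cannot inject HTML,
    # then turn the markup into tags.
    text = html.escape(source, quote=True)
    text = BOLD.sub(r"<strong>\1</strong>", text)
    text = LINK.sub(r'<a href="\2">\1</a>', text)
    return text


def to_plain(source):
    # Plain-text rendering, e.g. for a text-only email:
    # drop the bold markers and spell links out as "text (url)".
    text = BOLD.sub(r"\1", source)
    text = LINK.sub(r"\1 (\2)", text)
    return text
```

The database only ever holds the markup string; which converter runs depends on where the text is going.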
Search Google for Markdown and Textile for examples. You can also look at Wikipedia's information on them.
How do they cope with the ever-increasing content?
When the database load gets too heavy, they'll have to move the database to a separate server, and if that is not enough, more or less complex load-balancing setups are needed ;)
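One common shape such a setup can take (my assumption, the sites above may do something quite different) is to send writes to one primary server and spread reads over several copies. A minimal sketch of the routing decision only, with a made-up class name and made-up server names, and no real connections:

```python
import itertools


class ReplicaRouter:
    """Hypothetical router: SELECTs rotate over replicas, the rest go to the primary."""

    def __init__(self, primary, replicas):
        self.primary = primary
        # Fall back to the primary if no replicas are configured.
        self._replicas = itertools.cycle(replicas or [primary])

    def server_for(self, sql):
        words = sql.split(None, 1)
        first_word = words[0].upper() if words else ""
        if first_word == "SELECT":
            # Round-robin: each read goes to the next replica in turn.
            return next(self._replicas)
        # Writes (and anything unrecognized) go to the primary.
        return self.primary
```

For example, with `ReplicaRouter("db-primary", ["db-replica-1", "db-replica-2"])`, successive SELECTs alternate between the two replicas while an INSERT always lands on db-primary. A real setup also has to deal with replication lag and failover, which this sketch ignores.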
Using a database would be the better approach:
BBC uses Drupal, an open source content management system. I would think most of the papers use some form of commercial CMS, like Vignette. All of these CMSes store the text in a database and offer clients an easy way to add text. Take a look at Drupal.org for examples - Drupal is also used by theonion.com and other papers.