What is a cluster in RDBMS? Thanks.

+1  A: 

From here:

High-availability clusters (also known as HA Clusters or Failover Clusters) are computer clusters that are implemented primarily for the purpose of providing high availability of services which the cluster provides. They operate by having redundant computers or nodes which are then used to provide service when system components fail. Normally, if a server with a particular application crashes, the application will be unavailable until someone fixes the crashed server. HA clustering remedies this situation by detecting hardware/software faults, and immediately restarting the application on another system without requiring administrative intervention, a process known as failover.

Mitch Wheat
A: 

In SQL, a cluster can also refer to a specific physical ordering of rows.

For example, consider a database with two tables: INVOICES and INVOICE_ITEMS. If many INVOICE_ITEMS rows are inserted concurrently, chances are that the items of a single invoice end up spread across multiple physical blocks of the underlying storage. Reading such an invoice then pulls in unneeded rows along with the interesting ones. Clustering INVOICE_ITEMS on the foreign key to INVOICES groups the items of the same invoice together in the same block, thus reducing the number of read operations needed to access the invoice.
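To make the block argument concrete, here is a minimal sketch in Python. It is a toy model, not how any particular database stores data: the block size of 4 rows, the round-robin insert order, and the helper names `pack_into_blocks` and `blocks_read_for_invoice` are all assumptions made up for illustration.

```python
BLOCK_SIZE = 4  # rows per storage block (hypothetical value)


def pack_into_blocks(rows, block_size=BLOCK_SIZE):
    """Split rows into fixed-size blocks, preserving storage order."""
    return [rows[i:i + block_size] for i in range(0, len(rows), block_size)]


def blocks_read_for_invoice(blocks, invoice_id):
    """Count the blocks holding at least one item of the given invoice."""
    return sum(
        1 for block in blocks
        if any(inv == invoice_id for inv, _item in block)
    )


# Four invoices with four items each, inserted concurrently.
# Round-robin order models interleaved inserts: (invoice_id, item_no).
heap_rows = [(inv, item) for item in range(4) for inv in range(1, 5)]

# Clustering on the foreign key: rows stored in invoice_id order.
clustered_rows = sorted(heap_rows)

heap_blocks = pack_into_blocks(heap_rows)
clustered_blocks = pack_into_blocks(clustered_rows)

heap_reads = blocks_read_for_invoice(heap_blocks, 1)
clustered_reads = blocks_read_for_invoice(clustered_blocks, 1)

print("unclustered:", heap_reads, "blocks; clustered:", clustered_reads, "block")
```

In this toy layout, reading invoice 1 touches every block when rows are stored in insert order, and only one block when they are stored in invoice order.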

Read about clustered indexes on Wikipedia.


In system administration, a "cluster" is a number of servers configured to provide the same service, but look like one server to the users.

This can be done for performance reasons (two servers can answer more requests than a single one) or redundancy (if one server crashes, the others still work).

Such configurations often need special software or setup to work. Some services, like serving static web content, can be clustered very easily. Others, like RDBMS, need complicated replication schemes to coordinate.
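Both motivations can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the `Cluster` class, its `route` and `mark_down` methods, and the node names are hypothetical, and a real database cluster would also have to replicate data and detect failures itself.

```python
class Cluster:
    """Toy front end: several nodes that look like one server to callers."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.healthy = set(self.nodes)
        self._next = 0  # position of the next round-robin candidate

    def mark_down(self, node):
        """Record that a node has failed (failure detection is assumed)."""
        self.healthy.discard(node)

    def route(self):
        """Return the next healthy node in round-robin order."""
        for _ in range(len(self.nodes)):
            node = self.nodes[self._next % len(self.nodes)]
            self._next += 1
            if node in self.healthy:
                return node
        raise RuntimeError("no healthy nodes left")


cluster = Cluster(["db1", "db2", "db3"])

# Performance: requests are spread over all nodes.
before = [cluster.route() for _ in range(3)]

# Redundancy: after one node crashes, the others keep answering.
cluster.mark_down("db2")
after = [cluster.route() for _ in range(4)]

print(before)
print(after)
```

Callers only ever talk to `cluster.route()`, which is the sense in which the group of servers looks like one server.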

Read about computer clusters on Wikipedia.


In statistics, a cluster is a "group of items so that objects from the same cluster are more similar to each other than objects from different clusters."

Read about cluster analysis on Wikipedia.

David Schmitt
When talking about RDBMSs, clustering is almost always for availability.
Mitch Wheat
@Mitch: of course not. Concepts like sharding are the exact opposite of redundancy. Using DB clusters is principally about load balancing.
vartec
+1  A: 

In a database context it can have two completely different meanings:

  • it may mean data clustering or index clustering, i.e. the grouping of similar rows. This is useful for data mining; some databases (e.g. Oracle) also use it to optimize physical data organization;
  • or it may mean a database running on many closely linked servers.
vartec
