



When experimenting with Cassandra I've observed that Cassandra writes to the following files:


The general structure seems to be:


What is the Cassandra file structure? More specifically, how are the data and commitlog directories used, and what is the structure of the files in the data directory (Data/Filter/Index)?

+7  A: 

A write to a Cassandra node first hits the CommitLog, which is written sequentially. Cassandra then stores the values in column-family-specific, in-memory data structures called Memtables. A Memtable is flushed to disk whenever one of its configurable thresholds is exceeded: (1) the size of the data in the Memtable, (2) the number of objects in it reaches a certain limit, or (3) the lifetime of the Memtable expires.
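To make the order of operations concrete, here is a toy Python sketch of that write path. It is not Cassandra's code: the class `ToyNode`, the `max_entries` threshold, and the in-memory lists standing in for on-disk files are all made up for illustration, and only the object-count threshold is modelled.

```python
class ToyNode:
    """Toy model of the write path described above; not Cassandra's implementation."""

    def __init__(self, max_entries=3):
        self.commit_log = []            # stands in for the sequential on-disk CommitLog
        self.memtable = {}              # in-memory structure for one column family
        self.sstables = []              # each flush produces one sorted, immutable table
        self.max_entries = max_entries  # hypothetical flush threshold (object count)

    def write(self, key, value):
        # 1. Append to the commit log first (a sequential write).
        self.commit_log.append((key, value))
        # 2. Then update the in-memory memtable.
        self.memtable[key] = value
        # 3. Flush once the entry count reaches the threshold.
        if len(self.memtable) >= self.max_entries:
            self.flush()

    def flush(self):
        # Write the memtable out sorted by key, then start a fresh one.
        self.sstables.append(sorted(self.memtable.items()))
        self.memtable = {}
        # Simplification: the flushed entries no longer need the log for recovery.
        self.commit_log.clear()


node = ToyNode(max_entries=3)
for k, v in [("b", "2"), ("a", "1"), ("c", "3"), ("d", "4")]:
    node.write(k, v)

print(node.sstables)  # one flushed table, sorted by key
print(node.memtable)  # the write that arrived after the flush
```

After the third write the memtable is flushed as a sorted table; the fourth write exists only in the commit log and the new memtable until the next flush.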

The data folder contains a subfolder for each keyspace. Each subfolder contains three kinds of files:

  • Data files: An SSTable (nomenclature borrowed from Google) stands for Sorted Strings Table and is a file of key-value string pairs (sorted by keys).
  • Index file: (key, offset) pairs, where each offset points into the data file
  • Bloom filter: built over all keys in the data file
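How the three files work together on a read can be sketched in Python as below. This is only an illustration under stated assumptions: the `key=value\n` record format, the `ToyBloomFilter` class, and the helpers `build_sstable` and `lookup` are invented here and do not reflect Cassandra's actual on-disk layout. Keys are assumed not to contain `=` or newlines.

```python
import bisect
import hashlib


class ToyBloomFilter:
    """Minimal Bloom filter: may report false positives, never false negatives."""

    def __init__(self, size_bits=256, num_hashes=3):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = 0  # an int used as a bit array

    def _positions(self, key):
        # Derive num_hashes bit positions from salted SHA-256 digests.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


def build_sstable(rows):
    """Return (data, index, bloom): the three pieces, all held in memory here."""
    data = bytearray()
    index = []  # (key, offset) pairs, sorted by key
    bloom = ToyBloomFilter()
    for key, value in sorted(rows.items()):
        index.append((key, len(data)))        # offset where this record starts
        data += f"{key}={value}\n".encode()   # hypothetical record format
        bloom.add(key)
    return bytes(data), index, bloom


def lookup(key, data, index, bloom):
    # 1. Bloom filter: skip this table cheaply if the key is definitely absent.
    if not bloom.might_contain(key):
        return None
    # 2. Index: binary-search the sorted keys for the record's offset.
    keys = [k for k, _ in index]
    i = bisect.bisect_left(keys, key)
    if i == len(keys) or keys[i] != key:
        return None  # the Bloom filter gave a false positive
    # 3. Data: read the record starting at that offset.
    offset = index[i][1]
    end = data.index(b"\n", offset)
    return data[offset:end].decode().split("=", 1)[1]


data, index, bloom = build_sstable({"b": "2", "a": "1", "c": "3"})
print(lookup("b", data, index, bloom))  # prints 2
```

The point of the Bloom filter is that most lookups for keys a table does not hold can be rejected without touching the index or data at all.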
+1: Great answer! Thanks!
Thanks. The Cassandra wiki is a good place to start if you want a more in-depth description of the terminology and nomenclature used in Cassandra.