Saturday 24 September 2022

Cassandra internal architecture

Apache Cassandra is a distributed NoSQL database that can handle large amounts of data across many commodity servers, providing a highly available system with no single point of failure.

In this blog I will explain Cassandra's basic architecture and how it works. I will also explain why it was designed this way and give a few examples of applications that fit it well.

Cassandra features

Let's try to understand what problems Cassandra set out to solve for distributed, high-scale systems.

(Q) What problems did applications face while horizontally scaling RDBMS systems?

  • Data is normalised into multiple tables (3NF), so SQL joins make queries slower.
  • Master-slave architecture is not highly available: a master failure means downtime until failover completes.
  • Sharding a relational DB is very difficult. Schema design and application queries have to be aware of the shard key, and data movement between shards is done manually.

(Q) What lessons did Cassandra learn from RDBMS systems?

  • Strong consistency is impractical at scale, so in CAP terms Cassandra trades it away: it favours availability and is eventually consistent.
  • Manual sharding and rebalancing are hard. Cassandra takes care of them, so developers don't have to think about adding or removing nodes; adding or removing a node in a Cassandra cluster ring has no downtime.
  • Master-slave is complex to manage, so Cassandra uses a ring of peer (master) nodes.
  • Scaling up is expensive, so scale horizontally on commodity hardware.
  • Scatter/gather is not good. Denormalize data for real-time query performance; the goal is to hit one machine per query.

Cassandra solved many problems of high-scale distributed systems by implementing several unique features. Let's look at a few important ones:

  1. Data is stored denormalized; there are no joins.
  2. Tables are designed to support queries: you first decide all your operational queries, then you design your table structure.
  3. Masterless architecture, i.e. a ring of peer nodes (no master-slave, no rebalancing, no master re-election).
  4. Sharding is implemented automatically on the basis of the partition key.
  5. Multi-datacenter replication is supported.

Cassandra Table examples

Let us consider a Cassandra table that stores the population of every city in the world. Given a country and a city name, a SELECT query should return the city's population count.

CREATE TABLE population_detail (
    country text,
    city text,
    population int,
    PRIMARY KEY ((country), city)
);

  • ((country), city) is the primary key
  • country is the partition key
  • city is the clustering key
  • population is a regular (non-key) column

The partition key is hashed to produce a token that determines which shard/partition (node) stores the row, distributing your data across the cluster.

The clustering key is used to sort your data within a single shard/partition.
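The two roles above can be sketched in a few lines of Python. This is an illustrative model only: real Cassandra uses the Murmur3 partitioner over a 64-bit token range, not MD5, and the node count stands in for the token ring.

```python
import hashlib

def node_for(partition_key: str, num_nodes: int) -> int:
    # Hash the partition key to a token, then map the token to a node.
    # (Real Cassandra uses Murmur3; MD5 is used here only for brevity.)
    token = int(hashlib.md5(partition_key.encode()).hexdigest(), 16)
    return token % num_nodes

# Every row with country 'USA' hashes to the same node, so a query on
# (country, city) touches exactly one shard.
assert node_for("USA", 4) == node_for("USA", 4)

# Within one partition, rows are kept sorted by the clustering key (city).
partition_rows = [("New York", 8_400_000), ("Chicago", 2_700_000), ("Austin", 960_000)]
sorted_rows = sorted(partition_rows)
```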

So our final query will look like this:

SELECT population FROM population_detail WHERE country = 'USA' AND city = 'New York';

Cassandra Shard nodes in a Ring

Cassandra Shard key partitioning
Partition key hash to calculate node in the Ring

As you can see in the images above, data is distributed across all the nodes on the basis of the partitioner token, which is calculated from the partition key. You can also see how the ring of nodes stores data in a master-master architecture.
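Ring placement can be sketched as classic consistent hashing: each node owns the range of tokens up to its own token, and a key belongs to the first node whose token is greater than or equal to the key's token. The four-node ring and its token values below are hypothetical.

```python
from bisect import bisect_left
import hashlib

# Hypothetical 4-node ring; tokens are sorted and each node owns the
# range ending at its token.
TOKENS = [2**30, 2**62, 2**63, 3 * 2**62]
NODES = ["node-A", "node-B", "node-C", "node-D"]

def owner(partition_key: str) -> str:
    # Token = hash of the partition key (Murmur3 in real Cassandra,
    # MD5 here for brevity).
    token = int(hashlib.md5(partition_key.encode()).hexdigest(), 16) % 2**64
    # First node whose token is >= the key's token, wrapping around
    # past the last node back to the first one.
    return NODES[bisect_left(TOKENS, token) % len(NODES)]
```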

Let’s take another example where we want to collect customer shopping data in a shopping mall. Our query could be: fetch a customer's shopping amount at a given shop on each day.

CREATE TABLE shopping_detail (
    customer_id text,
    shop_id text,
    year int,
    month int,
    date int,
    amount counter,
    PRIMARY KEY ((customer_id, shop_id), year, month, date)
);

  • ((customer_id, shop_id), year, month, date) is the primary key
  • (customer_id, shop_id) is the composite partition key
  • (year, month, date) are the clustering keys
  • amount is a regular (non-key) counter column

Let’s try to find some valid CQL queries for this data model. One general rule for a SELECT query: it must restrict all partition key columns, because it should execute on only one shard. So in the above example the (customer_id, shop_id) combination is mandatory.

Valid CQL queries : 

select * from shopping_detail where customer_id='123' and shop_id='S1';
select * from shopping_detail where customer_id='123' and shop_id='S1' and year=2022;
select * from shopping_detail where customer_id='123' and shop_id='S1' and year=2022 and month=7;
select * from shopping_detail where customer_id='123' and shop_id='S1' and year=2022 and month=7 and date=15;

Invalid CQL queries :

(1) select * from shopping_detail where customer_id='123';
(2) select * from shopping_detail where shop_id='S1';
(3) select * from shopping_detail where customer_id='123' and shop_id='S1' and month=7 and date=15;

Queries (1) and (2) are invalid because they do not restrict all partition key columns.

Query (3) is invalid because to restrict the month and date columns you must also restrict year: within a partition, data is sorted first by year, then by month, then by date, so clustering columns can only be restricted as a left-to-right prefix of (year, month, date).
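The two validity rules above can be expressed as a small checker. This is a sketch for this one table, not how Cassandra itself validates queries; the function and column lists are illustrative.

```python
PARTITION_KEYS = ["customer_id", "shop_id"]
CLUSTERING_KEYS = ["year", "month", "date"]

def is_valid_query(restricted_columns) -> bool:
    # Rule 1: every partition key column must be restricted,
    # so the query can be routed to a single shard.
    if not all(c in restricted_columns for c in PARTITION_KEYS):
        return False
    # Rule 2: restricted clustering columns must form a left-to-right
    # prefix of the clustering key order (year, month, date).
    used = [c for c in CLUSTERING_KEYS if c in restricted_columns]
    return used == CLUSTERING_KEYS[:len(used)]
```

For example, restricting (customer_id, shop_id, month, date) fails rule 2 because year is skipped, matching invalid query (3) above.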

Let’s also understand limitations on the counter column of a cassandra table.

Below are few conditions that you need to keep in mind while taking a Counter column:

  • A counter cannot be assigned to a column that is part of the primary key.
  • All non-counter columns must be part of the primary key.
  • A counter column cannot be indexed or deleted.
  • Use the UPDATE command to increase or decrease its value (you cannot INSERT into a counter column).

Examples of valid and invalid counter column placements (primary key shown first, counter column last):

(col1), col2, col3_counter    // invalid: col2 is a non-counter column outside the primary key

((col1), col2), col3_counter  // valid: every non-counter column is part of the primary key
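The two structural rules for counter tables can be checked mechanically. The function below is an illustrative sketch of those rules, not Cassandra's own schema validation.

```python
def counter_placement_is_valid(primary_key, counter_columns, all_columns) -> bool:
    # Rule 1: a counter column must not be part of the primary key.
    if any(c in primary_key for c in counter_columns):
        return False
    # Rule 2: every non-counter column must belong to the primary key.
    non_counters = [c for c in all_columns if c not in counter_columns]
    return all(c in primary_key for c in non_counters)

# The invalid example: PK is (col1), so col2 is a non-counter column
# outside the primary key.
assert counter_placement_is_valid(
    ["col1"], ["col3"], ["col1", "col2", "col3"]) is False

# The valid example: PK is ((col1), col2), leaving only the counter outside.
assert counter_placement_is_valid(
    ["col1", "col2"], ["col3"], ["col1", "col2", "col3"]) is True
```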

(Q) What did we learn from the above two table examples?

  1. A Cassandra table structure should be designed only after all your operational queries are understood and finalised. Once the table is created and data has been added, you cannot change its primary key.
  2. The partition key and clustering key pattern is crucial for your operational queries.
  3. Sharding is done automatically on the basis of the partition key. Data in a Cassandra table is hashed and partitioned across all nodes in a ring.
  4. The clustering key decides the sorting order of your data within a single partition.
  5. Every read and write query hits only one shard to compute or save the result: no joins, no scatter-gather.

The above learnings can help us decide when to use Cassandra over other database solutions.

There are a few other important factors you should know when considering Cassandra as a database solution for your application: read and write operations with replication and consistency. Let's try to figure them out.

Replication and Consistency :

Data in Cassandra is replicated automatically according to the replication factor (RF). How much of that replication is synchronous depends on the write consistency level (CL). For example, if RF = 3 and write CL = 2, the write is sent to all three replicas, but the coordinator acknowledges the client after 2 replicas confirm; the third replica is written asynchronously.
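The synchronous/asynchronous split can be sketched as simple arithmetic. This is an illustrative model of the RF = 3, write CL = 2 example above, not driver code.

```python
def write_plan(rf: int, write_cl: int):
    # The coordinator sends the write to all RF replicas, but replies to
    # the client as soon as write_cl of them acknowledge; the remaining
    # replicas complete the write asynchronously in the background.
    sync_acks = write_cl
    async_writes = rf - write_cl
    return sync_acks, async_writes

# RF = 3, write CL = 2: 2 synchronous acknowledgements, 1 async write.
print(write_plan(3, 2))  # -> (2, 1)
```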

With RF = 3 and CL = QUORUM, the coordinator node acts as a proxy between the client and the data nodes. It is responsible for managing the entire request path and responding back to the client.

Reads and writes to the Cassandra ring depend on the read CL and write CL values. A CL value represents how many replica nodes must respond OK for a query to succeed.

How to choose values of RF, read CL, and write CL based on your application's consistency and availability requirements:

  1. A low CL value gives less consistency but higher availability.
  2. A high CL value gives stronger consistency but lower availability.

Let’s take a few example values to understand the above rules.

  • When write CL = ALL and read CL = ALL

Very strong consistency (but a single unavailable replica fails the request)

  • When write CL = QUORUM and read CL = QUORUM

Strong consistency (good for production environments)

  • When to use read CL = 1 and write CL = 1

-> When you want the fastest reads and writes

-> When eventual consistency is acceptable for your data, for example logs or events
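The examples above follow one rule of thumb: reads are strongly consistent when the read and write replica sets must overlap, i.e. when R + W > RF. A minimal sketch of that rule:

```python
def quorum(rf: int) -> int:
    # QUORUM means a majority of the replicas.
    return rf // 2 + 1

def is_strongly_consistent(rf: int, write_cl: int, read_cl: int) -> bool:
    # If R + W > RF, every read set overlaps every write set on at least
    # one replica, so a read always sees the latest acknowledged write.
    return read_cl + write_cl > rf

assert quorum(3) == 2
assert is_strongly_consistent(3, 3, 3)                    # ALL / ALL
assert is_strongly_consistent(3, quorum(3), quorum(3))    # QUORUM / QUORUM
assert not is_strongly_consistent(3, 1, 1)                # CL = 1: eventual only
```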

Read and Write path of Cassandra queries

A write can be sent to any node in the cluster; the node that receives it acts as the coordinator and decides where in the cluster the data should be stored. When a write reaches a replica node, Cassandra stores the data in an in-memory structure called the memtable, and also appends the write to the commit log on disk. The commit log is an append-only data structure. After writing to the commit log and the memtable, the node returns OK to the client, which is why writes are fast.

Every write includes a timestamp. Memtables are periodically flushed to disk as SSTables; because the whole memtable is written to disk in one go, the flush is relatively fast. SSTables and commit logs are immutable.
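The write path above can be modelled with a toy node. This is a hypothetical in-memory sketch: real memtables, commit logs, and SSTables are on-disk structures with far more machinery.

```python
class Node:
    def __init__(self):
        self.commit_log = []   # append-only; on disk in real Cassandra
        self.memtable = {}     # in-memory, latest value per key
        self.sstables = []     # immutable, sorted files on disk

    def write(self, key, value, timestamp):
        # 1. Append to the commit log for durability.
        # 2. Update the memtable in memory.
        # Then acknowledge the client -- no random disk seek, hence fast.
        self.commit_log.append((key, value, timestamp))
        self.memtable[key] = (value, timestamp)
        return "OK"

    def flush(self):
        # The whole memtable is written out in one sequential pass as a
        # new immutable SSTable, sorted by key.
        self.sstables.append(dict(sorted(self.memtable.items())))
        self.memtable = {}
```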

Below are a few salient features of SSTables in Cassandra:

  • They are immutable data files for row storage.
  • Every write operation carries the timestamp at which it was written.
  • Compaction merges several SSTables into one.
  • When one column has data in two different SSTables, the value with the latest timestamp wins.
  • Backups are easy to take because SSTables are immutable files.
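The last-timestamp-wins rule that compaction (and reads) rely on can be sketched as a merge. SSTables are modelled here as dicts mapping a key to a (value, timestamp) pair, matching the toy write-path model; this is illustrative, not Cassandra's actual compaction code.

```python
def merge_sstables(sstables):
    # When the same key appears in several SSTables, keep the value
    # with the latest write timestamp -- exactly what compaction does.
    merged = {}
    for table in sstables:
        for key, (value, ts) in table.items():
            if key not in merged or ts > merged[key][1]:
                merged[key] = (value, ts)
    return merged

older = {"population": (100, 1)}
newer = {"population": (250, 2)}
# Order of the input SSTables does not matter; the latest timestamp wins.
assert merge_sstables([older, newer]) == {"population": (250, 2)}
```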

Reads in Cassandra work similarly to writes. A read query can be sent to any node, which then acts as the coordinator; the coordinator contacts the replica nodes that own the requested partition key. Because compaction may not yet have merged everything, a read on a single node may have to check multiple SSTables, merge the matching records in memory, and return the latest record by timestamp. When the read consistency level is < ALL, Cassandra can also perform read repair in the background (read_repair_chance).

From the above explanation you can see why Cassandra writes are relatively faster than reads. Understanding the read and write paths together with RF and CL values lets you decide how Cassandra should be configured in your application's production environment to fit its consistency and availability requirements.
