Monday 1 May 2023

Calculate Kafka Consumer lag programmatically

In this blog we will discuss how to programmatically calculate the offset lag of a Kafka consumer group for each topic partition. Below is an example of the output of kafka-consumer-groups.sh, a command-line tool that reports offset lag for a consumer group.

Kafka consumer group command output
Sometimes we don’t have access to the production Kafka environment to run this tool. In such scenarios we can calculate the offset lag for a consumer group programmatically.
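The lag calculation itself is simple arithmetic: for each partition, lag is the log-end offset minus the group's committed offset. The sketch below assumes both sets of offsets have already been fetched and passes them in as plain dictionaries. In the Java client, for example, `Admin.listConsumerGroupOffsets` returns the committed offsets and `Admin.listOffsets` with `OffsetSpec.latest()` returns the log-end offsets. The function names, topic name and offsets here are hypothetical.

```python
from typing import Dict, Optional, Tuple

# (topic, partition) identifies one topic partition
TopicPartition = Tuple[str, int]


def partition_lag(
    end_offsets: Dict[TopicPartition, int],
    committed_offsets: Dict[TopicPartition, Optional[int]],
) -> Dict[TopicPartition, Optional[int]]:
    """Return lag per partition: log-end offset minus committed offset.

    A partition with no committed offset gets None, because its lag
    cannot be computed from committed offsets alone.
    """
    lag: Dict[TopicPartition, Optional[int]] = {}
    for tp, end in end_offsets.items():
        committed = committed_offsets.get(tp)
        # max(..., 0) guards against an end offset read slightly before the commit
        lag[tp] = None if committed is None else max(end - committed, 0)
    return lag


def total_lag(lag: Dict[TopicPartition, Optional[int]]) -> int:
    """Sum lag over the partitions that have a committed offset."""
    return sum(v for v in lag.values() if v is not None)
```

For example, suppose partition 0 of a hypothetical `orders` topic has a log-end offset of 120 and a committed offset of 100. Its lag is 20. A partition with no committed offset comes back as None rather than being counted as zero lag.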

Sunday 1 January 2023

Distributed Job Scheduler Using Redisson

Job scheduling is one of the main components required by every kind of business unit. Supporting job scheduling in a distributed and scalable environment is one of the more challenging problems to solve. Redis offers multiple solutions for distributed and scalable systems through its unique data structure implementations. In this blog we will discuss how to implement a distributed job scheduler using Redis via Redisson, with a working Spring Boot project example.
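Redisson hides the underlying Redis commands behind its own Java API, so the sketch below does not use Redisson. It is a hypothetical, in-memory illustration of one common Redis pattern for delayed jobs. Jobs sit in a sorted set whose score is each job's due time, and workers poll for entries whose score is at or below the current time. In Redis the calls would map roughly to ZADD, ZRANGEBYSCORE and ZREM. `DelayedJobQueue`, `schedule` and `poll_due` are names made up for this example.

```python
import heapq
import itertools
from typing import List, Tuple


class DelayedJobQueue:
    """In-memory stand-in for a Redis sorted set of jobs scored by due time.

    Hypothetical sketch: in Redis, schedule() would be a ZADD and poll_due()
    a ZRANGEBYSCORE followed by ZREM. A ZREM that returns 1 tells a worker
    that it, and not a competing worker, has claimed the job.
    """

    def __init__(self) -> None:
        self._heap: List[Tuple[float, int, str]] = []
        # Tie-breaker keeps insertion order for jobs with equal due times
        self._seq = itertools.count()

    def schedule(self, job_id: str, due_at: float) -> None:
        """Register a job to run at (or after) the given timestamp."""
        heapq.heappush(self._heap, (due_at, next(self._seq), job_id))

    def poll_due(self, now: float) -> List[str]:
        """Remove and return every job whose due time is <= now, earliest first."""
        due: List[str] = []
        while self._heap and self._heap[0][0] <= now:
            _, _, job_id = heapq.heappop(self._heap)
            due.append(job_id)
        return due
```

A worker would call `poll_due` on a fixed interval with the current time. Removing a job as it is returned means a second poll never hands out the same job again. In the distributed Redis version, the atomic removal is what stops two scheduler instances from running one job twice.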

Wednesday 21 December 2022

Apache Storm - An Introduction

Apache Storm is an open-source, real-time solution for data stream processing. It accepts huge amounts of data arriving at very high speed, possibly from multiple sources, analyses it, and publishes real-time updates to some destination without storing any actual data. It supports parallel execution and is scalable, highly available, and fault-tolerant. It is generally used for real-time analytics, machine learning, and unbounded stream processing. Let's try to understand its basic terminology.

Apache Storm Basics

Topology: The logic for a realtime application is packaged into a Storm topology. A Storm topology is analogous to a MapReduce job. A topology is a graph of spouts and bolts that are connected with stream groupings.
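To make the spout, bolt and stream-grouping vocabulary concrete, here is a plain-Python simulation of a word-count topology. A spout replays sentences, a bolt splits them into words, and a fields grouping sends every occurrence of a word to the same counting task. This is not Storm's API. The function names are hypothetical, and hashing the field value modulo the task count is a simplified stand-in for how a fields grouping partitions a stream.

```python
import zlib
from collections import Counter
from typing import Iterable, Iterator, List


def sentence_spout(lines: Iterable[str]) -> Iterator[str]:
    """Spout: the source of the stream; here it just replays a list of lines."""
    yield from lines


def split_bolt(sentences: Iterable[str]) -> Iterator[str]:
    """Bolt: turns each sentence tuple into one tuple per lower-cased word."""
    for sentence in sentences:
        yield from sentence.lower().split()


def fields_grouping(word: str, num_tasks: int) -> int:
    """Pick a task by hashing the grouping field, so equal words reach the same task.

    crc32 is used instead of hash() because Python salts str hashes per process.
    """
    return zlib.crc32(word.encode("utf-8")) % num_tasks


def run_word_count(lines: Iterable[str], num_count_tasks: int = 3) -> List[Counter]:
    """Wire spout -> split bolt -> counting tasks and return each task's counts."""
    tasks = [Counter() for _ in range(num_count_tasks)]
    for word in split_bolt(sentence_spout(lines)):
        tasks[fields_grouping(word, num_count_tasks)][word] += 1
    return tasks
```

Because of the fields grouping, the full count for any given word lives in exactly one counting task. With a shuffle grouping, the same word could be spread across several tasks, and each task would hold only a partial count.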

Saturday 24 September 2022

Cassandra internal architecture

Apache Cassandra is a distributed NoSQL database that can handle large amounts of data across many commodity servers, supporting highly available systems in a distributed environment with no single point of failure.

In this blog I will try to explain Cassandra's basic architecture and how it works. I will also try to explain why it was designed this way and which applications it is best suited for.
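One way to see how Cassandra avoids a single point of failure is to look at how a partition key maps to several replica nodes on a token ring. The sketch below is a simplified, SimpleStrategy-style placement with one token per node on a toy 0 to 99 ring. Real clusters typically use virtual nodes and the Murmur3Partitioner; MD5 is used here only for illustration. The node names, tokens and `TokenRing` class are hypothetical.

```python
import bisect
import hashlib
from typing import List, Tuple

RING_SIZE = 100  # toy token space; real partitioners use a far larger range


def token_of(partition_key: str) -> int:
    """Map a partition key onto the toy ring (MD5 used only for illustration)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % RING_SIZE


class TokenRing:
    def __init__(self, nodes: List[Tuple[str, int]], replication_factor: int) -> None:
        # Each node owns the range (previous token, its token]; keep nodes sorted by token.
        self._nodes = sorted(nodes, key=lambda n: n[1])
        self._tokens = [token for _, token in self._nodes]
        self._rf = min(replication_factor, len(self._nodes))

    def replicas_for_token(self, token: int) -> List[str]:
        """First owner clockwise from the token, then the next RF-1 nodes on the ring."""
        start = bisect.bisect_left(self._tokens, token) % len(self._nodes)
        return [self._nodes[(start + i) % len(self._nodes)][0] for i in range(self._rf)]

    def replicas_for_key(self, partition_key: str) -> List[str]:
        return self.replicas_for_token(token_of(partition_key))
```

With nodes A, B, C and D at tokens 0, 25, 50 and 75 and a replication factor of 3, a key hashing to token 30 is stored on C, D and A. Any one of those nodes can go down and the data is still readable from the other two.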

Sunday 21 August 2022

Master-master vs master-slave database architecture

In this blog we will look at single-copy, master-slave and multi-master database architectures, and discuss the pros and cons of each with some examples.
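Before going through each architecture, a toy router shows where an application's reads and writes would be sent in each one. The class names, server names and the "read"/"write" operation strings are hypothetical. Real deployments usually do this routing in a driver, proxy or connection pool. Multi-master setups also need write-conflict handling, which this sketch ignores.

```python
import itertools
from typing import List


class SingleCopyRouter:
    """Single copy: every read and write goes to the one database server."""

    def __init__(self, server: str) -> None:
        self.server = server

    def route(self, operation: str) -> str:
        return self.server


class MasterSlaveRouter:
    """Master-slave: writes go to the master, reads rotate across the slaves."""

    def __init__(self, master: str, slaves: List[str]) -> None:
        self.master = master
        # Fall back to the master for reads if no slaves are configured
        self._slaves = itertools.cycle(slaves or [master])

    def route(self, operation: str) -> str:
        return self.master if operation == "write" else next(self._slaves)


class MultiMasterRouter:
    """Multi-master: any master accepts reads and writes; rotate across them."""

    def __init__(self, masters: List[str]) -> None:
        self._masters = itertools.cycle(masters)

    def route(self, operation: str) -> str:
        return next(self._masters)
```

The master-slave router also hints at a trade-off. If replication to the slaves is asynchronous, a read routed to a slave right after a write may not see that write yet.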

Database without Replication (Single Copy)

In this architecture, one standalone database server handles all read and write operations from the application.