Apache Kafka Partition

So far we have looked at how Apache Kafka topics work and how the replication factor works, but the real question is how a topic is managed in partitions.

In this post we are going to look at how Kafka partitions work.

Apache Kafka topic partitions:

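Apache Kafka breaks topic logs up into partitions. By default, the record key determines which partition a producer sends a record to. If the key is present, the partition is chosen by hashing the key, so records with the same key always go to the same partition. If the key is missing, records are spread across partitions (round-robin in older clients; since Kafka 2.4 the default partitioner instead sticks to one partition until a batch is sent, then switches).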
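To make the key-to-partition mapping concrete, here is a minimal Python sketch. It is not the Kafka client API: `partition_for_key` and `choose_partition` are hypothetical names, the real default partitioner hashes the serialized key with murmur2 rather than CRC32, and keyless records are shown with plain round-robin rather than the sticky strategy newer clients use.

```python
import itertools
import zlib
from typing import Iterator, Optional


# Hypothetical stand-in for Kafka's default partitioner. Real clients hash the
# serialized key with murmur2; zlib.crc32 is used here only because it is in the
# standard library and, unlike Python's hash(), is stable across runs.
def partition_for_key(key: bytes, num_partitions: int) -> int:
    return zlib.crc32(key) % num_partitions


def choose_partition(key: Optional[bytes], num_partitions: int,
                     round_robin: Iterator[int]) -> int:
    """Keyed records hash to a fixed partition; keyless records rotate."""
    if key is not None:
        return partition_for_key(key, num_partitions)
    return next(round_robin)


num_partitions = 3
rr = itertools.cycle(range(num_partitions))

# Both "user-42" records land on the same partition; the keyless ones rotate.
for key in [b"user-42", b"user-7", b"user-42", None, None, None]:
    print(key, "->", choose_partition(key, num_partitions, rr))
```

Because the mapping depends only on the key bytes and the partition count, adding partitions to an existing topic changes where new records for a given key land.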

Kafka uses partitions to scale a topic across many servers for producer writes. Kafka also uses partitions to support parallel consumers: consumers in a group can consume records in parallel, up to the number of partitions.

Order is guaranteed only within a partition. If records are partitioned by key, all records for a given key land on the same partition, which is useful if you ever have to replay the log. Kafka can replicate partitions to multiple brokers for fail-over.
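The per-key ordering guarantee can be illustrated with a small in-memory simulation. This is a hypothetical sketch, not Kafka code: `produce`, `replay`, and the CRC32-based `partition_for_key` are illustrative names, and there are no brokers or replication involved. It shows that because every record for a key lands on one partition, reading that partition from the start replays the key's history in the order it was written.

```python
import zlib
from collections import defaultdict
from typing import Dict, List, Tuple

NUM_PARTITIONS = 4


# Hypothetical: CRC32 stands in for Kafka's murmur2 key hashing.
def partition_for_key(key: str) -> int:
    return zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS


# Each partition is an append-only list of (key, value) records.
partitions: Dict[int, List[Tuple[str, str]]] = defaultdict(list)


def produce(key: str, value: str) -> None:
    partitions[partition_for_key(key)].append((key, value))


def replay(key: str) -> List[str]:
    """Read the key's single partition from the start, keeping only that key."""
    return [v for k, v in partitions[partition_for_key(key)] if k == key]


events = [("order-1", "created"), ("order-2", "created"),
          ("order-1", "paid"), ("order-2", "cancelled"),
          ("order-1", "shipped")]
for k, v in events:
    produce(k, v)

print(replay("order-1"))  # ['created', 'paid', 'shipped']
```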

Kafka partition:

Kafka maintains record order only within a single partition. A partition is an ordered, immutable sequence of records.

Kafka continually appends records to partitions, using each partition as a structured commit log.

Each record in a partition is assigned a sequential ID number called the offset. The offset identifies each record's location within the partition. Topic partitions allow a Kafka log to scale beyond a size that would fit on a single server.
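A partition's commit-log behavior can be modeled in a few lines. The sketch below is a hypothetical in-memory model, not how a broker stores data on disk: `PartitionLog`, `append`, and `read_from` are illustrative names. Records can only be appended (there is no update method), and each append returns the new record's offset, starting at 0.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class PartitionLog:
    """Hypothetical in-memory model of one partition: an append-only log."""
    records: List[bytes] = field(default_factory=list)

    def append(self, value: bytes) -> int:
        """Append a record and return its offset (its position in the log)."""
        self.records.append(value)
        return len(self.records) - 1

    def read_from(self, offset: int, max_records: int = 10) -> List[Tuple[int, bytes]]:
        """Return up to max_records (offset, value) pairs starting at offset."""
        end = min(offset + max_records, len(self.records))
        return [(o, self.records[o]) for o in range(offset, end)]


log = PartitionLog()
for v in [b"a", b"b", b"c"]:
    print("appended at offset", log.append(v))

print(log.read_from(1))  # [(1, b'b'), (2, b'c')]
```

Real partitions also delete old segments through retention, so the first available offset can be greater than 0; this model ignores that.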

Each partition must fit on the servers that host it, but a topic can span many partitions hosted on many servers. Topic partitions are also a unit of parallelism: a partition can be consumed by only one consumer in a consumer group at a time.

Consumers can run in their own process or in their own thread. If a consumer stops, Kafka spreads its partitions across the remaining consumers in the same consumer group.
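The one-consumer-per-partition rule and the rebalance after a consumer stops can be sketched with a simple round-robin assignment. This is a hypothetical helper, not Kafka's implementation: in a real group the assignment is computed through the group coordinator using the strategy set in the consumer's `partition.assignment.strategy` config (range, round-robin, sticky, and so on).

```python
from typing import Dict, List


def assign(partitions: List[int], consumers: List[str]) -> Dict[str, List[int]]:
    """Hypothetical round-robin assignment: each partition goes to exactly one consumer."""
    if not consumers:
        return {}
    assignment: Dict[str, List[int]] = {c: [] for c in consumers}
    members = sorted(consumers)
    for i, p in enumerate(sorted(partitions)):
        assignment[members[i % len(members)]].append(p)
    return assignment


partitions = [0, 1, 2, 3, 4, 5]
group = ["consumer-a", "consumer-b", "consumer-c"]
print(assign(partitions, group))
# {'consumer-a': [0, 3], 'consumer-b': [1, 4], 'consumer-c': [2, 5]}

# consumer-b stops: a rebalance recomputes the assignment over the remaining
# members, so every partition is still owned by exactly one consumer.
group.remove("consumer-b")
print(assign(partitions, group))
# {'consumer-a': [0, 2, 4], 'consumer-c': [1, 3, 5]}

# More consumers than partitions: the extra consumer receives nothing.
print(assign([0, 1], ["c1", "c2", "c3"]))
# {'c1': [0], 'c2': [1], 'c3': []}
```

This naive version reshuffles every partition on each rebalance; sticky assignors try to keep existing assignments in place and move only the partitions that lost their owner.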

Kafka Partition Replication:

By default, the leader handles all read and write requests for a partition, while followers replicate the leader and take over if it dies. Kafka also uses partitions for parallel consumer handling within a group. Kafka distributes a topic's log partitions over the servers in the Kafka cluster, and each server handles its share of data and requests by sharing partition leadership.
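The sketch below shows why sharing leadership spreads load across brokers. It is a simplified, hypothetical placement scheme (`place_replicas` is an illustrative name), not Kafka's actual replica-assignment logic, which differs in its details and can, for example, take broker racks into account. Partition `p`'s replicas start at broker `p mod N`, and the first replica in each list is treated as the leader.

```python
from collections import Counter
from typing import Dict, List


def place_replicas(num_partitions: int, brokers: List[int],
                   replication_factor: int) -> Dict[int, List[int]]:
    """Hypothetical placement: partition p's replicas start at broker p mod N.

    The first replica in each list is treated as the (preferred) leader.
    """
    if replication_factor > len(brokers):
        raise ValueError("replication factor cannot exceed broker count")
    n = len(brokers)
    return {p: [brokers[(p + i) % n] for i in range(replication_factor)]
            for p in range(num_partitions)}


replicas = place_replicas(num_partitions=6, brokers=[1, 2, 3], replication_factor=2)
leaders = {p: r[0] for p, r in replicas.items()}
print(replicas)
# Each of the 3 brokers leads 2 of the 6 partitions.
print("leaders per broker:", Counter(leaders.values()))
```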

Some Key points:

  • An ISR is an in-sync replica. If a leader fails, one of the ISRs is picked to be the new leader.
  • A partition can only be consumed by one consumer in a consumer group at a time. If you only have one partition, only one consumer in a group can read from it; any extra consumers sit idle.
  • By default, leaders perform all reads and writes for a particular topic partition. Followers replicate leaders.
  • If a consumer in a consumer group dies, the partitions assigned to that consumer are divided up among the remaining consumers in that group.
  • If a broker dies, Kafka divides up leadership of its topic partitions among the remaining brokers in the cluster.
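The leader failover described in these points can be sketched as a single controller step. This is a hypothetical model (`PartitionState` and `handle_broker_failure` are illustrative names), not Kafka's controller code. It removes the dead broker from every ISR and, for each partition it led, promotes the first remaining in-sync replica. If no in-sync replica is left, the partition has no leader; in real Kafka it stays offline unless `unclean.leader.election.enable` is turned on.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class PartitionState:
    leader: Optional[int]
    isr: List[int]  # in-sync replicas, leader included


def handle_broker_failure(state: Dict[int, PartitionState], dead: int) -> None:
    """Hypothetical controller step: drop the dead broker and re-elect leaders."""
    for ps in state.values():
        ps.isr = [b for b in ps.isr if b != dead]
        if ps.leader == dead:
            # Clean election: only an in-sync replica may become leader.
            ps.leader = ps.isr[0] if ps.isr else None


state = {
    0: PartitionState(leader=1, isr=[1, 2]),
    1: PartitionState(leader=2, isr=[2, 3]),
    2: PartitionState(leader=3, isr=[3, 1]),
    3: PartitionState(leader=1, isr=[1, 3]),
}
handle_broker_failure(state, dead=1)
# Partitions 0 and 3 were led by broker 1; their leadership moves to 2 and 3.
for p, ps in state.items():
    print(p, ps)
```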
