In the previous post we discussed how data is stored in a topic. Now we need to understand how the replication factor works in Apache Kafka.
If we increase the replication factor, CPU utilization on the master (leader) server may rise because of the master/slave arrangement: every slave (follower) server fetches new messages from the master.
What is Replication: Replication keeps a copy of each partition's data on several nodes (brokers) in the cluster, as many as the replication factor specifies. As we already know from the ZooKeeper post, ZooKeeper manages the events that happen on each node.
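To make the idea of "copies on several nodes" concrete, here is a minimal sketch of spreading replicas over brokers. The function name `assign_replicas` and the simple round-robin placement are assumptions made for illustration; Kafka's real assignment logic is more involved (it also considers racks and a starting offset), so treat this only as a mental model.

```python
def assign_replicas(num_partitions, replication_factor, brokers):
    """Hypothetical helper: place each partition's replicas on consecutive brokers.

    The first broker in each list plays the role of the leader,
    the remaining ones are followers.
    """
    if replication_factor > len(brokers):
        raise ValueError("replication factor cannot exceed the number of brokers")
    assignment = {}
    for partition in range(num_partitions):
        # Walk the broker list in a ring, starting at a different broker per partition.
        assignment[partition] = [
            brokers[(partition + i) % len(brokers)]
            for i in range(replication_factor)
        ]
    return assignment


if __name__ == "__main__":
    # 3 partitions, replication factor 2, brokers with ids 0, 1 and 2
    for partition, replicas in assign_replicas(3, 2, [0, 1, 2]).items():
        print(f"partition {partition}: leader={replicas[0]} followers={replicas[1:]}")
```

With a replication factor of 1 every list contains only the leader, which is the "leaders only" case discussed later in this post.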
Load Balance: In a production environment, the first things we need to take care of are CPU utilization and load balance on each server. With a replication factor greater than one, the load on a single server may increase.
We had also noticed that even without a load on the Kafka cluster (writes or reads), there was measurable CPU utilization which appeared to be correlated with having more partitions.
We had a theory that the overhead was due to (attempted) message replication – i.e. the polling of the leader partitions by the followers. If this is true then for a replication factor of 1 (leaders only) there would be no CPU overhead with increasing partitions as there are no followers polling the leaders. Conversely, increasing the replication factor will result in increased overhead.
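The theory above can be put into a small back-of-the-envelope calculation. This sketch only counts follower replicas, assuming each partition has exactly one leader and `replication_factor - 1` followers; the function name `follower_replicas` is made up for this example, and the count is not a measurement of real fetch requests or CPU usage.

```python
def follower_replicas(num_partitions, replication_factor):
    """Hypothetical helper: number of follower replicas that must poll a leader."""
    if num_partitions < 0 or replication_factor < 1:
        raise ValueError("need num_partitions >= 0 and replication_factor >= 1")
    # One leader per partition; every other replica is a follower.
    return num_partitions * (replication_factor - 1)


if __name__ == "__main__":
    for rf in (1, 2, 3):
        print(f"100 partitions, RF={rf}: {follower_replicas(100, rf)} follower replicas")
```

For a replication factor of 1 the count is zero regardless of the number of partitions, which matches the expectation that there is no polling overhead in the leaders-only case. For higher replication factors the count grows with both the number of partitions and the replication factor.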
Kafka topic partitions
Kafka breaks topic logs up into partitions. By default, a record is assigned to a partition by its record key if the key is present, and round-robin if the key is missing. In other words, the record key determines by default which partition a producer sends the record to, so records with the same key land on the same partition.
Kafka uses partitions to scale a topic across many servers for producer writes. Kafka also uses partitions to facilitate parallel consumers: consumers consume records in parallel, up to the number of partitions.
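The partitioning rule described above can be sketched in a few lines. This is an illustration, not Kafka's partitioner: the class name `SketchPartitioner` is hypothetical, and `zlib.crc32` stands in for the murmur2 hash that Kafka's Java client uses. Newer client versions may also handle keyless records differently from plain round-robin.

```python
import zlib
from itertools import count


class SketchPartitioner:
    """Hypothetical partitioner: hash the key if present, else round-robin."""

    def __init__(self, num_partitions):
        if num_partitions < 1:
            raise ValueError("a topic needs at least one partition")
        self.num_partitions = num_partitions
        self._counter = count()  # drives round-robin for records without a key

    def partition(self, key):
        if key is None:
            # No key: spread records evenly over all partitions.
            return next(self._counter) % self.num_partitions
        # Key present: the same key always maps to the same partition.
        # crc32 is a stand-in hash chosen for this sketch.
        return zlib.crc32(key) % self.num_partitions


if __name__ == "__main__":
    partitioner = SketchPartitioner(3)
    print(partitioner.partition(b"user-1") == partitioner.partition(b"user-1"))
    print([partitioner.partition(None) for _ in range(4)])
```

The point of hashing the key is ordering: all records for one key go to one partition, and Kafka preserves order within a partition.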
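The "up to the number of partitions" limit can be shown with a small sketch that hands out partitions to the consumers of one consumer group. The function `assign_partitions` and its simple round-robin strategy are assumptions for illustration; Kafka ships several assignment strategies of its own.

```python
def assign_partitions(partitions, consumers):
    """Hypothetical helper: give each partition to exactly one consumer in the group."""
    if not consumers:
        raise ValueError("need at least one consumer")
    assignment = {consumer: [] for consumer in consumers}
    for i, partition in enumerate(partitions):
        # Deal partitions out to the consumers in turn.
        assignment[consumers[i % len(consumers)]].append(partition)
    return assignment


if __name__ == "__main__":
    # 3 partitions but 4 consumers: the fourth consumer gets nothing to read.
    result = assign_partitions([0, 1, 2], ["c1", "c2", "c3", "c4"])
    for consumer, owned in result.items():
        print(consumer, owned if owned else "idle")
```

Because a partition is read by only one consumer in the group at a time, adding consumers beyond the partition count does not add parallelism; the extra consumers stay idle.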