To understand Apache Kafka, we first need to understand how ZooKeeper actually works internally. As I understand it so far, ZooKeeper is used to coordinate and monitor clients and servers; it is very fault tolerant and can handle a large number of clients and servers.
ZooKeeper follows a simple client-server model where clients are nodes (i.e., machines) that make use of the service, and servers are nodes that provide the service.
The ZooKeeper service runs on an odd number of networked machines. This is called a ZooKeeper cluster (or ensemble). Once running, the ensemble elects a leader. Both the leader and the followers can accept client connections.
ZooKeeper ensemble: a collection of ZooKeeper servers is called a ZooKeeper ensemble.
Basically, each client periodically sends a ping (heartbeat) to its ZooKeeper server to check that it is still connected, and the server responds with an acknowledgment. If the client does not receive an acknowledgment within a specific time, it connects to another server in the ensemble, and the client's open session moves over to that other ZooKeeper server.
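The heartbeat-and-failover behavior above can be sketched with a toy client. This is an illustrative simulation, not real ZooKeeper client code; the server names and the `session_timeout` value are made up for the example:

```python
# Toy sketch of the client-side heartbeat logic: ping the bound server,
# and fail over to another ensemble member if no acknowledgment arrives.
class HeartbeatClient:
    def __init__(self, ensemble, session_timeout=3.0):
        self.ensemble = list(ensemble)   # e.g. ["zk1", "zk2", "zk3"]
        self.current = self.ensemble[0]  # server this session is bound to
        self.session_timeout = session_timeout  # illustrative only

    def ping(self, server_alive):
        """server_alive is a predicate standing in for a real network ack."""
        if server_alive(self.current):
            return "ack from " + self.current
        # No ack: reconnect to another server, keeping the same session.
        others = [s for s in self.ensemble if s != self.current]
        self.current = others[0]
        return "failed over to " + self.current

client = HeartbeatClient(["zk1", "zk2", "zk3"])
print(client.ping(lambda s: True))        # zk1 acknowledges
print(client.ping(lambda s: s != "zk1"))  # zk1 is down, session moves to zk2
```

The real client library handles this reconnection transparently, which is why applications rarely notice a single server failing.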
Each ZooKeeper server maintains a transaction log on disk, in which it logs every write request that comes from clients.
All ZooKeeper servers stay in sync by applying the same log, and because of that, when a client's connection to one server fails, it can easily connect to another server and continue its transactions without any data loss. This is similar to MySQL replication, where all events are written to the binary log and each replica reads those events and applies the same changes, so all replicas stay in sync with the master.
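The "replicas converge by replaying the same ordered log" idea can be shown in a few lines. The log format and paths here are illustrative, not ZooKeeper's actual on-disk format:

```python
# Replaying one ordered log of (op, path, value) entries into a fresh state.
# Any two servers that replay the same log end up with identical state.
def apply_log(log):
    state = {}
    for op, path, value in log:
        if op == "set":
            state[path] = value
        elif op == "delete":
            state.pop(path, None)
    return state

# The leader orders writes into a single log; every server replays that log.
log = [("set", "/config/brokers", "3"),
       ("set", "/config/topic", "orders"),
       ("delete", "/config/brokers", None)]

server_a = apply_log(log)
server_b = apply_log(log)
assert server_a == server_b == {"/config/topic": "orders"}
```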
Each ZooKeeper server stores its data in a hierarchical namespace, similar to a UNIX-like file system. The nodes in this hierarchy are called znodes, and they behave like both directories and files. The default maximum size of the data in a znode is 1 MB.
The reason behind the 1 MB limit is that ZooKeeper is meant for storing small pieces of coordination data shared between distributed applications, not bulk data.
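A toy in-memory znode tree makes the hierarchical model and the size limit concrete. The class and paths are made up for illustration; only the 1 MB limit comes from the text above:

```python
MAX_ZNODE_BYTES = 1024 * 1024  # default 1 MB data limit per znode

class ZnodeTree:
    """A toy hierarchy of znodes: path -> data, with parent checks."""
    def __init__(self):
        self.nodes = {"/": b""}

    def create(self, path, data=b""):
        if len(data) > MAX_ZNODE_BYTES:
            raise ValueError("znode data exceeds 1 MB limit")
        parent = path.rsplit("/", 1)[0] or "/"
        if parent not in self.nodes:
            raise KeyError("parent znode does not exist: " + parent)
        self.nodes[path] = data

    def get(self, path):
        return self.nodes[path]

tree = ZnodeTree()
tree.create("/app")                       # like a directory
tree.create("/app/config", b"retries=3")  # like a small file
print(tree.get("/app/config"))            # b'retries=3'
```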
ZooKeeper uses ACLs (access control lists) to govern writes to znodes: only clients that have the required permission can write data to a given znode.
ZooKeeper client reads:
When a client wants to read the contents of a particular znode, it sends the request to the server it is connected to, and that server responds with the requested znode's data.
Under concurrent load, different clients are connected to different servers, so reads are quick and scalable.
From a very high level, think of ZooKeeper as a centralized repository where our distributed applications can store and read data.
Reads can be served by any of the ZooKeeper server nodes (whether leader or follower), but all writes go through the leader.
ZooKeeper Data Model: we can create four types of znodes in ZooKeeper.
- Persistent: these znodes always exist; they are only removed when someone deletes them explicitly.
- Ephemeral: session-based znodes; one is created when a client establishes a session with a ZooKeeper server and is automatically deleted when that session ends.
- Persistent_Sequential: ZooKeeper gives the znode a sequence number as part of its name; the value of a monotonically increasing counter is appended to the end of the name.
- Ephemeral_Sequential: a hybrid of the ephemeral and sequential node types.
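The four creation modes above can be simulated in memory. This is a sketch, not the real API: the `MiniZk` class, the session names, and the 10-digit suffix width are assumptions made for the example (ZooKeeper does use a monotonically increasing counter for sequential names):

```python
class MiniZk:
    """Toy znode store illustrating persistent/ephemeral/sequential modes."""
    def __init__(self):
        self.nodes = {}   # path -> owning session (None = persistent)
        self.counter = 0  # monotonically increasing sequence counter

    def create(self, path, session=None, ephemeral=False, sequential=False):
        if sequential:
            path = "%s%010d" % (path, self.counter)  # e.g. /lock-0000000000
            self.counter += 1
        self.nodes[path] = session if ephemeral else None
        return path

    def close_session(self, session):
        # Ephemeral znodes vanish when their owning session ends.
        self.nodes = {p: s for p, s in self.nodes.items() if s != session}

zk = MiniZk()
zk.create("/config")                                            # persistent
p1 = zk.create("/lock-", sequential=True)                       # persistent_sequential
p2 = zk.create("/lock-", "sess1", ephemeral=True, sequential=True)
zk.close_session("sess1")  # p2 is deleted automatically, p1 survives
```

Ephemeral sequential znodes are the building block for leader election and distributed locks: the client holding the lowest sequence number wins, and its znode disappears if it crashes.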
Watches in ZooKeeper: this is built-in functionality of ZooKeeper where a client asks ZooKeeper to notify it when:
- A znode is created or deleted.
- The data on a znode is changed.
- A child is added to or deleted from a znode.
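The notification flow can be sketched with a toy one-shot watch (ZooKeeper watches are one-time triggers: once fired, the client must re-register). The class and callback shape here are illustrative, not the real client API:

```python
class WatchedZnode:
    """Toy znode whose data changes fire one-shot watch callbacks."""
    def __init__(self, data=b""):
        self.data = data
        self.watchers = []  # callbacks waiting for the next change

    def watch(self, callback):
        self.watchers.append(callback)

    def set_data(self, data):
        self.data = data
        pending, self.watchers = self.watchers, []  # one-shot: fire, then clear
        for cb in pending:
            cb("data_changed", data)

events = []
node = WatchedZnode(b"v1")
node.watch(lambda event, data: events.append((event, data)))
node.set_data(b"v2")  # the watcher fires once
node.set_data(b"v3")  # no watcher is registered any more, nothing fires
```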