ZooKeeper, Distributed Process Coordination
Tagged as EN · book · review · java
Written on
ZooKeeper is a component that facilitates building distributed applications. It:
- is a distributed, hierarchical key-value store,
- chooses C(onsistency) and A(vailability) in terms of the CAP theorem,
- works best on read-dominated workloads (< 10% writes),
- keeps its content in the memory of each instance, and
- expects the data stored on each node (key) to be small (maybe several KiB).
The data managed by ZooKeeper is presented in a file-system-like manner, with directories and files whose names are separated by slashes (/). The difference from a file system is that you can store information in the directories as well. Or, seen differently: directories are files at the same time. Based on this simple abstraction, users of ZooKeeper can implement things like leader election in a cluster of software instances.
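To make this abstraction more concrete, here is a minimal sketch in Java. It is a toy in-memory model, not the ZooKeeper client API: the class `ToyZnodeTree`, its methods, and the paths `/app` and `/app/election` are names I made up for illustration. It assumes the common election recipe in which every instance creates a node that disappears with its session and carries an increasing sequence number, and the instance with the lowest number is the leader.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Toy in-memory model of a ZooKeeper-like tree; hypothetical, NOT the real client API. */
class ToyZnodeTree {
    /** A node holds data and may also have children: "file" and "directory" at once. */
    private static final class Node {
        final byte[] data;
        final String owner; // session owning an ephemeral node, or null if persistent
        Node(byte[] data, String owner) { this.data = data; this.owner = owner; }
    }

    // Sorted map from full path to node; sorting keeps siblings in name order.
    private final TreeMap<String, Node> nodes = new TreeMap<>();
    private int nextSequence = 0;

    ToyZnodeTree() { nodes.put("/", new Node(new byte[0], null)); }

    /** Creates a persistent node; the parent must already exist. */
    String create(String path, String data) { return put(path, data, null); }

    /** Creates a session-owned node whose name ends in a zero-padded sequence number. */
    String createEphemeralSequential(String prefix, String data, String session) {
        return put(prefix + String.format("%010d", nextSequence++), data, session);
    }

    private String put(String path, String data, String owner) {
        if (!nodes.containsKey(parentOf(path))) throw new IllegalStateException("no parent for " + path);
        if (nodes.containsKey(path)) throw new IllegalStateException("already exists: " + path);
        nodes.put(path, new Node(data.getBytes(StandardCharsets.UTF_8), owner));
        return path;
    }

    private static String parentOf(String path) {
        int slash = path.lastIndexOf('/');
        return slash == 0 ? "/" : path.substring(0, slash);
    }

    String getData(String path) {
        return new String(nodes.get(path).data, StandardCharsets.UTF_8);
    }

    /** Returns the names of the direct children of a node, in sorted order. */
    List<String> getChildren(String path) {
        List<String> children = new ArrayList<>();
        for (String key : nodes.keySet()) {
            if (!key.equals("/") && parentOf(key).equals(path)) {
                children.add(key.substring(key.lastIndexOf('/') + 1));
            }
        }
        return children;
    }

    /** Removes every node owned by the session, as if that instance had died. */
    void closeSession(String session) {
        nodes.values().removeIf(node -> session.equals(node.owner));
    }

    /** The leader is the candidate whose node has the lowest sequence number. */
    String leader(String electionPath) {
        List<String> candidates = getChildren(electionPath);
        return candidates.isEmpty() ? null : getData(electionPath + "/" + candidates.get(0));
    }

    public static void main(String[] args) {
        ToyZnodeTree tree = new ToyZnodeTree();
        tree.create("/app", "config-v1");   // a "directory" that also carries data
        tree.create("/app/election", "");
        tree.createEphemeralSequential("/app/election/n_", "instance-A", "session-A");
        tree.createEphemeralSequential("/app/election/n_", "instance-B", "session-B");
        System.out.println(tree.getData("/app"));          // config-v1
        System.out.println(tree.getChildren("/app"));      // [election]
        System.out.println(tree.leader("/app/election"));  // instance-A
        tree.closeSession("session-A");                    // the leader goes away
        System.out.println(tree.leader("/app/election"));  // instance-B
    }
}
```

The sketch leaves out everything that makes the real thing hard: replication across servers, watches that notify clients of changes, and session timeouts. It only shows why a tree of small data-carrying nodes is enough to express an election.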
The book by Flavio Junqueira and Benjamin Reed
The book is written by two ZooKeeper experts who know how it works internally and which pitfalls users can fall into. Flavio Junqueira is one of ZooKeeper's contributors. Benjamin Reed helped to start ZooKeeper.
I read the book because ZooKeeper is the basis for other distributed software systems I have familiarized myself with over the last months, including Akka and Mesos. I always think it's a good idea to know at least one layer below the layer I am actually using. It allows me to better understand what I'm doing and how to do it right.
The book starts with an overview of the concepts and basics used by ZooKeeper. It introduces an example master-worker application that is then implemented in several ways:
- First, it is implemented with ZooKeeper's command-line client in a shell-script-like way.
- Then it is implemented with ZooKeeper's Java client API.
- Afterwards, a third implementation uses the C client API.
- Finally, a fourth implementation uses Curator, a high-level API for ZooKeeper.
Other topics discussed in the book are:
- Common errors in using ZooKeeper (and how to prevent them).
- How ZooKeeper works internally (with references to the source code and the protocol it uses).
- How to administer a ZooKeeper cluster (configuration, what kind of hardware is required).
I think this book is a highly valuable resource for anybody working with ZooKeeper, either directly or indirectly through other software that uses it.