Is it possible for etcd to have a split-brain scenario?
What is a split-brain scenario?
A split-brain scenario occurs when nodes in a distributed system lose communication with each other, usually because of a network partition, and each isolated group keeps operating as if it were the whole cluster. The result is inconsistent or conflicting system state.
etcd is designed to avoid split-brain scenarios. It uses the Raft consensus algorithm, in which electing a leader and committing a write both require a majority (quorum) of members, so at most one side of a partition can make progress at any given time.
The official documentation says there is no “split-brain” in etcd. Here is why.
✅ A network partition divides the etcd cluster into two parts: one with a member majority and the other with a member minority.
✅ The majority side becomes the available cluster and the minority side becomes unavailable.
✅ If the leader is on the majority side, then from the majority point of view the failure is a minority follower failure.
✅ If the leader is on the minority side, then it is a leader failure.
✅ The leader on the minority side steps down and the majority side elects a new leader.
✅ Once the network partition clears, the minority side automatically recognizes the leader from the majority side and recovers its state.
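The majority rule behind these points comes down to simple arithmetic. Here is a minimal sketch of it, assuming a cluster that splits into exactly two sides. The function names (quorum, has_quorum, partition_outcome) are made up for illustration and are not part of etcd.

```python
def quorum(cluster_size: int) -> int:
    """Smallest number of members that forms a strict majority."""
    return cluster_size // 2 + 1


def has_quorum(reachable_members: int, cluster_size: int) -> bool:
    """True if a side with this many members can elect a leader and commit writes."""
    return reachable_members >= quorum(cluster_size)


def partition_outcome(cluster_size: int, side_a: int) -> dict:
    """Availability of both sides when the cluster splits into side_a and the rest."""
    side_b = cluster_size - side_a
    return {
        "side_a_available": has_quorum(side_a, cluster_size),
        "side_b_available": has_quorum(side_b, cluster_size),
    }


if __name__ == "__main__":
    # 5 members split 3/2: only the 3-member side stays available.
    print(partition_outcome(5, 3))
    # 4 members split 2/2: neither side has a majority.
    print(partition_outcome(4, 2))
```

Because two disjoint groups can never both hold more than half of the members, both sides can never be available at once. That is why a 5-member cluster split 3/2 keeps one working side, while a 4-member cluster split 2/2 has none.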
So how does the leader know whether it's in the majority or the minority?
The etcd leader regularly sends “heartbeats” to its followers, and the followers acknowledge them.
If the leader is in the minority part of the split, it will not receive acknowledgments from the majority of the nodes.
When the leader doesn’t get enough responses to its heartbeats, it realizes that it might be in the minority.
To maintain the integrity of the system, it steps down from its leadership role.
On the other side of the partition, where the majority of nodes are, they also notice they’re not getting heartbeats from the leader. Since they are the majority, they can elect a new leader among themselves.
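The step-down logic can be sketched as a toy model. This is not etcd's real implementation: the ToyLeader class, its fields, and the idea of a fixed "window" are assumptions made for illustration, loosely inspired by how Raft-based systems check for a quorum of responsive followers.

```python
from dataclasses import dataclass, field


@dataclass
class ToyLeader:
    """Hypothetical toy model of a leader that tracks heartbeat acknowledgments."""
    member_id: str
    peers: list  # IDs of the other cluster members
    role: str = "leader"
    acked: set = field(default_factory=set)

    def record_ack(self, peer_id: str) -> None:
        """Remember that a follower answered a heartbeat in the current window."""
        if peer_id in self.peers:
            self.acked.add(peer_id)

    def end_of_window(self) -> str:
        """At the end of a timeout window, keep leading only with a majority."""
        cluster_size = len(self.peers) + 1
        responsive = len(self.acked) + 1  # the leader counts itself
        if self.role == "leader" and responsive < cluster_size // 2 + 1:
            self.role = "follower"  # step down: it cannot prove it has a majority
        self.acked.clear()  # start the next window from scratch
        return self.role


if __name__ == "__main__":
    leader = ToyLeader("n1", ["n2", "n3", "n4", "n5"])
    # Healthy window: two followers answer, so 3 of 5 members are responsive.
    leader.record_ack("n2")
    leader.record_ack("n3")
    print(leader.end_of_window())
    # Partitioned window: only one follower answers, so 2 of 5 is a minority.
    leader.record_ack("n2")
    print(leader.end_of_window())
```

In this sketch the leader stays leader after the healthy window and becomes a follower after the partitioned one. The real system works with timeouts and terms rather than a single window, but the decision is the same: no majority of acknowledgments, no leadership.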
Have you faced a split-brain scenario before?