Introduction to Paxos consensus protocol

Suman Shil
3 min readApr 16, 2022

One of the most interesting and difficult problem in distributed computing is achieving consensus in an unreliable network. A distributed system consists of multiple nodes or hosts which are connected by network. Network is not reliable. We can not make assumption about when and how a message will be delivered over network or it will be delivered at all. We also can not make assumption about the state of a host. It can be unavailable, disconnected or just slow to respond. All these factors make it difficult to achieve consensus in a distributed systems. Paxos protocol was published by Leslie Lamport to achieve consensus in a unreliable distributed systems.

Replicated state machine

The goal of Paxos is to implement a replicated state machine. State machine is a program which takes an input and produces output. The program also maintains an internal state. Replicated state machine ensures reliability by running the program in multiple hosts. At the heart of replicated state machine is replicated log. An example of a replicated state machine is a distributed key value store which takes key as an input and produces value as output.

Replicated Log

As the name indicates log is replicated across servers. Replicated state machine achieves reliability and consistency by replicating client commands in the same order in all the instances of logs. If the same commands are executed in the same order by the programs in different servers, all the programs will be in the same state even though running in different hosts. It also achieves better availability because if some of the servers crash, rest of the servers will continue to work as they maintain same state.

Components of a replicated state machine

Each server in a replicated state machine has following components

  • State machine: Application that maintains state.
  • Log: Append only storage that stores the client commands.
  • Consensus module (Paxos): Ensures commands are replicated in logs across servers.

Workflow

Lets say we have a cluster with three hosts server A, server B and server_C.

  • Client sends a command to server A which will be processed by consensus module running in the server.
  • Server A appends the command in local log and sends the command to server B and server C.
  • server B and server C append the command in their respective log files.
  • Once the append operation is completed, server B and server C notify server A.
  • Once Server A gets acknowledgement from all the servers, it applies the command in it’s local state machine. It also sends a notification to server B and server C to apply the command in their state machine.
  • Server B and Server C apply the command in their state machine and notify Server A.
  • Server A sends response to client about completion of the operation when it receives notification from the other servers.

Failure Model

Paxos works in a fail-stop environment which means servers can stop working but when they are up and running, they don’t behave maliciously. This failure model is different from byzantine failure model where the servers can behave differently from it’s specification.

Conclusion

Paxos is the most popular consensus algorithm. It is more relevant in modern times as the organizations are moving towards distributed architecture to address big data, availability, reliability challenges. Spanner is an example of a system which uses Paxos. In subsequect articles I will discuss Paxos in more details. Stay tuned.

--

--

Suman Shil

Software developer, Father, Optimistic, Eternal learner