Machinations


Consensus

Consensus[1] is arguably one of the most fundamental problems in distributed computing. The basic idea is this: n players each vote on one of two choices, and the players want to hold an election to decide which choice “wins”. The problem is made more interesting by the fact that there is no leader, i.e. no single player who everyone knows is trustworthy and can be counted on to tabulate all the votes accurately.

Here’s a more precise statement of the problem. Each of n players starts with either a 0 or a 1 as input. A certain fraction of the players (say, just under 1/3) are bad, and will collude to thwart the remaining good players. Unfortunately, the good players have no idea who the bad players are. Our goal is to create an algorithm for the good players that ensures that

  • All good players output the same bit in the end
  • The bit output by all good players is the same as the input bit of at least one good player
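
For readers who like symbols, the two conditions can be written as follows. The notation is my own shorthand for this sketch, not taken from any particular paper: G is the set of good players, x_i is the input bit of player i, and y_i is the bit that player i outputs.

```latex
\begin{align*}
\text{Agreement:} &\quad \forall\, i, j \in G:\quad y_i = y_j \\
\text{Validity:}  &\quad \forall\, i \in G\ \ \exists\, k \in G:\quad y_i = x_k
\end{align*}
```

Since the inputs are single bits, validity only has teeth when all good players start with the same bit b: in that case every good player must output b. If the good players' inputs are mixed, either bit is the input of some good player, so any common output is allowed.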

At first, the second condition may seem ridiculously easy to satisfy and completely useless.  In fact, it’s ridiculously hard to satisfy and extremely useful.  To show the problem is hard, consider a naive algorithm where all players send out their bits to each other and then each player outputs the bit that it received from the majority of other players.  This algorithm fails because the bad guys can send different messages to different people.  In particular, when the vote is close, the bad guys can make one good guy commit to the bit 0 and another good guy commit to the bit 1.
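
Here is a minimal sketch of that failure in Python. Everything in it is an assumption made for illustration: the function name `naive_majority`, the choice of 5 good and 2 bad players, and the convention that each good player counts its own bit along with the bits it receives. Good players send the same bit to everyone; bad players may send a different bit to each recipient.

```python
from collections import Counter

def naive_majority(good_inputs, bad_messages):
    """Simulate one round of 'send your bit to everyone, output the majority'.

    good_inputs:  list of bits, one per good player.
    bad_messages: one list per bad player; entry r is the bit that bad
                  player sends to good player r (so it may differ per recipient).
    Returns the list of bits output by the good players.
    """
    outputs = []
    for receiver in range(len(good_inputs)):
        # Every good player reports its true input to every recipient.
        tally = Counter(good_inputs)
        # Each bad player sends whatever it likes to this recipient.
        for per_receiver in bad_messages:
            tally[per_receiver[receiver]] += 1
        outputs.append(1 if tally[1] > tally[0] else 0)
    return outputs

good_inputs = [0, 0, 1, 1, 1]  # a close vote among the good players

# Bad players that send the same bit to everyone do no harm.
consistent = [[1] * 5, [1] * 5]
print(naive_majority(good_inputs, consistent))   # [1, 1, 1, 1, 1]

# Bad players that tell the first three good players 0 and the rest 1.
equivocating = [[0, 0, 0, 1, 1], [0, 0, 0, 1, 1]]
print(naive_majority(good_inputs, equivocating)) # [0, 0, 0, 1, 1]
```

In the second run the good players split: three commit to 0 and two commit to 1, so the first condition (everyone outputs the same bit) is violated even though only 2 of the 7 players misbehaved.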

The next thing I want to talk about is how useful this problem is. First, the problem is broadly useful in the engineering sense of helping us build tools. If you can solve consensus, you can build a system that is more robust than any of its components. Think about it: each component votes on the outcome of a computation, and then with consensus it’s easy to determine which outcome is decided by a plurality of the components (e.g. first everyone sends out their votes, then everyone calculates the majority vote, then everyone solves the consensus problem to commit to a majority vote). It’s not surprising, then, that solutions to the consensus problem have been used in all kinds of systems, ranging from flight control to databases to peer-to-peer networks: Google uses consensus in the Chubby system; Microsoft uses consensus in Farsite; most structured peer-to-peer systems (e.g. Tapestry, Oceanstore) use consensus in some way or another. Any distributed system built with any pretension towards robustness that is not using consensus probably should be.
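
To make the "more robust than any of its components" idea concrete, here is a small Python sketch of voting over redundant components. The names `majority_vote`, `square` and `faulty_square` are hypothetical, and the sketch cheats in one important way: it assumes a single trusted place where the votes are tallied. Removing that trusted tallier is exactly what a real consensus protocol is for, and that step is not shown here.

```python
from collections import Counter

def majority_vote(replica_outputs):
    """Return the value reported by a strict majority of replicas, else None."""
    value, count = Counter(replica_outputs).most_common(1)[0]
    return value if count > len(replica_outputs) // 2 else None

def square(x):
    # A correctly working component.
    return x * x

def faulty_square(x):
    # A hypothetical broken component that is always off by one.
    return x * x + 1

# Three replicas of the same computation, one of them faulty.
replicas = [square, square, faulty_square]
results = [replica(12) for replica in replicas]

print(results)                 # [144, 144, 145]
print(majority_vote(results))  # 144
```

The combined system returns the right answer even though one of its three components is wrong, which no single component could guarantee on its own.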

But that’s just the engineering side of things. Consensus is useful because it allows us to study synchronization in complex systems. How can systems like birds, bees, bacteria and markets come to a decision even when there is no leader? We know they do it, and that they do it robustly, but exactly how do they do it, and what trade-off do they pay between robustness and the time and communication costs of doing it? Studying upper and lower bounds on the consensus problem gives insight into these natural systems. The study of how these agreement-building processes, or quorum sensing, occur in nature has become quite popular lately, since they are so pervasive.

Consensus is also useful because it helps us study fundamental properties of computation. One of the first major results on consensus, due to Fischer, Lynch and Paterson in 1982, was that consensus is impossible for any deterministic algorithm with even one bad player (in the asynchronous communication model). However, a follow-up paper by Ben-Or showed that with a randomized algorithm, it was possible to solve this problem even with a constant fraction of bad players, albeit in exponential expected time. This was a fundamental result giving some idea of how useful randomization can be for computation under an adversarial model. I remember taking a class as a grad student in which Paul Beame told us how impressed he had been by what these two results said about the power of randomness when they first came out. Cryptography was also shown to be useful for circumventing the Fischer, Lynch and Paterson result, and I’ve heard of several prominent cryptographers who were first drawn to that area at the time because of its usefulness in solving consensus.

In the next week or two, I’ll go into some of the details of recent results on this problem that make use of randomness and cryptography. Early randomized algorithms for consensus like Ben-Or’s used very clever tricks, but no heavy-duty mathematical machinery. More recent results, which run in polynomial time, make use of more modern tools like the probabilistic method, expanders, extractors, samplers and connections with error-correcting codes, along with assorted cryptographic tricks. I’ve been involved in the work using randomness, so I’ll probably start there.

[1] The consensus problem is also frequently referred to as the Byzantine agreement problem or simply agreement. I prefer the name consensus, since it is more succinct and descriptive. While the research community has not yet reached a “consensus” on a single name for this problem, in recent years the name consensus has been used most frequently.

7 Comments so far

When you mention the impossibility of deterministic solutions, you may want to be clear that you are talking about the asynchronous case.

Enjoying your blog!

Comment by Jonathan Katz

Good point. The FLP result holds only for the asynchronous case. Thanks for the positive feedback!

Comment by Jared

It might be worth clarifying the assumptions you’re making about the failure modes and synchronicity of the distributed system. For example, most of your discussion is about consensus under the assumption of Byzantine fault tolerance. How practical is that assumption? The original formulation of Paxos doesn’t handle Byzantine failure, for instance (although it can be extended to do so). As far as I know, it is rare for practitioners to account for the possibility of Byzantine failure.

Comment by neilconway

Neil, this is a good point. When writing this post, I mostly had in mind consensus with Byzantine failures in the asynchronous model, since this is the problem I’ve spent a while working on, but I did not make this clear. While it is true that many applications of consensus focus just on fail-stop faults, there is increasing interest in Byzantine faults, since these are important for web-based systems that are not strictly client-server and where there is no admission control. For example, the Farsite system uses a consensus protocol that tolerates Byzantine faults (at least the last time I visited Douceur’s group at MSR, this is what they told me). I’m not 100% sure, but I believe that Tapestry and Oceanstore also deal with Byzantine faults, since they are peer-to-peer systems. If they don’t, it is a weakness. I don’t know anything about Google’s Chubby system, but I’d guess that it does *not* tolerate Byzantine faults, since there is admission control, and so this added security is unnecessary.

You might also find the following paper interesting. It’s not written by practitioners, but it does give a very interesting systems-y result on performing consensus with Byzantine faults: http://www.cs.utexas.edu/~lorenzo/papers/kotla07Zyzzyva.pdf

Comment by Jared

Can you please discuss in more detail how consensus is used by Google and Microsoft, and in P2P? In other words, can you expand on the applications of the consensus problem?

Comment by Ahmed Jedda



