Machinations


Congratulations M!
August 29, 2011, 6:13 pm
Filed under: Uncategorized | Tags: , ,

Congratulations to my student Olumuyiwa (“M”) Oluwasanmi who successfully defended his dissertation on “Practical, Scalable Algorithms for Byzantine agreement” last week!

Usually on this blog, I allow only a 1 slide dissertation summary, but today I’m making an exception for 2 slides (at least they have the same x-axis :) .  The slides below summarize the performance of 4 Byzantine agreement algorithms.  The CKS algorithm is from the paper “Random oracles in Constantinople: Practical asynchronous Byzantine agreement using cryptography“.  The remaining 3 algorithms are from M’s dissertation, and the algorithms with the RB prefix assume the existence of a random beacon.



NSF Proposal
August 18, 2011, 3:01 pm
Filed under: Uncategorized | Tags: , , ,

The process of grant writing was completely opaque to me when I was a grad student. As a faculty member, I now take the position that grant writing is intrinsically important, and that all grad students should learn something about it in order to get insight into the long-term planning process for doing research.  For this reason, it’d be nice to see more examples of funded grants publicly available.

Last January, I posted a one page project summary for one of my funded NSF proposals.  This year again I was lucky enough to have a proposal funded and I’m posting the one page summary below.  I hope that some readers find it useful.

consensus-summary



September SIGACT Distributed Computing Column
September 9, 2010, 4:20 pm
Filed under: Uncategorized | Tags: , ,

The September SIGACT distributed computing column is now available at Technion and at MIT.  This month Valerie and I contributed a survey on scalable Byzantine Agreement – Marko Vukolic considers BA for cloud computing providers.  Idit’s intro to the column is below.

“After almost 30 years of research on Byzantine Agreement (BA), the problem continues to be relevant and to re-invent itself in new ways. This column discusses two new research directions that further push the scale of BA. It suggests new domains where BA can, and perhaps should, be deployed.  First, our main contribution, by Valerie King and Jared Saia, argues for running BA in setting with a large number of nodes (or processors). Valerie and Jared survey new BA protocols whose communication complexity is scalable in the number of participating processors. This, they argue, enables their deployment in larger-scale domains for which BA was considered infeasible before. The second contribution, by Marko Vukolic, considers another emerging domain for BA. It calls for wider-scale deployment of BA protocols, not among many processors, but rather over multiple cloud computing providers.

The column ends with a short announcement about Morgan Claypool’s new monograph series on Distributed Computing Theory, edited by Nancy Lynch.

Many thanks to Valerie, Jared, and Marko for sharing their insights!”



End of Semester

The end of the semester here at UNM just about killed me. In addition to the usual academic hubbub, I hosted a visitor, submitted a paper, finished up the camera ready for our PODC paper, and dealt with my toddler who decided it would be a good couple of weeks to wake up every night at 2am (teething?, headache?, likes a dose of cherry-flavored children’s tylenol as a nightcap?)

The camera-ready of our paper, “Breaking the O(n^2) Bit Barrier: Scalable Byzantine agreement with an Adaptive Adversary” (that I blogged about previously) is now available here.  This was probably the most time I’ve spent going from an accepted paper to a camera ready – mostly because the algorithm in the paper consisted of several small parts with many connections between the parts.  Hopefully, our new version makes everything much easier to understand.

Some great news we received, right after submitting the camera ready, is that our paper was selected to be in the “best paper session” of PODC.  The list of all such papers is below – I know that at least some of these papers will be invited to a special issue of the JACM, and all of them look interesting.   The best paper session is a new thing for PODC.  I definitely like the idea of having multiple best papers, it gives more information about the assessment of the PC than a single best paper.  I’ll probably read through most of these before the conference.

Deterministic Distributed Vertex Coloring in Polylogarithmic Time
Barenboim, Elkin

Breaking the O(n^2) Bit Barrier: Scalable Byzantine agreement with an Adaptive Adversary
King, Saia

Optimal Gradient Clock Synchronization in Dynamic Networks
Lenzen, Kuhn, Locher, Oshman

Online set packing and competitive scheduling of multi-part tasks
Emek, Halldorsson, Mansour, Patt-Shamir, Radhakrishnan, Rawitz

How to Meet when you Forget: Log-space Rendezvous in Arbitrary Graphs
Czyzowicz, Kosowski, Pelc

A Modular Approach to Shared-memory Consensus, with Applications to the Probabilistic-write Model
Aspnes

Constant RMR Solutions to Reader Writer Synchronization
Bhatt, Jayanti



Scalable Byzantine Fault Tolerance
March 5, 2010, 7:25 pm
Filed under: Uncategorized | Tags: , , ,

Valerie King and I recently finished a paper that I wanted to talk about here.  First, some motivation.  For a whole slew of reasons, adversarial, aka Byzantine, attacks are garnering increasing attention now in the systems community.  For example, the following position paper, from ’08, makes a great case for Byzantine Fault Tolerance (BFT): “BFT: The Time is Now“.  Unfortunately, system folk have been less than enamored with the efficiency of Byzantine agreement algorithms:

  • “Unfortunately, Byzantine agreement requires a number of messages quadratic in the number of participants, so it is infeasible for use in synchronizing a large number of replicas” [1]
  • “Eventually batching cannot compensate for the quadratic number of messages [of Practical Byzantine Fault Tolerance (PBFT)]” [2]
  • “The communication overhead of Byzantine Agreement is inherently large” [3]

There have been many attempts to circumvent this inefficiency by 1) trying to ensure Byzantine fault tolerance in a particular system without solving Byzantine agreement; and 2) trying to solve Byzantine agreement in special cases for a type of adversary that is unique to that paper.

I think this is wrong.  Why?  To build secure systems, we need secure and trusted components, and so we shouldn’t create new components (i.e. subroutines robust to Byzantine faults) from scratch for each new system.

So why the problem with Byzantine agreement?  The problem is not the problem. Despite what naysayers may say about Byzantine agreement(BA) being “too paranoid”, “too hard”, or  “too abstract”, BA is the right problem formulation for building secure networks.  To see this, read the intros of the dozen systems papers from the last several years that say that a key component of what they want to do reduces to BA.  The problem is the solution.  Or rather the lack of a solution.  We still don’t have a general algorithm to solve BA that is practical enough to plug in to the many systems that want to solve it.  So despite all the work in this area, I think BA is still a major unsolved problem in distributed computing.

So now is the point where I’d like to tell you that we solved this problem.  Sadly we haven’t.  However, we have been working hard in the past couple of years designing algorithms for BA that reduce communication overhead.  In particular, we’ve been able to design algorithms where the number of messages each node sends out is about O(sqrt(n)) instead of O(n) (technically soft-O(sqrt(n)), which ignores log factors).  The price we pay is a non-zero probability of failure, but a probability of failure that is polynomially small in the network size.  The bigger the network, the smaller the probability of failure.

So then what about this new paper?  Our new paper describes an algorithm that solves BA with about O(sqrt(n)) messages and polylog latency, against an adaptive adversary, which can choose which nodes to take over at any time during the algorithm, up to taking over up to a 1/3 fraction of the nodes.  This is the first time we went head-to-head with an adaptive adversary and it definitely makes the problem more challenging and interesting.  We had to use a slew of new techniques (samplers, a kind of iterated secret sharing, a new technique for going from almost everywhere to everywhere BA) that I’d like to talk about in another post.  Standard caveats for theory research apply (hidden constants, hidden log terms, etc, etc) but I do think there are new ideas in this paper that can be useful in designing a practical algorithm.

OK, without any additional fanfare,  here is the paper.

References:

[1] S. Rhea, P. Eaton, D. Geels, H. Weatherspoon, B. Zhao, and J. Kubiatowicz. Pond: the OceanStore prototype. In Proceedings of the 2nd USENIX Conference on File and Storage Technologies, pages 1–14, 2003.

[2] J. Cowling, D. Myers, B. Liskov, R. Rodrigues, and L. Shrira. Hq replication: A hybrid quorum protocol for byzantine fault tolerance. In In Proceedings of Operating Systems Design and Implementation (OSDI), San Diego, CA, USA, 2005.

[3] Chien-Fu Cheng, Shu-Ching Wang, and Tyne Liang. The anatomy study of server-initial agreement for general hierarchy wired/wireless networks. Computer Standards & Interfaces, 31(1):219 – 226, 2009.



Consensus

Consensus[1] is arguably one of the most fundamental problem in distributed computing.  The basic idea of the problem is: n players each vote on one of two choices, and the players  want to hold an election to decide which choice “wins”.  The problem is made more interesting by the fact that there is no leader, i.e. no single player who everyone knows is trustworthy and can be counted on to tabulate all the votes accurately.

Here’s a more precise statement of the problem.  Each of n players starts with either a 0 or a 1 as input.  A certain fraction of the players (say 1/3) are bad, and will collude to thwart the remaining good players.  Unfortunately, the good players have no idea who the bad players are.  Our goal is to create an algorithm for the good players that ensures that

  • All good players output the same bit in the end
  • The bit output by all good players is the same as the input bit of at least one good player

At first, the second condition may seem ridiculously easy to satisfy and completely useless.  In fact, it’s ridiculously hard to satisfy and extremely useful.  To show the problem is hard, consider a naive algorithm where all players send out their bits to each other and then each player outputs the bit that it received from the majority of other players.  This algorithm fails because the bad guys can send different messages to different people.  In particular, when the vote is close, the bad guys can make one good guy commit to the bit 0 and another good guy commit to the bit 1.

The next thing I want to talk about is how useful this problem is.   First, the problem is broadly useful in the engineering sense of helping us build tools.  If you can solve consensus, you can solve the problem of how to build a system that is more robust than any of its components.  Think about it: each component votes on the outcome of a computation and then with consensus it’s easy to determine which outcome is decided by a plurality of the components (e.g. first everyone sends out their votes, then everyone calculates the majority vote, then everyone solves the consensus problem to commit to a majority vote.)   It’s not surprising then that solutions to the consensus problem have been used in all kinds of systems ranging from flight control, to data bases, to peer-to-peer; Google uses consensus in the Chubby system; Microsoft uses consensus in Farsite; most structured peer-to-peer systems (e.g. Tapestry, Oceanstore) use consensus in some way or another.  Any distributed system built with any pretension towards robustness that is not using consensus probably should be.

But that’s just the engineering side of things.  Consensus is useful because it allows us to study synchronization in complex systems.  How can systems like birds, bees, bacteria, markets come to a decision even when there is no leader.  We know they do it, and that they do it robustly, but exactly how do they do it and what is the trade-off they pay between robustness and the time and communication costs of doing it?  Studying upper and lower bounds on the consensus problem gives insight into these natural systems.  The study of how these agreement building processes, or quorum sensing, occur in nature has become quite popular lately, since they occur so pervasively.

Consensus is also useful because it helps us study fundamental properties of computation.  One of the first major result on consensus due to Fischer, Lynch and Patterson, in 1982, was that consensus is impossible for any deterministic algorithm with even one bad player (in the asynchronous communication model).  However, a follow up paper by Ben-Or showed that with a randomized algorithm, it was possible to solve this problem even with a constant fraction of bad players, albeit in exponential time.  This was a fundamental result giving some idea of how useful randomization can be for computation under an adversarial model.  As a grad student, I remember taking a class with Paul Beame telling us how impressed he was by what these two results said about the power of randomness when they first came out.  Cryptography was also shown to be useful for circumventing the Fischer, Lynch and Patterson result, and I’ve heard of several prominent cryptographers  who were first drawn to that area at the time because of its usefulness in solving consensus.

In the next week or two, I’ll go into some of the details of recent results on this problem that make use of randomness and cryptography.  Early randomized algorithms for consensus like Ben-Or’s used very clever tricks, but no heavy duty mathematical machinery.  More recent results, which run in polynomial time, make use of more modern tricks like the probabilistic method, expanders, extractors, samplers and connections with error-correcting codes, along with assorted cryptographic tricks.  I’ve been involved on the work using randomness, so I’ll probably start there.

[1] The consensus problem is also frequently referred to as the Byzantine agreement problem or simply agreement.  I prefer the name consensus, since it is more succinct and descriptive. While the research community has not yet reached a “consensus” on a single name for this problem, in recent years, the name consensus is being used most frequently.




Follow

Get every new post delivered to your Inbox.