# Machinations

Sneak Preview

A sneak preview of the September SIGACT News Distributed Computing Column is now available here (Technion) and here (MIT).  Before I get into the newest column, I would just like to give a shout out to Idit Keidar, who I think has done an exceptional job of maintaining and advertising the SIGACT Distributed Computing column over the past few years.  Archived copies of past columns are on the above web sites – check them out.

The column for September, by Lidong Zhou lead researcher at Microsoft Research Asia , is about the tension between theoretical and practical research in distributed systems.  In fact, a major focus of the article (and a timely one for this blog) is practical ways to solve consensus in order to ensure reliability in large-scale distributed systems.  There are several interesting things I learned (or relearned) from this column including:

• The replicated machine approach based on consensus is considered by industry (well one good researcher at Microsoft at least) to be directly applicable to the problem of cloud computing.
• Algorithmic simplicity and graceful degradation of security guarantees are two properties desired by practitioners that currently seem to be ignored by theoreticians.
• One of the reasons for the popularity of the Paxos algorithm for solving consensus is its flexibility: it captures critical parts of the consensus problem but leaves “non-essential” details such as the exact protocol for choosing a leader unspecified.
• Lidong mentions that much of the inspiration for the column came from his “responsibility for the distributed systems behind various on-line services, such as Hotmail and the Bing search engine.”  I’m curious if this means that algorithms for consensus, such as Paxos, are at the heart of the robustness mechanisms for such systems.  If so, then the consensus problem is even more pervasive than I thought.

Consensus

Consensus[1] is arguably one of the most fundamental problem in distributed computing.  The basic idea of the problem is: n players each vote on one of two choices, and the players  want to hold an election to decide which choice “wins”.  The problem is made more interesting by the fact that there is no leader, i.e. no single player who everyone knows is trustworthy and can be counted on to tabulate all the votes accurately.

Here’s a more precise statement of the problem.  Each of n players starts with either a 0 or a 1 as input.  A certain fraction of the players (say 1/3) are bad, and will collude to thwart the remaining good players.  Unfortunately, the good players have no idea who the bad players are.  Our goal is to create an algorithm for the good players that ensures that

• All good players output the same bit in the end
• The bit output by all good players is the same as the input bit of at least one good player

At first, the second condition may seem ridiculously easy to satisfy and completely useless.  In fact, it’s ridiculously hard to satisfy and extremely useful.  To show the problem is hard, consider a naive algorithm where all players send out their bits to each other and then each player outputs the bit that it received from the majority of other players.  This algorithm fails because the bad guys can send different messages to different people.  In particular, when the vote is close, the bad guys can make one good guy commit to the bit 0 and another good guy commit to the bit 1.

The next thing I want to talk about is how useful this problem is.   First, the problem is broadly useful in the engineering sense of helping us build tools.  If you can solve consensus, you can solve the problem of how to build a system that is more robust than any of its components.  Think about it: each component votes on the outcome of a computation and then with consensus it’s easy to determine which outcome is decided by a plurality of the components (e.g. first everyone sends out their votes, then everyone calculates the majority vote, then everyone solves the consensus problem to commit to a majority vote.)   It’s not surprising then that solutions to the consensus problem have been used in all kinds of systems ranging from flight control, to data bases, to peer-to-peer; Google uses consensus in the Chubby system; Microsoft uses consensus in Farsite; most structured peer-to-peer systems (e.g. Tapestry, Oceanstore) use consensus in some way or another.  Any distributed system built with any pretension towards robustness that is not using consensus probably should be.

But that’s just the engineering side of things.  Consensus is useful because it allows us to study synchronization in complex systems.  How can systems like birds, bees, bacteria, markets come to a decision even when there is no leader.  We know they do it, and that they do it robustly, but exactly how do they do it and what is the trade-off they pay between robustness and the time and communication costs of doing it?  Studying upper and lower bounds on the consensus problem gives insight into these natural systems.  The study of how these agreement building processes, or quorum sensing, occur in nature has become quite popular lately, since they occur so pervasively.

Consensus is also useful because it helps us study fundamental properties of computation.  One of the first major result on consensus due to Fischer, Lynch and Patterson, in 1982, was that consensus is impossible for any deterministic algorithm with even one bad player (in the asynchronous communication model).  However, a follow up paper by Ben-Or showed that with a randomized algorithm, it was possible to solve this problem even with a constant fraction of bad players, albeit in exponential time.  This was a fundamental result giving some idea of how useful randomization can be for computation under an adversarial model.  As a grad student, I remember taking a class with Paul Beame telling us how impressed he was by what these two results said about the power of randomness when they first came out.  Cryptography was also shown to be useful for circumventing the Fischer, Lynch and Patterson result, and I’ve heard of several prominent cryptographers  who were first drawn to that area at the time because of its usefulness in solving consensus.

In the next week or two, I’ll go into some of the details of recent results on this problem that make use of randomness and cryptography.  Early randomized algorithms for consensus like Ben-Or’s used very clever tricks, but no heavy duty mathematical machinery.  More recent results, which run in polynomial time, make use of more modern tricks like the probabilistic method, expanders, extractors, samplers and connections with error-correcting codes, along with assorted cryptographic tricks.  I’ve been involved on the work using randomness, so I’ll probably start there.

[1] The consensus problem is also frequently referred to as the Byzantine agreement problem or simply agreement.  I prefer the name consensus, since it is more succinct and descriptive. While the research community has not yet reached a “consensus” on a single name for this problem, in recent years, the name consensus is being used most frequently.

Day 3 (the final) from PODC
Last Installment from Amitabh – Ed

I should sell this pic to the Canadan government, or to the Calgarians!

A riveting talk was the keynote  given by Bruce Hendrickson:Emerging Challenges and Opportunities in Parallel computing: The Cretaceous Redux?’.  He discussed the state of parallel computing and the wide gap between theory and practice in the area.  With the advancement of Moore’s law and new technologies like multi-core emerging, a revolutionary change of environment is upon us and many  present technologies in practice like MPI may face extinction. He compared it to the Cretaceous era with the well adapted and highly specialized Dinosaurs about to feel the meteors come plummeting through the atmosphere. Upon some rocks, there scurry about scruffy rodents, the people in theory, oblivious to reality and immune in some ways to the incoming meteors. This change would thrust upon the rodents the task of rebuilding the world, moving the dead bodies of the dinosaurs out of the way. All in all, great opportunities await in this scenario for both the theoretician and the practitioner.

George Giakkoupis presented The Effect of Power-Law Degrees on the navigability of Small Worlds, his work with Pierre Fraigniaud. This work has a somewhat surprising extension to Kleinberg’s famous results on small world networks . The Kleinberg model  has a n-node d-dimensional lattice where each node u has directed edges to it’s closest neighbors plus a long range link with target chosen among all nodes v with probability proportional to 1/d(u,v)^r where d(u,v) is the distance between the node u and node v.  Kleinberg showed that when r=d (the dimension of the lattice), greedy search needs O(log^2 n) expected steps to locate a target from a source.  In contrast, for r not equal to d, any decentralized algorithm will need  polynomial(n) expected steps. This showed that in a small world network, not only is the diameter small but there is a way to find paths in a decentralized way, provided each agent has a sense of direction. George and Pierre modified the Kleinberg model so that the number of long range links follows a power-law distribution. They discovered that power-laws help navigation iff 2 < a < 3, where a is the exponent of the  power law.  In particular, they showed that greedy search requiring O(log^(a-1)n) steps for a in such a range.  The effect of the power-law distribution is significant: As a approaches 2 from above, the expected number of steps of greedy routing in the augmented lattice with power-law degrees approaches the square-root of the expected number of steps of greedy routing in the augmented lattice with fixed degrees, even with both networks having the same average degree. The other surprising thing is that their results don’t seem to depend on the dimensionality of the lattice.

A short note on two other papers. Chen Avin et al’s paper SINR Diagrams: Towards Algorithmically Usable SINR Models of Wireless Networks (presented by Yuval Emek) discusses the properties of the Signal-to-Interference and Noise Ratio (SINR) model which seems to be a more realistic alternative to simpler models such as Unit Disk Graph (UDG).  Their model allows for the possible convexity of  reception zones which seems to be observed in practice.   Prosenjit Bose, Paz Carmi and Stephane Durocher’s Bounding the Locality of Distributed Routing Algorithms (presented by Durocher) discusses various parameters necessary and/or sufficient to permit local routing on a network modelled by a connected undirected graph, and establish bounds on locality of routing for these parameters, as well as bounds on dilation (worst-case ratio of actual route lengths to shortest path lengths).

… and that’s all from PODC.

Day 2 PODC’09

New Animals at the Zoo!

More from Amitabh – Ed

Day 2 began for me early at 8:20 when I presented our paper ‘The Forgiving Graph: A distributed data structure for low stretch under adversarial attack ‘. The early morning start gave me an opportunity to wish good evening to my European friends who were still suffering from jet-lag! Our paper gives provably good bounds on degree increase and stretch for a network under attack by an omniscient adversary which adds or removes nodes from the system, and the algorithm repairs only by adding edges. We also introduce a new data structure which we call the half-full tree (haft). The talk was well received and I was actually asked some questions after the talk which is usually a good sign! The slides from the talk are here .

Another talk in the same session I found particularly engaging was Andrew Handley’s presentation of his paper The Flip Markov Chain and a Randomizing P2P Protocol’ (work with Colin Cooper and Martin Dyer). The general problem is of obtaining random graphs from regular graphs using the simple operations switch’ (take two independent edges and swap their endpoints) and ‘flip'(take a 3-path, break and reconnect the side edges). Cooper, Dyer and Greenhill (CDG)( (journal version , SODA) have shown that random switches make a random graph quickly via a complicated analysis of a markov chain that applies the switch to random edges. However, flips are a better operation for P2P networks since they are local, preserve connectivity and allow simple protocols. In this paper, the authors improve the bounds on making random graphs by flips by constructing a new markov chain using the markov chain of CDG, but replacing a switch by a series of flips, and analysing this new chain.

The last session had another set of elegant talks. One would think that the area of random walks has been studied to death but here was another paper on it. Atish Das Sarma presented his paper with Danupon Nanonkai and Gopal Pandurangan Fast Distributed Random walks. Their results include showing a random walk of length \ell can be done in Õ(l^2/3D^1/3) rounds on an undirected unweighted network where D is the diameter of the network, and k random walks can be done in Õ((kl)^2/3D^1/3) if k <= l^2 and Õ((kl)^1/2)  otherwise . They do this by the simple idea of nodes starting to random walk in anticipation of future use, thus, a few short walks are done in the beginning and stitched’ together as required, if more short walks are required they are done on the fly. They show how to do many short walks in parallel without causing too much congestion.

The second talk was Distributed and Parallel Algorithms for Weighted Vertex Cover and other Covering Problems by Christos Koufogiannakis and Neal Young. They have developed algorithms for covering problems including a parallel 2-approximation algorithm for Weighted Vertex Cover in polylog rounds. The idea is to use edge-discount operations where a node with higher weight discounts an node with a lower weight node taking the connecting edge weight to zero. In the distributed version, over a few rounds, a bipartite version of the network is formed in each round. Independent rooted stars are taken from this graph and a coordinated probabilistic discount operation is done within the stars, till the covering is obtained. Christos explained the algorithm by using the classic model of the nodes of the bipartite graph being boys and girls, and as in the real world, the girls jilting a number of boys!

The third and final talk Return of the Primal-Dual: Distributed Metric Facility Location by Saurav Pandit and Sriram Pemmaraju , was introduced by Sriram as one in the Star Wars’ series following on from Greedy strikes back.: Improved facility location algorithms . Informally, the facility location problem is that given a set of facilities and clients, how to assign clients to open facilities such as to minimize overall costs which include costs of opening the facility and connection costs. In this paper, they obtain a 7-approximation in O(log(number of facilities) + log(number of clients)) round distributed algorithms for the metric facility location problem in the CONGEST model (message sizes bound by O(log network size)). For this, they have used the elegant primal-dual method method (based on Jain and Vazirani’s   A primal-dual schema based approximation algorithm for the element connectivity problem) and a quick randomized sparsification of graphs due to Gfeller and Vicari (A randomized distributed algorithm for the maximal independent set problem in growth-bounded graphs ). A question posed at the end is that can the technique of primal-dual plus rapid sparsification be applied to other settings, where generally the technique of LP relaxation is used (e.g. What cannot be computer locally!, Kuhn et al )).

Note: Calgary zoo is very nice especially when there is a conference banquet there 🙂 – Banff forests are supposed to be even nicer, and there I head tomorrow, hopefully I can get the last installment of the reports out on time.

Day 1 PODC ’09
August 11, 2009, 4:11 pm
Filed under: Uncategorized | Tags: , , , ,

Editors Note: This is the first day of reports from PODC ’09 by my  student Amitabh Trehan.  Principles of Distributed Computing (PODC) is one of the premiere conferences focusing on theory of distributed computing.

An interesting talk in the morning was the presentation of the best student paper  Max Registers, Counters and Monotone Circuits (Aspnes, Attiya and Censor) presented by Keren Censor. The final  message of  the talk was Lower bounds do not always have the final say ‘ (quoted from the talk). The paper deals with implementation of concurrent data structures in shared memory systems, where n processes communicate by reading and writing to shared multi-reader multi-writer registers. A lower bound given by Jayanti et al shows that these operations take Omega(n) space and Omega(n) time steps in the worst case, for many common data structures. On careful analysis of the lower bound proof, the authors realised that they could develop sub-linear algorithms for many useful applications where the worst case would not occur. These are for algorithms where the number of operations will be bounded e.g. for applications which have a limited life-time or several instances of the data structure can be changed. They go on to show how some of these structures can be constructed with some nice use of recursion and my favorite data structures – trees!  This reminded me of a discussion I had the previous evening with Victor Luchangco about Nancy Lynch. He contended that one trait contributing to the success of her work (and of her students) was the rigorous questioning of every assumption and definition of interest they came across. A good lesson!

Robert Van Renesse‘s keynote talk Refining the way to Consensus‘ set up the stage for later talks on consensus.  His talk described several refinements to basic consensus algorithms from simple to non-blocking consensus, to election consensus, to voters consensus, and finally to recommender consensus. He stressed the importance of  exposing undergraduates to the consensus problem, because of the fact that this is a problem they are likely to see again the real world. What drew chuckles was a cartoon showing the ‘ideal’ programmer, a faceless human stuck to the computer screen and keyboard coding to strictly meet the specifications, and an ‘intuitive’ programmer, a happy guy with his back to the computer  smiling having no idea what the specifications he was implementing were but knowing when he was done with his coding job.  Editor Note: Look for more information on the consensus problem in a blog post in the next week or two.

There were two  excellent game theoretic’ talks in the afternoon. Georgios Piliouras presented his paper with Robert Kleinberg and Eva Tardos Load Balancing Without Regret in the Bulletin Board Model‘. They consider load balancing games in the bulletin board model (players can find out delay on all machines, but have no information on what their delay would have been if they had selected another machine) and show solutions using regret-minimization which are exponentially better than the correlated equilibrium. No-regret algorithms are an example of alternative (to the weaknesses of Nash equilibria) solution concepts based on average outcome of self-adapting agents who react to each other’s strategies in repeated play of the game. Martin Hoefer presented his paper with Ackermann, Berenbrink and Fischer Concurrent Imitation Dynamics in Congestion Games‘ which discusses the dynamics emerging when agents sample and possibly  imitating other agent’s strategies when to do so will improve their utility.  Their main result is to show that this imitation strategy leads to rapid convergence (logarithmic in number of players) to approximate equilibria for congestion games.  An approximate equilibrium is one where only a small fraction of the players have latency much better or worse than the average latency.

PODC 2009
August 10, 2009, 12:34 pm
Filed under: Uncategorized | Tags: , ,

The carousels are not boring at Calgary Airport!

Editor’s Note: My intrepid student Amitabh Trehan is live blogging from PODC.  The following series of posts will be guest posts from him about the conference.  Enjoy!

Welcome to the reports from PODC2009. For the next three days, I will be blogging from Calgary, Canada about the things I find interesting here.
The start of the morning was hardly a good advertisement for computing. As I waited for my flight from Albuquerque to Denver (which was to connect me to the Denver-Calgary flight), there was an announcement from the captain ‘Our computer’s warning light is on, and we cannot take off unless it switches off’. He followed up with some details- ‘we have many on-board computers looking after the same systems. Sometimes they do not agree with each other and if that happens, the warnings go on and we are not allowed to take off. Usually we just reset some breakers and we are ready to go. Unfortunately, this is not helping right now, so, I will just switch off the power to the plane for a while and switch it on again’. `This is like CTRL-ALT-DEL’, he added.
Well, rebooting did not work, so he had to inform us that we will all have to wait for the single all-powerful mechanic on the whole airport who at that moment was fixing some competitor’s plane.  We were all deboarded and told that with high probability we would be rebooked to other flights. But a miracle happened – the mechanic not only found the power for the computer but also switched it on. We got back on the flight and I made it to Calgary on time.
The University of Calgary campus is green and beautiful. A reception was held for PODC participants at the Nickle museum which houses currency coins from the roman era and other times besides other exhibits. The kit bag had SPAA proceedings in both book and CD-ROM form whereas PODC proceedings were  available only as a book.  The participants formed and dissolved groups of various sizes as new introductions were made, students eagerly talkled about  their research, and some old memories were reminisced. On to tomorrow.

The PODC reception at Nickel Museum

At the end of the reception!