Machinations


Sneak Preview

A sneak preview of the September SIGACT News Distributed Computing Column is now available here (Technion) and here (MIT).  Before I get into the newest column, I'd like to give a shout-out to Idit Keidar, who I think has done an exceptional job of maintaining and advertising the column over the past few years.  Archived copies of past columns are on the above web sites – check them out.

The column for September, by Lidong Zhou, lead researcher at Microsoft Research Asia, is about the tension between theoretical and practical research in distributed systems.  In fact, a major focus of the article (and a timely one for this blog) is practical ways to solve consensus in order to ensure reliability in large-scale distributed systems.  There are several interesting things I learned (or relearned) from this column, including:

  • The replicated state machine approach based on consensus is considered by industry (well, by one good researcher at Microsoft, at least) to be directly applicable to the problem of cloud computing.
  • Algorithmic simplicity and graceful degradation of security guarantees are two properties desired by practitioners that currently seem to be ignored by theoreticians.
  • One of the reasons for the popularity of the Paxos algorithm for solving consensus is its flexibility: it captures critical parts of the consensus problem but leaves “non-essential” details such as the exact protocol for choosing a leader unspecified.
  • Lidong mentions that much of the inspiration for the column came from his “responsibility for the distributed systems behind various on-line services, such as Hotmail and the Bing search engine.”  I’m curious if this means that algorithms for consensus, such as Paxos, are at the heart of the robustness mechanisms for such systems.  If so, then the consensus problem is even more pervasive than I thought.
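
To make the consensus-based replication idea from the first bullet a bit more concrete, here is a minimal sketch of my own (not taken from the column).  It assumes the hard part, agreeing on the order of commands, has already been done by something like Paxos, and only shows why agreement on a log is enough: deterministic replicas that apply the same log end up in the same state.  The names KVReplica, catch_up and agreed_log are hypothetical, chosen just for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class KVReplica:
    """A deterministic state machine: a tiny key-value store (illustrative only)."""
    store: dict = field(default_factory=dict)
    applied: int = 0  # number of log entries this replica has applied so far

    def apply(self, command):
        """Apply one (op, key, value) command to the local state."""
        op, key, value = command
        if op == "set":
            self.store[key] = value
        elif op == "del":
            self.store.pop(key, None)
        else:
            raise ValueError(f"unknown operation: {op!r}")
        self.applied += 1

    def catch_up(self, log):
        """Apply every log entry this replica has not applied yet, in log order."""
        for command in log[self.applied:]:
            self.apply(command)


# Assumption: each slot of this log was decided by a consensus protocol.
# Here it is just a plain list standing in for that agreed-upon sequence.
agreed_log = [
    ("set", "x", 1),
    ("set", "y", 2),
    ("del", "x", None),
    ("set", "y", 3),
]

replicas = [KVReplica() for _ in range(3)]

replicas[0].catch_up(agreed_log)      # sees the whole log at once
replicas[1].catch_up(agreed_log[:2])  # lags behind after two entries...
replicas[1].catch_up(agreed_log)      # ...then catches up later
replicas[2].catch_up(agreed_log)

states = [r.store for r in replicas]
print(states)
```

All three replicas finish with identical stores, even though one of them fell behind and caught up later.  Everything a real system has to worry about (deciding each log slot despite failures, choosing a leader, and so on) is deliberately left out of this sketch.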