Filed under: Uncategorized | Tags: distributed computing, interactive computation, reliability, theory

How do you compute over a noisy channel? Say Alice and Bob want to run a distributed algorithm, but they can only communicate over a channel where each bit is flipped either stochastically or adversarially. What is the smallest number of bits they must send in order to obtain the correct output of the algorithm, with high probability? This general problem is called Interactive Communication, or sometimes Interactive Coding/Computation – let’s use IC for short. It doesn’t take much imagination to think of applications of IC: enabling sensor nodes to compute a function on their inputs in a noisy environment; designing reliable algorithms for natural algorithms [1]; and designing attack-resistant algorithms [2], just as some examples.

Wait, you may be thinking, isn’t this problem solved by error correcting codes? No – the bits sent in the distributed algorithm may be determined only one at a time, in a dynamic manner. Even for stochastic noise, if we naively encode each bit sent, in order to get a union bound ensuring the probability of any decoding error is small, we’ll need logarithmic blowup for each bit. We can do better.

In fact, in the first paper on this problem, Coding for Interactive Communication, Leonard Schulman described an algorithm that requires only constant blowup, even when a constant fraction of the bits on the channel are flipped **adversarially**. Schulman’s algorithm was based on a new type of code, tree codes, designed specifically for interactive communication. After this paper was published in 1996, subsequent work focused on improving the error tolerance of Schulman’s algorithm (i.e. tolerating a larger constant fraction of errors); tuning the communication blowup to the (known) error rate; and considering the situation where the computation occurs over more than 2 parties. See the survey paper Coding for interactive computation: progress and challenges (especially Sections 3 and 4) for details.

I’ll note here that in a sense all of information theory can be though of as a subset of IC. In particular, information theory considers “only” the class of distributed algorithms where Alice has an input that she wants to send to Bob. In IC, we need to handle all possible distributed algorithms, and so in a way, it’s pretty amazing that a constant adversarial noise rate can even be tolerated.

A very recent breakthrough in IC occurred in FOCS 2014, where Bernhard Haeupler in the paper Interactive Channel Capacity Revisited, made huge progress on the problem of tuning the communication blowup to a known error rate. In particular, he described an algorithm that given a fixed and known adversarial noise rate \epsilon, achieves a communication blowup as a function of \epsilon that is conjectured to be optimal.

So what if \epsilon is not known in advance? I’m going to end this post with a short advertisement for a recent paper with some collaborators of mine that begins to address that problem. The paper is Interactive Communication with Unknown Noise Rate, with Varsha Dani, Mahnush Movahedi, and Maxwell Young that will be in this upcoming ICALP. This is the first paper I’ve written about IC, and I hope it won’t be the last. In fact, I’d like to see more people in the distributed computing community consider the problem. In particular, it’d be nice to make progress on the (currently very hard) problem of somehow generalizing results to more than 2 parties.

[1] For example, see this paper for a closely related problem: Breathe before Speaking: Efficient Information Dissemination Despite Noisy, Limited and Anonymous Communication

[2] This last domain is particularly interesting to me. An obvious connection is attacks on communication channels. A less-obvious connection is attacks on nodes. If communication from each node is bounded, as is increasingly common for algorithms on large networks, there may be a fairly tight connection between attacks on communication channels and attacks on nodes. Thus, there may be connections to attacks like Byzantine faults.

Filed under: Uncategorized | Tags: high performance computing, reliability, theory

A few weeks ago I gave a talk on network reliability at the ICIS “Future of the Field” workshop on High Performance Computing[1]. This talk surveys three main theoretical approaches we have for building reliable distributed systems from unreliable components. When writing the talk, I was struck by how few theoretical tools we actually have for this problem: Byzantine agreement and state replication; Secure Multiparty Computation; and a circuit based approach proposed by Von Neumann which seems to have been more or less abandoned 20 years ago. Am I missing anything?

[1] How I got invited to such a workshop is a topic for another post…

In this post, I want to finish up the result that for any function of n inputs, there is a network of O(2^n/n) noisy gates that will compute that function with probability better than 1/2. Since most functions require O(2^n/n) **noise-free **gates, this will show that, for most functions, there is at most a constant blowup in gates when all gates are noisy.

Recall from last time that we can build a MUX over 2^x signals with O(2^x) noisy gates. Using a similar technique, for any set of x inputs, we can build an “All Functions” circuit that computes all 2^(2^x) possible functions of these inputs with O(2^(2^x)) gates. Our result today will make use of both 1) an “All Functions” circuit and 2) a MUX.

The technique is illustrated in the figure below. Assume we’re trying to compute a function f of n inputs. Let **a** = log (n-log n) Basically, we send **a** of the inputs the “All Functions” circuit and we send n-**a **of the inputs to the MUX. The MUX selects, based on the first n-**a **inputs, the exact function of the remaining **a **inputs that determines the function f over all n inputs. The MUX selects 1 of 2^(n-a) such “completion functions”.

The beautiful thing here is that both the MUX and the “All Functions” circuit require only O(2^n/n) gates. Thus, this clever combination of both types of circuits has saved us a factor of n in the total number of noisy gates required.

Filed under: Uncategorized | Tags: distributed computing, papers, reliability, theory

In today’s post, I want to focus on some of the highlights of the lower bound that I mentioned last time on the “Networks of Noisy Gates“ problem. Recall that, in this problem, you are given a function that can be computed with a circuit of regular gates. You want to construct a new circuit with the smallest number of noisy gates that computes correctly with probability greater than . These new noisy gates each fail independently with some small probability and when they fail, they produce the complement of their correct output.

The result I want to talk about today is the lower bound by Pippenger, Stamoulis and Tsitsiklis which shows that there are some functions that can be computed with gates without noise, but which require noisy gates. In particular, they show that this is true if is the exclusive or function.

The first part of their lower bound is a result that shows a new model of failure where wires can also fail can be made equivalent to the old model of failure where only gates can fail. In particular, in the new model, each wire fails with some probability , independently of all other wires and gates, and failure of a wire results in transmitting the complement of the bit that is sent down that wire.

Lemma 3.1 of the paper “Lower Bound for the Redundancy of Self-correcting Arrangements of Unreliable Functional Elements” by Dobrushin and Ortyukov shows the following. Let be the maximum number of input wires for any gate. Then for any and for any set of inputs to a gate, there exists a vector of malfunction probabilities on the gate such that the probability that the gate fails to produce the correct output is exactly . Essentially this shows that, the new model where we have a small probability of failure on both wires and gates is stochastically equivalent to the old model where we have a larger probability of error just on the gates.

Now we assume we have inputs to the to our exclusive-or function , and we let be these inputs. For the purposes of the lower bound, we’ll assume that each is an independent random variable that is with probability and otherwise. Now for this lower bound, we’re going to be very generous to the noisy circuit. We’re going to assume that 1) no gate ever fails and 2) the only wires that can fail are those that are directly connected to some input bit ( each of these wires fails independently with probability ).

For the -th input bit, we’ll let be the number of wires that are connected directly to that input bit, and hence carry the value if they are not faulty. Now in the backend of the circuit, which is computing the exclusive-or of the input bits, there needs to be some estimate for each value of . Not surprisingly, it is possible to prove (as the paper does) that Maximum Likelihood way to estimate is to take the majority bit over all the wires that are connected to input . We’ll let be an indicator r.v. that is iff the Maximum Likelihood estimate for is wrong. Note that , since if *every* wire that carries that value is faulty, then clearly the circuit will use the wrong value.

Next, note that the probability that the noisy circuit fails is just . So now we have a cute probability problem: assume you’re give independent indicator random variables, you know the probability that each of them is , and you want to compute the probability that their exclusive-or is . The next lemma solves this problem with the help of generating functions.

*Proof:* Let be the generating function for . Note that . Now let and let . Now comes the clever part: note that . This is true since is always and is if is even and if is odd.

But by linearity of expectation, . Next, note that since the are independent. Thus, and . This completes the proof.

The remainder of the proof for the lower bound is just algebra. We’ll make use of the inequality between arithmetic and geometric means, which states that for any set of numbers, , it holds that .

Lemma 2is

*Proof:* Let be the probability of error for the noisy circuit. By assumption, . Using Lemma~1 and our bound on , we have that

Now using the inequality between arithmetic and geometric means, we have that

Again using the inequality between arithmetic and geometric means (now on the term that is being subtracted) and the fact that , we get:

Isolating the term in the above inequality gives us that

Since is a constant depending only on and since , the above inequality completes the proof.

Finally, we note that since each gate is assumed to have constant fan in, if the number of wires is , it follows that the number of gates in the noisy circuit must also be . That basically completes the proof of the lower bound. The interesting thing is that the only difficulty we exploited is the difficulty of estimating the input values in the situation where the wires may be faulty!