# Machinations

On Networks of Noisy Gates
September 12, 2011, 6:24 pm
Filed under: Uncategorized | Tags: , , ,

In 1952, Von Neumann proposed the following problem.  You are given a function f that can be computed with a circuit of n gates.  You have to build a new circuit to compute f, but all the gates in your new circuit are unreliable. In particular, each new gate fails independently with some small probability epsilon, and when a gate fails, it computes the NOT of what it should compute.  What is the smallest number of gates you need in order to compute f with a probability of error strictly less than 1/2?

It turns out that in general you need Theta(n log n) gates.  The upper bound was shown by Von Neumann, and the lower bound for the exclusive-or function was first proved by Pippenger, Stamoulis and Tsitsiklis (after a previous incorrect attempt by others).

Today I want to talk about the upper bound result.  You might think this is pretty easy – just replicate each gate log n times, do majority filtering, and use Chernoff bounds to show that the majority output will be correct.  Unfortunately, duplication and majority filtering alone fail, since the fraction of faulty wires increases by a constant with each level of the circuit.

To fix this, you need a sampler (aka a concentrator).  A sampler is just a special type of bipartite graph.  In the simplest case, there are the same number of nodes on the top as on the bottom, and the graph is d-regular for some constant d.  The very useful property of a sampler is that it reduces error in the following sense: if some fraction x of the nodes on the top are “bad”, then the fraction of nodes on the bottom that have a majority of bad neighbors is less than x/4.  We can make the denominator an arbitrarily large constant by increasing d, but 4 will suffice for our purposes.  For more about samplers, including how to construct them and other applications, see Section 3.2.1 of our Breaking the O(n^2) Bit Barrier paper.
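The x/4 property is easy to check empirically.  Below is a minimal sketch of my own (not the explicit construction from the paper) that uses a random bipartite graph in place of a real sampler: each bottom node reads d top nodes chosen uniformly at random, which gives the error-reduction property with high probability.  The function name and parameter values are illustrative.

```python
import random

random.seed(1)

def sampler_error_reduction(n=10000, d=30, bad_fraction=0.1):
    """Fraction of bottom nodes whose d randomly chosen top
    neighbors contain a majority of 'bad' nodes."""
    bad = [random.random() < bad_fraction for _ in range(n)]
    bad_majorities = 0
    for _ in range(n):
        votes = sum(bad[random.randrange(n)] for _ in range(d))
        if votes > d / 2:
            bad_majorities += 1
    return bad_majorities / n

# With x = 0.1 bad on top, the bad-majority fraction on the bottom
# comes out far below x/4 = 0.025.
print(sampler_error_reduction())
```

Increasing d drives the bottom fraction down further, which is the knob mentioned above for shrinking the denominator.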

Once you’ve got a sampler, Von Neumann’s construction is pretty easy, and is illustrated in the figure below.  In the upper right corner of the figure is a noiseless gate G with two input wires and a single output wire.  The large circuit to the left replicates this functionality with noisy gates.  In the large circuit, in place of the two input wires, we have two “cables”, each consisting of Theta(log n) wires.  In place of G, we have Theta(log n) replicas of G.  One wire from each cable goes into each of these replicas.  Below this, we have the output wires connected via a sampler to a bunch of simple circuits (“Maj”) that compute the majority of their inputs.

These majority circuits (Maj in the figure) are themselves also noisy.  However, since they have constant-degree inputs, it’s not hard to construct them so that they fail only with constant probability. The names “Executive Organ” and “Restoring Organ” used in the figure are Von Neumann’s original terminology.  A bit colorful perhaps, but who are we to change it?
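To make the construction concrete, here is a small simulation of one level: an executive organ of noisy NAND replicas fed by two cables, followed by a restoring organ in which each output wire takes a noisy majority of D randomly sampled wires (a random graph again standing in for the sampler).  This is a sketch under my own assumptions, not Von Neumann's exact circuit; the parameter values are arbitrary.

```python
import random

random.seed(2)

EPS = 0.01   # gate failure probability
M = 2000     # wires per cable
D = 15       # sampler degree

def noisy(bit):
    """A noisy gate's output: flips with probability EPS."""
    return bit ^ (random.random() < EPS)

def cable(value, fault_fraction):
    """A cable of M wires carrying `value`, some wires flipped."""
    return [value ^ (random.random() < fault_fraction) for _ in range(M)]

def executive_organ(cable_a, cable_b):
    """M noisy NAND replicas, one wire from each input cable."""
    return [noisy(1 - (a & b)) for a, b in zip(cable_a, cable_b)]

def restoring_organ(wires):
    """Each output wire is a noisy majority of D sampled wires."""
    out = []
    for _ in range(M):
        votes = sum(wires[random.randrange(M)] for _ in range(D))
        out.append(noisy(1 if votes > D / 2 else 0))
    return out

a = cable(1, 0.05)             # both input cables 5% faulty
b = cable(1, 0.05)
raw = executive_organ(a, b)    # correct answer: NAND(1, 1) = 0
restored = restoring_organ(raw)

# Faulty fraction before vs. after the restoring organ:
print(sum(raw) / M, sum(restored) / M)
```

The faulty fraction roughly doubles across the executive organ and then drops back down after the restoring organ, which is exactly the behavior the analysis below quantifies.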

The simple analysis of this circuit is illustrated below. Adjacent to each cable is now an upper bound on the fraction of wires in the cable whose output is faulty.  Each of these upper bounds holds with high probability (whp), i.e., with a probability of error that is polynomially small in n. Adjacent to each gate is the probability that the gate is faulty.

We assume inductively that the fraction of bad wires in each of the input cables is no more than \theta, for some fixed constant \theta.  The first observation is that the fraction of faulty wires coming out of the gates G is then no more than 2\theta + 2\epsilon.  To see this, note that, whp, the fraction of gates that fail is no more than 2\epsilon, by Chernoff bounds.  In the worst case, each faulty gate and each faulty input wire causes its own separate faulty output, which gives at most 2\theta + 2\epsilon faulty wires.

Next notice that by the properties of the sampler, if all the Maj circuits were correct, we would have a factor of 4 decrease in the fraction of faulty wires leaving them, so that the fraction of faulty output wires would be only (1/2)(\theta + \epsilon).  But again by Chernoff bounds, whp, at most 2\epsilon of the Maj circuits will be faulty.  Thus we have at most a (1/2)(\theta + \epsilon) + 2\epsilon fraction of faulty wires coming out, and this sum is no more than \theta for \theta large enough but depending only on \epsilon (\theta \geq 5\epsilon suffices).
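In other words, the per-level map on the faulty fraction is \theta \mapsto (1/2)(\theta + \epsilon) + 2\epsilon, whose fixed point is \theta = 5\epsilon.  A quick check that iterating the map stays bounded:

```python
eps = 0.001   # gate failure probability
theta = 0.0   # faulty fraction on the input cables at level 0
for level in range(100):
    # sampler divides (2*theta + 2*eps) by 4; faulty Maj circuits add 2*eps
    theta = (theta + eps) / 2 + 2 * eps
print(theta)  # converges to the fixed point 5 * eps = 0.005
```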

The final thing to note is that the fraction of faulty wires on the final output cable is the same as the fraction of faulty wires on the two input cables.  Thus, the fraction of faulty wires does not increase with the depth of the circuit! This is a remarkable result, particularly since it dates back to 1956!  I’d argue that the Von Neumann fault model continues to have the same sway over CS as the Von Neumann architecture.  Our fault models still generally assume that a linear number of processing elements are made faulty, and we seek to tolerate these faults while minimizing resource costs with respect to computation in a world without faults. Unfortunately, a resource blowup of a logarithmic factor seems to be completely intolerable to practitioners – especially if in the common case, there are no faults at all!  This has been a barrier to use of these types of results.
How can we go beyond this model in order to get more practical robustness?  Should we use properties of modern computational networks, i.e. their ability to be reconfigured and to allow for feedback? Should we use a more modern type of competitive analysis: i.e. assume an adversary with a fixed but unknown budget b of processing units that it can corrupt, and then demand that our resource blowup be competitive with b instead of n?  Should we try to get improved results for specialized classes of functions, rather than try to attack the general problem for all types of functions?

June 22, 2010, 6:19 pm
Filed under: Uncategorized | Tags: , , , ,

This is the final post of a two part series.  Guest post by Navin Rustagi.

In the paper we submitted to SSS, we show that for worms designed to last at most O(log n) time steps, there are conditions on the parameters which ensure that, with the alert strategy adapted slightly from our OPODIS paper, all but o(n) nodes in the network will be alerted w.h.p.

We denote by p the lower bound on the expected rate of growth of alerts, and by q the upper bound on the expected rate of growth of worms. We choose the values of α, β and γ such that p ≥ 2q². For these values of the parameters, we show that w.h.p. all but o(n) nodes will be alerted if the worms are constrained to spread in at most O(log n) time steps. In this paper, we assume that the overlay network has degree O(log n) (as opposed to constant degree in the previous paper), so that we can make use of the resulting high vertex and edge expansion in our analysis. The rate of growth of alerts characterized by p only provably holds until O(n/log n) nodes are alerted. After that, the lower bound on the expected rate of growth of alerts, until the number of alerted nodes exceeds n/3, is p_1 = 1 + ε/4.

A round which is not a small step round is called a large step round.

We observe that for a worm to take over the network in O(log n) rounds, there has to be a large step round after n/(log_{p_1}^{2r+1} n) nodes have been infected and before κ_0 · n/(log_{p_1}^{2r+1} n) nodes have been infected, where κ_0 is a fixed constant. We make use of the high edge and vertex expansion of the underlying network, coupled with the fact that p ≥ 2q², to show that the detector nodes alerted at the end of this large step round are enough to alert all but o(n) nodes in τ more rounds.

A simulation implementing our algorithm on a network of 500 nodes with degree 10 and  α=4,  γ=0.15,  β=1  and  τ=3 is given here.

June 17, 2010, 7:10 pm
Filed under: Uncategorized | Tags: , , , ,

This post is the first part of a two part series.  This is by Jared’s student Navin.

Jared and I submitted a paper to SSS 2010. This blog entry will describe the results and some of the ideas used in that paper. The problem is as follows.

Consider the following game between a worm and an alert over a network of n nodes. Initially, no nodes are infected or alerted, and each node in the network is a special detector node independently with some small but constant probability γ. A detector node is dedicated to protecting itself against a worst-case worm attack and to generating a trace of such an attack, called a Self-Certifying Alert (SCA). Any node which has an SCA can run that trace in a sandboxed environment to verify the vulnerability and take the necessary measures to protect itself.

The game starts with a single node becoming infected. In every round thereafter, every infected node sends out β worms to other nodes in the population for some constant β; in addition, every alerted node sends out α alerts for some constant α. Nodes in the network change state according to the following three rules: 1) If a worm is received by a node that is not a detector and is not alerted, that node becomes infected; 2) If a worm is received by a node that is a detector, that node becomes alerted; 3) If an alert is received by a node that is not infected, that node becomes alerted.

We allow an infected node to send worm messages to any other nodes in the network but, in contrast, only allow alerts to be sent over a special precomputed overlay network with degree d. This network is a random d-regular graph. We assume that the infected nodes collaborate with each other, and know everything except which nodes are detectors and the alerted nodes’ random coin flips.
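Here is a toy round-by-round simulation of the game under several simplifying assumptions of mine: the overlay is approximated by giving each node d random neighbors rather than a true random d-regular graph, the worm uses the naive strategy of targeting uniformly random nodes rather than a worst-case strategy, and within a round worms are delivered before alerts.  Parameter values are illustrative.

```python
import random

random.seed(3)

N, D = 500, 10                     # network size and overlay degree
ALPHA, BETA, GAMMA = 4, 1, 0.15    # illustrative parameter values
CLEAN, INFECTED, ALERTED = 0, 1, 2

# Approximate overlay: each node gets D random neighbors
# (the paper uses a true random d-regular graph).
neighbors = [random.sample([v for v in range(N) if v != u], D)
             for u in range(N)]
detector = [random.random() < GAMMA for _ in range(N)]
state = [CLEAN] * N
state[0] = INFECTED                # the game starts with one infection

for _ in range(30):
    worms, alerts = [], []
    for u in range(N):
        if state[u] == INFECTED:   # worms may target any node
            worms += [random.randrange(N) for _ in range(BETA)]
        elif state[u] == ALERTED:  # alerts travel only on the overlay
            alerts += random.sample(neighbors[u], ALPHA)
    for v in worms:                # rules 1 and 2
        if state[v] == CLEAN:
            state[v] = ALERTED if detector[v] else INFECTED
    for v in alerts:               # rule 3
        if state[v] != INFECTED:
            state[v] = ALERTED

print(state.count(INFECTED), state.count(ALERTED))
```

Even with this naive worm, the race between the two growth rates is visible: once a few detectors fire, the alert wave (α sends per node) overtakes the worm wave (β sends per node).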

So, is there a strategy the alerted nodes can follow to ensure that all but o(n) nodes in the network will be alerted? Are there properties of the underlying overlay network which are necessary for any successful strategy by the alerted nodes to work?

Consider the following strategy for propagating alerts. Each alerted node chooses α neighbors uniformly at random to send alerts to in the next time step.  In our paper titled “Worm Versus Alert: Who Wins in a Battle for Control of a Large-Scale Network?”, which appeared in OPODIS 2007, we showed that this strategy for the alerted nodes ensures that only a vanishingly small fraction of the nodes become infected w.h.p., no matter what strategy is used by the infected nodes. In particular, we prove that if d ≥ α and α / (β(1 − γ)) > 2d/c, where c is the expansion constant, then w.h.p. all but o(n) nodes are alerted. We also prove in this paper that there is a worm strategy to take over the network if the underlying network does not have good expansion.

There is a run of this game for 600 nodes given in the figure.  Time moves from left to right, top to bottom in the figure and the red nodes are infected, green nodes are alerted and blue nodes are neither.

So far, this post has described the problem statement and previous contributions. I will describe the most recent contributions in the next blog post.

Navin Rustagi

Calling Bluffs and Naming Names: Myths about the Internet Topology

Tanya Berger-Wolf sent me a brave little paper recently: Mathematics and the Internet: A Source of Enormous Confusion and Great Potential by Willinger, Alderson and Doyle.  This paper does a post-mortem on the popular belief that empirical studies show that the Internet is scale-free.  Spoiler: they don’t.

Willinger et al. trace this erroneous belief starting with a paper by Pansiot and Grad (who are by no means responsible), which collected data on node degrees “to get some experimental data on the shape of multicast trees one can actually obtain in [the real] Internet”.  This data was collected using traceroutes, which was reasonable given that its purpose was to determine the shape of multicast trees.  In 1999, Faloutsos, Faloutsos and Faloutsos used the data from Pansiot and Grad for a purpose for which it was not intended: they used it to infer information about the degree distribution of the Internet’s router-level topology.  Next, Barabasi et al., who were studying the WWW graph and observing scale-free distributions there, picked up the reports of scale-free distributions for the Internet and added it to their list of networks with such properties.

A paper soon followed by Barabasi and Albert that described the preferential attachment model of network growth, wherein nodes are added one at a time to a network; each node has a constant number of out-links; and the destinations of these out-links are selected from the current nodes with probability proportional to each current node’s number of incoming links. Barabasi and Albert showed that this model generates a network with a scale-free degree distribution, and posited preferential attachment as a model of the growth of the Internet and the WWW network.  The preferential attachment model had actually been studied over about 75 years by Yule, Luria and Delbruck, and Simon, but the rediscovery by Albert and Barabasi, and the possible connection to modern networks, generated a huge amount of excitement and a large number of follow-up papers.  In particular, a slew of follow-up papers specifically studied properties of preferential attachment (PA) networks, i.e. networks generated via the preferential attachment model of growth. It was shown that PA networks are very robust to random deletions, but very fragile to adversarial deletions.  One commonly accepted conclusion was that the Internet, like PA networks, had high-degree nodes that were very centrally located and whose removal would easily disconnect the network.  This idea was actually celebrated as a nice implication from theoretical network models to the real world.
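For readers who haven't seen it, the preferential attachment process is only a few lines of code.  Below is a sketch of the Barabasi–Albert growth rule; the repeated-endpoints list is a standard trick for degree-proportional sampling, and the seed graph and parameters are my own arbitrary choices.

```python
import random

random.seed(4)

def preferential_attachment_degrees(n, m=2):
    """Degrees of an n-node graph grown by preferential attachment:
    each new node links to m old nodes chosen proportionally to degree."""
    targets = [0, 1, 1, 0]   # edge-endpoint pool: a uniform choice from
    degree = [2, 2]          # it is a choice proportional to degree
    for new in range(2, n):
        chosen = set()
        while len(chosen) < m:
            chosen.add(random.choice(targets))
        for t in chosen:
            targets += [new, t]
            degree[t] += 1
        degree.append(m)
    return degree

deg = preferential_attachment_degrees(5000)
# A few hub nodes end up with degree far above the average of about 2*m,
# the heavy tail that made these networks fragile to adversarial deletions.
print(max(deg), sum(deg) / len(deg))
```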

So what’s the problem?  Basically the research community made two huge mistakes; mistakes that some people in the community still have not recovered from (in the sense that “recover from” means  “become aware of”)!

Mistake 1: Empirical studies do not support the claim that the Internet is scale-free

Willinger et al. go through several reasons why traceroutes do not create accurate maps of the Internet.  There are many technical details here that are hard for a theoretician like me to understand.  However, one detail that caught my eye was the fact that when a traceroute encounters an “opaque layer-2” cloud, “it falsely ‘discovers’ a high-degree node that is really a logical entity  … rather than a physical node”.  This causes traceroutes to report high-degree nodes in the core of the router-level Internet even when such nodes don’t exist.  This type of bias is claimed to be even worse than the well-known “over-sampling of high-degree nodes” bias that traceroutes also have.

Mistake 2: Scale-free networks are not PA networks

PA networks are scale-free, but the converse is not true.  In particular, there are scale-free networks that are robust to adversarial deletions.  Intuitively, it’s possible to have a scale-free network where the high-degree nodes are not at the “core” of the network.  More generally, it’s possible to have a scale-free network with good expansion properties.  Such networks are robust to adversarial deletions.  In fact, Willinger et al. suggest that the Internet is more likely to have the high-degree nodes at the “periphery” of the network, since each router can only handle a certain amount of traffic, and more edges are only possible if each edge is handling less traffic.

What Next?

Here the Willinger paper loses steam.  They suggest a first-principles approach to Internet modeling, where the network formed is one that tries to optimize a given algorithmic problem over the network.  This is great.  What is not so great is the model they propose, which assumes that the network creation problem is solved in a completely centralized manner.  Much better would be if they had used game theory as a starting point.  After all, the Internet is inherently a massive, distributed enterprise. There is actually some really cool work in algorithmic game theory on network creation games.  For example, see Chapter 19 of The Book, or this paper by Fabrikant et al. (shout out to Elitza Maneva in the et al. for this paper, who was visiting UPC while I was there on sabbatical).  Yes, it’s true that most of these game theory results have shown that network creation games have a small price of anarchy.  However, that does not imply that the network topology you get in a Nash equilibrium will be like the topology you get in the socially optimal solution.  I’d like to see some results on the types of topologies one might expect in a Nash equilibrium for network creation games.

Day 3 (the final) from PODC
Last installment from Amitabh. – Ed: I should sell this pic to the Canadian government, or to the Calgarians!

A riveting talk was the keynote given by Bruce Hendrickson: “Emerging Challenges and Opportunities in Parallel Computing: The Cretaceous Redux?”.  He discussed the state of parallel computing and the wide gap between theory and practice in the area.  With the advancement of Moore’s law and new technologies like multi-core emerging, a revolutionary change of environment is upon us, and many present technologies in practice, like MPI, may face extinction. He compared it to the Cretaceous era, with the well-adapted and highly specialized dinosaurs about to feel the meteors come plummeting through the atmosphere. Upon some rocks there scurry about scruffy rodents, the people in theory, oblivious to reality and immune in some ways to the incoming meteors. This change would thrust upon the rodents the task of rebuilding the world, moving the dead bodies of the dinosaurs out of the way. All in all, great opportunities await in this scenario for both the theoretician and the practitioner.

George Giakkoupis presented The Effect of Power-Law Degrees on the Navigability of Small Worlds, his work with Pierre Fraigniaud. This work is a somewhat surprising extension of Kleinberg’s famous results on small-world networks. The Kleinberg model has an n-node d-dimensional lattice where each node u has directed edges to its closest neighbors, plus a long-range link whose target is chosen among all nodes v with probability proportional to 1/d(u,v)^r, where d(u,v) is the distance between node u and node v.  Kleinberg showed that when r = d (the dimension of the lattice), greedy search needs O(log^2 n) expected steps to locate a target from a source.  In contrast, for r not equal to d, any decentralized algorithm needs poly(n) expected steps. This showed that in a small-world network, not only is the diameter small, but there is a way to find short paths in a decentralized way, provided each agent has a sense of direction. George and Pierre modified the Kleinberg model so that the number of long-range links follows a power-law distribution. They discovered that power laws help navigation iff 2 < a < 3, where a is the exponent of the power law.  In particular, they showed that greedy search requires O(log^(a-1) n) steps for a in this range.  The effect of the power-law distribution is significant: as a approaches 2 from above, the expected number of steps of greedy routing in the augmented lattice with power-law degrees approaches the square root of the expected number of steps of greedy routing in the augmented lattice with fixed degrees, even when both networks have the same average degree. The other surprising thing is that their results don’t seem to depend on the dimensionality of the lattice.
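Kleinberg's r = d case is easy to simulate.  The sketch below, my own illustration rather than anything from the Giakkoupis–Fraigniaud paper, uses a one-dimensional ring (so d = 1, r = 1) with one long-range contact per node chosen with probability proportional to 1/dist, then runs greedy routing; the theory predicts O(log^2 n) expected steps, far below n.

```python
import itertools
import random

random.seed(5)

N = 1 << 12   # ring of N nodes: a 1-D lattice, so r = d = 1

def dist(u, v):
    d = abs(u - v)
    return min(d, N - d)   # distance on the ring

# Long-range contact: offset k chosen with Pr[k] proportional to 1/k.
cum = list(itertools.accumulate(1.0 / k for k in range(1, N // 2 + 1)))

def long_range(u):
    k = random.choices(range(1, N // 2 + 1), cum_weights=cum)[0]
    return (u + random.choice((-1, 1)) * k) % N

contacts = [long_range(u) for u in range(N)]

def greedy_route(s, t):
    """Step to whichever neighbor is closest to the target."""
    steps, u = 0, s
    while u != t:
        u = min(((u - 1) % N, (u + 1) % N, contacts[u]),
                key=lambda v: dist(v, t))
        steps += 1
    return steps

trials = [greedy_route(random.randrange(N), random.randrange(N))
          for _ in range(50)]
print(sum(trials) / len(trials))   # on the order of log^2 N, far below N
```

Swapping the single contact for a power-law number of contacts per node is the modification George and Pierre analyze.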

A short note on two other papers. Chen Avin et al.’s paper SINR Diagrams: Towards Algorithmically Usable SINR Models of Wireless Networks (presented by Yuval Emek) discusses properties of the Signal-to-Interference-and-Noise Ratio (SINR) model, which seems to be a more realistic alternative to simpler models such as the Unit Disk Graph (UDG).  Their model allows for the possible convexity of reception zones, which seems to be observed in practice.   Prosenjit Bose, Paz Carmi and Stephane Durocher’s Bounding the Locality of Distributed Routing Algorithms (presented by Durocher) discusses various parameters necessary and/or sufficient to permit local routing on a network modelled by a connected undirected graph, and establishes bounds on the locality of routing for these parameters, as well as bounds on dilation (the worst-case ratio of actual route lengths to shortest path lengths).

… and that’s all from PODC.