Calling Bluffs and Naming Names: Myths about the Internet Topolgy

Tanya Berger-Wolf sent me a brave little paper recently: Mathematics and the Internet: A Source of Enormous Confusion and Great Potential by Willinger, Alderson and Doyle.  This paper does a post mortem on the popular belief that empirical studies show that the Internet is scale-free.  Spoiler: They don’t.

Willinger et al. trace this erroneous belief starting with a paper by Pansiot and Grad (who are by no means responsible), that collected data on node degrees “to get some experimental data on the shape of multicast trees one can actually obtain in [the real] Internet”.  This data was collected using trace routes, which was reasonable given that its purpose was to determine the shape of multicast trees.  In 1999, Faloutsous, Faloutsous and Faloutsous used the data from Pansiot and Grad for a purpose for which it was not intended: they used it to infer information about the degree distribution for the Internet’s router-level topology.  Next, Barabasi et al., who were studying the WWW graph, and observing scale-free distributions there, picked up the reports of scale-free distributions for the Internet and added this to their list of networks with such properties.

A paper soon followed by Barabasi and Albert that described the preferential attachment model of network growth, wherein nodes are added one at a time to a network; each node has a constant number of out links; and the destinations of these out links are selected from the current nodes with probability proportional to each current node’s number of incoming links. Barabasi and Albert showed that this model generates a network with a scale-free distribution, and posited the preferential attachment as a model of the growth of the Internet and the WWW network.  The preferential attachment model had actually been studied over about 75 years by Yule, Luria and Delbruck, and Simon, but the rediscovery by Albert and Barabasi, and the possible connection to modern networks generated a huge amount of excitement and a large number of follow-up papers.  In particular, a slew of follow-up papers specifically studied properties of preferential attachment (PA) networks, i.e. networks generated via the preferential attachment model of growth. It was shown that PA networks are very robust to random deletions, but very fragile to adversarial deletions.  One commonly accepted conclusion was that the Internet, like PA networks, had high degree nodes, that were very centrally located, and whose removal would easily disconnect the network.  This idea was actually celebrated as a nice implication from theoretical network models to the real world.

So what’s the problem?  Basically the research community made two huge mistakes; mistakes that some people in the community still have not recovered from (in the sense that “recover from” means  “become aware of”)!

Mistake 1: Empirical studies do not support the claim that the Internet is scale-free

Willinger et al. go through several problems about why traceroutes do not create accurate maps of the Internet.  There are many technical details here that are hard for a theoretician like me to understand.  However, one detail that caught my eye was the fact that when a traceroute encounters an “opaque layer-2” cloud, “it falsely ‘discovers’ a high-degree node that is really a logical entity  … rather than a physical node”.  This causes traceroutes to report high-degree nodes in the core of the router-level Internet, even when such nodes don’t exist.  This type of bias is claimed to be even worse than the well-known “over-sampling of high-degree nodes” bias that traceroutes also have.

Mistake 2: Scale-free networks are not PA networks

PA Networks are scale-free, but the converse is not true.  In particular, there are scale-free networks that are robust to adversarial deletions.  Intuitively, it’s possible to have a scale-free network where the high degree nodes are not at the “core” of the network.  More generally, it’s possible to have a scale free network with good expansion properties.  Such networks are robust to adversarial deletions.  In fact, Wilinger et al., suggest that the Internet is more likely to have the high degree nodes at the “periphery” of the network since each router can only handle a certain amount of traffic, and more edges are only possible if each edge is handling less traffic.

What Next?

Here the Willinger paper looses steam.  They suggest a first principles approach to Internet modeling, where the network formed is one that tries to optimize a given algorithmic problem over the network.  This is great.  What is not so great is the model they propose, which assumes that the network creation problem is solved in a completely centralized manner.  Much better would be if they had used game theory as a starting point.  After all, the Internet is inherently a massive, distributed enterprise. There is actually some really cool work done in algorithmic game theory on network creation games.  For example, see Chapter 19 of The Book, or this paper by Fabrikant et al. (shout out to Elitza Maneva in the et al. for this paper, who was visiting UPC while I was there on sabbatical).  Yes, it’s true that most of these game theory result have shown that network creation games have small price of anarchy.  However, that does not imply that the network topology you get in a Nash equilibria will be like the topology you get in the socially optimum solution.  I’d like to see some results on the types of topologies one might expect in an Nash equilibria for network creation games.