December 3, 2009, 10:30 pm
Filed under: Uncategorized

There’s an interesting paper here on how BitTorrent uses the tit-for-tat algorithm for file sharing. Outside of the huge body of work on auctions, this is the only real-world application of game theory in computer science that I am aware of. Are there others?
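To make the tit-for-tat idea concrete, here is a minimal sketch of reciprocation-based peer selection in the spirit of BitTorrent’s choking mechanism. Each round, a peer uploads to the few peers that have recently uploaded the most to it, plus one randomly chosen “optimistic unchoke” so that newcomers get a chance to reciprocate. The function name `choose_unchoked`, the number of regular slots, and the way upload rates are measured are all assumptions for illustration. Real clients differ in these details.

```python
import random


def choose_unchoked(upload_rates, regular_slots, rng):
    """Tit-for-tat style peer selection (hypothetical helper, not a real client's API).

    upload_rates: dict mapping peer -> rate at which that peer recently uploaded to us.
    regular_slots: how many top reciprocating peers to reward (an assumed parameter).
    Returns the set of peers we will upload to this round.
    """
    # Reciprocate: reward the peers that have given us the most data recently.
    ranked = sorted(upload_rates, key=upload_rates.get, reverse=True)
    unchoked = set(ranked[:regular_slots])
    # Optimistic unchoke: give one other peer a chance to prove itself.
    others = [peer for peer in upload_rates if peer not in unchoked]
    if others:
        unchoked.add(rng.choice(others))
    return unchoked
```

The optimistic slot is what keeps pure tit-for-tat from deadlocking: a brand-new peer has uploaded nothing yet, so without it the peer would never be chosen.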

The Surprising Persistence of Peer-to-Peer

Maxwell Young and I were talking the other day about the ups and downs of research in peer-to-peer systems. In 2002, when I got my PhD, everyone I knew at UW, myself included, felt that the popularity of p2p research had reached a crescendo and would taper off within a year or two. Yet this morning, seven years later, I’m reviewing IPDPS submissions, and about 20% of the papers are on p2p or its close relative, overlay networks. What’s going on?

I think the interest in academic circles comes partly from the fact that p2p research lets us study what happens when there is no leader. There are many challenging and fun problems in designing distributed systems that work when all components of the system are equal: equal in the resources available to them, and also equal in the sense that every member of the system both uses it and contributes to it. Maybe p2p thinking suits the egalitarian bent of academics? Maybe it comes from a desire to imitate natural systems like ants and bees?

However, a perennial question is: are there legitimate uses of p2p systems? Isn’t the current trend in the opposite direction, with cloud computing promising that networks will someday consist of mindless clients on one side and computationally powerful servers on the other? In such a world, will there be much need for direct communication between clients? It’s hard for me, at least, to predict how these trends will play out. However, I would not be surprised if both the p2p extreme and the weak-client, powerful-server extreme continue to exist side by side for a long time to come.

Below are some potential legitimate uses of p2p that I have heard about recently: the big systems (or ideas for systems) that I know about now. I’d love to hear about others, or arguments for or against the continued existence of p2p.

  • Vanish: A system that aims to prevent archiving of sensitive data. Vanish attempts to ensure that data such as private email exchanges, photos, or messages can be given a deadline after which they can no longer be reconstructed. To do this, Vanish encrypts the content with a random key, breaks the key into pieces using Shamir secret sharing, distributes these pieces across a p2p network, and depends on sufficiently active churn in that network to ensure that eventually enough of the pieces leave the network that the key, and with it the original message, is lost forever. Vanish got a nice writeup in the New York Times in July, but the original system has been shown to be vulnerable to a certain type of Sybil attack in this paper.
  • Akamai is a company with a billion-dollar market cap that enables Internet content and application delivery. As I understand it, the “peers” in the Akamai system are actually companies, and the Akamai network ensures robust and efficient delivery of content from these “peers” to end users. I hear that this paper is what enabled Akamai to get its first round of VC funding, but I’m not sure whether the algorithms used now still have any connection to it.
  • Skype is a peer-to-peer system for voice calls over the Internet that I used a lot when I was on sabbatical in Europe. In my experience, the voice quality and reliability of Google Chat were much better than Skype’s, but somehow it was much easier to get friends and family to use Skype than Google Chat. I still use Skype for research conference calls.
  • BitTorrent is a p2p system for fast, collaborative downloading of large data items. Estimates are that it accounts for about one quarter to one half of all traffic on the Internet. I don’t know how much of this traffic is “legitimate”, but at least some BitTorrent bandwidth has been used by publishers to distribute free music, TV shows, and software. Vuze, a BitTorrent client with over 1 million users, is clearly a well-used network, and perhaps the largest overlay network on the Internet.
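The secret-sharing step in the Vanish description can be illustrated with a minimal Shamir sketch over a prime field. The secret is the constant term of a random polynomial of degree k - 1, each share is one point on that polynomial, and any k shares recover the secret by Lagrange interpolation, while fewer than k reveal nothing about it. The field size, function names, and parameters below are assumptions for illustration, not Vanish’s actual code, and neither the DHT storage nor churn is modeled.

```python
import random

# 2**127 - 1 is a Mersenne prime; using it as the field is an assumption of this sketch.
PRIME = 2**127 - 1


def split_secret(secret, threshold, num_shares, rng):
    """Split `secret` into points on a random polynomial of degree threshold - 1.

    For real use, rng should be cryptographically secure, e.g. secrets.SystemRandom().
    """
    if not 0 <= secret < PRIME:
        raise ValueError("secret must lie in the field")
    if not 1 <= threshold <= num_shares:
        raise ValueError("need 1 <= threshold <= num_shares")
    # f(0) = secret; the other coefficients are uniformly random field elements.
    coeffs = [secret] + [rng.randrange(PRIME) for _ in range(threshold - 1)]
    shares = []
    for x in range(1, num_shares + 1):
        y = 0
        for c in reversed(coeffs):  # Horner's rule: evaluate f(x) mod PRIME
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares


def recover_secret(shares):
    """Lagrange-interpolate f(0) from a list of distinct (x, y) shares."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % PRIME
                den = den * (xi - xj) % PRIME
        # pow(den, -1, PRIME) computes a modular inverse (Python 3.8+).
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret
```

With a threshold of 3 out of 5 shares, any three shares rebuild the secret, but once churn has removed three of them, the two that remain are not enough; that is the property an expiring-data design relies on.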

Random Sampling
July 17, 2009, 5:10 pm

A fundamental statistical operation over a network is sampling a node uniformly at random. For large networks like the Internet or the Facebook social network, this is the only principled way to learn about properties of the network such as the node degree distribution, the fraction of nodes with a given property, or the clustering coefficient.

For the most part, current techniques for sampling a node uniformly at random are just heuristics. Maybe you take a random walk around the graph for a while and then throw in some tricks to try to correct for bias in the stationary distribution. However, I think there is a great and important theory problem here. Increasingly, many interesting networks are huge and not completely known. Randomly sampling nodes is a great way to measure properties of such networks, but bias in the sampling can lead to all kinds of problems [1]. Here’s a stab at a problem: given an unknown graph and an arbitrary node (or two or more) in that graph, devise an algorithm to sample (close to) uniformly from the set of all nodes, when the only operation available is moving along edges of the graph. Of course, the graph would need “sufficiently nice” properties for this to be possible, e.g., an ergodic random walk, good expansion, a good degree distribution, etc. Coupling theory would be a good candidate tool for this problem.
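One well-known trick of the kind mentioned above is the Metropolis-Hastings random walk. A plain random walk ends up at nodes in proportion to their degree; rejecting some moves toward higher-degree neighbors cancels that bias, so the stationary distribution becomes uniform. The sketch below assumes an undirected, connected, non-bipartite graph given as an adjacency dict, and the function names are hypothetical. Note that it says nothing about how many steps are enough, which is exactly the part that remains heuristic.

```python
import random
from collections import Counter


def simple_random_walk(adj, start, steps, rng):
    """Plain random walk; its stationary distribution is proportional to node degree."""
    u = start
    for _ in range(steps):
        u = rng.choice(adj[u])
    return u


def mh_random_walk(adj, start, steps, rng):
    """Metropolis-Hastings random walk whose stationary distribution is uniform.

    adj maps each node to a list of its neighbors in an undirected, connected graph.
    """
    u = start
    for _ in range(steps):
        v = rng.choice(adj[u])  # propose a uniformly random neighbor
        # Accept with probability min(1, deg(u) / deg(v)); otherwise stay put.
        if rng.random() < len(adj[u]) / len(adj[v]):
            u = v
    return u


def empirical_distribution(walk, adj, start, steps, samples, rng):
    """Run `samples` independent walks and return the fraction ending at each node."""
    counts = Counter(walk(adj, start, steps, rng) for _ in range(samples))
    return {node: counts[node] / samples for node in adj}
```

On a star with center 0 and leaves 1 through 4, plus an extra edge between 1 and 2, a long plain walk should end at the center about 40% of the time (4 of the 10 edge endpoints), while the Metropolis-Hastings walk should end at each of the five nodes about 20% of the time.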

[1] See, for example, this paper, which shows how bias due to traceroute sampling can lead to the erroneous prediction of a power-law degree distribution where none actually exists.

P.S. A few years ago, Valerie King and I wrote a paper on choosing a random node in a peer-to-peer network. Our approach only worked for highly structured networks, so it did not solve the problem described above. However, we came up with the following algorithmic tool that might be useful for the harder problem. Imagine we can broadcast a message from some node to all other nodes in the network once, for free, and we want to minimize the number of nodes that need to reply to this broadcast. What is the minimum number of responses needed to select a node uniformly at random?

A naive approach is as follows: every node chooses a number uniformly at random between 0 and 1; the broadcaster sends out its own random number between 0 and 1; the nodes whose numbers are closest (mod 1) to the broadcaster’s respond; and the single closest responder is chosen. Surprisingly, this scheme has significant bias. A node is chosen exactly when the broadcaster’s number lands in the arc of the circle that is closer to that node’s number than to any other, so each node’s chance of being chosen equals the length of its arc, and these arcs are very uneven. Think about it this way. If you distribute n points uniformly at random on a circle with circumference 1, what would you expect the minimum distance between any pair of points to be? Think it would be about 1/n? Then you’d be wrong! It’s actually much less: only 1/n^2. So how can you remove this bias? Check out the paper for the answer!
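A small simulation makes the bias of the naive scheme concrete. Assuming n ≥ 2 nodes with independent uniform numbers on a circle of circumference 1, a node’s chance of being selected is half the gap to its predecessor plus half the gap to its successor. The sketch computes those chances exactly for one random draw and also estimates E[min gap] · n^2 by Monte Carlo, which should come out near 1. The function names are hypothetical, and this only illustrates the problem; it is not the fix described in the paper.

```python
import random


def circular_gaps(points):
    """Gaps between consecutive points (sorted) on a circle of circumference 1.

    Assumes at least two points.
    """
    pts = sorted(points)
    n = len(pts)
    # The last gap wraps around from the largest point back to the smallest.
    return [(pts[(i + 1) % n] - pts[i]) % 1.0 for i in range(n)]


def selection_probabilities(points):
    """Chance that each point (in sorted order) is closest, mod 1, to a uniform target."""
    gaps = circular_gaps(points)
    # Point i owns half of the gap before it and half of the gap after it;
    # for i = 0, gaps[-1] is the wrap-around gap, as intended.
    return [(gaps[i - 1] + gaps[i]) / 2 for i in range(len(gaps))]


def mean_scaled_min_gap(n, trials, rng):
    """Monte Carlo estimate of E[min gap] * n^2; near 1 if E[min gap] is about 1/n^2."""
    total = 0.0
    for _ in range(trials):
        total += min(circular_gaps([rng.random() for _ in range(n)]))
    return total / trials * n * n


if __name__ == "__main__":
    rng = random.Random(0)
    n = 1000
    probs = selection_probabilities([rng.random() for _ in range(n)])
    # A fair scheme would give every node exactly 1/n, i.e. 1.0 after scaling by n.
    print("largest chance  x n:", max(probs) * n)
    print("smallest chance x n:", min(probs) * n)
    print("E[min gap] x n^2 ~", mean_scaled_min_gap(100, 2000, rng))
```

Running it shows the largest and smallest selection chances differing by orders of magnitude, which is exactly the bias the naive scheme suffers from.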