We’re hiring for one tenure track Assistant Professor position at the University of New Mexico. From the ad: “We seek applicants across computer science, including but not limited to large scale data systems, networks, software design and engineering, graphics and visualization, and AI and adaptive systems.” More details are available here.
We just recently finished talking about greedy algorithms in my graduate algorithms class. In the past few months, I’ve developed a new appreciation for greed. Part of this is due to a discussion with Allan Borodin at a recent workshop, who made a good case for greediness as a stand-in for simplicity. Anyone working in algorithms will agree that simple algorithms are better; even more so in distributed computing, where even simple algorithms can be notoriously difficult to implement. Unfortunately, it’s hard to improve what you can’t measure, so there is a real need for good measures of algorithmic simplicity. This makes me think of several important techniques in algorithm design and how they might intersect with the notion of simplicity.
- Local algorithms: In distributed computing, local algorithms are those that run in time independent of the network size – i.e., each node in the network only receives information from its local neighborhood. Clearly, these types of algorithms are simpler than algorithms that require information to be exchanged over long distances in the network. Approximate edge coloring of graphs and some resource allocation problems are amenable to this approach. See the two great papers What can be computed locally? by Naor and Stockmeyer and What cannot be computed locally by Kuhn, Moscibroda and Wattenhofer for more on this area.
- Data streams: Algorithms that use little space are likely to be simple. Data stream algorithms use very little space – more specifically, the amount of space used is very small compared to the input size. While the analysis of many of these algorithms can be quite sophisticated, the algorithms themselves can usually be described in less than a quarter of a page of pseudocode. Muthukrishnan’s book is a great resource in this area (plus it is freely available here). Recently, there has been a lot of interest in applying the data stream model in a distributed setting. See for example the Massive, Unordered Data (MUD) model, which tries to capture the approach of systems like MapReduce.
- Greedy algorithms: Many of us are familiar with several algorithms of this type. Perhaps Kruskal’s algorithm is the most frequently used greedy algorithm that is always correct. Of course, there are many, many greedy heuristics that are not guaranteed to be correct. I wonder what the most frequently used greedy approximation algorithms are. Nothing comes to mind except for the greedy set cover algorithm and greedy approximation algorithms for maximizing submodular functions.
- “Selfish” algorithms: This is a phrase I just made up to describe distributed algorithms that require each node to behave selfishly. In particular, these are problems for which 1) the social welfare in any Nash equilibrium is close to the optimal social welfare, and 2) the players can quickly converge to a Nash equilibrium by following a locally optimal strategy. There is a lot of interest in these types of algorithms in the distributed computing community right now, partially, I think, because they are simple to describe and implement. Selfish algorithms are also simple in the sense that they may arise naturally when large groups of agents get together; in some sense, the algorithm engineer is (at least partially) removed from the picture.
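To make the data-stream point concrete, here is one classic example that fits comfortably in a quarter page: the Misra-Gries frequent-items algorithm (my choice of illustration; the function name is mine). In one pass over a stream of length n, using only k-1 counters, it finds a small candidate set containing every item that occurs more than n/k times:

```python
def misra_gries(stream, k):
    """Misra-Gries summary: one pass, k-1 counters. Every item occurring
    more than n/k times in the stream survives as a key in the result."""
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # No counter free: decrement everything, dropping zeros.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters
```

For example, with k = 2 (a single counter) the surviving key is the only possible majority element; a second pass over the stream would confirm or reject the candidate.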
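The greedy set cover algorithm mentioned above is short enough to sketch in full (a minimal Python illustration; the function name is mine). Repeatedly picking the set that covers the most uncovered elements gives the well-known ln(n)-approximation:

```python
def greedy_set_cover(universe, subsets):
    """Greedy set cover: repeatedly pick the subset covering the most
    still-uncovered elements, until everything is covered."""
    uncovered = set(universe)
    cover = []
    while uncovered:
        best = max(subsets, key=lambda s: len(uncovered & s))
        if not uncovered & best:
            raise ValueError("subsets do not cover the universe")
        cover.append(best)
        uncovered -= best
    return cover
```

The entire algorithm is the loop body: one greedy choice, repeated. The sophistication, as with many of the techniques in this list, lives in the analysis rather than the code.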
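As a toy illustration of players converging to a Nash equilibrium via locally optimal moves, here is best-response dynamics for a simple load-balancing game (the game and all names here are my own illustrative choices): each job sits on one of m identical machines and pays its machine's total load, and any job may move whenever a move strictly lowers its cost. Because this is a potential game, the dynamics always terminate at a pure Nash equilibrium:

```python
def best_response_dynamics(weights, m):
    """Load-balancing game: job i has weight weights[i], pays the total
    load of its machine. Repeat best responses until no job can improve;
    the result is a pure Nash equilibrium (this is a potential game, so
    the dynamics must terminate)."""
    assign = [0] * len(weights)      # start with every job on machine 0
    loads = [0] * m
    loads[0] = sum(weights)
    improved = True
    while improved:
        improved = False
        for job, w in enumerate(weights):
            cur = assign[job]
            # Cost of machine t for this job: loads[t] + w if it moves there,
            # loads[cur] if it stays put.
            best = min(range(m), key=lambda t: loads[t] + (0 if t == cur else w))
            if best != cur and loads[best] + w < loads[cur]:
                loads[cur] -= w
                loads[best] += w
                assign[job] = best
                improved = True
    return assign, loads
```

Each step is purely local and selfish, yet the system as a whole settles into a stable state; how good that state is compared to the social optimum is exactly the price-of-anarchy question.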
Are there other ways of formulating problems that ensure “simple” algorithms? Perhaps you, dear reader, can help me flesh out this list.
When I first started as a professor, I felt that what students needed to do well in a class, above all else, was precise, error-free lectures and lots of practice doing problems. Based on these ideas, I did most of my lectures from slides that I had very carefully prepared beforehand, and gave the class lots and lots of problems to solve. This actually worked pretty well for a while – I got great feedback from students, was nominated for some teaching awards, and enjoyed the experience.
However, eventually I realized that my lectures were not as exciting as they could be. I read some articles on the Tomorrow’s Professor listserv that bemoaned the use of pre-prepared slides, claiming that extensive use of slides makes lectures completely predictable and unresponsive. This semester, as an experiment, I stopped using slides, or even lecture notes of any kind. I memorize enough of the lecture to be sure I use terminology that is reasonably close to the textbook and to the notes I’ve posted from previous years of the class.
How have things changed? I find teaching a more exciting and sometimes scary activity. Also, it’s easier for me to respond to feedback from the class as a whole about e.g. whether it’s worthwhile to go over an example of a particular concept. When a student asks an interesting question, I definitely spend more time than I used to on exploring the answer in depth. For me, at least, the lectures are a lot more fun and challenging. From the feedback I’m getting from students so far, they also seem to be more engaged. Do I make more mistakes in lecture? Definitely. However, I feel that I’m slowly getting better at checking things on the fly and am making fewer mistakes now than when I started this.
How has this changed my teaching philosophy? I realize now that a big part of education is inspiration, motivation, and a kind of educational “entertainment”. I also realize that it’s just much more engrossing and memorable to watch someone walk carefully across a rope in the air, than to trudge across a line on the floor that was drawn before the performance even began!
Maxwell Young and I were talking the other day about the ups and downs of research in peer-to-peer systems. In 2002, when I got my PhD, I, and everyone I knew at UW, felt that the popularity of p2p research had reached a crescendo and would quickly taper off in a year or two. However, this morning, seven years later, I’m reviewing IPDPS submissions, and I see that about 20% of the papers are on p2p (or its close relative, overlay networks). What’s going on?
Partially, I think the interest in academic circles comes from the fact that p2p research allows us to study what happens when there is no leader. There are many challenging and fun problems in designing distributed systems that work when all components of the system are equal; that is, equal in terms of resources available, and also equal in the sense that all members of the system both use the system, and contribute to the system. Maybe p2p thinking suits the egalitarian bent of academics? Maybe it comes from a desire to imitate natural systems like ants and bees?
However, a perennial question is: are there legitimate uses of p2p systems? Isn’t the trend currently in the opposite direction, with cloud computing promising that someday networks will consist of mindless clients on one side and computationally powerful servers on the other? In such a situation, will there be much need for direct communication between the clients? It’s hard for me, at least, to predict how these trends will eventually play out. However, I would not be surprised if both the p2p extreme and the weak-client, powerful-server extreme continue to exist side by side for a long time to come.
I did want to try to list some potential legitimate uses of p2p that I have heard about recently. I’d love to hear about others, or arguments for or against the continued existence of p2p. Here are some of the big systems (or ideas for systems) that I know about now.
- Vanish: A system that prevents archiving of sensitive data. In other words, Vanish attempts to ensure that data like private email exchanges, photos, or messages can be given a deadline after which they simply can no longer be reconstructed. To do this, Vanish breaks up content into pieces using Shamir secret sharing, distributes these pieces across a p2p network, and depends on sufficient churn in the peer-to-peer network to ensure that eventually enough of these pieces will leave the network, so that the original message is lost forever. Vanish got a nice writeup in the New York Times in July, but the original system has been shown to be vulnerable to a certain type of Sybil attack in this paper.
- Akamai is a company with a billion dollar market cap that enables Internet content and application delivery. As I understand it, the “peers” in the Akamai system are actually companies; and the Akamai network ensures robust and efficient delivery of content from these “peers” to end users. This paper is what enabled Akamai to get its first round of VC funding, I hear. But, I’m not sure if the algorithms now used still have any connection to the paper.
- Skype is a peer-to-peer system for voice calls over the Internet that I used a lot when I was on sabbatical in Europe. In my experience, the voice quality and reliability of Google chat was much better than Skype’s, but somehow it was much easier for us to get friends and family to use Skype than Google chat. I still use Skype nowadays for research conference calls.
- BitTorrent is a p2p system allowing for quick collaborative downloading of large data items. Estimates are that it consumes about one quarter to one half of all traffic on the Internet. I don’t know how much of this traffic is “legitimate”, but at least some portion of BitTorrent bandwidth has been used by publishers for distribution of free music, TV shows, and software. Vuze is a BitTorrent client with over 1 million users: clearly a well-used network, and perhaps the largest overlay network on the Internet.
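The Shamir secret sharing that Vanish relies on (see the Vanish entry above) is itself pleasantly simple. Here is a minimal sketch of (k, n) sharing over a prime field; this is a generic textbook illustration, not Vanish’s actual code, and the names and the choice of prime are mine. The secret is the constant term of a random degree k-1 polynomial, each share is a point on that polynomial, and any k shares recover the secret by Lagrange interpolation at zero; if enough shares churn out of the network, the secret is gone:

```python
import random

PRIME = 2**61 - 1  # a Mersenne prime; all arithmetic is done mod PRIME

def split_secret(secret, n, k):
    """Shamir (k, n) sharing: return n shares; any k of them suffice to
    reconstruct, while fewer than k reveal nothing about the secret."""
    coeffs = [secret] + [random.randrange(PRIME) for _ in range(k - 1)]
    def poly(x):
        return sum(c * pow(x, i, PRIME) for i, c in enumerate(coeffs)) % PRIME
    return [(x, poly(x)) for x in range(1, n + 1)]

def recover_secret(shares):
    """Lagrange interpolation at x = 0 recovers the constant term."""
    total = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % PRIME
                den = den * (xi - xj) % PRIME
        # pow(den, PRIME - 2, PRIME) is the modular inverse (Fermat).
        total = (total + yi * num * pow(den, PRIME - 2, PRIME)) % PRIME
    return total
```

In the Vanish setting, the shares are what get sprinkled across the p2p network: reconstruction works as long as any k shares remain reachable, and fails permanently once churn has removed more than n - k of them.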