Machinations


Homework and the Internet
January 29, 2010, 8:56 pm
Filed under: Uncategorized | Tags: ,

This is what I have been reduced to:

“Let P (x), Q(x), R(x), S (x, y) be the predicates, “x is a true dungeon master”, “x has sex appeal”, “x is a wood-elf ”, “x is a friend to y”. Translate the following statements into predicate logic.

• A true dungeon master is a friend to all wood-elves
• Only true dungeon masters have sex appeal
• Bob is not a friend to some wood-elf

Prove from the statements that Bob does not have sex appeal.”

I’m not sure when it happened, but for at least several years it has been the case that solutions for the problems in any good textbook can be found somewhere on the Internet.  When I first started teaching, my homework sets were usually a combination of easier problems from the text book and harder problems that I made up.  These days, even for the easy problems, I can’t really assign anything from a text book.

Definitely the ubiquity of solutions to written hw problems has its downsides.  It has made my job a little bit harder, but this hasn’t been a big problem really.  The major downside is that there some very good problems that students are missing out on.  For example, I recently covered the proof that square root of 2 is irrational in my mathematical foundations of computer science class.  A great hw problem to ask students after they see this proof is to prove that square root of 3 is irrational.  However, I really don’t feel it is fair to assign this hw problem in a large class where certainly at least one student will look up the solution on the Internet.

I’m curious how other lecturers are dealing with this issue, and how students feel about it.  Should we be trying harder to hide the solutions to good hw problems?  Should we plant “fake” solutions on the web?  What happens when you google “dungeon master” and “sex appeal”? (P.S.  Hello to my CS261 students!)




Day 2 at SODA
January 19, 2010, 10:54 pm
Filed under: Uncategorized | Tags: , ,

Editor: Guest Post again by Amitabh Trehan

I am in the business meeting: surprisingly, it’s a lot of fun, maybe due to the free drinks and the humorous presentations by the contenders for SODA 2012,  freed, it seems, from the betas, thetas and hidden constants! Ok then, after multi-round voting by a show of hands and a play of non-musical chairs on a large scale (to help get the count correct), Kyoto has narrowly triumphed in it’s bid for 2012 over Barcelona: the other contenders (in order of votes received) were Prague, Rome and New York. There were some wild celebrations and tennis style fist-pumping!

There were many exciting talks today ranging from algorithmic game theory to distribued computing to graph algorithms and some indescribable topics like those at the business meeting!

Noam Nisan gave a very interesting talk on Google’s auction for tv ads.  I think I’ve been seeing the mention of this article on the blogosphere somewhere but avoided reading it (maybe I thought it was a google advert :) . First, it was  indeed a surprise that Google markets television ads, but as it turns out, they have good reason to do so. Google brings a new system of marketing advertisement time that is different from the traditional office room deals, and which brings nice algorithmic challenges too. For the adverstisor, Google brings a web-based interface, precise targeting of content, and excellent feedback. The auctions are run everyday (for the next days advertisements, I think). Informally, the auction has slots for sale, bidders bid with their value for the particular slots and total budget, and the auction outputs the set of slots won by each bidder, and the price for which that slot is given.   The problem model is fairly straightforward with some simple constraints. The algorithm tries to optimize revenue, efficiency, and fairness.

The  auction is based on simultaneous ascending auction, and basically is as follows:  each ad-slot has an associated price that keeps increasing throughout the auction. Prices start at low reserve prices, and rise whenever there is “over-demand” for an ad-slot — i.e. a slot that is currently held
by one bidder is desired by another. Such small price increases keep going on until there is no
“over-demand”, at which point the auction closes. The basic step in the auction is the calculation
of the “demand” of a bidder at current prices — i.e. which set of slots would this bidder desire to
acquire assuming that slots are priced as given. In its basic form, the calculation of this demand is
done by a greedy algorithm that chooses slots according to decreasing bid-to-price ratio. Under certain theoretical assumptions (“gross substitutes”) it is known that this procedure ends with a “Walrasian market equilibrium” and under even stricter assumptions these prices are incentive compatible — i.e. they give no bidder any strategic reason to under or over bid.  In reality, some slots may go unsold though, and they are put through the wringer again till google is sure nobody wants them. At that point google puts up public service ads on them such as those asking people to desist from taking drugs. As Noam said, it’s not the case that google doesn’t want you to take drugs, it’s just that they have empty slots :) ; So, a betting hunch I have from this talk is that if you see too many public service ads coming on the television, maybe the slots are going cheap and it’s time to buy them!!!

I would like to briefly mention a few more talks here: Monotonicity in Bargaining Networks by Yossi Azar et al. talked about bargaining networks.  In these networks, the nodes are players, and each edge has a certain amount of money on it that can be split among the endpoints of that edge only if the endpoints reach an agreement on how to split the dollar.  However, each node is allowed to split the money with at most one neighbor.  There are many real world situations where such networks exist e.g. among buyers and sellers where the edge weight can be considered as the difference in value of the object being sold, and in dating or marriage, where the edge weight would relate to the benefit of a relationship. A stable solution is related to the fractional min cover and thereby, maximum matching.  The authors show that for several solution concepts (i.e. stable outcomes) for this game are monotonic on bipartite graphs, where monotonic means that adding an edge between nodes x and y can only increase the profit of x and y on the bargaining network.Interesting questions remain: What are the  mechanisms of bargaining? Are there simple distributed algorithms for these solutions? lower bounds?

The evening talks on A model of computation for map reduce, and Deterministic algorithms for the Lovasz local lemma drew much interest from the audience. I also enjoyed the talks Highwway dimension, shortest paths and provable efficient algos, and max flows in planar graphs.



SODA at Austin has a good fizz.
January 18, 2010, 1:17 pm
Filed under: Uncategorized | Tags: , ,
Austn downtown at night

Room with a view: Austin night © Amitabh Trehan

Editor’s note: This is another guest post from  Amitabh Trehan.

Hello from SODA at Austin. First, a few questions as warmup exercise (It’s a theory conference!):
- If I blog about the conference (my second conference reporting), should I not get a Press card and full financial support from the organizers? :)
- If the proceedings are on disk only, should there be more (or less) electric points in the seminar rooms?
- A European friend asked me ‘Is continental breakfast supposed to be just a bowl of fruits and coffee? Which continent are they referring to’?
- How did Austin land up in the middle of Texas?

So, on to some reports now (before my editor cuts my writing for being mathematically imprecise!). Kudos to Austin: on Saturday, I got to watch a wonderful performance by the Austin symphonic Orchestra and violinist Nadja Salerno-Sonnenberg sitting in the very first row on a $5 student ticket, and today live blues at a downtown bar. The conference itself is big: there are 3 parallel sessions throughout the day from 9 am to almost 7 pm besides a plenary session and best paper presentation. Of course, I could only attend at most (n-2)/3 +  2 talks, where n was the total number of talks.

The best paper award has been given to Asadpour, Goemans, Madry, Gharan and Saberi’s paper An O(log n/ log log n)-approximation Algorithm for the Asymmetric Traveling Salesman Problem . This  paper breaks the 28 year old Θ(log n) bound set by Frieze et al for the classic Asymmetric Travelling Salesman Problem (ATSP).  To remind the reader, ATSP is the asymmetric version of the Traveling Salesman Problem (TSP) which is the problem of finding the minimum cost tour visiting each vertex at least once given a set V of n vertices, and a cost function c: V X V -> R+, and where for ATSP c(u,v) need not be equal to c(v,u). The technique they have used is similiar to the algorithm for the symmetric version by Christofides which gives a factor 3/2 approximation. At a very high level, this is to  construct a special spanning tree, find  minimum cost Eulerian augmentation of this tree, and shortcut the resulting Eulerian walk to get the result. The spanning tree is special in the sense that it is a thin tree. The algorithm uses Held-Karp Linear Programming (LP) relaxation and a thin tree is roughly defined with respect to the optimum solution x* of the relaxation as a spanning tree that, for every cut, contains a small multiple of the corresponding values of x* in the cut. Shayan Oveis Gharan. presenting the paper, highlighted as important the ideas of ‘thinness’ and that of maximum entropy: explicitly fix the desired combinatorial structure, and then maximize the randomness. Overall, a very nice paper.

The other talk I will briefly discuss is the plenary session talk by Cynthia Dwork on ‘Differential Privacy’. She called it ‘postmodern privacy data analysis’. From what little I know (which is not much) of postmodernism, this seems an apt analogy: the author is supposed to be much less important (or almost anonymous) as compared to the work of the author. The aim of differential privacy is not only important but urgent in our time: How to publicly release statistical information about a set of people without compromising the privacy of any individual. Identity thefts and the  ease with which one can obtain personal information is now getting alarming. After giving a brief history of the problem and explaining the concepts of adjacent databases, epsilon-differential privacy, linkage attacks etc, she presented the new extensions of their work using the H1N1 epidemic as an example: Continual observation and Pan-privacy.  Continual observation is the question of how one can analyze aggregate user information to monitor, say, regional health conditions, while maintaining differential privacy. Pan-privacy maintains the privacy of the database “inside and out” completely hiding the pattern or appearance; protecting against both announced (legal action, subpoenas etc) and unannounced intrusions. The pan-privacy density estimator, as it is called, is based on an idea from the social sciences called randomized response given by Warner in 1965. The idea was to do a coin toss while recording the response of a respondent in a survey and record the response according to the result of the toss and possible truthfulness of the respondent’s statement. As Cynthia showed in her last slide, there are many animals now in the differential privacy zoo!

Besides these, I attended some interesting talks in the day ranging from property testing to graph theory (minors etc) and algorithms about clique-width and kernels. There was also, as one participant called ‘the Kawarabayashi’ session which had  four papers by Ken-ichi Kawarabayashi, one less than last year (as far as I know).



Another Application of Game Theory
January 7, 2010, 5:43 pm
Filed under: Uncategorized | Tags: , , ,

Well it took me a trip out to Rome to find out about it, but David Kempe (at WINE) told me about another application of game theory developed in CS that is actually used in the real world.  Milind Tambe at USC has created a system that is currently used for scheduling security activities at the LA airport (and soon Pittsburgh airport) based on game theoretic concepts .  You can read about it here and in more detail in the research papers on Milind’s web page.

The basic idea seems like the following.  They analyze the problem as a Stackleberg game where the airport decides first on a probability distribution over which security activities will be scheduled that day (e.g. at which gates complete searches of luggage will occur, where and when drug sniffing dogs will be deployed, etc.) and then the “bad guys” get to decide on their actions e.g. which areas of the airport they will try to smuggle contraband through.  The goal of the airport security is of course to maximize the probability of catching illegal activity minimize the expected damage of illegal activity.  To do this, there needs to be some unpredictability on the part of the airport since the bad guys get to see the history of security activities.

This is really very cool research – David tells me that after implementing the system, LAX has been able to detect more illegal activity.



Tit-for-tat
December 3, 2009, 10:30 pm
Filed under: Uncategorized | Tags: , , ,

Interesting paper here on the way in which BitTorrent uses the tit-for-tat algorithm for file sharing.  Outside of the huge amount of work on auctions, this is really the only real-world applications of game theory in computer science that I am aware of.  Are there others?



Grant Writing
November 19, 2009, 7:10 pm
Filed under: Uncategorized | Tags: , , ,

I haven’t been posting in a while because I’ve been busy with preparing NSF grant proposals (which have shifted their deadlines this year from spring to December).  In the interest of getting out a post (and procrastinating), I’m going to write today about my great passion as a researcher: grant writing.  Seriously, I do sometimes enjoy the opportunity to plan out a research agenda for the next 3-5 years, and always the proposals I write do inform the research I will do in the future.  Also, so far I’ve also been really lucky with  grants, with a hit rate of about 60% (we’ll see how it goes this year now that we have a toddler around the house).  When your grants are funded, the grant writing process is much more fun!

Here are some of the tricks that I’ve learned for good grants.  Mostly these are from my experiences with NSF grants but they have been useful for other agencies.  I’d love to hear about some other ideas in the comments section.

The Introduction:

  • Nail Motivation: Start with the problem you are trying to solve.  Show that solving this problem has big implications, hopefully outside your own narrow area of work; and that your research will take significant steps towards solving the problem
  • Nail Novelty: What is it about your approach that will cause you to succeed where others have failed?  What unique ideas/tools/techniques do you bring to the problem?
  • Expose a gap: As early as possible in the proposal, describe a gap in current research
  • Don’t repeat: Don’t repeat text from the grant summary in the intro of your grant!  This risks starting a bad precedent of the reader skipping over parts of your grant.
  • Pull in the reader. Write a first sentence for both the summary and intro that is controversial, asks an interesting question, evokes strong emotions (hopefully not disgust :) , etc.

The body:

  • Less is More: Don’t tack on ideas or problems that you don’t really understand.  The reviewers will uncover your ignorance and your grant will be skewered.  The main constraint on conveying ideas is not the page length, but rather  reviewers time.  Expect a reviewer to spend about 30 minutes on your grant.  You can really only get across a couple of clear ideas in this time.  I know this from experience – whole sections of my grants have been misunderstood by reviewers in the past because I tried to cram too much in.  Make it clear what the grant will not cover as well as what it will cover.
  • Pictures are crucial: Spend as much time putting in 2-3 useful figures as you would writing a section of the grant
  • Related Work is crucial: You’re proposal should be like an exciting conversation with the research community.  Thread related work throughout your proposal – be generous with praise for great ideas, and make it clear how your efforts fit into the broader scheme.  If you make the related work look exciting, your own research will look more exciting.  Don’t miss anything that is remotely relevant to your work.  A common way to have a grant shot down is a review like: “This proposal ignores results  from the work of researcher X”.
  • Technical depth: a portion of your proposal (maybe a third?) can and should be addressed to the experts.  This is the place to include proofs, equations and enough technical details to convince the experts you have a good idea.
  • Try to give something to the reader:  teach the reader something new, include an interesting story or analogy, describe a cool problem.  The reader will appreciate anything that breaks the monotony of reading grant proposals.

Some ideas about the process:

  • Give a talk: I have found it useful to give a talk on preliminary results and future ideas as part of the proposal writing process.  It works well to do this for both an expert audience and a non-expert audience.  See what questions and ideas you get and use these to improve the proposal.
  • Become your own worse enemy:  you absolutely need to anticipate the 2 or 3 most devastating criticisms of your grant and then address those in the proposal
  • Get feedback: Get an expert and a non-expert to spend half an hour with the grant and give quick feedback,
  • Be intellectually mature: Don’t pretend you’ll solve every problem.  Give credit where it is due.  Be realistic about  the research time line and the limitations of your technique.
  • Try to write an hour a day on the grant to keep up momentum.
  • Develop a style: I like to use a sort of dialect style where I describe problems and then solutions, or questions and then answers.  This usually determines the broad outline of my proposal.  However, there are many different writing styles and you should figure out what works best for you.


We’re hiring
October 29, 2009, 9:58 pm
Filed under: Uncategorized | Tags: ,
Obligatory Albuquerque Photo

Obligatory Albuquerque Photo: Light Snow Today!

We’re hiring for one tenure track Assistant Professor position at the University of New Mexico.  From the ad: “We seek applicants across computer science, including but not limited to large scale data systems, networks, software design and engineering, graphics and visualization, and AI and adaptive systems.”  More details are available here.



Greed and Other Virtues
October 29, 2009, 9:28 pm
Filed under: Uncategorized | Tags: , , , , , ,

We just recently finished talking about greedy algorithms in my graduate algorithms class.  In the past few months, I’ve developed a new appreciation for greed.  Part of this is due to a discussion with Alan Borodin at a recent workshop, who made a good case for greediness as a stand-in for simplicity.  Anyone working in algorithms will agree that simple algorithms are better; even more so in the area of distributed computing, where even simple algorithms can be notoriously difficult to implement.  Unfortunately, it’s hard to improve what you can’t measure, and so there is a real need for good measures of algorithmic simplicity.  This makes me think of several important techniques in algorithm design and how these might intersect with the notion of simplicity.

  • Local algorithms: In distributed computing, local algorithms are those that run in time independent of the network size – i.e. each node in the network only receives information from its local neighborhood.  Clearly, these types of algorithms are more simple than algorithms that require information to be exchanged over long distances in the network.  Approximate edge coloring of graphs and some resource allocation problems are amenable to this approach.  See the two great papers: What can be computed locally? by Naor and Stockmeyer and What cannot be computed locally by Kuhn, Moscibroda and Wattenhoffer for more on this area.
  • Data streams: Algorithms that use little space are likely to be simple.  Data stream algorithms use very little space (or more specifically the amount of space used is very small compared to the input size).  While the analysis for many of these algorithms can be quite sophisticated, the algorithms themselves can usually be described in less than a quarter of a page of pseudo-code.  Muthukrishnan’s book is a great resource in this area (plus it is freely available here).  Recently, there has been a lot of interest in applying the data stream model in a distributed setting.  See for example the Massive, Unordered Data (MUD) model, which tries to capture the approach of systems like Mapreduce.
  • Greedy algorithms : Many of us are familiar with several algorithms of this type.  Perhaps Kruskall’s algorithm is the most frequently used greedy algorithm that is always correct.  Of course, there are many, many heuristics that are greedy that are not guaranteed to be correct.  I wonder what are the most frequently used greedy approximation algorithms.  Nothing comes to mind except for the greedy set cover algorithm or greedy approximation algorithms for maximizing submodular functions.
  • “Selfish” algorithms: This is a phrase I just made up to describe distributed algorithms that require each node to behave selfishly.  In particular, these are problems for which 1) the social welfare in any Nash equilibria is close to the optimal social welfare, and 2) the players can quickly converge to a Nash by following a locally optimal strategy.  There is a lot of interest in these types of algorithms in the distributed computing community right now, partially, I think, because of the great fact that they are simple to describe and implement.  Selfish algorithms are also simple in the sense that they may arise naturally when large groups of agents get together; in some sense, the algorithmic engineer is (at least partially) removed from the picture.

Are there other ways of formulating  problems that ensure “simple” algorithms?  Perhaps you, dear reader, can help me flesh out this list.



Teaching without a Net
October 22, 2009, 5:15 pm
Filed under: Uncategorized | Tags: , ,

When I first started as a professor, I felt that what students needed to do well in a class, above all else, was precise, error-free lectures, and lot’s of practice doing problems.  Based on this ideas, I did most of my lectures from slides that I had very carefully prepared before hand, and gave the class lots and lots of problems to solve.  This actually worked pretty well for a while – I got great feedback from students, was nominated for some teaching awards, and enjoyed the experience.

However, eventually I realized that my lectures were not as exciting as they could be.  I read some articles on Tomorrows Professor listserv, that bemoaned the use of pre-prepared slides.  They claimed that extensive use of slides makes lectures completely predictable and unresponsive.  This semester, as an experiment, I stopped using slides, or even lecture notes of any kind.  I memorize enough of the lecture so that I can be sure to use terminology that is reasonably close to the textbook and to the notes I’ve posted from previous years of the class.

How have things changed?  I find teaching a more exciting and sometimes scary activity.  Also, it’s easier for me to respond to feedback from the class as a whole about e.g. whether it’s worthwhile to go over an example of a particular concept.  When a student asks an interesting question, I definitely spend more time than I used to on exploring the answer in depth.   For me, at least,  the lectures are a lot more fun and challenging.  From the feedback I’m getting from students so far, they also seem to be more engaged.  Do I make more mistakes in lecture?  Definitely.  However, I feel that I’m slowly getting better at checking things on the fly and am making fewer mistakes now than when I started this.

How has this changed my teaching philosophy?  I realize now that a big part of education is inspiration, motivation, and a kind of educational “entertainment”.  I also realize that it’s just much more engrossing and memorable to watch someone walk carefully across a rope in the air, than to trudge across a line on the floor that was drawn before the performance even began!



The Surprising Persistence of Peer-to-Peer

Maxwell Young and I were talking the other day about the ups and downs of research in peer-to-peer systems. In 2002, when I got my PhD, I and everyone I knew at UW, felt that the popularity of p2p research was at a crescendo, and would quickly taper off in a year or two.  However, this morning, seven years later, I’m  reviewing IPDPS submissions, and I see that about 20% of the papers are on p2p (or their close relative overlay networks).  What’s going on?

Partially, I think the interest in academic circles comes from the fact that p2p research allows us to study what happens when there is no leader.  There are many challenging and fun problems in designing distributed systems that work when all components of the system are equal; that is, equal in terms of resources available, and also equal in the sense that all members of the system both use the system, and contribute to the system.   Maybe p2p thinking suits the egalitarian bent of academics?   Maybe it comes from a desire to imitate natural systems like ants and bees?

However, a perennial question is: are there legitimate uses of p2p systems?  Isn’t the trend currently in the opposite direction, with cloud computing promising that someday networks will consist of mindless clients on one side, and computationally powerful servers on the other.  In such a situation, will there be much need for direct communication between the clients?  It’s  hard for me, at least, to predict where these trends will eventually play out.  However, I would not be surprised if both the p2p extreme and the weak client, powerful server extreme continue to exist side-by-side for a long time to come.

I did want to try to list some potential legitimate uses of p2p that I have heard about recently below. I’d love to hear about others, or arguments for or against the continued existence of p2p.  Here’s are some of the big system (or ideas for systems) that I know about now.

  • Vanish: A system that prevents archiving of sensitive data.  In other words, Vanish attempts to enable that data like private email exchanges, photos, or messages can be given a deadline after which they simply can no longer be reconstructed.  To do this, Vanish breaks up content into pieces using Shamir secret sharing, distributes these pieces across a p2p network, and depends on sufficiently active churn in the peer-to-peer network to ensure that eventually enough of these pieces will leave the network, and so the original message will be lost forever.  Vanish got a nice writeup in the New York Times in July, but the original system has been shown to be vulnerable to a certain type of Sybil attack in this paper.
  • Akamai is a company with a billion dollar market cap that enables Internet content and application delivery.  As I understand it, the “peers” in the Akamai system are actually companies; and the Akamai network ensures robust and efficient delivery of content from these “peers” to end users.  This paper is what enabled Akamai to get its first round of VC funding, I hear.  But, I’m not sure if the algorithms now used still have any connection to the paper.
  • Skype is a peer-to-peer system for voice calls over the Internet that I used a lot when I was on sabbatical in Europe.  In my experience, the voice quality and reliability of google chat was much better than Skype, but somehow it was much easier for us to get friends and family to use skype than google chat.  I still use Skype nowadays for research conference calls.
  • Bittorrent is a p2p system allowing for quick collaborative downloading of large data items.  Estimates are that it consumes about one quarter to one half of all traffic on the Internet.  Don’t know how much of this traffic is “legitimate”, but at least some portion of bittorrent bandwidth has been used by publishers for distribution of free music, TV shows, and software.  Vuze is a bittorrent client with over 1 million users: clearly a well-used network, and perhaps the largest overlay network on the Internet.