Machinations


The Clever Algorithm and a New Blog
February 11, 2016, 8:45 pm
Filed under: Uncategorized

My student Abhinav Aggarwal has just written the inaugural post on his new blog (the name of which, I hope, is not a comment on the soporific qualities of my advisement 🙂). His first post is on the Clever algorithm by Jon Kleinberg. This is, inarguably, the paper that started the modern, spectral approach to web search that Google has built on so successfully.

Some questions worth pondering while reading through Abhinav’s summary (and hopefully also the seminal paper itself):

  1. Why did PageRank “beat out” the Clever algorithm in real-world web search?
  2. Are there domains besides web search where one might use a “top left and right eigenvector” approach like the Clever algorithm?
  3. What about the “soft” results in the paper, like the use of the second eigenvector for clustering? Can these be formalized in an interesting way?
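
For readers who want to see the eigenvector intuition behind question 2 concretely, here is a minimal sketch of the hub/authority power iteration at the heart of the Clever (HITS) approach. The toy adjacency matrix and iteration count are my own illustrative choices, not taken from Abhinav's post or from the paper.

```python
import numpy as np

def hits(A, iters=50):
    """Hub/authority power iteration (Kleinberg's HITS, the core of Clever).
    A[i, j] = 1 iff page i links to page j."""
    n = A.shape[0]
    hubs, auths = np.ones(n), np.ones(n)
    for _ in range(iters):
        auths = A.T @ hubs            # good authorities are pointed to by good hubs
        hubs = A @ auths              # good hubs point to good authorities
        auths /= np.linalg.norm(auths)
        hubs /= np.linalg.norm(hubs)
    # hubs/auths converge to the principal eigenvectors of A A^T and A^T A,
    # i.e., the top left and right singular vectors of A.
    return hubs, auths

# Toy web: pages 0 and 3 act as hubs pointing at page 1; page 0 also links to 2.
A = np.array([[0, 1, 1, 0],
              [0, 0, 0, 0],
              [0, 0, 0, 0],
              [0, 1, 0, 0]], dtype=float)
hubs, auths = hits(A)
print("hub scores:      ", hubs.round(3))
print("authority scores:", auths.round(3))
```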



Simons Institute boot camp on counting complexity and phase transitions
February 10, 2016, 4:09 pm
Filed under: Uncategorized

[The following report on the Counting Complexity and Phase Transitions Boot Camp at the Simons Institute was written by my student Abhinav Aggarwal.]

2016 Boot Camp on Counting Complexity and Phase Transitions

Jan 25-28, Simons Institute for the Theory of Computing, University of California, Berkeley


Recently I visited the Simons Institute at UC Berkeley to attend a boot camp on counting complexity and phase transitions. The program is available here. There were a total of 16 talks spread over 4 days, with speakers covering topics ranging from the basics of counting complexity and dichotomy theorems to Markov chain mixing times and random instances. All the videos from the lectures are available here. The following is a brief summary of the talks in two areas that seemed most interesting to me and that relate to my research at UNM.

Nayantara Bhatnagar from the University of Delaware and Ivona Bezakova from the Rochester Institute of Technology presented 3 talks on the properties of Markov chains and their mixing times, along with a couple of applications. The talks started with the basics of Markov chain properties and the conditions under which a given chain is ergodic (irreducible and aperiodic, on a finite state space). For an ergodic Markov chain, one defines a metric called the total variation distance between the current distribution on the states and the stationary distribution of the chain (which is unique because of ergodicity). This distance measures how close the former is to the latter, purely on the basis of the difference in the probability mass the two assign to events in the sample space.
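
As a concrete illustration (my own toy example, not from the talks): for distributions μ and ν on a finite state space, d_TV(μ, ν) = (1/2) Σ_x |μ(x) − ν(x)|, and for an ergodic chain this distance to the stationary distribution shrinks as the chain runs.

```python
import numpy as np

def tv_distance(mu, nu):
    """Total variation distance: half the L1 distance between two
    distributions on the same finite state space."""
    return 0.5 * np.abs(np.asarray(mu) - np.asarray(nu)).sum()

# A lazy two-state chain: ergodic, with stationary distribution (0.5, 0.5).
P = np.array([[0.75, 0.25],
              [0.25, 0.75]])
mu = np.array([1.0, 0.0])   # start concentrated on state 0
pi = np.array([0.5, 0.5])
for t in range(5):
    print(f"step {t}: d_TV = {tv_distance(mu, pi):.4f}")
    mu = mu @ P             # advance the chain one step
```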

A well-known result is that for stochastic matrices (like the transition probability matrix of a Markov chain), the rate at which the distance from the current distribution to the stationary distribution decreases over time is governed by the spectral gap. The smaller this gap, the slower the distance shrinks, and consequently, the more time the Markov chain takes to approach its stationary distribution. This property is exploited by Markov chain Monte Carlo (MCMC) techniques, which are used heavily for sampling from non-trivial distributions. Examples of this technique presented in the talks include sampling a uniformly random proper coloring of a given undirected graph and approximately counting the matchings in a given graph. The details of the construction and proof can be found here.
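
Here is a quick numerical sketch of that claim, using a lazy random walk on a cycle (the chain and its parameters are my own toy choices): the spectral gap is 1 − |λ₂|, where λ₂ is the second-largest eigenvalue of the transition matrix in absolute value, and lazier walks have smaller gaps and hence mix more slowly.

```python
import numpy as np

def spectral_gap(P):
    """1 - |lambda_2|, where lambda_2 is the second-largest eigenvalue
    of the transition matrix in absolute value."""
    mods = np.sort(np.abs(np.linalg.eigvals(P)))[::-1]
    return 1.0 - mods[1]

def lazy_cycle_walk(n, laziness):
    """Random walk on an n-cycle that stays put with probability `laziness`."""
    P = np.zeros((n, n))
    for i in range(n):
        P[i, i] = laziness
        P[i, (i - 1) % n] = (1 - laziness) / 2
        P[i, (i + 1) % n] = (1 - laziness) / 2
    return P

# The lazier walk has a smaller spectral gap, hence a slower approach
# to its (uniform) stationary distribution.
for laziness in (0.5, 0.9):
    gap = spectral_gap(lazy_cycle_walk(16, laziness))
    print(f"laziness {laziness}: gap = {gap:.4f}")
```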

The talks continued with a discussion of various techniques that are popular for sampling and counting using Markov chains. Four of these notions (the first of which is sketched in code after this list) are:

  1. Almost uniform sampler – Given a tolerance parameter d > 0, produce a sample from a distribution that is within total variation distance d of the uniform distribution.
  2. Fully polynomial almost uniform sampler (FPAUS) – An almost uniform sampler that runs in time polynomial in the input size and log(1/d).
  3. Randomized approximation scheme – Given a counting problem and a tolerance parameter ε > 0, produce a count that is within a multiplicative factor e^±ε of the actual count with probability at least 3/4.
  4. Fully polynomial randomized approximation scheme (FPRAS) – A randomized approximation scheme that runs in time polynomial in the input size and 1/ε.
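
As promised above, here is a minimal sketch of an almost uniform sampler, via Glauber dynamics on proper colorings, in the spirit of the coloring example from the talks. The graph, the number of steps, and the greedy initialization are my own illustrative choices; turning this into a provably almost uniform sampler requires a mixing-time bound, which is known to hold, for example, when the number of colors exceeds twice the maximum degree.

```python
import random

def glauber_coloring(adj, q, steps, seed=0):
    """Glauber dynamics on proper q-colorings: repeatedly pick a random
    vertex and recolor it with a uniformly random color not used by any
    neighbor. adj maps each vertex to a list of its neighbors."""
    rng = random.Random(seed)
    vertices = list(adj)
    # Greedy initial proper coloring (possible whenever q > max degree).
    color = {}
    for v in vertices:
        used = {color[u] for u in adj[v] if u in color}
        color[v] = next(c for c in range(q) if c not in used)
    for _ in range(steps):
        v = rng.choice(vertices)
        allowed = [c for c in range(q) if c not in {color[u] for u in adj[v]}]
        color[v] = rng.choice(allowed)
    return color

# A 4-cycle with q = 5 colors; q > 2 * (max degree) = 4, the regime
# where this chain is known to mix rapidly.
adj = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
print(glauber_coloring(adj, q=5, steps=10_000))
```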

With these definitions in place, the goal becomes to design Markov chains with small mixing times, so that the runtime of the resulting FPRAS is minimized. The talks further discussed coupling theory, both between Markov chains and between probability distributions. Nayantara presented these concepts using a coupling between two sequences of coin tosses and the famous card shuffling example, which is used to prove that the top-to-random shuffle takes O(n log n) steps to bring a deck of n cards close to uniformly random: within that many steps, the distribution of the Markov chain underlying the shuffle is within small total variation distance of the uniform distribution over all orderings.
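
Here is a quick simulation of the top-to-random shuffle (my own sketch, tracking an ad hoc statistic rather than computing the full total variation distance): one telltale sign of non-uniformity is that the original bottom card tends to stay at the bottom, an event whose probability should approach 1/n as the deck mixes.

```python
import math
import random

def top_to_random(deck, rng):
    """One step: move the top card to a uniformly random position."""
    card = deck.pop(0)
    deck.insert(rng.randrange(len(deck) + 1), card)

# Under the uniform distribution, the original bottom card sits at the
# bottom with probability exactly 1/n; watch the empirical frequency
# approach that value as the number of shuffle steps grows.
rng = random.Random(0)
n, trials = 8, 20_000
for steps in (1, round(n * math.log(n)), 4 * round(n * math.log(n))):
    still_bottom = 0
    for _ in range(trials):
        deck = list(range(n))
        for _ in range(steps):
            top_to_random(deck, rng)
        still_bottom += (deck[-1] == n - 1)
    print(f"{steps:3d} steps: freq = {still_bottom / trials:.3f}  (1/n = {1 / n:.3f})")
```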

Another session that I thoroughly enjoyed was the one on approximate counting by Leslie Ann Goldberg. The main topic covered in her talk was relative complexity and its relation to the #BIS problem (counting independent sets in bipartite graphs). She started with a discussion of the Potts model and its partition function, in connection with the number of proper q-colorings of a given graph; the aim is to approximate this count. She discussed what an FPRAS for this problem would provide: a rational number z such that, for a given tolerance ε > 0 and the actual count C, we have Pr(Ce^-ε <= z <= Ce^ε) >= 3/4. Approximating such counts can be NP-hard in general; however, Leslie showed that the problem cannot be much harder than that.
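
To make the multiplicative guarantee concrete, here is a toy estimator of my own (not from the talk): it counts the satisfying assignments of a simple predicate by uniform sampling, which achieves relative error ε with good probability whenever the satisfying fraction is not too small. That is exactly the easy regime; the hard counting problems discussed in the talk are precisely those where such naive sampling fails.

```python
import random

def approx_count(n, predicate, eps, rng):
    """Toy multiplicative-error counter: estimate |{x in {0,1}^n : predicate(x)}|
    by uniform sampling. If the satisfying fraction p is not too small,
    O(1 / (eps^2 * p)) samples give C * e^-eps <= z <= C * e^eps with
    probability at least 3/4 (by Chebyshev)."""
    samples = int(12 / eps ** 2)   # crude choice; fine for this toy, where p is large
    hits = sum(predicate([rng.randint(0, 1) for _ in range(n)]) for _ in range(samples))
    return (hits / samples) * 2 ** n

# Predicate: at least half the bits are 1. Small enough to verify exactly.
rng = random.Random(1)
n = 10
pred = lambda bits: sum(bits) >= n / 2
estimate = approx_count(n, pred, eps=0.1, rng=rng)
exact = sum(1 for i in range(2 ** n) if pred([(i >> j) & 1 for j in range(n)]))
print(f"estimate: {estimate:.0f}, exact: {exact}")
```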

An outline of the proof uses the bisection technique of Valiant and Vazirani, which shows that #SAT can be approximated by a probabilistic polynomial-time Turing machine using an oracle for SAT. Leslie then defined the relative complexity of approximate counting using AP-reductions (approximation-preserving reductions) from a function f to a function g. Concretely, a function f is AP-reducible to a function g if 1) there exists a randomized algorithm A that computes f using an oracle for g, and 2) A is a randomized approximation scheme for f whenever the oracle is a randomized approximation scheme for g. Under this definition, the class of functions with an FPRAS is closed under AP-reductions.

An immediate dichotomy that comes out of this formulation is that, if NP != RP, then within #P the problems that admit an FPRAS form one class and the rest remain unFPRASable. Problems in the same class are AP-interreducible, but no problem in the second class has an FPRAS unless NP = RP. Within the class of unFPRASable problems, a further distinction exists: one subclass consists of problems AP-interreducible with #SAT, and another consists of problems AP-interreducible with #BIS. Leslie concluded her talk by giving an example of this trichotomy in the context of graph homomorphisms.