Slides from talks are available at the link above.
[The following blog report on MMDS’14 was written by my student Mahdi Zamani – Ed]
Recently, I attended the MMDS workshop hosted by UC Berkeley. The workshop consisted of 40 talks distributed in four days and a poster session. It focused on algorithmic and statistical problems in large scale data analysis and brought together researchers from several areas including computer science, mathematics, statistics, and Big Data practitioners. In the third day of the workshop, I presented a poster about our recent work on multi-party sorting of large data sets. The following is a brief summary of a few talks that seemed interesting to me and my current research projects. The workshop program is available here.
Dan Suciu from University of Washington talked about running queries on very large databases. He argued that traditionally database queries were measured in terms of disk I/O but in Big Data query processing, since the database is store on distributed clusters, communication is the bottleneck. Now, the problem is to run a query on p servers via a Massive Parallel Communication (MPC) model, where the goal is to minimize the load on each server. In this model, the database records are partitioned among p servers each getting M/p, where M is the total number of records. Then, the database operations (like SQL joins) are performed on the partitions. As an example, Dan mentions a popular operation in Big Data analysis called triangle that is defined over three relations: Q(x,y,z)=R(x,y) join S(y,z) join T(z,x). Using a naive approach, this query can be computed in two communication rounds, doing two separate joins. But, surprisingly, it can be computed in a single communication round with reasonable load on each server using a simple technique described by Dan in .
Bill Howe from University of Washington talked about Myria, a web-based framework written in Java for Big Data query processing. He is motivated by the fact that at least we, as scientists, spend more than 90% of our time handling data for our experiments rather than designing new techniques and algorithms (I think by scientists he means practitioners here. I believe theoreticians spend much less time working with data 🙂. Bill explains that although current Hadoop optimizations are more than 100 times faster than Hadoop itself, we still need faster and simpler frameworks that can be at least used by non-expert scientists. The main idea behind Myria is to blur the distinction between relational algebra and linear algebra. In other words, every query in the relational domain is translated to simple linear algebraic equations that can be computed very fast. Myria provides an imperative-declarative language that runs queries in parallel.
Yiannis Koutis from University of Puerto Rico talked about spectral algorithms for mining data from large graphs. Spectral analysis is an important tool for graph partitioning, where the set of vertices of a large graph is divided into smaller components such that the number of edges running between separated components is small. This is often an important subproblem for complexity reduction or parallelization. A cut that provides the smallest number of such edges is called the sparsest cut. Such a cut bisects the graph into the most important components. Given an adjacency matrix A of a graph G, the general idea behind spectral graph partitioning is that the eigenvector associated with the second smallest eigenvalue of the Laplacian matrix L of A is the sparsest cut of G (given a matrix A and the degree matrix of A denoted by D, the Laplacian matrix of can be computed as L=D-A). Since finding the exact eigenvalues of the Laplacian matrix is often computationally intensive, Yiannis argues that one can find a good (small) sparse cut by approximating the eigenvalues. However, it is an open problem whether one can find better cuts using another technique with similar computational costs. At the end of the talk, I asked Yiannis whether probabilistic methods (such as random walks) are slower than spectral methods. He answered that probabilistic methods are usually local, and global random walks are often much slower than global spectral analysis.
David Woodruff from IBM Almaden Research talked about an optimal CUR matrix decomposition technique. Given a matrix A, the goal is to find three matrices C, U, and R such that A=C.U.R. A solution to this problem can be used in many applications such as recommendation systems. For example, Netflix is interested in fast techniques for decomposing its huge users-by-movies matrix into a matrix of the most important users and a matrix of the most important movies. Since the exact decomposition is often very computationally expensive, an approximation is calculated instead. Although there are fast randomized algorithms for solving this problem (e.g., ), David proposes their asymptotically-optimal deterministic algorithm published recently in STOC 2014 .
Xavier Amatriain from Netflix talked about current Big Data challenges of his company. Until 2007, the company had over 100 million movie ratings and now it holds about 5 billion ratings. The company currently has 44 million users around the world and holds more than 5 billion hours of movies. The users play about 50 million movies and submit 5 million ratings per day. This contributes to about 32% of US daily downstream traffic. From this huge pile of data, Netflix needs to extract ranking and similarity information about the movies. Their main approach to this end is to employ distributed machine learning algorithms. Xavier argues that in today’s world more data sometimes beats better algorithms. He backs up his claim by a quote from Peter Norving, Director of Research at Google: “Google does not have better algorithms, only more data” (see this). Xavier continues by arguing that training algorithms can run in parallel because training data are often independent of each other. He also describe how they have scaled their algorithms by distributing at three different levels: across regions or subsets of the population, at the hyperparameter optimization stage, and at the model training level.
[A polished version of this report is available here – Ed]
 : “Optimal CUR Matrix Decompositions”, Proceedings of the 46th Annual ACM Symposium on Theory of Computing, pp. 353—362, 2014. URL: http://doi.acm.org/10.1145/2591796.2591819.
Filed under: Uncategorized | Tags: secure multiparty computation, theory, workshops
Last month I went to a workshop on secure multiparty computation that was organized by IARPA. I’ve had mixed luck with these types of workshops that are organized by funding agencies. However, this one was pretty good, because of the timeliness of the topic, well-prepared talks, and the fact that the participants were some of the big names in security and MPC.
This was also one first times I’ve visited and given a talk at MIT, which was a lot of fun. Many of the talk slides are included in the workshop link above.
FOCI (Free and Open Communication on the Internet) is a nice workshop that is co-chaired this year by my colleague and friend Jed Crandall. Many luminaries in the emerging area of anti-censorship are on the PC this year: Roger Dingledine (founder of TOR), Joan Feigenbaum, Nikita Borisov.
If you’ve got something appropriate, it’s a good place to submit. Submissions are due on May 6th.
Filed under: Uncategorized | Tags: distributed computing, shared memory, theory, workshops
Last week, I attended a great workshop in Banff, Canada on “Probabilistic versus Deterministic Approaches to Shared Memory Computation”. The following is an extremely biased, incomplete and watered down summary – also it only includes morning talks because I was too sleepy in the afternoon to take notes – reader beware.
On Monday morning, Jennifer Welch and I gave back-to-back talks on emulating shared memory objects in “weird” communication models. Jennifer talked about maintaining shared memory consistency under probabilistic churn in a peer-to-peer network. In particular, she described recent work that created a shared read-write register in such a model that loses its state only with some small probability. The technical proofs of correctness relied on careful analysis of continuous time Markov processes. I believe that this paper describes some of their results in this model.
My own talk was on emulating a single writer, multi-reader in a wireless communication model subject to adversarial jamming (link to slides) – this was based on work with Valerie King and Maxwell Young from last PODC. I’ve blogged about this somewhat before so won’t repeat myself. However, if you’re a fan of the golden ratio (and who isn’t?) then you should read the first half of the slides. I had a lot of good feedback at the talk on how to extend these results including: 1) determining what happens when there are multiple communication channels available; 2) what happens if one can use signal processing in the event of a jam to determine what the underlying message was in the case where many processors broadcast the sam underlying message; and 3) determining if one can reduce the power consumption necessary to in order to maintain the state of the shared object, perhaps expending more energy only when there is a need to change state.
On Tuesday, Keren Censor-Hillel and Danny Hendler gave excellent back-to-back talks on restricted use shared objects. Keren started with upper bounds – in particular how to implement a max count register in the restricted use model and Danny followed up with lower bounds from a paper joint with Aspnes, Attiya, and Censor-Hillel. A restricted use object is essentially one where some restriction exists on the number of operations that may be applied to an object. There are two common types of restrictions: 1) m limited use: at most m ops may be applied; and 2) b bounded: at most b values supported by the object. An old result by Jayanti, Tan and Toeg in ’00 shows that Omega(n) work and space is needed for history-less primitives [Jayanti, Tan, Toueg, ’00], but these lower bounds don’t apply to restricted use objects. There have been several exciting results showing the possibility of implementing such objects in polylog step complexity (see the talk by Gilbert and Bender below for an example of cool applications of these new results). How can we devise lower bounds for this new type of restricted use object? Danny discussed the notion of L perturbable objects that intuitively can be perturbed at most L times. He gave details of a result (joint with Aspnes, Attiya, and Keren) showing that a L perturbable object must have step complexity Omega(min(log L,n)) and space complexity Omega(min(L,n)). This lower bound is for deterministic objects only; randomized lower bounds are still relatively open (lower bound of Omega(log log m/ log log log m))
On Wednesday, Lisa Higham and Wojciech Golab gave talks on the notion of strong linearizability. I will just touch on the problem they address. When a shared object is linearizable, it intuitively means that the history of invocations and responses to that object can be ordered in time in a way that 1) the total order extends the “happens before” partial order over all the operations; and 2) the ordering obeys correctness properties for the shared object. (I’m probably missing something important here – maybe someone will correct me in the comments). Many (most?) people think that if operations on an object are linearizable, then everything is great: the object acts like it is atomic in the sense that it appears to the rest of the system as if operations on it occur instantaneously.
Lisa and Wojciech showed that these people are sadly mistaken. In a recent STOC result, joint with Wojciech Golab and Philipp Woelfel, it’s shown that linearizability does not suffice in the case where processes can use randomization. They introduce a new concept strong linearizability that is sufficient (and more or less necessary) to ensure correctness of randomized algorithms. Unfortunately, as discussed by Wojciech, it seems that to ensure strong linearizability for most shared objects requires significantly higher resource costs than to ensure linearizability. Valerie King brought up an interesting question about whether for certain types of randomized algorithms, a somewhat weaker notion may work (for example, if the algorithm is Monte Carlo and it’s ok if the scheduler can slightly tweak the probability of “bad” events, so long as this probability never gets too large).
[Note from Wojciech: you said “it seems that to ensure strong linearizability for most shared objects requires significantly higher resource costs than to ensure linearizability.” Is your impression based by any chance on the impossibility results I gave during my presentation? Those actually pertain to first-step and first-update linearizability, one of which is strictly stronger than strong linearizability, and the other is incomparable. For strong linearizability itself, there are several upper bounds that Lisa and I didn’t mention as prominently as we should have. In particular, known universal constructions for lock-based objects and wait-free objects tend to be strongly linearizable. Thus, the message we wanted to get across is that strong linearizability is a practical property because it’s readily attainable, and in several important cases it comes at no additional cost beyond the cost of ordinary linearizability. (That’s in contrast to first-step and first-update linearizability.)]
Next, Hagit Attiya led an interesting discussion on how to motivate our work in distributed computing, pointing out that other CS areas, like Machine Learning, are frequently better at “selling” their results. Maurice pointed out that many results that originated in the PODC community (like Byzantine agreement) have been “rediscovered” by the systems community, frequently without significant attibution/homage to the distributed algorithms community. How can we more effectively advertise these results outside our own community? Hint: it may help to have fewer models.
On Wednesday afternoon, we took a road trip to Lake Louise where we saw a great collection of ice sculptures, walked around a beautiful snow covered lake, and learned that penny loafers aren’t the best footwear for a glacial approach.
On Thursday, Seth Gilbert and Michael Bender gave a great joint talk on their recent FOCS paper on “Mutual Exclusion with O(log^2 log n) Amortized Work”. A recent algorithm for this problem was by Danny Hendler and Phillip Woefel (one of the workshop organizers), which showed that it was possible to breakthrough a Theta(log n) barrier (shown by Attiya, Hendler and Woefel); this previous algorithm achieved Theta(log n/ (log log n)) work against an adaptive adversary (in the shared memory world, this essentially means an adversary in the full information communication model). Seth and Michael’s work assumes a weaker adversary that is oblivious in that it can’t see past coin flips by the processors. Their new algorithm is also Monte Carlo in that it ensures each processor gets into the critical section only with high prob.
Some key technical ideas behind their new result are: 1) to use a dense array to store processors that are waiting to enter the critical section; and 2) to create and use clever approximation and work-efficient counters (remember you can only afford O(log^2 log n) work per counter) in order to dynamically manage the array of processors that are waiting. An interesting open problem: Can we prove that an adaptive adv. (i.e. one that has full information) can force at least Omega(log n/log log n) work even if the algorithm ensures access to the critical section only with high prob; or alternatively can we design an algorithm in this model that does better?
Valerie King gave a nice talk in the afternoon, half of which was on connections between the shared memory model and the message passing model for a consensus problem. We’re likely writing a paper on this problem so I’ll probably blog about it later here.
Friday was overcast and cold, which made it a little bit easier to say goodbye to beautiful Banff.
[Thanks to Wojciech Golab for helpful corrections]
Filed under: Uncategorized | Tags: algorithms, data structures, theory, workshops
Bertinoro is a small town near Bologna, with a castle on a hill, a monastery just below it and a beautiful view of farmland and the Mediterranean. It’s now a popular site for workshops (kind of like Dagstuhl), which are held in the castle.
This summer was the second time that I attended the Algorithms and Data Structures (ADS) workshop at Bertinoro. There were many interesting clusters of talks. I’m just going to touch on a very biased sample of some of the talks I remember below:
- Andrew Goldberg talked about new algorithms for solving the single-source shortest path problem on graphs with low “highway” dimension. Intuitively a graph has small highway dimension if for every positive r, there is a “sparse” set of vertices S_r such that every shortest path of length r passes through S_r, where sparse means that every ball of radius O(r) contains a small number of elements in S_r. Renator Werneck and Daniel Delling talked about additional aspects of the SSSP problem, namely 1) handling arbitrary metrics quickly in order to handle, e.g. real-time traffic updates (Weneck); and 2) showing a connection between highway dimension and VC-dimension, and showing that a labeling algorithm coming out of the distributed computing community improves theoretical and practical performance (Delling). This research on fast single source shortest paths algorithms has been going on for years at MSR – it’s nice to see a practical problem so thoroughly investigated.
- For research in data structures, you can’t do much better than Sleator and Tarjan. Bob Tarjan talked about a way to maintain search trees so that rebalancing occurs only on insertion, not deletion (thereby significantly simplifying the rebalancing process). Daneil Sleator talked about, Skip-Splay, a BST data structure that nearly achieves the unified bound (intuitively a data structure that satisfies the unified bound has good performance for sequences of operations where most accesses are near a recently accessed element). On a more practical note, Michael Bender talked about work done at his company on designing a data structure for membership queries that is optimized for 1) space, and 2) writes. Like a Bloom Filter, the data structure trades off space for a false positive rate. However, the new data structure scales beyond main memory, and is optimized for writes and for flash memory. The talk was titled: “Don’t Thrash: How to Cache Your Hash on Flash”
- Distributed computing was represented by Faith Ellen, Sid Sen, Valerie King and me. Faith talked about a communication model conjecture with applications to proving superpolylogarithmic lowerbounds for dynamic data structures. Sid talked about the cost of equivocation in Byzantine agreement – namely what happens if we have partial broadcast channels among sets of 3 processors. Valerie talked about our work “Conflict on a Communication Channel” that I’ve blogged about here previously. I talked about our recent work on scalable rational secret sharing (slides are here).
All in all this was a great workshop. Lot’s of smart people, beautiful scenery, and 4-hour Italian-style dinners (3 courses of alcohol!) in which to think about the big problems in algorithms and data structures.