Computer Science Seminar - University of Houston
Skip to main content

Computer Science Seminar

The Communication Complexity of Distributed Set-Joins

When: Friday, November 18, 2016
Where: PGH 563
Time: 11:00 AM

Speaker: Dr. Qin Zhang, Indiana University

Host: Dr. Gopal Pandurangan

Given two collections of sets A = {A_1, ..., A_m} and B = {B_1, ..., B_m} where A_i, B_j are subsets of {1, ..., n}, and a predicate P, the problem of distributed set-joins is to find all pairs (A_i, B_j) such that P(A_i, B_j) is true. P can be the equivalence test, or the intersection test, etc. Assuming A and B are stored at two different sites in a distributed environment, we study the (randomized) communication complexity of set-joins, as well as the (randomized) communication complexity of computing the exact and approximate value of the join size. Combined, our analyses shed new insights into the quantitative differences between these different set-joins. Furthermore, given the close affinity of the natural join and the set-intersection-not-empty join, our results also yield communication complexity results for computing the natural join in a distributed environment.

For more details see, http://homes.soic.indiana.edu/qzhangcs/papers/pods15-join.pdf.

Bio:

Qin Zhang is an assistant professor at the Indiana University Bloomington.  He received a B.S. degree from Funda University and a Ph.D. from Hong Kong University of Science and Technology. He also spent a few years as a post-doc at the Theory Group of IBM Almaden Research Center, and the Center for Massive Data Algorithmics at Aarhus University. He is interested in algorithms for big data, in particular, data stream algorithms, sublinear algorithms, algorithms on distributed data; I/O-efficient algorithms, data structures, database algorithms and communication complexity.