Institutional Repository [SANDBOX]
Technical University of Crete
EN  |  EL

Search

Browse

My Space

Sharing aggregate computation for distributed queries

Huebsch Ryan, Garofalakis Minos, Hellerstein, Joseph, 1952-, Stoica, Ion, 1965-

Full record


URI: http://purl.tuc.gr/dl/dias/8167266C-005C-4C48-A9EB-DC0302CB7517
Year 2007
Type of Item Conference Full Paper
License
Details
Bibliographic Citation R. Huebsch, M. Garofalakis, J. M. Hellerstein and I. Stoica, "Sharing aggregate computation for distributed queries", in ACM SIGMOD International Conference on Management of Data, 2007. doi: 10.1145/1247480.1247535 https://doi.org/10.1145/1247480.1247535
Appears in Collections

Summary

An emerging challenge in modern distributed querying is to effi-ciently process multiple continuous aggregation queries simultaneously.Processing each query independently may be infeasible,so multi-query optimizations are critical for sharing work acrossqueries. The challenge is to identify overlapping computations thatmay not be obvious in the queries themselves.In this paper, we reveal new opportunities for sharing work in thecontext of distributed aggregation queries that vary in their selectionpredicates. We identify settings in which a large set of q suchqueries can be answered by executing k ≪ q different queries.The k queries are revealed by analyzing a boolean matrix capturingthe connection between data and the queries that they satisfy,in a manner akin to familiar techniques like Gaussian elimination.Indeed, we identify a class of linear aggregate functions (includingSUM, COUNT and AVERAGE), and show that the sharing potentialfor such queries can be optimally recovered using standard matrixdecompositions from computational linear algebra. For someother typical aggregation functions (including MIN and MAX) wefind that optimal sharing maps to the NP-hard set basis problem.However, for those scenarios, we present a family of heuristic algorithmsand demonstrate that they perform well for moderate-sizedmatrices. We also present a dynamic distributed system architectureto exploit sharing opportunities, and experimentally evaluatethe benefits of our techniques via a novel, flexible random workloadgenerator we develop for this setting.

Services

Statistics