Institutional Repository [SANDBOX]
Technical University of Crete
EN  |  EL

Search

Browse

My Space

Wavelet synopses with error guarantees

Garofalakis Minos, Gibbons Phillip B.

Full record


URI: http://purl.tuc.gr/dl/dias/82802438-5BB0-49C3-81FD-A3D954CD364F
Year 2002
Type of Item Conference Publication
License
Details
Bibliographic Citation M. Garofalakis and P. Gibbons, "Wavelet synopses with error guarantees", in ACM SIGMOD International Conference on Management of Data, June 2002, pp. 476-487.
Appears in Collections

Summary

Recent work has demonstrated the effectiveness of the wavelet decompositionin reducing large amounts of data to compact sets ofwavelet coefficients (termed “wavelet synopses”) that can be usedto provide fast and reasonably accurate approximate answers toqueries. A major criticism of such techniques is that unlike, forexample, random sampling, conventional wavelet synopses do notprovide informative error guarantees on the accuracy of individualapproximate answers. In fact, as this paper demonstrates, errorscan vary widely (without bound) and unpredictably, even for identicalqueries on nearly-identical values in distinct parts of the data.This lack of error guarantees severely limits the practicality of traditionalwavelets as an approximate query-processing tool, becauseusers have no idea of the quality of any particular approximate answer.In this paper, we introduce Probabilistic Wavelet Synopses,the first wavelet-based data reduction technique with guarantees onthe accuracy of individual approximate answers. Whereas earlierapproaches rely on deterministic thresholding for selecting a setof “good” wavelet coefficients, our technique is based on a novel,probabilistic thresholding scheme that assigns each coefficient aprobability of being retained based on its importance to the reconstructionof individual data values, and then flips coins to select thesynopsis. We show how our scheme avoids the above pitfalls ofdeterministic thresholding, providing highly-accurate answers forindividual data values in a data vector. We propose several noveloptimization algorithms for tuning our probabilistic thresholdingscheme to minimize desired error metrics. Experimental results onreal-world and synthetic data sets evaluate these algorithms, anddemonstrate the effectiveness of our probabilistic wavelet synopsesin providing fast, highly-accurate answers with error guarantees.

Services

Statistics