Institutional Repository [SANDBOX]
Technical University of Crete
EN  |  EL

Search

Browse

My Space

A fast approximation scheme for probabilistic wavelet synopses

Deligiannakis Antonios, Garofalakis Minos, Roussopoulos Nick

Full record


URI: http://purl.tuc.gr/dl/dias/09F0998A-1CC4-4760-9DDC-0E190E255C7D
Year 2005
Type of Item Conference Full Paper
License
Details
Bibliographic Citation A. Deligiannakis, M. Garofalakis and N. Roussopoulos, "A fast approximation scheme for probabilistic wavelet synopses," in 17th International Scientific and Statistical Database Management Conference, June 2005, pp. 243-252.
Appears in Collections

Summary

Several studies have demonstrated the effectiveness of Haarwavelets in reducing large amounts of data down to compact waveletsynopses that can be used to obtain fast, accurate approximatequery answers. While Haar wavelets were originally designedfor minimizing the overall root-mean-squared (i.e., ✂✁-norm) error in the data approximation, the recently-proposed ideaof probabilistic wavelet synopses also enables their use in minimizingother error metrics, such as the relative error in individualdata-value reconstruction, which is arguably the most importantfor approximate query processing. Known construction algorithmsfor probabilistic wavelet synopses employ probabilistic schemesfor coefficient thresholding that are based on optimal DynamicProgramming(DP) formulations over the error-tree structure forHaar coefficients. Unfortunately, these (exact) schemes can scalequite poorly for large data-domain and synopsis sizes. To addressthis shortcoming, in this paper, we introduce a novel, fast approximationscheme for building probabilistic wavelet synopses overlarge data sets. Our algorithm’s running time is near-linear in thesize of the data-domain (even for very large synopsis sizes) andproportional to ✄✆☎✞✝, where ✝ is the desired approximation guarantee.The key technical idea in our approximation scheme is tomake exact DP formulations for probabilistic thresholding much“sparser”, while ensuring a maximum relative degradation of ✝ onthe quality of the approximate synopsis, i.e., the desired approximationerror metric. Extensive experimental results over syntheticand real-life data clearly demonstrate the benefits of our proposedtechniques.

Services

Statistics