Ιδρυματικό Αποθετήριο [SANDBOX]
Πολυτεχνείο Κρήτης
EN  |  EL

Αναζήτηση

Πλοήγηση

Ο Χώρος μου

A fast approximation scheme for probabilistic wavelet synopses

Deligiannakis Antonios, Garofalakis Minos, Roussopoulos Nick

Πλήρης Εγγραφή


URI: http://purl.tuc.gr/dl/dias/09F0998A-1CC4-4760-9DDC-0E190E255C7D
Έτος 2005
Τύπος Πλήρης Δημοσίευση σε Συνέδριο
Άδεια Χρήσης
Λεπτομέρειες
Βιβλιογραφική Αναφορά A. Deligiannakis, M. Garofalakis and N. Roussopoulos, "A fast approximation scheme for probabilistic wavelet synopses," in 17th International Scientific and Statistical Database Management Conference, June 2005, pp. 243-252.
Εμφανίζεται στις Συλλογές

Περίληψη

Several studies have demonstrated the effectiveness of Haarwavelets in reducing large amounts of data down to compact waveletsynopses that can be used to obtain fast, accurate approximatequery answers. While Haar wavelets were originally designedfor minimizing the overall root-mean-squared (i.e., ✂✁-norm) error in the data approximation, the recently-proposed ideaof probabilistic wavelet synopses also enables their use in minimizingother error metrics, such as the relative error in individualdata-value reconstruction, which is arguably the most importantfor approximate query processing. Known construction algorithmsfor probabilistic wavelet synopses employ probabilistic schemesfor coefficient thresholding that are based on optimal DynamicProgramming(DP) formulations over the error-tree structure forHaar coefficients. Unfortunately, these (exact) schemes can scalequite poorly for large data-domain and synopsis sizes. To addressthis shortcoming, in this paper, we introduce a novel, fast approximationscheme for building probabilistic wavelet synopses overlarge data sets. Our algorithm’s running time is near-linear in thesize of the data-domain (even for very large synopsis sizes) andproportional to ✄✆☎✞✝, where ✝ is the desired approximation guarantee.The key technical idea in our approximation scheme is tomake exact DP formulations for probabilistic thresholding much“sparser”, while ensuring a maximum relative degradation of ✝ onthe quality of the approximate synopsis, i.e., the desired approximationerror metric. Extensive experimental results over syntheticand real-life data clearly demonstrate the benefits of our proposedtechniques.

Υπηρεσίες

Στατιστικά