Το έργο με τίτλο A fast approximation scheme for probabilistic wavelet synopses από τον/τους δημιουργό/ούς Deligiannakis Antonios, Garofalakis Minos, Roussopoulos Nick διατίθεται με την άδεια Creative Commons Αναφορά Δημιουργού 4.0 Διεθνές
Βιβλιογραφική Αναφορά
A. Deligiannakis, M. Garofalakis and N. Roussopoulos, "A fast approximation scheme for probabilistic wavelet synopses," in 17th International Scientific and Statistical Database Management Conference, June 2005, pp. 243-252.
Several studies have demonstrated the effectiveness of Haarwavelets in reducing large amounts of data down to compact waveletsynopses that can be used to obtain fast, accurate approximatequery answers. While Haar wavelets were originally designedfor minimizing the overall root-mean-squared (i.e., ✂✁-norm) error in the data approximation, the recently-proposed ideaof probabilistic wavelet synopses also enables their use in minimizingother error metrics, such as the relative error in individualdata-value reconstruction, which is arguably the most importantfor approximate query processing. Known construction algorithmsfor probabilistic wavelet synopses employ probabilistic schemesfor coefficient thresholding that are based on optimal DynamicProgramming(DP) formulations over the error-tree structure forHaar coefficients. Unfortunately, these (exact) schemes can scalequite poorly for large data-domain and synopsis sizes. To addressthis shortcoming, in this paper, we introduce a novel, fast approximationscheme for building probabilistic wavelet synopses overlarge data sets. Our algorithm’s running time is near-linear in thesize of the data-domain (even for very large synopsis sizes) andproportional to ✄✆☎✞✝, where ✝ is the desired approximation guarantee.The key technical idea in our approximation scheme is tomake exact DP formulations for probabilistic thresholding much“sparser”, while ensuring a maximum relative degradation of ✝ onthe quality of the approximate synopsis, i.e., the desired approximationerror metric. Extensive experimental results over syntheticand real-life data clearly demonstrate the benefits of our proposedtechniques.