URI | http://purl.tuc.gr/dl/dias/E6E011A7-1E7E-41A6-824F-1BD230C6047A | - |
Αναγνωριστικό | http://www.vldb.org/pvldb/2/vldb09-394.pdf | - |
Γλώσσα | en | - |
Μέγεθος | 12 pages | en |
Τίτλος | Probabilistic histograms for probabilistic data | en |
Δημιουργός | Cormode, Graham, 1977- | en |
Δημιουργός | Deligiannakis Antonios | en |
Δημιουργός | Δεληγιαννακης Αντωνιος | el |
Δημιουργός | Garofalakis Minos | en |
Δημιουργός | Γαροφαλακης Μινως | el |
Δημιουργός | McGregor Andrew | en |
Εκδότης | Association for Computing Machinery | en |
Περίληψη | There is a growing realization that modern database management
systems (DBMSs) must be able to manage data that contains uncertainties
that are represented in the form of probabilistic relations.
Consequently, the design of each core DBMS component
must be revisited in the presence of uncertain and probabilistic information.
In this paper, we study how to build histogram synopses
for probabilistic relations, for the purposes of enabling both
DBMS-internal decisions (such as indexing and query planning),
and (possibly, user-facing) approximate query processing tools. In
contrast to initial work in this area, our probabilistic histograms
retain the key possible-worlds semantics of probabilistic data, allowing
for more accurate, yet concise, representation of the uncertainty
characteristics of data and query results. We present a
variety of techniques for building optimal probabilistic histograms,
each one tuned to a different choice of approximation-error metric.
We show that these can be incorporated into a general Dynamic
Programming (DP) framework, which generalizes that used for existing
histogram constructions. The end result is a histogram where
each “bucket” is approximately represented by a compact probability
distribution function (PDF), which can be used as the basis
for query planning and approximate query answering. We present
novel, polynomial-time algorithms to find optimal probabilistic histograms
for a variety of PDF-error metrics (including variation distance,
sum squared error, max error and EMD1). Our experimental
study shows that our probabilistic histogram synopses can accurately
capture the key statistical properties of uncertain data, while
being much more compact to store and work with than the original
uncertain relations. | en |
Τύπος | Πλήρης Δημοσίευση σε Συνέδριο | el |
Τύπος | Conference Full Paper | en |
Άδεια Χρήσης | http://creativecommons.org/licenses/by/4.0/ | en |
Ημερομηνία | 2015-11-30 | - |
Ημερομηνία Δημοσίευσης | 2009 | - |
Θεματική Κατηγορία | Database management | en |
Θεματική Κατηγορία | Database systems | en |
Βιβλιογραφική Αναφορά | G. Cormode, A. Deligiannakis, M. Garofalakis and A. McGregor, "Probabilistic histograms for probabilistic data," in 35th International Conference on Very Large Data Bases, 2009. | en |