URI | http://purl.tuc.gr/dl/dias/E6E011A7-1E7E-41A6-824F-1BD230C6047A | - |
Identifier | http://www.vldb.org/pvldb/2/vldb09-394.pdf | - |
Language | en | - |
Extent | 12 pages | en |
Title | Probabilistic histograms for probabilistic data | en |
Creator | Cormode, Graham, 1977- | en |
Creator | Deligiannakis Antonios | en |
Creator | Δεληγιαννακης Αντωνιος | el |
Creator | Garofalakis Minos | en |
Creator | Γαροφαλακης Μινως | el |
Creator | McGregor Andrew | en |
Publisher | Association for Computing Machinery | en |
Content Summary | There is a growing realization that modern database management
systems (DBMSs) must be able to manage data that contains uncertainties
that are represented in the form of probabilistic relations.
Consequently, the design of each core DBMS component
must be revisited in the presence of uncertain and probabilistic information.
In this paper, we study how to build histogram synopses
for probabilistic relations, for the purposes of enabling both
DBMS-internal decisions (such as indexing and query planning),
and (possibly, user-facing) approximate query processing tools. In
contrast to initial work in this area, our probabilistic histograms
retain the key possible-worlds semantics of probabilistic data, allowing
for more accurate, yet concise, representation of the uncertainty
characteristics of data and query results. We present a
variety of techniques for building optimal probabilistic histograms,
each one tuned to a different choice of approximation-error metric.
We show that these can be incorporated into a general Dynamic
Programming (DP) framework, which generalizes that used for existing
histogram constructions. The end result is a histogram where
each “bucket” is approximately represented by a compact probability
distribution function (PDF), which can be used as the basis
for query planning and approximate query answering. We present
novel, polynomial-time algorithms to find optimal probabilistic histograms
for a variety of PDF-error metrics (including variation distance,
sum squared error, max error and EMD1). Our experimental
study shows that our probabilistic histogram synopses can accurately
capture the key statistical properties of uncertain data, while
being much more compact to store and work with than the original
uncertain relations. | en |
Type of Item | Πλήρης Δημοσίευση σε Συνέδριο | el |
Type of Item | Conference Full Paper | en |
License | http://creativecommons.org/licenses/by/4.0/ | en |
Date of Item | 2015-11-30 | - |
Date of Publication | 2009 | - |
Subject | Database management | en |
Subject | Database systems | en |
Bibliographic Citation | G. Cormode, A. Deligiannakis, M. Garofalakis and A. McGregor, "Probabilistic histograms for probabilistic data," in 35th International Conference on Very Large Data Bases, 2009. | en |