Institutional Repository [SANDBOX]
Technical University of Crete
EN  |  EL

Search

Browse

My Space

Optimal histograms for limiting worst-case error propagation in the size of join results.

Christodoulakis Stavros, Yiannis E. Ioannidis

Full record


URI: http://purl.tuc.gr/dl/dias/A0E77513-A9A3-47D9-9553-601A84B1B522
Year 1993
Type of Item Peer-Reviewed Journal Publication
License
Details
Bibliographic Citation Y. Ioannidis , S. Christodoulakis , " Optimal histograms for limiting worst-case error propagation in the size of join results",ACM Trans. on Datab. Syst. , vol. 18 ,no. 4 ,pp. 709-748,1993 .doi : 10.1145/169725.169708 https://doi.org/10.1145/169725.169708
Appears in Collections

Summary

Many current relational database systems use some form of histograms to approximate the frequency distribution of values in the attributes of relations and on this basis estimate query result sizes and access plan costs. The errors that exist in the histogram approximations directly or transitively affect many estimates derived by the database system. We identify the class of serial histograms and its subclass of end-btased histograms; the latter is of particular interest because such histograms are used in several database systems. We concentrate on equality join queries without function symbols where each relation is joined on the same attribute(s) for all joins in which it participates. Join queries of this restricted type are called t-cllque queries. We show that the optimal histogram for reducing the worst-case error in the result size of such a query is always serial. For queries with one join and no function symbols (all of which are vacuously t-clique queries), we present results on finding the optimal serial histogram and the optimal end-biased histogram based on the query characteristics and the frequency distributions of values in the join attributes of the query relations. Finally, we prove that for t-clique queries with a very large number of joins, h~gh-bzased h zstograms (which form a subclass of end-biased histograms) are always optimal. To construct a histogram for the join attribute(s) of a relation, the values in the attribute(s) must first be sorted based on their frequency and then assigned into buckets according to the optimality results above.

Available Files

Services

Statistics