Το work with title Optimal histograms for limiting worst-case error propagation in the size of join results. by Christodoulakis Stavros, Yiannis E. Ioannidis is licensed under Creative Commons Attribution 4.0 International
Bibliographic Citation
Y. Ioannidis , S. Christodoulakis , " Optimal histograms for limiting worst-case error propagation in the size of join results",ACM Trans. on Datab. Syst. , vol. 18 ,no. 4 ,pp. 709-748,1993 .doi : 10.1145/169725.169708
https://doi.org/10.1145/169725.169708
Many current relational database systems use some form of histograms to approximate the frequency distribution of values in the attributes of relations and on this basis estimate query result sizes and access plan costs. The errors that exist in the histogram approximations directly or transitively affect many estimates derived by the database system. We identify the class of serial histograms and its subclass of end-btased histograms; the latter is of particular interest because such histograms are used in several database systems. We concentrate on equality join queries without function symbols where each relation is joined on the same attribute(s) for all joins in which it participates. Join queries of this restricted type are called t-cllque queries. We show that the optimal histogram for reducing the worst-case error in the result size of such a query is always serial. For queries with one join and no function symbols (all of which are vacuously t-clique queries), we present results on finding the optimal serial histogram and the optimal end-biased histogram based on the query characteristics and the frequency distributions of values in the join attributes of the query relations. Finally, we prove that for t-clique queries with a very large number of joins, h~gh-bzased h zstograms (which form a subclass of end-biased histograms) are always optimal. To construct a histogram for the join attribute(s) of a relation, the values in the attribute(s) must first be sorted based on their frequency and then assigned into buckets according to the optimality results above.