Institutional Repository [SANDBOX]
Technical University of Crete
EN  |  EL

Search

Browse

My Space

Fast approximate wavelet tracking on streams

Cormode, Graham, 1977-, Garofalakis Minos, Sacharidis Dimitris

Full record


URI: http://purl.tuc.gr/dl/dias/E21182D0-1AE5-437A-B590-145B2AD6661E
Year 2006
Type of Item Conference Full Paper
License
Details
Bibliographic Citation G. Cormode, M. Garofalakis and D. Sacharidis, "Fast approximate wavelet tracking on streams", in 10th International Conference on Advances in Database Technology, September 2006.
Appears in Collections

Summary

Recent years have seen growing interest in effective algorithms forsummarizing and querying massive, high-speed data streams. Randomized sketchsynopses provide accurate approximations for general-purpose summaries of thestreaming data distribution (e.g., wavelets). The focus of existing work has typicallybeen on minimizing space requirements of the maintained synopsis — however,to effectively support high-speed data-stream analysis, a crucial practicalrequirement is to also optimize: (1) the update time for incorporating a streamingdata element in the sketch, and (2) the query time for producing an approximatesummary (e.g., the top wavelet coefficients) from the sketch. Such timecosts must be small enough to cope with rapid stream-arrival rates and the realtimequerying requirements of typical streaming applications (e.g., ISP networkmonitoring). With cheap and plentiful memory, space is often only a secondaryconcern after query/update time costs.In this paper, we propose the first fast solution to the problem of tracking waveletrepresentations of one-dimensional and multi-dimensional data streams, based ona novel stream synopsis, the Group-Count Sketch (GCS). By imposing a hierarchicalstructure of groups over the data and applying the GCS, our algorithmscan quickly recover the most important wavelet coefficients with guaranteed accuracy.A tradeoff between query time and update time is established, by varyingthe hierarchical structure of groups, allowing the right balance to be found forspecific data stream. Experimental analysis confirms this tradeoff, and shows thatall our methods significantly outperform previously known methods in terms ofboth update time and query time, while maintaining a high level of accuracy.

Services

Statistics