12-08-2011, 10:17 AM
Abstract.
We present deterministic sub-linear space algorithms for a number of problems over update data streams, including, estimating fre- quencies of items and ranges, ¯nding approximate frequent items and approximate Á-quantiles, estimating inner-products, constructing near- optimal B-bucket histograms and estimating entropy. We also present new lower bound results for several problems over update data streams.
1 Introduction
The data streaming model [2, 26] presents a computational model for a variety of monitoring applications, for example, network monitoring, sensor networks, etc., where data arrives rapidly and continuously and has to be processed in an online fashion using sub-linear space. Some examples of fundamental data streaming primitives include, estimating the frequency of items (point queries) and ranges (range-sum queries), ¯nding approximate frequent items, approximate quantiles and approximate hierarchical heavy hitters, estimating inner-product, construct- ing approximately optimal B-bucket histograms, estimating entropy, etc.. We view a data stream as a sequence of arrivals of the form (i; v), where, i is the identity of an item belonging to the domain D = f0; 1; : : : ; n ¡ 1g and v is a non-zero integer that depicts the change in the frequency of i. v ¸ 1 signi¯es v insertions of the item i and v • ¡1 signi¯es jvj deletions of i. The frequency of an item i is denoted by fi and is de¯ned as the sum of the changes to its frequency since the inception of the stream, that is, fi = P (i;v) appears in stream v. If fi ¸ 0 for all i (i.e., deletions correspond to prior insertions) then the corresponding streaming model is referred to as the strict update streaming model, whereas, the model in which frequencies can take arbitrary positive, zero or negative values is called the general update streaming model. The insert-only model refers to data streams with no deletions, that is, v > 0. Randomized algorithms dominate the landscape of sub-linear space algo- rithms for problems over update streams. There are no deterministic sub-linear space algorithms known for a variety of basic problems over update streams, including, estimating the frequency of items and ranges, ¯nding approximate frequent items and approximate Á-quantiles, ¯nding approximate hierarchical
Download full report
http://googleurl?sa=t&source=web&cd=1&ve...escape.pdf&ei=nK1ETs_2LcusrAe61-HVAw&usg=AFQjCNFG19pgZtloZrpRUOttzlfDjj9VjQ&sig2=E9nxDeMi8ztUYp-F2chjBA