12-08-2011, 10:38 AM
Abstract.
We present a simple way of designing deterministic algorithms for problems in the data stream model via lossless expander graphs.We illustrate this by considering two problems, namely, k-sparsity testing and estimating frequency of items.
1 Introduction
We say that an n-dimensional vector f from Zn is k-sparse if it has at most k non-zero entries. The problem is to test whether f is k-sparse or no after it has been subject to a sequence of coordinate wise updates in arbitrary order, that is, f is the frequency vector of a data stream. More formally, a data stream over the domain [n] = {1, 2, . . . , n} is a sequence _ of records of the form (index, i, v), where, index is the position of the record in the sequence, i 2 [n] and v 2 Z. Associated with each data stream _ is an n-dimensional frequency vector f(_), such that fi(_) is the frequency of i, or the cumulative sum of the updates to fi(_), made by the sequence _. That is, fi(_) = X (index,i,v)2_ v, i 2 [n] . The k-sparsity testing problem is as follows: design a data structure, referred to as a k-sparsity tester, that (a) processes any stream _ of updates over the domain [n], and, (b) provides a test to check whether f(_) is k-sparse, that is, whether, f has at most k non-zero entries. The problem is to obtain solutions whose space requirement is o(n). We first review work on the following well-studied and closely related problem, namely, k-sparse vector reconstruction problem, where it is required to design a structure that can process a data stream _ and can retrieve the frequency vector f(_) provided f(_) is k-sparse. However, the structure is not required to actually test whether f(_) is k-sparse or not and may present an incorrect answer if f(_) is not k-sparse. Let m denote maxni =1|fi|. It is easy to show [15] that the k-sparse reconstruction problem requires (k log(mn/k)) bits. Minsky et. al. [22] study a constrained version of the k-sparse vector reconstruction problem where f(_) 2 {−1, 0, 1}n and present a space-optimal algorithm for this scenario. Eppstein and Goodrich [11] present a space-optimal solution for the case when f(_) 2 {0, 1}n. The k-set structure [15] presents a k-sparse vector reconstruction technique for the general case when f 2 {−m, . . . ,m}n. We reproduce their theorem since we will refer to it later.
Download full report
http://googleurl?sa=t&source=web&cd=1&ve...Fisc08.pdf&ei=I7VEToyXK4HVrQea_qztAw&usg=AFQjCNH5gT4FnolID7KRu33yeg_e2PfZig&sig2=Q-Hvr532MyhHG-74Ps1a-Q