Simpler algorithm for estimating frequency moments of data streams
#1

Abstract
The problem of estimating the kth frequency moment Fk over a data stream by looking at the items exactly once as they arrive was posed in [1, 2]. A succession of algorithms have been proposed for this problem [1, 2, 6, 8, 7]. Recently, Indyk and Woodru® [11] have presented the ¯rst algorithm for estimating Fk, for k > 2, using space ~O (n1¡2=k), matching the space lower bound (up to poly-logarithmic factors) for this problem [1, 2, 3, 4, 13] (n is the number of distinct items occurring in the stream.) In this paper, we present a simpler 1-pass algorithm for estimating Fk.
1 Introduction
Data streaming systems have many natural applica- tions, such as database systems, network monitoring, sensor networks, RF-id data management, etc.. These applications are characterized by rapidly arriving vo- luminous data which makes it di±cult to store the data in its entirety for either online processing or post- processing. Therefore, there has been a substantial interest in the design of algorithms that process data streams using single-pass (or online) algorithms that re- quire sub-linear space. We view a data stream as a sequence of arrivals of the form (i; v), where, i is the identity of an item that is a member of the domain D = f0; 1; : : : ;N ¡ 1g, and v is the change in the frequency of the item. The value v may be either greater than or less than zero; v ¸ 1 signi¯es v insertions of the item i, and v • ¡1 signi¯es v deletions of i. The frequency of an item i is denoted by fi and is de¯ned as the sum of the changes to its frequency since the inception of the stream, that is, fi = P (i;v) appears in stream v. The kth frequency moment of the stream, denoted by Fk, is de¯ned as Fk = P i fk i . The problem is to design an algorithm, parameterized by accuracy parameter ² and con¯dence parameter ± that returns an estimate ^ Fk of Fk, for k > 2, in a single pass over the data stream, such that, Pr n j ^ Fk ¡ Fkj • ² ¢ Fk o ¸ 1 ¡ ±. Let n denote the number of items in the stream with positive frequencies and let m denote P i2D fi. The problem was introduced in [1, 2] who also present the ¯rst sub-linear space algorithm with space complexity ~O (n1¡1=k). (We say that f(n) is ~O (g(n)) if f(n) = O ³¡ 1 ² ¢O(1) (logm)O(1)(log n)O(1)g(n) ´ .) [6, 8] present algorithms with space complexity ~O (n1¡1=(k¡1)) and [7] presents an algorithm with space complexity ~O (n1¡2=(k+1)). A space lower bound of ­(n1¡2 k ) is shown for this problem in a series of contributions [1, 2, 3, 4] (see[13] for a closely related problem). Recently, Indyk and Woodru® [11] have presented the ¯rst algorithm for estimating Fk using space ~O(n1¡2=k), matching the space lower bound (up to poly-logarithmic factors) for this problem [4]. The space complexity of the algorithm of Indyk and Woodru® has high constants and poly-logarithmic factors. Speci¯cally, their 2-pass algorithm has space complexity of O( 1 ²12 n1¡2 k (log2 n)(log6m)). The 1-pass algorithm that is derived from this algorithm further multiplies the constant and poly-logarithmic factors. In this paper, we present a simpler algorithm for estimating Fk, for k > 2, whose space complexity is O( k2 ²2+4=k n1¡2 k (log2m)(logm + log n)). Broadly speak- ing, we use the seminal idea of Indyk and Woodru® [11] to classify items into groups, based on frequency. How- ever, Indyk and Woodru® de¯ne groups whose bound- aries are randomized; this technique contributes to the complexity of their algorithm. In our algorithm, the group boundaries are deterministic. Our analysis is sim- pler and uses the more traditional approach of directly calculating the expectation and the variance of the es- timator for Fk. Finally, our algorithm is naturally a one-pass algorithm. The remainder of the paper is organized as follows. In Section 2, we brie°y review the Countsketch algorithm[5] and the estimator for the residual second moment [9]. The data structure for the Fk estimator is presented in Section 3. The analysis of the estimator is presented in Section 4. Finally, we conclude in Section 6.


Download full report
http://googleurl?sa=t&source=web&cd=1&ve...soda06.pdf&ei=38BETuGmCcyurAfHvMzpAw&usg=AFQjCNFdzWexPT610Jv8sFQnZx4CmA2waQ&sig2=hQOTJ1DV81DzCZ0DjavQGA
Reply

Important Note..!

If you are not satisfied with above reply ,..Please

ASK HERE

So that we will collect data for you and will made reply to the request....OR try below "QUICK REPLY" box to add a reply to this page
Popular Searches: krawtchouk moments matlab code, simpler algorithm matlab, radix2 matlab algorithm in frequency recursion, reconstruction 3d using krawtchouk moments, castle continuously anonymizing data streams ppt, calculations with hydraulic trolley jack moments, top 10 music moments of,

[-]
Quick Reply
Message
Type your reply to this message here.

Image Verification
Please enter the text contained within the image into the text box below it. This process is used to prevent automated spam bots.
Image Verification
(case insensitive)

Possibly Related Threads...
Thread Author Replies Views Last Post
  A Link-Based Cluster Ensemble Approach for Categorical Data Clustering 1 1,073 16-02-2017, 10:51 AM
Last Post: jaseela123d
  Exploiting the Functional and Taxonomic Structure of Genomic Data by Probabilistic To 1 758 14-02-2017, 04:15 PM
Last Post: jaseela123d
  An Efficient Algorithm for Mining Frequent Patterns full report project topics 3 4,737 01-10-2016, 10:02 AM
Last Post: Guest
  watermarking algorithm seminar class 3 2,668 27-04-2016, 11:17 AM
Last Post: dhanabhagya
  Remote Server Monitoring System For Corporate Data Centers smart paper boy 3 2,814 28-03-2016, 02:51 PM
Last Post: dhanabhagya
  Secured Data Hiding and Extractions Using BPCS project report helper 4 3,653 04-02-2016, 12:52 PM
Last Post: seminar report asees
  Data Hiding in Binary Images for Authentication & Annotation project topics 2 1,821 06-11-2015, 02:27 PM
Last Post: seminar report asees
  DATA LEAKAGE DETECTION project topics 16 13,058 31-07-2015, 02:59 PM
Last Post: seminar report asees
  DYNAMIC SEARCH ALGORITHM IN UNSTRUCTURED PEER-TO-PEER NETWORKS--PARALLEL AND DISTRIBU electronics seminars 9 7,335 14-07-2015, 02:25 PM
Last Post: seminar report asees
  Privacy Preservation in Data Mining sajidpk123 3 2,948 13-11-2014, 10:48 PM
Last Post: jaseela123d

Forum Jump: