A hybrid technique for estimating frequency moments over data streams
#1

Abstract.
The problem of estimating the kth frequency moment Fk, for any
non-negative integral value of k, over a data stream by looking at the items exactly
once as they arrive, was considered in a seminal paper by Alon, Matias and
Szegedy [1, 2]. They present a sampling based algorithm to estimate Fk where,
k  2, using space ˜O(n1−1/k)). Coppersmith and Kumar [7] and [10], using
different methods, present algorithms for estimating Fk with space complexity
˜O
(n1−1/(k−1)). In this paper, we present an algorithm for estimating Fk with
space complexity ˜O(n1−2/(k+1)), for k > 2, thereby, improving the space complexity
compared to the algorithms in [1, 2, 7, 10] for k  4.
1 Introduction
Data streams are characterized by large volumes of data that arrive rapidly and continuously.
Due to the volume of the data, it is desirable to design algorithms that estimate
metrics over the data streams, with sub-linear space complexity. One such metric is
frequency moments, studied in a seminal paper by Alon, Matias and Szegedy [1, 2]. In
this paper, we study and present novel algorithms for this problem.
We view a stream as a sequence of arrivals of items l, where, l is the identity of
an item. The items are assumed to draw their identities from the domain [N]. The frequency
of an item with identity l, denoted by fl, is the number of occurrences of l since
the inception of the stream. Thus, each arrival of an item increases its frequency by
1. The kth frequency moment of the stream, denoted by Fk, is defined as
P
l fk
l , for
k > 2 and k integral. This problem was first studied in a seminal paper by Alon, Matias
and Szegedy [1, 2]. It is interesting for several reasons. Historically, it is one of the very
first problems studied in the data streaming model and helped to consolidate the area
of data streaming algorithms. Secondly, the techniques presented in [1, 2] have led to
new solution techniques for several practical problems on data streams, for example,
maintenance of histograms [12] and maintenance of the top-k frequent items [6].
1.1 Prior Work
The kth moment estimation algorithm, presented in [1, 2] (which we call the AMS algorithm)
is based on sampling and estimates Fk, for k  2, to within any specified
approximation parameter 0 <  < 1, and with confidence exceeding a userspecified
parameter  < 1. The space complexity of the AMS algorithm


Download full report
http://googleurl?sa=t&source=web&cd=1&ve...bridFk.pdf&ei=k8dETtnnHYHKrAfI3qHwAw&usg=AFQjCNHXqq-ePQmxocd3piP1Mtcx8in9fA&sig2=fcsVH2cbDst9pWdOjdziYA
Reply

Important Note..!

If you are not satisfied with above reply ,..Please

ASK HERE

So that we will collect data for you and will made reply to the request....OR try below "QUICK REPLY" box to add a reply to this page
Popular Searches: spire efficient data inference and compression over rfid streams ppt, data security technique, clustering data streams, spire efficient data inference and compression over rfid streams, krawtchouk moments matlab code, data rate frequency, hybrid technique,

[-]
Quick Reply
Message
Type your reply to this message here.

Image Verification
Please enter the text contained within the image into the text box below it. This process is used to prevent automated spam bots.
Image Verification
(case insensitive)

Possibly Related Threads...
Thread Author Replies Views Last Post
  A Link-Based Cluster Ensemble Approach for Categorical Data Clustering 1 1,086 16-02-2017, 10:51 AM
Last Post: jaseela123d
  Exploiting the Functional and Taxonomic Structure of Genomic Data by Probabilistic To 1 768 14-02-2017, 04:15 PM
Last Post: jaseela123d
  Remote Server Monitoring System For Corporate Data Centers smart paper boy 3 2,853 28-03-2016, 02:51 PM
Last Post: dhanabhagya
  Secured Data Hiding and Extractions Using BPCS project report helper 4 3,672 04-02-2016, 12:52 PM
Last Post: seminar report asees
  Data Hiding in Binary Images for Authentication & Annotation project topics 2 1,836 06-11-2015, 02:27 PM
Last Post: seminar report asees
  DATA LEAKAGE DETECTION project topics 16 13,118 31-07-2015, 02:59 PM
Last Post: seminar report asees
  Privacy Preservation in Data Mining sajidpk123 3 2,974 13-11-2014, 10:48 PM
Last Post: jaseela123d
  projects on data mining? shakir_ali 2 2,047 05-11-2014, 09:30 PM
Last Post: jaseela123d
  data mining full report project report tiger 25 171,259 07-10-2014, 09:10 PM
Last Post: ToPWA
  Data Security Using Honey Pot System computer science topics 5 6,702 11-09-2014, 07:45 PM
Last Post: erhhk

Forum Jump: