12-08-2011, 10:32 AM
Abstract.
Space-economical estimation of the p-th frequency moments, defined as F_p = Σ_{i=1}^n |f_i|^p, for p > 0, is of interest in estimating all-pairs distances in a large data matrix [14], in machine learning, and in data stream computation. Random sketches, formed by the inner product of the frequency vector f_1, ..., f_n with a suitably chosen random vector, were pioneered by Alon, Matias and Szegedy [1], and have since played a central role in estimating F_p and in data stream computation in general. The concept of p-stable sketches, formed by the inner product of the frequency vector with a random vector whose components are drawn from a p-stable distribution, was proposed by Indyk [11] for estimating F_p, for 0 < p < 2, and has been further studied by Li [13]. In this paper, we consider the problem of estimating F_p, for 0 < p < 2. A disadvantage of the stable-sketch technique and its variants is that they require O(1/ε²) inner products of the frequency vector with dense vectors of stable (or nearly stable [14, 13]) random variables to be maintained, which makes each stream update quite time-consuming. We present algorithms for estimating F_p, for 0 < p < 2, that do not require stable sketches or their approximations. Our technique is elementary in nature, in that it uses simple randomization in conjunction with well-known summary structures for data streams, such as the COUNT-MIN sketch [7] and the COUNTSKETCH structure [5]. Our algorithms require space Õ(1/ε^(2+p)) to estimate F_p to within 1 ± ε factors, and expected time O(log F_1 log(1/ε)) to process each stream update. Thus, our technique trades an O(1/ε^p) factor in space for much more efficient processing of stream updates. We also present a stand-alone iterative estimator for F_1.
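For context on the baseline the paper improves upon, here is a minimal sketch (not the paper's algorithm) of Indyk-style stable sketches for the case p = 1, where the stable distribution is the standard Cauchy and F_1 = Σ|f_i| can be read off as the median of the absolute sketch values, since median(|Cauchy|) = 1. The function names and the sketch count k are illustrative assumptions; note how each estimate needs a dense inner product per sketch, which is exactly the per-update cost the paper's technique avoids.

```python
import math
import random

def cauchy():
    # Standard Cauchy (1-stable) variate via inverse transform: tan(pi*(U - 1/2))
    return math.tan(math.pi * (random.random() - 0.5))

def f1_estimate(freq, k=800, seed=0):
    """Estimate F_1 = sum |f_i| with k Cauchy sketches.

    Each sketch y_j = <f, c_j>, with c_j a dense vector of Cauchy variates.
    By 1-stability, |y_j| is distributed as F_1 * |Cauchy|, and since the
    median of |Cauchy| is 1, the median of |y_1|, ..., |y_k| estimates F_1.
    """
    random.seed(seed)
    sketches = []
    for _ in range(k):
        # One dense inner product per sketch -- this is the O(1/eps^2)
        # per-update cost the paper trades space to avoid.
        y = sum(f * cauchy() for f in freq)
        sketches.append(abs(y))
    sketches.sort()
    return sketches[k // 2]
```

For example, on the frequency vector [3, -1, 4, 1, 5] with F_1 = 14, the median estimate concentrates near 14 as k grows; the accuracy-versus-space trade is governed by k ≈ 1/ε².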
1 Introduction
Recently, there has been an emergence of monitoring applications in diverse areas including network traffic monitoring, network topology monitoring, sensor networks, financial market monitoring, and web-log monitoring. In these applications, data is generated rapidly and continuously, and must be analyzed efficiently, in real-time and in a single-pass over the data to identify large trends, anomalies, user-defined exception conditions, and so on. In many of these applications, it is often required to continuously track the “big picture”, or an aggregate view of the data, as opposed to a detailed view of the data. In such scenarios, efficient approximate computation is often acceptable.
Download full report
http://googleurl?sa=t&source=web&cd=1&ve...ndom04.pdf&ei=p7NETpSQBIXnrAeqs4GKCw&usg=AFQjCNF4bINQaGweVG9AEGK7HHUiQbdqaw&sig2=-xOxrel1DMT2LHMrH6_KOw