Distributed Set-Expression Cardinality Estimation

Abstract
We consider the problem of estimating set-expression cardinality in a distributed streaming environment where rapid update streams originating at remote sites are continually transmitted to a central processing system. At the core of our algorithmic solutions for answering set-expression cardinality queries are two novel techniques for lowering data communication costs without sacrificing answer precision. Our first technique exploits global knowledge of the distribution of certain frequently occurring stream elements to significantly reduce the transmission of element state information to the central site. Our second technical contribution involves a novel way of capturing the semantics of the input set expression in a boolean logic formula, and using models (of the formula) to determine whether an element state change at a remote site can affect the set expression result. Results of our experimental study with real-life as well as synthetic data sets indicate that our distributed set-expression cardinality estimation algorithms achieve substantial reductions in message traffic compared to naive approaches that provide the same accuracy guarantees.
1 Introduction
The widespread deployment of wireline and wireless networks linking together a broad range of devices has resulted in a new class of distributed data streaming applications. In these applications, rapid update streams originating at tens or hundreds of remote sites are continuously transmitted to a central processing system for online querying and analysis. Examples include monitoring of service provider network traffic statistics, telecommunication call detail records, Web usage logs, financial stock tickers, retail chain transactions, weather data, sensor data, and so on.
Proceedings of the 30th VLDB Conference, Toronto, Canada, 2004
An important consideration in the above-mentioned monitoring applications is the communication overhead imposed by the distributed query processing architecture on the underlying network. A naive approach in which every stream update is shipped to the central site for processing can lead to inordinate amounts of message traffic, and thus have a crippling effect on the communication infrastructure as well as the central processor. For instance, monitoring flow level information within AT&T’s IP backbone using Cisco’s NetFlow tool [1] is known to generate in excess of 500 GBytes of data per day [4]. Clearly, transmitting every flow record to the central network operations center of a large ISP can seriously strain its processing and network resources. As another example, consider wireless sensor networks (e.g., for environmental monitoring, inventory tracking, etc.), where sensors have a very limited battery life, and radio communication is much more expensive than processing in terms of power consumption. In order to ensure longer lifetimes for sensor nodes, it is critical to reduce the amount of data transmitted, even if that implies additional processing at the sensor nodes [13, 12, 10].
Fortunately, for many distributed stream-oriented applications, exact answers are not required and approximations with guarantees on the amount of error suffice. Thus, it is possible to trade answer accuracy for reduced data communication costs. For example, consider the problem of detecting distributed denial-of-service (DDoS) attacks by analyzing network flow information collected from an ISP’s border routers. In a typical DDoS attack scenario, hundreds of compromised “zombie” hosts flood a specific victim destination with large numbers of seemingly legitimate packets. Furthermore, in order to elude source identification, attackers typically forge, or “spoof”, the IP source address of each packet they send with a randomly-chosen address [11]. Consequently, a possible approach for detecting DDoS attacks is to look for sudden spikes in the number of distinct IP source addresses observed in the flows across the ISP’s border routers. Clearly, our DDoS monitoring application does not require IP source address counts to be tracked with complete precision. Approximate counts can be equally effective for the purpose of discerning DDoS activity as long as errors are small enough not to mask abrupt changes. Thus, depending on the accuracy requirements of the DDoS application, routers only need to transmit a subset of flow records to the central monitoring site.
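To make the accuracy-for-communication trade-off concrete, here is a minimal Python sketch of one generic way a router could forward only a subset of source addresses while the central site still estimates the distinct count. It uses plain hash-based sampling, which is an assumption made for illustration and not the algorithm developed in the paper. The names hash_address, router_filter, central_estimate and the sampling_rate parameter are all hypothetical.

```python
import hashlib

HASH_SPACE = 2 ** 32  # hash values are mapped into [0, 2^32)


def hash_address(addr: str) -> int:
    """Map an address string to a pseudo-uniform 32-bit integer."""
    digest = hashlib.sha256(addr.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def router_filter(flow_sources, sampling_rate: float) -> set:
    """Distinct source addresses a router would forward (illustrative only).

    Every router applies the same hash and threshold, so a given address is
    either forwarded by all routers that see it or by none of them.
    """
    if not 0.0 < sampling_rate <= 1.0:
        raise ValueError("sampling_rate must be in (0, 1]")
    threshold = int(sampling_rate * HASH_SPACE)
    return {a for a in flow_sources if hash_address(a) < threshold}


def central_estimate(per_router_samples, sampling_rate: float) -> float:
    """Estimate the number of distinct sources across all routers."""
    merged = set().union(*per_router_samples)  # duplicates across routers collapse
    return len(merged) / sampling_rate


if __name__ == "__main__":
    # Two hypothetical border routers observing overlapping source addresses.
    router_a = [f"10.0.{i // 256}.{i % 256}" for i in range(12000)]
    router_b = [f"10.0.{i // 256}.{i % 256}" for i in range(8000, 20000)]
    rate = 0.1
    samples = [router_filter(router_a, rate), router_filter(router_b, rate)]
    print("addresses forwarded:", sum(len(s) for s in samples))
    print("estimated distinct sources:", central_estimate(samples, rate))
```

Because all routers share one hash function, the union at the central site counts each sampled address once, and dividing by the sampling rate scales the sample back to an estimate of the full distinct count. A lower sampling_rate sends fewer addresses at the cost of a noisier estimate, which is the kind of trade-off the paragraph above describes.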

