Automotive Stream Based Data Management
An embedded system is a special-purpose system in which the computer is completely encapsulated by the device it controls. Unlike a general-purpose computer, such as a personal computer, an embedded system performs one or a few pre-defined tasks, usually with very specific requirements. Since the system is dedicated to specific tasks, design engineers can optimize it, reducing the size and cost of the product.

Stream-based Data Management
Characteristics of Data Streams

• Data streams
  • Data streams: continuous, ordered, changing, fast, huge in volume
  • Traditional DBMS: data stored in finite, persistent data sets
• Characteristics
  • Huge volumes of continuous data, possibly infinite
  • Fast-changing data that requires a fast, real-time response
  • The data stream model fits many of today's data processing needs
  • Random access is expensive, so algorithms make a single linear scan (each element is seen only once)
  • Only a summary of the data seen so far is stored
  • Most stream data is low-level or multi-dimensional in nature and needs multi-level, multi-dimensional processing
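The single-scan, summary-only idea above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original report: the class name `RunningSummary` and the sample readings are hypothetical, and a real system would choose its summary to fit the queries it must answer.

```python
import math
from dataclasses import dataclass


@dataclass
class RunningSummary:
    """Constant-size summary of a numeric stream, updated one element at a time."""
    count: int = 0
    total: float = 0.0
    minimum: float = math.inf
    maximum: float = -math.inf

    def update(self, value: float) -> None:
        # Each element is looked at exactly once and is not stored.
        self.count += 1
        self.total += value
        self.minimum = min(self.minimum, value)
        self.maximum = max(self.maximum, value)

    @property
    def mean(self) -> float:
        return self.total / self.count if self.count else 0.0


def summarize(stream):
    """Make one linear pass over the stream and return its summary."""
    summary = RunningSummary()
    for value in stream:
        summary.update(value)
    return summary


# Hypothetical readings; the generator stands in for an unbounded stream.
readings = (float(v) for v in [3, 7, 1, 9, 5])
s = summarize(readings)
print(s.count, s.minimum, s.maximum, s.mean)  # 5 1.0 9.0 5.0
```

The memory used does not grow with the length of the stream, which is the point of keeping a summary instead of the data.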
Stream Data Applications
• Telecommunication calling records
• Business: credit card transaction flows
• Network monitoring and traffic engineering
• Financial markets: stock exchanges
• Engineering and industrial processes: power supply and manufacturing
• Sensors, monitoring, and surveillance: video streams
• Security monitoring
• Web logs and Web page click streams
• Massive data sets (stored, but too expensive to access randomly)
Data Streams vs. Data Sets
Data Sets: Data Streams:
• Updates infrequent
Using Traditional Database
Data Streams Paradigm
DBMS versus DSMS
• DBMS: persistent relations
• DSMS: transient streams (and persistent relations)
Challenges of Stream Data Processing
• Multiple, continuous, rapid, time-varying, ordered streams
• Main-memory computations
• Queries are often continuous
  • Evaluated continuously as stream data arrives
  • Answer updated over time
• Queries are often complex
  • Beyond element-at-a-time processing
  • Beyond relational queries (scientific, data mining, OLAP)
• Multi-level/multi-dimensional processing and data mining
  • Most stream data is low-level or multi-dimensional in nature
Processing Stream Queries
• Query types
  • One-time query vs. continuous query (evaluated continuously as the stream continues to arrive)
  • Predefined query vs. ad hoc query (issued online)
• Unbounded memory requirements
  • For real-time response, main-memory algorithms should be used
  • The memory requirement is unbounded if a query must join against tuples that arrive in the future
• Approximate query answering
  • With bounded memory, it is not always possible to produce exact answers
  • High-quality approximate answers are desired
  • Data reduction and synopsis construction methods: sketches, random sampling, histograms, wavelets, etc.
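Of the data reduction methods listed, random sampling is the simplest to show. The sketch below uses reservoir sampling (the classic "Algorithm R"), a standard technique chosen here for illustration and not named in the report itself. The function name `reservoir_sample` and its parameters are hypothetical.

```python
import random


def reservoir_sample(stream, k, rng=None):
    """Keep a uniform random sample of k items from a stream of unknown length.

    Memory use is O(k) no matter how many items the stream delivers.
    """
    rng = rng or random.Random()
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            # The first k items fill the reservoir.
            reservoir.append(item)
        else:
            # Item i replaces a random slot with probability k / (i + 1).
            j = rng.randint(0, i)  # both ends inclusive
            if j < k:
                reservoir[j] = item
    return reservoir


# A seeded generator keeps the example repeatable.
sample = reservoir_sample(range(1_000_000), k=10, rng=random.Random(42))
print(len(sample))  # 10
```

An aggregate computed on the sample is an approximate answer with bounded memory, which is the trade-off the list above describes.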
Methods for Approximate Query Answering
• Sliding windows
  • Queries are evaluated only over a sliding window of recent stream data
  • An approximation, but often the more desirable answer in applications
• Batched processing, sampling, and synopses
  • Batched processing when updates are fast but computing is slow: compute periodically, so answers are not very timely
  • Sampling when updates are slow but computing is fast: compute on sample data, which works poorly for joins, etc.
  • Synopsis data structures: maintain a small synopsis or sketch of the data, good for querying historical data
• Blocking operators, e.g., sorting, avg, min, etc.
  • An operator is blocking if it cannot produce its first output until it has seen the entire input
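As a small illustration of the sliding-window approach, the following Python sketch keeps an average over the most recent elements only. The class name `SlidingWindowAverage` and the count-based window are assumptions made for the example; time-based windows are equally common.

```python
from collections import deque


class SlidingWindowAverage:
    """Average over the last `size` elements of a stream (count-based window)."""

    def __init__(self, size: int):
        if size <= 0:
            raise ValueError("window size must be positive")
        self.size = size
        self.window = deque()
        self.total = 0.0

    def update(self, value: float) -> float:
        """Add one element and return the average of the current window."""
        self.window.append(value)
        self.total += value
        if len(self.window) > self.size:
            # The oldest element leaves the window.
            self.total -= self.window.popleft()
        return self.total / len(self.window)


avg = SlidingWindowAverage(size=3)
results = [avg.update(v) for v in [10, 20, 30, 40, 50]]
print(results)  # [10.0, 15.0, 20.0, 30.0, 40.0]
```

An average over the whole stream is a blocking operator in the sense above, since no final answer exists until the input ends. Restricting it to a window lets it emit an answer after every element.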
Projects on DSMS (Data Stream Management System)
Research projects and system prototypes
• STREAM (Stanford): a general-purpose DSMS
• Cougar (Cornell): sensors
• Aurora (Brown/MIT): sensor monitoring, dataflow
• Hancock (AT&T): telecom streams
• Niagara (OGI/Wisconsin): Internet XML databases
• OpenCQ (Georgia Tech): triggers, incremental view maintenance
• Tapestry (Xerox): pub/sub content-based filtering
• Telegraph (Berkeley): adaptive engine for sensors
• Tradebot (tradebot.com): stock tickers and streams
• Tribeca (Bellcore): network monitoring
• Streaminer (UIUC): new project for stream data mining
Stream Data Mining vs. Stream Querying
• Stream mining: a more challenging task
  • It shares most of the difficulties of stream querying
  • Patterns are hidden and more general than query answers
  • It may require exploratory analysis, not necessarily continuous queries
• Stream data mining tasks
  • Multi-dimensional online analysis of streams
  • Mining outliers and unusual patterns in stream data
  • Clustering data streams
  • Classification of stream data
Challenges for Mining Unusual Patterns in Data Streams
• Most stream data is low-level or multi-dimensional in nature and needs multi-level/multi-dimensional (ML/MD) processing
• Analysis requirements
  • Multi-dimensional trends and unusual patterns
  • Capturing important changes at multiple dimensions/levels
  • Fast, real-time detection and response
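To make "fast, real-time detection" concrete, here is a deliberately simple one-dimensional sketch in Python. It flags a value that lies far from the running mean, using Welford's online update for mean and variance. This is a stand-in technique chosen for illustration and not the multi-level, multi-dimensional analysis the report calls for. The class name, the threshold of 3 standard deviations, and the warm-up length are all assumptions.

```python
import math


class OnlineOutlierDetector:
    """Flags values far from the running mean, using O(1) memory."""

    def __init__(self, threshold: float = 3.0, warmup: int = 5):
        self.threshold = threshold  # allowed distance, in standard deviations
        self.warmup = warmup        # values to observe before flagging anything
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0               # sum of squared deviations from the mean

    def observe(self, x: float) -> bool:
        """Return True if x looks unusual compared with the values seen so far."""
        is_outlier = False
        if self.n >= self.warmup:
            std = math.sqrt(self.m2 / (self.n - 1))
            is_outlier = std > 0 and abs(x - self.mean) > self.threshold * std
        # Welford's update: fold x into the running statistics.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        return is_outlier


detector = OnlineOutlierDetector()
flags = [detector.observe(v) for v in [10, 11, 9, 10, 12, 10, 50, 11]]
print(flags)  # only the value 50 is flagged
```

Note that this sketch folds flagged values into the statistics too. Whether to do so is a design choice that depends on the application.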
Summary
• Stream data analysis: a rich and largely unexplored field
  • Current research focus in the database community: DSMS system architecture, continuous query processing, supporting mechanisms
• Stream data mining and stream OLAP analysis
  • Powerful tools for finding general and unusual patterns
  • Largely unexplored: current studies have only scratched the surface
• Many exciting issues remain for further study
  • A promising one: multi-level, multi-dimensional analysis and mining of stream data
What Is a Continuous Query?
A query that is issued once and then logically runs continuously.
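One way to picture a continuous query is as a filter that is registered once and then yields a result whenever a matching tuple arrives. The Python generator below is a minimal sketch of that idea. The function name `continuous_query`, the tuple layout, and the temperature readings are hypothetical.

```python
from typing import Callable, Iterable, Iterator


def continuous_query(
    stream: Iterable[dict], predicate: Callable[[dict], bool]
) -> Iterator[dict]:
    """Issued once; yields each matching tuple as soon as it arrives."""
    for tup in stream:
        if predicate(tup):
            yield tup


# Hypothetical sensor readings standing in for a live stream.
readings = [
    {"sensor": "a", "temp": 71},
    {"sensor": "b", "temp": 104},
    {"sensor": "a", "temp": 99},
    {"sensor": "b", "temp": 110},
]

# The query is set up once; results appear as the input is consumed.
alerts = list(continuous_query(iter(readings), lambda r: r["temp"] > 100))
print(alerts)
```

With a truly unbounded stream, a consumer would iterate over the generator instead of collecting it into a list.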
Special Challenges
Timely online answers, even for rapid data streams
Fast access to large portions of the data
Processing of multiple streams simultaneously
Making Things Concrete
• Database = two streams of mobile call records
  • Outgoing(call_ID, caller, time, event)
  • Incoming(call_ID, callee, time, event)
  • event is either start or end
• Query language = SQL
  • FROM clauses can refer to streams and/or relations
Query 1 (self-join)
Find all outgoing calls longer than 2 minutes
SELECT O1.call_ID, O1.caller
FROM Outgoing O1, Outgoing O2
WHERE (O2.time - O1.time > 2
  AND O1.call_ID = O2.call_ID
  AND O1.event = 'start'
  AND O2.event = 'end')
• Storing the full result requires unbounded storage
• The result can instead be provided as a data stream
• A call can be output once 2 minutes have passed since its start, without waiting for its end event
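The last observation can be sketched as a streaming evaluation of Query 1 in Python. The sketch assumes that tuples have the form (call_ID, caller, time, event), arrive in time order, and measure time in minutes. The function name `long_calls` is hypothetical. Like the observation itself, it reports a call as soon as it has been open for more than the threshold, on the assumption that its end event will eventually arrive.

```python
def long_calls(events, threshold=2):
    """Yield (call_ID, caller) for outgoing calls longer than `threshold`.

    Only calls that started within the last `threshold` time units are kept
    in memory, so state stays bounded even though the stream does not.
    """
    open_calls = {}  # call_ID -> (caller, start time), oldest start first
    for call_id, caller, time, event in events:
        # Any call still open after more than `threshold` already qualifies.
        while open_calls:
            oldest_id = next(iter(open_calls))
            oldest_caller, start = open_calls[oldest_id]
            if time - start <= threshold:
                break
            del open_calls[oldest_id]
            yield oldest_id, oldest_caller
        if event == "start":
            open_calls[call_id] = (caller, time)
        elif event == "end":
            # A call that ends within the threshold is dropped without output.
            open_calls.pop(call_id, None)


# Hypothetical call records.
events = [
    (1, "alice", 0, "start"),
    (2, "bob", 1, "start"),
    (2, "bob", 2, "end"),
    (3, "carol", 3, "start"),  # alice's call is reported here, before it ends
    (1, "alice", 5, "end"),
    (3, "carol", 5, "end"),    # exactly 2 minutes, so not reported
]
print(list(long_calls(events)))  # [(1, 'alice')]
```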
Query 2 (join)
Pair up callers and callees
SELECT O.caller, I.callee
FROM Outgoing O, Incoming I
WHERE O.call_ID = I.call_ID
• The result can still be provided as a data stream
• Requires unbounded temporary storage …
• … unless the streams are near-synchronized
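A minimal Python sketch of how near-synchronization bounds the join state follows. It is a symmetric hash join over one interleaved, time-ordered input, with several simplifying assumptions not taken from the report: each call contributes one tuple per stream (for example, only its start event), tuples are tagged with their source stream, and matching tuples arrive within `max_skew` time units of each other. The function name `pair_calls` and the tuple layout are hypothetical.

```python
def pair_calls(tuples, max_skew):
    """Yield (caller, callee) pairs for tuples that share a call_ID.

    `tuples` holds (source, call_ID, party, time), where source is "out"
    (party is the caller) or "in" (party is the callee).
    """
    waiting = {"out": {}, "in": {}}  # per stream: call_ID -> (party, time)
    for source, call_id, party, time in tuples:
        # Evict tuples too old to find a partner under the skew assumption.
        for table in waiting.values():
            while table:
                oldest = next(iter(table))
                if time - table[oldest][1] <= max_skew:
                    break
                del table[oldest]
        other = "in" if source == "out" else "out"
        match = waiting[other].pop(call_id, None)
        if match is None:
            waiting[source][call_id] = (party, time)  # wait for the partner
        elif source == "out":
            yield party, match[0]
        else:
            yield match[0], party


# Hypothetical interleaved records from both streams.
records = [
    ("out", 1, "alice", 0),
    ("in", 1, "bob", 1),
    ("out", 2, "carol", 2),
    ("in", 3, "dave", 3),
    ("out", 3, "erin", 4),
    ("in", 2, "frank", 10),  # arrives after carol's tuple was evicted
]
print(list(pair_calls(records, max_skew=5)))
```

The last record shows the cost of the bound: if the streams drift further apart than assumed, a pair is missed. Without eviction, unmatched tuples would have to be kept forever.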
Query 3 (group-by aggregation)
Total connection time for each caller
SELECT O1.caller, sum(O2.time - O1.time)
FROM Outgoing O1, Outgoing O2
WHERE (O1.call_ID = O2.call_ID
  AND O1.event = 'start'
  AND O2.event = 'end')
GROUP BY O1.caller
• The result cannot be provided as an (append-only) stream, because each caller's total keeps changing
Alternatives:
• Output a stream with updates
• Provide the current value on demand
• Keep the answer in memory
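The alternatives can be combined in one small sketch: the answer is kept in memory, every finished call produces an update, and the current total can be read on demand. The Python below is an illustration under assumptions. The class name `ConnectionTime` and its methods are hypothetical, and tuples are taken to be (call_ID, caller, time, event) in time order.

```python
from collections import defaultdict


class ConnectionTime:
    """Running total of connection time for each caller."""

    def __init__(self):
        self.start_times = {}           # call_ID -> start time of open calls
        self.totals = defaultdict(int)  # caller -> total time so far

    def process(self, call_id, caller, time, event):
        """Consume one tuple; return (caller, new_total) when a call ends."""
        if event == "start":
            self.start_times[call_id] = time
            return None
        if event == "end":
            start = self.start_times.pop(call_id, None)
            if start is not None:
                self.totals[caller] += time - start
                return caller, self.totals[caller]
        return None  # unmatched or unknown events are ignored in this sketch

    def current(self, caller):
        """Current value on demand, without waiting for the stream to end."""
        return self.totals.get(caller, 0)


# Hypothetical call records.
events = [
    (1, "alice", 0, "start"),
    (2, "bob", 1, "start"),
    (1, "alice", 4, "end"),
    (3, "alice", 5, "start"),
    (2, "bob", 6, "end"),
    (3, "alice", 7, "end"),
]

agg = ConnectionTime()
updates = []
for e in events:
    update = agg.process(*e)
    if update is not None:
        updates.append(update)

print(updates)               # [('alice', 4), ('bob', 5), ('alice', 6)]
print(agg.current("alice"))  # 6
```

The output is a stream of updates, not an append-only stream: the second entry for alice replaces the first.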
Conclusions
• Conventional DBMS technology is inadequate
• We need to reconsider all aspects of data management and processing in the presence of data streams