DATA MINING AND WAREHOUSING
1. INTRODUCTION:
A data warehouse is typically a relational database management system designed specifically to meet the needs of decision support rather than transaction processing systems. Data warehousing is a powerful technique that makes it possible to extract archived operational data and overcome inconsistencies between different legacy data formats. Data warehouses contain consolidated data from many sources, augmented with summary information and covering a long time period. Sizes ranging from several gigabytes to terabytes are common. Data warehousing technology comprises a set of concepts and tools that support the knowledge worker (executive, manager and analyst) with information for decision making. Thus, data warehousing is the process of extracting and transforming operational data into informational data and loading it into a central data store, or warehouse.
Data Mining, or Knowledge Discovery in Databases (KDD), is the nontrivial extraction of implicit, previously unknown, and useful information from data. Data mining can be defined as "a decision support process in which we search for patterns of information in data". Data mining uses sophisticated statistical analysis and modeling techniques to find patterns and relationships hidden in organizational databases. Once found, the information needs to be presented in a suitable form, such as graphs or reports. Data mining includes a number of different technical approaches for extracting information, such as clustering, data summarization, learning classification rules, finding dependency networks, analysing changes, and detecting anomalies. In short, data mining is concerned with the analysis of data and the use of software techniques for finding patterns and regularities in sets of data.
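As a small illustration of one of the approaches listed above (detecting anomalies), here is a minimal sketch in Python. It assumes a simple z-score rule with a threshold of 2.0; the function name and the sales figures are hypothetical and not from the report, and real data mining tools use far more sophisticated models.

```python
from statistics import mean, pstdev


def find_anomalies(values, threshold=2.0):
    """Return the values whose z-score magnitude exceeds the threshold.

    The z-score rule and the default threshold are assumptions made
    for this illustration only.
    """
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All values are identical, so nothing stands out.
        return []
    return [v for v in values if abs(v - mu) / sigma > threshold]


# Hypothetical daily sales figures with one unusual day.
daily_sales = [100, 102, 98, 101, 99, 250]
anomalies = find_anomalies(daily_sales)
print(anomalies)
```

The point is only that a "pattern" here is a statistical regularity, and an anomaly is a record that departs from it.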
2. DATA WAREHOUSING (DWH):
The fundamental reason for building a data warehouse is to improve the quality of information in the organization. Data coming from internal and external sources, in forms ranging from traditional structured data to unstructured data such as text files or multimedia, is cleaned and integrated into a single repository. Data warehousing is needed because information systems fall into two distinct kinds: operational and informational. Operational systems support the day-to-day conduct of the business and are optimized for fast response to predefined transactions, with a focus on update transactions. Operational data is a current, real-time representation of the state of the business. In contrast, informational systems are used to manage and control the business. They support the analysis of data for decision making about how the enterprise will operate now and in the future. A data warehouse can be normalized or denormalized. It can be a relational database, multidimensional database, flat file, hierarchical database, object database, etc. Data warehouses often focus on a specific activity or entity.
2.1 Characteristics of a Data warehouse:
There are generally five characteristics that describe a data warehouse:
1) Subject-oriented: Data are organized according to subject instead of application, e.g. an insurance company using a data warehouse would organize its data by customer, premium, and claim, instead of by different products (auto, life, etc.). The data organized by subject contain only the information necessary for decision support processing.
2) Integrated: When data resides in many separate applications in the operational environment, the encoding of data is often inconsistent. For instance, gender might be coded as "m" and "f" in one application and as 0 and 1 in another. When data are moved from the operational environment into the data warehouse, they assume a consistent coding convention, e.g. gender data is transformed to "m" and "f".
3) Time-variant: The data warehouse contains a place for storing data that are 5 to 10 years old, or older, to be used for comparisons, trends, and forecasting. These data are not updated.
4) Non-volatile: Data are not updated or changed in any way once they enter the data warehouse, but are only loaded and accessed. Modifications of the warehouse data take place only when modifications of the source data are propagated into the warehouse.
5) Derived data: A data warehouse usually contains additional data that are not explicitly stored in the operational sources but are derived from operational data through some process; these are called derived data. For example, operational sales data could be stored at several aggregation levels (weekly, monthly, quarterly sales) in the warehouse.
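Two of the characteristics above, integration and derived data, can be sketched in a few lines of Python. This is only an illustration: the code table (in particular which of 0 and 1 stands for which gender), the function names and the sample sales rows are all assumptions, not part of the report.

```python
from collections import defaultdict
from datetime import date

# Assumed mapping from source-specific codes to the warehouse convention.
# Which number means which gender is a guess made for this example.
GENDER_CODES = {"m": "m", "f": "f", 0: "m", 1: "f"}


def integrate_gender(raw_code):
    """Integrated: map a source-specific code onto the warehouse coding."""
    try:
        return GENDER_CODES[raw_code]
    except KeyError:
        raise ValueError(f"unknown gender code: {raw_code!r}")


def monthly_sales(rows):
    """Derived data: total (date, amount) sales rows per (year, month)."""
    totals = defaultdict(float)
    for day, amount in rows:
        totals[(day.year, day.month)] += amount
    return dict(totals)


# Hypothetical operational records.
codes = [integrate_gender(c) for c in ["m", 0, 1, "f"]]
sales = [
    (date(2010, 1, 5), 120.0),
    (date(2010, 1, 20), 80.0),
    (date(2010, 2, 3), 50.0),
]
derived = monthly_sales(sales)
print(codes)
print(derived)
```

The monthly totals do not exist in the operational source; they are computed once on the way into the warehouse so that later queries do not have to recompute them.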
2.2 Data warehouse systems:
A data warehouse system (DWS) comprises the data warehouse and all components used for building, accessing and maintaining the DWH, as shown in Figure 1. The center of a data warehouse system is the data warehouse itself. The data acquisition component includes all programs, applications and legacy system interfaces that are responsible for extracting data from operational sources, preparing it and loading it into the warehouse. The access component includes all the different applications that make use of the information stored in the warehouse. The typical components of a DWS are as follows:
1) Pre-Data Warehouse
2) Data Acquisition
3) Data Repositories
4) Front End Analytics
1) Pre-Data Warehouse: The pre-Data Warehouse zone provides the data for data warehousing. OLTP databases are where operational data are stored. OLTP systems are designed for transaction speed and accuracy. An organization's daily operations access and modify the operational databases. Data from these operational databases and any other external data sources are extracted using interfaces such as JDBC. The Metadata Repository keeps track of the data currently stored in the DWH. Metadata ensures the accuracy of data entering the DWH, and that the data has the right format and relevance. Metadata is "data about data, or data describing the meaning of data".
2) Data Acquisition: Data acquisition is achieved in the following five steps:
a) Extract: Data is extracted from operational databases and external sources using interfaces such as JDBC.
b) Clean: Data is cleaned to minimize errors, fill in missing information, and remove low-level transaction information, which slows down query times.
c) Transform: Data is enriched and transformed to correct values and to reconcile differences between multiple sources that arise from homonyms, synonyms or different units of measurement.
d) Load: The cleaned and transformed data is finally loaded into the warehouse. Additional preprocessing, such as sorting and generation of summary information, is carried out at this stage. Data is partitioned and indexes are built for efficiency. Due to the large volume of data, loading is a slow process.
e) Refresh: Data in the data warehouse is periodically refreshed to reflect updates to the data sources.
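The five acquisition steps above can be sketched end to end in Python. This is a minimal illustration under several assumptions: in-memory SQLite databases (via the standard sqlite3 module) stand in for the operational source and the warehouse instead of a JDBC connection, the table and column names are hypothetical, and the refresh simply reruns the whole pipeline rather than propagating only the changes.

```python
import sqlite3


def extract(source):
    """Extract: read raw rows from the (hypothetical) operational table."""
    return source.execute(
        "SELECT id, customer, amount, unit FROM orders").fetchall()


def clean(rows):
    """Clean: drop rows with a missing amount and strip stray whitespace."""
    return [(i, c.strip(), a, u) for i, c, a, u in rows if a is not None]


def transform(rows):
    """Transform: reconcile units so every amount is stored in dollars."""
    return [(i, c, a / 100.0 if u == "cents" else float(a))
            for i, c, a, u in rows]


def load(warehouse, rows):
    """Load: store the rows, build an index and regenerate a summary table."""
    warehouse.execute(
        "CREATE TABLE IF NOT EXISTS fact_orders "
        "(id INTEGER PRIMARY KEY, customer TEXT, amount REAL)")
    warehouse.executemany(
        "INSERT OR REPLACE INTO fact_orders VALUES (?, ?, ?)", rows)
    warehouse.execute(
        "CREATE INDEX IF NOT EXISTS idx_customer ON fact_orders(customer)")
    warehouse.execute("DROP TABLE IF EXISTS customer_totals")
    warehouse.execute(
        "CREATE TABLE customer_totals AS "
        "SELECT customer, SUM(amount) AS total "
        "FROM fact_orders GROUP BY customer")
    warehouse.commit()


def refresh(source, warehouse):
    """Refresh: rerun the pipeline so the warehouse reflects the source."""
    load(warehouse, transform(clean(extract(source))))


def customer_totals(warehouse):
    """Read the summary table back as a {customer: total} dictionary."""
    return dict(warehouse.execute(
        "SELECT customer, total FROM customer_totals").fetchall())


# Hypothetical operational data, including one row with a missing amount.
source = sqlite3.connect(":memory:")
source.execute(
    "CREATE TABLE orders (id INTEGER, customer TEXT, amount REAL, unit TEXT)")
source.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [(1, " alice ", 1250, "cents"),
     (2, "bob", 20.0, "dollars"),
     (3, "carol", None, "dollars")])

warehouse = sqlite3.connect(":memory:")
refresh(source, warehouse)
first_totals = customer_totals(warehouse)

# A later update to the source shows up after the next periodic refresh.
source.execute("INSERT INTO orders VALUES (4, 'bob', 500, 'cents')")
refresh(source, warehouse)
second_totals = customer_totals(warehouse)
print(first_totals, second_totals)
```

A production warehouse would normally refresh incrementally, since re-extracting everything is exactly the slow loading the Load step warns about.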
3) Data Repositories: The Data Warehouse repository is the database that stores active data of business value for an organization. There are two variants of the Data Warehouse: Data Marts and Operational Data Stores (ODS). Data Marts are smaller Data Warehouses built at a departmental rather than a company-wide level. Instead of running ad hoc queries against a huge data warehouse, data marts allow the efficient execution of predicted queries over a significantly smaller database. A Data Warehouse collects data and is the repository for historical data, so it is not always efficient for providing up-to-date analysis. For that purpose an ODS is used: it holds recent data before migration to the Data Warehouse.
4) Front End Analytics: The front-end analytics portion of the Data Warehouse is what different users use to interact with the data stored in the repositories. Data Mining is the discovery of useful patterns in data and is used for predictive analysis and classification. OLAP (Online Analytical Processing) is used to analyze historical data and slice the business information required. Reporting tools are used to provide reports on the data. Data are displayed to show relevance to the business and to keep track of key performance indicators. Data visualization tools are used to display data from the data repository. Data visualization is combined with Data Mining and OLAP tools and shows relevance and patterns.
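To make the OLAP idea of "slicing" historical data more concrete, here is a minimal sketch in plain Python. It assumes a tiny fact table with three dimensions (region, product, year) and one measure (sales); the rows and the function names are hypothetical, and a real OLAP tool would run such operations over a multidimensional store rather than a list.

```python
from collections import defaultdict

# Hypothetical fact rows: (region, product, year, sales).
FACTS = [
    ("north", "auto", 2009, 100),
    ("north", "life", 2009, 40),
    ("south", "auto", 2009, 70),
    ("north", "auto", 2010, 120),
    ("south", "life", 2010, 60),
]
DIMENSIONS = ("region", "product", "year")


def slice_cube(facts, dimension, value):
    """Slice: keep only the rows where one dimension has a fixed value."""
    idx = DIMENSIONS.index(dimension)
    return [row for row in facts if row[idx] == value]


def roll_up(facts, dimension):
    """Roll up: total the sales measure for each value of one dimension."""
    idx = DIMENSIONS.index(dimension)
    totals = defaultdict(int)
    for row in facts:
        totals[row[idx]] += row[-1]
    return dict(totals)


sales_2009 = slice_cube(FACTS, "year", 2009)
by_region_2009 = roll_up(sales_2009, "region")
by_year = roll_up(FACTS, "year")
print(by_region_2009)
print(by_year)
```

A data mart for one department could be seen as a stored slice of this kind, which is why predicted queries against it run over a much smaller set of rows.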
2.3 Stages in Implementation: A DW implementation requires the integration of many products. The steps of implementation are as follows:
Step1: Collect and analyze the business requirements.
Step2: Create a data model and physical design for the DW.
Step3: Define the Data sources.
Step4: Choose the DBMS and software platform for DW.
Step5: Extract the data from the operational data sources, transform it, clean it and load it into the DW model or data mart.
Step6: Choose the database access and reporting tools.
Step7: Choose the database connectivity software.
Step8: Choose the data analysis and presentation software.
Step9: Keep refreshing the data warehouse periodically.

Messages In This Thread
DATA MINING AND WARE HOUSING - by project topics - 02-04-2010, 03:36 PM
RE: DATA MINING AND WARE HOUSING - by Sidewinder - 29-05-2010, 09:52 PM
RE: DATA MINING AND WARE HOUSING - by seminar class - 22-04-2011, 09:35 AM
