DATA MINING AND WAREHOUSE
#1

Advances in data gathering, storage and distribution have created a need for computational tools and techniques to aid in data analysis, data mining and knowledge discovery in databases. Data mining is a rapidly growing area of research and application that builds on techniques and theories from many fields, including statistics, databases, pattern recognition and learning, and data visualization. It is the process of extracting previously unknown and potentially useful information from data in databases, and it relies on a variety of tools and algorithms. The concept is relatively new and has advanced rapidly in recent years. This report covers the evolution, foundations, tools, techniques and problems of data mining.
#2
DATA MINING

1.1 Introduction

The past two decades have seen a dramatic increase in the amount of information or data being stored in electronic format. This accumulation of data has taken place at an explosive rate. It has been estimated that the amount of information in the world doubles every 20 months, and the size and number of databases are increasing even faster. The growing use of electronic data gathering devices, such as point-of-sale or remote sensing devices, has contributed to this explosion of available data. Using these massive volumes of data effectively is becoming a major problem for all enterprises.

Data storage became easier as large amounts of computing power became available at low cost; that is, the falling cost of processing power and storage made data cheap. In addition to traditional statistical analysis of data, new machine learning methods for knowledge representation were introduced, based on logic programming and similar approaches. The new methods tend to be computationally intensive, hence the demand for more processing power.

It was recognized that information is at the heart of business operations and that decision-makers could use the stored data to gain valuable insight into the business. Database management systems gave access to the stored data, but this was only a small part of what could be gained from it. Traditional on-line transaction processing (OLTP) systems are good at putting data into databases quickly, safely and efficiently, but are not good at delivering meaningful analysis in return. Analyzing data can provide further knowledge about a business by going beyond the data explicitly stored. This is where data mining has obvious benefits for any enterprise.

1.2 What is Data Mining?

1.2.1 Definition

Researchers William J Frawley, Gregory Piatetsky-Shapiro and Christopher J Matheus have defined Data Mining as:

Data mining is the search for relationships and global patterns that exist in large databases but are `hidden' among the vast amount of data, such as a relationship between patient data and their medical diagnosis. These relationships represent valuable knowledge about the database and the objects in the database and, if the database is a faithful mirror, of the real world registered by the database.

The analogy with the mining process is described as:

Data mining refers to "using a variety of techniques to identify nuggets of information or decision-making knowledge in bodies of data, and extracting these in such a way that they can be put to use in the areas such as decision support, prediction, forecasting and estimation. The data is often voluminous, but as it stands of low value as no direct use can be made of it; it is the hidden information in the data that is useful", Clementine User Guide, a data mining toolkit.

1.2.2 Explanation

Data mining, the extraction of hidden predictive information from large databases, is a powerful new technology with great potential to help companies focus on the most important information in their data warehouses. Data mining tools predict future trends and behaviors, allowing businesses to make proactive, knowledge-driven decisions. The automated, prospective analysis offered by data mining moves beyond the analysis of past events provided by the retrospective tools typical of decision support systems. Data mining tools can answer business questions that traditionally were too time-consuming to resolve. They scour databases for hidden patterns, finding predictive information that experts may miss because it lies outside their expectations.

The data mining process consists of three basic stages: exploration, model building and pattern definition. Fig. 1.1 shows a simple data mining structure.

Fig. 1.1 Data Mining Structure (the only label recoverable from the figure is "Outcome Prediction")

Basically, data mining is concerned with the analysis of data and the use of software techniques for finding patterns and regularities in sets of data. It is the computer that is responsible for finding the patterns, by identifying the underlying rules and features in the data. The idea is that it is possible to strike gold in unexpected places, as the data mining software extracts patterns that were not previously discernible, or were so obvious that no one had noticed them before.

Data mining analysis tends to work from the data up, and the best techniques are those developed with an orientation towards large volumes of data, making use of as much of the collected data as possible to arrive at reliable conclusions and decisions. The analysis process starts with a set of data and uses a methodology to develop an optimal representation of the structure of the data, during which time knowledge is acquired. Once knowledge has been acquired, it can be extended to larger sets of data, working on the assumption that the larger data set has a structure similar to the sample data. Again, this is analogous to a mining operation where large amounts of low-grade materials are sifted through in order to find something of value.

1.2.3 Example

A home finance loan actually has an average life span of only 7 to 10 years, due to prepayment. Prepayment means that the loan is paid off early, rather than at the end of, say, 25 years. People prepay loans when they refinance or when they sell their home. The financial return that a home-finance institution derives from a loan depends on its life span. Therefore, financial institutions need to be able to predict the life spans of their loans. Rule discovery techniques are used to accurately predict the aggregate number of loan prepayments in a given quarter (or in a year), as a function of prevailing interest rates, borrower characteristics, and account data. This information can be used to fine-tune loan parameters such as interest rates, points and fees, in order to maximize profits.
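The loan example can be illustrated with a very small sketch of rule discovery. It is not the method any particular institution uses: it learns a single threshold rule from an invented loan history and then counts how many loans in a new portfolio the rule flags for prepayment. The data, the `rate_drop` feature (how far prevailing rates have fallen below the loan's own rate), and the function names are all assumptions made for illustration. A real system would also use borrower characteristics and account data.

```python
# Hypothetical loan history, invented for illustration only.
# Each entry is (rate_drop, prepaid): rate_drop is how many percentage points
# the prevailing rate fell below the loan's own rate during the quarter.
HISTORY = [
    (0.0, False), (0.2, False), (0.4, False), (0.5, False), (0.8, False),
    (0.9, False), (1.0, True), (1.2, True), (1.5, True), (2.0, True),
]


def discover_rule(history):
    """Find the threshold t for which the rule
    'rate_drop >= t implies prepayment' makes the fewest errors on history."""
    candidates = sorted({drop for drop, _ in history})

    def errors(threshold):
        # Count loans where the rule's prediction disagrees with what happened.
        return sum((drop >= threshold) != prepaid for drop, prepaid in history)

    return min(candidates, key=errors)


def predict_prepayments(threshold, rate_drops):
    """Aggregate prediction: number of loans the rule expects to be prepaid."""
    return sum(drop >= threshold for drop in rate_drops)


if __name__ == "__main__":
    threshold = discover_rule(HISTORY)
    next_quarter = [0.3, 1.1, 1.0, 0.7, 2.5]  # hypothetical portfolio
    print("rule: prepay if rate_drop >=", threshold)
    print("expected prepayments:", predict_prepayments(threshold, next_quarter))
```

A single-threshold rule is about the simplest form a discovered rule can take. It is used here only because it keeps the link between the rule and the aggregate count easy to follow.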

1.3 Knowledge Discovery in Databases (KDD)

1.3.1 KDD and Data Mining

Knowledge Discovery in Databases (KDD) was formalized in 1989 as a broad, high-level term for the overall pursuit of knowledge from data. The term data mining was then coined for the high-level application techniques used to present and analyze data for decision-makers.

Data mining is only one of the many steps involved in knowledge discovery in databases. The KDD process tends to be highly iterative and interactive. Data mining analysis tends to work up from the data, and the best techniques are developed with an orientation towards large volumes of data, making use of as much data as possible to arrive at reliable conclusions and decisions. The analysis process starts with a set of data and uses a methodology to develop an optimal representation of the structure of the data, during which knowledge is acquired. Once knowledge is acquired, it can be extended to larger sets of data on the assumption that the larger data set has a structure similar to the sample data set.

Fayyad distinguishes between KDD and data mining by giving the following definitions:

Knowledge discovery in databases is the process of identifying a valid, potentially useful and ultimately understandable structure in data.

Data mining is a step in the KDD process concerned with the algorithmic means by which patterns or structures are enumerated from the data under acceptable computational efficiency limits.

The structures that are the outcome of the data mining process must meet certain conditions so that these can be considered as knowledge. These conditions are: validity, understandability, utility, novelty and interestingness.

1.3.2 Stages of KDD

The stages of KDD, starting with the raw data and finishing with the extracted knowledge, are given below.



Selection: This stage is concerned with selecting or segmenting the data that are relevant to some criteria. For example, for credit card customer profiling, we extract the types of transactions for each type of customer, and we may not be interested in the details of the shop where each transaction takes place.

Preprocessing: Preprocessing is the data cleaning stage, where unnecessary information is removed. For example, it is unnecessary to note the sex of a patient when studying pregnancy. This stage also reconfigures the data into a consistent format, since data from different sources may be formatted inconsistently.

Transformation: The data is not merely transferred across, but transformed in order to be suitable for the task of data mining. In this stage, the data is made usable and navigable.

Data Mining: This stage is concerned with the extraction of patterns from the data.

Interpretation and Evaluation: The patterns obtained in the data mining stage are converted into knowledge, which in turn, is used to support decision-making.
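The five stages above can be sketched as a chain of small functions, using the credit card profiling example from the selection stage. This is only a minimal illustration under stated assumptions: the records, field names, function names and the `min_confidence` cut-off are all invented here, and a simple frequency count stands in for a real data mining algorithm.

```python
from collections import Counter

# Hypothetical raw records, invented for illustration only.
RAW_RECORDS = [
    {"customer": "student", "transaction": " Books", "shop": "S1"},
    {"customer": "student", "transaction": "books", "shop": "S2"},
    {"customer": "Student", "transaction": "travel", "shop": "S3"},
    {"customer": "retiree", "transaction": "Travel ", "shop": "S1"},
    {"customer": "retiree", "transaction": "travel", "shop": "S4"},
    {"customer": "retiree", "transaction": None, "shop": "S2"},
]


def select(records):
    """Selection: keep only the fields relevant to customer profiling."""
    return [{"customer": r["customer"], "transaction": r["transaction"]}
            for r in records]


def preprocess(records):
    """Preprocessing: drop incomplete records and enforce a consistent format."""
    cleaned = []
    for r in records:
        if r["customer"] is None or r["transaction"] is None:
            continue
        cleaned.append({key: value.strip().lower() for key, value in r.items()})
    return cleaned


def transform(records):
    """Transformation: reshape records into (customer, transaction) pairs."""
    return [(r["customer"], r["transaction"]) for r in records]


def mine(pairs):
    """Data mining: for each customer type, the share of each transaction type."""
    pair_counts = Counter(pairs)
    customer_counts = Counter(customer for customer, _ in pairs)
    return {(c, t): n / customer_counts[c] for (c, t), n in pair_counts.items()}


def interpret(patterns, min_confidence=0.6):
    """Interpretation and evaluation: keep only patterns strong enough to act on."""
    return sorted(f"{c} -> {t}" for (c, t), conf in patterns.items()
                  if conf >= min_confidence)


def run_kdd(records):
    """Run the five stages in order, from raw data to extracted knowledge."""
    return interpret(mine(transform(preprocess(select(records)))))


if __name__ == "__main__":
    print(run_kdd(RAW_RECORDS))
```

Keeping each stage as a separate function reflects the iterative nature of KDD: any one stage can be revised and rerun without touching the others.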
#3



For information on the topic "data warehouse computer science" (full report, PPT) and related topics, refer to the page links below.

http://studentbank.in/report-dbms-data-m...arehousing

http://studentbank.in/report-integration...ng-systems

http://studentbank.in/report-a-survey-of...on-and-lib