05-05-2011, 12:40 PM
Abstract
With the growth of database applications, mining interesting information from huge databases has become a major concern, and a variety of mining algorithms have been proposed in recent years. As we know, the data processed in data mining may be obtained from many sources in which different data types may be used. However, no algorithm can be applied to all applications, due to the difficulty of fitting the data types of the algorithm, so the selection of an appropriate mining algorithm is based not only on the goal of the application but also on the data fittability. Therefore, transforming a non-fitting data type into the target one is also important work in data mining, but the work is often tedious or complex since a lot of data types exist in the real world. Merging the similar data types of a selected mining algorithm into a generalized data type seems to be a good approach to reducing the transformation complexity. In this work, the data types fittability problem for six kinds of widely used data mining techniques is discussed, and a data type generalization process including merging and transforming phases is proposed. In the merging phase, the original data types of the data sources to be mined are first merged into the generalized ones. The transforming phase is then used to convert the generalized data types into the target ones for the selected mining algorithm. Using the data type generalization process, the user can select an appropriate mining algorithm just for the goal of the application, without considering the data types.
1. Introduction
In recent years, the amount of various data has grown rapidly. Widely available, low-cost computer technology now makes it possible both to collect historical data and to institute on-line analysis for newly arriving data. Automated data generation and gathering leads to tremendous amounts of data stored in databases. Although we are filled with data, we lack knowledge. Data mining [4, 9, 16] is the automated discovery of non-trivial, previously unknown, and potentially useful knowledge embedded in databases. Different kinds of data mining methods and algorithms have been proposed [4, 9], each of which has its own [...] to choose an appropriate one by themselves. This is because the data provided cannot be directly used for data mining algorithms. Since most data mining algorithms can only be applied to some specific data types, the types of data stored in databases restrict the choice of data mining methods. If certain kinds of knowledge need to be obtained using some data mining algorithms, data types transformation should be done first, and this is what we call “the data types fittability problem” for data mining. For the time being, there is no tool that can help users to do this kind of data types transformation.
In this paper, we will survey and analyze the data types fittability problem for data mining algorithms, and then we propose a “data types generalization process” to solve the data types fittability problem for the attributes in relational databases.
The “data types generalization process”, including merging and transforming phases, is a procedure to transform the data types of attributes contained in relations (tables). In the merging phase, the original data types of the data sources to be mined are first merged into the generalized ones. The transforming phase is then used to convert the generalized data types into the target ones for the selected mining algorithm. Using the data type generalization process, the user can select an appropriate mining algorithm just for the goal of the application, without considering the data types.
2. Related work
As mentioned above, because many data mining algorithms can only be applied to data types with a restricted range, users possibly need to do data types transformation before the selected algorithm is executed. In this paper, we propose a general concept called “data types generalization process”, which provides a procedure for doing this kind of data types transformation. Data types generalization can be seen as a pre-processing step of data mining. Of course, other pre-processing such as data selection, data cleaning, dimension (attribute) reduction, and missing data handling may also need to be performed before running the selected data mining algorithm. In summary, the whole process of data mining is the so-called KDD [...]
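To make the two phases more concrete, here is a minimal Python sketch of a merge-then-transform pipeline. It is only an illustration under stated assumptions: the excerpt does not give the paper's actual generalized types or conversion rules, so the two generalized types, the MERGE_RULES table, the equal-width binning, and the integer coding below are all hypothetical choices.

```python
from enum import Enum


class GeneralType(Enum):
    # Hypothetical generalized types; the paper's own set is not given in the excerpt.
    NUMERIC = "numeric"
    CATEGORICAL = "categorical"


# Assumed mapping from original column types to generalized types.
MERGE_RULES = {
    "int": GeneralType.NUMERIC,
    "float": GeneralType.NUMERIC,
    "decimal": GeneralType.NUMERIC,
    "char": GeneralType.CATEGORICAL,
    "varchar": GeneralType.CATEGORICAL,
    "bool": GeneralType.CATEGORICAL,
}


def merge_phase(schema):
    """Merging phase: map each attribute's original type to a generalized type."""
    return {attr: MERGE_RULES[src_type] for attr, src_type in schema.items()}


def discretize(values, bins=3):
    """Equal-width binning of numeric values into labels 'bin0' .. 'bin{bins-1}'."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 when all values are equal
    return [f"bin{min(int((v - lo) / width), bins - 1)}" for v in values]


def transform_phase(column, general_type, target_type):
    """Transforming phase: convert a generalized column to the type the algorithm needs."""
    if general_type == target_type:
        return list(column)
    if general_type is GeneralType.NUMERIC and target_type is GeneralType.CATEGORICAL:
        return discretize(column)
    if general_type is GeneralType.CATEGORICAL and target_type is GeneralType.NUMERIC:
        # Assign integer codes in sorted order of the distinct values.
        codes = {v: i for i, v in enumerate(sorted(set(column)))}
        return [codes[v] for v in column]
    raise ValueError(f"no rule for {general_type} -> {target_type}")


if __name__ == "__main__":
    schema = {"age": "int", "city": "varchar"}
    general = merge_phase(schema)
    ages = [0.0, 3.0, 6.0, 9.0]
    # Suppose the selected algorithm accepts only categorical attributes.
    print(transform_phase(ages, general["age"], GeneralType.CATEGORICAL))
```

The point of the split is that the conversion rules are written once per generalized type rather than once per original type, which is the reduction in transformation complexity that the abstract argues for.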
Download full report
http://ieeexplore.ieeeiel5/6569/17812/00...ber=823352