Can you please provide a PPT presentation on this project?
Thanks; this is the first time I have used this site.
I need documentation for privacy-preserving decision tree learning using unrealized data sets.
Classification is a classical problem in machine learning
and data mining. Given a set of training data tuples, each
having a class label and being represented by a feature vector,
the task is to algorithmically build a model that predicts
the class label of an unseen test tuple based on the tuple’s
feature vector. One of the most popular classification models
is the decision tree model. Decision trees are popular because
they are practical and easy to understand. Rules can also be
extracted from decision trees easily. Many algorithms, such
as ID3 and C4.5, have been devised for decision tree
construction. These algorithms are widely adopted and used
in a wide range of applications such as image recognition,
medical diagnosis, credit rating of loan applicants, scientific
tests, fraud detection, and target marketing.
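As a sketch of how such algorithms work, an ID3-style learner greedily picks, at each node, the attribute-threshold split that maximizes information gain over point-valued numerical attributes. A minimal illustration (function names are our own, not from any cited algorithm):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_split(tuples, labels):
    """Return (attribute index, threshold, gain) maximizing information gain.

    Each tuple is a feature vector of point values; candidate thresholds
    are the attribute values observed in the training data.
    """
    base = entropy(labels)
    best = (None, None, 0.0)
    n = len(tuples)
    for a in range(len(tuples[0])):
        for t in sorted({x[a] for x in tuples}):
            left = [y for x, y in zip(tuples, labels) if x[a] <= t]
            right = [y for x, y in zip(tuples, labels) if x[a] > t]
            if not left or not right:
                continue  # degenerate split, skip
            gain = (base
                    - (len(left) / n) * entropy(left)
                    - (len(right) / n) * entropy(right))
            if gain > best[2]:
                best = (a, t, gain)
    return best
```

Recursively applying `best_split` to the resulting partitions, until a node is pure or no split improves gain, yields the tree.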
In traditional decision-tree classification, a feature (an attribute)
of a tuple is either categorical or numerical. For the
latter, a precise and definite point value is usually assumed.
In many applications, however, data uncertainty is common.
The value of a feature/attribute is thus best captured not by
a single point value, but by a range of values giving rise to
a probability distribution. Data uncertainty arises naturally in
many applications due to various reasons: measurement errors,
data staleness, repeated measurements, limitations of the data
collection process, etc.
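Concretely, such an uncertain numerical attribute can be modeled as a discrete probability distribution, e.g. a list of (value, probability) samples summing to one. A small hypothetical example (a noisy sensor reading; the representation is ours, for illustration):

```python
# An uncertain attribute as a discrete pdf: (value, probability) pairs.
# Hypothetical repeated measurements of one noisy sensor reading.
reading = [(19.0, 0.2), (20.0, 0.5), (21.0, 0.3)]

def mean(pdf):
    """Expected value of a discrete pdf."""
    return sum(v * p for v, p in pdf)

def total_prob(pdf):
    """Sanity check: the probabilities should sum to 1."""
    return sum(p for _, p in pdf)
```

Here `mean(reading)` is 20.1, the single point value that a summary-statistics approach would retain in place of the full distribution.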
A simple way to handle data uncertainty is to abstract
probability distributions by summary statistics such as means
and variances. We call this approach Averaging. Another
approach is to consider the complete information carried by
the probability distributions to build a decision tree. We call
this approach Distribution-based.
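The difference between the two approaches shows up when a tuple meets a split point. A minimal sketch, reusing the (value, probability) pdf representation above (function names are illustrative, not from a specific algorithm):

```python
def averaging_split(pdf, threshold):
    """Averaging: collapse the pdf to its mean; the whole tuple
    goes to exactly one branch. Returns (left weight, right weight)."""
    m = sum(v * p for v, p in pdf)
    return (1.0, 0.0) if m <= threshold else (0.0, 1.0)

def distribution_split(pdf, threshold):
    """Distribution-based: split the tuple's probability mass between
    the branches according to the pdf. Returns (left weight, right weight)."""
    left = sum(p for v, p in pdf if v <= threshold)
    return (left, 1.0 - left)
```

For the pdf `[(19.0, 0.2), (20.0, 0.5), (21.0, 0.3)]` at threshold 20.0, Averaging sends the entire tuple right (mean 20.1 > 20.0), whereas the Distribution-based approach assigns 0.7 of the tuple's weight to the left branch and 0.3 to the right, retaining the information that most of the probability mass lies at or below the split point.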
In this paper we study the problem of constructing decision
tree classifiers on data with uncertain numerical attributes. Our
goals are (1) to devise an algorithm for building decision trees
from uncertain data using the Distribution-based approach; (2)
to investigate whether the Distribution-based approach could
lead to a higher classification accuracy compared with the Averaging
approach; and (3) to establish a theoretical foundation
on which pruning techniques are derived that can significantly
improve the computational efficiency of the Distribution-based
algorithms.