
seminar class · 12-05-2011, 12:35 PM

PRESENTED BY:
chandra shekher sharma

DATA CLUSTERING AND TECHNIQUES
INTRODUCTION
Data clustering is a method in which we form clusters of objects that are similar in their characteristics. The criterion used to measure similarity is implementation dependent. Clustering is often confused with classification, but the two differ: in classification the objects are assigned to predefined classes, whereas in clustering the classes themselves must also be discovered. Cluster analysis groups objects (observations, events) based on the information found in the data describing the objects or their relationships. The goal is that the objects in a group will be similar (or related) to one another and different from (or unrelated to) the objects in other groups. The greater the similarity (or homogeneity) within a group, and the greater the difference between groups, the “better” or more distinct the clustering.
Types of Data in Cluster Analysis
In this section, we study the types of data that often occur in cluster analysis and how
to preprocess them for such an analysis. Suppose that a data set to be clustered contains
n objects, which may represent persons, houses, documents, countries, and so on. Main
memory-based clustering algorithms typically operate on either of the following two data
structures.
• Data matrix (or object-by-variable structure): This represents n objects, such as persons,
with p variables (also called measurements or attributes), such as age, height,
weight, gender, and so on. The structure is in the form of a relational table, or n-by-p
matrix (n objects × p variables).
• Dissimilarity matrix (or object-by-object structure): This stores a collection of proximities
that are available for all pairs of n objects. It is often represented by an n-by-n
table.
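The relationship between the two structures can be sketched as follows. This is only an illustration: the sample values are made up, and Euclidean distance is an assumed choice of proximity measure, since the text above does not prescribe one.

```python
import math

# Hypothetical data matrix: n = 4 objects (rows) by p = 2 variables (columns).
data_matrix = [
    [1.0, 2.0],
    [4.0, 6.0],
    [1.0, 2.0],
    [0.0, 0.0],
]

def dissimilarity_matrix(rows):
    """Build the n-by-n table of pairwise distances (Euclidean, by assumption)."""
    n = len(rows)
    table = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(rows[i], rows[j])))
            # The table is symmetric, so fill both halves at once.
            table[i][j] = table[j][i] = dist
    return table

D = dissimilarity_matrix(data_matrix)
```

Here `D[0][1]` is 5.0 (a 3-4-5 triangle) and `D[0][2]` is 0.0 because objects 0 and 2 have identical values.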
A Categorization of Major Clustering Methods
Many clustering algorithms exist in the literature. It is difficult to provide a crisp categorization of clustering methods because these categories may overlap, so that a method may have features from several categories. Nevertheless, it is useful to present a relatively organized picture of the different clustering methods.
In general, the major clustering methods can be classified into the following categories.
• Partitioning methods: Given a database of n objects or data tuples, a partitioning method constructs k partitions of the data, where each partition represents a cluster and k ≤ n. That is, it classifies the data into k groups, which together satisfy the following requirements:
(1) each group must contain at least one object, and
(2) each object must belong to exactly one group.
There are various other criteria for judging the quality of partitions. Achieving global optimality in partitioning-based clustering would require the exhaustive enumeration of all of the possible partitions. Instead, most applications adopt one of a few popular heuristic methods, such as
(1) the k-means algorithm, where each cluster is represented by the mean value of the objects in the cluster, and
(2) the k-medoids algorithm, where each cluster is represented by one of the objects located near the center of the cluster.
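As a rough illustration of the k-means heuristic mentioned above, here is a minimal sketch. The function name `k_means`, the `max_iter` parameter, and seeding the means with the first k points are all assumptions made for the example; practical implementations usually seed more carefully.

```python
import math

def k_means(points, k, max_iter=100):
    """Lloyd-style k-means; the first k points seed the means (an assumption)."""
    means = [tuple(p) for p in points[:k]]
    assignment = None
    for _ in range(max_iter):
        # Assignment step: each object joins the cluster with the nearest mean.
        new_assignment = [
            min(range(k), key=lambda c: math.dist(p, means[c])) for p in points
        ]
        if new_assignment == assignment:
            break  # no object changed cluster, so the heuristic has converged
        assignment = new_assignment
        # Update step: each mean becomes the average of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old mean if a cluster ends up empty
                means[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return assignment, means

sample = [(1.0, 1.0), (1.0, 2.0), (9.0, 9.0), (10.0, 9.0)]
labels, centers = k_means(sample, 2)
```

Every object ends up in exactly one of the k groups, matching requirement (2). The result is a local optimum that depends on the seeding, which is why the text calls this a heuristic rather than a globally optimal method.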
• Hierarchical methods: A hierarchical method creates a hierarchical decomposition of
the given set of data objects. A hierarchical method can be classified as being either
agglomerative or divisive, based on how the hierarchical decomposition is formed. The
agglomerative approach, also called the bottom-up approach, starts with each object
forming a separate group. It successively merges the objects or groups that are close
to one another, until all of the groups are merged into one (the topmost level of the
hierarchy), or until a termination condition holds. The divisive approach, also called
the top-down approach, works in the opposite direction: it starts with all of the objects
in one cluster and successively splits it into smaller clusters, until each object forms
its own cluster or a termination condition holds.
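The agglomerative (bottom-up) procedure can be sketched as below. This is an illustrative version only: single linkage (distance between the closest pair of members) is an assumed way of measuring how close two groups are, and `target_clusters` is a hypothetical stand-in for the termination condition.

```python
import math

def agglomerative(points, target_clusters=1):
    """Bottom-up merging with single linkage (the linkage choice is an assumption)."""
    clusters = [[i] for i in range(len(points))]  # each object starts alone
    merges = []  # history of (left members, right members, merge distance)
    while len(clusters) > target_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members.
                d = min(math.dist(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((sorted(clusters[a]), sorted(clusters[b]), d))
        clusters[a] = clusters[a] + clusters[b]  # merge the two closest groups
        del clusters[b]
    return clusters, merges

line = [(0.0,), (1.0,), (5.0,), (5.5,)]
groups, history = agglomerative(line, target_clusters=2)
```

With `target_clusters=1` the loop runs until everything is merged into one group, and `history` then records the full hierarchy from the bottom up.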
• Density-based methods: Most partitioning methods cluster objects based on the distance
between objects. Such methods can find only spherical-shaped clusters and
encounter difficulty in discovering clusters of arbitrary shapes. Other clustering methods
have been developed based on the notion of density. Their general idea is to continue
growing the given cluster as long as the density (number of objects or data
points) in the “neighborhood” exceeds some threshold; that is, for each data point
within a given cluster, the neighborhood of a given radius has to contain at least a
minimum number of points.
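The idea of growing a cluster while the neighborhood stays dense can be sketched in the style of DBSCAN. The names `eps` (neighborhood radius) and `min_pts` (minimum number of points) are assumptions for this example, and a point is counted as part of its own neighborhood.

```python
import math

def density_cluster(points, eps, min_pts):
    """Simplified DBSCAN-style clustering; returns one label per point (-1 = noise)."""
    def neighbors(i):
        # All points within radius eps of point i, including i itself.
        return [j for j in range(len(points))
                if math.dist(points[i], points[j]) <= eps]

    labels = [None] * len(points)
    cluster_id = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # too sparse; may later be claimed as a border point
            continue
        cluster_id += 1
        labels[i] = cluster_id
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster_id  # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:
                queue.extend(nbrs)  # keep growing while the neighborhood is dense
    return labels

cloud = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (25, 25)]
found = density_cluster(cloud, eps=1.5, min_pts=3)
```

Because clusters grow through chains of dense neighborhoods rather than around a center, this style of method is not limited to spherical shapes, and isolated points such as `(25, 25)` are left as noise.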
• Grid-based methods: Grid-based methods quantize the object space into a finite number
of cells that form a grid structure. All of the clustering operations are performed
on the grid structure (i.e., on the quantized space). The main advantage of this
approach is its fast processing time, which is typically independent of the number
of data objects and dependent only on the number of cells in each dimension in the
quantized space.
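A very small sketch of the grid idea follows, assuming 2-D points, square cells, and hypothetical parameters `cell_size` and `min_count`. Only the initial quantization touches every object; the cluster-forming step after that works purely on cells.

```python
from collections import Counter

def grid_cluster(points, cell_size, min_count):
    """Quantize 2-D points into square cells, then join adjacent dense cells."""
    cell_of = [(int(x // cell_size), int(y // cell_size)) for x, y in points]
    counts = Counter(cell_of)
    dense = {cell for cell, n in counts.items() if n >= min_count}

    # Flood fill over the grid only: this work depends on the number of cells.
    cell_label = {}
    next_label = 0
    for start in sorted(dense):
        if start in cell_label:
            continue
        cell_label[start] = next_label
        stack = [start]
        while stack:
            cx, cy = stack.pop()
            for nb in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                if nb in dense and nb not in cell_label:
                    cell_label[nb] = next_label
                    stack.append(nb)
        next_label += 1
    # Points that fall in sparse cells are labelled -1.
    return [cell_label.get(cell, -1) for cell in cell_of]

scatter = [(0.2, 0.3), (0.7, 0.1), (1.2, 0.4), (1.6, 0.8),
           (7.1, 7.3), (7.5, 7.9), (4.0, 4.0)]
grid_labels = grid_cluster(scatter, cell_size=1.0, min_count=2)
```

The two neighboring dense cells on the left are joined into one cluster, the dense cell near `(7, 7)` forms a second, and the lone point at `(4.0, 4.0)` is left unclustered.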
• Model-based methods: Model-based methods hypothesize a model for each of the clusters
and find the best fit of the data to the given model. A model-based algorithm may
locate clusters by constructing a density function that reflects the spatial distribution
of the data points. It also leads to a way of automatically determining the number of
clusters based on standard statistics, taking “noise” or outliers into account and thus
yielding robust clustering methods.
• Clustering high-dimensional data and constraint-based clustering: Clustering high-dimensional data is a particularly important task in cluster analysis because many applications require the analysis of objects containing a large number of features or dimensions. For example, text documents may contain thousands of terms or keywords as features, and DNA microarray data may provide information on the expression levels of thousands of genes under hundreds of conditions. Clustering high-dimensional data is challenging due to the curse of dimensionality.
Constraint-based clustering is a clustering approach that performs clustering by incorporating user-specified or application-oriented constraints. A constraint expresses a user’s expectation or describes “properties” of the desired clustering results, and provides an effective means for communicating with the clustering process.
• Graph-based clustering: Hierarchical clustering algorithms can be viewed as operating on a proximity graph. However, they are most commonly viewed in terms of merging or splitting clusters, and often there is no mention of graph-related concepts. There are some clustering techniques, however, that are explicitly cast in terms of a graph or a hypergraph. Many of these algorithms are based on the idea of looking at the nearest neighbors of a point. We begin with a couple of old but still quite relevant clustering techniques:
1) Shared Nearest Neighbor Clustering: Shared nearest neighbor clustering is described
2) Mutual Nearest Neighbor Clustering: Mutual nearest neighbor clustering is based on the idea of the “mutual neighborhood value (mnv)” of two points, which is the sum of the ranks of the two points in each other’s sorted nearest-neighbor lists.
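The mutual neighborhood value can be computed directly from that definition. In this sketch, ranks start at 1 for the nearest neighbor and a point is excluded from its own list; both conventions are assumptions, as are the function names.

```python
import math

def neighbor_ranks(points):
    """ranks[i][j] = position of j in i's sorted nearest-neighbor list (1 = nearest)."""
    n = len(points)
    ranks = [dict() for _ in range(n)]
    for i in range(n):
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: math.dist(points[i], points[j]))
        for position, j in enumerate(others, start=1):
            ranks[i][j] = position
    return ranks

def mnv(ranks, i, j):
    """Sum of the ranks of the two points in each other's neighbor lists."""
    return ranks[i][j] + ranks[j][i]

positions = [(0.0,), (1.0,), (3.0,), (7.0,)]
ranks = neighbor_ranks(positions)
closest_pair_value = mnv(ranks, 0, 1)
```

Points 0 and 1 are each other's nearest neighbor, so their mnv is 2, the smallest possible value under this convention. Point 3 is the nearest neighbor of no one, so every pair involving it scores higher.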
