Student Seminar Report & Project Report With Presentation (PPT,PDF,DOC,ZIP)

hello sir i need quality threshold clustering algorithm code in java

quality threshold clustering algorithm implementation source code

Abstract

Quality Threshold Clustering (QTC) is an algorithm for partitioning data, in fields such as biology, where clustering of large data-sets can aid scientific discovery. Unlike other clustering algorithms, QTC does not require knowing the number of clusters a priori, however, its perceived need for high computing power often makes it an unattractive choice. This paper presents a thorough study of QTC. We analyze the worst case complexity of the algorithm and discuss methods to reduce it by trading memory for computation. We also demonstrate how the expected running time of QTC is affected by the structure of the input data. We describe how QTC can be parallelized, and discuss implementation details of our thread-parallel, GPU, and distributed memory implementations of the algorithm. We demonstrate the efficiency of our implementations through experimental data. We show how data sets with tens of thousands of elements can be clustered in a matter of minutes in a modern GPU, and seconds in a small scale cluster of multi-core CPUs, or multiple GPUs. Finally, we discuss how user selected parameters, as well as algorithmic and implementation choices, affect performance.

This algorithm requires the apriori specification of the threshold distance within the cluster and the minimum number of elements in each cluster. Now from each data point we find all its candidate data points. Candidate data points are those which are within the range of the threshold distance from the given data point. This way we find the candidate data points for all data point and choose the one with large number of candidate data points to form cluster. Now data points which belongs to this cluster is removed and the same procedure is repeated with the reduced set of data points until no more cluster can be formed satisfying the minimum size criteria.

Advantages

1) Quality Guaranteed - Only clusters that pass a user-defined quality threshold will be returned.

2) Number of clusters is not specified apriori.

3) All possible clusters are considered - Candidate cluster is generated with respect to every data points and tested in order of size against quality criteria.

Disadvantages

1) Computationally Intensive and Time Consuming - Increasing the minimum cluster size or increasing the number of data points can greatly increase the computational time.
2) Threshold distance and minimum number of element in the cluster has to be defined apriori.

Guest

dhanabhagya