project topics · 04-08-2011, 03:42 PM

Text Mining

The explosion of online information has given rise to many query-based search engines and manually constructed topic hierarchies. But at the current growth rate of information, query results become incomprehensibly large, and manual classification into topic hierarchies creates an immense bottleneck. Search engines return millions of relevant sites, but sites referring to similar content are not grouped. Cluster search groups similar sites, giving users a greater chance of finding more sites relevant to their search.

In this dissertation, we address these problems with a system for topical information-space navigation that combines the query-based and taxonomic approaches. Our system, Racimo, enables dynamic hierarchical clustering of documents based on the full text of articles. A major challenge in document clustering is the extremely high dimensionality: the vocabulary of a document set can easily run to thousands of words, while each document typically contains only a small fraction of them. These characteristics require special handling. Another requirement is hierarchical clustering, so that clustered documents can be browsed by topics of increasing specificity.

In this system, we use the notion of frequent itemsets, which comes from association rule mining, for document clustering. The intuition behind our clustering criterion is that each cluster is identified by a set of words that its documents have in common; word sets that occur together in at least a minimum fraction of the documents are called frequent itemsets. Frequent itemsets are also used to produce a hierarchical topic tree for the clusters. By focusing on frequent items, the dimensionality of the document set is drastically reduced. We show that this method outperforms the best existing methods in terms of both clustering accuracy and scalability.
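The clustering criterion described above can be sketched in a few lines of Python. This is a minimal, simplified illustration under stated assumptions, not Racimo's actual implementation: documents are tokenized on whitespace with no stemming or stop-word removal, frequent itemsets are mined with a plain Apriori pass at a hypothetical minimum support of 30%, each frequent itemset defines a (possibly overlapping) cluster of the documents that contain all of its words, and a k-word cluster is attached in the topic tree under one of its (k-1)-word subsets, chosen here by an arbitrary alphabetical tie-break. The helper names (`tokenize`, `apriori`, `initial_clusters`, `build_tree`, `print_tree`) and the sample documents are hypothetical.

```python
from itertools import combinations


def tokenize(text):
    # Assumed preprocessing: lowercase and split on whitespace; a real system
    # would also remove stop words and stem terms.
    return frozenset(text.lower().split())


def apriori(transactions, min_support):
    """Mine all itemsets contained in at least min_support * N transactions.

    Returns a dict mapping each frequent itemset (a frozenset of words)
    to the number of documents that contain it.
    """
    min_count = min_support * len(transactions)

    # Level 1: count each distinct word once per document.
    counts = {}
    for t in transactions:
        for word in t:
            key = frozenset([word])
            counts[key] = counts.get(key, 0) + 1
    level = {s: c for s, c in counts.items() if c >= min_count}
    result = dict(level)

    k = 2
    while level:
        # Join frequent (k-1)-itemsets into k-item candidates, keeping only those
        # whose (k-1)-subsets are all frequent (the Apriori property).
        candidates = set()
        for a, b in combinations(level, 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in level for sub in combinations(union, k - 1)
            ):
                candidates.add(union)
        # Count how many documents contain each surviving candidate.
        level = {}
        for cand in candidates:
            count = sum(1 for t in transactions if cand <= t)
            if count >= min_count:
                level[cand] = count
        result.update(level)
        k += 1
    return result


def initial_clusters(docs, itemsets):
    # Each frequent itemset labels the cluster of documents containing all of
    # its words; clusters may overlap at this stage.
    return {s: {i for i, d in enumerate(docs) if s <= d} for s in itemsets}


def build_tree(itemsets):
    # Map each itemset to its parent topic: None for single words, otherwise one
    # of its (k-1)-word subsets (all frequent by the Apriori property).
    # Picking the alphabetically first subset is an arbitrary choice.
    parent = {}
    for s in itemsets:
        if len(s) == 1:
            parent[s] = None
        else:
            parent[s] = min((s - {w} for w in s), key=sorted)
    return parent


def print_tree(parent, clusters):
    # Print clusters as an indented topic hierarchy, general topics first.
    children = {}
    for node, par in parent.items():
        children.setdefault(par, []).append(node)

    def walk(node, depth):
        for child in sorted(children.get(node, []), key=sorted):
            print("  " * depth + " ".join(sorted(child)), sorted(clusters[child]))
            walk(child, depth + 1)

    walk(None, 0)


if __name__ == "__main__":
    raw = [
        "apple banana fruit salad",
        "apple banana smoothie",
        "apple pie recipe",
        "stock market crash",
        "stock market rally",
        "market news today",
    ]
    docs = [tokenize(d) for d in raw]
    itemsets = apriori(docs, min_support=0.3)  # hypothetical threshold
    vocabulary = set().union(*docs)
    frequent_words = [s for s in itemsets if len(s) == 1]
    print(len(vocabulary), "distinct words ->", len(frequent_words), "frequent words")
    print_tree(build_tree(itemsets), initial_clusters(docs, itemsets))
```

On the six sample documents, the sketch keeps only 4 of the 13 distinct words (apple, banana, market, stock), which illustrates the dimensionality reduction, and nests the two-word clusters "apple banana" and "market stock" under "apple" and "market". A complete system would still need a rule for assigning each document to a single best cluster and for pruning redundant clusters such as "banana", which covers the same documents as "apple banana".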
