20-06-2016, 02:32 PM

I am doing a project on effective pattern discovery in text mining.

**visalakshik** · 02-07-2016, 02:21 PM

For information on Effective Pattern Discovery for Text Mining (full report, PPT, and related topics), refer to the page link below.

Abstract

Many data mining techniques have been proposed for mining useful patterns in text documents. However, how to effectively use and update discovered patterns is still an open research issue, especially in the domain of text mining. Since most existing text mining methods adopted term-based approaches, they all suffer from the problems of polysemy and synonymy. Over the years, people have often held the hypothesis that pattern (or phrase)-based approaches should perform better than the term-based ones, but many experiments do not support this hypothesis. This paper presents an innovative and effective pattern discovery technique which includes the processes of pattern deploying and pattern evolving, to improve the effectiveness of using and updating discovered patterns for finding relevant and interesting information. Substantial experiments on RCV1 data collection and TREC topics demonstrate that the proposed solution achieves encouraging performance.

INTRODUCTION

Due to the rapid growth of digital data made available in
recent years, knowledge discovery and data mining have
attracted a great deal of attention with an imminent need for
turning such data into useful information and knowledge.
Many applications, such as market analysis and business
management, can benefit from the information and
knowledge extracted from a large amount of data.
Knowledge discovery can be viewed as the process of
nontrivial extraction of information from large databases,
information that is implicitly presented in the data,
previously unknown and potentially useful for users. Data
mining is therefore an essential step in the process of
knowledge discovery in databases.

RELATED WORK

Many types of text representations have been proposed in
the past. A well-known one is the bag of words, which uses
keywords (terms) as elements in the vector of the feature
space. In addition to TFIDF, a global IDF and entropy
weighting scheme is proposed in [9] and improves
performance by an average of 30 percent. Various weighting
schemes for the bag of words representation were
given in [1], [14]. The problem with the bag of words approach
is how to select a limited number of features from an
enormous set of words or terms in order to increase the
system's efficiency and avoid overfitting. To reduce
the number of features, many dimensionality reduction
approaches apply feature selection techniques such as
Information Gain, Mutual Information, Chi-Square, Odds
ratio, and so on.
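
As a rough illustration of the bag of words representation with TFIDF weighting mentioned above, here is a minimal Python sketch. It assumes whitespace tokenization and the plain tf * log(N / df) formula; the function name `tfidf` and the sample documents are made up for this example, and this is not the weighting scheme from [9] or the paper's own code.

```python
import math
from collections import Counter

def tfidf(docs):
    """Return one {term: weight} dict per document, using tf * log(N / df)."""
    # Assumption: lowercase whitespace tokenization, no stemming or stop-word removal.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return vectors

# Hypothetical sample documents, for illustration only.
docs = [
    "text mining finds patterns",
    "pattern mining in text",
    "stock market analysis",
]
vectors = tfidf(docs)
```

Terms that occur in fewer documents (such as "finds") get a larger weight than terms shared across documents (such as "text"). Each distinct word is still its own feature, which is why feature selection becomes necessary on a large vocabulary.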

CONCLUSION

Many data mining techniques have been proposed in the last
decade. These techniques include association rule mining,
frequent itemset mining, sequential pattern mining,
maximum pattern mining, and closed pattern mining.
However, using this discovered knowledge (or patterns) in
the field of text mining is difficult and ineffective. The
reason is that some useful long patterns with high specificity
lack support (i.e., the low-frequency problem). We also argue
that not all frequent short patterns are useful. Hence,
misinterpretations of patterns derived from data mining
techniques lead to ineffective performance.
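
To make the support and closed pattern ideas above concrete, here is a small Python sketch. It assumes each paragraph is already a list of terms, counts support as the number of paragraphs containing a termset, and enumerates termsets by brute force, so it only suits tiny inputs. The function names and the sample data are invented for illustration; this is not the paper's pattern deploying or evolving method.

```python
from itertools import combinations

def frequent_patterns(paragraphs, min_sup):
    """Map each termset to its absolute support, keeping those with support >= min_sup."""
    counts = {}
    for terms in paragraphs:
        unique = sorted(set(terms))
        # Brute-force enumeration of every non-empty termset in this paragraph.
        for size in range(1, len(unique) + 1):
            for combo in combinations(unique, size):
                key = frozenset(combo)
                counts[key] = counts.get(key, 0) + 1
    return {pattern: sup for pattern, sup in counts.items() if sup >= min_sup}

def closed_patterns(frequent):
    """Keep only patterns that have no proper superset with the same support."""
    return {
        pattern: sup
        for pattern, sup in frequent.items()
        if not any(pattern < other and sup == frequent[other] for other in frequent)
    }

# Hypothetical paragraphs, already split into terms.
paragraphs = [
    ["text", "mining", "pattern"],
    ["text", "mining"],
    ["text", "pattern"],
    ["market"],
]
frequent = frequent_patterns(paragraphs, min_sup=2)
closed = closed_patterns(frequent)
```

With `min_sup=2`, the three-term pattern {text, mining, pattern} occurs in only one paragraph and is dropped, which is a small instance of the low-frequency problem. The short pattern {mining} is frequent but not closed, because {text, mining} has the same support.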
