
sachin1091 · 01-07-2011, 09:24 PM

Abstract
Record matching, which identifies the records that represent the same real-world entity, is an important step for data integration. Most state-of-the-art record matching methods are supervised and require the user to provide training data. These methods are not applicable to the Web database scenario, where the records to match are query results dynamically generated on the fly. Such records are query-dependent, and a prelearned method using training examples from previous query results may fail on the results of a new query. To address the problem of record matching in the Web database scenario, we present an unsupervised, online record matching method, UDD, which, for a given query, can effectively identify duplicates from the query result records of multiple Web databases. After removal of the same-source duplicates, the “presumed” nonduplicate records from the same source can be used as training examples, alleviating the burden on users of having to manually label training examples. Starting from the nonduplicate set, we use two cooperating classifiers, a weighted component similarity summing classifier and an SVM classifier, to iteratively identify duplicates in the query results from multiple Web databases. Experimental results show that UDD works well for the Web database scenario, where existing supervised methods do not apply.
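
To make the "weighted component similarity summing" idea more concrete, here is a minimal Python sketch. It is not the UDD implementation: the function names, the use of difflib for string similarity, the fixed weights, and the 0.85 threshold are all assumptions chosen for illustration, and the SVM classifier and the iterative cooperation between the two classifiers are left out.

```python
from difflib import SequenceMatcher

def field_similarity(a, b):
    # Assumed similarity measure: character-level ratio in [0, 1].
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def record_similarity(r1, r2, weights):
    # Weighted sum of per-field similarities; weights are expected to sum to 1.
    return sum(w * field_similarity(r1[f], r2[f]) for f, w in weights.items())

def find_duplicates(source_a, source_b, weights, threshold=0.85):
    # Compare every record of one source with every record of the other
    # and keep the index pairs whose weighted similarity reaches the threshold.
    pairs = []
    for i, ra in enumerate(source_a):
        for j, rb in enumerate(source_b):
            if record_similarity(ra, rb, weights) >= threshold:
                pairs.append((i, j))
    return pairs

# Hypothetical query results from two bookstores.
source_a = [
    {"title": "Harry Potter and the Chamber of Secrets", "author": "J. K. Rowling"},
    {"title": "Harry Potter and the Goblet of Fire", "author": "J. K. Rowling"},
]
source_b = [
    {"title": "Harry Potter and the Chamber of Secrets", "author": "Rowling, J. K."},
    {"title": "Harry Potter and the Prisoner of Azkaban", "author": "J. K. Rowling"},
]

# Hand-picked weights for illustration only.
weights = {"title": 0.9, "author": 0.1}
pairs = find_duplicates(source_a, source_b, weights)
print(pairs)
```

Here the two "Chamber of Secrets" records are paired even though their author strings are formatted differently, because the title carries most of the weight.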

Existing System:
To build a system that helps users integrate and, more importantly, compare the query results returned from multiple Web databases, a crucial task is to match the different sources’ records that refer to the same real-world entity. For example, Fig. 1 shows some of the query results returned by two online bookstores, booksamillion.com and abebooks.com, in response to the same query “Harry Potter” over the Title field. It can be seen that the record numbered 3 in Fig. 1a and the third record in Fig. 1b refer to the same book, since they have the same ISBN number although their authors differ somewhat. In comparison, the record numbered 5 in Fig. 1a and the second record in Fig. 1b also refer to the same book if we are interested only in the book title and author. The problem of identifying duplicates, that is, two (or more) records describing the same entity, has attracted much attention from many research fields, including Databases, Data Mining, Artificial Intelligence, and Natural Language Processing. Most previous work is based on predefined matching rules hand-coded by domain experts or matching rules learned offline by some learning method from a set of training examples. Such approaches work well in a traditional database environment, where all instances of the target databases can be readily accessed, as long as a set of high-quality representative records can be examined by experts or selected for the user to label.

Proposed System:
In the Web database scenario, the records to match are highly query-dependent, since they can only be obtained through online queries. Moreover, they are only a partial and biased portion of all the data in the source Web databases. Consequently, hand-coding or offline-learning approaches are not appropriate for two reasons. First, the full data set is not available beforehand, and therefore, good representative data for training are hard to obtain. Second, and most importantly, even if good representative data are found and labeled for learning, the rules learned on the representatives of a full data set may not work well on a partial and biased part of that data set. To illustrate this problem, consider a query for books of a specific author, such as “J. K. Rowling.” Depending on how the Web databases process such a query, all the result records for this query may well have only “J. K. Rowling” as the value for the Author field. In this case, the Author field of these records is ineffective for distinguishing the records that should be matched from those that should not. To reduce the influence of such fields in determining which records should match, their weighting should be adjusted to be much lower than the weighting of other fields, or even set to zero. However, if a matching rule is learned from representatives of the full data set, then it is highly unlikely that a rule to deal with such fields will be discovered. Moreover, for each new query, depending on the results returned, the field weights should probably change too, which makes supervised-learning-based methods even less applicable.
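
The “J. K. Rowling” example can be sketched in a few lines of Python. The heuristic below is an assumption made for illustration, not the weighting scheme of the paper: it scores each field by how many distinct values it takes in the current result set, so a field that is constant for this query gets weight zero. The function name and the sample records are hypothetical.

```python
def query_field_weights(records, fields):
    # Raw score per field: (number of distinct values - 1) / (number of records - 1).
    # A field with a single value across the whole result set scores 0.
    raw = {}
    for f in fields:
        values = [r[f].strip().lower() for r in records]
        raw[f] = (len(set(values)) - 1) / max(len(values) - 1, 1)
    total = sum(raw.values())
    if total == 0:
        # No field separates the records: fall back to uniform weights.
        return {f: 1.0 / len(fields) for f in fields}
    # Normalise so that the weights sum to 1.
    return {f: score / total for f, score in raw.items()}

# Hypothetical results of an author query: every record has the same Author value.
records = [
    {"title": "Harry Potter and the Chamber of Secrets", "author": "J. K. Rowling", "price": "7.99"},
    {"title": "Harry Potter and the Goblet of Fire", "author": "J. K. Rowling", "price": "7.99"},
    {"title": "Harry Potter and the Prisoner of Azkaban", "author": "J. K. Rowling", "price": "9.49"},
]

field_w = query_field_weights(records, ["title", "author", "price"])
print(field_w)
```

Because the weights are recomputed from each result set, a different query (for example, one over the Title field) would give the Author field a nonzero weight again, which is the query-dependence the paragraph above describes.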

seminar paper · 25-02-2012, 09:59 AM

To get information about the topic "Record Matching over Query Results from Multiple Web Databases" (full report and PPT) and related topics, refer to the page links below:
http://studentbank.in/report-record-matc...ses--24129

http://studentbank.in/report-record-matc...-databases
