Record Matching Over Query Results from Multiple Web Databases

Abstract
Record matching, which identifies the records that represent the same real-world entity, is an important step for data integration. Most state-of-the-art record matching methods are supervised, requiring the user to provide training data. These methods are not applicable to the Web database scenario, where the records to match are query results dynamically generated on the fly. Such records are query-dependent, and a prelearned method using training examples from previous query results may fail on the results of a new query. To address the problem of record matching in the Web database scenario, we present an unsupervised, online record matching method, UDD, which, for a given query, can effectively identify duplicates from the query result records of multiple Web databases. After removal of the same-source duplicates, the “presumed” nonduplicate records from the same source can be used as training examples, relieving users of the burden of manually labeling training examples. Starting from the nonduplicate set, we use two cooperating classifiers, a weighted component similarity summing classifier and an SVM classifier, to iteratively identify duplicates in the query results from multiple Web databases. Experimental results show that UDD works well for the Web database scenario, where existing supervised methods do not apply.
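The pipeline described in the abstract can be sketched in a few lines. This is a simplified illustration, not UDD itself, and several choices are assumptions that do not come from the text. Field similarity is token Jaccard similarity. Same-source duplicates are records whose token sets agree on every field. The weighted summing step uses an assumed weighting rule: a field gains weight when presumed nonduplicates differ on it and when identified duplicates agree on it. A pair counts as a duplicate when its weighted sum reaches a hypothetical threshold of 0.85. The SVM classifier is omitted entirely, so only the iterative weighted-summing half of the cooperation is shown. `match` and its helpers are hypothetical names.

```python
import re
from itertools import combinations, product

def tok(s):
    """Lower-cased word tokens of a field value (punctuation ignored)."""
    return set(re.findall(r"\w+", str(s).lower()))

def sim(a, b):
    """Token Jaccard similarity of two field values (assumed metric)."""
    ta, tb = tok(a), tok(b)
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)

def vec(r1, r2, fields):
    """Similarity vector of a record pair: one component per field."""
    return [sim(r1.get(f, ""), r2.get(f, "")) for f in fields]

def dedup_same_source(records, fields):
    """Assumed rule: drop a record whose token sets equal an earlier one's on every field."""
    seen, kept = set(), []
    for r in records:
        key = tuple(frozenset(tok(r.get(f, ""))) for f in fields)
        if key not in seen:
            seen.add(key)
            kept.append(r)
    return kept

def normalize(p):
    s = sum(p)
    return [x / s for x in p] if s else [1.0 / len(p)] * len(p)

def weights(N, D, k, a=0.5):
    """Assumed weighting: nonduplicates reward fields they differ on,
    duplicates reward fields they agree on; a mixes the two."""
    wn = normalize([sum(1 - v[i] for v in N) for i in range(k)])
    if not D:
        return wn
    wd = normalize([sum(v[i] for v in D) for i in range(k)])
    return [a * d + (1 - a) * n for d, n in zip(wd, wn)]

def match(results, fields, threshold=0.85, a=0.5):
    cleaned = {s: dedup_same_source(rs, fields) for s, rs in results.items()}
    # Presumed nonduplicates: every pair of records from the same source.
    N = [vec(x, y, fields) for rs in cleaned.values() for x, y in combinations(rs, 2)]
    # Candidate pairs: every cross-source pair, identified by (source, index).
    P = [((s1, i), (s2, j), vec(cleaned[s1][i], cleaned[s2][j], fields))
         for s1, s2 in combinations(sorted(cleaned), 2)
         for i, j in product(range(len(cleaned[s1])), range(len(cleaned[s2])))]
    D, matches = [], []
    while True:
        # Recompute weights from the nonduplicates and the duplicates found so far.
        w = weights(N, D, len(fields), a)
        new, rest = [], []
        for p in P:
            score = sum(wi * vi for wi, vi in zip(w, p[2]))
            (new if score >= threshold else rest).append(p)
        if not new:  # stop when an iteration finds no new duplicates
            return matches
        matches.extend((p[0], p[1]) for p in new)
        D.extend(p[2] for p in new)
        P = rest

# Made-up query results from two hypothetical sources A and B.
results = {
    "A": [{"title": "Harry Potter and the Goblet of Fire", "author": "J. K. Rowling"},
          {"title": "Harry Potter and the Chamber of Secrets", "author": "J. K. Rowling"},
          {"title": "Harry Potter and the Goblet of Fire", "author": "J.K. Rowling"}],
    "B": [{"title": "harry potter and the goblet of fire", "author": "Rowling, J. K."},
          {"title": "A Guide to Record Matching", "author": "A. N. Author"}],
}
print(match(results, ["title", "author"]))  # [(('A', 0), ('B', 0))]
```

On these made-up records, the third record of source A is removed as a same-source duplicate. The two Goblet of Fire records are then matched. The Chamber of Secrets record is not matched with them because the presumed nonduplicate pair from source A agrees on Author, so Author receives less weight than Title.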

Existing System:
To build a system that helps users integrate and, more importantly, compare the query results returned from multiple Web databases, a crucial task is to match the records from different sources that refer to the same real-world entity. For example, Fig. 1 shows some of the query results returned by two online bookstores, booksamillion.com and abebooks.com, in response to the same query “Harry Potter” over the Title field. The record numbered 3 in Fig. 1a and the third record in Fig. 1b refer to the same book, since they have the same ISBN, although their authors differ somewhat. Similarly, the record numbered 5 in Fig. 1a and the second record in Fig. 1b also refer to the same book if we are interested only in the book title and author [1].

The problem of identifying duplicates [2], that is, two (or more) records describing the same entity, has attracted much attention from many research fields, including Databases, Data Mining, Artificial Intelligence, and Natural Language Processing [3]. Most previous work [4] relies either on predefined matching rules hand-coded by domain experts or on matching rules learned offline from a set of training examples. Such approaches work well in a traditional database environment, where all instances of the target databases can be readily accessed, as long as a set of high-quality representative records can be examined by experts or selected for the user to label.
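The two matching criteria in the bookstore example can be made concrete with per-field similarities. The sketch below is a minimal illustration under stated assumptions. The field names `title`, `author`, and `isbn`, the token Jaccard similarity, and the 0.8 threshold are all hypothetical choices, and the records are made up rather than taken from Fig. 1. The sketch treats an identical non-empty ISBN as sufficient. Otherwise it requires both title and author to be similar.

```python
import re

def tokens(value):
    """Lower-cased word tokens, so 'J.K. Rowling' and 'J. K. Rowling' agree."""
    return set(re.findall(r"\w+", str(value).lower()))

def field_similarity(a, b):
    """Jaccard similarity of the token sets of two field values."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)

def same_book(r1, r2, threshold=0.8):
    # Criterion 1: an identical, non-empty ISBN identifies the same book,
    # even when other fields such as the author differ.
    isbn = r1.get("isbn")
    if isbn and isbn == r2.get("isbn"):
        return True
    # Criterion 2: title and author are both sufficiently similar.
    return all(field_similarity(r1.get(f, ""), r2.get(f, "")) >= threshold
               for f in ("title", "author"))

# Made-up records; "isbn-A" is a placeholder, not a real ISBN.
a = {"title": "Harry Potter and the Order of the Phoenix", "author": "J. K. Rowling", "isbn": "isbn-A"}
b = {"title": "Harry Potter and the Order of the Phoenix", "author": "Rowling, Joanne K.", "isbn": "isbn-A"}
c = {"title": "Harry Potter and the Goblet of Fire", "author": "J.K. Rowling"}
d = {"title": "harry potter and the goblet of fire", "author": "J. K. Rowling", "isbn": ""}
e = {"title": "Harry Potter and the Chamber of Secrets", "author": "J. K. Rowling"}
print(same_book(a, b), same_book(c, d), same_book(c, e))  # True True False
```

Records `a` and `b` match only through the ISBN rule, because their author tokens overlap by just 0.5. Records `c` and `d` match on title and author alone. Fixed rules like these are exactly the kind of hand-coded or offline-learned knowledge the next section argues is hard to apply to query results.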
Proposed System:
In the Web database scenario, the records to match are highly query-dependent, since they can only be obtained through online queries. Moreover, they are only a partial and biased portion of all the data in the source Web databases. Consequently, hand-coding or offline-learning approaches are not appropriate, for two reasons. First, the full data set is not available beforehand, so good representative data for training are hard to obtain. Second, and most importantly, even if good representative data are found and labeled for learning, the rules learned on the representatives of a full data set may not work well on a partial and biased part of that data set. To illustrate this problem, consider a query for books by a specific author, such as “J. K. Rowling.” Depending on how the Web databases process such a query, all the result records for this query may well have “J. K. Rowling” as the only value of the Author field. In this case, the Author field of these records is ineffective for distinguishing the records that should be matched from those that should not. To reduce the influence of such fields in determining which records should match, their weights should be set much lower than those of other fields, or even to zero. However, if a matching rule is learned from representatives of the full data set, it is highly unlikely that a rule to deal with such fields will be discovered. Moreover, for each new query, depending on the results returned, the field weights should probably change too, which makes supervised-learning-based methods even less applicable.
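The per-query reweighting described above can be illustrated with a simple heuristic. The sketch below is a stated assumption, not the weighting scheme of the text. Each field's weight is the normalized entropy of its values across the current result records, rescaled so the weights sum to 1. A field with a single value across the whole result, such as Author for a “J. K. Rowling” query, has zero entropy and therefore weight zero. `field_weights` is a hypothetical name, and the records are made up.

```python
import math
from collections import Counter

def field_weights(records, fields):
    """Hypothetical per-query weighting: normalized entropy of each field's
    values across this query's results, rescaled to sum to 1."""
    n = len(records)
    raw = {}
    for f in fields:
        counts = Counter(r.get(f, "") for r in records)
        if n <= 1 or len(counts) == 1:
            raw[f] = 0.0  # a constant field cannot separate matches from non-matches
            continue
        h = -sum((c / n) * math.log(c / n) for c in counts.values())
        raw[f] = h / math.log(n)  # divide by the maximum possible entropy
    total = sum(raw.values())
    if total == 0:
        return {f: 0.0 for f in fields}
    return {f: w / total for f, w in raw.items()}

# Made-up results of an author query: every record shares the same Author value.
records = [
    {"title": "Harry Potter and the Goblet of Fire", "author": "J. K. Rowling"},
    {"title": "Harry Potter and the Chamber of Secrets", "author": "J. K. Rowling"},
    {"title": "Harry Potter and the Goblet of Fire", "author": "J. K. Rowling"},
]
w = field_weights(records, ["title", "author"])
print(w)  # {'title': 1.0, 'author': 0.0}
```

Because the weights are recomputed from each query's own results, the same Author field would regain weight for a title query whose results span several authors. This recomputation per query is what a rule learned once, offline, cannot provide.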