We are working on a project titled "Deduplication Using a Genetic Algorithm" and need code for deduplication using a genetic algorithm to implement it.
Please reply as soon as possible.
As technology advances, databases are used more and more heavily, and dirty data becomes one of their biggest drawbacks. Dirty data can include spelling or punctuation errors, incorrect values in a field, incomplete or obsolete data, and duplicate records. Various data-cleaning tools are used to remove it. In our article we propose a genetic programming approach to record deduplication that combines several pieces of evidence extracted from the data content into a deduplication function able to identify whether two entries in a repository are replicas or not. In addition, the approach can automatically adapt these functions to a given fixed replica identification threshold.
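As a starting point for the code request, here is a minimal sketch of the idea described above, not the implementation from the article. It assumes that each piece of evidence is a string-similarity score for one field (computed with Python's `difflib`), that a candidate deduplication function is a small expression tree over those scores, and that a pair is called a replica when the tree's value reaches a fixed threshold. The names `evidence`, `evolve`, `THRESHOLD`, `MAX_NODES`, the operator set, and the sample records are all hypothetical choices made for illustration.

```python
import random
from difflib import SequenceMatcher

THRESHOLD = 1.0   # assumed fixed replica identification threshold
MAX_NODES = 40    # assumed size cap that keeps evolved trees small
OPS = {"add": lambda a, b: a + b, "mul": lambda a, b: a * b, "max": max}

def evidence(rec_a, rec_b):
    """One string-similarity score in [0, 1] per pair of fields."""
    return [SequenceMatcher(None, x.lower(), y.lower()).ratio()
            for x, y in zip(rec_a, rec_b)]

def random_tree(n_fields, depth=3):
    """Grow a random expression over evidence terminals and constants."""
    if depth == 0 or random.random() < 0.3:
        if random.random() < 0.8:
            return ("ev", random.randrange(n_fields))
        return ("const", round(random.uniform(0, 2), 2))
    return (random.choice(list(OPS)),
            random_tree(n_fields, depth - 1), random_tree(n_fields, depth - 1))

def evaluate(tree, ev):
    """Compute the value of a tree for one evidence vector."""
    if tree[0] == "ev":
        return ev[tree[1]]
    if tree[0] == "const":
        return tree[1]
    return OPS[tree[0]](evaluate(tree[1], ev), evaluate(tree[2], ev))

def f1(tree, data):
    """F1 score of the rule 'value >= THRESHOLD means replica'."""
    tp = fp = fn = 0
    for ev, label in data:
        pred = evaluate(tree, ev) >= THRESHOLD
        tp += int(pred and label)
        fp += int(pred and not label)
        fn += int(label and not pred)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

def paths(tree, path=()):
    """Yield the path to every node of the tree."""
    yield path
    if tree[0] in OPS:
        yield from paths(tree[1], path + (1,))
        yield from paths(tree[2], path + (2,))

def get(tree, path):
    """Return the subtree found at the given path."""
    for i in path:
        tree = tree[i]
    return tree

def put(tree, path, new):
    """Return a copy of tree with the node at path replaced by new."""
    if not path:
        return new
    parts = list(tree)
    parts[path[0]] = put(tree[path[0]], path[1:], new)
    return tuple(parts)

def evolve(data, n_fields, pop_size=30, generations=15, seed=0):
    """Evolve a deduplication function; returns (best_tree, best_f1)."""
    random.seed(seed)

    def fitness(tree):
        return f1(tree, data)

    pop = [random_tree(n_fields) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        nxt = pop[:2]                                    # elitism
        while len(nxt) < pop_size:
            a = max(random.sample(pop, 3), key=fitness)  # tournament selection
            b = max(random.sample(pop, 3), key=fitness)
            donor = get(b, random.choice(list(paths(b))))
            child = put(a, random.choice(list(paths(a))), donor)  # crossover
            if random.random() < 0.2:                    # subtree mutation
                child = put(child, random.choice(list(paths(child))),
                            random_tree(n_fields, 2))
            # Oversized children are discarded in favour of the first parent.
            nxt.append(child if len(list(paths(child))) <= MAX_NODES else a)
        pop = nxt
    best = max(pop, key=fitness)
    return best, fitness(best)

# Hypothetical labeled record pairs: (record A, record B, is_replica).
pairs = [
    (("John Smith", "12 Oak Street"), ("Jon Smith", "12 Oak St."), True),
    (("Maria Garcia", "4 Elm Road"), ("Maria Garcia", "4 Elm Rd"), True),
    (("John Smith", "12 Oak Street"), ("Alice Wong", "9 Pine Avenue"), False),
    (("Maria Garcia", "4 Elm Road"), ("Peter Jones", "77 Lake Drive"), False),
]
data = [(evidence(a, b), label) for a, b, label in pairs]
best_tree, best_score = evolve(data, n_fields=2)
print(best_tree, best_score)
```

For a real project you would replace the toy pairs with a labeled sample of your own data, add evidence better suited to each field (for example numeric or date comparisons), and measure F1 on held-out pairs rather than on the training pairs.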
Detecting duplicate records is an important part of data preprocessing and cleansing. Artificial Bee Colony (ABC) is a relatively recent optimization algorithm based on the intelligent foraging behavior of a honey bee swarm. Our approach to duplicate detection uses the ABC algorithm to generate an optimal similarity measure for deciding whether two records are duplicates. In the training phase, the ABC algorithm searches for this measure. Once it has been obtained, the remaining datasets are deduplicated using it.
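The ABC variant can be sketched in the same spirit. This is a simplified illustration, not the method from the abstract: it assumes the "similarity measure" is a weighted average of per-field similarity scores, that ABC searches for the weights, and that fitness is plain accuracy on labeled training pairs. The names `weighted_similarity`, `abc_optimize`, the 0.5 decision threshold, and the sample evidence vectors are hypothetical.

```python
import random

def weighted_similarity(weights, ev):
    """Weighted average of per-field similarity scores."""
    total = sum(weights)
    return sum(w * e for w, e in zip(weights, ev)) / total if total else 0.0

def accuracy(weights, data, threshold=0.5):
    """Fraction of labeled pairs classified correctly (assumed fitness)."""
    hits = sum((weighted_similarity(weights, ev) >= threshold) == label
               for ev, label in data)
    return hits / len(data)

def abc_optimize(data, dim, n_sources=10, limit=5, cycles=30, seed=0):
    """Simplified Artificial Bee Colony search over weight vectors in [0, 1]."""
    rng = random.Random(seed)
    sources = [[rng.random() for _ in range(dim)] for _ in range(n_sources)]
    fits = [accuracy(s, data) for s in sources]
    trials = [0] * n_sources

    def try_neighbor(i):
        # Perturb one weight of source i relative to another random source.
        k = rng.choice([j for j in range(n_sources) if j != i])
        d = rng.randrange(dim)
        cand = sources[i][:]
        step = rng.uniform(-1, 1) * (cand[d] - sources[k][d])
        cand[d] = min(1.0, max(0.0, cand[d] + step))
        fit = accuracy(cand, data)
        if fit > fits[i]:                     # greedy selection
            sources[i], fits[i], trials[i] = cand, fit, 0
        else:
            trials[i] += 1

    best_i = max(range(n_sources), key=fits.__getitem__)
    best, best_fit = sources[best_i][:], fits[best_i]
    for _ in range(cycles):
        for i in range(n_sources):            # employed bees
            try_neighbor(i)
        for _ in range(n_sources):            # onlookers favour fitter sources
            probs = [f + 1e-9 for f in fits]
            try_neighbor(rng.choices(range(n_sources), weights=probs)[0])
        best_i = max(range(n_sources), key=fits.__getitem__)
        if fits[best_i] > best_fit:           # memorize best before scouting
            best, best_fit = sources[best_i][:], fits[best_i]
        for i in range(n_sources):            # scouts replace exhausted sources
            if trials[i] > limit:
                sources[i] = [rng.random() for _ in range(dim)]
                fits[i] = accuracy(sources[i], data)
                trials[i] = 0
    return best, best_fit

# Hypothetical training data: per-field similarity scores and a label.
# Field 0 is informative here; field 1 is deliberately misleading.
train = [([0.95, 0.20], True), ([0.90, 0.85], True), ([0.85, 0.10], True),
         ([0.20, 0.90], False), ([0.10, 0.15], False), ([0.30, 0.95], False)]
weights, fit = abc_optimize(train, dim=2)

def is_duplicate(ev, threshold=0.5):
    """Apply the learned measure to a pair outside the training set."""
    return weighted_similarity(weights, ev) >= threshold

print(weights, fit, is_duplicate([0.88, 0.30]))
```

The training phase corresponds to `abc_optimize`, and `is_duplicate` stands in for the later step of deduplicating the remaining data with the learned measure. A full implementation would typically use a richer similarity measure and a held-out set to check that the weights generalize.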