A huge amount of personal health information has become available over the last few decades, and manipulation of any part of this information poses a serious risk to the health field. Existing anonymization methods, such as generalization and bucketization, preserve privacy only for low-dimensional data with individual sensitive attributes. In this paper, we propose an anonymization technique that combines the benefits of anatomization with an improved cutting approach and adheres to the principles of k-anonymity and l-diversity, in order to handle high-dimensional data with multiple sensitive attributes. The anatomy approach breaks the observed correlation between quasi-identifier attributes and sensitive attributes (SAs) and produces two separate tables with non-overlapping attributes. In the improved cutting algorithm, vertical partitioning groups correlated SAs together in the sensitive table (ST) and thereby reduces dimensionality, using the advanced clustering algorithm. To obtain the optimal size of the cubes, the tuples are partitioned by MFA. The experimental results indicate that the proposed method preserves the privacy of data with numerous SAs. The anatomization approach minimizes information loss, while the cutting algorithm helps preserve correlation and utility, which in turn reduces the dimensionality of the data and the loss of information. The advanced clustering algorithm demonstrates its effectiveness by reducing time and complexity. In addition, this work adheres to the principles of k-anonymity and l-diversity and thus avoids privacy threats such as membership, identity and attribute disclosure.
Today's healthcare providers store and transmit a large amount of confidential data as part of their business. Sensitive data can include personally identifiable information about customers. Any misuse of this information poses a critical threat to the business. When sensitive data are made available to the public, they must be protected from abuse.
From the point of view of data privacy protection, data anonymization is the most widely used approach. It modifies the information so that it becomes difficult to link individuals with their data. This methodology tries to protect both the identity and the sensitive information of the subjects when the data are shared for diverse purposes (Lefevre et al., 2008; Pfitzmann and Hansen, 2008). SA is the set of attributes whose values are confidential, such as type of cancer, treatment, symptom, date of diagnosis and doctor. Identifier attributes are those whose values distinctly identify an individual, such as a name or ID, and quasi-identifier (QI) attributes are those that help to recognize an individual when they are combined. These attributes must be handled with caution so that no information leaks.
When sharing records, it is very important to avoid disclosing confidential information about individuals. Three basic types of privacy disclosure have been identified so far: identity disclosure, membership disclosure and attribute disclosure. Identity disclosure occurs when an individual is linked to a specific record within the shared dataset. Attribute disclosure occurs when new information about an individual is revealed, that is, when the shared data make it possible to infer the characteristics of the individual with greater certainty than was possible before the records were shared. Membership disclosure occurs when the information revealed is whether or not a person's record exists in the published data.
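The difference between identity and attribute disclosure can be illustrated with the k-anonymity and l-diversity principles that this work adheres to. The following is a minimal sketch, not the method proposed in this paper: it assumes the simple "distinct" form of l-diversity, and the attribute names (`age`, `zip`, `disease`), the sample records and the helper functions are hypothetical.

```python
from collections import defaultdict


def group_by_qi(records, qi_attrs):
    """Group records that share the same quasi-identifier values."""
    groups = defaultdict(list)
    for rec in records:
        key = tuple(rec[a] for a in qi_attrs)
        groups[key].append(rec)
    return groups


def is_k_anonymous(records, qi_attrs, k):
    """Every QI group holds at least k records (limits identity disclosure)."""
    groups = group_by_qi(records, qi_attrs)
    return all(len(g) >= k for g in groups.values())


def is_l_diverse(records, qi_attrs, sa, l):
    """Every QI group holds at least l distinct values of the sensitive
    attribute (distinct l-diversity, limits attribute disclosure)."""
    groups = group_by_qi(records, qi_attrs)
    return all(len({r[sa] for r in g}) >= l for g in groups.values())


# Hypothetical, already generalized records.
QI = ["age", "zip"]
records = [
    {"age": "30-39", "zip": "130**", "disease": "flu"},
    {"age": "30-39", "zip": "130**", "disease": "cancer"},
    {"age": "40-49", "zip": "148**", "disease": "flu"},
    {"age": "40-49", "zip": "148**", "disease": "flu"},
]

print(is_k_anonymous(records, QI, 2))           # True
print(is_l_diverse(records, QI, "disease", 2))  # False
```

In this sample, each QI group contains two records, so no individual can be linked to a single record. The second group, however, contains only one disease value, so an attacker who knows that a person falls in that group learns the disease without identifying the record.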
Several anonymization methods are commonly used to maintain privacy: generalization, suppression, anatomy, bucketization, permutation and perturbation. Generalization and suppression operate on the QI attributes, replacing their values with less specific descriptions, whereas bucketization separates the SAs from the QI attributes. Anatomy and permutation break the correlation between QI attributes and SAs by grouping and shuffling the sensitive values within a qid group. Perturbation alters the data by adding noise, aggregating values, swapping values, generating synthetic data or encrypting the data, based on some statistical properties of the original data.
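To make the anatomy idea concrete, the sketch below splits a small table into a quasi-identifier table and a sensitive table that share only a group identifier. It is an illustration under stated assumptions, not the algorithm proposed in this paper: records are grouped simply by taking consecutive chunks of a fixed size, which does not by itself guarantee l-diversity, and the function `anatomize`, the `group_id` column and the sample attributes are hypothetical.

```python
from collections import Counter


def anatomize(records, qi_attrs, sa, group_size):
    """Split records into a QI table and a sensitive table.

    Assumption: groups are consecutive chunks of `group_size` records.
    A real anatomy algorithm would choose groups so that each one is
    l-diverse.
    """
    qit, st = [], []
    for start in range(0, len(records), group_size):
        gid = start // group_size + 1
        group = records[start:start + group_size]
        # QI table: exact QI values plus the group id, no sensitive value.
        for rec in group:
            row = {a: rec[a] for a in qi_attrs}
            row["group_id"] = gid
            qit.append(row)
        # Sensitive table: per group, each sensitive value and its count.
        counts = Counter(rec[sa] for rec in group)
        for value, count in sorted(counts.items()):
            st.append({"group_id": gid, sa: value, "count": count})
    return qit, st


# Hypothetical sample data.
records = [
    {"age": 23, "zip": "13053", "disease": "flu"},
    {"age": 27, "zip": "13068", "disease": "cancer"},
    {"age": 41, "zip": "14850", "disease": "gastritis"},
    {"age": 45, "zip": "14853", "disease": "flu"},
]

qit, st = anatomize(records, ["age", "zip"], "disease", group_size=2)
for row in qit:
    print(row)
for row in st:
    print(row)
```

The QI values are published unchanged, which is why anatomy loses less information than generalization. Within a group, the link between a specific record and a specific sensitive value is no longer visible; only the distribution of sensitive values per group is released.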