privacy preservation in data mining java source code application
#1

I am looking for Java source code for privacy preservation in data mining.
#2
privacy preservation in data mining java source code application

Abstract:

Several anonymization techniques, such as generalization and bucketization, have been designed for privacy-preserving microdata publishing. Recent work has shown that generalization loses a considerable amount of information, especially for high-dimensional data. Bucketization, on the other hand, does not prevent membership disclosure and does not apply to data that do not have a clear separation between quasi-identifying attributes and sensitive attributes. In this paper, we present a novel technique called slicing, which partitions the data both horizontally and vertically. We show that slicing preserves better data utility than generalization and can be used for membership disclosure protection. Another important advantage of slicing is that it can handle high-dimensional data. We show how slicing can be used for attribute disclosure protection and develop an efficient algorithm for computing the sliced data that obey the l-diversity requirement. Our workload experiments confirm that slicing preserves better utility than generalization and is more effective than bucketization in workloads involving the sensitive attribute. Our experiments also demonstrate that slicing can be used to prevent membership disclosure.

Algorithm Used:

Slicing Algorithms:

Our algorithm consists of three phases: attribute partitioning, column generalization, and tuple partitioning. The tuple-partitioning phase is given below.

Algorithm tuple-partition(T, l)

1. Q = {T}; SB = ∅.
2. while Q is not empty
3.     remove the first bucket B from Q; Q = Q − {B}.
4.     split B into two buckets B1 and B2, as in Mondrian.
5.     if diversity-check(T, Q ∪ {B1,B2} ∪ SB, l)
6.         Q = Q ∪ {B1,B2}.
7.     else SB = SB ∪ {B}.
8. return SB.
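
The tuple-partition loop above could be sketched in Java roughly as follows. This is only an illustration under simplifying assumptions, not the paper's implementation: the Tuple class, the single numeric attribute "age", and the class name TuplePartitionSketch are hypothetical; split() is a plain median split by position rather than full Mondrian; and diversityCheck() is a simplified stand-in that only requires l distinct sensitive values per bucket instead of the paper's diversity-check.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class TuplePartitionSketch {

    /** Hypothetical tuple: one numeric quasi-identifier and one sensitive value. */
    static final class Tuple {
        final int age;
        final String disease;
        Tuple(int age, String disease) { this.age = age; this.disease = disease; }
    }

    /** Simplified stand-in for diversity-check: each bucket needs at least l distinct sensitive values. */
    static boolean diversityCheck(List<List<Tuple>> buckets, int l) {
        for (List<Tuple> bucket : buckets) {
            Set<String> distinct = new HashSet<>();
            for (Tuple t : bucket) distinct.add(t.disease);
            if (distinct.size() < l) return false;
        }
        return true;
    }

    /** Median split on age (assumption: approximates a Mondrian split); empty list if the bucket cannot be split. */
    static List<List<Tuple>> split(List<Tuple> bucket) {
        List<List<Tuple>> halves = new ArrayList<>();
        if (bucket.size() < 2) return halves;
        List<Tuple> sorted = new ArrayList<>(bucket);
        sorted.sort(Comparator.comparingInt((Tuple t) -> t.age));
        int mid = sorted.size() / 2;
        halves.add(new ArrayList<>(sorted.subList(0, mid)));
        halves.add(new ArrayList<>(sorted.subList(mid, sorted.size())));
        return halves;
    }

    /** Steps 1 to 8 of tuple-partition(T, l): Q is the work queue, SB the set of finished buckets. */
    static List<List<Tuple>> tuplePartition(List<Tuple> table, int l) {
        Deque<List<Tuple>> q = new ArrayDeque<>();
        List<List<Tuple>> sb = new ArrayList<>();
        q.add(table);
        while (!q.isEmpty()) {
            List<Tuple> b = q.poll();                    // step 3: remove the first bucket
            List<List<Tuple>> halves = split(b);         // step 4: split into B1 and B2
            if (!halves.isEmpty()) {
                List<List<Tuple>> candidate = new ArrayList<>(q);
                candidate.addAll(halves);
                candidate.addAll(sb);
                if (diversityCheck(candidate, l)) {      // step 5: check Q ∪ {B1,B2} ∪ SB
                    q.addAll(halves);                    // step 6: keep splitting
                    continue;
                }
            }
            sb.add(b);                                   // step 7: B becomes a final bucket
        }
        return sb;                                       // step 8
    }

    public static void main(String[] args) {
        List<Tuple> table = Arrays.asList(
            new Tuple(20, "flu"), new Tuple(25, "cold"),
            new Tuple(40, "flu"), new Tuple(45, "cold"));
        System.out.println("buckets: " + tuplePartition(table, 2).size());
    }
}
```

A bucket is kept whole as soon as splitting it would break the diversity requirement, so smaller values of l generally lead to more, smaller buckets.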

Existing System:

The k-medoid method is used for attribute clustering for three reasons. First, many existing clustering algorithms (e.g., k-means) require the calculation of "centroids", but there is no notion of "centroids" in our setting, where each attribute forms a data point in the clustering space. Second, the k-medoid method is very robust to the existence of outliers (i.e., data points that are very far away from the rest of the data points). Third, the order in which the data points are examined does not affect the clusters computed by the k-medoid method.

Disadvantages:

1. Existing anonymization algorithms, e.g., Mondrian, can be used for column generalization. The algorithms can be applied to the subtable containing only the attributes in one column to ensure the anonymity requirement.

2. Existing data analysis (e.g., query answering) methods can be easily used on the sliced data.

3. Existing privacy measures for membership disclosure protection include differential privacy and presence.

Proposed System:

We present a novel technique called slicing, which partitions the data both horizontally and vertically. We show that slicing preserves better data utility than generalization and can be used for membership disclosure protection. Another important advantage of slicing is that it can handle high-dimensional data. We show how slicing can be used for attribute disclosure protection and develop an efficient algorithm for computing the sliced data that obey the l-diversity requirement. Our workload experiments confirm that slicing preserves better utility than generalization and is more effective than bucketization in workloads involving the sensitive attribute.

Advantages:

1. We introduce a novel data anonymization technique called slicing to improve the current state of the art.

2. We show that slicing can be effectively used for preventing attribute disclosure, based on the privacy requirement of l-diversity.

3. We develop an efficient algorithm for computing the sliced table that satisfies l-diversity. Our algorithm partitions attributes into columns, applies column generalization, and partitions tuples into buckets. Attributes that are highly correlated are placed in the same column.

4. We conduct extensive workload experiments. Our results confirm that slicing preserves much better data utility than generalization. In workloads involving the sensitive attribute, slicing is also more effective than bucketization. In some classification experiments, slicing shows better performance than using the original data (which may overfit the model). Our experiments also show the limitations of bucketization in membership disclosure protection and that slicing remedies these limitations.

Module Description:

1. Original Data

2. Generalized Data

3. Bucketized Data

4. Multiset-based Generalization Data

5. One-attribute-per-Column Slicing Data

6. Sliced Data

Original Data:

We conduct extensive workload experiments. Our results confirm that slicing preserves much better data utility than generalization. In workloads involving the sensitive attribute, slicing is also more effective than bucketization. In some classification experiments, slicing shows better performance than using the original data.

Generalized Data:

To perform data analysis or data mining tasks on the generalized table, the data analyst has to make the uniform distribution assumption, namely that every value in a generalized interval/set is equally possible, as no other distribution assumption can be justified. This significantly reduces the data utility of the generalized data.
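
To make the uniform distribution assumption concrete, here is a small Java sketch of how an analyst might estimate a range-query count from a generalized interval. The class and method names are hypothetical, and the attribute is assumed to take integer values with inclusive interval bounds.

```java
class UniformAssumptionSketch {

    /**
     * Estimates how many of the `count` tuples generalized to [lo, hi] fall in the
     * query range [qLo, qHi], assuming every integer value in [lo, hi] is equally likely.
     */
    static double estimate(int lo, int hi, int count, int qLo, int qHi) {
        int overlapLo = Math.max(lo, qLo);
        int overlapHi = Math.min(hi, qHi);
        if (overlapLo > overlapHi) return 0.0;           // query misses the interval
        int overlap = overlapHi - overlapLo + 1;         // values shared by both ranges
        int width = hi - lo + 1;                         // values in the generalized interval
        return count * (double) overlap / width;
    }

    public static void main(String[] args) {
        // 10 tuples generalized to Age [20-29]; query asks for Age in [25-27].
        System.out.println(estimate(20, 29, 10, 25, 27));
    }
}
```

The estimate is the same whether all ten people are actually 25 or none of them is, which is exactly the loss of utility described above.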

Bucketized Data:

We show the effectiveness of slicing in membership disclosure protection. For this purpose, we count the number of fake tuples in the sliced data. We also compare the number of matching buckets for original tuples with that for fake tuples. Our experimental results show that bucketization does not prevent membership disclosure, as almost every tuple is uniquely identifiable in the bucketized data.

Multiset-based Generalization Data:

We observe that this multiset-based generalization is equivalent to a trivial slicing scheme where each column contains exactly one attribute, because both approaches preserve the exact values in each attribute but break the association between them within one bucket.

One-attribute-per-Column Slicing Data:

We observe that while one-attribute-per-column slicing preserves attribute distributional information, it does not preserve attribute correlation, because each attribute is in its own column. In slicing, one groups correlated attributes together in one column and preserves their correlation. For example, in a sliced table whose columns are (Age, Sex) and (Zipcode, Disease), correlations between Age and Sex and correlations between Zipcode and Disease are preserved. In fact, the sliced table encodes the same amount of information as the original data with regard to correlations between attributes in the same column.
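
As a rough Java illustration of this idea, the sketch below slices a single bucket: attributes grouped in the same column stay together, while the order of values is shuffled independently per column. It assumes that the association between columns is broken by random permutation within the bucket; the class name, method name, and String[] row layout are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

class SliceBucketSketch {

    /**
     * rows:    the tuples of one bucket, one String[] per tuple.
     * columns: attribute indices grouped per column, e.g. {{0,1},{2,3}}
     *          for (Age, Sex) and (Zipcode, Disease).
     * Returns one list per column holding that column's sub-tuples in shuffled order.
     */
    static List<List<String[]>> sliceBucket(List<String[]> rows, int[][] columns, Random rng) {
        List<List<String[]>> sliced = new ArrayList<>();
        for (int[] column : columns) {
            List<String[]> values = new ArrayList<>();
            for (String[] row : rows) {
                String[] part = new String[column.length];
                for (int i = 0; i < column.length; i++) part[i] = row[column[i]];
                values.add(part);                        // attributes in one column stay together
            }
            Collections.shuffle(values, rng);            // break the link to the other columns
            sliced.add(values);
        }
        return sliced;
    }

    public static void main(String[] args) {
        List<String[]> bucket = Arrays.asList(
            new String[] {"22", "M", "47906", "flu"},
            new String[] {"22", "F", "47906", "cold"},
            new String[] {"33", "F", "47905", "flu"});
        int[][] columns = {{0, 1}, {2, 3}};
        for (List<String[]> column : sliceBucket(bucket, columns, new Random())) {
            for (String[] part : column) System.out.println(Arrays.toString(part));
            System.out.println("--");
        }
    }
}
```

Passing {{0},{1},{2},{3}} as the columns gives the one-attribute-per-column case, where each attribute's values survive but no pairing between attributes does.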

Sliced Data:

Another important advantage of slicing is its ability to handle high-dimensional data. By partitioning attributes into columns, slicing reduces the dimensionality of the data. Each column of the table can be viewed as a sub-table with a lower dimensionality. Slicing is also different from the approach of publishing multiple independent sub-tables in that these sub-tables are linked by the buckets in slicing.