14-10-2010, 01:24 PM
This article is presented by: Noman Mohammed, Benjamin C. M. Fung, Ke Wang, Patrick C. K. Hung
Privacy-Preserving Data Mashup
ABSTRACT
Mashup is a web technology that combines information from more than one source into a single web application. This technique provides a new platform for different data providers to flexibly integrate their expertise and deliver highly customizable services to their customers. Nonetheless, combining data from different sources could potentially reveal person-specific sensitive information. In this paper, we study and resolve a real-life privacy problem in a data mashup application for the financial industry in Sweden, and propose a privacy-preserving data mashup (PPMashup) algorithm to securely integrate private data from different data providers, while the integrated data still retains the essential information for supporting general data exploration or a specific data mining task, such as classification analysis. Experiments on real-life data suggest that our proposed method is effective for simultaneously preserving both privacy and information usefulness, and is scalable for handling large volumes of data.
INTRODUCTION
Mashup is a web technology that combines information and services from more than one source into a single web application. It was first discussed in a 2005 issue of Business Week [16] on the topic of integrating real estate information into Google Maps. Since then, web giants like Amazon, Yahoo!, and Google have been actively developing mashup applications. Mashup has created a new horizon for service providers to integrate their data and expertise to deliver highly customizable services to their customers.

Data mashup is a special type of mashup application that aims at integrating data from multiple data providers depending on the user's service request. Figure 1 illustrates a typical architecture of the data mashup technology. A service request could be a general data exploration or a sophisticated data mining task such as classification analysis. Upon receiving a service request, the data mashup web application dynamically determines the data providers, collects information from them through their web service application programming interfaces (APIs), and then integrates the collected information to fulfill the service request. Further computation and visualization can be performed at the user's site (e.g., a browser or an applet). This is very different from the traditional web portal, which simply divides a web page or a website into independent sections for displaying information from different sources.

A data mashup application can help ordinary users explore new knowledge. Nevertheless, it could also be misused by adversaries to reveal sensitive information that was not available before the data integration.
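The three steps described above (determine the data providers, collect information through their APIs, integrate the collected information) can be sketched as follows. This is a minimal illustration under stated assumptions, not the architecture of Figure 1 or any real mashup framework: `Provider`, `in_memory_provider`, `select_providers`, and `mashup` are hypothetical names, each provider's web service API is replaced by a local function, and integration is assumed to be a merge on a shared record key.

```python
from typing import Callable, Dict, List, Set

# A record set returned by a provider: record key -> {attribute: value}.
Records = Dict[str, Dict[str, object]]


class Provider:
    """Hypothetical data provider exposing a web-service-like API."""

    def __init__(self, name: str, attributes: Set[str], api: Callable[[Set[str]], Records]):
        self.name = name
        self.attributes = attributes  # attributes this provider can supply
        self.api = api                # stand-in for a remote API call


def in_memory_provider(name: str, table: Records) -> Provider:
    """Wrap a local table as a provider (an assumption replacing a real web service)."""
    attributes = {a for row in table.values() for a in row}

    def api(wanted: Set[str]) -> Records:
        # Return only the requested attributes for every record.
        return {key: {a: v for a, v in row.items() if a in wanted}
                for key, row in table.items()}

    return Provider(name, attributes, api)


def select_providers(requested: Set[str], providers: List[Provider]) -> List[Provider]:
    """Step 1: dynamically determine which providers hold requested attributes."""
    return [p for p in providers if p.attributes & requested]


def mashup(requested: Set[str], providers: List[Provider]) -> Records:
    """Steps 2-3: collect from each selected provider, then integrate by key."""
    integrated: Records = {}
    for p in select_providers(requested, providers):
        collected = p.api(p.attributes & requested)  # step 2: collect via the API
        for key, record in collected.items():        # step 3: merge on the record key
            integrated.setdefault(key, {}).update(record)
    return integrated


if __name__ == "__main__":
    a = in_memory_provider("A", {"s1": {"Age": 30, "Balance": 500}})
    b = in_memory_provider("B", {"s1": {"Job": "Clerk", "Salary": 40000}})
    d = in_memory_provider("D", {"s1": {"Zip": "111"}})
    print([p.name for p in select_providers({"Age", "Job"}, [a, b, d])])  # ['A', 'B']
    print(mashup({"Age", "Job"}, [a, b, d]))  # {'s1': {'Age': 30, 'Job': 'Clerk'}}
```

Provider D is never contacted because it holds none of the requested attributes, which mirrors the per-request selection of providers. In this sketch a record held by only one provider is still kept with partial attributes; a real application might require every provider to contribute.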
In this paper, we study the privacy threats caused by data mashup and propose a privacy-preserving data mashup (PPMashup) algorithm to securely integrate person-specific sensitive data from different data providers, while the integrated data still retains the essential information for supporting general data exploration or a specific data mining task, such as classification analysis. The following real-life scenario illustrates the simultaneous need for information sharing and privacy preservation in the financial industry.

This research problem was discovered in a collaborative project with Nordax Finans AB, which is a provider of unsecured loans in Sweden. We generalize their problem as follows: a loan company A and a bank B observe different sets of attributes about the same set of individuals identified by the common key SSN, e.g., TA(SSN, Age, Balance) and TB(SSN, Job, Salary). These companies want to implement a data mashup application that integrates their data to support better decision making, such as loan or credit limit approval, which is basically a classification analysis task. In addition to companies A and B, their partnered credit card company C also has access to the data mashup application, so all three companies A, B, and C are data recipients of the final integrated data.

Companies A and B have two privacy concerns. First, simply joining TA and TB would reveal each party's sensitive information to the other party. Second, even if TA and TB individually do not contain person-specific or sensitive information, the integrated data can increase the possibility of identifying the record of an individual. Their privacy concerns are reasonable because Sweden has a population of only 9 million people. Thus, it is not impossible to identify the record of an individual by collecting information from public databases. The next example illustrates this point.
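The second concern can be made concrete with a small example. The sketch below assumes made-up contents for TA(SSN, Age, Balance) and TB(SSN, Job, Salary) (not data from Nordax Finans AB or from the paper's experiments), integrates them with a naive inner join on SSN, and, treating Age and Job as attributes an adversary might learn from public sources, measures the size of the smallest group of records sharing the same values. The helper names `join_on_ssn` and `min_group_size` are hypothetical.

```python
from collections import Counter
from typing import Dict, List

Table = Dict[str, Dict[str, object]]  # SSN -> {attribute: value}

# Made-up toy data: TA(SSN, Age, Balance) held by loan company A ...
T_A: Table = {
    "001": {"Age": 30, "Balance": 1000},
    "002": {"Age": 30, "Balance": 2500},
    "003": {"Age": 45, "Balance": 800},
    "004": {"Age": 45, "Balance": 4000},
}
# ... and TB(SSN, Job, Salary) held by bank B, about the same individuals.
T_B: Table = {
    "001": {"Job": "Clerk", "Salary": 30000},
    "002": {"Job": "Engineer", "Salary": 55000},
    "003": {"Job": "Clerk", "Salary": 32000},
    "004": {"Job": "Engineer", "Salary": 61000},
}


def join_on_ssn(ta: Table, tb: Table) -> Table:
    """Naive integration: inner join of the two tables on the common key SSN."""
    return {ssn: {**ta[ssn], **tb[ssn]} for ssn in sorted(ta.keys() & tb.keys())}


def min_group_size(table: Table, attributes: List[str]) -> int:
    """Size of the smallest group of records sharing identical values on
    `attributes`; 1 means at least one record is unique on them."""
    groups = Counter(tuple(row[a] for a in attributes) for row in table.values())
    return min(groups.values())


if __name__ == "__main__":
    joined = join_on_ssn(T_A, T_B)
    print(min_group_size(T_A, ["Age"]))            # 2
    print(min_group_size(T_B, ["Job"]))            # 2
    print(min_group_size(joined, ["Age", "Job"]))  # 1
```

On its own, each table places every individual in a group of at least two on the attribute it holds. After the join, even if SSN is withheld from the recipients, every (Age, Job) combination is unique, so linking those two attributes to an outside source singles out one record. This only illustrates the risk; it is not the PPMashup algorithm.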
For more information about this article, please follow the link:
http://googleurl?sa=t&source=web&cd=1&ve...Mashup.pdf&ei=I7a2TJuQDYTSuwOJzIibCQ&usg=AFQjCNEFow3dV3EdeAbf6Rzlf16dKOGSBA