Cloudy Team

Overview:

A joint project between Inria (Claude Castelluccia), UC Berkeley (Dawn Song) and UC Irvine (Gene Tsudik).

Cloud computing is a form of computing where general-purpose clients are used to access resources and applications managed and stored on remote servers. Cloud applications are increasingly relied upon to provide basic services such as e-mail, instant messaging and office applications. Customers of cloud applications benefit from outsourcing the management of their computing infrastructure to a third-party cloud provider. However, this places customers in a position of blind trust towards the cloud provider: they must assume that the “cloud” remains confidential, available, fault-tolerant, well managed, properly backed up and protected from natural accidents as well as intentional attacks. An inherent limitation of today’s commercial cloud solutions is that end users cannot verify that the servers in the cloud, and the network in between, host and disseminate tasks and content without deleting, disclosing or modifying them.

The main goal of the Cloudy project is to study various aspects of Cloud Computing Security and Privacy.

Current projects:

-Password Security: Measuring the strength of passwords is crucial to the security of password-based authentication. However, current methods to measure password strength have limited accuracy, first because they use rules that are too simple to capture the complexity of passwords, and second because password frequencies differ widely from one application to another. In this work, we are developing adaptive password strength meters that estimate password strength using Markov models (a minimal sketch of the idea is given after the references below). We propose a secure implementation that greatly improves on the accuracy of current techniques [1]. We are also studying how personal information about a user, gathered for example from social networks, can help speed up password guessing [2].

[1] C. Castelluccia (Inria), M. Duermuth, D. Perito (UC Berkeley). Adaptive Password Strength Meters from Markov Models, Network and Distributed Systems Security Symposium (NDSS), February 2012.
[2] C. Castelluccia (Inria), A. Chaabane, M. Duermuth, D. Perito (UC Berkeley). OMEN.
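
To make the approach concrete, the Python sketch below shows the core idea behind a Markov-based strength meter: train an n-gram model on a corpus of known (e.g., leaked) passwords, then score a candidate by the negative log-likelihood the model assigns to it. This is an illustrative toy, not the secure, adaptive implementation of [1]; all class and parameter names are ours.

    import math
    from collections import defaultdict

    class MarkovStrengthMeter:
        def __init__(self, order=1, alpha=1.0):
            self.order = order      # context length: 1 => bigram model
            self.alpha = alpha      # additive-smoothing constant
            self.counts = defaultdict(lambda: defaultdict(int))
            self.alphabet = set()

        def train(self, passwords):
            # Count n-gram transitions over a corpus of known passwords.
            for pw in passwords:
                padded = "\x02" * self.order + pw + "\x03"   # start/end markers
                self.alphabet.update(padded)
                for i in range(self.order, len(padded)):
                    ctx = padded[i - self.order:i]
                    self.counts[ctx][padded[i]] += 1

        def strength_bits(self, password):
            # Strength estimate: -log2 of the model's probability for the password.
            padded = "\x02" * self.order + password + "\x03"
            bits = 0.0
            v = max(1, len(self.alphabet))
            for i in range(self.order, len(padded)):
                ctx, c = padded[i - self.order:i], padded[i]
                total = sum(self.counts[ctx].values())
                p = (self.counts[ctx][c] + self.alpha) / (total + self.alpha * v)
                bits += -math.log2(p)
            return bits

    meter = MarkovStrengthMeter()
    meter.train(["password", "123456", "letmein", "dragon"])   # toy corpus
    print(meter.strength_bits("password1"))   # low: similar to the training data
    print(meter.strength_bits("xK9#qTz!"))    # high: unlikely under the model

Unlike fixed rule-based meters, such a model adapts to the password distribution of a given application simply by retraining on that application's corpus, which is what makes the meter "adaptive".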

-Content-Centric Networking (CCN) Security and Privacy: Content-Centric Networking (CCN) is an alternative to the host-centric networking of today’s Internet. CCN emphasizes content distribution by making content directly addressable. Named-Data Networking (NDN) is an instance of CCN that is being considered as a candidate for the next-generation Internet architecture. One key NDN feature is router-side content caching, which optimizes bandwidth consumption, reduces congestion and provides fast fetching of popular content. Unfortunately, the same feature is also detrimental to the privacy of both consumers and producers of content. As we show in this work, simple and difficult-to-detect timing attacks can exploit NDN routers as “oracles” and allow an adversary to learn whether a nearby victim has recently requested certain content (a sketch is given after the reference below). Similarly, probing attacks that target nearby content producers can be used to discover whether certain content has recently been fetched. In this work, we analyze the scope and feasibility of such attacks, and design efficient countermeasures that offer quantifiable privacy guarantees while retaining key features of NDN [3].

[3] Gergely Acs (Inria), Mauro Conti, Paolo Gasti (UCI), Cesar Ghali (UCI), Gene Tsudik (UCI). Cache Privacy in Named-Data Networking. Under review.
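
The following Python sketch illustrates the cache-probing timing attack described above. The fetch() helper is a hypothetical placeholder for issuing an NDN Interest and waiting for the Data packet; it is not a real NDN library API, and the threshold value is illustrative.

    import time

    def fetch(content_name):
        # Placeholder: issue an NDN Interest for `content_name` and block
        # until the corresponding Data packet arrives.
        raise NotImplementedError("requires an NDN forwarder or testbed")

    def probe(content_name, cache_hit_threshold_ms=5.0):
        # Infer whether the adjacent router has `content_name` cached:
        # a cache hit is answered locally and returns fast, while a miss
        # must travel all the way to the producer and takes measurably longer.
        start = time.perf_counter()
        fetch(content_name)
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        return elapsed_ms < cache_hit_threshold_ms   # True => likely cached

    # An adversary sharing a first-hop router with the victim could call
    # probe("/example/private/video/segment1") to learn whether the victim
    # recently requested that content, without ever contacting the victim.

The attack requires no compromise of any router: the adversary only measures its own request latencies, which is what makes it difficult to detect.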

Visits:

-Daniele Perito joined UC Berkeley as a post-doc in Dec. 2011. He is now working as a security architect at Square (https://squareup.com/).
-Gergely Acs visited UC Irvine from May to June 2012.
-Claude Castelluccia visited UC Irvine and UC Berkeley from June 23rd, 2012 to August, 23rd, 2012.
-Abdelberi Chaabane is visiting PARC from June to October 2012.

Plan for 2013:

Most of the current projects will continue. In particular, we will keep working with UCI on CCN security and privacy. With the departure of Daniele Perito, the collaboration with UC Berkeley has slowed down. We plan to reactivate it with the probable visit of Claude Castelluccia to UC Berkeley from mid-2013 to mid-2015. This visit will be devoted mostly to work on Big Data Privacy.

Motivation: Many different types of datasets have become widely available in recent years and have opened the possibility of improving our understanding of large-scale social networks by investigating how people exchange information, interact, and develop social relationships. Furthermore, more and more online systems, such as smart meters or web analytics services, are increasingly tracking and collecting users’ data for analytics purposes. When aggregated, the collected data, sometimes referred to as big data, can help understand complex processes, such as the spread of viruses, build better transportation or recommendation systems, and prevent traffic congestion. Unfortunately, they also pose a considerable threat to privacy. For example, mobility trajectories might be used by a malicious attacker to discover potentially sensitive information about a user, such as his habits, religion or relationships. Because privacy is so important to people, companies and researchers are reluctant to publish datasets for fear of being held responsible for potential privacy breaches.

Research Objectives: There is therefore an urgent need to develop Privacy-Preserving Data Analytics (PPDA) systems that collect and transform raw data into a version that is immunized against privacy attacks but still preserves useful information for data analysis. This is the main objective of this proposal. A PPDA system is composed of two phases: a data collection phase, in which the data publisher collects data from individuals, and a data publishing phase, in which the data are anonymized before release. There are two classes of PPDA, according to whether the entity collecting and anonymizing the data is trusted or not. In the trusted model, which we refer to as Privacy-Preserving Data Publishing (PPDP), individuals trust the data publisher and disclose all their data to it. In the untrusted model, which we refer to as Privacy-Preserving Data Collection (PPDC), individuals do not trust the data publisher and may add noise to their data to protect sensitive information from it.

- Privacy-Preserving Data Publishing: In the trusted model, individuals trust the data publisher and disclose all their data to it. For example, in a medical scenario, patients give their true information to hospitals to receive proper treatment. It is then the responsibility of the data publisher to protect the privacy of individuals’ personal data. To prevent potential data leakage, datasets must be sanitized before any release. Several data sanitization algorithms have been presented recently [3]. However, their privacy properties are often dubious, since they rely on privacy models that are either ad hoc or considered weak. Re-identification of sanitized data has been demonstrated in various contexts, such as by Narayanan and Shmatikov on the Netflix Prize dataset [23]. It is therefore urgent to respond to the failure of existing data sanitization techniques by developing new schemes with proven guarantees.
Different privacy models have been proposed recently. Amongst them, the differential privacy model probably provides the strongest guarantees [10]. Several schemes have recently been proposed to release private data under the differential privacy model. However, most of these schemes release a “snapshot” of the dataset at a given period of time. This release often consists of histograms. They can, for example, be used to release the distribution of some pathologies (such as cancer, flu, HIV, hepatitis, etc.) in a given population (a minimal sketch of such a release is given below).

For many analytics applications, “snapshots” of data are not enough, and sequential data are required. Furthermore, current work focuses on rather simple data structures, such as numerical data. Releasing more complex data, such as graphs, is often also very useful. For example, recommendation systems need the sequences of visited websites or bought items; they also need to analyze people’s connection graphs to identify the best products to recommend. Network trace analytics also rely on sequences of events to detect anomalies or intrusions. Similarly, traffic analytics applications typically need the sequence of places visited by each user, as opposed to the position of each user at given periods of time. In fact, it is often essential for these applications to know that user A moved from position 1 to position 2, or at least to learn the probability of moving from position 1 to position 2. Histograms would typically represent the number of users in position 1 and in position 2, but would not provide the number of users that moved from position 1 to position 2. Due to the inherent sequentiality and high dimensionality of sequential data, one major challenge in applying current data sanitization solutions to sequential data comes from the uniqueness of sequences (very few sequences are identical), which makes existing techniques yield poor utility. Schemes to privately release data with complex structures, such as sequential, relational and graph data, are required. This is one of the goals of this proposal.
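
The Python sketch below illustrates the “snapshot” histogram release mentioned above, using the Laplace mechanism of [10]: since each user contributes one record, the histogram has L1 sensitivity 1, and adding Laplace(1/epsilon) noise to each count satisfies epsilon-differential privacy. The data and parameter values are illustrative; note that such a release says nothing about transitions between buckets, which is exactly the limitation discussed above for sequential data.

    import math
    import random
    from collections import Counter

    def laplace_noise(scale):
        # Sample Laplace(0, scale) by inverting its CDF.
        u = random.random() - 0.5
        return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

    def private_histogram(records, epsilon):
        # One record per user => the histogram has L1 sensitivity 1, so
        # Laplace(1/epsilon) noise per bucket gives epsilon-differential
        # privacy [10]. In practice the bucket domain must be fixed in
        # advance, independently of the data, so that empty buckets are
        # also noised.
        counts = Counter(records)
        return {k: c + laplace_noise(1.0 / epsilon) for k, c in counts.items()}

    diagnoses = ["flu", "flu", "cancer", "hepatitis", "flu", "HIV"]
    print(private_histogram(diagnoses, epsilon=0.5))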

- Privacy-Preserving Data Collection: In the untrusted model, individuals do not trust the data publisher. They may add noise to their data to protect sensitive information from it. For example, websites commonly use third-party web analytics services, such as Google Analytics, to obtain aggregate traffic statistics such as most visited pages, visitors’ countries, etc. Similarly, other applications, such as smart metering or targeted advertising, track users in order to derive aggregate information about a particular class of users. Unfortunately, to obtain this aggregate information, services need to track users, resulting in a violation of user privacy. The increasing presence and tracking of third-party sites used for advertising and analytics has been demonstrated in a study [19], which showed that the penetration of the top 10 third parties grew from 40% in 2005 to 70% in 2008, and to over 70% in September 2009. Also, the most popular social networking websites, such as Facebook, Twitter, Xing, and Google+, track users around the web. Each of these social networks has social widgets for sharing and recommendation (called Like, Tweet, Visitors, and +1 buttons) which are installed on numerous websites. These buttons allow the social networks to track users even when they do not click them: just viewing a webpage with such a button is sufficient to be tracked. A Wall Street Journal study [29] showed that several of the most popular Android and iPhone applications, including games and OSNs, transmitted the phone’s unique device ID, the phone’s location, and the user’s age, gender and other personal details to third-party companies without users’ awareness or consent. One of our goals is to develop Privacy-Preserving Data Collection solutions. We propose to study whether it is possible to provide efficient collection/aggregation solutions without tracking users, i.e. without obtaining or learning individual contributions. In other words, we will try to answer the following question: “is it possible to generate statistics about a group of users without having access to individual contributions?” (a minimal sketch of one such mechanism is given below).
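
As one concrete illustration of the PPDC setting, the Python sketch below implements classic randomized response: each user perturbs their own bit locally before sending it, so the collector obtains an accurate aggregate without learning any individual’s true value. This is one well-known mechanism chosen for illustration, not necessarily the scheme the project will adopt; all names and parameter values are ours.

    import random

    def randomize(true_bit, p=0.75):
        # Each user reports their true bit with probability p, and the
        # opposite bit otherwise, before sending it to the collector.
        return true_bit if random.random() < p else 1 - true_bit

    def estimate_proportion(reports, p=0.75):
        # The collector inverts the perturbation to get an unbiased
        # estimate of the true proportion pi of 1-bits:
        # P(report=1) = p*pi + (1-p)*(1-pi)  =>  pi = (obs - (1-p)) / (2p - 1)
        observed = sum(reports) / len(reports)
        return (observed - (1 - p)) / (2 * p - 1)

    # Example: 10,000 users, 30% of whom actually visited a given page.
    truth = [1 if random.random() < 0.3 else 0 for _ in range(10000)]
    reports = [randomize(b) for b in truth]
    print(estimate_proportion(reports))   # close to 0.30, yet each report is deniable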

References:

[3] Francesco Bonchi, Laks V.S. Lakshmanan, and Hui (Wendy) Wang. Trajectory anonymity in publishing personal mobility data. SIGKDD Explor. Newsl., 13(1), August 2011.
[10] C. Dwork, F. McSherry, K. Nissim, and A. Smith. Calibrating noise to sensitivity in private data analysis. In TCC, 2006.
[19] B. Krishnamurthy and C. Wills. Privacy diffusion on the web: a longitudinal perspective. In WWW, 2009.
[23] A. Narayanan and V. Shmatikov. Robust de-anonymization of large sparse datasets. In IEEE Security and Privacy, 2008.
[29] S. Thurm and Y. Kane. Your apps are watching you. The Wall Street Journal, December 2010.

Requested Budget:

The requested budget is 20,000 euros. It will be used for student exchanges and to cover part of the cost of Claude Castelluccia's visit to UC Berkeley.