An infrastructure for automated web privacy measurement has three components: simulating users, recording observations (response metadata, cookies, behavior of scripts, etc.), and analysis. We set out to build a platform that can automate the first two components and can ease the researcher’s analysis task. We sought to make OpenWPM general, modular, and scalable enough to support essentially any privacy measurement.
OpenWPM is built on top of Firefox, with automation provided by Selenium. It provides several hooks for data collection: a proxy, a Firefox extension, and access to Flash cookies. See the GitHub repository for more information on the instrumentation and the data collected.
Insights from a 1-million-site Measurement of Online Tracking
FTC PrivacyCon. Washington, DC
The Web Privacy Problem is a Transparency Problem
FTC PrivacyCon. Washington, DC
Arvind Narayanan and Dillon Reisman
To appear in “Transparent data mining for Big and Small Data”. Editors: Tania Cerquitelli, Daniele Quercia, Frank Pasquale. Springer. 2017.
This book chapter presents an overview of the goals, design, and findings of the WebTAP project. We recommend that readers new to the project begin here. The chapter concludes with recommendations for public policy and regulation of privacy.
Jessica Su, Ansh Shukla, Sharad Goel, Arvind Narayanan
We show — theoretically, via simulation, and through experiments on real user data — that de-identified web browsing histories can be linked to social media profiles using only publicly available data. This is possible because each person has a distinctive social network, and thus the set of links appearing in one’s feed is unique. We recruited nearly 400 people to donate their web browsing histories, and we were able to correctly identify more than 70% of them.
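The core idea, that the links in a person's social feed form a near-unique fingerprint, can be illustrated by ranking candidate profiles by how well their feeds match a browsing history. The sketch below is a simplified stand-in, not the paper's statistical model. It uses Jaccard similarity, and the function names and toy URLs are hypothetical.

```python
from __future__ import annotations


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity |A & B| / |A | B|; defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_candidates(history: set[str], feeds: dict[str, set[str]]) -> list[str]:
    """Order candidate profiles from most to least similar to the browsing history.

    Jaccard, unlike raw overlap, penalizes candidates with very large feeds,
    who would otherwise "explain" almost any history.
    """
    return sorted(feeds, key=lambda c: jaccard(history, feeds[c]), reverse=True)


# Toy data (hypothetical): URLs from one de-identified history and three candidate feeds.
history = {"a.com/1", "b.com/2", "c.com/3"}
feeds = {
    "alice": {"a.com/1", "b.com/2", "x.com/9"},
    "bob": {"c.com/3", "y.com/8"},
    "carol": {"z.com/7"},
}
print(rank_candidates(history, feeds))  # ['alice', 'bob', 'carol']
```

With real data, the feed of each candidate would have to be reconstructed from the public accounts they follow. A practical attack also needs a principled way to handle feed size and link popularity, which a single similarity score only approximates.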
The Future of Ad Blocking: Analytical Framework and New Techniques
Grant Storey, Dillon Reisman, Arvind Narayanan, Jonathan Mayer
We present a systematic study of ad blocking, and the associated “arms race,” as a security problem. We propose five new ad blocking techniques and evaluate them using prototype implementations we have built. Contrary to widespread assumption, we argue that users and ad blockers hold the upper hand.
Steven Englehardt and Arvind Narayanan
We present the largest and most detailed measurement of online tracking conducted to date, based on a crawl of the top 1 million websites. We make 15 types of measurements on each site, including stateful (cookie-based) and stateless (fingerprinting-based) tracking, the effect of browser privacy tools, and the exchange of tracking data between different sites (“cookie syncing”). Our findings include multiple sophisticated fingerprinting techniques never before measured in the wild.
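Cookie syncing can be detected by checking whether an identifier stored in one tracker's cookie later shows up in a request sent to a different domain. The following is a minimal sketch under simplifying assumptions. It assumes the ID-like cookie values have already been identified, and it only inspects URL query parameters. The function name and the `min_len` threshold are hypothetical.

```python
from urllib.parse import parse_qsl, urlsplit


def find_cookie_syncs(id_cookies, request_urls, min_len=8):
    """Flag requests that carry one domain's ID cookie value to a different host.

    id_cookies: dict mapping owner domain -> set of cookie values believed to be IDs.
    request_urls: URLs of HTTP requests observed during a crawl.
    min_len: hypothetical threshold that skips short values likely to match by chance.
    Returns a list of (owner_domain, receiving_host, value) tuples.
    """
    syncs = []
    for url in request_urls:
        parts = urlsplit(url)
        host = parts.hostname or ""
        param_values = {value for _, value in parse_qsl(parts.query)}
        for owner, values in id_cookies.items():
            # A domain sending its own ID back to itself (or a subdomain) is not syncing.
            if host == owner or host.endswith("." + owner):
                continue
            for value in values:
                if len(value) >= min_len and value in param_values:
                    syncs.append((owner, host, value))
    return syncs


id_cookies = {"tracker-a.com": {"abc123def456"}}
requests = [
    "https://sync.tracker-b.com/match?partner=a&uid=abc123def456",
    "https://tracker-a.com/pixel?uid=abc123def456",
    "https://other.example/x?q=short",
]
print(find_cookie_syncs(id_cookies, requests))
# [('tracker-a.com', 'sync.tracker-b.com', 'abc123def456')]
```

A full measurement would also search URL paths and referrers, and would first filter out cookie values that are not stable identifiers.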
Steven Englehardt, Dillon Reisman, Christian Eubank, Peter Zimmerman, Jonathan Mayer, Arvind Narayanan, Edward W. Felten
We investigate the ability of a passive network observer to leverage third-party HTTP tracking cookies for global surveillance. Using simulated browsing profiles from several locations around the world, we cluster network traffic by transitively linking shared unique cookies and estimate that for typical users 62-73% of websites with embedded trackers are located in a single connected component. Furthermore, almost half of the most popular webpages will leak a logged-in user’s real-world identity to an eavesdropper in unencrypted traffic.
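The transitive linking described above amounts to finding connected components in a graph. Its nodes are page visits, and its edges join visits that share a unique cookie ID. The minimal union-find sketch below assumes that each visit has already been reduced to the set of unique IDs observed on it. The input format and function names are hypothetical.

```python
from collections import Counter


class UnionFind:
    """Disjoint-set forest with path halving."""

    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def largest_component_fraction(visits):
    """visits: list of sets of unique cookie IDs, one set per page visit.

    Returns the fraction of visits in the largest group that can be linked
    transitively through shared IDs.
    """
    uf = UnionFind(len(visits))
    first_seen = {}  # cookie ID -> index of the first visit where it appeared
    for i, ids in enumerate(visits):
        for cid in ids:
            if cid in first_seen:
                uf.union(first_seen[cid], i)
            else:
                first_seen[cid] = i
    sizes = Counter(uf.find(i) for i in range(len(visits)))
    return max(sizes.values()) / len(visits) if visits else 0.0


# Visits 0-2 are linked through IDs "a" and "b"; visit 3 stays isolated.
print(largest_component_fraction([{"a"}, {"a", "b"}, {"b"}, {"c"}]))  # 0.75
```

To mirror the metric in the abstract, pass in only visits to pages that embed trackers.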
Michael Kranch, Joseph Bonneau
We have conducted the first in-depth empirical study of two important new web security features: HTTP Strict Transport Security (HSTS) and public-key pinning. Our findings highlight that both the web platform and modern websites are large and complex enough to make even conceptually simple security upgrades challenging to deploy in practice.
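Even a "conceptually simple" feature like HSTS comes with syntax rules that a site must get right. A measurement crawler also has to apply those rules before it can tell whether a policy is in effect. The sketch below parses a `Strict-Transport-Security` header value following the rules in RFC 6797 as we understand them: `max-age` is required, each directive may appear at most once, and unknown directives such as `preload` are ignored. It is illustrative only and not the parser used in the study. The function name and return format are hypothetical.

```python
import re


def parse_hsts(header):
    """Parse a Strict-Transport-Security header value (RFC 6797, section 6.1).

    Returns {"max_age": int, "include_subdomains": bool}, or None if the header
    is invalid and a browser should ignore it.
    """
    directives = {}
    for raw in header.split(";"):
        raw = raw.strip()
        if not raw:
            continue  # tolerate empty directives such as a trailing ";"
        name, _, value = raw.partition("=")
        name, value = name.strip().lower(), value.strip()
        if len(value) >= 2 and value[0] == value[-1] == '"':
            value = value[1:-1]  # directive values may be quoted strings
        if name in directives:
            return None  # each directive may appear at most once
        directives[name] = value
    max_age = directives.get("max-age")
    if max_age is None or not re.fullmatch(r"[0-9]+", max_age):
        return None  # max-age is required and must be a non-negative integer
    return {
        "max_age": int(max_age),
        "include_subdomains": "includesubdomains" in directives,
    }


print(parse_hsts("max-age=31536000; includeSubDomains"))
# {'max_age': 31536000, 'include_subdomains': True}
print(parse_hsts("includeSubDomains"))  # None: max-age is missing
```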
Gunes Acar, Christian Eubank, Steven Englehardt, Marc Juarez, Arvind Narayanan, Claudia Diaz
ACM CCS 2014
This collaboration between researchers at KU Leuven and Princeton is the first large-scale study of three advanced web tracking mechanisms: canvas fingerprinting, evercookies, and the use of “cookie syncing” in conjunction with evercookies.
Nicky Robinson, Joseph Bonneau
We study Facebook Connect’s permissions system using crawling, experimentation, and surveys, and find that in several ways it works differently from what both users and developers expect.
Steven Englehardt, Christian Eubank, Peter Zimmerman, Dillon Reisman, Arvind Narayanan
We identify 32 web privacy measurement studies, cast them as instances of a generic experimental framework, and perform a thorough methodological analysis. We analyze design and implementation alternatives and make recommendations based on considerations of experimental rigor and engineering feasibility. We present a flexible, modular web privacy measurement platform that supports any experiment that fits the framework. It is also highly scalable and avoids many common pitfalls. Finally, as a case study of our methods and infrastructure, we measure the “filter bubble”, i.e., the extent of personalization based on a user’s history, by crawling approximately 300,000 pages across nine news sites and present evidence that this personalization effect has been greatly overstated in the popular press.
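Measuring personalization rigorously means separating history-driven differences from ordinary noise such as A/B tests or content changing between page loads. One common way to do this, assumed here for illustration rather than taken from the paper, is to compare two identical control profiles to establish a noise baseline. The gap between that baseline and the similarity of a history-trained profile to a control is then attributed to personalization. The function names and story IDs below are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def personalization_effect(control_a, control_b, treatment):
    """Noise-adjusted drop in similarity attributable to the treatment profile's history.

    control_a, control_b: stories shown to two fresh profiles loading the same page.
    treatment: stories shown to a profile with a browsing history.
    A value near 0 suggests little personalization beyond background noise.
    """
    baseline = jaccard(control_a, control_b)  # differences that occur even without history
    observed = jaccard(treatment, control_a)
    return baseline - observed


control_a = {"s1", "s2", "s3", "s4"}
control_b = {"s1", "s2", "s3", "s5"}
treatment = {"s1", "s2", "s6", "s7"}
print(round(personalization_effect(control_a, control_b, treatment), 3))  # 0.267
```

In a real crawl, the comparison would be repeated across many page loads and sites so that a single noisy snapshot does not dominate the estimate.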
Christian Eubank, Marcela Melara, Diego Perez-Botero, Arvind Narayanan
We present the first published large-scale study of mobile web tracking. We compare tracking across five physical and emulated mobile devices with one desktop device as a benchmark.
We welcome additional collaborators to contribute to our web transparency work.
Academic researchers, developers, public advocates, and others with expertise in online privacy could all help advance our work toward providing accurate web privacy information and best practices to the public.
If you are interested in working with us on these issues, please contact Arvind Narayanan.