by Steven Englehardt, Gunes Acar, and Arvind Narayanan
Today we report yet another type of surreptitious data collection by third-party scripts that we discovered: the exfiltration of personal identifiers from websites through “login with Facebook” and other such social login APIs. Specifically, we found two types of vulnerabilities:
- seven third parties abuse websites’ access to Facebook user data
- one third party uses its own Facebook “application” to track users around the web.
Vulnerability 1: Third parties piggyback on Facebook access granted to websites
Facebook Login and other social login systems simplify the account creation process for users by decreasing the number of passwords to remember. But social login brings risks: Cambridge Analytica was found misusing user data collected by a Facebook quiz app which used the Login with Facebook feature. We’ve uncovered an additional risk: when a user grants a website access to their social media profile, they are not only trusting that website, but also third parties embedded on that site.
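To illustrate why embedded scripts inherit this access, here is a minimal sketch. It assumes only that the first party has loaded the Facebook SDK, so that every script on the page shares the same `FB` object. The function name `readProfileViaPageSdk` and the `stubFB` stand-in are hypothetical and exist only so the example runs outside a browser; this is not code from any of the scripts we observed.

```javascript
// Stand-in for the SDK object that the first party has already initialized.
// In a real page this would be the shared global `window.FB`.
const stubFB = {
  getLoginStatus(callback) {
    callback({ status: "connected", authResponse: { userID: "1234567890" } });
  },
  api(path, params, callback) {
    callback({ id: "1234567890", email: "user@example.com" });
  },
};

// A third-party script needs no credentials of its own: it asks the shared
// SDK object whether the user is connected, then queries the profile.
function readProfileViaPageSdk(FB, onProfile) {
  if (!FB || typeof FB.getLoginStatus !== "function") return;
  FB.getLoginStatus((response) => {
    if (response.status !== "connected") return;
    FB.api("/me", { fields: "id,email" }, (profile) => onProfile(profile));
  });
}

const collected = [];
readProfileViaPageSdk(stubFB, (profile) => collected.push(profile));
console.log(collected);
```

Nothing in the sketch distinguishes the caller from the first party's own code, which is the core of the problem.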
We found seven scripts collecting Facebook user data using the first party’s Facebook access. These scripts are embedded on a total of 434 of the top 1 million sites, including fiverr.com, bhphotovideo.com, and mongodb.com. We detail how we discovered these scripts in Appendix 1 below. Most of them grab the user ID, and two grab additional profile information such as email and username. We believe the websites embedding these scripts are likely unaware of this particular data access.
The user ID collected through the Facebook API is specific to the website (or the “application” in Facebook’s terminology), which would limit the potential for cross-site tracking. But these app-scoped user IDs can be used to retrieve the global Facebook ID, the user’s profile photo, and other public profile information, which can be used to identify and track users across websites and devices.
|Company||Script Address||Facebook Data Collected|
|OnAudience*||http://api.behavioralengine.com/scripts/be-init.js||User ID (hashed), Email (hashed), Gender|
|Lytics||https://c.lytics.io/static/io.min.js (loaded via OpenTag)||User ID|
|ProPS^||http://st-a.props.id/ai.js||User ID (has code to collect more)|
* OnAudience stopped collecting this information after we released the results of a previous study in the No Boundaries series, which showed them abusing browser autofill to collect user email addresses.
^ Although we observe these scripts query the Facebook API and save the user’s Facebook ID, we could not verify that it is sent to their server due to obfuscation of their code and some limitations of our measurement methods.
While we can’t say how these trackers use the information they collect, we can examine their marketing material to understand how it may be used. OnAudience, Tealium AudienceStream, Lytics, and ProPS all offer some form of “customer data platform”, which collects data to help publishers better monetize their users. Forter offers “identity-based fraud prevention” for e-commerce sites. Augur offers cross-device tracking and consumer recognition services. We were unable to determine which company owns the ntvk1.ru domain.
Vulnerability 2: Tracking users around the web with the Facebook Login service
Some third parties use the Facebook Login feature to authenticate users across many websites: Disqus, a commenting widget, is a popular example. However, hidden third-party trackers can also use Facebook Login to deanonymize users for targeted advertising. This is a privacy violation, as it is unexpected and users are unaware of it. But how can a hidden tracker get the user to Login with Facebook? When the same tracker is also a first party that users visit directly. This is exactly what we found Bandsintown doing. Worse, they did so in a way that allowed any malicious site to embed Bandsintown’s iframe to identify its users.
We discovered that the iframe injected by Bandsintown would pass the user’s information to the embedding script indiscriminately. Thus, any malicious site could have used their iframe to identify visitors. We informed Bandsintown of this vulnerability and they confirmed that it is now fixed.
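We do not know how Bandsintown fixed the issue. As a general illustration, one common way for an iframe to avoid handing data to arbitrary embedders is to check the origin of each incoming `postMessage` request against an allowlist and to pin the reply to that origin. The sketch below assumes a hypothetical allowlist and handler name (`ALLOWED_PARENT_ORIGINS`, `handleIdentityRequest`); neither comes from the actual widget.

```javascript
// Hypothetical allowlist of sites permitted to embed the widget.
const ALLOWED_PARENT_ORIGINS = new Set(["https://partner.example"]);

// Handles a MessageEvent-like object ({ origin, source }).
// Returns true only if the user information was sent.
function handleIdentityRequest(event, userInfo, allowlist = ALLOWED_PARENT_ORIGINS) {
  // Refuse requests from embedders that are not explicitly trusted.
  if (!allowlist.has(event.origin)) {
    return false;
  }
  // Pin the reply to the verified origin instead of using "*".
  event.source.postMessage({ type: "identity", user: userInfo }, event.origin);
  return true;
}

// In a browser, the iframe would register the handler like this
// (`currentUser` is a placeholder for the widget's own user record):
//   window.addEventListener("message", (event) =>
//     handleIdentityRequest(event, currentUser));
```

An iframe that skips the origin check, or replies with a target origin of `"*"`, behaves like the indiscriminate case described above.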
This unintended exposure of Facebook data to third parties is not due to a bug in Facebook’s Login feature. Rather, it is due to the lack of security boundaries between the first-party and third-party scripts in today’s web. Still, there are steps Facebook and other social login providers can take to prevent abuse: API use can be audited to review how, where, and which parties are accessing social login data. Facebook could also disallow the lookup of profile picture and global Facebook IDs by app-scoped user IDs. It might also be the right time to make Anonymous Login with Facebook available following its announcement four years ago.
 Steven Englehardt is currently working at Mozilla as a Privacy Engineer. He coauthored this post in his Princeton capacity, and this post doesn’t necessarily represent Mozilla’s views.
 We use the term “vulnerability” to refer to weaknesses arising from insecure design practices on today’s web, rather than its commonly understood sense in computer security of weaknesses arising due to software bugs.
 In this post we focus on websites which use Facebook Login, but the vulnerabilities we describe are likely to exist for most social login providers and on mobile devices. Indeed, we found scripts that appear to grab user identifiers from the Google Plus API and from the Russian social media site VK, but we limited our investigation to Facebook Login as it’s the most widely used social SDK on the web.
 In order to better understand the level of integration a third party has with the first party, we categorize scripts based on their use of the first party’s Application ID (or AppId), which is provided to Facebook during the login initialization phase to identify the site. Inclusion of a site’s application ID and initialization code in the third-party library suggests a tighter integration—the first party was likely required to configure the third-party script to access the Facebook SDK on their behalf. While application IDs aren’t meant to be secrets, we take the lack of an App ID to imply loose integration—the first party may not be aware of the access. In fact, all of the scripts in this category take the same actions when embedded on a simple test page with no prior business relationship.
 The following could indicate the first party’s awareness of the Facebook data access:
1) the third party initiates the Facebook login process instead of passively waiting for the login to happen; 2) the third party includes the unique App ID of the website it is embedded on. The seven scripts listed above neither initiate the login process nor contain the App ID of the websites.
Still, it is very hard to be certain about the exact relationship between the first parties and third parties.
 The application-scoped IDs can be resolved to real user profile information by querying Facebook’s Graph API, and can be used to retrieve the user’s profile photo (which does not even require authentication!). When security researchers showed that it is possible to map app-scoped IDs to Facebook IDs and download profile pictures, Facebook responded as follows: “This is intentional behavior in our product. We do not consider it a security vulnerability, but we do have controls in place to monitor and mitigate abuse.” A Facebook interface with similar controls was reportedly used to harvest 2 billion Facebook users’ public profile data. Note that although the endpoint found by the researchers does not work anymore, the following endpoint still redirects to the user’s profile page: https://www.facebook.com/[app_scoped_ID].
Appendix 1 — Measurement Methods
To study the abuse of social login APIs we extended OpenWPM to simulate that the user has authenticated and given full permissions to the Facebook Login SDK on all sites. We added instrumentation to monitor the use of the Facebook SDK interface (`window.FB`). We did not otherwise inject the user’s identity into the page, so any exfiltrated personal data must have been queried from our spoofed API.
As in our previous measurements, we crawled 50,000 sites from the Alexa top 1 million in June 2017. We used the following sampling strategy: visit all of the top 15,000 sites, randomly sample 15,000 sites from the Alexa rank range [15,000, 100,000), and randomly sample 20,000 sites from the range [100,000, 1,000,000). This combination allowed us to observe the attacks on both high and low traffic sites. On each of these 50,000 sites we visited 6 pages: the front page and a set of 5 other pages randomly sampled from the internal links on the front page.
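The sampling strategy can be sketched as follows. This is an illustration rather than the crawler's actual code: it assumes the ranked list is an array ordered by rank and treats the ranges as zero-based, half-open index ranges. The helper names `sampleWithoutReplacement` and `sampleCrawlList` are hypothetical.

```javascript
// Draw k distinct items using a partial Fisher-Yates shuffle on a copy.
function sampleWithoutReplacement(items, k, random = Math.random) {
  const pool = items.slice();
  const n = Math.min(k, pool.length);
  for (let i = 0; i < n; i++) {
    // Pick a random position from the not-yet-chosen tail and swap it in.
    const j = i + Math.floor(random() * (pool.length - i));
    [pool[i], pool[j]] = [pool[j], pool[i]];
  }
  return pool.slice(0, n);
}

// rankedSites: array of site names ordered by rank (index 0 = top site).
function sampleCrawlList(rankedSites, random = Math.random) {
  const top = rankedSites.slice(0, 15000); // all of the top 15,000
  const mid = sampleWithoutReplacement(
    rankedSites.slice(15000, 100000), 15000, random); // [15,000, 100,000)
  const tail = sampleWithoutReplacement(
    rankedSites.slice(100000, 1000000), 20000, random); // [100,000, 1,000,000)
  return [...top, ...mid, ...tail];
}
```

Because the three strata do not overlap, the combined list contains 50,000 distinct sites whenever the input list has at least 1 million entries.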
To spoof that a user is logged in, we create our own `window.FB` object and replicate the interface of version 2.8 of the Facebook SDK. The spoofed API has the following properties:
- For method calls that normally return personal information we spoof the return values as if the user is logged in and call any necessary callback function arguments.
- These include `FB.api()`, `FB.init()`, `FB.getLoginStatus()`, `FB.Event.subscribe()` for the events `auth.login`, `auth.authResponseChange`, and `auth.statusChange`, and `FB.getAuthResponse()`.
- For the Graph API (`FB.api`), we support most of the profile data fields supported by the real Facebook SDK. We parse the requested fields and return a data object in the same format the real graph API would return.
- For method calls that don’t return personal information we simply call a no-op function and ignore any callback arguments. This helps minimize breakage if a site calls a method we don’t fully replicate.
- We fire `window.fbAsyncInit` once the document has finished loading. This function is normally called by the Facebook SDK.
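The properties above can be sketched as a small factory function. This is a simplified illustration, not the instrumentation we ran: the fake profile, token strings, logging format, the `id,name` default when no fields are requested, and the names `createSpoofedFB` and `requestedFields` are all assumptions. Callbacks are invoked synchronously for brevity.

```javascript
// Events whose subscribers are called back with a logged-in state.
const AUTH_EVENTS = new Set([
  "auth.login", "auth.authResponseChange", "auth.statusChange",
]);

// Read requested fields from the params object or from the query string
// of the path, e.g. "/me?fields=id,email".
function requestedFields(path, params) {
  const query = path.includes("?") ? path.split("?")[1] : "";
  const fromQuery = new URLSearchParams(query).get("fields");
  const raw = (params && params.fields) || fromQuery || "id,name";
  return raw.split(",").map((f) => f.trim()).filter(Boolean);
}

function createSpoofedFB(profile, log = []) {
  const authResponse = {
    accessToken: "FAKE_ACCESS_TOKEN", // placeholder value
    userID: profile.id,
    expiresIn: 3600,
    signedRequest: "FAKE_SIGNED_REQUEST", // placeholder value
  };
  const loginStatus = { status: "connected", authResponse };
  const record = (method, args) => log.push({ method, args });

  return {
    init(...args) { record("FB.init", args); },
    getLoginStatus(callback) {
      record("FB.getLoginStatus", []);
      if (typeof callback === "function") callback(loginStatus);
    },
    getAuthResponse() {
      record("FB.getAuthResponse", []);
      return authResponse;
    },
    // Accepts FB.api(path, [method], [params], callback).
    api(path, ...rest) {
      record("FB.api", [path, ...rest.filter((a) => typeof a !== "function")]);
      const callback = rest.find((a) => typeof a === "function");
      const params = rest.find((a) => a !== null && typeof a === "object");
      const response = { id: profile.id };
      for (const field of requestedFields(path, params)) {
        if (Object.prototype.hasOwnProperty.call(profile, field)) {
          response[field] = profile[field];
        }
      }
      if (callback) callback(response);
    },
    Event: {
      subscribe(eventName, callback) {
        record("FB.Event.subscribe", [eventName]);
        if (AUTH_EVENTS.has(eventName) && typeof callback === "function") {
          callback(loginStatus);
        }
      },
    },
    // Examples of methods that return no personal data: plain no-ops.
    ui() {},
    logout() {},
  };
}

// In a browser the object would be installed before page scripts run:
//   window.FB = createSpoofedFB(fakeProfile, callLog);
//   window.addEventListener("load", () => {
//     if (typeof window.fbAsyncInit === "function") window.fbAsyncInit();
//   });

const callLog = [];
const spoofedFB = createSpoofedFB(
  { id: "1000001", name: "Test User", email: "test@example.com" }, callLog);
let apiResult = null;
spoofedFB.api("/me?fields=email", (response) => { apiResult = response; });
```

Since the fake profile exists only inside the spoofed object, any of its values seen in outgoing requests must have been obtained through these calls, and the recorded `callLog` shows which methods were used.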
Appendix 2 — Third parties which access the Facebook API on behalf of first parties
We also found a number of third-party scripts interacting with the Facebook API, which appear to be operating on behalf of the first party. These companies offer a range of services, such as integrating multiple social login options, monitoring social media engagement, and aggregating customer data. As a specific example, BlueConic offers a Facebook Profile transfer service that copies information from the user’s Facebook profile to BlueConic’s data platform. Additional third-party services which access Facebook profile information on the first party’s behalf include: Zummy, Social Miner, Limespot (personalizer.io), Kissmetrics, Gigya, and Webtrends.
Image assets used in figures are from the Noun Project: