Researchers Bob Diachenko and Vinny Troia discovered an unsecured Elasticsearch server containing an unprecedented 4 billion user accounts.
The database, discovered on October 16, 2019, contained more than 4 terabytes of data, making it one of the largest data leaks from a single source organization in history.
The leaked data contained names, email addresses, phone numbers, and LinkedIn and Facebook profile information.
According to the researchers, the data includes personal and social information that appears to originate from two different data enrichment companies.
“The discovered Elasticsearch server containing all of the information was unprotected and accessible via web browser at http://18.104.22.168:9200. No password or authentication of any kind was needed to access or download all of the data,” reads the post published by the experts.
“The majority of the data spanned 4 separate data indexes, labeled “PDL” and “OXY”, with information on roughly 1 billion people per index. Each user record within the databases was labeled with a “source” field that matched either PDL or Oxy, respectively.”
Researchers believe the data in the PDL indexes originated from People Data Labs, a data aggregator and enrichment company.
The archive contained nearly 3 billion PDL user records associated with roughly 1.2 billion unique people, and it included 650 million unique email addresses. The data in the three PDL indexes was scraped from LinkedIn (i.e. email addresses and phone numbers) and from social media profiles, such as a person’s Facebook, Twitter, and GitHub URLs.
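The gap between the number of records and the number of unique people or email addresses comes from several records pointing at the same person. The following is a minimal sketch of how such a tally might be computed, not the researchers' actual method. The sample records, the `email` field name, and the helper names `count_by_source` and `unique_emails` are all assumptions made for illustration; only the “source” label comes from the post.

```python
from collections import Counter

# Hypothetical sample records, invented for illustration only.
# The "source" field mirrors the label described in the post; the
# "email" field name and all values are assumptions.
records = [
    {"source": "PDL", "email": "alice@example.com"},
    {"source": "PDL", "email": "Alice@Example.com"},  # same address, second record
    {"source": "PDL", "email": None},                 # record without an email
    {"source": "Oxy", "email": "bob@example.com"},
]


def count_by_source(rows):
    """Count records per "source" label, ignoring letter case."""
    return Counter(str(row.get("source", "unknown")).upper() for row in rows)


def unique_emails(rows):
    """Return the set of distinct email addresses, normalized to lowercase."""
    return {row["email"].lower() for row in rows if row.get("email")}


source_counts = count_by_source(records)
distinct_emails = unique_emails(records)

print(dict(source_counts))
print(len(records), "records,", len(distinct_emails), "unique emails")
```

Normalizing case before deduplicating is a design choice in this sketch; a real dataset would likely need more careful matching to decide when two records describe the same person.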
The experts reported their findings to PDL, which replied that the exposed Elasticsearch instance does not belong to it.
“In order to test whether or not the data belonged to PDL, we created a free account on their website which provides users with 1,000 free people lookups per month.”
“The data discovered on the open Elasticsearch server
“When I checked my account on PeopleDataLabs.com, the returned results were identical – including that phone number.
Since I have never seen this phone number appear in any of my previously breached/leaked records, this is a very good indication that the leaked database originated from PDL,” continues the post.
The exposed archive also includes records that appear to belong to the data enrichment company OxyData.io.
The “Oxy” database contained records scraped from LinkedIn, including recruiter information. Once notified of the discovery, OxyData told the researchers that the server did not belong to it.
The researchers speculate that the server was operated by an organization that is a customer of both People Data Labs and OxyData,
“If this was a customer that had normal access to PDL’s data, then it would indicate the data was not actually “stolen”, but rather
“Because of obvious privacy concerns cloud providers will not share any information on their customers, making this a dead end.
Agencies like the FBI can request this information through
(SecurityAffairs – Elasticsearch server, social information)