The GeekedIn recruitment project scraped user data from GitHub and other similar websites, but the data was inadvertently leaked online.
The popular security expert Troy Hunt, who operates the data breach notification service ‘Have I Been Pwned,’ recently received a MongoDB backup file of roughly 600 MB containing data from a tech recruitment website called GeekedIn.
The analysis of the data dump revealed the presence of more than 8 million GitHub profiles, including names, email addresses, locations and other data.
“On Saturday, a character in the data trading scene popped up and sent me a 594MB file called geekedin.net_mirror_20160815.7z. It was allegedly a MongoDB backup from August belonging to a site I’d not heard of before, one called GeekedIn and they apparently do this:” reported Troy Hunt.
The archive includes over one million valid email addresses; the remainder consists of “firstname.lastname@example.org” addresses evidently associated with GitHub accounts with no public email address.
The database also contains thousands of accounts apparently taken from BitBucket.
The GeekedIn service crawls code hosting repositories, including GitHub and BitBucket, and creates profiles for open-source projects and developers. The GeekedIn data could be used by recruiters to hire developers whose skills match their needs.
The data gathered by the service is public and does not include any sensitive information such as passwords, but GitHub does not allow users to use its data for commercial purposes.
The data was stored in a MongoDB database that was exposed on the internet and accessible to anyone.
“And again, yes, you can go and pull this data publicly on a per-individual basis but the constant response I got from close confidants I shared this information with is that ‘it just feels wrong’. And it is wrong, not just the scraping of GitHub in the first place in order to commercialise our information, but then subsequently losing it via a MongoDB with no password and now having it float around the web in data breach trading circles.” wrote Hunt.
At the time of writing, the GeekedIn website is down.
In the last year, we have observed several cases of poorly configured MongoDB databases exposed online. In July 2015, John Matherly, the creator of the Shodan search engine for connected devices, revealed that many MongoDB administrators had exposed something like 595.2 terabytes of data by using poor configurations or unpatched versions of MongoDB.
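To make the kind of misconfiguration described above more concrete, here is a minimal sketch in Python that flags the two settings most often behind an exposed MongoDB instance: listening on every network interface and running without access control. The function name `find_exposure_risks` and the sample dictionaries are hypothetical; the keys mirror the `net.bindIp`, `net.bindIpAll` and `security.authorization` options of a `mongod.conf` file that has already been parsed into a dictionary. Nothing is known about how the GeekedIn server was actually configured beyond the absence of a password.

```python
def find_exposure_risks(config):
    """Return warnings for a parsed mongod.conf-style mapping (illustrative only)."""
    warnings = []
    net = config.get("net", {})
    security = config.get("security", {})

    # net.bindIp is a comma-separated list of addresses mongod listens on.
    # A missing value is not flagged here because the default depends on
    # the MongoDB version and on how it was packaged.
    bind_ips = [ip.strip() for ip in str(net.get("bindIp", "")).split(",") if ip.strip()]
    listens_everywhere = bool(net.get("bindIpAll", False)) or any(
        ip in ("0.0.0.0", "::") for ip in bind_ips
    )
    if listens_everywhere:
        warnings.append("listens on all network interfaces")

    # Without security.authorization set to "enabled", clients are not
    # required to authenticate before reading or writing data.
    if security.get("authorization") != "enabled":
        warnings.append("access control (authorization) is not enabled")

    return warnings


# Hypothetical sample configurations, assumed for illustration.
exposed = {"net": {"port": 27017, "bindIp": "0.0.0.0"}}
hardened = {"net": {"bindIp": "127.0.0.1"}, "security": {"authorization": "enabled"}}

print(find_exposure_risks(exposed))
print(find_exposure_risks(hardened))
```

A check like this only inspects the configuration file; it does not replace a firewall rule or an external scan confirming that the database port cannot be reached from the internet.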
Pierluigi Paganini is a member of the ENISA (European Union Agency for Network and Information Security) Threat Landscape Stakeholder Group and Cyber G7 Group. He is also a Security Evangelist, Security Analyst and Freelance Writer.
Editor-in-Chief at "Cyber Defense Magazine", Pierluigi is a cyber security expert with over 20 years of experience in the field. He is a Certified Ethical Hacker at EC Council in London. The passion for writing and a strong belief that security is founded on sharing and awareness led Pierluigi to found the security blog "Security Affairs", recently named a Top National Security Resource for US.
Pierluigi is a member of "The Hacker News" team and a writer for major publications in the field such as Cyber War Zone, ICTTF, Infosec Island, Infosec Institute, The Hacker News Magazine and many other security magazines.
He is the author of the books "The Deep Dark Web" and “Digital Virtual Currency and Bitcoin”.