Data leaks surfacing on the dark web are a pressing problem of the modern world. Recent events show that even world-renowned enterprise, financial, and IT giants are not immune to them. Equifax, British Airways, SingHealth, Marriott International, Sephora, Canva, Zynga, Microsoft, Tokopedia, T-Mobile, LinkedIn, Twitch: these are just a few of the names that have featured in high-profile data breach scandals in the past several years. One would hardly question whether these companies had enough resources to ensure their security, yet the fact remains that they all fell victim to cyberattacks.
Even if a company's data was leaked not through its own actions but through those of a contractor or partner with which it shared data, the result is the same. A data leak can disclose sensitive information about the company's internal processes, affecting business decisions and reputation, as well as the personal data of its customers, making them turn their backs on a brand that used to be their favorite. Either outcome can bring the business to the verge of collapse, which is why one of the first instincts of a breached company, eager for retribution, is to find the attackers behind the network compromise and data leak and bring them to justice.
Leaked data often ends up in the shadow part of the internet. Underground resources, which are closed to unauthorized visitors, are rife with discussions directly related to planned or already committed crimes. Their regulars include the attackers themselves and accomplices looking for an opportunity to make some money, but also visitors with a very different agenda: employees of law enforcement agencies and special services, as well as corporate and private security specialists. Posing as bad guys, they collect cyber intelligence and research and analyze criminal activity in order to investigate and prevent cybercrimes.
An average underground forum sees a huge volume of text messages every day, which makes manual analysis of this information ineffective and almost impossible. Algorithm development and machine learning (hereinafter ML) can break this stalemate, significantly reducing the amount of manual work and enabling analysts to connect the dots in a tremendous amount of data.
This article aims to show the methods that cybersecurity analysts who come to the aid of compromised companies can use to, first, determine whether an alleged data breach actually took place and whether a database put up for sale on the dark web is authentic, and, second, identify the threat actor responsible. It demonstrates how machine learning algorithms can facilitate cyber intelligence data analysis and cyber investigations while further enriching their results. And if you ever find yourself investigating a data leak, this guide will give you ideas on where to begin and how to proceed.
This guide is intended for:
- cybersecurity newcomers who are taking their first steps in the field. Even if not all the terms and techniques described in the text are familiar to you, you can get a first impression of the cyber investigation process and strengthen your intention to pursue a cybersecurity career;
- cybersecurity analysts and corporate security team members. From the text you'll learn the methods that can be used to probe into a data leak, even if for the time being you're sure that your customers are reliably protected;
- machine learning algorithm developers, who will get a broader perspective on the cybercrime investigation process and be able to apply this knowledge in the future to advance the cybercrime investigation industry, making the process faster and more efficient.
We'll focus on two major questions:
- Real breach or fake. There are leaks that are purported to be new to the public, but in reality they turn out to be databases that were earlier released somewhere else or are new only to some extent, comprising both old and fresh data.
- Skilled threat actor or a newbie. Many attackers today use multiple accounts on underground forums to better hide their activities. To determine the goal of an attack and investigate it further, one has to know the adversary.
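To make the first question (real breach or fake) more concrete, here is a minimal sketch of how an analyst might measure how much of a newly advertised dump has already appeared in earlier leaks. It is an illustration under stated assumptions, not a method taken from this guide: records are assumed to be simple comparable strings such as email addresses, the function names `fingerprint` and `overlap_report` and the leak names are hypothetical, and a real investigation would likely compare more fields and tolerate formatting differences.

```python
import hashlib


def fingerprint(record: str) -> str:
    """Hash a normalized record so raw personal data need not be kept around."""
    normalized = record.strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def overlap_report(new_dump, known_leaks):
    """Compare a new dump against previously known leaks.

    new_dump: iterable of record strings (assumed here to be email addresses).
    known_leaks: dict mapping a (hypothetical) leak name to an iterable of records.

    Returns (per_leak, fresh_share), where per_leak maps each leak name to the
    share of the new dump found in that leak, and fresh_share is the share of
    the new dump not found in any known leak.
    """
    new_prints = {fingerprint(r) for r in new_dump}
    total = len(new_prints)
    per_leak = {}
    seen_anywhere = set()
    for name, records in known_leaks.items():
        shared = new_prints & {fingerprint(r) for r in records}
        seen_anywhere |= shared
        per_leak[name] = len(shared) / total if total else 0.0
    fresh_share = len(new_prints - seen_anywhere) / total if total else 0.0
    return per_leak, fresh_share


# Illustrative, made-up data: two of the four "new" records are recycled.
new_dump = ["Alice@example.com", "bob@example.com ", "carol@example.com", "dave@example.com"]
known_leaks = {
    "leak_2019": ["alice@example.com", "bob@example.com"],
    "leak_2020": ["bob@example.com", "erin@example.com"],
}
per_leak, fresh_share = overlap_report(new_dump, known_leaks)
print(per_leak)     # {'leak_2019': 0.5, 'leak_2020': 0.25}
print(fresh_share)  # 0.5
```

A high overlap with a single older leak suggests a repackaged database, while a mix of overlapping and fresh records matches the "new only to some extent" case described above.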