Group-IB unveils its Graph

The story of Group-IB searching for a graph analysis solution and creating its own unique instrument
Dmitry Volkov,
Group-IB CTO and Head of Threat Intelligence
Throughout years of investigating phishing, botnets, fraudulent transactions, and cybercriminal groups, Group-IB experts have used graphs to detect such threats. Each case has its own data set and its own algorithm for establishing links, which can be visualized in a graph. This instrument used to be Group-IB's internal tool, available only to the company's staff.

Network graph analysis became Group-IB's first internal tool to be incorporated into the company's public products – Threat Intelligence, Threat Detection System, Secure Bank and Brand Protection service. Before creating the network graph, we analysed several solutions available on the market and failed to find a single product that met our expectations. In this article, we describe how we created the network graph, the ways in which we use it, and the challenges it presents.
What can be achieved thanks to a network graph?
Since Group-IB was established in 2003, the company's top priorities have been identifying and de-anonymising criminals and bringing them to justice. With time, all cybercrime investigations came to be carried out alongside analysis of the attackers' network infrastructure. In our early years, this analysis was painstaking work aimed at establishing links that might help identify criminals: information about domains, IP addresses, and server fingerprints.
Most attackers do their best to remain anonymous online. Like everyone else, however, they make mistakes. The main goal of network graph analysis is to track down projects that cybercriminals carried out in the past: legal and illegal projects that bear similarities, share infrastructure links, or connect to the infrastructure involved in the incident being investigated. In illegal projects, a threat actor aims to hide their identity, but hackers are ordinary people and some of them might also have legal projects online, such as forums or e-commerce websites. If a cybercriminal's previous legal project is detected, identifying them becomes simple. If only illegal projects are detected, however, it takes much more time and effort to identify the cybercriminals because they always attempt to anonymise or hide registration data.
Nevertheless, our chances of identifying the hackers remain high. As a matter of fact, hackers pay less attention to their personal security and make more mistakes when they first embark on their criminal path, which means that the success of our investigation depends on how far back the traces we find go. That is why graphs with in-depth retro-analysis are crucial in such investigations. In short, the more historical data a company has, the more effective its graph will be. For example, data for a five-year period could help solve one or two crimes out of ten, while data for a period of 15 years could help solve all ten.
Detection of phishing and fraud
Every time we detect a suspicious link to a phishing, fake, or pirated website, we create a graph of linked web resources and check all the detected hosts for similar content. This helps us find both old phishing pages (which remained active but undetected) and new phishing pages (which were created for future attacks but have not been used yet). A common example: we find a phishing website belonging to a server that hosts five websites and a quick check shows that all the remaining pages have phishing content as well. As a result, we block five web pages instead of one.
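The co-hosting check described above can be sketched in a few lines of Python. This is a minimal illustration, not Group-IB's implementation: the `OBSERVATIONS` list, the `cohosted_domains` helper, and the `.example` domains and documentation-range IP addresses are all hypothetical stand-ins for real passive DNS data.

```python
from collections import defaultdict

# Hypothetical passive DNS observations: (domain, IP address) pairs.
OBSERVATIONS = [
    ("phish-one.example", "203.0.113.10"),
    ("phish-two.example", "203.0.113.10"),
    ("phish-three.example", "203.0.113.10"),
    ("unrelated.example", "198.51.100.7"),
]


def cohosted_domains(seed, observations):
    """Return the other domains that share at least one IP address with the seed."""
    domains_by_ip = defaultdict(set)
    ips_by_domain = defaultdict(set)
    for domain, ip in observations:
        domains_by_ip[ip].add(domain)
        ips_by_domain[domain].add(ip)

    related = set()
    for ip in ips_by_domain.get(seed, set()):
        related |= domains_by_ip[ip]
    related.discard(seed)  # the seed itself is already known
    return sorted(related)


candidates = cohosted_domains("phish-one.example", OBSERVATIONS)
print(candidates)
```

Each returned domain would then be checked for similar content before a decision to block it is made.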
Search for backends
Searching for the backend is necessary to find out where the malicious server is actually located. As many as 99% of cardshops, underground hacker forums, phishing pages, and other malicious servers hide behind proxy servers, both their own and those provided by legitimate services, including Cloudflare. Information about the genuine backend is extremely important for incident response: as soon as the hosting provider is identified, we are able to seize the malicious server and establish links with other malicious projects.
For example, we find a phishing site that collects bank credentials and resolves to one IP address, and a cardshop page that resolves to a different IP address. As a result of the analysis, we might find out that the phishing page and the cardshop have the same backend IP address. This information helps us establish links between the phishing attacks and the cardshop, which might be used to sell the compromised credentials.
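The linking step in this example can be illustrated with a short Python sketch. How the real backend behind a proxy is uncovered is not shown here; the `BACKEND_OF` mapping is assumed to be the outcome of that analysis, and the resource names, IP addresses, and the `shared_backends` helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical analysis results: public-facing resource -> backend IP behind the proxy.
BACKEND_OF = {
    "bank-login.example": "192.0.2.50",
    "cardshop.example": "192.0.2.50",
    "forum.example": "192.0.2.99",
}


def shared_backends(backend_of):
    """Group resources by backend IP and keep only backends serving two or more."""
    groups = defaultdict(list)
    for frontend, backend in backend_of.items():
        groups[backend].append(frontend)
    return {ip: sorted(names) for ip, names in groups.items() if len(names) > 1}


links = shared_backends(BACKEND_OF)
print(links)
```

Here the phishing page and the cardshop end up in the same group, which is the kind of link the analyst is looking for.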
Event correlation
When we come across two different types of alerts (IDS alerts, for example) involving the use of different malware and C&C servers, we usually view them as two independent events. If there are strong links between the two malicious infrastructures, however, it becomes obvious that these are not two independent attacks but stages of a single multiphase attack. If one of these attacks has already been attributed to a cybercriminal group, the same can be done with the second attack. This is a simplified example: in practice, the attribution process is far more complicated.
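A simplified version of this correlation check can be written as a set intersection. The sketch below assumes the infrastructure links are already available as an adjacency mapping; `LINKS`, the indicator names, and `shared_infrastructure` are hypothetical, and a real check would weigh the strength of each link rather than treat any overlap as proof.

```python
# Hypothetical infrastructure links: indicator -> directly linked indicators.
LINKS = {
    "c2-alpha.example": {"192.0.2.10"},
    "c2-beta.example": {"192.0.2.10"},
    "c2-gamma.example": {"192.0.2.77"},
    "192.0.2.10": {"c2-alpha.example", "c2-beta.example"},
    "192.0.2.77": {"c2-gamma.example"},
}


def shared_infrastructure(indicator_a, indicator_b, links):
    """Return the indicators that are within one link of both inputs."""
    around_a = {indicator_a} | links.get(indicator_a, set())
    around_b = {indicator_b} | links.get(indicator_b, set())
    return around_a & around_b


# Two alerts with different C&C servers that share a hosting IP address.
overlap = shared_infrastructure("c2-alpha.example", "c2-beta.example", LINKS)
# Two alerts with nothing in common.
no_overlap = shared_infrastructure("c2-alpha.example", "c2-gamma.example", LINKS)
print(overlap, no_overlap)
```

A non-empty overlap is only a prompt to look closer at whether the two alerts belong to one attack.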
Indicator enrichment
Indicator enrichment needs little explanation, as it is the most common scenario for the use of network graph analysis in the cybersecurity world: there is one indicator at the input and a set of linked indicators at the output.
Pattern identification
Pattern identification is essential for effective threat hunting. Graphs help researchers not only find linked elements but also determine common features that characterise a given cybercrime group. Knowledge of such unique characteristics helps cybersecurity researchers identify attacker infrastructure at the preparation stage, before there is any evidence confirming the attack, such as phishing emails or malware.
Why did we create our own network graph?
It is worth reiterating that, before deciding to create our own tool capable of something no other solution could do, we considered solutions designed by various companies with a view to adopting them in our work. Developing our tool took several years, and we have since completely redesigned it several times. Despite the considerable time it took us to create the network graph, we have no regrets: we have yet to find an analogue that meets all our expectations. Thanks to our own product, we eventually managed to solve all the problems detected in existing network graphs. Let's take a closer look at these problems:

How does our graph work?
To start using the network graph, type an IP address, an email address, or an SSL certificate fingerprint into the search bar. Analysts can adjust three parameters: the timeframe, the number of steps, and the refine option.
The timeframe is a specific date or a period when the searched element was used for malicious purposes. If this parameter is not defined, the system will determine the most recent period when this web resource was controlled by cybercriminals. For example, on 11 July, ESET released a report about Buhtrap using a zero-day exploit for cyberespionage. In its paper, the company also included six indicators of compromise. One of them ("secure-telemetry[.]net") was re-registered on 16 July. As a result, if you try to create a graph for the period after 16 July, the system will show irrelevant results. As soon as you specify a period before that date, however, the graph reveals 126 new domains and 69 IP addresses that were not included in ESET's report:








and others.

Apart from network indicators, we can also see links to malicious files related to this infrastructure, as well as tags suggesting that Meterpreter and AZORult were used.

What makes graph analysis so convenient is that all this information is obtained in just a second, and it is no longer necessary to spend days analysing data. This can save a great deal of time during incident response, where time is always crucial.
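The effect of the timeframe parameter can be illustrated with a small Python sketch. It assumes that every link in the data carries the dates when it was first and last observed; the `Link` record, the sample data, and the dates are hypothetical and are not taken from the Buhtrap case.

```python
from datetime import date
from typing import NamedTuple


class Link(NamedTuple):
    """A hypothetical historical record linking two indicators."""
    source: str
    target: str
    first_seen: date
    last_seen: date


# A domain that changed hands on 1 March 2020 (made-up date).
RECORDS = [
    Link("reused.example", "192.0.2.5", date(2019, 6, 1), date(2020, 2, 29)),
    Link("reused.example", "203.0.113.9", date(2020, 3, 1), date(2020, 12, 31)),
]


def links_in_window(records, start, end):
    """Keep records whose observation interval overlaps [start, end]."""
    return [r for r in records if r.first_seen <= end and r.last_seen >= start]


before = links_in_window(RECORDS, date(2019, 1, 1), date(2020, 2, 29))
after = links_in_window(RECORDS, date(2020, 3, 1), date(2020, 12, 31))
print([r.target for r in before], [r.target for r in after])
```

Choosing the window before the change of ownership keeps only the links that belonged to the earlier owner, which is why the timeframe matters for re-registered domains.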

Number of steps
The default number of steps is three. This means that the searched element is connected with all the elements that have direct links to it; each of those elements is then linked to its own neighbours, which in turn are linked to a third series of elements.
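Expanding a graph by a fixed number of steps corresponds to a depth-limited breadth-first search. The sketch below is a generic illustration of that technique under assumed data, not Group-IB's code: the `expand` function, the `LINKS` adjacency mapping, and the indicators in it are hypothetical.

```python
from collections import deque


def expand(seed, links, steps=3):
    """Breadth-first expansion: map each indicator within `steps` links to its distance."""
    distance = {seed: 0}
    queue = deque([seed])
    while queue:
        node = queue.popleft()
        if distance[node] == steps:
            continue  # do not expand beyond the step limit
        for neighbour in links.get(node, ()):
            if neighbour not in distance:
                distance[neighbour] = distance[node] + 1
                queue.append(neighbour)
    return distance


# Hypothetical chain: domain -> IP -> domain -> email -> domain.
LINKS = {
    "seed.example": ["192.0.2.1"],
    "192.0.2.1": ["seed.example", "second.example"],
    "second.example": ["192.0.2.1", "admin@mail.example"],
    "admin@mail.example": ["second.example", "fourth.example"],
    "fourth.example": ["admin@mail.example"],
}

three_steps = expand("seed.example", LINKS)  # default of three steps
one_step = expand("seed.example", LINKS, steps=1)
print(three_steps)
print(one_step)
```

With three steps the email address is reached but the domain behind it is not; raising `steps` to four would pull it in as well.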

Let's take an example that does not relate to APT or zero-day exploits. The technology website Habr recently published a story about fraudulent cryptocurrency schemes. In particular, the report mentioned the domain "themcx[.]co", which cybercriminals used to host a site disguised as Miner Coin Exchange, and "phone-lookup[.]xyz", which was used to attract traffic.

The scheme used a large infrastructure to attract traffic to fraudulent websites. Group-IB decided to examine this infrastructure using a graph based on four steps. As a result, we obtained a graph with 230 domains and 39 IP addresses. We divided the domains into two groups: the first one for pages disguised as services linked to cryptocurrencies and the second for pages linked to generating traffic.
Refine option
By default, the refine option is active and removes all irrelevant elements from the graph. Incidentally, it was used in all the above-mentioned examples. This raises an obvious question: how can we ensure that crucial elements are not removed from the graph? Analysts who prefer creating graphs manually can turn "refine" off and build the graph with a single step. The analyst can then extend the graph further and remove irrelevant elements manually.
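The article does not describe how the refine option decides which elements are irrelevant. Purely as an illustration, the sketch below uses one simple heuristic that is an assumption of this example and not Group-IB's algorithm: a node with too many neighbours (a shared hosting IP address, for instance) is kept in the graph but its links are not followed. The `expand_refined` function, the `max_degree` threshold, and the data are hypothetical.

```python
from collections import deque


def expand_refined(seed, links, steps=3, max_degree=3):
    """Breadth-first expansion that keeps hub nodes but does not follow their links."""
    distance = {seed: 0}
    queue = deque([seed])
    while queue:
        node = queue.popleft()
        neighbours = links.get(node, ())
        # Assumed heuristic: a node with many neighbours is treated as shared
        # infrastructure, so expansion stops there.
        if distance[node] == steps or len(neighbours) > max_degree:
            continue
        for neighbour in neighbours:
            if neighbour not in distance:
                distance[neighbour] = distance[node] + 1
                queue.append(neighbour)
    return set(distance)


LINKS = {
    "seed.example": ["198.51.100.20", "cert:ab12"],
    "198.51.100.20": ["seed.example", "shop-a.example", "shop-b.example",
                      "shop-c.example", "shop-d.example"],
    "cert:ab12": ["seed.example", "sibling.example"],
    "sibling.example": ["cert:ab12"],
}

refined = expand_refined("seed.example", LINKS)
unrefined = expand_refined("seed.example", LINKS, max_degree=10**6)
print(sorted(refined))
print(len(unrefined))
```

With the threshold in place, the four unrelated shops hosted on the shared IP address stay out of the graph; with the threshold effectively disabled, they all appear.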

The finished graph offers some additional tools, namely the "whois" change history, DNS, and the open ports and the services running on them.
Financial phishing

For several years, Group-IB has monitored the activities of an APT group that launched phishing attacks on clients of various banks. The defining feature of this APT group was registering domains resembling the real names of banks, while the phishing pages had similar interfaces and differed from each other in the bank names and logos.
In this case, automated network graph analysis greatly facilitated our work. Based on the domain "lloydsbnk-uk[.]com", we built a graph with three steps within seconds. The graph identified more than 250 malicious domains that the group had used since 2015. Some of the domains have since been purchased by the banks, but their history shows that they used to belong to cybercriminals.

The picture shows a graph with two steps.

In 2019, attackers modified their tactics and started registering not only bank domains for hosting phishing pages but also domains of various consulting companies in order to subsequently send out phishing emails, namely swift-department[.]com, saudconsultancy[.]com, and vbgrigoryanpartners[.]com.
Cobalt gang attack
In December 2018, the cybercriminal group Cobalt, which specialises in targeted attacks on banks, sent emails purporting to be from the National Bank of Kazakhstan.
An email purporting to be from the National Bank of Kazakhstan
The emails contained links to hXXps://nationalbank[.]bz/Doc/Prikaz.doc. The document contained a macro that launched PowerShell, which, in turn, tried to download a file from hXXp://wateroilclub[.]com/file/dwm.exe to %Temp%\einmrmdmy.exe and execute it. The file %Temp%\einmrmdmy.exe, aka dwm.exe, is a CobInt stager configured to work with the server hXXp://admvmsopp[.]com/rilruietguadvtoefmuy.

Imagine that there is no opportunity to obtain these phishing emails and thoroughly analyse the malicious files. The graph created based on the malicious domain "nationalbank[.]bz" immediately shows links to other malicious domains, attributes them to a specific group, and shows what files were used in the attack.
Let's take one of the elements of this graph, IP address 46.173.219[.]152, and create another graph based on it, with one step and with the refine option turned off. This reveals at least 40 new domains, including:

- bl0ckchain[.]ug
- paypal.co.uk.qlg6[.]pw
- cryptoelips[.]com.

Judging from the domain names, it seems that they are used in fraudulent schemes. The refine mechanism has determined, however, that they do not have any relation to this attack and they were therefore not included in the graph, which facilitated the analysis and attribution process.
If we create another graph using the domain "nationalbank[.]bz" but with the refine option turned off, there will be more than 500 elements on the graph, with most of them having no relation to either the cybercriminal group Cobalt or to the attacks. The picture below depicts such a graph:
After many years of careful refinement, tests in real incident response operations, and threat hunting, we have not only developed a unique tool but also changed our experts' attitude towards it. At first, technical researchers tried to insist on total control over the graph creation process. It was extremely difficult to persuade them that creating a graph in automated mode can be more accurate than in manual mode. Nevertheless, time and numerous manual checks of the graph analysis outcomes have done their magic. Our experts now not only trust the system but also use the analysis results in their daily work. Graph technology is used in all our solutions and helps improve the threat detection process. The interface for manual graph analysis has been integrated into our services and has considerably expanded threat and cybercrime hunting opportunities. The analysts working for our clients have confirmed this. For our part, we will continue to enrich our graph with data and design new algorithms using artificial intelligence to improve the graph's accuracy.