



26.01.2021

The source of everything:
forensic examination of incidents involving source code leaks

Anatoly Tykushin
Digital Forensic Analyst at Group-IB
Introduction
Seven of the ten largest companies by market capitalization in 2020 were IT companies. Many of them sell on-premises or cloud software products, and for them source code is one of their most valuable assets. Like any critical corporate asset, source code is of huge interest to threat actors. Once attackers compromise source code, they can sell it to the company's competitors or to other cybercrime/APT groups, who can then conduct "white-box" security research to identify vulnerabilities in the system. Stolen product builds, in turn, allow attackers to deploy the product in private networks, perform black-box security analysis, and resell pirated copies.

Security incidents involving source code occur all the time, and big-name companies are no exception: Nissan source code was leaked in early 2021 because of a misconfigured Git repository, Windows 10 source code and builds were leaked, and Watch Dogs: Legion source code ended up online after Ubisoft was hit with Egregor ransomware.

During a recent incident response engagement, other Group-IB DFIR experts and I discovered that adversaries had accessed a misconfigured CI server and stolen product source code by launching a CI pipeline and downloading build artifacts from the job workspace.

Security misconfigurations (e.g., a public-facing web service) and weak authentication (no two-factor authentication, weak passwords, or insecure cryptographic storage) are most often to blame for such incidents.

The above-mentioned incident prompted me to write this blog post and share our experience. While attack vectors and scenarios vary, the outcome is always the same: financial and reputational losses resulting from the compromise of source code. This article covers the sources of digital forensic evidence and the investigation patterns for various "atomic" activities in development team tools during incidents involving source code. An atomic activity is a single action performed either by a human (e.g., a user login attempt or API token creation) or by an automated script (e.g., a webhook triggered by git push). Such activities help investigators determine whether the attacker used credential access or persistence techniques. The post also contains recommendations on secure product development practices. Security Operations Centers (SOCs) and blue teams can use this article to improve their security monitoring and detection capabilities.

For convenience, we use GitLab CE (Community Edition) version 13.7.1 as the VCS (version control system) and Jenkins LTS version 2.263.1 as the CI server. According to surveys, these two tools are the most popular among developers and DevOps engineers.

Acknowledgements
I would like to thank my colleague Vladislav Azerskiy for contributing to this research by performing the forensic analysis of Jenkins CI and developing the examination plan.
GitLab CE
As a VCS, GitLab stores source code and provides a web-hook mechanism for integrating with continuous integration and delivery systems.

As a continuous integration and delivery solution, GitLab builds software on pre-configured agents, stores build artifacts, and deploys code to staging/production servers according to the defined pipeline. We focus on attempts to access or download build artifacts and on pipeline execution evidence stored on the CI server.
Sources of digital forensic evidence

The official documentation helped us identify the artifacts that are valuable for an investigation.

GitLab provides detailed documentation on the log system. By default, GitLab logs almost everything and generates tons of logs.
We are interested in:

  • GitLab NGINX logs: access logs that do not contain request bodies. They are switched off by default
  • "production_json" logs (/var/opt/gitlab/gitlab-rails/production_json.log): JSON-formatted data about all performed requests
  • "application_json" logs (/var/opt/gitlab/gitlab-rails/application_json.log): JSON-formatted audit logs (user creation, project removal, etc.)
  • "api_json" logs (/var/opt/gitlab/gitlab-rails/api_json.log): direct requests to the API

Depending on the installation method, these logs are either pre-configured (installation from source or a Docker container) or have to be turned on after installing GitLab from a package. We assume that everything is pre-configured in our installation.

Git repository metadata stored by the Gitaly service (commit history, branch tree, merge requests, etc.) contains no digital evidence related to IP theft.
Investigation patterns for various "atomic" user actions

In this section, we cover several "atomic" events that help piece together the picture of the incident:

  • user login - initial access
  • failed user login - credential access (password guessing)
  • user creation event - persistence
  • user api token creation - persistence
  • git repo clone event - data exfiltration
User login event

The user login event is handled by "SessionsController". Authentication consists of two requests: (1) a GET request that retrieves the sign-in page where the user enters their credentials, and (2) a POST request that carries the authentication data and is sent after the user clicks the "Sign in" button.

GET request:

{
  "method": "GET",
  "path": "/users/sign_in",
  "format": "html",
  "controller": "SessionsController",
  "action": "new",
  "status": 200,
  "time": "2021-01-05T14:43:59.570Z",
  "params": [],
  "remote_ip": "172.17.0.1",
  "user_id": null,
  "username": null,
  "ua": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:83.0) Gecko/20100101 Firefox/83.0",
  "correlation_id": "01EV9G2CDE60H8KFZGRCX09E9W",
  "meta.caller_id": "SessionsController#new",
  "meta.feature_category": "authentication_and_authorization",
POST request:

{
  "method": "POST",
  "path": "/users/sign_in",
  "format": "html",
  "controller": "SessionsController",
  "action": "create",
  "status": 302,
  "location": "http://gitlab.test.net/",
  "time": "2021-01-05T14:44:05.694Z",
  "params": [
    {
      "key": "utf8",
      "value": "✓"
    },
    {
      "key": "authenticity_token",
      "value": "[FILTERED]"
    },
    {
      "key": "user",
      "value": {
        "login": "root",
        "password": "[FILTERED]",
        "remember_me": "0"
      }
    }
  ],
  "remote_ip": "172.17.0.1",
  "user_id": 1,
  "username": "root",
  "ua": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:83.0) Gecko/20100101 Firefox/83.0",
  "correlation_id": "01EV9G2J6ST7SR3VEZNVX8H1R8",
  "meta.user": "root",
  "meta.caller_id": "SessionsController#create",
  "meta.feature_category": "authentication_and_authorization",
After a successful login, the user is redirected to the URI "/", which is handled by "RootController"; the "username" field is now populated.

{
  "method": "GET",
  "path": "/",
  "format": "html",
  "controller": "RootController",
  "action": "index",
  "status": 200,
  "time": "2021-01-05T14:44:05.784Z",
  "params": [],
  "remote_ip": "172.17.0.1",
  "user_id": 1,
  "username": "root",
  "ua": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:83.0) Gecko/20100101 Firefox/83.0",
Another source of evidence is application_json.log, which records a much simpler entry.

{"severity":"INFO","time":"2021-01-05T14:44:05.613Z","correlation_id":"01EV9G2J6ST7SR3VEZNVX8H1R8","message":"Successful Login: username=root ip=172.17.0.1 method=standard admin=true"}
Failed user login event (brute-force)

In production_json.log, a failed login attempt differs from a successful one only in the "status" field of the POST request. All other fields in the JSON structure remain the same.

"status": 0
However, the application_json.log message is much more informative than the "status" field in production_json.log, because it names the username that was tried:

{"severity":"INFO","time":"2021-01-05T15:06:15.406Z","correlation_id":"01EV9HB4TXPTXT5V030S0M37KG","message":"Failed Login: username=ferdinand ip=172.17.0.1"}
User creation event

The new user creation event is handled by the "Admin::UsersController" controller. First, the administrator opens the web page at "/admin/users/new" to enter the new user's details. The log entry also contains additional data such as the acting user's "username", the client's "remote_ip" address, and the user agent ("ua").

{
"method": "GET",
"path":"/admin/users/new",
"format":"html",
"controller":"Admin::UsersController",
"action":"new",
"status":200,
"time":"2021-01-05T12:47:29.765Z",
"params":[],
"remote_ip":"172.17.0.1",
"user_id":1,
"username":"root",
"ua":"Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:83.0) Gecko/20100101 Firefox/83.0",
"correlation_id":"01EV99D1Y1JCDHFPM06J7QZEPA"...
After filling in the form, the user clicks the "Create" button, which generates a POST request. The resulting production_json.log entry is shown below.

{
	"method": "POST",
	"path": "/admin/users",
	"format": "html",
	"controller": "Admin::UsersController",
	"action": "create",
	"status": 302,
	"location": "http://gitlab.test.net/admin/users/ferdinand",
	"time": "2021-01-05T12:49:08.116Z",
	"params": [
...
		{
			"key": "authenticity_token",
			"value": "[FILTERED]"
		},
		{
			"key": "user",
			"value": {
				"name": "Ferdinand",
				"username": "ferdinand",
				"email": "ferdinand@test.net",
				"projects_limit": "10",
				"can_create_group": "1",
...
				"note": "[FILTERED]"
			}
		}
	],
	"remote_ip": "172.17.0.1",
	"user_id": 1,
	"username": "root",
	"ua": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:83.0) Gecko/20100101 Firefox/83.0",
	"correlation_id": "01EV99G210H6TN462XKAY0G7HE",
After the user has been successfully created ("action": "create"), the "status" field equals 302, i.e., a redirect to the new user's profile page "/admin/users/ferdinand".

{
	"method": "GET",
	"path": "/admin/users/ferdinand",
	"format": "html",
	"controller": "Admin::UsersController",
	"action": "show",
	"status": 200,
	"time": "2021-01-05T12:49:08.423Z",
	"params": [
		{
			"key": "id",
			"value": "ferdinand"
		}
	],
	"remote_ip": "172.17.0.1",
	"user_id": 1,
	"username": "root",
	"ua": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:83.0) Gecko/20100101 Firefox/83.0",
	"correlation_id": "01EV99G2GXPJWY46V286HDQZCW",
Thanks to this verbose logging, it is possible to list the users created by a privileged account within a specific period of time or during a particular session.

The "application_json.log" file is much simpler, but it does not mention the creator.

{"severity":"INFO","time":"2021-01-05T12:01:20.627Z","correlation_id":"01EV96RJ0KKE2SQVJ6726W6A5Z","message":"Successful Login: username=root ip=172.17.0.1 method=standard admin=true"}
{"severity":"INFO","time":"2021-01-05T12:49:07.994Z","correlation_id":"01EV99G210H6TN462XKAY0G7HE","message":"User \"Ferdinand\" (ferdinand@test.net) was created"}
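Since application_json.log does not name the creator, the production_json.log POST request is where a new account can be attributed. The sketch below is a hypothetical helper that assumes, as in the sample above, that status 302 on "Admin::UsersController#create" marks a successful creation. It returns the creator, the source IP, the new account's details, and the "correlation_id", which in the samples above also appears on the "was created" message (01EV99G210H6TN462XKAY0G7HE).

```python
import json
from typing import Iterable


def user_creations(lines: Iterable[str]) -> list[dict]:
    """Extract successful admin-driven user creations from production_json.log lines."""
    events = []
    for line in lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue
        if not isinstance(entry, dict):
            continue
        if not (entry.get("controller") == "Admin::UsersController"
                and entry.get("action") == "create"
                and entry.get("status") == 302):
            continue
        # The submitted form is logged as the "user" parameter; secrets show as [FILTERED].
        new_user: dict = {}
        for param in entry.get("params") or []:
            if (isinstance(param, dict) and param.get("key") == "user"
                    and isinstance(param.get("value"), dict)):
                new_user = param["value"]
        events.append({
            "time": entry.get("time"),
            "creator": entry.get("username"),
            "creator_ip": entry.get("remote_ip"),
            "new_username": new_user.get("username"),
            "new_email": new_user.get("email"),
            "correlation_id": entry.get("correlation_id"),
        })
    return events
```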
User API token creation

To create a token, a user opens "User Settings" --> "Access Tokens", enters a name and an expiration date, and selects the token scopes (permissions). This leaves the following traces in production_json.log:

{
	"method": "POST",
	"path": "/-/profile/personal_access_tokens",
	"format": "html",
	"controller": "Profiles::PersonalAccessTokensController",
	"action": "create",
	"status": 302,
	"location": "http://gitlab.test.net/-/profile/personal_access_tokens",
	"time": "2021-01-05T15:17:31.666Z",
	"params": [
		{
			"key": "utf8",
			"value": "✓"
		},
		{
			"key": "authenticity_token",
			"value": "[FILTERED]"
		},
		{
			"key": "personal_access_token",
			"value": "[FILTERED]"
		}
	],
	"remote_ip": "172.17.0.1",
	"user_id": 2,
	"username": "ferdinand",
	"ua": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:83.0) Gecko/20100101 Firefox/83.0",
	"correlation_id": "01EV9HZSBJMDKJF6M541YT0TJJ",
	"meta.user": "ferdinand",
	"meta.caller_id": "Profiles::PersonalAccessTokensController#create",
	"meta.feature_category": "authentication_and_authorization",

This production_json.log entry contains no token-specific data: the "personal_access_token" parameter is filtered.

However, application_json.log records the "token_id", and its entry has the same "correlation_id" as the production_json.log entry, with a timestamp only milliseconds apart:

{
	"severity": "INFO",
	"time": "2021-01-05T15:17:31.664Z",
	"correlation_id": "01EV9HZSBJMDKJF6M541YT0TJJ",
	"message": "PAT CREATION: created_by: 'ferdinand', created_for: 'ferdinand', token_id: '1'"
}
Unfortunately, there was no sign of a token name in the GitLab logs.
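The "PAT CREATION" message is easy to parse. The Python sketch below is an assumption-based example: the regular expression mirrors only the message format shown above, and flagging tokens whose "created_by" differs from "created_for" (a token created on behalf of another account) is our own suggested heuristic, not a GitLab feature.

```python
import json
import re
from typing import Iterable

# Mirrors the application_json.log sample above (GitLab CE 13.7.1).
PAT_RE = re.compile(
    r"PAT CREATION: created_by: '(?P<created_by>[^']*)', "
    r"created_for: '(?P<created_for>[^']*)', token_id: '(?P<token_id>[^']*)'"
)


def pat_creations(lines: Iterable[str]) -> list[dict]:
    """Extract personal access token creation events from application_json.log lines."""
    events = []
    for line in lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue
        if not isinstance(entry, dict):
            continue
        match = PAT_RE.search(entry.get("message") or "")
        if not match:
            continue
        event = match.groupdict()
        event["time"] = entry.get("time")
        event["correlation_id"] = entry.get("correlation_id")
        # Heuristic: a token created for a different account deserves a closer look.
        event["created_for_other_user"] = event["created_by"] != event["created_for"]
        events.append(event)
    return events
```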
Git repository clone event

GitLab provides two ways to obtain a repository: downloading an archive through the web GUI and cloning it from the CLI with "git clone". There are also a couple of authentication methods, but we do not cover them in this section.

Clicking the download button in the web GUI generates a GET request that leaves a trail in production_json.log:

{
	"method": "GET",
	"path": "/root/secure-service/-/archive/master/secure-service-master.zip",
	"format": "zip",
	"controller": "Projects::RepositoriesController",
	"action": "archive",
	"status": 200,
	"time": "2021-01-06T13:25:27.449Z",
	"params": [
		{
			"key": "namespace_id",
			"value": "root"
		},
		{
			"key": "project_id",
			"value": "secure-service"
		},
		{
			"key": "id",
			"value": "master/secure-service-master"
		}
	],
	"remote_ip": "172.17.0.1",
	"user_id": 1,
	"username": "root",
	"ua": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:83.0) Gecko/20100101 Firefox/83.0",
	"correlation_id": "01EVBXZ9HK0QN0W6C00EECV8KX",
The entry shows that the master branch of the "secure-service" project was downloaded as a zip archive by the user "root" from the IP address 172.17.0.1; the user agent indicates the Mozilla Firefox web browser.

When the "git clone" command is used, we get the following production_json.log entry:

{
	"method": "POST",
	"path": "/root/secure-service.git/git-upload-pack",
	"format": null,
	"controller": "Repositories::GitHttpController",
	"action": "git_upload_pack",
	"status": 200,
	"time": "2021-01-06T13:58:37.138Z",
	"params": [
		{
			"key": "repository_path",
			"value": "root/secure-service.git"
		}
	],
	"remote_ip": "172.17.0.1",
	"user_id": 1,
	"username": "root",
	"ua": "git/2.17.1",
	"correlation_id": "01EVBZW0EX9WFV6YRAVN78NAYT",
In this case, the differences are obvious: the controller is "Repositories::GitHttpController", the action is "git_upload_pack", and the user agent is the Git client ("git/2.17.1"). Unfortunately, it is impossible to distinguish "git clone", "git fetch", and "git pull", because all of them trigger the same "git_upload_pack" action.
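The two patterns above can be turned into a simple exfiltration filter. The Python sketch below is an assumption-based example: it only knows the two controller/action pairs shown in this section, counts only responses with status 200, and, for the reason given above, cannot tell clone, fetch, and pull apart. The function name is hypothetical.

```python
import json
from typing import Iterable

# (controller, action) pairs observed in the production_json.log samples above.
DOWNLOAD_EVENTS = {
    ("Projects::RepositoriesController", "archive"): "web archive download",
    ("Repositories::GitHttpController", "git_upload_pack"): "git clone/fetch/pull over HTTP",
}


def repository_downloads(lines: Iterable[str]) -> list[dict]:
    """Return successful (status 200) requests that transferred repository content."""
    findings = []
    for line in lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue
        if not isinstance(entry, dict):
            continue
        kind = DOWNLOAD_EVENTS.get((entry.get("controller"), entry.get("action")))
        if kind is None or entry.get("status") != 200:
            continue
        findings.append({
            "time": entry.get("time"),
            "kind": kind,
            "path": entry.get("path"),
            "username": entry.get("username"),
            "remote_ip": entry.get("remote_ip"),
            "ua": entry.get("ua"),
        })
    return findings
```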
GitLab summary

The GitLab logging system is detailed and fine-grained. It makes it possible to correlate events across log files, for example by the shared "correlation_id" field. Logs are stored both as space-delimited plain text and as JSON.
Jenkins
Jenkins CI provides similar capabilities to GitLab CI: it runs CI jobs on agents, stores build artifacts, and deploys code to the staging/production servers based on the written pipeline. We focused on attempts to access/download build artifacts and various pipeline execution evidence stored on the Jenkins side.
Sources of digital forensic evidence

The list of digital evidence sources is shown below:

  • web access logs produced by winstone.accesslog.SimpleAccessLogger ("/var/log/jenkins/access_log"), which track incoming web requests
  • Jenkins log ("/var/log/jenkins/jenkins.log")
  • Job logs ("/var/lib/jenkins/jobs/<job name>/builds/<build number>/log")
  • Jenkins configuration data ("/var/lib/jenkins/config.xml")
  • "thinBackup" directories with backups of the Jenkins configuration data ("/var/lib/jenkins/thinBackup//"; the ThinBackup plugin must be installed)
Investigation patterns for various "atomic" user actions

We examined the following "atomic" events:

  • user login event: initial access
  • failed user login: credential access (brute-forcing)
  • user creation event: persistence
  • build launch: execution
  • webhook trigger: execution
  • workspace access: collection
  • download files from workspace: data exfiltration
User login

172.17.0.19 - - [08/Jan/2021:15:18:41 +0000] "GET /login HTTP/1.1" 200 887 "http://jenkins.test.net:8080/" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:83.0) Gecko/20100101 Firefox/83.0"
172.17.0.19 - - [08/Jan/2021:15:18:45 +0000] "POST /j_acegi_security_check HTTP/1.1" 302 0 "http://jenkins.test.net:8080/login?from=%2F" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:83.0) Gecko/20100101 Firefox/83.0"
172.17.0.19 - - [08/Jan/2021:15:18:45 +0000] "GET / HTTP/1.1" 200 4713 "http://jenkins.test.net:8080/login?from=%2F" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:83.0) Gecko/20100101 Firefox/83.0"

Unfortunately, even with the Audit Trail plugin installed and configured, the logs ("/var/logs/jenkins/audit.log.0") remain sparse:

Jan 8, 2021 3:01:12,718 PM- - -/login by 172.17.0.19
Jan 8, 2021 3:04:02,768 PM- - -/logout by forensics

System Log Recorders (Manage Jenkins --> System Log), for example one based on the "jenkins.security.SecurityListener" logger, do show user logins and logouts with usernames. Unfortunately, these records are kept in memory and are cleared when the service stops.
Failed user login

172.17.0.19 - - [07/Jan/2021:09:19:37 +0000] "GET / HTTP/1.1" 200 4855 "http://jenkins.test.net:8080/loginError" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:83.0) Gecko/20100101 Firefox/83.0"
As we can see, the entry contains no data identifying the user; the failed attempt is visible only through the "/loginError" referer.

The Audit Trail plugin logs ("/var/logs/jenkins/audit.log.0") are of no help either:

Jan 8, 2021 2:31:43,972 PM- - -/loginError by 172.17.0.19
The "- - -" string is a custom delimiter configured for the Audit Trail logs.
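Since neither source records the username for a failed attempt, Jenkins login activity has to be reconstructed from request sequences in the access log. The following Python sketch parses lines in the format shown above and labels them using heuristics derived from these samples only: a POST to "/j_acegi_security_check" marks submitted credentials, a "/loginError" path or referer marks a failure, and a GET whose referer is "/login?from=..." marks a probable success. These rules and the function names are our assumptions and should be validated against your own Jenkins version.

```python
import re
from datetime import datetime
from typing import Optional

# Access log line format as in the SimpleAccessLogger samples above.
ACCESS_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+)'
    r'(?: "(?P<referer>[^"]*)")?(?: +"(?P<ua>[^"]*)")?'
)


def parse_access_line(line: str) -> Optional[dict]:
    """Parse one access log line into a dict, or return None if it does not match."""
    match = ACCESS_RE.match(line)
    if not match:
        return None
    entry = match.groupdict()
    entry["time"] = datetime.strptime(entry.pop("ts"), "%d/%b/%Y:%H:%M:%S %z")
    entry["status"] = int(entry["status"])
    entry["referer"] = entry["referer"] or ""
    return entry


def classify_login(entry: dict) -> Optional[str]:
    """Label login-related requests using heuristics derived from the samples above."""
    path, referer = entry["path"], entry["referer"]
    if entry["method"] == "POST" and path.startswith("/j_acegi_security_check"):
        return "credentials submitted"
    if path.startswith("/loginError") or referer.endswith("/loginError"):
        return "failed login"
    if (entry["method"] == "GET" and "/login?from=" in referer
            and not path.startswith("/login")):
        return "successful login"
    return None
```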
User creation

The logs contain no data about the user who initiated the creation procedure, and no entry names the event type explicitly; only the sequence of requests below reveals it. During user creation, Jenkins creates the user's configuration file "$JENKINS_HOME/users/<username>_<number>.xml" and modifies the "$JENKINS_HOME/users/user.xml" file.

172.17.0.19 - - [08/Jan/2021:15:01:55 +0000] "GET /securityRealm/ HTTP/1.1" 200 3128 "http://jenkins.test.net:8080/manage" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:83.0) Gecko/20100101 Firefox/83.0"

172.17.0.19 - - [08/Jan/2021:15:01:58 +0000] "GET /securityRealm/addUser HTTP/1.1" 200 3036 "http://jenkins.test.net:8080/securityRealm/" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:83.0) Gecko/20100101 Firefox/83.0"

172.17.0.19 - - [08/Jan/2021:15:02:54 +0000] "POST /securityRealm/createAccountByAdmin HTTP/1.1" 302 0 "http://jenkins.test.net:8080/securityRealm/addUser" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:83.0) Gecko/20100101 Firefox/83.0"

172.17.0.19 - - [08/Jan/2021:15:02:54 +0000] "GET /securityRealm/ HTTP/1.1" 200 3187 "http://jenkins.test.net:8080/securityRealm/addUser" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:83.0) Gecko/20100101 Firefox/83.0"
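Because the access log does not name the new account, the filesystem is the better source here. The sketch below is a hypothetical helper, not part of Jenkins: it lists every XML file under "$JENKINS_HOME/users" (whatever the exact file layout) sorted by modification time, so that new or modified user files can be lined up with the "/securityRealm/createAccountByAdmin" request. It relies on mtime because file creation time is generally not available through os.stat on Linux; run it against a forensic copy whose timestamps were preserved.

```python
import os
from datetime import datetime, timezone


def user_file_timeline(jenkins_home: str) -> list[tuple[datetime, str]]:
    """Return (mtime, relative path) for every XML file under $JENKINS_HOME/users, oldest first."""
    users_dir = os.path.join(jenkins_home, "users")
    timeline = []
    # os.walk yields nothing if the directory does not exist.
    for root, _dirs, files in os.walk(users_dir):
        for name in files:
            if not name.endswith(".xml"):
                continue
            path = os.path.join(root, name)
            mtime = datetime.fromtimestamp(os.stat(path).st_mtime, tz=timezone.utc)
            timeline.append((mtime, os.path.relpath(path, users_dir)))
    return sorted(timeline)
```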

Credential access

By design, Jenkins does not allow users to view stored credentials (also known as secrets). A secret is stored in encrypted form and is never displayed as plain text, either in the job log or when the credential is updated. Nevertheless, credentials can be used within their defined scope. For example, someone could launch a pipeline project whose GUI-defined pipeline uses stored credentials and thereby act with those credentials' privileges. The log entry below shows a user viewing a stored credential:

127.0.0.1 - - [11/Jan/2021:14:02:35 +0000] "GET /credentials/store/system/domain/_/credential/jenkins-repo-check-user HTTP/1.1" 200 3377  "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:83.0) Gecko/20100101 Firefox/83.0"
Build launch/Job start

Job start event entry in the log "/var/log/jenkins/access_log":

172.17.0.19 - - [08/Jan/2021:14:32:27 +0000] "POST /job/secure-service/build HTTP/1.1" 201 0 "http://jenkins.test.net:8080/job/secure-service/" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:83.0) Gecko/20100101 Firefox/83.0"
To obtain more data about the executed build, let's look into the job log "/var/lib/jenkins/jobs/secure-service/builds/14/log" (the relevant build can be identified by matching the access log timestamp against the file creation timestamps in "/var/lib/jenkins/jobs/secure-service/builds").

Started by user unknown or anonymous
Replayed [8mha:////4J4SQwCqwxREHW2nZ0MF69r6vTOLthEAyGyQqr/fST8EAAAAox+LCAAAAAAAAP9b85aBtbiIQTGjNKU4P08vOT+vOD8nVc83PyU1x6OyILUoJzMv2y+/JJUBAhiZGBgqihhk0NSjKDWzXb3RdlLBUSYGJk8GtpzUvPSSDB8G5tKinBIGIZ+sxLJE/ZzEvHT94JKizLx0a6BxUmjGOUNodHsLgAzmEgZx/az8JP3i1OTSolTd4tSisszkVH1DI30A/pRzc8wAAAA=[0m#12
Checking out git http://gitlab.test.net/root/secure-service into /var/lib/jenkins/workspace/secure-service@script to read Jenkinsfile
The recommended git tool is: NONE
using credential jenkins-repo-check-user
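The odd-looking "[8mha:////...[0m" fragment is a Jenkins console annotation (console note): in the raw file it is wrapped in the ANSI escape sequences ESC[8m and ESC[0m, whose ESC characters are not printable and therefore do not show in the excerpt above. When searching job logs, it can help to strip these notes. A minimal Python sketch, assuming the raw log is decoded as UTF-8 text; the function names are hypothetical:

```python
import re

# Console notes are written as ESC[8m "ha:" <encoded payload> ESC[0m.
CONSOLE_NOTE_RE = re.compile(r"\x1b\[8mha:[^\x1b]*\x1b\[0m")


def strip_console_notes(text: str) -> str:
    """Remove hidden Jenkins console annotations, keeping the visible log text."""
    return CONSOLE_NOTE_RE.sub("", text)


def read_job_log(path: str) -> str:
    """Read a raw job log file and return its text without console notes."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return strip_console_notes(handle.read())
```

Applied to the excerpt above, the second line would read simply "Replayed #12".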

The audit log ("/var/logs/jenkins/audit.log.0") contains more interesting data, including the parameters passed at job start:

Jan 11, 2021 4:24:00,010 PM- - -secure-service #14 Started by user anonymous, Replayed #12, Parameters:[] on node #unknown# started at 2021-01-11T16:23:56Z completed in 3717ms completed: FAILURE
The entry also shows that build #14 is a replay of build #12.
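Build entries in the Audit Trail log follow a recognizable pattern, so they can be parsed into a timeline. The regular expression below is derived from the single sample above and is an assumption: job names containing spaces, other plugin versions, or a different custom delimiter would require adjustments. The function name is hypothetical.

```python
import re
from typing import Optional

# Derived from one Audit Trail sample that uses the custom "- - -" delimiter.
BUILD_RE = re.compile(
    r"^(?P<logged_at>.+?)- - -(?P<job>\S+) #(?P<build>\d+) (?P<cause>.*?), "
    r"Parameters:\[(?P<parameters>.*?)\] on node (?P<node>\S+) "
    r"started at (?P<started_at>\S+) completed in (?P<duration_ms>\d+)ms "
    r"completed: (?P<result>\w+)\s*$"
)


def parse_build_entry(line: str) -> Optional[dict]:
    """Parse an Audit Trail build entry; return None for other entry types."""
    match = BUILD_RE.match(line)
    if not match:
        return None
    entry = match.groupdict()
    entry["build"] = int(entry["build"])
    entry["duration_ms"] = int(entry["duration_ms"])
    # The cause text may name the user and, for replays, the original build.
    user = re.search(r"Started by user ([^,]+)", entry["cause"])
    replay = re.search(r"Replayed #(\d+)", entry["cause"])
    entry["started_by"] = user.group(1) if user else None
    entry["replay_of"] = int(replay.group(1)) if replay else None
    return entry
```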
Workspace access

Traces of CI job workspace access in the Jenkins access log:

172.17.0.19 - - [07/Jan/2021:15:06:36 +0000] "GET /job/secure-service/12/execution/node/3/ws/ HTTP/1.1" 200 3380 "http://jenkins.test.net:8080/job/secure-service/12/ws/" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:83.0) Gecko/20100101 Firefox/83.0"
The workspace stores build files such as the checked-out Git repository (in our case, the GitLab project "secure-service").
Download files from workspace

If a build artifact was downloaded from the workspace, the following entry is written to the access log:

172.17.0.19 - - [07/Jan/2021:15:06:44 +0000] "GET /job/secure-service/12/execution/node/3/ws/*zip*/secure-service.zip HTTP/1.1" 200 21411 "http://jenkins.test.net:8080/job/secure-service/12/execution/node/3/ws/" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:83.0) Gecko/20100101 Firefox/83.0"
The substring "*zip*/secure-service.zip" in the URL indicates that the workspace contents were downloaded as a zip archive by an unidentified user from the IP address 172.17.0.19.
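The URL patterns covered in this section can be collected into a simple classifier for the Jenkins access log. The Python sketch below is a hedged example: the patterns come only from the sample entries in this article (Jenkins 2.263.1), they are checked in order so that a workspace download is not reported as mere workspace access, and the event labels and function name are our own.

```python
import re
from typing import Optional

LINE_RE = re.compile(r'^(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] "(?P<method>[A-Z]+) (?P<path>\S+)')

# (label, allowed HTTP methods, path pattern); the first matching rule wins.
EVENT_RULES = [
    ("workspace download", {"GET"}, re.compile(r"^/job/.+/ws/.*\*zip\*/")),
    ("workspace access", {"GET"}, re.compile(r"^/job/.+/ws/")),
    ("build launch", {"POST"}, re.compile(r"^/job/.+/build/?(?:\?.*)?$")),
    ("credential view", {"GET"}, re.compile(r"^/credentials/store/")),
    ("user creation", {"POST"}, re.compile(r"^/securityRealm/createAccountByAdmin")),
]


def classify_request(line: str) -> Optional[tuple[str, str, str, str]]:
    """Return (event, ip, timestamp, path) for a recognised request, else None."""
    match = LINE_RE.match(line)
    if not match:
        return None
    method, path = match.group("method"), match.group("path")
    for label, methods, pattern in EVENT_RULES:
        if method in methods and pattern.search(path):
            return label, match.group("ip"), match.group("ts"), path
    return None
```

The resulting tuples can then be sorted by timestamp and grouped by IP address, which is the manual correlation this section relies on.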
Checking security settings

To verify whether anonymous access to the project was prohibited, check "$JENKINS_HOME/config.xml", where access control settings are stored in XML form. If these settings have since been changed, check the earlier copies in "$JENKINS_HOME/thinBackup/<timestamp>/config.xml".
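The check can be partially automated. The Python sketch below summarizes what the "anonymous" user may do according to a config.xml. The authorization strategy class names and the "<permission>:<sid>" entry format reflect our understanding of how Jenkins and the Matrix Authorization Strategy plugin serialize these settings; treat them as assumptions and compare them with your own file. Per-project permissions, if used, are stored in each job's own configuration and are not covered here. The function name is hypothetical.

```python
import re
import xml.etree.ElementTree as ET

ANONYMOUS_SID = "anonymous"


def anonymous_permissions(config_xml: str) -> list[str]:
    """Summarise what the anonymous user may do according to a Jenkins config.xml."""
    # Jenkins writes an XML 1.1 declaration; drop it in case the parser only accepts XML 1.0.
    body = re.sub(r"^\s*<\?xml[^>]*\?>", "", config_xml).strip()
    root = ET.fromstring(body)
    strategy = root.find("authorizationStrategy")
    if strategy is None:
        return ["UNKNOWN: no authorizationStrategy element"]
    cls = strategy.get("class", "")
    if cls == "hudson.security.AuthorizationStrategy$Unsecured":
        return ["ALL: authorization is disabled"]
    if cls == "hudson.security.FullControlOnceLoggedInAuthorizationStrategy":
        deny = strategy.findtext("denyAnonymousReadAccess")
        if deny is None:
            return ["UNKNOWN: denyAnonymousReadAccess not set"]
        return [] if deny.strip() == "true" else ["READ: anonymous read access allowed"]
    # Matrix-based strategies: entries look like "<permission>:<sid>", possibly
    # prefixed with "USER:" or "GROUP:" in newer plugin versions.
    granted = []
    for element in strategy.iter("permission"):
        parts = (element.text or "").strip().split(":")
        if len(parts) >= 2 and parts[-1] == ANONYMOUS_SID:
            granted.append(parts[-2])
    return sorted(granted)
```

Running the function on both the current config.xml and each thinBackup copy shows whether, and roughly when, anonymous access was granted.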
Jenkins summary

The Jenkins logging system is poorer than GitLab's. Several activities can be reconstructed, but without specific details such as the username. Most of the digital evidence has to be correlated manually by IP address and timestamp.
Conclusion
There are many attack scenarios, but all of them consist of various atomic events. Some actions (such as user login, failed user login, user creation, and token creation) help investigators determine whether the attacker used credential access or persistence techniques. Data exfiltration events (such as Git repository clone/push or artifact downloads), correlated with employee interviews, help us investigate the circumstances of IP theft. The default configuration of the GitLab logging system makes an in-depth investigation possible, whereas investigating Jenkins is relatively difficult and requires correlating logs with filesystem artifacts.
Recommendations:

By implementing security monitoring and configuring correlation and alerting rules, SOC and blue team engineers can improve security controls over their organization's infrastructure. To secure the product development, build, and delivery cycle, keep the following in mind:

  • Identify the most important assets (assess your risk)
  • Build your development, testing, and production infrastructure based on an approach of zero trust and minimal access rights
  • Implement secrets (credential) storage with fine-grained permissions and usage audit
  • Develop and enforce a password policy that includes an internal procedure for detecting weak passwords (e.g., brute-force testing)
  • Implement a secrets retention and rotation policy (including for developer accounts)
  • Use only well-known tools and follow best practices relating to security configurations
  • Forward your tool logs to a Log Management System or SIEM
  • Perform regular security scans and use a patch management system to reveal and fix security issues in the tools you use
Group-IB's services help enhance the capabilities of security teams. Group-IB Pre-IR Assessment helps ensure that your security monitoring coverage is comprehensive. Group-IB RedTeaming reveals security misconfigurations that could lead to source code leaks. If an incident has already occurred, our DFIR services will help you identify the attacker's TTPs (tactics, techniques and procedures) and we will provide a detailed report. We will also provide a consultation about how to improve security controls and offer recommendations on how to avoid similar security incidents in the future.