In a multi-tenant CI/CD environment, developers trust CI/CD systems and delegate to them the deployment of their applications to production. But what is the basis of this trust? How is the trust enforced from commit to deploy? How trustworthy is an application deployed by CI/CD through automation? This talk highlights the security risks of CI/CD deployments and offers solutions to mitigate those risks.
25. Common Threats
◎ User account compromise & insider threats.
○ User and platform admins.
◎ Network Intrusion.
○ CI/CD internal and external endpoints.
26. Build Slave Compromise
Large attack surface, spread across multiple networks (iPhone, Android, server apps, etc.)
How?
◎ A network-level compromise, exploiting a vulnerability in a build slave.
◎ Jobs break out of the build container.
Impact
◎ Access to production servers.
◎ Listen to the network, spoof identity and access unauthorized data.
28. SSH over Unrestricted Shell
Allows arbitrary commands to be executed on a remote host.
29. Building External Code
An attacker can take this path to get into internal networks, either by adding backdoors or by exploiting known vulnerabilities in open source software.
35. ◎ Events:
○ Commit trigger.
○ Manual trigger from build UI.
○ Automated/cron job.
○ Trigger a downstream job.
◎ Upstream service stores downstream service credentials (OAuth, shared keys, etc.).
◎ Equal trust in all components in the pipeline.
40. Minimal Builds
Do not pull PII or other sensitive info onto the build machine.
41. Few more...
◎ SSH: Use Restricted Shells
○ Headless SSH access for automated deployment should use a restricted shell.
◎ Roll Keys Periodically
○ Establish a process to periodically roll trust anchor keys (and do it periodically).
◎ Restrict Job Console Logs
○ Restrict build job console logs to authorized users only.
◎ Enable 2FA
○ Admins must follow good security hygiene and use 2FA to access platform applications and hosts.
◎ Prune Admin Access List
○ Keep the admin list small for build systems and Git repo access.
◎ Vulnerability Patch Mgmt
○ Maintain an inventory of all packages in use and have a mechanism to patch the system in response to a disclosure.
43. Let’s recap major concepts
◎ Ephemeral Keys
○ Ephemeral keys are the future. Service providers should start supporting ephemeral keys for authorization.
◎ Stateless Auth Architecture
○ Augment the 1:1 trust relationship between the pipeline components with workflow job tokens.
◎ Audit Logs
○ A verifiable chain of trust based on traceable audit logs is a foundational requirement for CI/CD.
◎ Minimal Builds
○ Avoid pulling PII or other sensitive production data into build environments. Keep builds to a minimum.
◎ Network Segmentation
○ Network-level isolation of CI/CD machines from other machines.
◎ ToolChain Hardening
○ Build tools and Docker containers must be adequately hardened.
46. Hard Problem: Securing Supply Chain
This threat has more to do with applications than with the CI/CD platform itself.
Open source components constitute a large part of modern Internet-based applications.
Use
◎ Source code
◎ Pre-built packages
Risks
◎ Targeted back doors
◎ Vulnerabilities
Editor's Notes
Securing application deployments in multi-tenant/shared CI/CD environments.
We need to ensure the integrity of applications deployed through a multi-tenant CI/CD platform. The focus is to reduce the large attack surface of a shared CI/CD platform and
Here is an overview of what we are covering in today’s talk.
We start with a brief introduction to CI/CD systems and the trends we have seen in the past few years.
Then we define the high-level security objectives of this effort.
Next, we discuss the threat model in detail: actors, trust boundaries, attack surface, and the major threats.
The last section shares security patterns that matter to a multi-tenant CI/CD platform.
This is a one-minute introduction to CI/CD platforms. CI/CD is a software engineering practice that consists of three steps or levels. Each level adds incremental maturity to the overall software development process.
The first step, CI, is all about making incremental changes to the code, then merging, building and unit testing several times a day. CI helps development teams identify and fix integration errors early in the development life cycle with minimum effort.
Continuous Delivery is an extension of CI. It establishes a repeatable process to build, test and release deployable artifacts, typically to package repositories such as npm, Artifactory, Docker image repos, etc.
Continuous Deployment, the step after Continuous Delivery, automates the deployment of applications to production.
Many organizations deploy their applications multiple times a day. Some even deploy thousands of times a day.
From a security perspective, I would like to classify deployments into two types:
Single tenant
Multi-tenant
The distinction is important because the choice of model has a big impact on security. Our session focuses on the security of multi-tenant platforms.
Single-tenant build systems have been around for more than a decade. In this model, each team owns a dedicated set of build machines to build their applications.
As part of its execution, a build job requires access to protected services like source and package repositories. The common practice is to embed keys into these systems.
This is not a big threat because:
(1) These systems are distributed and dedicated to each team, and access is restricted to team members.
(2) A compromise exposes only a limited set of hosts.
In the last few years, the industry has been experiencing a major shift towards multi-tenant shared build environments.
One of the factors behind the consolidation is economies of scale.
A shared platform enables pooling of development and support resources and hardware. It also allows an enterprise to centrally enforce development standards and security and compliance controls.
With centralization, we create a build corridor that acts as a gateway to production networks.
The earlier model of embedding access keys inside the build system is not safe because of the platform’s shared nature. Consolidating keys makes the platform an attractive target for attackers.
This model also makes some security controls obsolete, for example an endpoint protected by an IP whitelist. Imagine your build job needs to access such an endpoint. Whitelisting a build slave’s IP allows every other build job running on the same host to access the protected resource. This was not an issue with dedicated build environments.
Here is a simple view of a modern CI/CD platform. Typically a workflow starts with a code commit from a developer. A code commit or a PR merge triggers a component build job. The job is executed on a build slave and creates a deployable artifact. A subsequent build, triggered by the component build, notifies the deployer. The deployer pulls the artifacts and deploys the application to production.
The previous slide provides a simplified view of a multi-tenant CI/CD platform. In practice, however, these systems are fairly complex, with multiple interdependent sub-systems.
We logically grouped these systems into five categories:
Build platform
Source and deployable artifacts store
Identity and auth services
Platform and application services: Sauce Labs, your deployed applications, etc.
Deployment environment, consisting of the deployer and production hosts. The deployer takes a request from a build job and deploys the requested application to production machines.
Let’s explain with a simple flow. The developer commits code, which triggers a build. In Yahoo’s case, we have a workflow engine, built on top of Jenkins, that is designed to build, test and deploy software at scale. The workflow engine schedules and tracks build jobs. Each build job is executed inside a Docker container. As part of their execution, build jobs require credentials and keys to access protected services (package repos, databases, Sauce Labs, etc.), as well as SSH access. The component job builds and publishes the application, while subsequent jobs deploy the application to different environments (QA, stage, beta, prod, etc.).
We briefly touched upon the security risks associated with operating such a system at scale in an enterprise.
This section talks about security properties or requirements for a shared build platform.
The system should allow a legitimate product engineer or team to safely build and deploy applications to production. To distinguish legitimate from illegitimate use, we need controls that prevent unauthorized use of the CI/CD platform to modify applications or reach target production hosts.
Image: https://www.dreamstime.com/royalty-free-stock-photography-sketch-piping-design-mixed-industrial-equipment-photos-photo-image32609927
A modern CI/CD platform spans multiple trust boundaries, operated by different organizations.
How do we trust the application deployed through the CI/CD platform?
For that, we need a way to establish a verifiable chain of trust from commit to deploy.
This helps us monitor for and detect unauthorized activities within the platform.
===
Image source: https://www.dreamstime.com/stock-photo-generations-four-women-aging-young-to-old-image50612988#res16633042
Just because you run a build job on a shared platform, the job should not get elevated privileges or cross the streams. For instance, build jobs should not break out of their trust boundary and gain elevated privileges.
Image source: http://www.istockphoto.com/photo/vip-pass-exclusive-access-gm523819320-92053949?st=_p_vippass
This section covers actors, trust boundaries and attack surface, and enumerates the major threats we identified as part of the exercise.
We identified three types of actors related to this platform.
Developers are the users who own repos and do frequent code commits. These are the ones who make use of the platform to build and deploy their application.
Platform developers have limited access to some or all components of the platform.
Admins, who support the platform, have exclusive access to all systems and applications.
Icons: https://www.iconfinder.com/icons/532716/api_coding_configuration_development_html_programming_window_icon#size=128
Trust boundaries
https://www.flickr.com/photos/nasamarshall/14596371842
A trust boundary is a distinct boundary within which a system trusts all sub-systems (including data). It enables implicit trust between its sub-systems.
The build web interface is the application trust boundary of a CI/CD platform. It includes both the UI and the REST APIs.
Data that crosses the trust boundary through the Jenkins master is trusted by the slave.
The slave also trusts the master to perform authentication.
The build jobs are executed inside a container on slave machines. The container acts as a trust boundary between the host OS and the build job.
Network segmentation provides network-level isolation between the CI/CD platform and other machines.
Co-locating CI/CD machines with less trusted machines increases security risk, because of the implicit trust between systems within the same network trust boundary.
Segmentation can be implemented using network or host-based firewalls.
Here we discuss the entry and exit points of the CI/CD system.
An application entry point also serves as an entry point for attackers.
Exit points are also important, because they may leak sensitive information to the outside.
Image: http://www.mazegenerator.net/
Entry points are interfaces to the platform.
The web interface is used to create a build project, start builds, and view build status and console logs.
The platform also operates commit and build notification handlers to track commits and build status respectively.
External source and package repos are another way into the system. Builds often pull packages directly from public repositories on the Internet.
Internal endpoints are exposed by sub-components (e.g. master and slave endpoints).
This is important because an attacker may manage to bypass the external entry points and attack internal endpoints directly.
NOTE: The protection mechanisms used by internal endpoints are often influenced by trust boundaries. A false assumption about a trust boundary exposes internal endpoints to various attacks.
Exit points are important because they may leak confidential information to the outside.
Think of them as the trash bin kept outside your house.
There are two cases here:
(1) Unintentional exposure of sensitive data in console logs and build notifications.
(2) Manipulation of data or packages, exploiting vulnerabilities that exist elsewhere, to leak sensitive data. An example is a pull request (PR) build. Anyone can submit a PR. If an attacker modifies the code to print a credential or key material to the build console, they can easily steal it from the build’s console log, which in most cases is public or readable by all.
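One way to reduce the first kind of exposure is to mask secrets before a line reaches the console log. The following is a minimal sketch, not a complete defense: the patterns and the `redact_line` helper are hypothetical, and an attacker who controls the build script can still encode a secret (for example in base64) to slip past any filter, so restricting who can read the logs remains necessary.

```python
import re

# Hypothetical pattern for "name = value" style secrets; a real platform
# would tune this to the credential formats it actually issues.
ASSIGNMENT = re.compile(
    r"(?i)\b(password|passwd|secret|token|api[_-]?key)\b(\s*[=:]\s*)(\S+)"
)
# PEM header that marks the start of private key material.
PRIVATE_KEY = re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----")


def redact_line(line: str, known_secrets: tuple = ()) -> str:
    """Mask known secret values and secret-looking assignments in one log line."""
    # Values the platform injected into the job are masked verbatim.
    for secret in known_secrets:
        if secret:
            line = line.replace(secret, "****")
    # Drop the whole line if it carries a private key header.
    if PRIVATE_KEY.search(line):
        return "**** private key material removed ****"
    # Keep the variable name and separator, hide only the value.
    return ASSIGNMENT.sub(lambda m: m.group(1) + m.group(2) + "****", line)
```

A log writer would call `redact_line(line, secrets_injected_into_this_job)` on every line before storing or publishing it.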
As part of the threat modeling exercise, we have already discussed the actors, trust boundaries and attack surface of the CI/CD platform.
This section talks about major threats.
When we talk about user account compromise, there are multiple user roles to consider.
With a developer account compromise, an attacker may use stolen credentials to:
Modify source code in Git
Build and push malicious code to production
With a CI/CD platform developer account compromise, an attacker may gain access to a few of the CI/CD machines. In most cases, that should be sufficient for the attacker to reach production machines.
An admin account compromise would be the most dangerous one. It gives the attacker complete access to all CI/CD machines, and from there access to production systems.
Network Intrusion:
A network intruder gains access to CI/CD components through an application vulnerability (e.g. RCE, web vulnerabilities, lack of authentication or authorization) or a lack of network segmentation.
Denial of service issues: abuse that affects other builds and deployments.
====
Build slaves constitute more than 90% of the CI/CD platform and are spread across multiple clusters, often on multiple networks. The build jobs run inside a container or a VM on a build slave.
How can a build slave be compromised?
A build job running inside a container can break out of the container.
A network-level compromise, by exploiting a vulnerability on a build slave or through unprotected internal endpoints.
Impact:
Reach target production hosts.
Spoof other jobs, access sensitive materials, etc.
Locally storing long-lived keys. These keys are required to access protected services on behalf of the build jobs. We have seen two patterns here.
Use of a shared key: in this model, the build system uses the same key for all jobs to push their artifacts to the artifact repository.
Use of job-specific keys: in this model, build jobs do not share keys. An example is the OAuth tokens commonly used with hosted CI platforms.
Per-job keys make it harder to spoof the identity of other jobs. However, the impact of a security compromise is the same for both shared and job-specific keys.
SSH is one of the most widely used mechanisms to access a remote system.
For automated deployments, we use SSH with a headless user, which often requires sudo access to deploy applications.
By using unrestricted shells like bash, we allow the headless user to run arbitrary commands as root.
This is a clear violation of the principle of least privilege.
Open source and untrusted packages and code are pulled into the build system at build time and executed. This can be explicit, via opensource.git, or implicit, e.g. via npm module installation and activation scripts.
To make a CI/CD platform secure, where do we start? In 2016, we expect baseline security controls to be enabled for all systems, not just the CI/CD platform. Let’s review those controls.
Jenkins endpoints
This last section discusses security patterns that can significantly reduce the security risk of operating a multi-tenant CI/CD platform.
Ephemeral keys are the future. A few factors are driving this:
(1) Detecting a key compromise is hard, and there is a good chance that a key compromise goes undetected for a long period of time.
(2) The second issue is key revocation. Key revocation is an equally hard problem, especially in environments that span multiple trust boundaries.
(3) As we move towards cloud and multi-tenant solutions, delegation has become a necessity. For fully automated deployments, you delegate to the CI/CD platform the authority to build, release and deploy your application on your behalf. To delegate a job, the recommended practice is to use scoped ephemeral tokens instead of long-lived keys or credentials.
Amazon short-lived tokens, JWT, SSH-CA and short-lived certificates are all steps in the right direction.
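To make "scoped ephemeral token" concrete, here is a minimal sketch using only the Python standard library. It is an illustration under assumptions, not any provider's format: the function names, the claim names (`sub`, `scopes`, `exp`) and the use of a shared HMAC key are all hypothetical, and a production system would more likely rely on an established format such as JWT or short-lived certificates.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def issue_token(key: bytes, subject: str, scopes: list, ttl_seconds: int,
                now: Optional[float] = None) -> str:
    """Issue a token that names its holder, its scopes and its expiry."""
    now = time.time() if now is None else now
    claims = {"sub": subject, "scopes": sorted(scopes), "exp": now + ttl_seconds}
    payload = base64.urlsafe_b64encode(json.dumps(claims, sort_keys=True).encode())
    tag = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return payload.decode() + "." + tag


def check_token(key: bytes, token: str, required_scope: str,
                now: Optional[float] = None) -> dict:
    """Accept the token only if it is authentic, unexpired and in scope."""
    now = time.time() if now is None else now
    payload, _, tag = token.rpartition(".")
    expected = hmac.new(key, payload.encode(), hashlib.sha256).hexdigest()
    # Constant-time comparison of the received and recomputed tags.
    if not hmac.compare_digest(tag.encode(), expected.encode()):
        raise PermissionError("bad signature")
    claims = json.loads(base64.urlsafe_b64decode(payload))
    if now >= claims["exp"]:
        raise PermissionError("token expired")
    if required_scope not in claims["scopes"]:
        raise PermissionError("scope not granted")
    return claims
```

Because the token expires on its own, a stolen token is useful only for a short window and only for the scopes it names, which sidesteps the detection and revocation problems listed above.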
Stateless auth architecture.
Modern CI/CD environments are spread across multiple trust boundaries and operated by different organizations. So the question is: how are a job’s identity and capabilities delegated and propagated through this pipeline? Let’s examine this in the next few slides.
Every build is triggered by some event: a commit trigger in response to a code commit, a manual build trigger from the build UI, an event generated by an automated cron job, or an event from an upstream job to start a downstream job.
Here is an abstract representation of a pipeline. In this diagram, each component stores credentials to access the downstream component. This puts equal trust in all components, so we have to deal with a large attack surface, which is a big security risk.
The job’s permissions and capabilities are delegated through the 1:1 trust relationship that exists between components, and this trust is transitive in nature.
In the current model, we need to trust each component in the pipeline equally, which is risky from a security perspective. Your build system is not designed to be a security system. One remediation we propose is to reduce the Trusted Computing Base (TCB) footprint to a few security-augmented, dedicated hosts.
The second diagram shows the desired state. In this model, we have a new dedicated trust anchor that authenticates events and creates a cryptographically signed job token. The job token is ephemeral and is delegated to the downstream services. The token may be mutated along the way, but it should ultimately be possible to trace it back to the original build event. A downstream service honors only requests that carry a valid job delegation token. This forms the chain of trust from commit to deploy.
You can think of this model as very similar to cookie-based stateless authentication. Your servers get a session cookie only when a user initiates a request. The server may pass the cookie to downstream services (e.g. backend servers), and once the request is served, the cookie is discarded.
This is a significant change from the existing model. It will disrupt existing practices, but I expect providers to start thinking along these lines.
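The job token flow described above can be sketched as a chained MAC, in the style of macaroons. This is one possible construction, not the scheme used in the talk: the function names and token layout are hypothetical, HMAC stands in for whatever signature scheme a real trust anchor would use, and verification here assumes the verifier can ask the trust anchor (or holds its key). Expiry is left out to keep the sketch short.

```python
import hashlib
import hmac
import json


def _mac(key: bytes, message: str) -> bytes:
    return hmac.new(key, message.encode(), hashlib.sha256).digest()


def mint_job_token(anchor_key: bytes, event: dict) -> dict:
    """Trust anchor: bind a new token to the authenticated build event."""
    event_json = json.dumps(event, sort_keys=True)
    return {"event": event_json, "hops": [],
            "sig": _mac(anchor_key, event_json).hex()}


def delegate(token: dict, hop: str) -> dict:
    """Pipeline component: record a hop. No access to the anchor key is needed,
    because the previous signature is used as the key for the next one."""
    new_sig = _mac(bytes.fromhex(token["sig"]), hop)
    return {"event": token["event"], "hops": token["hops"] + [hop],
            "sig": new_sig.hex()}


def verify_job_token(anchor_key: bytes, token: dict) -> dict:
    """Verifier: recompute the chain from the original event, then return it."""
    sig = _mac(anchor_key, token["event"])
    for hop in token["hops"]:
        sig = _mac(sig, hop)
    if not hmac.compare_digest(sig, bytes.fromhex(token["sig"])):
        raise PermissionError("invalid job token")
    return json.loads(token["event"])
```

Each component mutates the token by appending its own hop, yet the verifier can still trace the final token back to the build event the trust anchor authenticated, which is the chain of trust from commit to deploy.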
Immutable, append-only audit trails help us monitor build events and their correlations.
However, the challenge is how to consolidate and correlate audit trails from sources operated by different parties.
It would be great to have a way to track events and their subsequent actions in one central place.
A state machine built around the auth system would help track build events and their states from commit to deploy.
==
The audit trail produced by the auth component is especially useful because it keeps track of all build events.
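One common way to make an audit trail tamper-evident is to chain entries by hash, so that each entry commits to everything before it. The sketch below illustrates that idea only; the entry layout and function names are hypothetical, and it does not address consolidating logs from different parties.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _entry_hash(prev_hash: str, event: dict) -> str:
    body = json.dumps({"prev": prev_hash, "event": event}, sort_keys=True)
    return hashlib.sha256(body.encode()).hexdigest()


def append_entry(log: list, event: dict) -> dict:
    """Append an event whose hash covers the previous entry's hash."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    entry = {"prev": prev_hash, "event": event,
             "hash": _entry_hash(prev_hash, event)}
    log.append(entry)
    return entry


def verify_log(log: list) -> bool:
    """Recompute every hash; any edited or reordered entry breaks the chain."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev"] != prev_hash:
            return False
        if entry["hash"] != _entry_hash(prev_hash, entry["event"]):
            return False
        prev_hash = entry["hash"]
    return True
```

Note that a hash chain alone does not detect truncation of the newest entries; for that, the latest hash has to be published or stored somewhere the log writer cannot modify.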
Hardening toolchain and build containers.
[1] Hardening the build containers strengthens the trust boundary that isolates untrusted build jobs from the build control plane.
For example, a Docker build container should not run in privileged mode, and should run with minimum capabilities.
[2] Toolchain hardening is the use of secure default configurations that integrate with available platform security to build packages.
Example: when building the OpenSSL package for your application, disable insecure features such as SSLv2, SSLv3 and compression.
A secure toolchain is not a silver bullet. It is one piece of an overall strategy in the engineering process to help ensure success.
[3] Another aspect is the signing of packages. But the question is: who keeps the signing key, and what is the assertion? The build job, the build platform, or both?
Ref: https://github.com/GDSSecurity/Docker-Secure-Deployment-Guidelines
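As an illustration of point [1], a build launcher can assemble the `docker run` command line itself rather than accept it from the job. This is a sketch under assumptions: the helper names, the default user ID and the image name in the usage note are hypothetical, while `--cap-drop`, `--cap-add`, `--security-opt no-new-privileges`, `--user` and `--privileged` are standard `docker run` options.

```python
def hardened_docker_args(image, command, add_caps=(), user="1000:1000"):
    """Build a `docker run` argv that starts the job with minimum privileges."""
    args = [
        "docker", "run", "--rm",
        "--cap-drop=ALL",                    # start from zero capabilities
        "--security-opt=no-new-privileges",  # block setuid-style escalation
        "--user", user,                      # do not run the job as root
    ]
    # Add back only the capabilities this job was explicitly granted.
    for cap in add_caps:
        args.append("--cap-add=" + cap)
    return args + [image] + list(command)


def reject_privileged(argv):
    """Refuse any command line that asks for privileged mode."""
    for arg in argv:
        if arg == "--privileged" or arg.startswith("--privileged="):
            raise ValueError("privileged containers are not allowed")
    return argv
```

For example, `hardened_docker_args("build-image:latest", ["make", "test"])` yields a command line that could be passed to `subprocess.run` by the build slave.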
There are a couple of reasons to zone the CI/CD platform off from other systems.
From a hosted platform perspective, it is running untrusted code. But that should not allow a rogue or compromised job to use a build slave as a jump host to reach internal networks.
Sharing the network with less trusted systems increases the risk of a network attack against the CI/CD platform. Segmentation also helps limit access by an insider, partner or third party.
The platform may employ either network or host-based firewalls.
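The policy such a firewall enforces can be stated as an egress allow-list: a build slave may talk only to the zones it needs. The sketch below expresses that policy in code for illustration; the network ranges are made-up examples, and real enforcement belongs in the network or host firewall, not in application code.

```python
import ipaddress

# Hypothetical zones a build slave is allowed to reach.
ALLOWED_EGRESS = [
    ipaddress.ip_network("10.20.0.0/24"),  # assumed artifact repository zone
    ipaddress.ip_network("10.20.1.0/24"),  # assumed deployer zone
]


def egress_allowed(destination: str, allowed=ALLOWED_EGRESS) -> bool:
    """Return True only if the destination falls inside an allowed zone."""
    addr = ipaddress.ip_address(destination)
    return any(addr in net for net in allowed)
```

With this policy, a compromised job on a build slave cannot use the slave as a jump host into networks outside the listed zones.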
We all have complex build pipelines, but we need to limit what we pull into or expose to our build jobs. If you run integration tests or complex functional tests that require direct access to production services, they may expose production data or other sensitive materials to the build platform.
The recommendation is to use the build job as a trigger to initiate complex tests outside the build environment, on a separate environment that you control, and to track the test status.
So far we have covered the main points; in practice, however, we still need to do more to protect our platform.
We have a few more, mostly OpSec-related, which are equally important.
We use SSH widely, but 99% of the time we use unrestricted shells. Using unrestricted shells with sudo privileges is very dangerous, because an unrestricted shell like bash allows the headless user to execute arbitrary commands on connected hosts. The recommendation is to use a shell that allows only whitelisted commands to execute. One example is a Chef deploy. To deploy your application using Chef, one common pattern is to SSH to the application hosts and run chef-client, which converges the host with the Chef server. Just to run chef-client on a target host, why do we need an unrestricted shell?
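A whitelisting shell for the Chef example can be very small. The sketch below assumes it is installed as an OpenSSH forced command (via `command="..."` in `authorized_keys` or `ForceCommand` in `sshd_config`), in which case sshd places the command the client asked for in the `SSH_ORIGINAL_COMMAND` environment variable. The allow-list contents and function names are hypothetical.

```python
import os
import shlex

# Hypothetical allow-list: the exact argv lists the deploy user may run.
ALLOWED_COMMANDS = {
    ("chef-client",),
}


def authorize(original_command: str) -> list:
    """Return the argv if the requested command is allow-listed, else raise."""
    argv = shlex.split(original_command)
    if tuple(argv) not in ALLOWED_COMMANDS:
        raise PermissionError("command not permitted: %r" % original_command)
    return argv


def main() -> None:
    # Set by sshd when a forced command is configured for the key or user.
    requested = os.environ.get("SSH_ORIGINAL_COMMAND", "")
    argv = authorize(requested)
    # Replace this process with the command, without going through a shell,
    # so metacharacters such as ';' or '|' are never interpreted.
    os.execvp(argv[0], argv)
```

The script entry point would simply call `main()`. An interactive login (empty command) or a request such as `chef-client; id` fails the exact-match check and is rejected.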
Trust anchor keys should be rolled periodically. Although we focus on using ephemeral keys, we still need long-lived keys in a few places, and we need to roll those keys periodically.
The job console log, for example what we see in Jenkins, may contain sensitive info depending on the job. Keeping it public may leak this info.
Enable 2FA. The admins must follow good security hygiene and use 2FA to access production hosts.
Prune admin lists
Patch management is also an important part of recovering from a public disclosure.
I would take this opportunity to acknowledge and appreciate their contributions and support
The summary of recommendations.
Supply chain is one of the main risk components
Business risk
Over 30% of Official Images in Docker Hub Contain High Priority Security Vulnerabilities
https://banyanops.com/blog/analyzing-docker-hub/
Tech debt and security
http://devops.com/2015/05/01/security-devops-and-the-shift-to-a-software-supply-chain/