Running large numbers of containers to deploy an application requires a rethink of the role of the operating system. Google’s Container-Optimized OS and AWS’s Bottlerocket take the traditional virtualization paradigm and apply it to the operating system, with containers as the virtual OS and a minimal Linux fulfilling the role of the hypervisor.
Various flavors of Linux optimized for containers have been around for a few years and have evolved ever smaller footprints as the management and user-land utilities moved to the cluster management layer or to containers. These container-optimized operating systems are ideal when you need to run applications in Kubernetes with minimal setup and do not want to worry about security or updates, or want OS support from your cloud provider.
Container OSs solve several issues commonly encountered when running large container clusters, such as keeping up with OS vulnerabilities and patching potentially hundreds of instances, updating packages while dealing with potentially conflicting dependencies, degraded performance from a large dependency tree, and other OS headaches. The job is challenging enough with a few racks of servers and nearly impossible without infrastructure support when managing thousands.
Bottlerocket is purpose-built for hosting containers in Amazon infrastructure. It runs natively in Amazon Elastic Kubernetes Service (EKS), AWS Fargate, and Amazon Elastic Container Service (ECS).
Bottlerocket is essentially a Linux 5.4 kernel with just enough added from the user-land utilities to run containerd. Written primarily in Rust, Bottlerocket is optimized for running both Docker and Open Container Initiative (OCI) images. There’s nothing that limits Bottlerocket to EKS, Fargate, ECS, or even AWS. Bottlerocket is a self-contained container OS and will be familiar to anyone using Red Hat flavors of Linux.
Bottlerocket integrates with container orchestrators such as Amazon EKS to manage and orchestrate updates. Support for other orchestrators can be added by building variants of the operating system that include the necessary orchestration agents or custom components.
Bottlerocket’s approach to security is to minimize the attack surface to protect against outside attackers, minimize the impact that a vulnerability would have on the system, and provide inter-container isolation. To isolate containers running on the system, Bottlerocket uses control groups (cgroups) and kernel namespaces. eBPF (extended Berkeley Packet Filter) is used to further isolate containers and to verify container code that requires low-level system access. The eBPF secure mode prohibits pointer arithmetic, traces I/O, and restricts the kernel functions the container has access to.
The attack surface is reduced by running all services in containers. While a container might be compromised, it’s less likely the entire system will be breached, due to container isolation. When running the Amazon-supplied edition of Bottlerocket, updates are applied automatically via a Kubernetes operator that comes installed with the OS.
The root filesystem is immutable: Bottlerocket creates a hash of the root filesystem blocks and relies on a verified boot path using dm-verity to ensure that the system binaries haven’t been tampered with. The configuration is stateless and /etc/ is mounted on a RAM disk. When running on AWS, configuration is accomplished with the API and these settings are persisted across reboots, as they come from file templates within the AWS infrastructure. You can also configure network and storage using custom containers that implement the CNI and CSI specifications and deploy them along with other daemons via the Kubernetes controllers.
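To make the block-hashing idea concrete, here is a minimal Python sketch of how a tree of block hashes can detect tampering with a read-only image. It is an illustration only, not dm-verity's actual on-disk format: the 4,096-byte block size, the binary tree layout, the use of SHA-256 without a salt, and the function names split_blocks, root_hash, and verify are all assumptions made for this example.

```python
import hashlib
import hmac

BLOCK_SIZE = 4096  # assumed block size, chosen for illustration


def split_blocks(image, block_size=BLOCK_SIZE):
    """Split an image into fixed-size blocks, zero-padding the last one."""
    blocks = []
    for offset in range(0, len(image), block_size):
        block = image[offset:offset + block_size]
        blocks.append(block.ljust(block_size, b"\0"))
    return blocks


def root_hash(image, block_size=BLOCK_SIZE):
    """Hash every block, then hash pairs of hashes upward to a single root."""
    level = [hashlib.sha256(b).digest() for b in split_blocks(image, block_size)]
    if not level:
        # An empty image still gets a well-defined root.
        level = [hashlib.sha256(b"").digest()]
    while len(level) > 1:
        next_level = []
        for i in range(0, len(level), 2):
            pair = level[i:i + 2]
            if len(pair) == 2:
                next_level.append(hashlib.sha256(pair[0] + pair[1]).digest())
            else:
                # An odd node is carried up to the next level unchanged.
                next_level.append(pair[0])
        level = next_level
    return level[0]


def verify(image, trusted_root, block_size=BLOCK_SIZE):
    """Return True only if the image still hashes to the trusted root."""
    return hmac.compare_digest(root_hash(image, block_size), trusted_root)


if __name__ == "__main__":
    image = bytes(range(256)) * 64           # four blocks of sample data
    trusted = root_hash(image)               # recorded when the image is built
    tampered = bytearray(image)
    tampered[5000] ^= 0x01                   # flip one bit in the second block
    print(verify(image, trusted))            # untouched image
    print(verify(bytes(tampered), trusted))  # modified image
```

Because every block feeds into the root, changing a single bit anywhere in the image changes the root hash. A verified boot path only has to trust that one value to detect a modified system binary.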
SELinux is enabled by default, with no way to disable it. Normally that might be a problem, but in the container OS use case relaxing this requirement isn’t necessary. The goal is to prevent modification of settings or containers by other OS components or containers. This security feature is a work in progress.
Bottlerocket open source
The Bottlerocket build system is based on Rust, which is fine considering there’s nothing to build except for support for Docker and Kubernetes. Rust just broke into the top 20 programming languages and seems to be gaining traction due to its C++-like syntax and automatic memory management. Rust is licensed under the MIT or Apache 2 license.
Amazon does a good job of leveraging GitHub as its development platform, making it easy for developers to get involved. The toolchain and code workflow will be familiar to any developer, and by design end users are encouraged to create variants of the OS, which is how support for multiple orchestration agents is provided. In order to keep the OS footprint as small as possible, each Bottlerocket variant runs on a specific orchestration plane. Amazon includes variants for Kubernetes and local development builds. You could, for example, create your own update operator or your own control container by changing the URL of the container.
Managing Bottlerocket instances
Bottlerocket isn’t intended to be managed with a shell. Indeed, there is little of the OS that requires management, and what is required is accomplished by the HTTP API, the command-line client (eksctl), or the web console.
To update, you need to deploy an update container onto the instance. See the bottlerocket-update-operator (a Kubernetes operator) on GitHub. Bottlerocket accomplishes single-step updates using the “two partition pattern,” where the image has two bootable partitions on disk. Once an update has been successfully written to the inactive partition, the priority bits in the GUID partition table of each partition are swapped and the roles of the “active” and “inactive” partitions are reversed. Upon reboot, the system is upgraded, or, in the event of an error, rolled back to the last known-good image.
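The two partition pattern can be modeled in a few lines of Python. This is a toy simulation under stated assumptions, not Bottlerocket's updater: the Partition and Disk classes, the integer priority field standing in for the GPT priority bits, and the version strings are all hypothetical names chosen for this sketch.

```python
from dataclasses import dataclass


@dataclass
class Partition:
    name: str
    version: str
    priority: int  # stand-in for the GPT priority bits; higher boots first


@dataclass
class Disk:
    a: Partition
    b: Partition

    def active(self):
        """The partition with the higher priority is the one that boots."""
        return self.a if self.a.priority > self.b.priority else self.b

    def inactive(self):
        return self.b if self.active() is self.a else self.a

    def _swap_priorities(self):
        self.a.priority, self.b.priority = self.b.priority, self.a.priority

    def stage_update(self, version, write_ok=True):
        """Write the new image to the inactive partition, then swap roles."""
        if not write_ok:
            # A failed write never touches the running partition.
            return False
        self.inactive().version = version
        self._swap_priorities()
        return True

    def boot(self, boots_ok=True):
        """Boot the active partition; on failure, fall back to the other one."""
        if not boots_ok:
            self._swap_priorities()  # roll back to the last known-good image
        return self.active().version


if __name__ == "__main__":
    disk = Disk(Partition("A", "1.0.0", 2), Partition("B", "0.9.0", 1))
    disk.stage_update("1.1.0")
    print(disk.boot())                # boots the freshly written image
    disk.stage_update("1.2.0")
    print(disk.boot(boots_ok=False))  # failed boot falls back to the old image
```

The running image is never modified in place, so a rollback amounts to swapping the two priority values back.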
There are no packages that can be installed, only containers, and updates are image based, as in NanoBSD and other embedded operating systems. The reason behind this decision was explained by Jeff Barr, AWS evangelist:
Instead of a package update system, Bottlerocket uses a simple, image-based model that allows for a rapid and complete rollback if necessary. This removes opportunities for conflicts and breakage, and makes it easier for you to apply fleet-wide updates with confidence using orchestrators such as EKS.
To access a Bottlerocket instance directly you run a “control” container, which is managed by a separate instance of containerd. This container runs the AWS SSM agent so you can execute remote commands or start a shell on one or more instances. The control container is enabled by default.
There is also an administrative container that runs on the internal control plane of the instance (i.e., on a separate containerd instance). Once enabled, this admin container runs an SSH server that allows you to log in as ec2-user using your Amazon-registered SSH key. While this is useful for debugging, it is not really suitable for making configuration changes due to the security policies of these instances.