Kubernetes: debugging with ephemeral containers

Anyone who has ever had to work with Kubernetes has been confronted with pod errors. The methods provided for this purpose are effective and resolve the most common errors. However, in some situations these methods reach their limits, and debugging becomes delicate. At KubeCon 2022 in Valencia, organized by the Cloud Native Computing Foundation, I attended Aaron Alpar's presentation about a new way to debug pods in Kubernetes, in beta as of version 1.23: kubectl debug.

First, we'll review the classic methods for debugging pods. Then we will look at the concept of namespaces. Finally, we will define what ephemeral containers are.

How to debug a pod?

Until now, after consulting the logs of a pod with kubectl logs, two solutions were available to dig deeper: exec and copy.

The first is in the form of:

kubectl exec         \
  -it                \
  -n <namespace_pod> \
  <pod>              \
  -c <container>     \
  -- /bin/sh

This command opens a shell in the target container. The scope of the commands the user can issue then depends on the privileges the shell was started with. If your privileges are elevated, you'll be able to do almost anything in your container… as long as it knows how to do it. Containers are designed to be lightweight: each contains only its application and its dependencies. The tools needed for effective troubleshooting are therefore often unavailable, simply because they are not installed. Listing the files in a directory with ls, searching for a specific file with find, or changing access rights on a file with chmod: all these actions are usually possible because the corresponding tools are part of the container's base system. On the other hand, a more advanced analysis of active network ports with netstat, or connection tests with curl, will most of the time not be feasible.
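
Before relying on a tool inside a container, it can help to check which ones the image actually ships. The loop below is a minimal sketch meant to be pasted into the shell opened by kubectl exec; the helper name has_tool is hypothetical and the tool list is only an example taken from the commands mentioned above.

```shell
# has_tool is a hypothetical helper name, not part of kubectl or Kubernetes.
has_tool() {
  # command -v is POSIX: it succeeds only if the name resolves to something runnable
  command -v "$1" >/dev/null 2>&1
}

# Report which of the tools mentioned above exist in the current environment
for tool in ls find chmod netstat curl; do
  if has_tool "$tool"; then
    echo "$tool: available"
  else
    echo "$tool: missing"
  fi
done
```

If the image does not even contain a shell, kubectl exec cannot open one and this check cannot run at all.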

The second command is in the following form:

kubectl debug                  \
  -it                          \
  -n <namespace_pod>           \
  <pod>                        \
  --copy-to=<pod_name>         \
  --container <container_name> \
  --image=busybox              \
  --share-processes            \
  -- /bin/sh

This command creates a new pod and restarts our application in a new container. A shell then opens in this new container. Being able to choose the image gives the new container the relevant troubleshooting tools. However, this method has two major disadvantages:

  • creating a new pod requires restarting the program
  • if the pod has replicas (for Deployments and StatefulSets), this method can be dangerous because new replicas can be created unintentionally.
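
Since the second risk concerns pods managed by a controller, one way to find out which case applies is to look at the pod's owner references before copying it. The function below is a sketch under that assumption; pod_owner_kinds is a hypothetical helper name, and the arguments are the same placeholders as in the commands above.

```shell
# pod_owner_kinds is a hypothetical helper: it prints the kinds of the
# controllers that own a pod (for example ReplicaSet or StatefulSet).
# Empty output means the pod has no owner and is standalone.
pod_owner_kinds() {
  namespace="$1"
  pod="$2"
  kubectl get pod -n "$namespace" "$pod" \
    -o jsonpath='{.metadata.ownerReferences[*].kind}'
}

# Usage (replace the placeholders with real values):
#   pod_owner_kinds <namespace_pod> <pod>
```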

Linux namespaces

What is a container? The idea that we have of a container is sometimes not quite in line with reality. A container is a kind of sandbox whose isolation depends on a key feature of the Linux kernel: namespaces.

A namespace groups all processes that have a common view of a shared resource (for example, all processes in a container). Namespaces control the isolation of the container and its processes and delimit its resources: they are what prevent it from looking outside itself to the rest of the system. There is a namespace for each property of an environment:

  • mnt: isolates mount points
  • pid: isolates process IDs
  • net: isolates network interfaces
  • ipc: isolates inter-process communication
  • uts: isolates host and domain names
  • user: isolates user identification and privileges
  • cgroup: isolates process membership to a control group

The pid namespace, for example, allows the container to have its own process IDs, since it has no knowledge of the host's PIDs. In the same way, the uts namespace allows the container to have its own hostname, independent of the host machine. A container can belong to several types of namespaces at once: for example, it can have its own mount points and its own network interfaces. Additionally, these namespaces can be shared between containers.

Namespaces are used by all processes running on a machine. The /proc/<pid>/ns/* folder contains one file per namespace type for a process, each pointing to the namespace currently used by that process. Namespaces used by containers have a parent-child relationship with those of the host: a parent namespace is aware of its children, while the reverse is not true. This can be checked with the nsenter command, which runs a command inside the namespaces of a given process (i.e. launched from a shell in a parent namespace):

nsenter          \
  --target <pid> \
  --all          \
  /bin/ps -ef

This command lists all processes belonging to the namespaces used by the specified process. By specifying the PID of a container process (i.e. a process using a child namespace), we get the list of processes running in this container, from the host's point of view. Below is an example of this command applied to a pod with a PostgreSQL container, run from its host node:

nsenter --target $(pgrep -o postgres) --all /bin/ps -ef

[Screenshot: nsenter output]

If we then perform the same action but this time with kubectl exec, we get the list of processes running in this container, this time from the point of view of the container itself. Below is an example from the same PostgreSQL pod:

kubectl exec -it -n pg pg-postgresql -- ps -ef

[Screenshot: kubectl exec output]

We notice that the two lists are identical: the host is therefore aware of its child namespaces, so we say that the namespaces are shared.
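
The namespace files under /proc described in this section can also be read directly. The sketch below prints, for a given PID, each namespace type and the identifier it points to; two processes are in the same namespace when these identifiers are equal. The helper name list_ns is hypothetical, and reading the entries of another user's process may require elevated privileges.

```shell
# list_ns is a hypothetical helper: it prints each namespace type of a process
# together with its identifier, read from the symlinks in /proc/<pid>/ns.
list_ns() {
  pid="$1"
  for link in /proc/"$pid"/ns/*; do
    # each entry is a symlink whose target looks like "pid:[<inode>]"
    printf '%s -> %s\n' "$(basename "$link")" "$(readlink "$link")"
  done
}

# Namespaces of the current shell
list_ns "$$"
```

Run on the host node against the PID of a container process, then against a host process, this shows which namespace types differ between the two.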

Ephemeral containers

An ephemeral container is a new container placed in the same pod as the target container. Since they are in the same pod, they share resources, which is ideal for tricky situations such as debugging a container that crashes immediately.

The command to create an ephemeral container is as follows:

kubectl debug          \
  -it                  \
  -n <namespace_pod>   \
  <pod>                \
  --image busybox      \
  --target <container> \
  -- /bin/sh

Once created, the ephemeral container appears in the pod description: two new entries are then found, one in the pod's spec (ephemeralContainers) and one in its status (ephemeralContainerStatuses).

[Screenshot: kubectl describe output]

It is then possible to list the active ephemeral containers with the following command:

kubectl get pod -n <namespace> <pod> -o json \
  | jq '{"ephemeralContainers": [(.spec.ephemeralContainers[].name)], "ephemeralContainerStatuses": [(.status.ephemeralContainerStatuses[].name)]}'

When we create an ephemeral container this way, we notice that two namespaces differ from those of the original container: cgroup and mnt. This means that the resources tied to all other namespaces are shared between the original container and the ephemeral one. These new containers combine the benefit of exec (working against the original resources, without a restart) with the benefit of copy (having the tools of your choice). In fact, a container generated with copy simply has different namespaces from the original one.
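
This observation can be checked from inside the ephemeral container by comparing the namespace identifiers of two processes. The sketch below assumes the container was created with --target, so that the processes of the original container are visible, and that the shell is allowed to read the /proc/<pid>/ns entries of those processes (this may require extra privileges). The helper name diff_ns is hypothetical.

```shell
# diff_ns is a hypothetical helper: it prints the namespace types whose
# identifiers differ between two processes, one per line.
diff_ns() {
  pid_a="$1"
  pid_b="$2"
  for link in /proc/"$pid_a"/ns/*; do
    ns="$(basename "$link")"
    id_a="$(readlink "$link")"
    id_b="$(readlink /proc/"$pid_b"/ns/"$ns")"
    if [ "$id_a" != "$id_b" ]; then
      echo "$ns"
    fi
  done
}

# Comparing a process with itself prints nothing
diff_ns "$$" "$$"
```

In the ephemeral container, the call would take the PID of a process of the original container as first argument and "$$" (the debug shell) as second.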

The mnt namespace is not shared because some important mount points should not be shared. However, if mount points identical to those of the original container are needed in your ephemeral container, it is still possible to mount them manually.
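
When the goal is only to read files of the original container, an alternative to mounting (a different technique from the manual mount mentioned above) is to go through /proc/<pid>/root, which exposes the root filesystem of a process. The sketch below assumes the process namespace is shared with the original container (--target) and that the debug shell has sufficient rights on the target process; the helper name target_root is hypothetical.

```shell
# target_root is a hypothetical helper: it prints the path under which the
# root filesystem of another process can be reached through /proc.
target_root() {
  echo "/proc/$1/root"
}

# With the current shell's own PID this is simply its own root filesystem;
# in an ephemeral container, pass the PID of a process of the original container.
ls "$(target_root "$$")/"
```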


This new Kubernetes feature standardizes a powerful and complete approach to pod troubleshooting, while covering new tricky cases. In addition, it encourages the adoption of so-called “distroless” containers: lighter containers that ship no debugging tools and are therefore faster to deploy. Debugging tools then become completely independent of production images, in line with cloud-native thinking.

