How Docker Works

Credit: LiveOverflow (YouTube), "How Docker Works - Intro to Namespaces"

Key Concepts

What Are Namespaces?

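  • A namespace is a Linux kernel feature that gives a process its own private view of a system resource (process IDs, network interfaces, mount points, and so on).
  • Docker uses namespaces to isolate the environment of each container.
  • This isolation makes containers feel like separate machines, but they are not full virtual machines (VMs): every container shares the host's kernel.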

When You Run a Docker Container

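  • Docker creates a set of namespaces for isolation, including:
    • pid: isolates process IDs
    • net: isolates network interfaces
    • mnt (mount): isolates filesystem mount points
    • uts: isolates hostname and domain name
    • ipc: isolates System V IPC and POSIX message queues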

Real Example

  • Inside container: user is www-data with UID 1000
  • On host: user is user with same UID 1000
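  • The kernel only tracks the numeric UID. The username comes from each environment's own /etc/passwd file, so the same UID resolves to a different name inside and outside the container.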
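
The UID-to-name lookup can be reproduced without Docker. The sketch below assumes two made-up passwd-format files standing in for the host and the container; the helper name name_for_uid and the file contents are illustrative, not taken from the video.

```shell
# name_for_uid is a hypothetical helper: look up the username for a UID in a
# passwd-format file (colon-separated fields, UID is field 3).
name_for_uid() {
    awk -F: -v uid="$1" '$3 == uid { print $1 }' "$2"
}

# Two stand-in passwd files, one per "environment" (made-up entries).
host_passwd=$(mktemp)
container_passwd=$(mktemp)
echo 'user:x:1000:1000::/home/user:/bin/bash' > "$host_passwd"
echo 'www-data:x:1000:1000::/var/www:/sbin/nologin' > "$container_passwd"

name_for_uid 1000 "$host_passwd"        # prints: user
name_for_uid 1000 "$container_passwd"   # prints: www-data

rm -f "$host_passwd" "$container_passwd"
```

The same numeric UID yields two names only because two different files are consulted.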

Processes View

  • Inside container: fewer visible processes (isolated view)
  • On host: all processes visible including container ones
  • Same process has different PIDs inside vs outside container (e.g., watch)
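
One rough way to compare the two views is to count the numeric entries under /proc, since each visible process has a directory there. This is only a sketch: count_visible_procs is a hypothetical helper, and <container> is a placeholder for the name of a running container.

```shell
# count_visible_procs is a hypothetical helper: count the processes visible
# through /proc, i.e. the directory names that consist only of digits.
count_visible_procs() {
    ls /proc | grep -c '^[0-9][0-9]*$'
}

# On the host this counts every process, including container processes.
count_visible_procs

# Inside a container the same pipeline only sees that container's processes:
#   docker exec <container> sh -c "ls /proc | grep -c '^[0-9][0-9]*\$'"
```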

Who Spawns Container Processes

  • systemd: the host's init process (PID 1), which starts system services
  • dockerd: Docker daemon started by systemd
  • containerd: manages container lifecycle
  • runc: spawns actual container processes
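
The parent chain of any process can be inspected by following the PPid field in /proc/<pid>/status. The helper below, print_ancestry, is a hypothetical name for this sketch; the exact chain you see for a container process depends on the Docker version (on recent versions the direct parent is typically a containerd-shim process, because runc exits once the container has started).

```shell
# print_ancestry is a hypothetical helper: starting at a PID, print
# "PID NAME" for the process and each of its ancestors, using the Name and
# PPid fields of /proc/<pid>/status.
print_ancestry() {
    pid=$1
    while [ "$pid" -gt 0 ] 2>/dev/null && [ -r "/proc/$pid/status" ]; do
        name=$(awk '/^Name:/ { print $2 }' "/proc/$pid/status")
        echo "$pid $name"
        pid=$(awk '/^PPid:/ { print $2 }' "/proc/$pid/status")
    done
}

# Ancestry of the current shell.
print_ancestry $$

# For a container's main process, run on the host:
#   print_ancestry "$(docker inspect --format '{{.State.Pid}}' <container>)"
```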

What is runc

  • A CLI tool that creates and runs containers according to the OCI runtime specification
  • Directly responsible for setting up namespaces using Linux syscalls

Behind The Scenes: Using strace

  • Use strace -f -p <pid> to trace syscalls made by containerd
  • Observe the creation and management of namespaces
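
To keep the strace output readable, it can help to restrict it to a few syscalls. The wrapper below is a sketch under assumptions: trace_ns_syscalls is a hypothetical name, the syscall list is one reasonable choice rather than the one used in the video, and strace and sudo are assumed to be installed.

```shell
# trace_ns_syscalls is a hypothetical wrapper: attach strace to a PID and
# show only syscalls that are relevant to namespace setup.
trace_ns_syscalls() {
    # Reject anything that is not a plain decimal PID.
    case "$1" in
        ''|*[!0-9]*) echo "usage: trace_ns_syscalls <pid>" >&2; return 1 ;;
    esac
    # -f follows child processes; -e trace= limits output to the listed syscalls.
    sudo strace -f -e trace=unshare,clone,setns,execve -p "$1"
}

# Example (assumes a process named exactly "containerd" is running):
#   trace_ns_syscalls "$(pgrep -o -x containerd)"
# Then start a container from a second terminal and watch the output.
```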

Key Syscall: unshare()

  • Gives the calling process new namespaces for the resource types named by its flags
  • Example: CLONE_NEWPID creates a new PID namespace
  • The caller itself keeps its existing PID; its first child becomes PID 1 in the new namespace
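
The util-linux unshare command wraps this syscall, so the PID 1 behaviour can be tried from a shell. This is a sketch, not what runc does internally: pid_in_new_ns is a hypothetical helper, and the --user --map-root-user flags are added here only so that it can run without sudo on systems that allow unprivileged user namespaces.

```shell
# pid_in_new_ns is a hypothetical helper: start a shell in a fresh PID
# namespace and print the PID that this shell sees for itself.
# --pid   requests a new PID namespace (CLONE_NEWPID)
# --fork  makes unshare fork, so the child is the first process in it
pid_in_new_ns() {
    unshare --user --map-root-user --pid --fork sh -c 'echo $$' 2>/dev/null \
        || echo "unshare not permitted on this system"
}

pid_in_new_ns   # prints 1 where unprivileged user namespaces are allowed
```

With sudo, the --user --map-root-user flags can be dropped.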

Process ID Flow

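  • runc calls unshare() with CLONE_NEWPID, which creates the new PID namespace
  • It then calls clone() to create a child, which is the first process in that namespace
  • That child gets PID 1 inside the container
  • The host sees the same process under a different PID (e.g., 29866)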
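
On reasonably recent kernels, both PIDs of a process are listed in the NSpid field of /proc/<pid>/status. The helper name ns_pids is an assumption made for this sketch.

```shell
# ns_pids is a hypothetical helper: print the NSpid field of a process,
# which lists its PID in each PID namespace it belongs to (outermost first).
ns_pids() {
    sed -n 's/^NSpid:[[:space:]]*//p' "/proc/$1/status"
}

# For a normal host process this is a single number.
ns_pids $$

# For a container's main process, run on the host:
#   ns_pids "$(docker inspect --format '{{.State.Pid}}' <container>)"
```

For a container process viewed from the host, the output would contain two numbers, for instance the host PID 29866 followed by 1.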

Mount and Network Namespaces

  • CLONE_NEWNS: isolates mount points (filesystem)
  • CLONE_NEWNET: isolates network stack
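
The effect of a new network namespace can be seen by listing interfaces from inside one. As a sketch, the hypothetical helper ifaces_in_new_netns reads /proc/net/dev, which always reflects the network namespace of the reading process; the --user --map-root-user flags are again only there to avoid sudo.

```shell
# ifaces_in_new_netns is a hypothetical helper: create a new network
# namespace (CLONE_NEWNET) and list the interface names visible inside it.
# /proc/net/dev has two header lines, then one "name: counters..." line
# per interface.
ifaces_in_new_netns() {
    unshare --user --map-root-user --net \
        awk -F: 'NR > 2 { gsub(/ /, "", $1); print $1 }' /proc/net/dev 2>/dev/null \
        || echo "unshare not permitted on this system"
}

ifaces_in_new_netns   # prints only lo where permitted
```

A fresh network namespace contains just its own loopback interface; none of the host's interfaces are visible.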

Checking Namespaces

Use:

readlink -f /proc/<pid>/ns/*
  • Shows all namespace identifiers for a given process
  • Compare host and container processes to confirm isolation
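
The comparison can also be scripted. The helper below, same_ns, is a hypothetical name for this sketch; it compares the namespace link of one type for two PIDs.

```shell
# same_ns is a hypothetical helper: succeed (exit 0) if two PIDs are in the
# same namespace of the given type (pid, net, mnt, uts, ipc, user, ...),
# exit 1 if they differ, exit 2 if a link cannot be read.
same_ns() {
    a=$(readlink "/proc/$1/ns/$3") || return 2
    b=$(readlink "/proc/$2/ns/$3") || return 2
    [ "$a" = "$b" ]
}

# A process trivially shares every namespace with itself.
if same_ns $$ $$ pid; then echo "same pid namespace"; fi

# Host shell vs. a container's main process (run as root on the host):
#   same_ns $$ <container-pid> pid || echo "different pid namespace"
```

Reading the namespace links of a process owned by another user, such as a container process running as root, generally requires root on the host.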

User Namespace

  • CLONE_NEWUSER allows UID/GID remapping
  • A process can be root (UID 0) inside but remain unprivileged outside
  • In this example, UID mapping was not used, so UID 1000 was the same inside and outside the container
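
The mapping of a process is visible in /proc/<pid>/uid_map, and the unshare command can create a user namespace in which the caller appears as root. Both helper names below are hypothetical, and the second one only works where unprivileged user namespaces are enabled.

```shell
# show_uid_map is a hypothetical helper: print the UID mapping of a process.
# Each line reads: <first UID inside> <first UID outside> <range length>.
show_uid_map() {
    cat "/proc/$1/uid_map"
}

# uid_in_new_userns is a hypothetical helper: create a user namespace
# (CLONE_NEWUSER), map the current user to root inside it, and print the
# UID seen from inside.
uid_in_new_userns() {
    unshare --user --map-root-user id -u 2>/dev/null \
        || echo "unshare not permitted on this system"
}

show_uid_map $$      # without remapping, inside and outside UIDs are identical
uid_in_new_userns    # prints 0 where permitted, although the caller is unprivileged
```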

Lab:

  • Goal: show that Docker uses Linux namespaces to isolate containers, and that a container shares the host's kernel while running in a separate environment.

If Docker reports a permission error on its socket, chmod 666 is a quick workaround (not recommended for production, but fine for this lab).

# Fix permission only if required.
sudo chmod 666 /var/run/docker.sock

# Start container with a background sleep process
echo "[+] Starting test container..."
docker run -dit --name test-ns alpine sleep 10000

# Install procps, then start a detached 'watch' process with TERM fix
echo "[+] Installing procps and starting 'watch' in background inside container..."
docker exec test-ns apk add --no-cache procps
docker exec -d -e TERM=xterm test-ns sh -c "watch 'ps aux' > /dev/null"

# Get PID of 'watch' inside the container
echo "[+] Getting PID of 'watch' inside the container:"
docker exec test-ns pgrep watch

# Get container's main PID as seen by the host
echo "[+] Getting container's main PID on the host:"
docker inspect --format '{{.State.Pid}}' test-ns

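# Show namespaces of container process from host
# (root is needed because the container process belongs to another user)
echo "[+] Showing namespace IDs of container process from host:"
CONTAINER_PID=$(docker inspect --format '{{.State.Pid}}' test-ns)
sudo sh -c "readlink -f /proc/$CONTAINER_PID/ns/*"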

# Show namespaces of current shell on host
echo "[+] Showing namespace IDs of current host shell:"
readlink -f /proc/$$/ns/*

Explanation

  • Start a container with sleep 10000
  • Install procps and run watch inside container
  • Get the PID of watch inside container
  • Get the main PID of the container from the host
  • Check namespaces of container process on the host
  • Check namespaces of your current shell (host)
  • Compare both outputs

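You should see different IDs for namespaces such as pid, net, mnt, uts, and ipc, which shows that the container is isolated by namespaces. In a default Docker setup the user namespace ID matches the host's, because no UID remapping is in use.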

Reference

  • https://www.youtube.com/watch?v=-YnMr1lj4Z8