How Docker Works
Credit: LiveOverflow (YouTube), "How Docker Works - Intro to Namespaces"
Key Concepts
What Are Namespaces?
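- A namespace is a Linux kernel feature that gives a process its own "private" view of a system resource, such as the process table, the network stack, or the mount table.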
- Docker uses namespaces to isolate environments for each container.
- This isolation makes containers feel like separate machines, but they are not full virtual machines (VMs).
When You Run a Docker Container
- Docker creates a set of namespaces for isolation, including:
  - pid: isolates processes
  - net: isolates network interfaces
  - mnt (mount): isolates filesystem mount points
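To see which namespaces a process belongs to, you can read the symbolic links under /proc/<pid>/ns. The following is a minimal sketch; the function name list_namespaces is only illustrative, and it assumes a Linux system with /proc mounted.

```shell
# Print "name -> identifier" for each namespace the given PID belongs to.
# Defaults to the current shell; other users' processes usually need sudo.
list_namespaces() {
  ns_pid="${1:-$$}"
  for link in /proc/"$ns_pid"/ns/*; do
    printf '%s -> %s\n' "${link##*/}" "$(readlink "$link")"
  done
}

list_namespaces
```

Each line has the form name -> name:[number]. Two processes that show the same number for a given name share that namespace.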
Real Example
- Inside container: user is www-data with UID 1000
- On host: user is user with the same UID 1000
- The same UID maps to different usernames because each environment has its own /etc/passwd file
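The UID-to-name lookup can be illustrated without a container. The sketch below builds two small passwd-format files that stand in for the host and the container; the file contents, the helper name name_for_uid, and the home directories are made up for illustration.

```shell
# Look up the username for a UID in a passwd-format file (name:pw:uid:gid:...).
name_for_uid() {
  awk -F: -v uid="$2" '$3 == uid { print $1; exit }' "$1"
}

# Two hypothetical passwd files standing in for the host and the container.
workdir=$(mktemp -d)
printf 'root:x:0:0:root:/root:/bin/sh\nuser:x:1000:1000::/home/user:/bin/bash\n' > "$workdir/host_passwd"
printf 'root:x:0:0:root:/root:/bin/sh\nwww-data:x:1000:1000::/var/www:/sbin/nologin\n' > "$workdir/container_passwd"

name_for_uid "$workdir/host_passwd" 1000        # prints: user
name_for_uid "$workdir/container_passwd" 1000   # prints: www-data

rm -rf "$workdir"
```

The kernel only tracks the numeric UID. The name is whatever the local passwd file says, which is why tools like ps show different usernames for the same process on the host and in the container.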
Processes View
- Inside container: fewer visible processes (isolated view)
- On host: all processes visible including container ones
- The same process has different PIDs inside and outside the container (e.g., watch)
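One way to see both PIDs of a process at once is the NSpid field in /proc/<pid>/status, which lists the PID in each nested PID namespace, outermost first. This is a sketch that assumes a kernel exposing NSpid (Linux 4.1 or later); the function name nspid_of is illustrative.

```shell
# Print the NSpid line of a process: its PID in every PID namespace it
# belongs to, starting with the namespace of the /proc we are reading.
nspid_of() {
  grep '^NSpid:' /proc/"${1:-$$}"/status
}

nspid_of
```

For a host process the line holds a single number. Run from the host against a container process, it holds two, for example the host PID followed by the PID seen inside the container.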
Who Spawns Container Processes
- systemd: starts the Linux system
- dockerd: the Docker daemon, started by systemd
- containerd: manages the container lifecycle
- runc: spawns the actual container processes
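The parent chain of any process can be walked through the PPid field in /proc. The sketch below is one way to do that; the function name ancestry is illustrative. Note that on many installations the direct parent of a running container process shows up as a containerd-shim process rather than runc, since runc typically exits once the container has started.

```shell
# Print the chain of parents for a PID, one "pid name" pair per line,
# following PPid links until PID 1 (or a parent outside our view) is reached.
ancestry() {
  walk_pid="${1:-$$}"
  while [ "$walk_pid" -gt 0 ] 2>/dev/null && [ -r "/proc/$walk_pid/status" ]; do
    printf '%s %s\n' "$walk_pid" "$(awk '/^Name:/ { print $2; exit }' "/proc/$walk_pid/status")"
    walk_pid=$(awk '/^PPid:/ { print $2; exit }' "/proc/$walk_pid/status")
  done
}

ancestry
```

On the host, passing the container's main PID (for example from docker inspect --format '{{.State.Pid}}' <name>) shows which daemons sit above it.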
What is runc
- A CLI tool that follows OCI specs
- Directly responsible for setting up namespaces using Linux syscalls
Behind The Scenes: Using strace
- Use strace -f -p <pid> to trace syscalls made by containerd
- Observe the creation and management of namespaces
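The trace is easier to read when strace is limited to the namespace-related syscalls. The wrapper below is a sketch, assuming strace is installed and you have root privileges; the function name trace_ns_syscalls and the choice of syscalls to filter are my own.

```shell
# Attach strace to a running process (following children with -f) and
# show only namespace-related syscalls. Stop it with Ctrl-C.
trace_ns_syscalls() {
  sudo strace -f -e trace=unshare,clone,setns -p "$1"
}

# Example (assumes a process named exactly "containerd" is running):
# trace_ns_syscalls "$(pgrep -o -x containerd)"
```

Start the trace first, then launch a container from another terminal so the calls made during container creation are captured.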
Key Syscall: unshare()
- Used to isolate parts of a process environment
- Example: CLONE_NEWPID isolates the process ID namespace
- The first child becomes PID 1 inside the container
Process ID Flow
- runc calls unshare()
- It then uses clone() to create a child process, which is the first process in the new PID namespace
- The new process gets PID 1 inside the container
- Host sees different PID (e.g., 29866), container sees PID 1
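The same effect can be reproduced from the command line with the unshare utility, which wraps the unshare() syscall. This is a sketch, not what runc does internally: it assumes util-linux unshare and that unprivileged user namespaces are enabled (otherwise run it with sudo and drop the --user --map-root-user flags).

```shell
# Run a shell as the first process of a new PID namespace and print its PID.
# --fork makes unshare fork so the child, not unshare itself, enters the
# new namespace; --mount-proc gives it a matching /proc.
pid_in_new_namespace() {
  unshare --user --map-root-user --fork --pid --mount-proc sh -c 'echo $$'
}

pid_in_new_namespace || echo "unshare failed: user namespaces may be disabled here"
```

When it succeeds, the shell reports PID 1, even though the host sees the same process under an ordinary, much larger PID.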
Mount and Network Namespaces
- CLONE_NEWNS: isolates mount points (filesystem)
- CLONE_NEWNET: isolates the network stack
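The network side is easy to observe: a process in a fresh network namespace no longer sees the host's interfaces. The sketch below assumes util-linux unshare and enabled unprivileged user namespaces; the function name interfaces_in_new_netns is illustrative.

```shell
# List the interface names visible inside a brand-new network namespace.
# /proc/net/dev reflects the network namespace of the process reading it;
# the first two lines are headers, so they are skipped.
interfaces_in_new_netns() {
  unshare --user --map-root-user --net cat /proc/net/dev |
    awk -F: 'NR > 2 { gsub(/ /, "", $1); print $1 }'
}

interfaces_in_new_netns
```

The result is typically just the loopback interface lo (some kernels also add fallback tunnel devices), whereas the same awk command run directly on the host's /proc/net/dev lists every host interface.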
Checking Namespaces
Use:
readlink -f /proc/<pid>/ns/*
- Shows all namespace identifiers for a given process
- Compare host and container processes to confirm isolation
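Comparing the identifiers by eye gets tedious, so the comparison can be scripted. This is a minimal sketch; the function name compare_namespaces is illustrative, and reading the namespace links of another user's process usually requires sudo.

```shell
# Compare the namespaces of two PIDs and report, per namespace type,
# whether both processes point at the same namespace identifier.
compare_namespaces() {
  for link in /proc/"$1"/ns/*; do
    ns_name=${link##*/}
    if [ "$(readlink "$link")" = "$(readlink "/proc/$2/ns/$ns_name")" ]; then
      printf '%-18s same\n' "$ns_name"
    else
      printf '%-18s different\n' "$ns_name"
    fi
  done
}

# Example: compare the current shell with a container's main process.
# compare_namespaces "$$" "$(docker inspect --format '{{.State.Pid}}' <name>)"
```

Namespaces reported as "different" are the ones that isolate the container from your shell.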
User Namespace
- CLONE_NEWUSER allows UID/GID remapping
- A process can be root (UID 0) inside but remain unprivileged outside
- In this example, UID mapping was not used, so UID 1000 was the same inside and outside
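Whether a process has remapped UIDs can be checked in /proc/<pid>/uid_map. The following is a small sketch; the function name uid_map_of is illustrative.

```shell
# Show the UID mapping of the user namespace a process lives in.
# Columns: first UID inside the namespace, first UID outside, range length.
uid_map_of() {
  cat /proc/"${1:-$$}"/uid_map
}

uid_map_of
```

A single line reading 0 0 4294967295 is the identity mapping, meaning no remapping is in effect. That matches the example here, where UID 1000 is the same on both sides.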
Lab:
- Goal: show that Docker uses Linux namespaces to isolate containers, and that a container shares the host's kernel while running in a separate environment.
If Docker commands fail with a permission error on the Docker socket, chmod 666 on /var/run/docker.sock works around it. This is not recommended for production, but is acceptable for this lab.
# Fix permission only if required.
sudo chmod 666 /var/run/docker.sock
# Start container with a background sleep process
echo "[+] Starting test container..."
docker run -dit --name test-ns alpine sleep 10000
# Install procps, then start 'watch' detached inside the container (TERM is set so watch can initialize)
echo "[+] Installing procps and starting 'watch' in background inside container..."
docker exec test-ns apk add --no-cache procps
docker exec -d test-ns sh -c "TERM=xterm watch 'ps aux' > /dev/null 2>&1"
sleep 1  # give watch a moment to start before looking it up
# Get PID of 'watch' inside the container
echo "[+] Getting PID of 'watch' inside the container:"
docker exec test-ns pgrep watch
# Get container's main PID as seen by the host
echo "[+] Getting container's main PID on the host:"
docker inspect --format '{{.State.Pid}}' test-ns
# Show namespaces of container process from host
echo "[+] Showing namespace IDs of container process from host:"
readlink -f /proc/$(docker inspect --format '{{.State.Pid}}' test-ns)/ns/*
# Show namespaces of current shell on host
echo "[+] Showing namespace IDs of current host shell:"
readlink -f /proc/$$/ns/*
Explanation
- Start a container with sleep 10000
- Install procps and run watch inside container
- Get the PID of watch inside container
- Get the main PID of the container from the host
- Check namespaces of container process on the host
- Check namespaces of your current shell (host)
- Compare both outputs
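You'll see different IDs for namespaces such as pid, net, and mnt, which shows that the container is isolated using namespaces. The user namespace ID is typically the same on both sides, because UID mapping is not used in this lab.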
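Once the comparison is done, the lab container can be removed. This is a sketch of a cleanup step that is not part of the original lab; the function name cleanup_lab is illustrative, and the socket mode in the final comment assumes the common default of 660.

```shell
# Remove the lab container if Docker is available; safe to run repeatedly.
cleanup_lab() {
  if command -v docker >/dev/null 2>&1; then
    docker rm -f test-ns >/dev/null 2>&1 || true
  fi
  echo "[+] Lab container removed (if it existed)"
}

cleanup_lab

# If you ran chmod 666 earlier, consider restoring the socket permissions
# (assumption: 660 is the default on your system):
# sudo chmod 660 /var/run/docker.sock
```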
Reference
- https://www.youtube.com/watch?v=-YnMr1lj4Z8