Overview
It’s time to give back to the open-source community again. This time, our infra specialist, Dan, has created a clever way to solve Kubernetes GPU monitoring. In this article, we discuss what the solution is, how it works, and why it’s important for the open-source community.
GitHub page: https://github.com/ConfidentialMind/cm-purplepill
Current Problems with Kubernetes GPU Monitoring
- Standard Kubernetes GPU monitoring tools only track pods with explicit nvidia.com/gpu resource requests.
- Many applications utilize GPUs without declaring these resource requests.
- There are significant monitoring gaps where GPU usage is effectively “invisible.”
The Solution: CM PurplePill
CM PurplePill is a lightweight Prometheus exporter for NVIDIA GPU metrics in Kubernetes that tracks pod-level GPU usage without requiring explicit GPU resource declarations.
This matters because tools like NVIDIA DCGM can’t map GPU usage to pods without explicit nvidia.com/gpu declarations. As a result, you can’t reliably use features such as vLLM parallel GPU sharing, or assign less than 100% of a GPU to a pod, and still observe correct per-pod usage. With CM PurplePill, you are free to use any combination of parallelism parameters while still monitoring actual per-pod GPU usage. CM PurplePill is also much lighter than NVIDIA DCGM and relies on fewer components.
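As a rough illustration of how per-pod attribution can work without resource declarations, the sketch below assumes the exporter reads per-process GPU memory from `nvidia-smi --query-compute-apps=pid,used_memory --format=csv,noheader,nounits` and resolves each PID to a pod UID through the contents of `/proc/<pid>/cgroup`. This is not taken from the CM PurplePill source; the function names are hypothetical, and resolving a pod UID to a pod name is left out.

```python
import re
from typing import Dict, Optional

# Pod UIDs appear in cgroup paths as "pod<uid>". The cgroupfs driver keeps the
# dashes of the UID; the systemd driver replaces them with underscores.
_POD_UID = re.compile(
    r"pod([0-9a-f]{8}(?:[-_][0-9a-f]{4}){3}[-_][0-9a-f]{12})"
)


def pod_uid_from_cgroup(cgroup_text: str) -> Optional[str]:
    """Return the pod UID found in the contents of /proc/<pid>/cgroup, if any."""
    match = _POD_UID.search(cgroup_text)
    return match.group(1).replace("_", "-") if match else None


def parse_compute_apps(csv_text: str) -> Dict[int, int]:
    """Parse "pid, used_memory" CSV rows into {pid: MiB}; skip unparsable rows."""
    usage: Dict[int, int] = {}
    for line in csv_text.splitlines():
        fields = [f.strip() for f in line.split(",")]
        try:
            pid, mib = int(fields[0]), int(fields[1])
        except (ValueError, IndexError):
            continue  # blank line or a non-numeric value such as "[N/A]"
        usage[pid] = usage.get(pid, 0) + mib
    return usage


def memory_by_pod(usage: Dict[int, int], cgroups: Dict[int, str]) -> Dict[str, int]:
    """Sum per-process GPU memory by pod UID; processes outside pods are dropped."""
    totals: Dict[str, int] = {}
    for pid, mib in usage.items():
        uid = pod_uid_from_cgroup(cgroups.get(pid, ""))
        if uid is not None:
            totals[uid] = totals.get(uid, 0) + mib
    return totals
```

In a real collector, `cgroups` would be filled by reading `/proc/<pid>/cgroup` for each PID that nvidia-smi reports, which only works when the collector can see host processes.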
Key Value
- Complete visibility into all GPU workloads, including those without resource declarations.
- Lightweight solution with a small operational footprint.
- Full control over the monitoring stack.
- Simple deployment as a DaemonSet on GPU-enabled nodes.
Note: The current release supports NVIDIA GPUs only. Support for AMD and other GPUs is planned for upcoming releases and should require only slight modifications.
Features
- Exposes GPU metrics in Prometheus format.
- Shows per-pod usage of GPUs.
- Does not rely on GPU resource declarations in the Pod manifest (*).
- Can show GPU usage by pods that claim less than a whole GPU (*).
- Not tied to a particular GPU vendor by design, although the current release supports NVIDIA only (*).
- Can run as a Kubernetes DaemonSet or as a host-level service.
- With slight modification, can monitor GPU usage in any containerized environment (not limited to Docker-like runtimes).
* Unlike the NVIDIA DCGM Prometheus metrics exporter.
Core Metrics (Explained)
- CM_PURPLEPILL_GPU_MEMORY_TOTAL_MIB: Total GPU memory.
- CM_PURPLEPILL_GPU_MEMORY_USED_TOTAL_MIB: Total used memory.
- CM_PURPLEPILL_GPU_MEMORY_FREE_MIB: Free memory.
- CM_PURPLEPILL_GPU_UTILIZATION: GPU utilization percentage.
- CM_PURPLEPILL_GPU_MEMORY_USED_POD_MIB: Pod-specific memory usage.
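To make the metric list concrete, here is a minimal sketch of how such values could be rendered in the Prometheus text exposition format. It assumes GPU-level values come from one CSV row of `nvidia-smi --query-gpu=index,memory.total,memory.used,memory.free,utilization.gpu --format=csv,noheader,nounits`. The label names (`gpu`, `pod`, `namespace`) and helper names are assumptions for illustration; the labels CM PurplePill actually emits may differ.

```python
from typing import Dict, List, Optional


def _escape(value: str) -> str:
    """Escape a label value as the Prometheus text format requires."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_metric(name: str, value: float,
                  labels: Optional[Dict[str, str]] = None) -> str:
    """Render one sample line, e.g. NAME{gpu="0"} 37."""
    if not labels:
        return f"{name} {value}"
    pairs = ",".join(f'{k}="{_escape(v)}"' for k, v in sorted(labels.items()))
    return f"{name}{{{pairs}}} {value}"


def render_gpu_row(csv_row: str) -> List[str]:
    """Turn 'index, total, used, free, utilization' into four sample lines."""
    index, total, used, free, util = [f.strip() for f in csv_row.split(",")]
    labels = {"gpu": index}  # hypothetical label name
    return [
        render_metric("CM_PURPLEPILL_GPU_MEMORY_TOTAL_MIB", int(total), labels),
        render_metric("CM_PURPLEPILL_GPU_MEMORY_USED_TOTAL_MIB", int(used), labels),
        render_metric("CM_PURPLEPILL_GPU_MEMORY_FREE_MIB", int(free), labels),
        render_metric("CM_PURPLEPILL_GPU_UTILIZATION", int(util), labels),
    ]
```

A per-pod sample would be rendered the same way, for example `render_metric("CM_PURPLEPILL_GPU_MEMORY_USED_POD_MIB", 2048, {"gpu": "0", "namespace": "default", "pod": "vllm-0"})`.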
Deployment Options
1) Kubernetes DaemonSet (Recommended)
All-in-one container deployment in Kubernetes. Runs on GPU nodes, selected with a node selector:
- Requires hostPID access to monitor processes.
- Works with Prometheus Operator’s ScrapeConfig.
- Uses standard NVIDIA software for Kubernetes hosts.
2) Direct Host Installation
Deployable as a systemd service or Docker container; installable via pip or from source.
Minimal dependencies
- Python 3.7+
- NVIDIA drivers with the nvidia-smi tool
- No external Python packages (standard library only)
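The standard-library-only constraint is easy to satisfy for an exporter of this kind. The sketch below shows one way to serve a `/metrics` endpoint with nothing but `http.server`; it is an illustration under that assumption, not the project's actual server code, and `make_server` and `collect` are hypothetical names.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Callable


def make_server(port: int, collect: Callable[[], str], host: str = "") -> HTTPServer:
    """Build an HTTP server that returns collect() as the body of GET /metrics."""

    class MetricsHandler(BaseHTTPRequestHandler):
        def do_GET(self) -> None:
            if self.path != "/metrics":
                self.send_error(404)
                return
            body = collect().encode("utf-8")
            self.send_response(200)
            # Content type used by the Prometheus text exposition format.
            self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, format: str, *args) -> None:
            pass  # keep the exporter quiet; one scrape per interval is noise

    return HTTPServer((host, port), MetricsHandler)


# Usage (blocks until interrupted); the port number here is arbitrary:
#   make_server(9400, lambda: "CM_PURPLEPILL_GPU_UTILIZATION 37\n").serve_forever()
```

Here `collect` is any function returning the full metrics text, so the collection logic (calling nvidia-smi, reading `/proc`) stays separate from the HTTP layer.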
Install docs & unit file: Installation (pip) · systemd service
CM PurplePill vs. NVIDIA DCGM
Digital sovereignty benefits explained:
Factor | CM PurplePill — Open Source | NVIDIA DCGM — Proprietary |
---|---|---|
Implementation Control | Open architecture with visibility into monitoring logic | Black-box implementation |
Vendor Independence | Adaptable for non-NVIDIA GPUs by modifying collection layer | NVIDIA-specific |
Customizability | Easily modifiable for specific environments | Configuration limited to provided options |
Conclusion
CM PurplePill offers complete visibility into all GPU workloads, including those without resource declarations — a long-standing challenge in Kubernetes. With its lightweight design and small operational footprint, it ensures efficient monitoring without GPU usage gaps. It provides full control over the monitoring stack and a simple DaemonSet deployment model for fast integration.
Start here: ConfidentialMind/cm-purplepill