Introducing CM PurplePill: Kubernetes GPU Monitoring Solution

Overview

It’s time to give back to the open-source community again. This time, our infra specialist, Dan, has created a clever way to solve Kubernetes GPU monitoring. In this article, we discuss what the solution is, how it works, and why it’s important for the open-source community.

GitHub page: https://github.com/ConfidentialMind/cm-purplepill

Current Problems with Kubernetes GPU Monitoring

Standard Kubernetes GPU monitoring tools only track pods with explicit nvidia.com/gpu resource requests.
Many applications utilize GPUs without declaring these resource requests.
There are significant monitoring gaps where GPU usage is effectively “invisible.”

The Solution: CM PurplePill

CM PurplePill is a lightweight Prometheus exporter for NVIDIA GPU metrics in Kubernetes that tracks pod-level GPU usage without requiring explicit GPU resource declarations.

This matters because tools like NVIDIA DCGM can’t attribute GPU usage to a pod unless the pod explicitly declares nvidia.com/gpu. As a result, you can’t reliably use features like vLLM parallel GPU sharing, where each instance is assigned less than 100% of a GPU, and still observe correct per-pod usage. With CM PurplePill, you are free to use any combination of parallelism parameters while still monitoring actual per-pod GPU usage. CM PurplePill is also much lighter than NVIDIA DCGM and relies on fewer components.
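To make the sub-100% case concrete, the sketch below computes each pod’s share of a GPU’s memory from per-pod usage figures of the kind CM PurplePill reports. The pod names and MiB values are illustrative assumptions, not measurements, and `pod_memory_share` is a hypothetical helper that is not part of the project.

```python
from typing import Dict


def pod_memory_share(used_by_pod_mib: Dict[str, float], total_mib: float) -> Dict[str, float]:
    """Return each pod's fraction of total GPU memory (0.0 to 1.0)."""
    if total_mib <= 0:
        raise ValueError("total_mib must be positive")
    return {pod: used / total_mib for pod, used in used_by_pod_mib.items()}


# Illustrative numbers only: two pods sharing one 80 GiB (81920 MiB) GPU,
# neither of which declares nvidia.com/gpu in its manifest.
shares = pod_memory_share({"vllm-a": 32768, "vllm-b": 40960}, total_mib=81920)
for pod, share in shares.items():
    print(f"{pod}: {share:.0%} of GPU memory")
```

A monitoring tool that only sees declared GPU requests would report nothing for either pod, even though together they occupy most of the card.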

Key Value

Complete visibility into all GPU workloads, including those without resource declarations.
Lightweight solution with a small operational footprint.
Full control over the monitoring stack.
Simple deployment as a DaemonSet on GPU-enabled nodes.

Note: The current release supports NVIDIA GPUs only. Support for AMD and other GPUs requires only slight modifications and is planned for upcoming releases.

Features

Exposes GPU metrics in Prometheus format.
Shows per-pod usage of GPUs.
Does not rely on GPU resource declarations in the Pod manifest (*).
Can show GPU usage by pods that claim less than a whole GPU (*).
Not tied to a particular GPU vendor by design, although the current release supports NVIDIA only (*).
Can run as a Kubernetes DaemonSet or as a host-level service.
With slight modification, can monitor GPU usage in any containerized environment (not limited to Docker-like runtimes).

* Unlike the NVIDIA DCGM Prometheus metrics exporter.

Core Metrics (Explained)

CM_PURPLEPILL_GPU_MEMORY_TOTAL_MIB: total GPU memory, in MiB.
CM_PURPLEPILL_GPU_MEMORY_USED_TOTAL_MIB: total used GPU memory, in MiB.
CM_PURPLEPILL_GPU_MEMORY_FREE_MIB: free GPU memory, in MiB.
CM_PURPLEPILL_GPU_UTILIZATION: GPU utilization, as a percentage.
CM_PURPLEPILL_GPU_MEMORY_USED_POD_MIB: GPU memory used by a specific pod, in MiB.
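As a rough illustration of how such a metric might look on the wire, here is a minimal sketch that renders one sample line in the Prometheus text exposition format. The metric name comes from the list above; the `gpu` and `pod` label names, the sample value, and the `render_metric` helper are assumptions made for this example and may not match what the exporter actually emits.

```python
from typing import Dict, Optional


def _escape(value: str) -> str:
    """Escape backslash, double quote, and newline inside a label value."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_metric(name: str, value: float, labels: Optional[Dict[str, str]] = None) -> str:
    """Render one sample line in the Prometheus text exposition format."""
    if labels:
        pairs = ",".join(f'{key}="{_escape(val)}"' for key, val in sorted(labels.items()))
        return f"{name}{{{pairs}}} {value}"
    return f"{name} {value}"


# Hypothetical labels and value, for illustration only.
line = render_metric(
    "CM_PURPLEPILL_GPU_MEMORY_USED_POD_MIB",
    32768,
    {"gpu": "0", "pod": "vllm-a"},
)
print(line)
```

Sorting the labels keeps the output stable between scrapes, which makes the text easier to diff when debugging.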

Deployment Options

1) Kubernetes DaemonSet (Recommended)

An all-in-one container deployment in Kubernetes, scheduled onto GPU nodes with a node selector:

Requires hostPID access to monitor processes.
Works with Prometheus Operator’s ScrapeConfig.
Uses standard NVIDIA software for Kubernetes hosts.
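The hostPID requirement suggests that GPU processes are matched to pods through host-level process information. One plausible way to do that, shown here as a sketch and not confirmed as the project’s actual method, is to read a process’s cgroup path and extract the pod UID that the kubelet embeds in it. The function names, the regular expression, and the sample paths are all assumptions for illustration.

```python
import re
from typing import Optional

# Matches the pod UID in cgroupfs-style paths (".../pod<uid>/...") and in
# systemd-style paths ("...-pod<uid_with_underscores>.slice/...").
_POD_UID_RE = re.compile(
    r"pod([0-9a-f]{8}[-_][0-9a-f]{4}[-_][0-9a-f]{4}[-_][0-9a-f]{4}[-_][0-9a-f]{12})"
)


def pod_uid_from_cgroup(cgroup_text: str) -> Optional[str]:
    """Extract a Kubernetes pod UID from the contents of /proc/<pid>/cgroup."""
    match = _POD_UID_RE.search(cgroup_text)
    if match is None:
        return None  # no pod UID found, so probably not a pod process
    return match.group(1).replace("_", "-")


def pod_uid_for_pid(pid: int, proc_root: str = "/proc") -> Optional[str]:
    """Look up the pod UID for a PID; host PIDs are only visible with hostPID."""
    try:
        with open(f"{proc_root}/{pid}/cgroup", "r") as handle:
            return pod_uid_from_cgroup(handle.read())
    except OSError:
        return None  # process has exited or is not visible


# Made-up sample paths in the two common layouts.
systemd_style = (
    "0::/kubepods.slice/kubepods-burstable.slice/"
    "kubepods-burstable-pod1234abcd_12ab_34cd_56ef_1234567890ab.slice/"
    "cri-containerd-0123456789abcdef.scope"
)
cgroupfs_style = (
    "11:devices:/kubepods/besteffort/"
    "pod1234abcd-12ab-34cd-56ef-1234567890ab/0123456789abcdef"
)
print(pod_uid_from_cgroup(systemd_style))
print(pod_uid_from_cgroup(cgroupfs_style))
```

Mapping the UID to a pod name would need an extra lookup, for example against the Kubernetes API, which this sketch leaves out.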

2) Direct Host Installation

Deployable as a systemd service or Docker container; installable via pip or from source.

Minimal dependencies

Python 3.7+
NVIDIA drivers with the nvidia-smi tool
No external Python packages (standard library only)
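With only nvidia-smi and the standard library available, GPU-level figures can be collected by running the tool and parsing its CSV output. The following is a minimal sketch under that assumption, not the project’s code: `parse_gpu_csv`, `query_gpus`, and the dictionary keys are hypothetical names, and the parser does not handle fields that nvidia-smi reports as `[N/A]`.

```python
import subprocess
from typing import Dict, List

# Standard nvidia-smi query fields; with "nounits", memory values are in MiB
# and utilization is a percentage.
QUERY_FIELDS = "index,memory.total,memory.used,memory.free,utilization.gpu"
KEYS = ["index", "memory_total_mib", "memory_used_mib", "memory_free_mib", "utilization_pct"]


def parse_gpu_csv(csv_text: str) -> List[Dict[str, int]]:
    """Parse `nvidia-smi --query-gpu=... --format=csv,noheader,nounits` output."""
    gpus = []
    for row in csv_text.splitlines():
        if not row.strip():
            continue  # skip blank lines
        values = [int(field.strip()) for field in row.split(",")]
        gpus.append(dict(zip(KEYS, values)))
    return gpus


def query_gpus() -> List[Dict[str, int]]:
    """Run nvidia-smi and return one dict per GPU (needs NVIDIA drivers)."""
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY_FIELDS}", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_gpu_csv(result.stdout)


# Made-up sample output for a single GPU, so the parser can be tried
# on a machine without NVIDIA hardware.
sample_output = "0, 81920, 32768, 49152, 57\n"
parsed = parse_gpu_csv(sample_output)
print(parsed)
```

Keeping the parsing separate from the subprocess call makes the logic testable without a GPU, which fits a tool meant to stay small and dependency-free.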

Install docs and unit file: see “Installation (pip)” and “systemd service” in the project documentation.

CM PurplePill vs. NVIDIA DCGM

Digital sovereignty benefits explained:

Factor	CM PurplePill — Open Source	NVIDIA DCGM — Proprietary
Implementation Control	Open architecture with visibility into monitoring logic	Black-box implementation
Vendor Independence	Adaptable for non-NVIDIA GPUs by modifying collection layer	NVIDIA-specific
Customizability	Easily modifiable for specific environments	Configuration limited to provided options

Conclusion

CM PurplePill offers complete visibility into all GPU workloads, including those without resource declarations, which has been a long-standing challenge in Kubernetes. Its lightweight design and small operational footprint keep monitoring efficient while closing the gaps where GPU usage would otherwise go unseen. It provides full control over the monitoring stack and a simple DaemonSet deployment model for fast integration.

Start here: https://github.com/ConfidentialMind/cm-purplepill