§ 1.3 · Notes From the Laboratory

A Comprehensive Guide to powercap-utils on Linux

Menra W. Romial
IMT ATLANTIQUE · INRIA STACK

Synopsis

The powercap framework is the Linux kernel’s standardised interface to hardware power limits — RAPL on Intel, AMD’s equivalent on newer Zen parts. This note walks through the userspace tooling, the sysfs layout, and the gotchas you’ll meet when you try to enforce a cap in anger. A companion bpftrace script is provided; it is imperfect; suggestions welcome.

1. Introduction

For a framework that directly controls watts, powercap is remarkably underdocumented. Most tutorials either copy the kernel docs verbatim or stop at cat /sys/class/powercap/.../energy_uj.

This note is the guide I wish I’d had in my first month. We start with what powercap actually is (a thin kernel abstraction over RAPL and friends), descend into the sysfs layout, and end with a small bpftrace script that logs every write to a power limit. If you’re here because you ran powercap-set once and got Permission denied, § 5 is for you.

2. The sysfs hierarchy

Everything lives under /sys/class/powercap/. Each control type (intel-rapl, intel-rapl-mmio, dram) exposes a tree of zones, each with energy_uj, constraint_0_power_limit_uw, and friends.
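
Since energy_uj is a cumulative counter in microjoules, turning it into watts takes two readings and a subtraction. The sketch below is one way to do that, not a reference implementation. The function names avg_power_uw and sample_power are mine, and I am assuming the counter wraps at most once between readings, at the value reported by max_energy_range_uj. On recent kernels energy_uj may be readable by root only.

```shell
# avg_power_uw E0 E1 MAX_RANGE DT_S
#   E0, E1     two energy_uj readings, in microjoules
#   MAX_RANGE  the zone's max_energy_range_uj (assumed wrap point)
#   DT_S       whole seconds between the two readings
# Prints the average power in microwatts (uJ per second), truncated.
avg_power_uw() {
    e0=$1 e1=$2 max_range=$3 dt_s=$4
    if [ "$e1" -ge "$e0" ]; then
        delta=$((e1 - e0))
    else
        # Counter wrapped once: distance to the top plus the new reading.
        delta=$((max_range - e0 + e1))
    fi
    echo $((delta / dt_s))
}

# sample_power ZONE_DIR DT_S
# Hypothetical helper: read the counter twice, DT_S seconds apart.
sample_power() {
    zone=$1 dt_s=$2
    e0=$(cat "$zone/energy_uj")
    sleep "$dt_s"
    e1=$(cat "$zone/energy_uj")
    avg_power_uw "$e0" "$e1" "$(cat "$zone/max_energy_range_uj")" "$dt_s"
}
```

On a real host this would be called as sample_power /sys/class/powercap/intel-rapl:0 1. Integer seconds keep the arithmetic inside plain shell; a finer interval would need a different time source.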

FIG 2 · SYSFS TREE DIAGRAM
Fig. 2. Anatomy of /sys/class/powercap on a dual-socket Xeon host. Note that MMIO and MSR zones can disagree — they usually shouldn’t.

The zones are hierarchical: package contains core, uncore, and dram. A cap at the package level implicitly bounds its children, but the kernel will not redistribute headroom for you. This matters when your workload is memory-bound.
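
To see the nesting on your own machine without extra tooling, a short loop over the class directory is enough. This is a minimal sketch: list_zones is a name I made up, and it relies on the assumption that zone directories are named like intel-rapl:0, with children carrying one more colon (intel-rapl:0:0). The optional argument lets you point it at a copy of the tree instead of the live one.

```shell
# list_zones [ROOT]
# Print each zone's directory name, its `name` attribute and its first
# power limit. Child zones are indented under their parent.
list_zones() {
    root=${1:-/sys/class/powercap}
    for zone in "$root"/*:*; do
        # Skip entries that are not zones (or an unmatched glob).
        [ -r "$zone/name" ] || continue
        id=${zone##*/}
        # Two or more colons in the id means this is a child zone.
        case $id in
            *:*:*) indent="  " ;;
            *)     indent="" ;;
        esac
        # Not every zone exposes a constraint, so fall back to n/a.
        limit="n/a"
        if [ -r "$zone/constraint_0_power_limit_uw" ]; then
            limit=$(cat "$zone/constraint_0_power_limit_uw")
        fi
        printf '%s%s  %s  limit_uw=%s\n' "$indent" "$id" "$(cat "$zone/name")" "$limit"
    done
}
```

Run it as list_zones with no argument to walk the live tree. Comparing the package line with its children shows which child limits are set independently of the parent cap.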

3. Userspace tools

The powercap-utils package ships three binaries of interest:

  • powercap-info — pretty-print the hierarchy, enabled/disabled, current limits.
  • powercap-set — write a new limit to a specific constraint.
  • rapl-read — a minimal energy counter reader for benchmarks.

$ powercap-info -p intel-rapl -z 0 -c 0
Name: long_term
Power Limit (uW): 125000000
Time Window (us): 28000000
Enabled: 1
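
The same four values can be read straight from sysfs, which helps on hosts where the package is not installed. The sketch below assumes the standard attribute names (constraint_N_name, constraint_N_power_limit_uw, constraint_N_time_window_us, enabled); show_constraint is a hypothetical helper, and it converts to watts and milliseconds with integer division, so fractions are truncated.

```shell
# show_constraint ZONE_DIR IDX
# Print one constraint of a zone in human-sized units.
show_constraint() {
    zone=$1 idx=$2
    limit_uw=$(cat "$zone/constraint_${idx}_power_limit_uw")
    window_us=$(cat "$zone/constraint_${idx}_time_window_us")
    # uW -> W and us -> ms, both truncated by integer division.
    printf 'name=%s limit_w=%s window_ms=%s enabled=%s\n' \
        "$(cat "$zone/constraint_${idx}_name")" \
        "$((limit_uw / 1000000))" \
        "$((window_us / 1000))" \
        "$(cat "$zone/enabled")"
}
```

For the zone shown above, show_constraint /sys/class/powercap/intel-rapl:0 0 would report a 125 W limit over a 28000 ms window.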

4. Setting budgets

Setting a cap is, on paper, one write:

echo 90000000 > /sys/class/powercap/intel-rapl:0/constraint_0_power_limit_uw

In practice, three things will trip you up. First, constraints are pairs: a short-window and a long-window limit. Writing only one leaves the other active at its previous value, which is almost never what you meant. Second, the kernel silently clamps values above the TDP exposed in constraint_0_max_power_uw. Third, on some kernels the write appears to succeed but the counter never moves; that’s the enable bit, see § 5.
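
The first two traps can be guarded against by writing both constraints together and reading each one back. What follows is a sketch, not the one true way: set_cap is a hypothetical helper, and it assumes constraint_0 is the long-window limit and constraint_1 the short-window one, which you should confirm by reading constraint_0_name and constraint_1_name on your host.

```shell
# set_cap ZONE_DIR LONG_UW SHORT_UW
# Write both power limits, then read each back and complain if the
# kernel stored something other than what was asked for.
# Returns 0 if both limits stuck, 1 otherwise.
set_cap() {
    zone=$1 long_uw=$2 short_uw=$3
    status=0
    # Assumed mapping: constraint 0 = long window, constraint 1 = short window.
    for pair in "0:$long_uw" "1:$short_uw"; do
        idx=${pair%%:*}
        want=${pair#*:}
        f="$zone/constraint_${idx}_power_limit_uw"
        echo "$want" > "$f" || return 1
        got=$(cat "$f")
        if [ "$got" != "$want" ]; then
            echo "constraint_$idx: asked for $want uW, kernel reports $got uW" >&2
            status=1
        fi
    done
    return "$status"
}
```

As root, set_cap /sys/class/powercap/intel-rapl:0 90000000 110000000 would apply a 90 W long-window cap; the 110 W short-window figure is purely illustrative. A non-zero exit status means at least one value was rejected or altered on the way in.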

5. Common traps

Every power cap is a contract between three parties who don’t know each other exists: the kernel, the firmware, and the CPU. Any of them may veto.
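
Two of those vetoes can be detected from userspace before you write anything: the file is not writable by you, or the zone's enable bit is off. The check below is a rough sketch under those assumptions only; preflight is a name I chose, and it says nothing about what the firmware may do afterwards.

```shell
# preflight ZONE_DIR
# Returns 0 if the zone looks ready for a cap, 1 if the limit file is
# not writable (typically: not root), 2 if the zone is disabled.
preflight() {
    zone=$1
    f="$zone/constraint_0_power_limit_uw"
    if [ ! -w "$f" ]; then
        echo "not writable: $f (are you root?)" >&2
        return 1
    fi
    if [ "$(cat "$zone/enabled")" != "1" ]; then
        echo "zone disabled: $zone" >&2
        return 2
    fi
    return 0
}
```

A typical use is preflight /sys/class/powercap/intel-rapl:0 && echo ready, run just before the write itself.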

… continued in the next section.