# “Same GPU” is not the equivalence class people think it is

Two physical GPUs of the same model run the same benchmark. The numbers come back different. The instinct is to look for a fault — defective unit, bad thermal paste, suspicious silicon. Usually there’s no fault. The model number on the box is a hardware identity; it is not a performance contract.

The performance the workload achieves is a property of the AI Executor — accelerator plus driver plus runtime plus framework plus precision plus host plus thermal envelope — and “same model number” holds constant only the first item in that list. Treating the model number as if it were a performance contract leads to two predictable failures: chasing phantom hardware faults that aren’t there, and reading benchmark differences as more meaningful than they are.

## What changes when the “same GPU” sits in two different hosts?

The hardware identity holds. Almost everything else can shift:

| Axis | Why it changes per host |
| --- | --- |
| Driver version | Different host install dates, different distro update cadence |
| CUDA / runtime version | Framework wheels vendor different toolkits; system installs differ |
| Framework version + build | Different wheel sources, different dependency resolutions |
| Kernel libraries (cuDNN, cuBLAS) | Vendored per framework wheel; a system install can shadow them |
| OS kernel version | Different distros, different update windows |
| PCIe topology | Slot generation, lane width, switch chip presence on the motherboard |
| CPU and host memory | Affect host-side preprocessing and dataloader throughput |
| Cooling configuration | Server form factor, fan curves, ambient temperature |
| Power-cap policy | Vendor power caps configurable per host |
| Co-tenant load | Other workloads on the same host competing for memory bandwidth, network, storage |
| Workload shape / batch / precision | Operator-controlled, not always held constant in casual comparisons |

Any of these can shift the observed performance. Several typically do, and the effects compose. A benchmark difference between two hosts running the same GPU model is the natural consequence of holding only the silicon constant while letting the rest of the executor vary.

## The methodological consequence

If “same GPU” is not a useful equivalence class for performance comparison, then benchmark reports must record the equivalence class that actually is useful — the AI Executor — and any comparison must hold that broader class constant.

The minimum disclosure surface for an AI accelerator benchmark to be comparable to another report on the same hardware:

- Accelerator model and unit (where unit-to-unit variance is being investigated).
- Driver version.
- CUDA / runtime version (and source — system install vs framework-vendored).
- Framework version and wheel source.
- Kernel library versions (cuDNN, cuBLAS, etc.).
- OS and kernel version.
- Host platform (CPU, memory, PCIe topology relevant to data movement).
- Cooling and ambient conditions.
- Power-cap setting.
- Co-tenant load policy during measurement.
- Workload, precision regime, batch and concurrency configuration.
- Whether warm-up was excluded; the measurement window length.

A report that names these can be compared meaningfully to another report that names them. A report that names only the GPU model and a throughput number is reporting on an unspecified executor, and “same GPU” between that report and any other is not a comparison the reader can perform.
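Most of that disclosure surface is mechanical enough to capture in code rather than by hand. Here is a minimal sketch, assuming a PyTorch stack on an NVIDIA host; the `capture_executor()` helper and its field names are illustrative choices, not a LynxBench AI API. It pulls driver-level facts from `nvidia-smi` and stack versions from the framework, and emits JSON meant to sit next to the throughput number in the report.

```python
"""Capture the executor configuration alongside a benchmark result.

A minimal sketch assuming a PyTorch stack on an NVIDIA host; the
capture_executor() helper and its field names are illustrative,
not a LynxBench AI API.
"""
import json
import platform
import subprocess

import torch


def capture_executor() -> dict:
    # nvidia-smi reports the driver-level facts the framework cannot
    # see: installed driver version, configured power cap, temperature.
    smi = subprocess.run(
        ["nvidia-smi",
         "--query-gpu=driver_version,power.limit,temperature.gpu",
         "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    ).stdout.strip().split(", ")

    return {
        # Hardware identity -- the only thing "same model" pins down.
        "accelerator": torch.cuda.get_device_name(0),
        # Everything below can vary per host even when the model matches.
        "driver_version": smi[0],
        "power_cap_w": smi[1],
        "gpu_temp_c_at_start": smi[2],
        "cuda_runtime": torch.version.cuda,        # vendored with the wheel
        "cudnn": torch.backends.cudnn.version(),
        "framework": f"torch {torch.__version__}",
        "os": f"{platform.system()} {platform.release()}",
        "cpu": platform.processor(),
    }


if __name__ == "__main__":
    # Emit next to the throughput number, so the report names an
    # executor rather than just a GPU model.
    print(json.dumps(capture_executor(), indent=2))
```

Cooling configuration, co-tenant load policy, and the workload/precision/batch settings still have to be recorded by the operator; the point of automating the rest is that the comparable fields stop depending on anyone remembering to write them down.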
## Why this is not an edge case

The unit-to-unit variance from the silicon itself is typically small for modern AI accelerators — manufacturing tolerances are tight enough that two units of the same model produce the same throughput when placed in the same executor configuration. The variance from the executor configuration is typically larger — by enough that it dominates any silicon-side variance in almost any cross-host comparison.

The pattern this produces in practice:

- A team buys two of the same accelerator. Benchmark scores differ. The team investigates the silicon, finds no fault, and the difference persists. The actual cause is that the two hosts have slightly different driver versions or were thermally pre-conditioned differently before the test. The investigation is in the wrong layer.
- A team upgrades a driver across a fleet. Benchmark scores shift. The team attributes it to “the new driver.” The actual cause is the new driver’s interaction with the framework’s vendored libraries, which is a property of the executor configuration, not of the driver alone. The attribution is incomplete.
- A vendor publishes a benchmark on a specific stack. A buyer reproduces the test on their stack and gets a different number. The buyer suspects vendor inflation. The actual cause is that the buyer’s executor configuration differs from the vendor’s, and the benchmark is consistent within each configuration. The interpretation is misframed.

In each case, the “same GPU” equivalence class hid the variable that actually mattered.

## The framing that helps

- The model number is a hardware identity, not a performance contract.
- Performance is a property of the AI Executor — silicon plus driver plus runtime plus framework plus precision plus host plus thermal envelope — and “same model number” holds only the first item constant.
- Benchmark differences between two same-model GPUs are the expected consequence of executor variance, not a sign of hardware fault.
- Comparing benchmarks across hosts requires the executor configuration to be disclosed and held constant, which is a different (and stricter) requirement than matching model numbers.

The operational expression of why identical GPUs perform differently: identical hardware is the necessary condition for identical performance, and an identical executor configuration is the sufficient condition the benchmark methodology has to enforce. LynxBench AI treats the AI Executor (silicon + driver + runtime + framework + precision + host + thermal regime) as the unit of measurement, because the model number is an identity property of one component, and benchmark comparability requires the full executor configuration to be the unit of equivalence.
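To make the equivalence requirement concrete, here is a minimal sketch of a comparability gate: two reports are only allowed to yield a throughput ratio if their recorded executor fields match. The report layout (`{"executor": ..., "throughput": ...}`) and the `executor_diff()` / `compare()` helpers are illustrative assumptions, not a LynxBench AI API.

```python
"""Refuse to compare throughput numbers unless the executors match.

A minimal sketch; the report layout and helper names are
illustrative assumptions, not a LynxBench AI API.
"""

# Fields that define the executor equivalence class. "accelerator"
# alone -- the model number -- is deliberately not sufficient.
# precision/batch_size are assumed to come from the workload config.
EXECUTOR_FIELDS = (
    "accelerator", "driver_version", "cuda_runtime", "cudnn",
    "framework", "os", "power_cap_w", "precision", "batch_size",
)


def executor_diff(report_a: dict, report_b: dict) -> dict:
    """Return the executor fields on which two reports disagree."""
    a, b = report_a["executor"], report_b["executor"]
    return {
        field: (a.get(field), b.get(field))
        for field in EXECUTOR_FIELDS
        if a.get(field) != b.get(field)
    }


def compare(report_a: dict, report_b: dict) -> float:
    """Throughput ratio, valid only within one executor class."""
    diff = executor_diff(report_a, report_b)
    if diff:
        # Same GPU model is not enough: a differing driver, runtime,
        # or power cap means these numbers describe different executors.
        raise ValueError(f"Not the same executor; differing fields: {diff}")
    return report_a["throughput"] / report_b["throughput"]
```

The design choice worth noting is that the gate fails closed: a report missing an executor field compares as unequal, which is exactly the "unspecified executor" case the methodology section rules out.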