# Precision is not a free model-design parameter

A model architect writing a deployment plan picks the precision regime (FP16, BF16, FP8, INT8) as if it were a configuration switch the runtime supports uniformly across hardware. The runtime does support the precision, but on hardware that does not natively accelerate it, the support is by emulation, and emulation runs at a performance cost large enough to negate the reason the precision was chosen in the first place. The precision regime that delivers its expected throughput is the one the target accelerator generation actually accelerates in hardware; a regime the target generation only emulates is, for performance purposes, a regime the target hardware does not support.

This makes precision a hardware-conditional design decision, not a free model-design parameter. The precision decision and the hardware decision interact, and choosing one without the other locks in implications the chooser may not have intended.

## What does "supported" mean at the hardware level?

Modern AI accelerators have specialized matrix-multiply engines (tensor cores on NVIDIA, equivalent matrix engines from other vendors) that natively execute specific precision formats. The set of natively supported precisions differs by accelerator generation and is the practical determinant of which precisions the deployment can use at peak throughput. Three categories of "support" matter:

- **Native acceleration.** The matrix engine has dedicated paths for the precision. Throughput approaches the device's design-target peak for that format, and the precision is operationally usable for high-throughput workloads.
- **Software emulation.** The runtime supports the precision via composition of operations at a different native precision (e.g. emulating FP16 with sequences of FP32 operations on a device that lacks FP16 tensor cores). Functionally correct; performance-wise, often slower than simply running the workload at a natively supported precision in the first place.
- **Unsupported.** The runtime does not implement the precision at all on the target hardware. The workload either falls back to a different precision automatically (with the framework's mixed-precision logic making the decision) or fails.

A precision regime that delivers its expected speedup on one accelerator generation can be silently emulated on another, producing throughput worse than running the workload at a higher precision the older hardware does support natively. "FP8 is 2× faster than BF16" is a property of accelerators that natively accelerate FP8; on accelerators that emulate it, the same statement can be false.

## Generation-conditional precision support

The precision support landscape across accelerator generations is uneven and historically additive: newer generations add formats; older generations don't gain them retroactively. A simplified picture:

| Format | Native acceleration first appeared in | Notes |
|---|---|---|
| FP32 | All generations | Universally supported |
| FP16 tensor cores | Volta (compute capability 7.0) | Mixed-precision standard for several generations |
| INT8 tensor cores | Turing (compute capability 7.5) | Strong inference support |
| BF16 tensor cores | Ampere (compute capability 8.0) | Wide dynamic range; preferred for training |
| TF32 | Ampere (compute capability 8.0) | Reduced-precision FP32 training format |
| FP8 tensor cores | Ada Lovelace (8.9) and Hopper (9.0) | E4M3 and E5M2 variants |
| FP4 tensor cores | Recent generations only | Aggressive inference quantization |

Equivalent capability tables exist for other vendors' architectures, with different generation boundaries and different specific format support. The pattern that recurs across vendors is the same: precision support is generation-conditional, and "the hardware supports X" is a question that has to be answered per generation, not per vendor.
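The simplified table above can be sketched as a lookup: given an NVIDIA compute capability, return the formats that generation natively accelerates. This is a minimal illustration of the table's thresholds, not a vendor API; `native_formats` is a hypothetical helper name, and FP4 is omitted because the table pins it only to "recent generations".

```python
# Map NVIDIA compute capability -> natively accelerated formats,
# following the simplified table above. Illustrative sketch only;
# real capability queries go through the vendor runtime.
def native_formats(compute_capability: tuple) -> set:
    cc = compute_capability
    formats = {"FP32"}               # universally supported
    if cc >= (7, 0):
        formats.add("FP16")          # Volta tensor cores
    if cc >= (7, 5):
        formats.add("INT8")          # Turing
    if cc >= (8, 0):
        formats |= {"BF16", "TF32"}  # Ampere
    if cc >= (8, 9):
        formats.add("FP8")           # Ada Lovelace / Hopper
    return formats

# A pre-Ampere part has no native BF16 or FP8 path:
print(native_formats((7, 5)))            # Turing: FP32, FP16, INT8
print("FP8" in native_formats((9, 0)))   # Hopper: True
```

The lookup makes the "generation-conditional" point mechanical: the answer changes with the capability tuple, not with the vendor name.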
The procurement consequence is that hardware choice and precision-regime choice are coupled. A deployment built on FP8 cannot run on hardware older than the FP8-introducing generation without emulating, so the procurement decision to buy older hardware retires the FP8 deployment option for that fleet. A deployment built on FP16 plus mixed precision can run on most modern hardware; a deployment that requires FP8 constrains the procurement choice to FP8-supporting hardware.

## Why this couples precision and procurement decisions

The standard mental model treats precision and hardware as independent choices: pick the hardware first, then pick the precision regime that runs on it. The mental model is wrong in both directions:

- **Picking precision first locks the procurement window.** A deployment that requires native FP8 acceleration to meet its throughput target cannot run on accelerators older than the FP8-introducing generation. The procurement candidate set is therefore constrained by the precision choice.
- **Picking hardware first locks the precision option set.** A deployment running on accelerators that do not natively accelerate a given low-precision format cannot adopt that format later without buying new hardware. The precision-regime evolution is therefore constrained by the hardware choice.

The two decisions are not independent; they are a joint decision that has to be made together. The framing that produces a durable infrastructure choice is to enumerate the precision regimes the deployment will need over the planning horizon and the hardware generations that natively accelerate them, and to pick from the intersection. Picking from one set without considering the other produces deployments where one of the two becomes the constraint that closes off the other.
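The intersection framing can be made concrete: given the precision regimes the deployment plans to use over the horizon and the native-format set of each candidate accelerator, the viable candidates are those whose native sets cover the plan. The function and the candidate data below are illustrative placeholders, not real benchmark output.

```python
# Joint precision-and-hardware selection: keep only candidates whose
# natively accelerated formats cover every regime in the deployment plan.
# Candidate names and format sets are illustrative placeholders.
def viable_candidates(planned_regimes: set, candidates: dict) -> list:
    return [name for name, native in candidates.items()
            if planned_regimes <= native]  # subset test: plan fully covered

candidates = {
    "gen_A (pre-FP8)":     {"FP32", "FP16", "INT8", "BF16", "TF32"},
    "gen_B (FP8-capable)": {"FP32", "FP16", "INT8", "BF16", "TF32", "FP8"},
}

# A BF16-only plan runs on both generations; adding FP8 retires gen_A.
print(viable_candidates({"BF16"}, candidates))
print(viable_candidates({"BF16", "FP8"}, candidates))
```

The asymmetry in the output is the coupling: widening the planned regime set shrinks the procurement candidate set, and narrowing the candidate set shrinks the regime options.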
A benchmark methodology that supports this joint decision must report the precision regimes the candidate hardware natively accelerates and the throughput at each. A benchmark that reports a single throughput number without the precision regime is reporting on an unspecified part of the joint decision, and a procurement decision built on that benchmark locks in implications the benchmark did not characterize.

## What a precision-by-hardware matrix looks like in a benchmark

The reporting form that supports the joint decision is a matrix: precision regimes on one axis, candidate accelerators on the other, throughput (and accuracy) in each cell. The matrix exposes:

- Which precisions each accelerator natively accelerates.
- Where emulation is happening (cells where throughput is far below the format's expected peak).
- Where the precision option is unavailable (cells with no entry).
- The trade-off space across the (precision, hardware) joint decision rather than along either axis alone.

A benchmark that produces a row (single precision across hardware) supports a hardware-only comparison. A benchmark that produces a column (single hardware across precisions) supports a precision-only investigation. A benchmark that produces a matrix supports the joint decision the procurement actually faces.

Precision constrained by hardware architecture makes the broader case; the operational expression here is that precision is constrained by what the hardware natively accelerates, and the set of viable precision regimes is therefore an artifact of the hardware-architecture choice, making precision and hardware a single joint decision rather than two independent ones.

## The framing that helps

Hardware precision support is generation-conditional; native acceleration delivers expected throughput, while emulation does not. Precision regime and hardware choice are coupled: picking either first locks implications for the other.
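A minimal sketch of the matrix form: rows are precision regimes, columns are accelerators, and each cell carries a throughput number tagged as native or emulated, or is empty when the regime is unavailable. The device names and throughput figures are made-up placeholders, not measurements.

```python
# Precision-by-hardware matrix: cell = (throughput, "native" | "emulated")
# or None when the regime is unavailable. All numbers are placeholders.
matrix = {
    "BF16": {"gen_A": (100, "native"),   "gen_B": (180, "native")},
    "FP8":  {"gen_A": (60, "emulated"),  "gen_B": (360, "native")},
    "FP4":  {"gen_A": None,              "gen_B": None},
}

def render(matrix: dict) -> str:
    """Render the matrix as a fixed-width text table."""
    devices = sorted({d for row in matrix.values() for d in row})
    lines = ["precision  " + "  ".join(f"{d:>16}" for d in devices)]
    for precision, row in matrix.items():
        cells = []
        for d in devices:
            cell = row.get(d)
            text = f"{cell[0]} ({cell[1]})" if cell else "n/a"
            cells.append(f"{text:>16}")
        lines.append(f"{precision:<9}  " + "  ".join(cells))
    return "\n".join(lines)

print(render(matrix))
```

The emulated FP8 cell on gen_A sits below gen_A's own native BF16 throughput: the regime is "supported" there, but not usable at peak, which is exactly the distinction a single throughput number hides.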
Procurement and architecture decisions about AI deployments must therefore be made jointly, against the precision-by-hardware matrix the candidate set actually presents, not against a single throughput number that hides which precision regime produced it. LynxBench AI is structured around performance-per-precision-per-AI-Executor as required disclosure (the matrix form that supports the joint precision-and-hardware decision) because the precision regimes the hardware natively accelerates are the ones the deployment can actually use. The question to ask of any hardware-evaluation matrix is whether it surfaces that precision-vs-hardware distinction, or collapses it into a single number that cannot inform the joint decision the procurement is making.