## Compute capability is not a CUDA version

CUDA compute capability is a hardware property of each NVIDIA GPU — a version number (e.g., 7.0, 8.0, 8.9, 9.0) that defines which features, instructions, and precision formats the GPU silicon supports. It is distinct from the CUDA toolkit version (the software SDK). A common confusion: developers see "CUDA 12.x" installed and assume all features are available. In reality, compute capability determines which tensor core operations and precision formats a GPU supports — the toolkit version only determines which APIs are callable.

## What each compute capability enables

| Compute capability | Architecture | Key AI features |
|---|---|---|
| 7.0 | Volta (V100) | First-gen tensor cores (FP16), mixed-precision training |
| 7.5 | Turing (RTX 20xx, T4) | INT8 tensor cores, RT cores |
| 8.0 | Ampere (A100) | TF32 tensor cores, BF16, sparsity acceleration (2:4), 3rd-gen NVLink |
| 8.6 | Ampere (RTX 30xx) | Same tensor core features as 8.0, lower memory bandwidth |
| 8.9 | Ada Lovelace (RTX 40xx, L40) | FP8 tensor cores, 4th-gen tensor cores |
| 9.0 | Hopper (H100) | FP8 tensor cores, Transformer Engine, dynamic programming (DPX) instructions |

The practical impact: code compiled for FP8 precision will not run on a compute capability 8.0 GPU — the hardware instruction set simply doesn't include FP8 operations. BF16 training requires compute capability 8.0+, as does sparsity-accelerated inference (2:4 structured sparsity).

## How to check your GPU's compute capability

```shell
nvidia-smi --query-gpu=compute_cap --format=csv
```

Or programmatically in PyTorch: `torch.cuda.get_device_capability()` returns a tuple like `(8, 0)`.

## Why this matters for AI workload deployment

When choosing between CUDA, OpenCL, and SYCL, the compute capability of your target GPUs determines which optimisation paths are available. A model quantised to FP8 for maximum inference throughput cannot deploy on compute capability 8.0 hardware — you need 8.9+ (Ada Lovelace) or 9.0 (Hopper).
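The capability-to-feature mapping can be sketched in a few lines of Python. This is an illustrative helper, not a library API: it takes a capability tuple in the same `(major, minor)` shape that `torch.cuda.get_device_capability()` returns, and applies the thresholds from the table above.

```python
# Illustrative sketch: map a (major, minor) compute capability tuple to the
# precision formats the silicon can accelerate. Thresholds follow the
# architecture table above; supported_precisions is a hypothetical helper.

def supported_precisions(cap: tuple[int, int]) -> set[str]:
    formats = {"FP32", "FP16"}       # FP16 tensor cores from 7.0 (Volta)
    if cap >= (7, 5):
        formats.add("INT8")          # INT8 tensor cores from Turing
    if cap >= (8, 0):
        formats |= {"TF32", "BF16"}  # Ampere additions
    if cap >= (8, 9):
        formats.add("FP8")           # Ada Lovelace and Hopper only
    return formats

print(sorted(supported_precisions((8, 0))))
# ['BF16', 'FP16', 'FP32', 'INT8', 'TF32'] — note: no FP8 on an A100
```

Tuple comparison (`cap >= (8, 9)`) is deliberate: it avoids the floating-point pitfalls of comparing `8.9` as a float and matches how PyTorch reports the value.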
Teams deploying across mixed GPU fleets must compile multiple execution paths or accept the lowest common denominator. For inference serving at scale, compute capability is the first filter in hardware selection — before price, before availability, before memory capacity. If your workload requires BF16 tensor core operations, any GPU below compute capability 8.0 is functionally incompatible regardless of its CUDA core count.
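That first-filter step can be made concrete with a small sketch. The fleet contents and the minimum-capability inputs below are assumptions for illustration; the capability values come from the table above.

```python
# Sketch: compute capability as the first filter in hardware selection.
# FLEET is a hypothetical mixed fleet; caps match the table above.

FLEET = {
    "V100": (7, 0),
    "T4": (7, 5),
    "A100": (8, 0),
    "RTX 4090": (8, 9),
    "H100": (9, 0),
}

def compatible_gpus(min_cap: tuple[int, int]) -> list[str]:
    """Return the fleet GPUs meeting the workload's minimum compute capability."""
    return [name for name, cap in FLEET.items() if cap >= min_cap]

print(compatible_gpus((8, 0)))  # BF16 workload -> ['A100', 'RTX 4090', 'H100']
print(compatible_gpus((8, 9)))  # FP8 workload  -> ['RTX 4090', 'H100']
```

Only after this filter do price, availability, and memory capacity enter the comparison — a V100 fails a BF16 workload no matter how cheap or plentiful it is.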