## What TOPS means, and what it leaves out

TOPS (tera operations per second) is the headline metric hardware manufacturers use to communicate AI processing capability. Apple quotes it for Neural Engine chips, Qualcomm for Snapdragon NPUs, Intel for Meteor Lake. The number is technically defined: it represents the theoretical maximum number of integer or floating-point operations the accelerator can perform per second at a specified precision (typically INT8).

The problem is not that TOPS is wrong. It is that TOPS measures theoretical throughput at a single precision under ideal conditions, and so says little about real-world AI performance: it ignores the factors that actually determine how fast your workload runs, chiefly memory bandwidth, software stack overhead, workload fit, and thermal headroom.

## Why TOPS fails as a performance predictor

| Factor TOPS ignores | Why it matters | Example impact |
|---|---|---|
| Memory bandwidth | Most AI workloads are memory-bound, not compute-bound; the accelerator stalls waiting for data | Two chips with identical TOPS but a 2× bandwidth difference can show a 50–80% throughput gap on transformer inference |
| Software stack efficiency | Drivers, compilers, and framework support determine how much of theoretical TOPS is achievable | A well-optimized stack on a lower-TOPS chip routinely outperforms a poorly supported higher-TOPS chip |
| Workload fit | TOPS assumes dense operations at one precision; real models mix precisions, use sparse operations, and have irregular memory access patterns | Advertised INT8 TOPS is irrelevant if your model runs in FP16 or requires BF16 for accuracy |
| Thermal sustained performance | TOPS reflects instantaneous peak, not sustained throughput under thermal constraints | Mobile NPUs throttle within seconds of sustained load; sustained TOPS may be only 40–60% of peak |

A chip rated at 45 TOPS (INT8) and a chip rated at 30 TOPS (INT8) can perform identically on a real inference workload, or the lower-rated chip can win, depending on memory subsystem design, compiler maturity, and whether the workload's operational profile matches the chip's architecture.

## The metric manufacturers should report (but don't)

What actually predicts AI inference performance is not a single number but a profile: sustained throughput on a representative workload, at the precision the model actually uses, under thermal steady-state, with the production software stack. This is exactly what multi-dimensional GPU performance evaluation requires, and exactly what no spec sheet provides.

TOPS persists in marketing because it is simple, large (a bigger number reads as better), and incomparable across vendors without workload context, which means every vendor can claim leadership by choosing the precision and configuration that maximises its number. It is a marketing metric, not an engineering metric. Teams making hardware procurement decisions primarily on TOPS comparisons are optimising for the wrong signal.
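The memory-bandwidth argument above can be made concrete with a back-of-envelope roofline check: compare the time a workload would spend if limited purely by compute against the time it would spend if limited purely by DRAM traffic. The chip and model figures below are hypothetical, chosen only to illustrate the arithmetic, not measurements of any real device.

```python
# Roofline-style sanity check: is a workload memory-bound or compute-bound
# on a given accelerator? All numbers here are illustrative assumptions.

def bound_by(ops: float, bytes_moved: float,
             peak_tops: float, bandwidth_gbs: float) -> str:
    """Return which resource limits a workload on a given accelerator.

    ops           -- operations the workload performs
    bytes_moved   -- bytes it must read/write from DRAM
    peak_tops     -- advertised peak throughput (tera-ops/s)
    bandwidth_gbs -- memory bandwidth (GB/s)
    """
    compute_time = ops / (peak_tops * 1e12)            # seconds if compute-bound
    memory_time = bytes_moved / (bandwidth_gbs * 1e9)  # seconds if memory-bound
    return "memory-bound" if memory_time > compute_time else "compute-bound"

# Hypothetical transformer decode step: roughly 2 ops per weight, with the
# weights streamed from DRAM once per token. A 7B-parameter model at INT8
# moves ~7e9 bytes and performs ~14e9 ops per token.
print(bound_by(ops=14e9, bytes_moved=7e9,
               peak_tops=45, bandwidth_gbs=120))   # prints "memory-bound"
```

On these assumed figures the compute time is ~0.3 ms per token while the memory time is ~58 ms, so the 45 advertised TOPS are almost entirely irrelevant: the chip spends its time waiting on DRAM, and doubling bandwidth would help far more than doubling TOPS.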