## “Integrated” doesn’t mean “self-contained”

A system-on-a-chip integrates compute, memory controllers, and accelerator blocks (neural processing units, GPU blocks, sometimes specialized ASIC blocks for vision or audio) onto a single die. The integration is real and substantial: data does not have to traverse a board-level interconnect to move between the CPU and the AI accelerator block, which removes a class of bottlenecks that constrain discrete-accelerator systems. It is reasonable to expect this integration to change the performance reasoning that applies to AI workloads on the device.

What it does not do is collapse the AI Executor into the silicon. The software stack (drivers, runtime, and framework support for the SoC’s specific accelerator block) is still the half of the executor without which the hardware does nothing. If anything, SoC integration makes the software stack more decisive, not less: because the accelerator block is vendor-specific and physically tied to that SoC, the software stack that targets it is the only software stack that can extract its performance.

## What changes and what doesn’t with SoC integration

| Performance dimension | Discrete accelerator | System-on-a-chip |
| --- | --- | --- |
| CPU ↔ accelerator data movement | Crosses board-level interconnect; latency-bound | On-die; substantially lower latency |
| Memory bandwidth to accelerator | Dedicated memory subsystem | Shared with CPU; bandwidth contention possible |
| Software stack maturity | Mature ecosystems for major vendors | Per-SoC; varies widely with vendor investment |
| Driver/runtime portability | Standardized by major frameworks | Per-SoC; rarely portable across SoCs |
| Effect of stack version on result | Significant | Often dominant |

The bottom three rows are the rows that matter for evaluation. SoC integration changes the physical bottlenecks; it does not change the principle that performance emerges from the hardware-and-software stack.

## Why does the software stack matter more on an SoC than on a discrete accelerator?
For a discrete accelerator from a major vendor (an NVIDIA GPU, an AMD GPU, an Intel accelerator), the software stack is mature, broadly portable across host platforms, and well supported by the major frameworks. A team evaluating that hardware can reasonably assume that the software side of the AI Executor is approximately constant across deployments, and that the performance differences they measure reflect hardware differences with limited stack noise.

For an SoC’s AI accelerator block, this assumption breaks. The drivers and runtime that target the block are vendor-specific to that SoC. Framework support (PyTorch, TensorFlow, ONNX Runtime, llama.cpp) varies by SoC and by SoC generation. Two devices built around the same SoC silicon, running the same model, can exhibit substantially different observed performance because the vendor-supplied software stacks differ in maturity for that block, because the framework integration is at a different version on each device, or because the accelerator-block kernels have been optimized for one version of a framework but not another.

The implication is concrete: a benchmark report for an SoC that omits the software-stack version is reporting a number whose generalization to a different deployment cannot be assessed. The hardware identity is constant; the executor is not.

## What this means for evaluating an SoC for AI workloads

Evaluating an SoC for an AI workload is a stack-disclosure exercise more than a hardware exercise. The dimensions that have to be captured before any benchmark result is interpretable include:

- The exact SoC model and silicon revision.
- The vendor SDK version that targets the AI accelerator block.
- The driver and runtime versions (often vendor-specific to the SoC).
- The framework version and the framework’s SoC backend version (e.g. PyTorch plus a vendor-supplied execution provider).
- Whether the workload runs on the AI accelerator block, the integrated GPU block, or falls back to the CPU.
- The precision configuration the accelerator block supports for the workload.

A benchmark that captures these dimensions produces a result that another team can reproduce, or at least interpret. A benchmark that omits any of them is reporting a number whose validity is bounded to the original measurement environment. The point is not that SoC benchmarking is harder than discrete-accelerator benchmarking; it is that the methodological discipline that applies to both becomes operationally visible on SoCs, because the stack variance is large enough to dominate naïve comparisons.

## SoC vs discrete: a worked comparison frame

A team comparing an SoC-based deployment with a discrete-accelerator alternative is not comparing two pieces of silicon. It is comparing two AI Executors: (SoC, vendor SDK, framework backend, precision) versus (discrete accelerator, CUDA/ROCm/oneAPI stack, framework, precision). The two executors differ on the hardware axis, but they also differ on multiple software axes, on memory architecture (shared vs dedicated), and on the workload-handling pattern (continuous on-die data movement vs board-level transfer).

A benchmark that holds the workload constant and varies only one of these axes is informative about that axis. A benchmark that varies all of them and reports a single comparison number is informative about the combined trade-off and uninformative about which axis drove the result. Both kinds of benchmark have their uses; conflating them is the methodological error. The general principle that performance emerges from the hardware × software stack applies in concentrated form to SoCs: the on-die integration changes the physical bottlenecks but not the principle that the executor is the unit of measurement.

## The framing that helps

A system-on-a-chip is not a hardware-only object.
It is an integrated AI Executor whose software stack is per-SoC, often vendor-specific, and frequently the dominant source of measured performance variance across devices built around the same silicon. SoC evaluation that omits the software stack is reporting a hardware identity and calling it a benchmark.

LynxBench AI treats the SoC’s vendor SDK, runtime, framework backend, and precision configuration as part of the AI Executor specification, alongside the silicon, because the on-die integration removes some bottlenecks but not the methodological requirement that the full stack be disclosed for the benchmark result to transfer.
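To make the executor-as-unit-of-measurement idea concrete, the disclosure checklist above can be sketched as a small data structure. This is a minimal illustration, not a LynxBench AI API: every class, field name, and version string below is hypothetical, and the point is only that a benchmark number carries its full stack identity and that comparisons can be audited axis by axis.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class ExecutorSpec:
    """Full AI Executor identity: the silicon plus the software stack targeting it."""
    soc_model: str         # exact SoC model and silicon revision
    vendor_sdk: str        # vendor SDK version targeting the accelerator block
    driver: str            # driver version (often SoC-specific)
    runtime: str           # runtime version
    framework: str         # framework and version, e.g. "onnxruntime 1.18"
    backend: str           # framework's SoC backend / execution-provider version
    execution_target: str  # "npu", "gpu", or "cpu-fallback"
    precision: str         # precision actually run, e.g. "int8"


@dataclass(frozen=True)
class BenchmarkResult:
    """A throughput number is only interpretable alongside its executor spec."""
    executor: ExecutorSpec
    workload: str
    tokens_per_second: float

    def differing_axes(self, other: "BenchmarkResult") -> list[str]:
        """Executor axes on which two results differ. An empty list means a
        like-for-like comparison; several entries mean the comparison speaks
        to the combined trade-off, not to any single axis."""
        a, b = asdict(self.executor), asdict(other.executor)
        return [k for k in a if a[k] != b[k]]


# Two hypothetical devices built around the same silicon but shipped with
# different vendor stacks (all version strings are illustrative).
dev_a = BenchmarkResult(
    ExecutorSpec("ExampleSoC-X1 rev B0", "sdk 2.4", "drv 5.1", "rt 5.1",
                 "onnxruntime 1.18", "vendor-ep 2.4", "npu", "int8"),
    "7B-parameter LLM prefill", 412.0)
dev_b = BenchmarkResult(
    ExecutorSpec("ExampleSoC-X1 rev B0", "sdk 2.6", "drv 5.3", "rt 5.3",
                 "onnxruntime 1.18", "vendor-ep 2.6", "npu", "int8"),
    "7B-parameter LLM prefill", 377.0)

# Same silicon, same framework, same precision: the candidate explanations
# for the performance gap are exactly the stack-version axes.
print(dev_a.differing_axes(dev_b))  # ['vendor_sdk', 'driver', 'runtime', 'backend']
```

Under this framing, two results are hardware-comparable only when `differing_axes` is empty or contains solely `soc_model`; anything else is a whole-executor comparison, which is the distinction the section argues a single headline number erases.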