Coming Soon · Methodology Preview

AI performance is a property of the system, not the chip.

LynxBenchAI measures AI performance the way it actually has to be measured — as a property of the complete hardware-and-software stack, sustained under realistic load, reported per precision, with bounded optimisation. The methodology is published. Results are next.

Notify me at launch
Tell us about your workload
Why benchmarks mislead — illustrative

Why this exists

Today's AI hardware benchmarks mislead in three predictable ways.

Each one looks reasonable in isolation. Together they explain why the chart-topping number a buyer evaluates and the sustained throughput their workload actually sees can differ by an order of magnitude.

Spec sheets describe theoretical limits, not delivered performance.

Peak numbers are rare and brief; production workloads don't run on bursts.

Unbounded, undeclared optimisation makes results incomparable.

What LynxBenchAI measures

Four principles, applied uniformly to every entrant.

The methodology is engineered so that every published number means a specific, reproducible thing — and so that two numbers can be compared without reading footnotes.

AI Executor

The AI Executor is the unit of measurement

Methodology

Performance is a property of the hardware and the software together — driver, runtime, framework, kernels, all of it. The same chip under different stacks delivers materially different throughput. We measure the pair, not the silicon. Read the article →
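To make the unit of measurement concrete, here is a minimal sketch in Python of how an AI Executor could be pinned down. It is illustrative only; the field names and versions are assumptions, not LynxBenchAI's published schema. What matters is that the identity of the measured unit is the whole hardware-and-software tuple.

    # Illustrative sketch only: field names are assumptions, not the
    # published LynxBenchAI schema. The measured unit is the hardware plus
    # the full software stack, so every layer is pinned in the identity.
    from dataclasses import dataclass

    @dataclass(frozen=True)
    class AIExecutor:
        accelerator: str  # e.g. "NVIDIA H100 SXM 80GB" (hypothetical entrant)
        driver: str       # e.g. "550.54.14"
        runtime: str      # e.g. "CUDA 12.4"
        framework: str    # e.g. "PyTorch 2.3.0"
        kernels: str      # e.g. "TensorRT-LLM 0.9.0"

        def identity(self) -> str:
            """Two results are comparable only if this whole tuple matches."""
            return " / ".join((self.accelerator, self.driver, self.runtime,
                               self.framework, self.kernels))

Change any single field and, under this definition, you are measuring a different executor; the same silicon under a new runtime is a new entrant.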

Sustained throughput

Sustained, not peak

Methodology

We measure steady-state throughput under continuous, realistic load — the number the system can actually hold once thermal limits, memory bandwidth, and power budgets all assert themselves. Bursts are noted; deployments run on what's sustainable. Read the article →
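As a rough illustration of the measurement discipline, the sketch below is a hypothetical harness, not the LynxBenchAI one; run_batch, the warm-up length, and the window length are all assumptions. The point is that the reported rate excludes the initial burst and is averaged over a window long enough for thermal and power limits to assert themselves.

    # Illustrative measurement loop: discard the warm-up burst, then report
    # only the rate the system holds at steady state.
    import time

    def sustained_throughput(run_batch, batch_size: int,
                             warmup_s: float = 300.0,
                             measure_s: float = 1800.0) -> float:
        """Return steady-state items per second after a warm-up period."""
        deadline = time.monotonic() + warmup_s
        while time.monotonic() < deadline:  # let clocks and thermals settle
            run_batch()
        completed = 0
        start = time.monotonic()
        while time.monotonic() - start < measure_s:
            run_batch()
            completed += batch_size
        return completed / (time.monotonic() - start)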

Per-precision

Every result is precision-tagged

Methodology

FP8, FP16, BF16, and INT8 are different operating regimes — each with its own accuracy, throughput, and economic profile. We report each separately. There is no aggregate score that hides which regime won. Read the article →
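A minimal sketch of what "no aggregate score" could look like in code, with illustrative names only: results are stored and read out per precision regime, and nothing in the interface collapses them into one number.

    # Illustrative sketch: results are keyed by precision regime, and the
    # interface deliberately offers no way to collapse regimes into one
    # aggregate score. Names are assumptions, not the published schema.
    class PrecisionReport:
        REGIMES = ("FP8", "FP16", "BF16", "INT8")

        def __init__(self) -> None:
            self._rows: dict[str, dict] = {}

        def record(self, precision: str, throughput: float,
                   latency_p99_ms: float, accuracy: float) -> None:
            assert precision in self.REGIMES
            self._rows[precision] = {
                "throughput": throughput,          # items/s, sustained
                "latency_p99_ms": latency_p99_ms,  # tail latency under load
                "accuracy": accuracy,              # task metric at this precision
            }

        def row(self, precision: str) -> dict:
            """Read out one regime; no aggregate() method exists."""
            return self._rows[precision]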

Bounded optimisation

Optimisation is bounded and declared

Methodology

Every entrant is tuned within the same effort budget, with the tuning recorded in the manifest and reproducible by a third party. Unbounded tuning makes results incomparable; bounded tuning makes them useful. Read the article →
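One hedged picture of "bounded and declared", where the class name and the hour-based budget are assumptions rather than the published mechanism: every tuning action is logged against a fixed budget, and the log ships with the result for third-party replay.

    # Illustrative only: tuning actions are logged against a fixed budget,
    # and the log travels with the result so a third party can replay it.
    class TuningLog:
        def __init__(self, budget_hours: float) -> None:
            self.budget_hours = budget_hours
            self.entries: list[dict] = []

        def apply(self, action: str, hours: float) -> None:
            spent = sum(e["hours"] for e in self.entries) + hours
            if spent > self.budget_hours:
                raise RuntimeError("optimisation budget exhausted")
            self.entries.append({"action": action, "hours": hours})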

What launch will deliver

Benchmark results designed to survive procurement review.

Each result is shipped as a self-contained artefact — measurement, configuration, and constraints — so a procurement reviewer can verify it without calling the vendor. A sketch of the artefact's shape follows the list below.

Steady-state throughput and latency per AI Executor.

Per-precision tables for FP8, FP16, BF16, and INT8.

Reproducibility manifest: driver, runtime, framework, kernel, tuning.

Declared optimisation budget, applied uniformly to every entrant.

Decision-grade output: “should we deploy this?” — not a leaderboard rank.
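As promised above, here is a hypothetical shape such an artefact might take. The keys and the entrant are illustrative assumptions; the published manifest format may differ.

    # Hypothetical artefact shape: measurement, configuration, and
    # constraints travel together. Keys and versions are illustrative.
    manifest = {
        "executor": {
            "accelerator": "NVIDIA H100 SXM 80GB",  # hypothetical entrant
            "driver": "550.54.14",
            "runtime": "CUDA 12.4",
            "framework": "PyTorch 2.3.0",
            "kernels": "TensorRT-LLM 0.9.0",
        },
        "results": {  # one row per precision regime, never aggregated
            "FP8": {"throughput": None, "latency_p99_ms": None},
            "BF16": {"throughput": None, "latency_p99_ms": None},
        },
        "optimisation": {"budget_hours": 40, "log": []},  # declared, bounded
    }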

What LynxBenchAI will deliver — illustrative

Coverage at launch

The hardware, runtime, and precision regimes the first release is being engineered to cover. Inclusion is intent — not a certified result.

NVIDIA H100 / H200
NVIDIA Blackwell
AMD MI300
Intel Gaudi
PyTorch
TensorRT
CUDA
ONNX Runtime
Triton
FP8
FP16
BF16
INT8
Container deployment
Edge & data centre

Frequently Asked Questions

What is LynxBenchAI?


LynxBenchAI is a benchmarking methodology — and, at launch, a corresponding set of results — for AI hardware. It measures performance as a property of the complete hardware-and-software stack, sustained under realistic load, reported per precision, with bounded optimisation. The point is to inform real procurement and deployment decisions, not to crown a winner.

Why publish the methodology before any results?


Methodology has to be settled before any number lands — otherwise the number means whatever the reader assumes. Publishing the framing first means that, when results arrive, what they measure (and what they don't) is unambiguous. Why methodology defines what you can compare →

What is an "AI Executor"?


The AI Executor is the hardware-and-software pair that actually runs the workload — the GPU together with its driver, runtime, framework, and kernel implementations. Performance is a property of this pair, not of the silicon alone, and the same chip under a different stack can perform very differently. Read the article →

How does this differ from existing AI benchmarks?


Three differences at the method level. First, we publish sustained throughput, not peak. Second, every result is reported per precision, with no collapsed aggregate score. Third, every entrant is tuned under a declared, bounded optimisation budget, so two numbers compare on equal footing. The aim is procurement-grade comparability, not a leaderboard. Why benchmarks mislead procurement →

Who is LynxBenchAI for?


Buyers and platform engineers who need defensible evidence for AI hardware decisions, procurement reviewers who need a comparable basis between systems, and the technical leadership inside vendors who want their products evaluated on the load they were designed for, with the trade-offs visible. Read the decision framework →

When do results land?


A precise launch date has not been published. The methodology is complete; the next phase attaches empirical artefacts to it. Ask to be notified at launch →

Where to start

Three articles that lay out the premise, the central claim, and the procurement application of LynxBenchAI.

Why Spec-Sheet Benchmarking Fails for AI — How GPU Benchmarks Actually Work

Apr 14, 2026

GPU spec sheets describe theoretical limits. This article explains why real AI performance is an execution property shaped by workload, software, and sustained system behaviour.

Read more
Performance Emerges from the Hardware × Software Stack

Apr 15, 2026

AI performance is an emergent property of hardware, software, and workload operating together. This article explains why outcomes cannot be attributed to hardware alone and why the stack is the true unit of performance.

Read more
How to Choose AI Hardware and GPU for AI Workloads: A Decision Framework

Apr 16, 2026

Hardware selection is a multivariate decision under uncertainty — not a score comparison. This framework walks through the steps: defining the decision, matching evaluation to deployment, measuring what predicts production, preserving trade-offs, and building a repeatable process.

Read more