Eighteen months ago, the cheapest option won
A procurement evaluation compared three GPU configurations for an inference workload. Configuration A had the highest throughput — and the highest price. Configuration B had moderate throughput at a moderate price. Configuration C had the lowest per-unit cost and reasonable throughput. The committee chose C. It scored well on “performance per dollar.”
Eighteen months and several operational surprises later, the team calculated total cost of ownership. Configuration C's power draw was 40% higher than B's per unit of sustained throughput. Its thermal characteristics at the target rack density required additional cooling investment. Maintenance costs were higher. Effective throughput under production conditions, at sustained load with thermal settling, diverged from the evaluation benchmark by a wider margin than it did for either alternative.
Configuration B, the moderate option, would have delivered lower total cost over the deployment horizon. The procurement evaluation captured acquisition cost and peak throughput. It missed the rest.
Disclaimer: This article discusses frameworks for thinking about cost, efficiency, and value in AI infrastructure. It does not replace internal procurement policy, and nothing here constitutes legal, compliance, or financial advice. Infrastructure investment decisions should always follow your organization’s established financial evaluation and approval channels.
Three distinct dimensions, routinely conflated
When people say “cost-effective infrastructure,” they could mean three different things:
Cost: The direct financial expenditure — acquisition price, cloud instance price, power costs, cooling costs, floor space, maintenance contracts, staffing. Cost metrics answer: “how much money does this require?”
Efficiency: The ratio of useful output to resource consumed — throughput per GPU, tokens per watt, inferences per dollar-hour. Efficiency metrics answer: “how much work do we get per unit of resource?”
Value: The business outcome delivered per total investment — SLA achievement, time-to-model, competitive capability, risk reduction. Value metrics answer: “was the money well spent in terms of what the organization needed?”
These are not interchangeable. You can minimize cost (buy the cheapest hardware) and destroy efficiency (if it’s power-hungry and underperforms). You can maximize efficiency (buy the hardware with the best throughput-per-watt) and miss on value (if it can’t run the target workload at the required SLA). You can optimize for value (deploy infrastructure that perfectly serves the business need) and find it’s not the cheapest or the most efficient option.
Each dimension requires its own measurement, and each produces different rankings of the same hardware options.
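To make the divergence concrete, here is a minimal sketch in Python. Every figure is hypothetical, chosen only to echo the procurement story above, and "value" is deliberately reduced to a crude proxy (cheapest option that still meets the latency SLA) since the real thing resists a single number:

```python
# Hypothetical per-node figures; not measurements of real hardware.
from dataclasses import dataclass

@dataclass(frozen=True)
class Config:
    name: str
    acquisition_usd: float   # purchase price per node
    sustained_tok_s: float   # throughput, thermally settled
    node_watts: float        # sustained power draw
    meets_p99_sla: bool      # holds the latency target?

options = [
    Config("A", 260_000, 1_500, 4_900, True),   # fast, expensive
    Config("B", 180_000, 1_100, 3_800, True),   # moderate
    Config("C", 120_000,   900, 5_300, False),  # cheap, power-hungry
]

cheapest  = min(options, key=lambda c: c.acquisition_usd)
efficient = max(options, key=lambda c: c.sustained_tok_s / c.node_watts)
# Crude value proxy: cheapest option that still meets the SLA.
valuable  = min((c for c in options if c.meets_p99_sla),
                key=lambda c: c.acquisition_usd)

print(cheapest.name, efficient.name, valuable.name)  # C A B
```

Three dimensions, three different winners, one set of hardware.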
Performance per dollar is context-dependent
“Performance per dollar” is the most commonly cited efficiency metric in hardware evaluation, and it’s among the most misleading when applied naïvely.
The numerator — “performance” — depends entirely on what’s measured. Peak throughput, sustained throughput, throughput at target latency, throughput at target precision — each produces a different number for the same hardware. A GPU with excellent peak throughput per dollar may have mediocre sustained throughput per dollar if it throttles heavily under continuous load.
The denominator — “dollar” — varies based on what costs are included. Acquisition cost only? Acquisition plus three years of power? Acquisition plus power plus cooling plus maintenance? Each scope produces different cost-per-performance rankings.
The interaction between numerator and denominator means that “performance per dollar” is not a metric — it’s a family of metrics, and the one that matters depends on the deployment duration, the cost structure, and the performance dimension that the workload demands.
As explored in how hardware evaluation should match deployment reality, the evaluation framework must reflect the actual operating conditions and cost structure. A metric that leaves out power costs for a deployment that will run for three years in a power-constrained data center isn’t measuring the right thing.
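A small sketch makes the point. The GPUs, prices, and rates below are assumptions invented for illustration, as is folding cooling into the power bill via a PUE of 1.5:

```python
# Widening the cost scope in the denominator can flip the ranking.
HOURS_PER_YEAR = 24 * 365

def tok_per_dollar(tok_s, acq_usd, watts, years=0.0,
                   usd_per_kwh=0.15, pue=1.5):
    """Tokens/second per dollar; years=0 scopes cost to acquisition only."""
    energy_usd = watts / 1000 * pue * HOURS_PER_YEAR * years * usd_per_kwh
    return tok_s / (acq_usd + energy_usd)

gpus = {
    "X": (1_500, 25_000, 1_000),  # faster, power-hungry
    "Y": (1_200, 21_000, 450),    # slower, frugal
}

for years in (0, 5):
    best = max(gpus, key=lambda g: tok_per_dollar(*gpus[g], years=years))
    label = "acquisition only" if years == 0 else f"{years}y incl. power + cooling"
    print(f"{label}: {best} wins")
# acquisition only: X wins
# 5y incl. power + cooling: Y wins
```

Same hardware, same benchmark throughput; the winner changes because the denominator changed.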
Power and operational costs matter over time
For short-term deployments or cloud-based burst capacity, acquisition cost dominates. For owned infrastructure running for 3-5 years, operational costs — primarily power and cooling — often exceed acquisition cost.
A GPU drawing 700W versus one drawing 400W is a 300W difference per device, or 2,400 watts across an 8-GPU node. Over three years of continuous operation at $0.10/kWh, that works out to roughly $6,300 in power cost difference per node. Across a 100-node cluster, the differential exceeds $600,000 before cooling overhead is counted, a sum large enough to reorder hardware rankings built on acquisition price alone.
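A minimal sketch of that arithmetic, using only the illustrative figures above (no vendor data), so the result can be checked line by line:

```python
# Power-cost difference between two GPU options, assuming continuous
# operation at the stated draw. All inputs are the illustrative
# figures from the text.
HOURS_PER_YEAR = 24 * 365  # 8,760

def power_cost_delta_per_node(watts_a, watts_b, gpus_per_node=8,
                              years=3.0, usd_per_kwh=0.10):
    delta_kw = abs(watts_a - watts_b) * gpus_per_node / 1000
    return delta_kw * HOURS_PER_YEAR * years * usd_per_kwh

per_node = power_cost_delta_per_node(700, 400)
print(f"per node over 3 years: ${per_node:,.0f}")           # $6,307
print(f"across a 100-node cluster: ${per_node * 100:,.0f}") # $630,720
```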
This arithmetic is straightforward, but it’s routinely excluded from benchmark-based evaluations because benchmarks measure throughput, not power efficiency. The result is hardware rankings that reflect one dimension of cost (compute throughput per acquisition dollar) while ignoring another dimension (operational cost per unit of sustained output) that may be larger over the deployment horizon.
Value emerges from sustained, usable performance
Performance that the organization can actually use is more valuable than performance that exists on paper.
A GPU that benchmarks at 1,500 tokens/second but requires software optimizations the team can’t deploy (because of framework compatibility, deployment constraints, or expertise gaps) delivers none of those 1,500 tokens/second in practice. A GPU that benchmarks at 1,000 tokens/second and works with the team’s existing stack delivers 1,000 tokens/second of actual value.
Similarly, a system that achieves high throughput but can’t meet P99 latency requirements fails the value test, regardless of its efficiency metrics. A system that meets the SLA with moderate throughput and moderate efficiency delivers genuine value because it solves the business problem.
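As a sketch of that value test, with synthetic latency samples and an invented 250 ms target: a system's throughput is credited only if its P99 holds.

```python
# Throughput counts only when the P99 latency target holds; the
# distributions and the 250 ms target are invented for illustration.
import random

def p99_ms(samples):
    ordered = sorted(samples)
    return ordered[min(len(ordered) - 1, int(0.99 * len(ordered)))]

def delivered_value(latencies_ms, throughput_tok_s, p99_target_ms=250):
    return throughput_tok_s if p99_ms(latencies_ms) <= p99_target_ms else 0.0

random.seed(0)
# Fast on average but with a 2% slow tail vs. slower but consistent.
tail_heavy = [random.gauss(120, 30) + (400 if random.random() < 0.02 else 0)
              for _ in range(10_000)]
consistent = [random.gauss(160, 25) for _ in range(10_000)]

print(delivered_value(tail_heavy, 1_500))  # 0.0  (P99 blown by the tail)
print(delivered_value(consistent, 1_000))  # 1000 (SLA met)
```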
Value is harder to quantify than cost or efficiency because it depends on the organization’s specific requirements, constraints, and capabilities. It’s the dimension most likely to be omitted from benchmark-based evaluations because it doesn’t reduce to a single number. But it’s also the dimension that determines whether the infrastructure investment actually serves its purpose.
Aligning the metrics with the decision
The practical remedy is not to pick one dimension and optimize it in isolation. It’s to declare, before evaluation begins, which dimensions matter for this specific decision and how they’re weighted:
| Decision type | Primary metric | Secondary metric | What to watch for |
|---|---|---|---|
| Cost-constrained, flexible SLAs | Acquisition + operational cost per unit of sustained throughput | Efficiency floor (minimum acceptable throughput/watt) | Hidden operational costs — power, cooling, maintenance — that shift the ranking over the deployment horizon |
| Latency-critical production | P99 latency at target request rate, thermally settled | Cost ceiling (maximum acceptable $/request) | Throughput metrics that look good in benchmarks but mask tail-latency failures under production traffic patterns |
| Long-lived infrastructure investment | Total cost of ownership over deployment horizon (acquisition + power + cooling + maintenance + staffing) | Workload evolution headroom | Optimizing for today’s workload at the expense of flexibility for projected workload changes over 3-5 years |
Each framing produces a different evaluation methodology, a different set of metrics, and potentially a different hardware recommendation. The methodology makes the weighting explicit rather than leaving it implicit in the choice of benchmark.
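One lightweight way to enforce that discipline is to write the contract down as data before any benchmark runs. A sketch, with invented field names and thresholds standing in for whatever your procurement process actually requires:

```python
# Declaring the evaluation contract up front; every field value here
# is an illustrative placeholder, not a recommended threshold.
from dataclasses import dataclass, field

@dataclass(frozen=True)
class EvaluationSpec:
    decision_type: str
    primary_metric: str                 # what the ranking is built on
    secondary_metric: str               # guardrail / tie-breaker
    hard_constraints: dict = field(default_factory=dict)
    cost_scope: tuple = ("acquisition",)

latency_critical = EvaluationSpec(
    decision_type="latency-critical production",
    primary_metric="p99_latency_ms at target request rate, thermally settled",
    secondary_metric="usd_per_million_requests",
    hard_constraints={"p99_latency_ms": 250, "usd_per_request": 0.002},
    cost_scope=("acquisition", "power", "cooling"),
)
```

Committing a spec like this to the decision record before testing is what makes the eventual ranking auditable: anyone can verify that the winner won on the declared metrics, not on a metric chosen after the results were in.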
This connects to the broader practice of using benchmarks as traceable evidence in institutional decisions. As explored in how benchmarks function in governance and risk management, the metrics included in an evaluation aren’t neutral — they encode assumptions about what matters. Making those assumptions visible is the difference between a defensible decision and one that merely looked good at the time.