How to Evaluate Whether a Generative AI Use Case Is Technically Feasible

A VP approves a generative AI budget. Six months later, the team has burned most of it on a use case that was never plausible with current models — not because the engineering was bad, but because nobody asked whether the task was feasible before they started building. The scope quietly assumed super-human capability, and the model never had a chance of delivering it.

This is among the most expensive mistakes in generative AI right now, and it is preventable. The fix is not better engineering. It is a decision made before engineering starts: for each candidate use case, is this task something current models can actually do, or is the scope asking for capability that does not exist yet? Answer that honestly, per use case, and most of the wasted spend disappears.

The decision framework here operates per use case, not per organisation. That distinction matters, and we return to it. Whether your organisation is ready to run an AI project at all is a separate, earlier question — covered in how to assess enterprise AI readiness before starting a project. Readiness gates whether you start. Feasibility gates which use cases inside that project are worth pursuing.

What Does Technical Feasibility Actually Mean for a Generative AI Use Case?

Feasibility is not “can a large language model produce output that looks plausible.” A model will produce plausible-looking output for almost any task. Feasibility is whether the model can produce output that is correct and reliable enough for the decision that depends on it, given the data you have and the error tolerance the use case allows.

Three things determine that:

Capability headroom. Is the task within the demonstrated competence of current models, or does it require reasoning, factual precision, or domain judgment that no model reliably delivers today? A model that drafts marketing copy is operating well inside its competence. A model asked to give legally binding tax advice with zero error is not.
Data readiness. Does the data needed to ground, retrieve against, or fine-tune for the task exist, and is it clean, accessible, and representative? Most stalled projects in our experience stall on data, not on models — a pattern we see repeatedly across engagements rather than a benchmarked rate.
Error tolerance. What happens when the model is wrong? A wrong product recommendation costs a click. A wrong dosage instruction costs a life. The same underlying capability can be feasible in one context and off-limits in another purely because of what failure does.

A use case clears the feasibility bar only when all three line up: the task sits inside current capability, the data exists, and the consequences of being wrong are survivable.
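
As a rough illustration, the three-way check can be written down as a small record and a predicate. This is a minimal sketch, not a prescribed tool: the `UseCase` type, its field names, and the example values are all hypothetical, and each boolean stands in for a judgment a person still has to make.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UseCase:
    """Hypothetical record of the three feasibility judgments for one use case."""
    name: str
    within_capability: bool   # capability headroom: inside demonstrated model competence
    data_ready: bool          # data readiness: exists, accessible, clean, representative
    errors_survivable: bool   # error tolerance: consequences of a wrong output are acceptable


def clears_feasibility_bar(uc: UseCase) -> bool:
    """A use case clears the bar only when all three dimensions line up."""
    return uc.within_capability and uc.data_ready and uc.errors_survivable


def blocking_dimensions(uc: UseCase) -> list[str]:
    """Name the dimensions that currently fail, so the gap is explicit."""
    checks = {
        "capability headroom": uc.within_capability,
        "data readiness": uc.data_ready,
        "error tolerance": uc.errors_survivable,
    }
    return [dimension for dimension, ok in checks.items() if not ok]


# Illustrative values only; the data_ready flag for tax advice is assumed.
marketing_copy = UseCase("marketing copy drafts", True, True, True)
tax_advice = UseCase("legally binding tax advice", False, True, False)

print(clears_feasibility_bar(marketing_copy))  # True
print(blocking_dimensions(tax_advice))  # ['capability headroom', 'error tolerance']
```

Returning the failing dimensions, rather than a bare yes or no, keeps the reason for a rejection visible in whatever document records the decision.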

The Three-Bucket Classification

The core of the assessment is a classification. Each candidate use case lands in exactly one of three buckets, and the bucket dictates what you do next.

Bucket	What it means	Decision	What you fund
Automatable	Task is inside current model capability, data exists, error tolerance is met	Proceed	A scoped build with an ROI estimate for the automatable portion
Speculative	Task requires capability beyond what current AI reliably delivers	Do not proceed without a research phase	Either nothing, or a deliberate research investment with explicit exit criteria
Research	Task might be feasible but the evidence does not yet exist	Proceed with bounded scope	A time-boxed investigation that answers the feasibility question cheaply before any full commitment
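
The table can be read as a small decision function. The sketch below is one possible reading, not the definitive rule: the `Capability` and `Bucket` enums and the `classify` function are hypothetical names, and the table does not spell out every combination of inputs. Two assumptions are borrowed from later sections of this page: a missing data foundation sends a use case to the research bucket, and unmet error tolerance keeps it speculative until guardrails are proven.

```python
from enum import Enum


class Capability(Enum):
    DEMONSTRATED = "demonstrated"  # inside what current models reliably do
    UNPROVEN = "unproven"          # might be feasible, evidence does not exist yet
    BEYOND = "beyond"              # needs capability current AI does not reliably deliver


class Bucket(Enum):
    AUTOMATABLE = "automatable"
    SPECULATIVE = "speculative"
    RESEARCH = "research"


# Decision column of the table, keyed by bucket.
DECISION = {
    Bucket.AUTOMATABLE: "Proceed",
    Bucket.SPECULATIVE: "Do not proceed without a research phase",
    Bucket.RESEARCH: "Proceed with bounded scope",
}


def classify(capability: Capability, data_ready: bool, error_tolerance_met: bool) -> Bucket:
    """Place one use case in exactly one bucket (assumed ordering of checks)."""
    if capability is Capability.BEYOND:
        return Bucket.SPECULATIVE
    if not error_tolerance_met:
        # Assumption: unmet error tolerance is treated as speculative.
        return Bucket.SPECULATIVE
    if capability is Capability.UNPROVEN or not data_ready:
        # Assumption: missing evidence or missing data means research first.
        return Bucket.RESEARCH
    return Bucket.AUTOMATABLE


bucket = classify(Capability.DEMONSTRATED, data_ready=False, error_tolerance_met=True)
print(bucket.value, "->", DECISION[bucket])  # research -> Proceed with bounded scope
```

The order of the checks encodes a priority: capability and consequence are tested before data, so a use case that is both data-poor and beyond current capability is reported as speculative rather than research.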

The mistake that destroys budgets is treating speculative use cases as automatable — funding a full build for something that needed a research phase first. The buyer who approved that scope is accountable for the spend, which is exactly why the classification needs to happen before the cheque is signed, not after the prototype disappoints.

Be honest about the second bucket especially. “Speculative” is not an insult; some of the highest-value work lives there. It is a statement that you are funding research, with the uncertainty and the exit criteria that research demands, not research disguised as a product build. This is the same boundary that separates research questions from engineering work in why generative AI projects fail, where mis-bucketing shows up as a recurring failure pattern.

How Do You Assess Data Readiness Before Committing to a Build?

Data readiness is where feasibility assessments earn their keep, because it is the dimension teams most often skip. A use case can be perfectly inside model capability and still be infeasible because the data to ground it does not exist in usable form.

Walk through these before you commit:

Existence. Does the data the task depends on actually exist somewhere you can reach? Tribal knowledge in people’s heads is not data.
Access. Can you legally and technically get to it? Contracts, PII rules, and system silos kill more projects than model limits do.
Quality. Is it clean, labelled where labels are needed, and free of the kind of inconsistency that a retrieval-augmented generation pipeline will faithfully reproduce as wrong answers?
Representativeness. Does it cover the cases the model will actually face in production, or only the easy ones someone happened to record?

When data is the blocker, the use case is usually a research bucket case, not an automatable one — the build cannot start until a bounded data-readiness phase establishes whether the foundation can be assembled at acceptable cost. Pretending otherwise is how a six-week prototype becomes a six-month data-cleanup project nobody scoped.
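
The four checks above can be recorded as a simple checklist that names the gaps before anyone commits to a build. This is a minimal sketch under stated assumptions: the `DataReadiness` type, its field names, and the wording of the returned messages are hypothetical, and each flag summarises an investigation rather than replacing it.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class DataReadiness:
    """Hypothetical checklist mirroring the four questions above."""
    exists: bool          # the data exists somewhere reachable, not only in people's heads
    accessible: bool      # legally and technically obtainable
    clean: bool           # consistent, and labelled where labels are needed
    representative: bool  # covers the cases production will actually face


def data_gaps(readiness: DataReadiness) -> list[str]:
    """Return the names of the checks that fail, in checklist order."""
    return [f.name for f in fields(readiness) if not getattr(readiness, f.name)]


def next_step(readiness: DataReadiness) -> str:
    """Suggest what to fund next, based only on the data checks."""
    gaps = data_gaps(readiness)
    if not gaps:
        return "data is not the blocker; continue the feasibility assessment"
    return "run a bounded data-readiness phase first; gaps: " + ", ".join(gaps)


# Illustrative values only.
support_archive = DataReadiness(exists=True, accessible=False, clean=True, representative=False)
print(next_step(support_archive))
# run a bounded data-readiness phase first; gaps: accessible, representative
```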

What Measurable Outcomes Should You Define Before Development Starts?

A feasibility assessment is only defensible if it names what success looks like in numbers, decided up front. Define these before development, not after the demo:

The target metric the use case must move (resolution rate, draft acceptance rate, time-to-first-response) and its current baseline.
The threshold at which the use case is worth deploying — the point below which it does not justify its own running cost.
The error budget: how often the model can be wrong, and what the fallback is when it is.
The go/no-go criteria that the research or pilot phase must satisfy before any further commitment.

Naming these before you start is what makes the assessment a defensible artifact later. If the project proceeds, the metrics justify the spend. If it does not, the same document shows you killed it on evidence rather than on a hunch. This discipline is the bridge between a feasibility decision and the work of moving a generative AI prototype into production, where these same thresholds become the production acceptance gates.
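
One way to make those numbers binding is to write them down as data before the pilot and evaluate the pilot against them mechanically. The sketch below assumes a metric where higher is better and an error budget expressed as a rate; the `SuccessCriteria` type, the `go_no_go` function, and every number in the example are hypothetical placeholders, not recommended values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SuccessCriteria:
    """Hypothetical record of the numbers agreed before development starts."""
    metric: str              # e.g. "draft acceptance rate"
    baseline: float          # current value of the metric, before the model
    deploy_threshold: float  # below this, the use case does not justify its running cost
    max_error_rate: float    # error budget: how often the model may be wrong
    fallback: str            # what happens when the model is wrong


def go_no_go(criteria: SuccessCriteria, observed_metric: float, observed_error_rate: float) -> str:
    """Evaluate pilot results against criteria fixed up front (higher metric is better)."""
    if observed_error_rate > criteria.max_error_rate:
        return "no-go: error budget exceeded"
    if observed_metric < criteria.deploy_threshold:
        return "no-go: below deployment threshold"
    return "go"


# Placeholder numbers for illustration only.
criteria = SuccessCriteria(
    metric="draft acceptance rate",
    baseline=0.40,
    deploy_threshold=0.60,
    max_error_rate=0.05,
    fallback="route to a human reviewer",
)

print(go_no_go(criteria, observed_metric=0.65, observed_error_rate=0.03))  # go
print(go_no_go(criteria, observed_metric=0.65, observed_error_rate=0.08))  # no-go: error budget exceeded
print(go_no_go(criteria, observed_metric=0.55, observed_error_rate=0.03))  # no-go: below deployment threshold
```

Because the criteria object is frozen, the thresholds cannot be quietly adjusted after the demo, which is the property that makes the later decision defensible.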

How Does Hallucination Change the Classification of a Customer-Facing Use Case?

Hallucination is not a bug you engineer away; it is a property of how generative models work, and it directly moves use cases between buckets. A model that confidently invents a fact is doing what its training optimised it to do (produce fluent, plausible text), with no reliable internal signal that the fact is false.

For internal drafting, hallucination is tolerable because a human reviews every output before it matters. For a customer-facing assistant that quotes policy, prices, or eligibility with no human in the loop, the same hallucination rate that was a nuisance becomes a liability. The capability is identical; what changed is the error tolerance.

So the practical rule: a customer-facing use case is automatable only when you can keep a human in the loop, constrain the model to retrieve from a verified source and refuse otherwise, or absorb the cost of the wrong answers it will produce. If none of those hold, the use case is speculative until grounding and guardrails are proven, which is a research-phase question, not a build-phase one. The trade-offs of customer service automation, including where the model genuinely earns its keep, are explored in our work on the power of generative AI in customer service. For deployments that need formal sign-off, the grounding and refusal behaviour also has to survive a generative-AI model-risk review.
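
The rule and the "retrieve from a verified source and refuse otherwise" constraint can both be sketched in a few lines. Everything here is an assumption made for illustration: the function names, the `VERIFIED_ANSWERS` store, its single entry, and the refusal wording are hypothetical, and a real deployment would sit behind a retrieval system rather than a dictionary lookup.

```python
# Hypothetical verified source; in practice this would be a reviewed policy store.
VERIFIED_ANSWERS = {
    "refund window": "Refunds are accepted within 30 days of purchase.",
}

REFUSAL = "I can't confirm that. Let me connect you with a colleague."


def grounded_answer(topic: str) -> str:
    """Answer only from the verified source; refuse when the topic is not covered."""
    return VERIFIED_ANSWERS.get(topic.strip().lower(), REFUSAL)


def customer_facing_bucket(
    human_in_loop: bool,
    verified_retrieval_with_refusal: bool,
    wrong_answers_absorbable: bool,
) -> str:
    """Apply the rule: at least one mitigation must hold, otherwise speculative."""
    if human_in_loop or verified_retrieval_with_refusal or wrong_answers_absorbable:
        return "automatable"
    return "speculative"


print(grounded_answer("Refund window"))    # Refunds are accepted within 30 days of purchase.
print(grounded_answer("loyalty discount") == REFUSAL)  # True
print(customer_facing_bucket(False, False, False))     # speculative
print(customer_facing_bucket(False, True, False))      # automatable
```

Passing this gate is a necessary condition under the rule above, not proof of feasibility on its own; capability and data readiness still have to hold.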

Per-Use-Case Feasibility Versus Organisational Readiness

These two assessments are easy to confuse and must not be merged. They answer different questions and they run in sequence.

Organisational readiness asks: is this company set up to run an AI project at all — data infrastructure, governance, skills, sponsorship? That is the gate on whether you should start, and it is the subject of how to assess enterprise AI readiness before starting a project.

Per-use-case feasibility — the framework on this page — asks: given that you are going to run a project, which specific use cases inside it can current models actually deliver? It is the filter you apply after readiness is established, candidate by candidate.

Run them out of order and you get one of two failures: a feasible use case stalled because the organisation could never support it, or a ready organisation pouring its budget into a use case that was speculative all along. Readiness first, then feasibility per use case. The point of both assessments is choosing the right targets, and knowing which model classes even apply feeds directly into the capability-headroom judgment.

You can see how the broader generative AI practice connects these pieces — capability, data, and consequence — into a single decision rather than three disconnected opinions.

FAQ

How do I judge whether a specific generative AI use case is technically feasible with current models?

Check three dimensions together: capability headroom (is the task inside what current models reliably do?), data readiness (does usable, accessible, representative data exist?), and error tolerance (what happens when the model is wrong?). A use case is feasible only when all three line up. Plausible-looking output is not the bar; correct and reliable enough for the dependent decision is.

What does a structured GenAI feasibility assessment look like, and what does it answer?

It classifies each candidate use case as automatable, speculative, or research; estimates ROI for the automatable portion; assesses data readiness; and names explicit go/no-go criteria before development starts. Its value is being a defensible artifact: if the project proceeds, it justifies the spend, and if it does not, it shows the decision was made on evidence.

Which use cases should we classify as automatable, speculative, or research — and why?

Automatable use cases sit inside current model capability with data present and survivable error tolerance — proceed with a scoped build. Speculative use cases require capability beyond what current AI reliably delivers — do not proceed without a deliberate research phase. Research use cases might be feasible but lack evidence — proceed only with a time-boxed investigation that answers the question cheaply before full commitment.

How do I assess data readiness before committing to a GenAI build?

Work through existence, access, quality, and representativeness: does the needed data exist somewhere reachable, can you legally and technically get to it, is it clean and labelled where required, and does it cover the cases production will actually face? When data is the blocker, the use case is usually a research-bucket case requiring a bounded data-readiness phase before any build starts.

What measurable outcomes should we define before development starts so the spend is defensible later?

Define the target metric and its baseline, the deployment threshold below which the use case is not worth running, the error budget with its fallback, and the go/no-go criteria for any research or pilot phase. Naming these up front is what makes the assessment defensible — proceeding is justified by the metrics, and stopping is justified by the same document.

How does per-use-case feasibility relate to (and depend on) organisational AI readiness?

They are sequenced. Organisational readiness asks whether the company can run an AI project at all and gates whether you start; per-use-case feasibility asks which specific use cases current models can deliver and runs once readiness is established. Out of order, you either stall a feasible use case the organisation can’t support, or fund a speculative one inside a ready organisation.

When a use case falls into the speculative or research bucket, what does a bounded research phase look like before we commit to a full build?

A bounded research phase is time-boxed, has explicit exit criteria, and is funded as research rather than as a product build. It exists to answer the feasibility question cheaply — does the capability or data foundation actually hold? — before any full commitment, so the uncertainty is resolved on a small budget instead of a large one.

How does generative AI hallucination affect whether a customer-facing use case should be classified as automatable, speculative, or off-limits?

Hallucination is a property of how generative models work, not a removable bug, so it directly moves use cases between buckets via error tolerance. A customer-facing use case is automatable only when you keep a human in the loop, constrain the model to verified retrieval with a refusal fallback, or can absorb the cost of wrong answers; otherwise it is speculative until grounding and guardrails are proven.

The decision that protects a generative AI budget is not made in the build phase. It is made once, per use case, before anyone writes code — and the buyer who skips it inherits the wasted spend. Name the bucket, name the data gaps, name the numbers that define success, and the projects that should never have started never do.