How to Tell Whether an AI Problem Is an Engineering Task or a Research Question

A project plan with milestones, a deadline, and a fixed deliverable is the right shape for an engineering task. It is the wrong shape for a research question — and the mismatch is one of the most expensive scoping errors in applied AI. When a team commits a production budget and a quarterly timeline to a problem that actually needs investigation before anyone can promise an outcome, the project does not fail loudly. It drifts. The milestones slip, the deliverable keeps mutating, and six months later there is spend but no shippable result and no clear reason why.

The widely quoted figures on AI project failure, such as the MIT and Gartner estimates suggesting that most AI initiatives do not reach production (published surveys and analyst estimates; methodology and definitions vary by report), usually get attributed to data quality, talent, or executive buy-in. Those matter. But a large share of the wreckage is something simpler and more fixable: a research question that was scoped, staffed, and budgeted as if it were an engineering task. Nobody asked the boundary question first.

What Makes a Problem an Engineering Task vs a Research Question

The distinction is not about difficulty. Engineering tasks can be brutally hard, and research questions can have tidy answers. The distinction is about whether the path to the answer is known before you start.

An engineering task has a known method. You may not have built this exact system, but the approach is established, the failure modes are understood, and the uncertainty is bounded — you know roughly how long it takes and what “done” looks like. A research question has open novelty: there is no reliable baseline, the data quality is unbounded or unknown, and you cannot honestly commit to an outcome because the outcome is the thing under investigation.

Here is the cleanest test we apply when scoping. Ask: if this works, will we have been surprised that it worked? If the honest answer is no — it was always going to work, the only question was effort and integration — it’s engineering. If the honest answer is “we genuinely didn’t know,” it’s research, and it needs a different contract.

Which Signals Classify a Problem as Engineering?

A problem sits on the engineering side of the line when most of the following hold:

Signal	Engineering indicator
Method	A known, published, or previously deployed approach exists for this class of problem
Baseline	A reliable performance baseline is achievable and you can state the target
Data	Data quality and availability are predictable; the schema is understood
Uncertainty	Bounded — the unknowns are about integration, scale, and effort, not feasibility
Evaluation	You can define “done” before starting and measure progress against it
Comparable systems	Similar systems exist in production somewhere

Object detection on a well-lit assembly line with a labelled dataset is engineering. The architecture choices are made, the metrics are standard, and the work is data collection, training, integration, and tuning. Hard, but bounded.

Which Signals Classify a Problem as Research?

A problem sits on the research side when these appear:

Signal	Research indicator
Method	No established approach; the method itself is the open question
Baseline	No reliable baseline; you cannot yet say what good performance even means
Data	Data quality is unbounded or unknown; the signal may not be present at all
Uncertainty	Open — you do not know whether the problem is solvable at the required level
Evaluation	“Done” cannot be defined in advance because the target depends on findings
Comparable systems	No comparable production systems exist

Detecting a defect class that has no visible signature, from sensor data nobody has confirmed contains the signal, with an accuracy requirement set by a regulator: that is research wearing an engineering job title. The honest first deliverable is not a model. It is an answer to the question: can this be done at all?

Why Scope, Schedule, and Budget Are Framed Differently for Research

The reason this matters operationally is that the two demand different contracts.

An engineering task is framed with milestones and a deliverable. You agree on what “done” means up front, set a schedule, and track against it. If it slips, you know why, because the plan named the steps.

A research question must be framed as a bounded investigation with an explicit termination criterion. The deliverable is not a working system — it is a decision: is this feasible, and if so, what does the engineering project that follows look like? The budget buys an answer, not an outcome. The timeline is a spending cap, not a delivery date. And critically, the contract names the conditions under which you stop and conclude “not feasible at the required level” — because a research question without a kill condition is an open-ended drain.
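
The contract terms described above can be written down as a small data structure. This is a minimal sketch under stated assumptions, not a prescribed template: the class name `Investigation`, its fields, the status strings, and the figures in the usage lines are all hypothetical and chosen only to illustrate a spending cap combined with named kill conditions.

```python
from dataclasses import dataclass, field


@dataclass
class Investigation:
    """A bounded research investigation (hypothetical structure)."""
    question: str
    budget_cap: float            # a spending cap, not a delivery estimate
    kill_conditions: list[str]   # findings that end the work as "not feasible"
    spent: float = 0.0
    triggered: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        if self.budget_cap <= 0:
            raise ValueError("budget_cap must be positive")
        if not self.kill_conditions:
            # No kill condition means nothing bounds the loss.
            raise ValueError("name at least one kill condition before starting")

    def record_spend(self, amount: float) -> None:
        self.spent += amount

    def trigger(self, condition: str) -> None:
        # Only conditions agreed in advance may end the investigation.
        if condition not in self.kill_conditions:
            raise ValueError(f"unknown kill condition: {condition!r}")
        self.triggered.append(condition)

    def status(self) -> str:
        if self.triggered:
            return "stop: not feasible at the required level"
        if self.spent >= self.budget_cap:
            return "stop: budget cap reached, report findings"
        return "continue"


# Illustrative usage; the question, cap, and condition are made up.
inv = Investigation(
    question="Does the sensor data carry the defect signal?",
    budget_cap=40_000,
    kill_conditions=["no separable signal in labelled sample"],
)
inv.record_spend(15_000)
print(inv.status())  # continue
inv.trigger("no separable signal in labelled sample")
print(inv.status())  # stop: not feasible at the required level
```

The design choice worth noting is that the constructor refuses to build an investigation with no kill condition, which mirrors the point that the stopping rule is agreed before any money is spent.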

This is why scoping the boundary early protects the buyer. When the investigation phase is separated from the build phase, an unsolvable problem costs a bounded research budget, not an open-ended production budget. We see this pattern across the engagements we scope: the projects that burn budget without producing a result almost always skipped the question of whether the hard part was knowable in advance. The same pattern sits at the root of most enterprise AI project failure: the cause is rarely the model and frequently the framing.

A Worked Diagnostic: Scoring a Problem Before You Commit

Score the problem against these six axes. Each axis scores 0 (clearly research), 1 (ambiguous), or 2 (clearly engineering). Assumptions: you answer honestly, and “we’ll figure it out” counts as 0, not 1.

Known method — Does a published or deployed approach exist for this exact class? (0/1/2)
Reliable baseline — Can you state a measurable target today? (0/1/2)
Predictable data — Do you know the data exists, is labelled or labellable, and carries the signal? (0/1/2)
Bounded uncertainty — Are the unknowns about effort, not feasibility? (0/1/2)
Definable done — Can you write the acceptance criteria before starting? (0/1/2)
Comparable production systems — Does a similar system run somewhere today? (0/1/2)

Reading the score:

10–12 — Engineering. Scope it with milestones and a deliverable.
5–9 — Mixed. Decompose it: the A5 Risk Assessment exists partly to split a project into its engineering parts (deliverable) and its research parts (investigate first). Carve out the research, time-box it, then engineer the rest.
0–4 — Research. Do not commit a production budget. Fund a bounded investigation with a termination criterion first.
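
The six-axis score and its three bands can be sketched in a few lines of code. The axis keys, the function names `classify` and `axes_to_investigate`, and the example scores are assumptions made for illustration; only the 0/1/2 scale and the band thresholds (10 to 12, 5 to 9, 0 to 4) come from the diagnostic above.

```python
AXES = (
    "known_method",
    "reliable_baseline",
    "predictable_data",
    "bounded_uncertainty",
    "definable_done",
    "comparable_systems",
)


def classify(scores: dict[str, int]) -> tuple[int, str]:
    """Sum six 0/1/2 axis scores and map the total to a band."""
    if set(scores) != set(AXES):
        raise ValueError(f"expected exactly these axes: {AXES}")
    for axis, value in scores.items():
        if value not in (0, 1, 2):
            raise ValueError(f"{axis} must score 0, 1, or 2, got {value!r}")
    total = sum(scores.values())
    if total >= 10:
        band = "engineering"
    elif total >= 5:
        band = "mixed"
    else:
        band = "research"
    return total, band


def axes_to_investigate(scores: dict[str, int]) -> list[str]:
    """Axes scoring below 2: candidates to carve out and time-box first."""
    return [axis for axis in AXES if scores[axis] < 2]


# Example scores are illustrative, loosely modelled on the two cases above.
assembly_line = dict.fromkeys(AXES, 2)
hidden_defect = {
    "known_method": 1,
    "reliable_baseline": 0,
    "predictable_data": 0,
    "bounded_uncertainty": 0,
    "definable_done": 1,
    "comparable_systems": 0,
}
print(classify(assembly_line))  # (12, 'engineering')
print(classify(hidden_defect))  # (2, 'research')
```

For a project in the mixed band, `axes_to_investigate` gives a starting list of what to split out as research before the rest is scheduled as engineering.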

The middle band is where most real projects land, and it is the most dangerous because it looks like engineering. A proof of concept is the right instrument here — but only if the proof of concept actually proves the risky thing rather than rehearsing the parts you already knew would work.

How This Relates to Per-Use-Case GenAI Feasibility

The engineering-vs-research boundary is a general form of feasibility assessment, and generative AI makes it acute. A GenAI use case can look like a prompt-and-integrate engineering task and turn out to hinge on whether the model can reliably do something nobody has confirmed it can do at the required accuracy. The novelty hides inside what feels like configuration.

That is why the per-use-case lens matters. The question “can this specific use case work with current models on our data” is a feasibility question, and answering whether a generative AI use case is technically feasible is itself a small bounded investigation: research framing, not build framing. Run it before the production plan, not inside it. Readiness work belongs here too: assessing enterprise AI readiness before a project starts surfaces the data and baseline gaps that decide which side of the line you are on.
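
One way to run such a per-use-case check is to score the model on a labelled sample and compare the result against the required accuracy, allowing for sample size. The sketch below is an assumption about how that could be done, not a method the article prescribes: it uses a Wilson score interval (a standard technique swapped in here for illustration), and the function names, verdict strings, and 95% level are hypothetical choices.

```python
import math


def wilson_interval(correct: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for an observed accuracy (z=1.96 is about 95%)."""
    if total <= 0 or not 0 <= correct <= total:
        raise ValueError("need total > 0 and 0 <= correct <= total")
    p = correct / total
    z2 = z * z
    centre = p + z2 / (2 * total)
    margin = z * math.sqrt(p * (1 - p) / total + z2 / (4 * total * total))
    denom = 1 + z2 / total
    low = max(0.0, (centre - margin) / denom)
    high = min(1.0, (centre + margin) / denom)
    return low, high


def feasibility_verdict(correct: int, total: int, target: float) -> str:
    """Compare the plausible accuracy range with the required accuracy."""
    low, high = wilson_interval(correct, total)
    if low >= target:
        return "feasible: plan the build"
    if high < target:
        return "not feasible at the required level"
    return "inconclusive: extend the sample or stop at the cap"


# Illustrative only: 90 correct answers out of a 100-item labelled sample.
print(feasibility_verdict(90, 100, target=0.80))  # feasible: plan the build
print(feasibility_verdict(90, 100, target=0.95))  # not feasible at the required level
```

The three-way verdict is the point of the sketch: a small sample that merely looks good returns "inconclusive" rather than a green light, which keeps the check in research framing.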

How the Failure-Rate Statistics Are Misread

The headline statistics, meaning the Gartner and MIT-style numbers on AI projects not reaching production (published surveys; figures and definitions differ across reports and years), are usually read as a verdict on AI itself. That reading is wrong, and it leads teams to either avoid AI or throw more money at the same misframed projects.

What the failure data actually describes, in our reading, is a portfolio that mixed two kinds of work under one label. The engineering tasks that were genuinely engineering tended to ship. The “engineering” tasks that were really unbounded research consumed budget and produced no deliverable, because there was never a deliverable to produce — only a question that needed an answer. Counting both as “failed AI projects” obscures the actual lesson: the failure was a scoping decision made before any code was written. The fix is not better models. It is asking the boundary question first, and contracting accordingly. This is the kind of distinction a structured engagement is built to surface early — see how a structured AI consulting engagement works from scoping to delivery, and our broader services for where this assessment fits.

FAQ

How do I tell whether an AI problem is an engineering task or a research question?

Ask whether the path to the answer is known before you start. If the method is established, a reliable baseline is achievable, and the uncertainty is bounded to effort and integration, it is engineering. If there is no known method, no reliable baseline, and you genuinely don’t know whether the problem is solvable at the required level, it is research — and it needs a different kind of contract.

Which signals classify a problem as engineering?

A known or previously deployed method, a measurable baseline you can state up front, predictable data quality, bounded uncertainty (unknowns about effort rather than feasibility), a definable notion of “done,” and comparable systems already running in production. When most of these hold, the work can be scoped with milestones and a deliverable.

Which signals classify a problem as research?

No established method, no reliable baseline, unbounded or unknown data quality, open uncertainty about whether the problem is solvable, an undefinable “done,” and no comparable production systems. When these appear, the honest first deliverable is a feasibility answer — not a working system.

How are project scope, schedule, and budget framed differently when the work is research?

Engineering tasks get milestones, a fixed deliverable, and a delivery date. Research questions get a bounded investigation with an explicit termination criterion: the budget buys an answer rather than an outcome, the timeline is a spending cap rather than a delivery date, and the contract names the conditions under which you stop and conclude the problem is not feasible at the required level.

Why do projects that were framed as engineering but were actually research consume budget without producing outcomes?

Because there was never a deliverable to produce — only a question that needed an answer. With no kill condition and a production-shaped plan, milestones slip, the deliverable keeps mutating, and spend accumulates against an outcome that may not be achievable. The investigation that should have come first never happened, so nothing bounds the loss.

How does the engineering-vs-research distinction relate to per-use-case GenAI feasibility?

The boundary question is a general form of feasibility assessment, and generative AI makes it acute because novelty hides inside what looks like configuration. Asking whether a specific GenAI use case can work with current models on your data is itself a small bounded investigation — research framing, not build framing — and it should run before the production plan, not inside it.

What does the MIT/Gartner data on AI project failure rates actually attribute the failures to?

The headline figures are usually read as a verdict on AI, but in our reading they describe a portfolio that mixed engineering and research under one label. The genuinely engineering tasks tended to ship; the “engineering” tasks that were really unbounded research consumed budget without a deliverable. On that reading, a meaningful share of reported AI failure is research misclassified as engineering: a scoping decision made before any code was written.

The boundary is not a label you assign once and forget. It is the question that decides whether a budget buys an outcome or buys an answer — and naming which parts of a project are which, before committing, is exactly what a risk assessment is for.