Engineering Task vs Research Question: Why the Distinction Determines AI Project Success

Why does the engineering-vs-research distinction matter?

“Build a model that classifies customer support tickets into 12 categories with 90% accuracy.” Is this an engineering task or a research question? In our experience across AI scoping engagements, the answer determines everything about how the project should be planned, staffed, budgeted, and evaluated — and getting the answer wrong is one of the most common causes of AI project failure.

An engineering task has a known solution path. The techniques exist, the data requirements are understood, the expected performance range is documented in the literature, and the implementation effort is estimable. A text classification model for 12 categories with sufficient labelled data is an engineering task: the solution path (fine-tune a pre-trained language model on the labelled data, evaluate, tune) is well established, and the expected accuracy range (85–95% depending on data quality and class separability, as reported in published benchmark suites such as GLUE and industry survey reports) is predictable. The moving parts are named and the integration shape is known: PyTorch or JAX, a Hugging Face checkpoint, an evaluation harness, a deployment target.

A research question has an uncertain solution path. The techniques may not exist, the data requirements may be unknown, the expected performance range is not established, and the effort to reach a satisfactory outcome is unpredictable. As an illustrative example from our scoping engagements: “Build a model that predicts customer churn 90 days in advance with 80% accuracy from transaction data alone” may be a research question — the signal may not exist in the data at that prediction horizon, and no amount of engineering effort will create a signal that does not exist. You cannot fine-tune your way out of an information-theoretic ceiling.

Why the distinction matters for project planning

Engineering tasks and research questions require fundamentally different project management approaches.

Engineering tasks are estimable. The solution path has been implemented before, the effort for each step is known, and the total timeline can be estimated within a reasonable range. A project plan with milestones, deliverables, and a fixed budget is appropriate. The team can commit to a timeline and a performance target with reasonable confidence.

Research questions are not estimable. The solution path may require multiple attempts, dead-end explorations, and hypothesis revisions. A fixed timeline and a performance commitment are inappropriate — they either force the team to declare premature success (lowering the quality bar to meet the deadline) or force the project into indefinite extension (missing deadlines while pursuing an uncertain outcome). Research questions require time-boxed exploration: “we will invest N weeks exploring this question and evaluate whether the findings justify continued investment.”

The failure pattern, then, is a research question planned as an engineering task. The project has a fixed timeline, a fixed budget, and a fixed performance target. In our scoping engagements, the team typically spends roughly the first 60% of the timeline discovering that the initial approach does not work, and the remaining 40% is insufficient for the revised approach. (This is a pattern observed across our engagements, not a benchmarked industry rate.) The project ends up over budget, over schedule, and under-performing, not because the team is incompetent, but because the project plan assumed certainty that did not exist.

How to classify your AI project

The classification is rarely obvious at project initiation. A short diagnostic helps.

Has this specific problem been solved before? Not “has AI been applied to this domain?” but “has a model been built for this specific task, with this type of data, at this performance level?” If the answer is yes, and you have comparable data, it is likely an engineering task. If the answer is no, or the comparable solutions used significantly different data or had significantly lower performance requirements, it may be a research question.

Is the signal known to exist in the data? For predictive tasks: is there evidence — from domain expertise, exploratory data analysis, or published research — that the data contains sufficient information to make the prediction at the required accuracy? If the signal’s existence is uncertain, and the team is hoping the model will find a pattern that has not been identified, the project contains a research component.
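One inexpensive way to probe whether a signal exists is to estimate how much information a candidate feature carries about the label before any model is trained. The sketch below is a minimal illustration, not a prescribed method: the function names and the toy churn data are hypothetical, and it handles discrete features only.

```python
import math
from collections import Counter


def entropy_bits(labels):
    """Shannon entropy of a sequence of discrete labels, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def mutual_information_bits(feature, labels):
    """Empirical mutual information I(feature; label), in bits."""
    n = len(labels)
    joint = Counter(zip(feature, labels))
    feature_counts = Counter(feature)
    label_counts = Counter(labels)
    mi = 0.0
    for (f, y), c in joint.items():
        # p(f, y) / (p(f) * p(y)) simplifies to c * n / (count_f * count_y)
        ratio = c * n / (feature_counts[f] * label_counts[y])
        mi += (c / n) * math.log2(ratio)
    return mi


# Hypothetical toy data: 1 = churned, 0 = retained.
churned = [0, 0, 1, 1]
plan_tier = ["basic", "basic", "premium", "premium"]  # tracks the label exactly
signup_day = ["mon", "tue", "mon", "tue"]             # unrelated to the label

label_entropy = entropy_bits(churned)                        # uncertainty to explain
informative = mutual_information_bits(plan_tier, churned)    # equals label_entropy
uninformative = mutual_information_bits(signup_day, churned) # zero
```

A feature whose mutual information is close to the label entropy explains most of the outcome; one close to zero explains almost none. On small samples the empirical estimate is biased upward, so a realistic check would compare each score against the same calculation on shuffled labels before concluding that a signal is present.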

Is the performance target within the established range? Published benchmarks and industry survey reports establish performance ranges for common AI tasks. Text classification on well-separated classes: 85–95% (as reported in published benchmark suites). Object detection: 70–95% mAP depending on domain and on how the metric is defined (the COCO and Open Images benchmark suites compute mAP differently, so confirm which definition your target refers to). Document extraction: 80–95% field accuracy (as reported in published survey reports on document AI). If your performance target is within the established range and your data is comparable to the data used in benchmarks, the project is likely an engineering task. If your target exceeds the established range, the project requires research-level effort.
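The range check can be written down as a small lookup. The following is a sketch under stated assumptions: the table reuses the ranges quoted in this article as illustrative defaults, the task keys and the function name are hypothetical, and the "unclear" outcome for non-comparable data and the treatment of below-range targets as engineering are our own simplifications.

```python
# Ranges quoted in this article, expressed as fractions. Treat them as
# illustrative defaults and replace them with figures for your own task.
PUBLISHED_RANGES = {
    "text_classification_accuracy": (0.85, 0.95),
    "object_detection_map": (0.70, 0.95),
    "document_extraction_field_accuracy": (0.80, 0.95),
}


def classify_target(task, target, data_comparable):
    """Return 'engineering', 'research', or 'unclear' for a performance target."""
    if task not in PUBLISHED_RANGES:
        # No established benchmark range for this task.
        return "research"
    _, high = PUBLISHED_RANGES[task]
    if target > high:
        # Target exceeds what published results support.
        return "research"
    if not data_comparable:
        # The range only transfers if the data resembles the benchmark data.
        return "unclear"
    # Assumption: targets at or below the top of the range count as engineering.
    return "engineering"


print(classify_target("text_classification_accuracy", 0.90, True))
print(classify_target("text_classification_accuracy", 0.98, True))
```

The useful property of writing the check this way is that the comparable-data condition becomes an explicit input rather than an unstated assumption.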

Does the project require novel data representation? If the input data is in a format that standard model architectures handle well (text, images, tabular data, time series), the project is more likely an engineering task. If the data requires novel representation (combining multiple modalities, handling unusual formats, or representing domain-specific structures), the representation work may itself be a research component. ONNX export and TensorRT deployment are engineering. Inventing a tokenisation scheme for a domain-specific structured object is not.

The hybrid case: projects with both components

Most real AI projects contain both engineering and research components. A customer churn prediction project might have an engineering component (build the data pipeline, train a classification model, deploy the serving infrastructure with the usual MLflow / Triton stack) and a research component (determine what features predict churn at a 90-day horizon, determine whether the accuracy target is achievable with the available data).

The project plan should separate the two components and manage them differently:

Research component. Time-boxed exploration. “We will spend 3 weeks on feature engineering and exploratory modelling to determine whether the 90-day churn prediction target is achievable. At the end of 3 weeks, we will have a report with: the best accuracy achieved, the features that contribute most to the prediction, and a recommendation on whether to proceed to the engineering phase.”

Engineering component. Standard project plan. “Given the features identified in the research phase and the validated accuracy range, we will build the production pipeline in 8 weeks, including data pipeline, model training automation, serving infrastructure, and monitoring.”

The research phase’s output is a go/no-go decision for the engineering phase. If the research phase shows that the target is not achievable, the project can be cancelled or rescoped before the engineering investment is committed. Our POC methodology for AI projects implements this approach — the POC is the research phase, the production build is the engineering phase, and the gate between them is real.
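The gate can be made concrete by recording the research deliverable as a small structured report and deriving the decision from it. This is a minimal sketch, not our POC tooling: the class, the field names, the feature names, and the `rescope_margin` threshold are all hypothetical, since the text above names the possible outcomes (proceed, rescope, cancel) but not the thresholds that separate them.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ResearchReport:
    best_accuracy: float      # best accuracy reached inside the time-box
    target_accuracy: float    # target agreed before exploration started
    top_features: List[str]   # features contributing most to the prediction
    time_box_weeks: int       # length of the exploration window


def gate_decision(report, rescope_margin=0.05):
    """Return 'go', 'rescope', or 'no-go' for the engineering phase.

    rescope_margin is an assumed tolerance: a shortfall within it suggests
    renegotiating the target rather than cancelling outright.
    """
    shortfall = report.target_accuracy - report.best_accuracy
    if shortfall <= 0:
        return "go"
    if shortfall <= rescope_margin:
        return "rescope"
    return "no-go"


# Hypothetical outcome of a 3-week churn exploration.
report = ResearchReport(
    best_accuracy=0.76,
    target_accuracy=0.80,
    top_features=["days_since_last_purchase", "support_tickets_90d"],
    time_box_weeks=3,
)
decision = gate_decision(report)
print(decision)
```

Agreeing the target and the margin before the exploration starts is what keeps the gate from being reinterpreted after the results are in.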

The organisational implication

Organisations that treat all AI projects as engineering tasks — fixed timelines, fixed budgets, fixed performance commitments — will experience a high failure rate on projects that contain research components. The failures are not failures of execution. They are failures of planning.

The fix is not to avoid research questions; some of the most valuable AI applications require solving novel problems. The fix is to identify which projects (or which components of projects) are research questions, plan them accordingly with time-boxed phases and explicit go/no-go criteria, and manage stakeholder expectations about the uncertainty.

This distinction is the first assessment we make in any new AI engagement. It determines the engagement structure, the timeline expectations, and the budget model. For generative AI projects, a GenAI use case feasibility evaluation carried out before you build applies this classification alongside data readiness and accuracy tolerance assessments. In our experience, enterprise AI project failures are disproportionately cases of research questions managed as engineering tasks.

AI project classification intake

The following intake questions help classify an AI project as an engineering task, a research question, or a hybrid — before the project plan is written.

1. What specific business outcome will this project deliver? (Free text. If the answer is vague or aspirational, the project needs scope refinement before classification.)
2. Has this exact problem type been solved before with comparable data? (Yes, with published benchmarks → Engineering. Yes, but in a different domain → Hybrid. No or uncertain → Research.)
3. Does the required data exist, and has someone inspected it? (Data exists and has been examined → Engineering. Data exists but has not been examined → Hybrid. Data does not exist or must be collected → Research.)
4. What is the target performance metric and threshold? (Within published benchmark ranges → Engineering. Above published ranges or no benchmark exists → Research.)
5. Does the project require a novel data representation or model architecture? (Standard inputs and architectures → Engineering. Non-standard combinations or representations → Research.)
6. What systems must the model integrate with, and do APIs exist? (APIs exist and are documented → Engineering scope for integration. APIs do not exist or require significant development → adds engineering complexity.)
7. Is there a simpler non-AI solution that could deliver 80%+ of the value? (Yes → evaluate whether AI is justified. No → proceed with classification. This is a planning heuristic from our scoping engagements, not a benchmarked industry rate.)
8. What is the acceptable timeline for a conclusive result? (Fixed deadline with committed deliverable → must be engineering. Flexible with interim checkpoints acceptable → can accommodate research.)
9. What is the budget model? (Fixed price → requires engineering-level predictability. Time-boxed with decision gates → can accommodate research.)
10. Who is the executive sponsor, and have they agreed to the success criteria? (Sponsor identified and criteria agreed → proceed. Sponsor unclear or criteria not agreed → resolve before classification.)

Scoring. Count the Engineering, Hybrid, and Research answers for questions 2–5. If all four are Engineering, the project is an engineering task. If two or more are Research, the project contains significant research components and should be planned with time-boxed exploration phases. Any other combination (at least one Hybrid answer, or a single Research answer) indicates a hybrid project that should separate its engineering and research components into distinct phases with a decision gate between them.
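The scoring rule is simple enough to encode directly, which removes ambiguity about which condition takes precedence. The sketch below is one reading of the rule, with hypothetical names throughout; it assumes the Research count is checked first, so that two Research answers outweigh any number of Hybrid answers.

```python
from collections import Counter

SCORED_QUESTIONS = (2, 3, 4, 5)
VALID_ANSWERS = {"engineering", "hybrid", "research"}


def classify_project(answers):
    """Classify a project from intake answers keyed by question number.

    Only questions 2-5 are scored; other keys are ignored.
    """
    missing = [q for q in SCORED_QUESTIONS if q not in answers]
    if missing:
        raise ValueError(f"missing answers for questions: {missing}")

    scored = [answers[q] for q in SCORED_QUESTIONS]
    invalid = [a for a in scored if a not in VALID_ANSWERS]
    if invalid:
        raise ValueError(f"unrecognised answers: {invalid}")

    counts = Counter(scored)
    if counts["research"] >= 2:
        return "research"      # significant research components
    if counts["engineering"] == len(SCORED_QUESTIONS):
        return "engineering"   # all four scored answers are Engineering
    return "hybrid"            # any other combination


# Hypothetical intake: solved before in a different domain, data not yet examined.
result = classify_project({
    2: "hybrid",
    3: "hybrid",
    4: "engineering",
    5: "engineering",
})
print(result)
```

Questions 1 and 6–10 are deliberately left out of the function: they inform scope, justification, and commercial structure rather than the classification itself.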

If AI projects in the pipeline have not been classified as engineering tasks or research questions, an AI Project Risk Assessment provides the classification and the appropriate planning approach for each.

FAQ

How do I tell whether an AI problem is an engineering task or a research question?

Ask whether the solution path is known. An engineering task has documented techniques, understood data requirements, and a performance range established by published benchmarks. A research question has an uncertain solution path, possibly unknown data requirements, and no established performance range for the target conditions. The simplest test: has this specific problem, with comparable data, been solved at the required performance level before?

Which signals classify a problem as engineering?

A known method documented in the literature, predictable data with comparable examples in published benchmarks, bounded uncertainty about the expected performance range, and an estimable implementation effort. When all four are present, the work fits a standard project plan with milestones and deliverables.

Which signals classify a problem as research?

Open novelty (no published precedent for this specific task at this performance level), unbounded data quality questions (the signal may not exist, or the data may not be representative), and no reliable baseline to estimate the expected outcome. Any one of these turns part of the project into research; two or more indicate the dominant work is research.

How are project scope, schedule, and budget framed differently for research?

Engineering work takes a fixed plan: milestones, deliverables, fixed budget, committed performance target. Research work takes time-boxed exploration: a defined window (typically 2–6 weeks) with a structured deliverable that is a finding and a go/no-go recommendation, not a production artifact. Budget for research is bounded by the time-box, not by the outcome.

Why do research projects framed as engineering consume budget without producing outcomes?

Because the fixed-plan structure has no mechanism to absorb the discovery that the initial approach does not work. The team spends most of the timeline learning the problem is harder than assumed; what remains is insufficient for a revised approach. The result is over budget, over schedule, and under-performing — a planning failure, not an execution failure.

How does the engineering-vs-research distinction relate to per-use-case GenAI feasibility?

It is the same diagnostic, applied earlier. The GenAI feasibility assessment asks per use case whether the work is implementable today; the engineering-vs-research distinction asks the same question for any AI project. For GenAI specifically, novel prompt strategies, retrieval architectures, or hallucination-tolerance requirements often push otherwise-engineering work into the research column.