Most GenAI use cases should not be built
The pressure to “do something with GenAI” produces a pipeline of use case proposals that ranges from transformative to absurd. A customer service chatbot that reduces ticket volume by 40% — transformative, if the knowledge base is structured and the error tolerance is appropriate. An AI that generates legally binding contracts without human review — absurd, given current model reliability and hallucination rates. Most proposed use cases fall between these extremes, and the feasibility of each one depends on specific, assessable factors that are identifiable before any code is written.
The expensive mistake is not building the wrong thing — it is building the wrong thing for three months before discovering it is the wrong thing. A structured feasibility assessment at the start prevents that waste.
The four feasibility dimensions
Every GenAI use case can be evaluated along four dimensions. A use case that fails on any dimension is either infeasible or requires scope modification before development begins.
Is the data available and sufficient?
Generative AI models — whether used for text generation, image synthesis, code completion, or structured output — require data to function. For fine-tuning or RAG (retrieval-augmented generation), the data must be available, accessible, and of sufficient quality to support the use case.
For RAG-based applications: The knowledge base must contain the information the model needs to generate accurate responses. If the information is scattered across undocumented tribal knowledge, unstructured email threads, and informal processes, the RAG retrieval will not find what it needs — not because the retrieval mechanism is weak, but because the source data does not exist in a retrievable form. We have seen organisations spend months building RAG pipelines only to discover that the knowledge they wanted the system to access was never written down.
For fine-tuning applications: The training data must be representative of the desired output and available in sufficient volume. Fine-tuning a language model for a domain-specific task typically requires 1,000–10,000 high-quality examples. If the domain is narrow and the examples do not exist (or exist only in formats that require significant manual curation), the data preparation cost may exceed the development cost.
For prompt-engineering applications: The base model must have sufficient pre-training coverage of the domain. GPT-4, Claude, and Gemini have broad pre-training coverage, but domain-specific accuracy varies. A prompt-engineered application for a niche domain — say, rare-earth mineral extraction procedures — will produce less reliable output than one for a well-represented domain like software engineering, because the model’s pre-training data contained less relevant information.
What is the accuracy tolerance?
Every GenAI output has a non-zero error rate. For text generation, this manifests as hallucination — factually incorrect statements presented as fact. For image generation, it manifests as artifacts, anatomical errors, or brand-inconsistent output. For code generation, it manifests as syntactically valid but functionally incorrect code.
The feasibility question is not “does the model make errors?” (it does) but “is the error rate acceptable for this use case, given the cost and risk of each error?”
A marketing team using GenAI to draft social media posts can tolerate a 10–15% revision rate — the posts are reviewed before publication, and revisions are low-cost. A medical information system that generates patient-facing health guidance cannot tolerate a 1% hallucination rate — the consequence of an incorrect medical statement is a liability event.
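The two tolerance regimes above can be made concrete with a back-of-the-envelope cost model. This sketch is illustrative only: the error rates, review costs, and liability figures are invented assumptions, not measurements.

```python
# Back-of-the-envelope error-cost model: is a GenAI use case viable
# given its error rate and the cost of each error? All figures are
# illustrative assumptions, not measured values.

def expected_cost_per_output(error_rate: float, cost_per_error: float,
                             review_cost: float = 0.0,
                             review_catch_rate: float = 0.0) -> float:
    """Expected cost of one generated output.

    With human review, a fraction of errors (review_catch_rate) is
    caught and fixed cheaply; the rest escape at full cost_per_error.
    """
    escaped_errors = error_rate * (1 - review_catch_rate)
    return review_cost + escaped_errors * cost_per_error

# Marketing post: ~12% revision rate, reviewed before publication,
# so almost all errors are caught at low cost.
marketing = expected_cost_per_output(
    error_rate=0.12, cost_per_error=50, review_cost=5, review_catch_rate=0.95)

# Patient-facing guidance with no review: even a 1% hallucination rate
# multiplied by a large liability cost dominates the economics.
medical = expected_cost_per_output(
    error_rate=0.01, cost_per_error=500_000)

print(f"marketing: ${marketing:.2f} per post")
print(f"medical:   ${medical:,.2f} per answer")
```

The point of the arithmetic is that feasibility is a product of error rate and error cost, not error rate alone: the marketing use case survives a much higher error rate because each error is cheap and caught before it matters.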
The accuracy tolerance determines whether the use case is feasible with current model capabilities, whether it requires human-in-the-loop review (which changes the cost model), or whether it is infeasible until model reliability improves. The predictable failure patterns of GenAI projects illustrate what happens when this tolerance is not assessed upfront.
Does the integration complexity justify the value?
A GenAI capability that works in a demo environment but requires six months of integration work to connect to the production systems, data sources, and workflows that it needs to be useful may not be worth the integration cost — particularly if the value it delivers is incremental rather than transformative.
Integration complexity includes: connecting to data sources (APIs, databases, document stores) for RAG retrieval, integrating with existing workflow tools (CRM, ERP, ticketing systems) for action-taking, implementing authentication and authorisation for multi-tenant environments, and building monitoring and feedback infrastructure for ongoing quality management.
Our assessment of integration complexity focuses on the distance between the demo and production: how many systems must be connected, how mature are the APIs, and what security and compliance requirements apply to the data the model will access?
Is there a simpler solution?
The most overlooked feasibility question: does this use case actually require generative AI? A search feature that retrieves and presents existing content does not need a generative model — a well-implemented search engine with good indexing is simpler, faster, and more reliable. A classification task (route this ticket to the right team) does not need a generative model — a fine-tuned classifier or even a rule-based system may be sufficient and more predictable.
GenAI is appropriate when the output must be generated — when the system needs to produce new text, images, or structured data that does not already exist in the knowledge base. When the output is retrieval, classification, or routing, a non-generative solution is usually more appropriate. It is also worth assessing whether the use case is an engineering task or a research question — if the required capability is not yet production-proven, the project may need a research timeline rather than an engineering timeline.
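To illustrate the "simpler solution" test, the ticket-routing task mentioned above can often be handled by a few lines of deterministic rules. This is a hypothetical sketch; the team names and keywords are invented for illustration.

```python
# Hypothetical rule-based ticket router: a routing task needs no
# generative model. Teams and keywords are invented examples.

ROUTING_RULES = {
    "billing":  ["invoice", "refund", "charge", "payment"],
    "infra":    ["outage", "timeout", "500 error", "latency"],
    "accounts": ["password", "login", "2fa", "locked out"],
}

def route_ticket(text: str, default: str = "triage") -> str:
    """Route a ticket to the first team whose keywords appear in the text."""
    lowered = text.lower()
    for team, keywords in ROUTING_RULES.items():
        if any(kw in lowered for kw in keywords):
            return team
    return default  # no rule matched: fall back to manual triage

print(route_ticket("Customer was double-charged on last invoice"))  # billing
print(route_ticket("Intermittent 500 error on checkout"))           # infra
print(route_ticket("Something feels off"))                          # triage
```

A rule-based router like this is fully auditable and never hallucinates a destination; a fine-tuned classifier is the next step up if the rules grow unwieldy, and a generative model is rarely the right first choice for this shape of problem.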
The assessment process
We conduct GenAI feasibility assessments as structured evaluations:
- Use case catalogue. Enumerate the proposed use cases with clear descriptions of the input, the expected output, the value delivered, and the current process the GenAI would replace or augment.
- Dimension scoring. Evaluate each use case against the four feasibility dimensions — data availability, accuracy tolerance, integration complexity, and solution simplicity. Each dimension receives a red/amber/green rating with specific rationale.
- Priority ranking. Rank feasible use cases by value-to-effort ratio. The highest-value, lowest-effort use cases go first. Use cases with amber ratings on one or more dimensions go into a “conditional” category with specific conditions that must be met before development begins.
- POC scoping. For the top-ranked use cases, define the minimum POC that validates the riskiest dimension. If data availability is the risk, the POC validates retrieval quality. If accuracy tolerance is the risk, the POC measures the model’s error rate on representative inputs.
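The scoring and ranking steps of the process above can be sketched as a simple data structure. The use cases, ratings, and value/effort scores here are invented examples, not assessment output.

```python
# Sketch of dimension scoring and value-to-effort ranking.
# Use cases, ratings, and scores are invented for illustration.

from dataclasses import dataclass

DIMENSIONS = ("data", "accuracy", "integration", "simplicity")

@dataclass
class UseCase:
    name: str
    value: int     # estimated business value, 1-10
    effort: int    # estimated delivery effort, 1-10
    ratings: dict  # dimension -> "red" | "amber" | "green"

    @property
    def status(self) -> str:
        colours = self.ratings.values()
        if "red" in colours:
            return "infeasible"   # fails at least one dimension
        if "amber" in colours:
            return "conditional"  # build only once conditions are met
        return "feasible"

    @property
    def value_to_effort(self) -> float:
        return self.value / self.effort

proposals = [
    UseCase("support chatbot", 9, 4,
            dict.fromkeys(DIMENSIONS, "green")),
    UseCase("contract generator", 8, 7,
            {"data": "green", "accuracy": "red",
             "integration": "amber", "simplicity": "green"}),
    UseCase("report summariser", 6, 3,
            {"data": "amber", "accuracy": "green",
             "integration": "green", "simplicity": "green"}),
]

# Drop infeasible proposals, then rank by value-to-effort ratio.
buildable = [p for p in proposals if p.status != "infeasible"]
for p in sorted(buildable, key=lambda p: p.value_to_effort, reverse=True):
    print(f"{p.name}: {p.status}, value/effort = {p.value_to_effort:.2f}")
```

One red rating is enough to exclude a proposal outright, which mirrors the rule that a use case failing any dimension is either infeasible or needs rescoping before development begins.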
What the assessment prevents
The assessment prevents the two most common GenAI project failures: building a system whose data sources do not support the required quality, and building a system whose error rate is unacceptable for the operational context. Both failures are discoverable before development begins — but only if the assessment is conducted systematically rather than skipped in the rush to demonstrate AI capability. These failure patterns mirror the broader trend: most enterprise AI projects fail for the same structural reasons — data readiness gaps, unclear success criteria, and integration underestimation.
If your organisation has a pipeline of GenAI use case proposals and needs to determine which ones are worth building, a GenAI Feasibility Assessment evaluates each proposal against the four dimensions and produces a prioritised implementation roadmap. Learn more about our generative AI practice.