## Why do most AI POCs fail to convert to production?

The most common reason AI POCs fail to become production systems is not technical; it is definitional. The POC was built to demonstrate that “AI can help” rather than to answer a specific, measurable question. Without a clear question, a successful POC produces results that stakeholders interpret differently: the data science team sees a working model, the business team sees uncertain ROI, and the engineering team sees an undeployable prototype.

Defining requirements before building prevents this misalignment. The requirements document forces agreement on what question the POC answers, what data it uses, what success looks like, and what happens if it succeeds.

## What requirements must be defined?

| Requirement | What to Specify | Common Failure If Missing |
| --- | --- | --- |
| Business question | Specific, measurable hypothesis | POC answers the wrong question |
| Success metric | Quantified threshold (accuracy, latency, cost) | No way to evaluate success |
| Data access | Specific tables, APIs, access credentials | 3-week delay getting data access |
| Scope boundary | What is in scope and explicitly out of scope | Scope creep delays delivery |
| Timeline | Fixed end date with checkpoints | POC runs indefinitely |
| Decision framework | What actions follow each possible outcome | Results sit in a report, unused |

The decision framework is the most frequently omitted requirement and the most important. It answers: “If the POC succeeds, what decision does the organisation make? If it fails, what does the organisation do instead?” Without this, a successful POC generates enthusiasm but no action, because the organisation has not pre-committed to the investment required to productionise the result.

## How do you scope a POC correctly?

A well-scoped AI POC has three boundaries:

- **Data boundary:** which data sources, time periods, and segments are included
- **Model boundary:** which approaches will be evaluated and which are explicitly excluded
- **Evaluation boundary:** which metrics, test datasets, and comparison baselines define success

In our POC design practice, we spend 20–30% of the total POC timeline on requirements definition and data exploration before any model development begins. This investment feels slow but prevents the most expensive failure mode: building a technically successful model that the organisation cannot or will not deploy.

For the broader context of how POC design fits into AI strategy, our guide to what an AI POC should actually prove covers the evaluation methodology in detail.

## What does a good POC requirements document look like?

A minimum POC requirements document is 2–3 pages and contains:

- **Problem statement:** one paragraph describing the business problem in concrete terms
- **Success criteria:** 2–3 quantified metrics with pass/fail thresholds
- **Data specification:** exact data sources, access method, known quality issues
- **Scope:** explicit inclusions and exclusions
- **Timeline:** start date, checkpoint dates, end date (typically 4–8 weeks total)
- **Decision matrix:** what action follows each outcome (exceed threshold, meet threshold, fall below threshold), as in the sketch below
- **Resource requirements:** team members, compute, data access, stakeholder availability for reviews

We share this document with all stakeholders (business sponsor, data science team, engineering team, data owners) and require sign-off before development begins. The sign-off process surfaces disagreements and misaligned expectations before they become expensive mid-POC discoveries.
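To make the success criteria and decision matrix concrete, the sketch below shows one way to encode them so that the end-of-POC evaluation is mechanical rather than subjective. It is a minimal illustration, not a prescribed format: the metric names, thresholds, margin, and actions are placeholder values that each POC would replace with its own agreed figures at sign-off.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SuccessCriterion:
    """One quantified pass/fail threshold, agreed at sign-off."""
    metric: str
    threshold: float


# Placeholder criteria and actions -- each POC substitutes its own agreed values.
CRITERIA = [
    SuccessCriterion(metric="precision", threshold=0.85),
    SuccessCriterion(metric="recall", threshold=0.80),
]

DECISION_MATRIX = {
    "exceed": "Commit budget and a team to productionise next quarter.",
    "meet": "Run a limited pilot with the business sponsor's team.",
    "below": "Stop this approach; take the pre-agreed alternative path.",
}


def poc_decision(results: dict[str, float], margin: float = 0.05) -> str:
    """Map measured metrics to the pre-agreed action.

    'exceed': every metric clears its threshold by `margin`;
    'meet':   every metric reaches its threshold;
    'below':  anything else.
    """
    if all(results[c.metric] >= c.threshold + margin for c in CRITERIA):
        return DECISION_MATRIX["exceed"]
    if all(results[c.metric] >= c.threshold for c in CRITERIA):
        return DECISION_MATRIX["meet"]
    return DECISION_MATRIX["below"]


# Example: metrics measured on the frozen held-out set at the end date.
print(poc_decision({"precision": 0.88, "recall": 0.83}))
```

Whether the matrix lives in code or in prose matters less than the fact that the actions are written down in advance; the final evaluation then only selects which row applies.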
## What are the most common POC requirement mistakes?

After reviewing dozens of AI POC specifications across industries, the same mistakes appear repeatedly:

**Defining success as “high accuracy” without a number.** “The model should be accurate” is not a success criterion. “The model should achieve ≥85% precision and ≥80% recall on the held-out test set, evaluated on 500+ samples” is a success criterion. Without quantified thresholds, POC evaluation becomes subjective, and subjective evaluation is influenced by organisational politics rather than technical merit.

**Using production data volumes in the POC.** A POC should demonstrate feasibility, not production readiness. Using the full production dataset (millions of records, terabytes of data) extends the POC timeline and obscures the core question: does this approach work on representative data? We scope POC datasets to 10,000–50,000 representative samples: enough to evaluate model performance statistically, small enough to iterate quickly.

**Omitting the negative case.** What happens if the POC fails to meet the success criteria? Without a pre-defined “fail” response, organisations tend to extend the POC indefinitely (“maybe we just need more data” or “let’s try a different model”), consuming resources without converging on a decision. Our POC requirements include a time-box: if the success criteria are not met by the end date, the POC is considered unsuccessful and the decision matrix specifies the next action (abandon, pivot, or extend with specific scope changes).

**Not specifying the evaluation dataset in advance.** If the evaluation dataset is selected after the model is built, there is a risk (conscious or unconscious) of selecting data that makes the model look good. We specify the evaluation dataset split (or the splitting methodology) in the requirements document before any model development begins. This prevents evaluation bias and ensures that the POC results are credible to stakeholders who did not participate in the development. A minimal sketch of a pre-registered sample and split appears at the end of this section.

In our POC practice, we conduct a requirements review meeting before development starts. This meeting walks through each requirement with the full stakeholder group, identifies ambiguities, and resolves disagreements. The meeting typically takes 2 hours and saves 2–4 weeks of mid-POC rework.
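As a minimal sketch of the last two mistakes above (over-sized POC datasets and evaluation splits chosen after the fact), the snippet below draws a fixed-size, stratified sample and freezes a held-out set with a fixed seed before any modelling starts. The file names, label column, sample size, and seed are illustrative assumptions, and pandas plus scikit-learn are assumed to be available; the real values would be recorded in the requirements document.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Illustrative values -- the real source, label column, sample size, and seed
# are recorded in the requirements document before development begins.
SOURCE_FILE = "poc_source_extract.parquet"   # hypothetical data extract
LABEL_COLUMN = "churned"                     # hypothetical target column
SAMPLE_SIZE = 20_000                         # within the 10,000-50,000 POC range
RANDOM_SEED = 42                             # fixed so the split is reproducible

full_data = pd.read_parquet(SOURCE_FILE)

# Draw the POC sample once, stratified on the label so it stays representative.
poc_sample, _ = train_test_split(
    full_data,
    train_size=SAMPLE_SIZE,
    stratify=full_data[LABEL_COLUMN],
    random_state=RANDOM_SEED,
)

# Pre-register the evaluation split: created before modelling, never re-drawn.
train_df, eval_df = train_test_split(
    poc_sample,
    test_size=0.25,
    stratify=poc_sample[LABEL_COLUMN],
    random_state=RANDOM_SEED,
)

train_df.to_parquet("poc_train.parquet")
eval_df.to_parquet("poc_eval_holdout.parquet")  # frozen set used only for final scoring
```

Writing the held-out set to a file at this point is what makes the final evaluation credible: the split cannot later be redrawn in a way that flatters the model.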