What actually makes an AI system agentic?

“Agentic AI” entered the mainstream vocabulary in 2024, and the definition has been stretched to cover everything from a chatbot that calls a function to a fully autonomous system that plans, executes, and self-corrects over extended task sequences. The confusion is counterproductive. Organisations evaluating agentic AI capabilities need a clear definition to make build-or-buy decisions, and the current marketing-driven terminology makes that difficult.

The distinction is actually straightforward. Generative AI produces output in response to input: a prompt goes in, a response comes out. The model does not take actions, does not maintain state across interactions beyond the context window, and does not autonomously pursue a goal. Agentic AI uses a generative model as its reasoning engine but adds the capability to take actions (call APIs, query databases, execute code), maintain state across multiple steps, plan multi-step workflows toward a defined goal, and self-correct when intermediate steps produce unexpected results.

Gartner’s Predicts 2025: AI Agents report (October 2024, published-survey) forecasts that the share of enterprise software applications with agentic AI capabilities will grow significantly by 2028, up from less than 1% of applications in 2024. Venture capital trends point in the same direction: a growing share of AI startups receiving Series A funding now describe their product as “agentic”, which is a market-direction signal, not a technical maturity claim. The label “agentic” is applied more broadly in marketing than in engineering practice, which is why this article draws the boundary tighter than the prevailing usage.

The model architecture is often the same. GPT-4, Claude, Gemini, or Llama can serve as the reasoning engine for both generative and agentic applications. The difference is not in the model but in the surrounding system: the tool interfaces, the planning loop, the state management, and the autonomy level.
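To make the "same model, different surrounding system" point concrete, here is a minimal sketch in Python. Everything in it is illustrative and assumed rather than taken from a real framework: `fake_model` is a scripted stand-in for an LLM call, `lookup_vendor` is a hypothetical tool, and the `CALL` / `OBSERVATION` / `FINAL` text protocol is invented for the example. The same model function is used once as a single generative call and once inside a bounded act, observe, adjust loop that holds state and records a trace.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


def fake_model(transcript: List[str]) -> str:
    """Hypothetical stand-in for an LLM call, scripted so the sketch runs offline."""
    last = transcript[-1]
    if last.startswith("GOAL "):
        return "CALL lookup_vendor ACME Ltd"
    if last == "OBSERVATION none":
        # Self-correction: the last lookup found nothing, so reformulate it.
        return "CALL lookup_vendor ACME"
    if last.startswith("OBSERVATION "):
        return "FINAL " + last[len("OBSERVATION "):]
    return "FINAL I can only answer from what I already know."


def generate(prompt: str) -> str:
    """Generative use: one prompt in, one response out. No tools, no state."""
    return fake_model([prompt])[len("FINAL "):]


# Hypothetical tool registry; a real system would wrap APIs or databases here.
VENDOR_TERMS = {"ACME": "net-30"}
TOOLS: Dict[str, Callable[[str], Optional[str]]] = {
    "lookup_vendor": VENDOR_TERMS.get,
}


@dataclass
class AgentRun:
    transcript: List[str]  # state carried across steps
    trace: List[Tuple[str, str, str]] = field(default_factory=list)  # (tool, argument, observation)


def run_agent(goal: str, max_steps: int = 5) -> AgentRun:
    """Agentic use: the same model inside an act, observe, adjust loop."""
    run = AgentRun(transcript=["GOAL " + goal])
    for _ in range(max_steps):  # bounded so a failing tool cannot loop forever
        reply = fake_model(run.transcript)
        run.transcript.append(reply)
        if reply.startswith("FINAL "):
            return run
        _, tool, arg = reply.split(" ", 2)
        result = TOOLS[tool](arg)
        observation = "none" if result is None else result
        run.trace.append((tool, arg, observation))
        run.transcript.append("OBSERVATION " + observation)
    run.transcript.append("FINAL step limit reached")
    return run


if __name__ == "__main__":
    print(generate("What are ACME's payment terms?"))
    run = run_agent("find ACME's payment terms")
    print(run.transcript[-1])
    for step in run.trace:
        print(step)
```

In this sketch the model function never changes. What changes is the loop around it: the tool registry, the transcript that carries state between steps, the trace, and the step limit.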
Deployment decisions should be based on architectural properties, not on label adoption.

What makes a system agentic

An agentic system has four properties that a purely generative system does not.

Tool use. The model can invoke external tools (APIs, databases, calculators, code interpreters, web browsers, file systems) and incorporate the tool output into its reasoning. A generative model that can search the web and use the search results in its response is exhibiting tool use. A generative model that only produces text from its pre-trained knowledge is not.

Planning. The model decomposes a high-level goal into a sequence of steps and executes them in order. The plan may be explicit (the model generates a step-by-step plan before executing) or implicit (the model decides the next action based on the current state without generating a complete plan upfront). Planning enables multi-step task completion that goes beyond single-turn generation.

Memory and state. The agent maintains context across steps: what it has already done, what results previous steps produced, and what remains to be done. State management may use the model’s context window, an external memory store, or both. Without persistent state, the agent cannot reason about progress toward a goal across multiple actions.

Autonomy and self-correction. The agent detects when an action produces an unexpected result and adjusts its approach. If a database query returns no results, the agent reformulates the query rather than reporting failure. If a code execution produces an error, the agent reads the error message and modifies the code. This feedback loop (act, observe, adjust) is what separates agentic behaviour from scripted automation.

Where the boundary is drawn

The boundary between “generative AI with tool use” and “agentic AI” is fuzzy, and reasonable people draw it differently.
Our working definition: a system is agentic when it autonomously executes multiple steps toward a goal, with the ability to branch, retry, or adapt based on intermediate results. A system that takes a single action, such as searching the web and summarising the results, is tool-augmented generative AI, not agentic.

Is ChatGPT generative AI or agentic AI? It is both, depending on which mode you invoke. The base chat completion endpoint is generative. The Assistants API, which adds tool calling, code interpretation, and file retrieval, exhibits agentic properties when the tools are wired up. The architectural question (what does this system do, what state does it hold, what actions can it take) matters more than the product label.

This distinction matters for deployment because agentic systems introduce risks that single-turn generative systems do not. The agent can take a wrong action with real-world consequences: sending an email, modifying a database, executing a payment. The agent can enter infinite loops or take excessive actions, running up API costs. The agent’s decision-making is harder to audit: a 15-step reasoning chain is harder to review than a single response. None of these risks exists for a generative model that only emits text.

McKinsey’s 2023 analysis (published-survey) estimated that generative AI broadly could unlock up to $4.4 trillion in annual productivity value across the economy, with agentic workflows representing a significant share of that opportunity. This is a market-direction estimate that explains the intensity of vendor investment, not a near-term deployment forecast. We mention it to anchor scale, not to make a project-level claim.

Agentic vs generative vs predictive: one architecture

In practice, the three categories coexist. A predictive model classifies, scores, or forecasts. A generative model produces text, image, or code output.
An agentic system orchestrates calls to one or more of those models as tools, alongside calls to non-AI tools (databases, APIs, deterministic functions).

| Capability | Generative | Agentic | Predictive |
| --- | --- | --- | --- |
| Primary output | Content (text, image, code) | Sequence of actions toward a goal | Score, class, or value |
| State across calls | Context window only | Persistent across multiple steps | Stateless per call |
| Tool invocation | None or single tool | Multi-tool, conditional | None |
| Failure mode | Hallucinated content | Wrong action with side effects | Misclassification |
| Audit surface | Prompt + response | Action trace + intermediate state | Input features + score |
| Infra dependency | Model serving | Model serving + orchestrator + memory store | Model serving |

The categories overlap at the edges, but the architectural footprint differs. An agentic deployment needs an orchestrator (LangGraph, AutoGen, a custom state machine), persistent memory (Redis, vector store, or external database), action logging for audit, and rate limiting to bound cost. A generative deployment needs model serving and a content moderation layer. A predictive deployment needs feature pipelines and model serving. Conflating them at scoping time produces infrastructure surprises in implementation.

Agentic architecture example: invoice processing pipeline

A concrete example illustrates the agentic properties in practice. Consider an invoice processing pipeline with three agents.

Intake agent. Monitors an email inbox and shared drive for new invoices. Extracts the document, identifies the format (PDF, scanned image, structured EDI), and routes it to the appropriate processing path. Control boundary: the intake agent can read inbound documents and write to the processing queue, but cannot modify financial records or approve payments.

Extraction agent. Receives a routed document, extracts structured fields (vendor, amount, line items, payment terms, PO number), and validates the extracted data against the vendor master and purchase order database.
As a planning heuristic from our agentic AI engagements (observed-pattern, not a benchmarked industry rate), if extraction confidence is below roughly 85% on any field, the agent flags the invoice for human review rather than proceeding. Failure mode: OCR errors on scanned documents produce low-confidence extractions; the agent must recognise its own uncertainty rather than hallucinating field values.

Approval agent. Matches the validated invoice against approval rules (amount thresholds, budget codes, duplicate detection) and either routes to the appropriate approver or auto-approves invoices below the threshold. Control boundary: the approval agent can route for approval and flag anomalies, but cannot execute payments; payment execution remains in the existing ERP workflow with human authorisation.

The pipeline’s failure boundaries are explicit. No single agent can both extract data and approve payment (separation of concerns). Confidence thresholds force human review when the system is uncertain. Each agent’s write permissions are restricted to its specific function. These boundaries prevent the cascading failure mode where one agent’s error propagates unchecked through the entire workflow.

Infrastructure that an agentic system needs and a generative one does not

Treating an agent project as a generation project is the most common scoping error we see. The differences are concrete.

A generative call has one log line: prompt, response, latency, token cost. An agentic run has a trace: every tool invocation, every intermediate state, every retry. Without that trace, you cannot debug a wrong outcome and you cannot pass an audit. The orchestrator layer (LangGraph, AutoGen, CrewAI, or a hand-rolled state machine) exists primarily to produce that trace, not to be clever.

State management is the second gap. A generative service can be stateless.
An agent, by definition, is not: it needs to know what it has already tried, what it learned, and how close it is to its goal. That means a memory store (Redis, Postgres, or a vector index for semantic recall), a schema for the agent’s working memory, and a policy for when memory is reset.

Failure handling diverges sharply. A generative model that produces a poor response is regenerated or filtered. An agent that takes a wrong action may need to be rolled back, and rollback is only possible if every action is logged and every side effect is reversible. The design constraint “every agent write must be reversible or gated by human approval” is what separates a safe agent deployment from an expensive incident.

Cost containment matters in a way it does not for single-shot generation. An agent that loops on a failing tool can issue hundreds of model calls before anyone notices. Hard caps on step count, wall-clock time, and total token spend per run are not optional.

Our broader LLM agents primer walks through these infrastructure concerns in more depth, and the multi-agent coordination patterns post covers the additional failure surface that appears when several agents share state.

Current agentic frameworks

The practical implementation of agentic AI uses frameworks that provide tool use, planning, memory, and orchestration infrastructure.

LangChain / LangGraph provides a composable framework for building agentic workflows with tool use, state management, and conditional branching. LangGraph extends LangChain with explicit graph-based workflow definitions, enabling complex multi-step agents with defined control flow.

AutoGen (Microsoft) provides a multi-agent framework where multiple AI agents with different roles collaborate on tasks. The agents communicate through structured messages, with each agent specialising in a subset of the task: a “coder” agent, a “reviewer” agent, a “planner” agent.
CrewAI provides a role-based multi-agent framework focused on defining agent “crews” with specific roles, goals, and tools. The framework manages the agent coordination and task delegation.

OpenAI Assistants API / Anthropic Tool Use provide built-in agentic capabilities at the model API level: tool calling, code interpretation, and file retrieval as native API features rather than external framework layers.

The architectures are not yet standardised, and best practices are still consolidating. We expect significant churn in framework APIs through 2026; betting heavily on framework-specific abstractions today is a known risk we flag explicitly with clients.

When agentic AI is appropriate, and when it is not

Agentic AI is appropriate when the task requires multiple autonomous steps, tool use, and adaptive decision-making: research tasks (gathering information from multiple sources, synthesising findings), workflow automation (processing multi-step business processes that require judgment), code generation and debugging (writing code, testing it, and fixing errors iteratively), and complex data analysis (querying multiple data sources, combining results, and interpreting findings).

Agentic AI is not appropriate when the task is single-step generation (write a marketing email), when the error cost of autonomous action is too high (the agent should not autonomously modify production databases without human approval), or when the task is well-defined enough that traditional automation (scripts, workflow engines, rule-based systems) handles it more reliably and cheaply.

We advise clients to evaluate whether the task genuinely requires the adaptive multi-step capability that agentic systems provide, or whether a simpler approach (a generative model with a single tool call, or a traditional automation pipeline) would achieve the same result with less complexity and lower risk.
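The criteria above can be written down as a rough checklist. The sketch below is one possible encoding in Python, not a formal scoping method: `TaskProfile`, its field names, and `recommend_pattern` are hypothetical, and the ordering of the checks (fixed rules first, then single-step, then error cost) is an assumption about how the criteria would be prioritised.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskProfile:
    """Hypothetical description of a candidate use case."""
    multi_step: bool           # more than one action is needed to reach the goal
    adapts_to_results: bool    # later steps depend on what earlier steps return
    fixed_rules_suffice: bool  # a script or workflow engine handles it reliably
    high_error_cost: bool      # a wrong autonomous action would be costly


def recommend_pattern(task: TaskProfile) -> str:
    """Return the simplest pattern that fits the task (illustrative ordering)."""
    if task.fixed_rules_suffice:
        return "traditional automation"
    if not (task.multi_step and task.adapts_to_results):
        return "single generative call"
    if task.high_error_cost:
        # Assumption: high-stakes writes stay behind a human approval gate.
        return "agent with human approval on writes"
    return "agent"


if __name__ == "__main__":
    marketing_email = TaskProfile(False, False, False, False)
    invoice_pipeline = TaskProfile(True, True, False, True)
    print(recommend_pattern(marketing_email))
    print(recommend_pattern(invoice_pipeline))
```

The checks run from the cheapest pattern to the most complex one, so a task is only classified as needing an agent once the simpler options have been ruled out.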
The GenAI feasibility assessment makes that determination as part of broader scoping, and the generative AI practice page outlines what we cover end-to-end.

FAQ

What is agentic AI, and how is it engineering-distinct from generative AI?

Agentic AI is a system that uses a generative model as its reasoning engine but adds tool use, planning, persistent state, and self-correction to autonomously execute multi-step workflows. Generative AI produces output in response to a prompt and stops. The engineering distinction is in the surrounding system (orchestration, memory, action logging), not in the model itself.

Is ChatGPT a generative AI or an agentic AI, and why does the distinction matter for scoping?

It is both, depending on which mode you invoke. The chat completion endpoint is generative. The Assistants API with tools enabled is agentic. The distinction matters because an agentic deployment requires orchestration, persistent memory, action audit trails, and cost caps that a generative deployment does not. Scoping an agent project as a generation project consistently understates infrastructure cost.

What are concrete examples of agentic AI versus generative AI in real workflows?

Generative: drafting a marketing email, summarising a document, generating product copy, producing an image from a prompt. Agentic: an invoice processing pipeline that extracts fields, validates against vendor records, and routes for approval; a research agent that queries multiple sources, synthesises findings, and produces a report with citations; a code agent that writes a function, runs the test suite, and fixes errors iteratively.

How does the infrastructure for an agentic system differ from a generative one (monitoring, state, failure handling)?

A generative deployment needs model serving, content moderation, and per-call logging.
An agentic deployment adds an orchestrator (LangGraph, AutoGen, or equivalent), a persistent memory store, a per-step action trace, hard caps on step count and token spend, and a policy for which actions are reversible or gated by human approval. The action trace is the audit surface; without it, debugging and compliance both fail.

When does a use case need an agent, and when is a single generative call sufficient?

A single generative call is sufficient when the task is one step, the output is content (not action), and the error cost is low. An agent is needed when the task requires multiple steps with branching, when intermediate results must influence the next action, or when external tools must be invoked and their results reasoned over. If you can write the task as a fixed script with a single model call inside it, you do not need an agent.

How do agentic AI, generative AI, and predictive AI fit into one architecture without overlapping?

Treat predictive and generative models as tools that an agentic orchestrator can call. The classifier scores; the generator drafts; the agent decides what to call, in what order, and what to do with the results. The architecture stays clean when the agent owns control flow and state, and the predictive and generative models stay stateless behind their respective endpoints.

A clear agentic-vs-generative distinction is a prerequisite for honest feasibility assessment. The next question is whether the use case in front of you needs an agent at all, or whether a simpler pattern delivers the same outcome at lower risk.