## The retrieval layer determines search quality

Enterprise AI search quality depends on chunking strategy and retrieval pipeline design more than on the LLM: poor retrieval plus a powerful LLM produces confident wrong answers. This is the most common misunderstanding in enterprise AI search implementations: teams invest in upgrading the language model while neglecting the retrieval layer that feeds it.

The architecture is straightforward: documents are chunked, embedded, stored in a vector database, and retrieved by semantic similarity to the query. The LLM then generates an answer from the retrieved chunks. If the retrieval step returns irrelevant or incomplete chunks, the LLM will generate a fluent, confident, wrong answer, because it only knows what it is given.

## Why chunking strategy matters more than model size

Chunking (how documents are split into retrievable segments) is the highest-leverage decision in enterprise search architecture. Get it wrong, and no amount of model quality compensates.

| Chunking approach | Works well for | Fails for |
| --- | --- | --- |
| Fixed-size (512 tokens) | Uniform documents, FAQ-style content | Documents where meaning spans paragraphs |
| Paragraph-level | Well-structured documents with clear sections | Dense technical docs where context spans multiple paragraphs |
| Semantic (topic-based) | Long documents with distinct topics | Highly interconnected content where topics overlap |
| Hierarchical (parent-child) | Complex documents needing both detail and context | Simple, short documents (adds unnecessary complexity) |

The diagnostic question: when your enterprise search gives a wrong answer, is it because the LLM misunderstood the context, or because the context it received was incomplete or irrelevant? In our experience building RAG systems, 70–80% of wrong answers trace to retrieval failures, not generation failures.

## The permission enforcement gap

The gap between demo and production enterprise search is data permission handling: production systems must enforce document-level access controls at retrieval time. A demo can retrieve from all documents for all users. A production system must ensure that User A (who has access to documents 1–50) never receives information from documents 51–100 in their search results.

This requirement, access control at retrieval time, adds substantial complexity:

- Document-level ACLs must be stored alongside embeddings and enforced at query time (a sketch of this filter follows the hybrid search example below)
- Group membership changes must propagate to the search layer in near-real-time
- Partial document access (a user can see Section 3 of a document but not Section 7) requires chunk-level permission metadata
- Audit logging must record what was retrieved and shown, not just what was queried

Teams that build enterprise search without permission enforcement from the start face a painful retrofit. The vector database, chunking pipeline, and retrieval logic all need modification; it is not a layer you can bolt on top.

## Retrieval patterns that work at enterprise scale

**Hybrid search (vector + keyword).** Pure semantic search misses exact matches (product codes, policy numbers, people's names). Pure keyword search misses conceptual matches. Hybrid search combines both, typically with a weighted fusion:

```
final_score = α × semantic_score + (1-α) × keyword_score
```

The optimal α varies by use case: typically 0.6–0.8 weight on semantic for knowledge-base search, 0.3–0.5 for technical documentation with many exact terms.
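A minimal sketch of that fusion step, assuming each retriever returns raw scores keyed by chunk ID (the function names and data shapes are illustrative, not any particular library's API). Raw scores need normalising before mixing, because cosine similarities and BM25 scores live on different scales:

```python
from typing import Dict

def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale raw scores to [0, 1] so the two signals are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {cid: 1.0 for cid in scores}
    return {cid: (s - lo) / (hi - lo) for cid, s in scores.items()}

def hybrid_fuse(semantic: Dict[str, float],
                keyword: Dict[str, float],
                alpha: float = 0.7) -> Dict[str, float]:
    """final_score = alpha * semantic_score + (1 - alpha) * keyword_score.
    A chunk found by only one retriever scores 0.0 on the other signal."""
    sem, kw = min_max_normalize(semantic), min_max_normalize(keyword)
    return {
        cid: alpha * sem.get(cid, 0.0) + (1 - alpha) * kw.get(cid, 0.0)
        for cid in set(sem) | set(kw)
    }

# Usage: semantic scores from a vector index, keyword scores from BM25.
fused = hybrid_fuse(
    semantic={"chunk-1": 0.92, "chunk-2": 0.81, "chunk-3": 0.40},
    keyword={"chunk-2": 11.3, "chunk-4": 9.7},  # raw BM25, different scale
    alpha=0.7,
)
print(sorted(fused.items(), key=lambda kv: kv[1], reverse=True))
```

Reciprocal rank fusion is a common alternative when raw score scales are too unreliable to normalise, since it mixes ranks rather than scores.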
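The permission enforcement described earlier belongs inside this same retrieval path, as a pre-filter rather than a post-filter. A minimal sketch of chunk-level ACL filtering, assuming each chunk carries an `allowed_groups` metadata set (this data model is an illustration only; production systems usually push the equivalent filter into the vector database's metadata query so that disallowed chunks are never even scored):

```python
from dataclasses import dataclass, field
from typing import List, Set

@dataclass
class Chunk:
    chunk_id: str
    text: str
    allowed_groups: Set[str] = field(default_factory=set)  # chunk-level ACL

def filter_by_acl(chunks: List[Chunk], user_groups: Set[str]) -> List[Chunk]:
    """Drop chunks the user may not see before scoring and fusion, so
    forbidden content never reaches the re-ranker or the LLM context."""
    return [c for c in chunks if c.allowed_groups & user_groups]

# Usage: Section 3 is visible to engineering, Section 7 only to legal.
candidates = [
    Chunk("doc-9#sec-3", "...", {"engineering", "legal"}),
    Chunk("doc-9#sec-7", "...", {"legal"}),
]
visible = filter_by_acl(candidates, user_groups={"engineering"})
print([c.chunk_id for c in visible])  # ['doc-9#sec-3']; sec-7 is withheld
```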
**Re-ranking.** Initial retrieval casts a wide net (top 50–100 chunks). A cross-encoder re-ranker then scores each chunk against the full query and re-orders by relevance. This two-stage approach is more accurate than single-stage retrieval while remaining computationally feasible.

**Query decomposition.** Complex enterprise queries ("What is our policy on remote work for contractors in EMEA after the 2024 update?") benefit from decomposition into sub-queries that retrieve different facets, followed by synthesis across the results (see the closing sketch below).

## What a structured consulting engagement delivers

For organisations evaluating enterprise AI search, understanding how a structured AI consulting engagement works matters, because enterprise search is not a tool you install; it is a system you build. The engagement defines the corpus (which documents), the access model (who sees what), the quality bar (acceptable accuracy), and the retrieval architecture (chunking, embedding, re-ranking, generation) that meets those requirements.

The common mistake is treating enterprise AI search as a product purchase rather than a systems integration project. Products exist (Azure AI Search, Google Vertex AI Search, Amazon Kendra), but the differentiating work (corpus preparation, permission mapping, chunking optimisation, quality evaluation) requires engineering specific to your document landscape.
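To close with something concrete, a minimal sketch of the query decomposition pattern described above. The sub-queries here are hard-coded placeholders (in practice an LLM usually produces them), and `retrieve` stands in for any scoring retriever, such as the hybrid one sketched earlier:

```python
from typing import Callable, Dict, List, Tuple

def decompose(query: str) -> List[str]:
    """Placeholder decomposition: hard-coded facets for the example query.
    A production system would generate these with an LLM prompt."""
    return [
        "remote work policy for contractors",
        "remote work policy in EMEA",
        "remote work policy changes in the 2024 update",
    ]

def retrieve_decomposed(
    query: str,
    retrieve: Callable[[str], Dict[str, float]],  # sub-query -> chunk scores
    k: int = 5,
) -> List[Tuple[str, float]]:
    """Retrieve per sub-query, then merge: each chunk keeps its best score
    across facets, so chunks relevant to any facet can surface."""
    merged: Dict[str, float] = {}
    for sub_query in decompose(query):
        for chunk_id, score in retrieve(sub_query).items():
            merged[chunk_id] = max(merged.get(chunk_id, 0.0), score)
    return sorted(merged.items(), key=lambda kv: kv[1], reverse=True)[:k]

# Usage with a stubbed retriever (a real one would hit the indexes):
fake_scores = {"policy#sec-3": 0.91, "policy#sec-7": 0.64, "faq#12": 0.40}
top_chunks = retrieve_decomposed(
    "What is our policy on remote work for contractors in EMEA "
    "after the 2024 update?",
    retrieve=lambda q: fake_scores,
    k=3,
)
print(top_chunks)  # the merged chunks are what the LLM sees at generation
```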