## The retail shrinkage problem and why traditional methods fail at scale

Global retail shrinkage exceeded $100 billion annually by 2024 (NRF National Retail Security Survey). Traditional loss prevention (EAS tags, human observation, receipt checks) catches only a fraction of losses and scales poorly. Adding more security staff is linearly expensive; adding more cameras without intelligence just creates more footage that nobody watches.

Computer vision for loss prevention promises automated detection: identifying concealment events, scan avoidance at self-checkout, ticket switching, and organised retail crime patterns. The promise is real: deployed correctly, CV-based LP systems detect events that human operators consistently miss. But the gap between demonstration accuracy and production reliability is where most deployments struggle.

## Why loss prevention CV is harder than general object detection

CV-based loss prevention must handle tens of thousands of SKU variants under variable store lighting; off-the-shelf models degrade at this scale. The specific challenges:

**SKU diversity.** A typical grocery store carries 30,000–50,000 SKUs. Self-checkout fraud detection requires distinguishing between items that look similar (varieties of the same product, similar packaging across brands). Generic object detection models trained on COCO or ImageNet do not have this granularity.

**Lighting variability.** Retail environments have mixed lighting: fluorescent overheads, natural light from windows that changes hourly and seasonally, refrigerator lighting. Models trained in controlled conditions degrade when lighting shifts outside the training distribution.

**Occlusion and angles.** Shoppers' bodies, carts, bags, and other products occlude the items of interest. Overhead cameras capture a fundamentally different view than aisle-level cameras. Multi-angle systems are necessary but multiply pipeline complexity.

**Normal vs suspicious behaviour.**
Customers routinely handle, examine, and put back products. The difference between "examining an item" and "concealing an item" can be a matter of milliseconds and millimetres in hand position. False-positive rates on concealment detection are typically 5–20× higher than on simple object detection (an observed pattern across our retail CV deployments, not a published benchmark).

## The compound detection pipeline

Effective retail LP systems combine detection, tracking, and POS reconciliation; single-model approaches produce unactionable alert volumes. A production LP pipeline requires:

| Stage | Function | Technology |
| --- | --- | --- |
| Person detection & tracking | Maintain identity across camera views | Multi-object tracker (ByteTrack, DeepSORT) |
| Item detection | Identify products being handled | Domain-trained detector (YOLO + SKU-specific fine-tuning) |
| Action classification | Distinguish normal handling from concealment/skip-scanning | Temporal action model (SlowFast, VideoMAE) |
| POS reconciliation | Match scanned items against detected items at checkout | Event correlation engine |
| Alert filtering | Suppress false positives using contextual rules | Rule layer with per-zone thresholds |

The POS reconciliation stage is what transforms noisy video detection into actionable intelligence. A concealment detection alone has limited value; the same event, correlated with a subsequent checkout where the item does not appear in the scanned list, becomes an actionable loss event.

## What "works" looks like in production

Production-grade retail LP systems do not achieve zero false positives. They achieve a false-positive rate low enough for the LP team to investigate every alert: typically below 50% for high-confidence alerts.
This is achieved through:

- **Zone-specific models** trained on each store's camera geometry and lighting
- **Confidence cascade**: only alerts that pass multiple threshold stages reach human review
- **Temporal confirmation**: a single-frame detection is never an alert; sustained detection across frames is required
- **POS correlation**: alerts without corresponding POS anomalies are suppressed

The ROI that computer vision delivers in retail depends on this pipeline maturity. Immature deployments (single-model, no POS integration, no zone calibration) generate alert fatigue and negative ROI. Mature deployments reduce shrinkage by measurable percentages, but they require 3–6 months of on-site calibration to reach that maturity.

## Scale is the hard problem

A solution that works in one store with 10 cameras does not automatically work across 500 stores with 5,000 cameras. Scale introduces: model drift across geographically diverse lighting conditions, fleet management for edge inference hardware, centralised alert triage across hundreds of locations, and the statistical certainty that even a 1% false-positive rate generates thousands of false alerts per day across the fleet.

Loss prevention CV at scale is an infrastructure and operations problem as much as a machine learning problem. Teams that treat it as only a model-accuracy challenge discover the operations gap after deployment, when the alert volume overwhelms the LP team and the system is disabled store by store.
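The POS reconciliation step described above reduces to an event-correlation check: vision-detected items for a tracked shopper are matched against the scanned receipt, and any high-confidence detection without a matching scan becomes a candidate loss event. A minimal sketch, with hypothetical names (`DetectedItem`, `reconcile`) and an illustrative confidence floor, not a production correlation engine:

```python
from dataclasses import dataclass

@dataclass
class DetectedItem:
    sku: str            # SKU predicted by the item detector
    confidence: float   # detector confidence for this detection
    track_id: int       # person track the item was associated with

def reconcile(detected: list[DetectedItem], scanned_skus: list[str],
              min_confidence: float = 0.8) -> list[DetectedItem]:
    """Return high-confidence detections with no matching POS scan.

    Each scanned SKU can absorb at most one detection, so an item
    handled twice but scanned once still produces a candidate event.
    """
    remaining = list(scanned_skus)      # mutable copy of the receipt
    unmatched = []
    for item in detected:
        if item.confidence < min_confidence:
            continue                    # below the cascade threshold: ignore
        if item.sku in remaining:
            remaining.remove(item.sku)  # matched against the receipt
        else:
            unmatched.append(item)      # candidate loss event
    return unmatched

detections = [DetectedItem("A123", 0.93, 7),
              DetectedItem("B456", 0.91, 7),
              DetectedItem("C789", 0.55, 7)]   # low-confidence, filtered out
alerts = reconcile(detections, scanned_skus=["A123"])
# only B456 survives: detected with high confidence but never scanned
```

In a real deployment the match key would be richer than a bare SKU string (timestamps, lane ID, track-to-transaction association), but the suppression logic, alert only on detection-minus-receipt differences, is the core idea.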
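The temporal-confirmation rule, where a single-frame detection never fires an alert, can be sketched as a sliding window over per-frame scores. `TemporalConfirmer` and its `window`/`threshold` values are illustrative assumptions, not parameters from any particular system:

```python
from collections import deque

class TemporalConfirmer:
    """Fire only when a detection persists across consecutive frames.

    Requires `window` consecutive frames scoring at or above
    `threshold`; the values below are illustrative, not tuned.
    """
    def __init__(self, window: int = 12, threshold: float = 0.7):
        self.window = window
        self.threshold = threshold
        self.scores = deque(maxlen=window)  # rolling per-frame scores

    def update(self, frame_score: float) -> bool:
        self.scores.append(frame_score)
        return (len(self.scores) == self.window
                and all(s >= self.threshold for s in self.scores))

confirmer = TemporalConfirmer(window=3, threshold=0.7)
results = [confirmer.update(s) for s in [0.9, 0.9, 0.9, 0.2, 0.9, 0.9]]
# [False, False, True, False, False, False]: fires on the third sustained
# frame; a single low-scoring frame suppresses alerts until it ages out
```

Per-zone thresholds from the rule layer would feed into `threshold` here, which is one reason zone calibration dominates the 3–6 month maturation period.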