A vision-QC model that scores 98% on a curated test set and sails through a lab demo can still let the worst defects through once it is bolted onto a moving line. The gap is not a model-quality problem you can buy your way out of with a bigger network. It is a deployment-condition problem, and it shows up as a specific, repeatable failure class the moment an off-the-shelf computer-vision model leaves the bench and meets the floor.

We see this pattern regularly on the POC-to-production transition: the demo succeeds on hand-picked parts under controlled lighting, the purchase order goes through on the strength of that demo, and then the live defect-catch rate quietly disappoints. Nobody notices immediately, because the system keeps returning confident answers. It just returns confident wrong answers on exactly the cases it was bought to catch.

Why an Off-the-Shelf Vision Model That Passes a Lab Demo Underperforms on a Live Line

A lab demo measures the model against a closed world: a fixed set of defect classes, evenly lit, presented one part at a time. A manufacturing line is an open world. Parts arrive at line speed under lighting that drifts across a shift, with vibration, occlusion, specular glare on metal, condensation on the lens, and the occasional component the model has never seen. The model does not know which world it is in. It applies the same softmax it learned in the lab and produces a probability for every input, including inputs that belong to no class it was trained on.

This is the same compound failure class that breaks off-the-shelf CV at retail scale: uncontrolled environments, edge throughput constraints, and unknown-object cases the model silently misclassifies rather than flagging. The vertical changes; the failure mode does not. A retail shelf and a stamping press are different worlds, but both punish a model that was validated against a clean test set and deployed against reality.

The expensive part is the silence.
A model that crashes is a problem you fix this afternoon. A model that returns a 0.91-confidence “pass” on a cracked casting because the crack pattern wasn’t in its training distribution is a problem you discover three weeks later in a customer return. The cost of pretending this is a model-upgrade problem (swap in a newer architecture, retrain on more images) is a QC system that keeps missing the exact failure modes it was deployed to catch.

The Failure Class, Named

The three mechanisms below account for most of the lab-to-line gap we encounter in industrial CV engagements. Treating them as separate symptoms of one underlying problem, a model deployed without a deployment discipline, is more useful than chasing each as an isolated bug.

| Failure mechanism | What it looks like on the line | Why a bigger model doesn’t fix it |
| --- | --- | --- |
| Uncontrolled environment | Accuracy degrades across a shift as lighting, dust, and lens condensation drift away from training conditions. | The model is accurate in distribution; the line keeps pushing inputs out of distribution. More capacity learns the lab better, not the floor. |
| Edge throughput constraint | At line speed the inference budget forces frame-dropping, lower resolution, or shorter exposure, quietly trading away the detail defects live in. | A larger network is slower. On a fixed edge budget, bigger means you inspect fewer parts or smaller crops, lowering the achievable catch rate. |
| Unknown-object / unexpected-defect | A defect type, or a part variant, outside the trained classes is confidently mapped to the nearest known class. | The classifier has no “I don’t know” output. Scaling the same closed-set design produces a more confident wrong answer, not an abstention. |

These are observed patterns across TechnoLynx computer-vision engagements, not a benchmarked failure rate; the specific numbers depend on the line, the optics, and the defect taxonomy.
What is consistent is the shape: each mechanism is a property of the deployment condition, not of the model weights, which is why retraining alone rarely closes the gap.

Early Warning Signs Before a Return Ships

You can recognise this failure class before it reaches a customer if you watch the right signals. The diagnostic below is the checklist we run during a production-readiness review on a vision-QC line.

- The demo dataset and the line distribution were never compared. If nobody has plotted training-image lighting and part-pose against a week of live captures, the open-world gap is unmeasured, not absent.
- The pipeline reports a class for every frame and never abstains. A system that cannot say “low confidence, route to manual” has no mechanism to surface the unknown-object case at all.
- Defect-catch rate is reported as a single lab number, not a live, rolling measurement. A static accuracy figure from acceptance testing tells you nothing about drift across shifts.
- Throughput was validated on the dev workstation, not the edge target. A model that hits latency on an RTX-class desktop may miss it on the Jetson-class device actually on the line, forcing the silent resolution and frame-rate compromises above.
- There is no audit trail of borderline decisions. If you cannot pull the images the model was least sure about, you cannot tell whether it is failing safely or failing silently.

If three or more of these hold, the deployment is in the failure class regardless of how the demo looked. The signs are about observability, not accuracy, because the defining property of this failure is that accuracy looks fine right up until it doesn’t.

What Changes When the Pipeline Has to Handle the Unknown

The fix is structural, not a model swap.
A modular, observable pipeline (separate stages for capture, pre-processing, detection, confidence gating, and routing) restores the defect-catch rate and the safety-event-detection rate the deployment was purchased for, because it gives each failure mechanism somewhere to be caught.

Explicit unknown-object handling is the centrepiece. Instead of a closed-set classifier forced to pick a known class, the pipeline adds an out-of-distribution gate: anomaly scoring, confidence thresholding, or an open-set head that can route a part it doesn’t understand to manual review rather than stamping it “pass.” This is the single change that converts a silent failure into a visible one, and a visible failure is one a quality engineer can act on. The mechanics of building this so it survives the move from a pilot bench to a running line are covered in our piece on how CV defect-detection models survive the move from pilot to production line.

Observability is the other half. When detection runs as a distinct, instrumented stage (confidence distributions logged, borderline crops retained, drift tracked against the live capture stream), you get rolling measurement of the catch rate instead of a stale acceptance number. Common building blocks here are ONNX Runtime or TensorRT for the edge inference budget, OpenCV for the deterministic pre-processing that keeps inputs closer to the training distribution, and a lightweight logging layer that retains the low-confidence frames for periodic re-labelling. None of this requires a more exotic model. It requires treating the CV system as a production pipeline rather than a clever artifact.

For the prior question (whether a given inspection task is even tractable with vision in the first place), see our feasibility framing in when industrial computer vision inspection actually works.
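
The confidence-gating and rolling-measurement stages described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not a reference implementation: `Decision`, `gate`, `RollingReviewRate`, the threshold values, and the `anomaly_score` input are all hypothetical. The anomaly score is assumed to come from a separate out-of-distribution detector upstream, because a softmax probability on its own can stay high on inputs the model has never seen.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Decision:
    route: str         # "auto" or "manual_review"
    label: str         # top predicted class, kept even when diverted
    confidence: float  # probability of the top class


def gate(probs, class_names, anomaly_score, min_conf=0.95, max_anomaly=0.5):
    """Accept a prediction only if it is both confident and in-distribution.

    probs: per-class probabilities from the classifier.
    anomaly_score: hypothetical upstream score, higher means less familiar.
    The default thresholds are placeholders, not tuned values.
    """
    top = max(range(len(probs)), key=probs.__getitem__)
    confident = probs[top] >= min_conf
    familiar = anomaly_score <= max_anomaly
    route = "auto" if (confident and familiar) else "manual_review"
    return Decision(route, class_names[top], probs[top])


class RollingReviewRate:
    """Share of the last `window` parts that were routed to manual review."""

    def __init__(self, window=500):
        self.flags = deque(maxlen=window)

    def update(self, decision):
        self.flags.append(decision.route == "manual_review")
        return sum(self.flags) / len(self.flags)


if __name__ == "__main__":
    classes = ["clean_surface", "crack", "porosity"]
    monitor = RollingReviewRate(window=3)
    # (probabilities, anomaly score) per part; values are made up for illustration.
    parts = [
        ([0.98, 0.01, 0.01], 0.10),  # confident and familiar
        ([0.97, 0.02, 0.01], 0.80),  # confident but unfamiliar, e.g. an unseen defect
        ([0.60, 0.30, 0.10], 0.20),  # familiar but not confident
    ]
    for probs, score in parts:
        decision = gate(probs, classes, score)
        print(decision.route, decision.label, round(monitor.update(decision), 2))
```

Keeping the predicted label on a diverted part is a deliberate choice in this sketch: the reviewer can see what the model would have said, and the retained cases become candidates for re-labelling. A rising review rate within a shift is one cheap drift signal; the thresholds themselves would need tuning against live captures from the actual line.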
Vision-QC and factory-floor safety detection sit inside the same computer vision engineering practice, because they fail the same way and demand the same deployment discipline.

Worked Example: A Casting-Inspection Cell

Consider a cell inspecting aluminium die-castings for surface cracks and porosity. Assumptions, stated explicitly so the reasoning is checkable: a closed-set classifier trained on ~12 labelled defect categories, a lab acceptance accuracy in the high-90s, an edge target with a fixed per-part inference budget at line speed, and a real defect population that includes occasional cold-shut and inclusion patterns absent from the training set.

In the lab, the model is excellent. On the line, three things happen in sequence. Lighting drift across a shift pushes the porosity signal toward the edge of the trained distribution, so borderline porous parts start scoring just over the pass threshold. The edge budget, validated optimistically on a workstation, forces a resolution drop that erases fine hairline cracks. And the cold-shut defects, never in the training set, get confidently mapped to “clean surface,” because the classifier has no abstention path. Each effect is individually small. Together they drop the live catch rate well below the acceptance figure, and because every output is a confident pass, the dashboards stay green.

The structural fix routes the same hardware differently: deterministic pre-processing to hold the input distribution steady, an out-of-distribution gate that diverts the cold-shut and inclusion cases to manual review, and rolling confidence logging that surfaces the lighting drift within a shift rather than a quarter later. The model did not change. The discipline around it did.

FAQ

What is manufacturing facility security AI gun software?

It refers to computer-vision systems that detect firearms or other threats on a manufacturing site as a safety-event-detection task.
The engineering challenge is identical to vision-QC: the same uncontrolled-environment, edge-throughput, and unknown-object failure class applies, and the same modular, observable pipeline discipline is what makes detection reliable rather than a confident-but-wrong demo.

What is manufacturing safety AI gun software?

This is factory-floor safety detection implemented with computer vision: flagging a weapon or hazardous object as a safety event. Like defect detection, it depends on handling cases the model was never trained on rather than forcing every frame into a known class, which is why an explicit out-of-distribution gate matters more than a marginally larger model.

What is manufacturing safety technology AI gun?

It is the broader category of vision-based safety inference on a manufacturing line, of which weapon detection is one instance. The reliability of any such system is governed by deployment conditions (lighting, throughput budget, and unexpected inputs), not by lab accuracy, so it must be validated against the live floor distribution rather than a curated test set.

What is security manufacturing systems AI gun?

These are security-oriented vision deployments on manufacturing sites that share the same failure class as quality-control CV. A static lab accuracy number tells you nothing about live safety-event-detection performance; only rolling, observable measurement against real floor conditions does.

Why do off-the-shelf computer-vision models that pass lab demos silently underperform once deployed on a live manufacturing line?

A lab demo tests a closed world of fixed defect classes under controlled lighting, while a live line is an open world of drifting conditions, throughput pressure, and unseen inputs. The model applies the same closed-set classifier to both and returns confident answers either way, so it fails silently, confidently passing exactly the parts it should flag, until a defect reaches a customer.
What changes when a vision-QC pipeline has to handle unknown-object or unexpected-defect cases instead of only the defect classes it was trained on?

It must add an explicit abstention path (an out-of-distribution gate, anomaly score, or open-set head) that routes a part the model does not understand to manual review instead of mapping it to the nearest known class. This converts a silent misclassification into a visible, actionable event and is the single structural change that most restores the real catch rate.

How do edge throughput constraints on a factory floor affect the achievable defect-catch rate and safety-event-detection rate of a CV deployment?

A fixed edge inference budget at line speed forces trade-offs (frame dropping, lower resolution, shorter exposure) that quietly remove the detail small defects and safety events live in. A larger model makes this worse because it is slower, so throughput must be validated on the actual edge target rather than a development workstation to know the catch rate you can really sustain.

Where This Leaves a Manufacturing Team

The uncomfortable part is that nothing in this failure class is detectable from a demo. The demo is precisely the condition under which an off-the-shelf model looks best. The question worth asking before a line deployment is not “how accurate is the model” but “what does this system do with a part it has never seen, at line speed, under shift-end lighting”, because that is the case the QC system exists to catch, and a closed-set model has no honest answer for it. A production CV readiness assessment exists to surface exactly that gap before it ships as a return.