## What is the engineering approach to detecting anomalies in video data pipelines?

Anomaly detection in production video streams has a data problem that most ML approaches are not designed to handle: anomalies are rare by definition, which means a labelled dataset of anomalous frames is either unavailable, inadequate, or unrepresentative of the anomalies that will actually occur in production.

A supervised classification approach — training a model to distinguish normal from anomalous frames — requires labelled examples from both classes. For broadcast video quality control, the anomalous class includes transmission artefacts, encoding failures, timing errors, content policy violations, and equipment malfunctions. Some of these are sufficiently common to generate training examples; others occur once per year and cannot be anticipated in advance.

The data scarcity problem means that a supervised model trained on the available anomaly examples will generalise poorly to anomalies it has not seen. This is not a model architecture failure. It is a consequence of asking a supervised approach to generalise from a small, non-representative anomaly sample.

### The generative model approach to anomaly scoring

A generative anomaly detection model is trained on normal frames only — the class that is abundant, consistent, and well-characterised. At inference time, the model attempts to reconstruct each input frame. Frames that closely match the learned normal distribution produce low reconstruction error. Frames that deviate from the normal distribution — because they contain artefacts, encoding failures, or content anomalies — produce high reconstruction error. The reconstruction error is the anomaly score.

This approach has two properties that make it well-suited to production video quality control:

1. **It does not require labelled anomaly examples.** The model learns what normal looks like from the normal training data. Anomalies are detected because they deviate from normal, not because they match a labelled anomaly category.
2. **It generalises to anomaly types it has never seen.** A supervised model trained on specific anomaly types may fail to detect new anomaly types that were not in the training set. A generative model that has learned the normal distribution will produce high reconstruction error for any frame that deviates significantly from that distribution, regardless of the specific mechanism of deviation.
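To make the scoring mechanism concrete, here is a minimal sketch of the pattern: a convolutional autoencoder scored by per-frame mean squared reconstruction error. It is illustrative rather than the deployed architecture; the class name, layer sizes, and the 256×256 input assumption are ours, not the client system's.

```python
# Minimal sketch: reconstruction-error anomaly scoring with a
# convolutional autoencoder (illustrative sizes, assumes 3x256x256 input).
import torch
import torch.nn as nn

class ConvAutoencoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(  # 3x256x256 -> 128x32x32
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(  # 128x32x32 -> 3x256x256
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

@torch.no_grad()
def anomaly_scores(model: nn.Module, frames: torch.Tensor) -> torch.Tensor:
    """Per-frame mean squared reconstruction error.

    frames: (N, 3, H, W) float tensor in [0, 1]. A higher score means
    the frame is further from the learned normal distribution.
    """
    model.eval()
    recon = model(frames)
    return ((frames - recon) ** 2).mean(dim=(1, 2, 3))
```

Training uses the same objective on normal frames only (minimise MSE between input and reconstruction); no anomaly labels enter the pipeline at any point.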
### Supervised vs generative anomaly detection: the architectural tradeoffs

| Dimension | Supervised classification | Generative model |
| --- | --- | --- |
| Training data requirement | Labelled examples of both normal and anomalous frames | Normal frames only |
| Anomaly generalisation | Limited to anomaly types seen in training | Generalises to unseen anomaly types via deviation from normal |
| False-positive control | Threshold on predicted anomaly probability | Threshold on reconstruction error, calibrated to the normal distribution |
| Interpretability | Class probability plus optional attribution (Grad-CAM, SHAP) | Reconstruction difference map — visually shows where the anomaly was detected |
| Suitable for | High-frequency, consistent anomaly types with sufficient labelled data | Rare or open-set anomalies; new deployments with limited anomaly history |
| Runtime cost | Single forward pass through a classifier | Encoder-decoder forward pass; higher, but manageable at broadcast frame rates |

In the generative anomaly detection system we developed for a broadcast data pipeline client, the training corpus was exclusively normal broadcast frames. The generative model — a convolutional autoencoder with a regularised latent space — learned to reconstruct normal frames with high fidelity. Reconstruction error on normal frames settled into a stable distribution within the first 24 hours of training. Frames with transmission artefacts, encoded at lower quality than the expected baseline, produced reconstruction error 3–5× higher than the normal-frame distribution mean (a project-specific measurement on the deployed model). Content policy anomalies — frames containing visual patterns outside the broadcast standard — produced reconstruction errors in the 8–12× range. The threshold for alert triggering was set at the 99.5th percentile of the normal-frame reconstruction error distribution, producing a manageable alert rate with strong recall on content-quality anomalies.

### Threshold calibration with synthetic anomalies: the procedure

One practical limitation of the generative approach is threshold calibration in the absence of labelled anomaly examples. The threshold separates normal frames (reconstruction error below the threshold) from anomalous frames (above it). Without real anomaly examples, calibration relies on the normal distribution alone — setting the threshold at a high percentile of the normal distribution, which controls the false-positive rate but does not directly measure recall.

Synthetic anomaly generation addresses this. The procedure below is the structure we use; the specific perturbation parameters depend on the broadcast standard and content profile being protected. Sketches of steps 2 and 3–5 follow the list.

1. **Define the anomaly category set.** Enumerate the anomaly categories the system is responsible for detecting. For broadcast quality control these typically include: compression artefact bursts, resolution mismatches against the expected output, colour-space drift, frozen or duplicated frames, and content frames outside the broadcast format profile. Each category becomes a row in the calibration matrix.
2. **Synthesise a parameter sweep per category** (sketched below). For each category, generate synthetic anomaly frames at a range of severity levels by applying the corresponding perturbation to a sample of normal frames. Compression artefacts: re-encode at progressively lower bitrates (e.g. 8 Mbps, 4 Mbps, 2 Mbps, 1 Mbps for a stream whose normal bitrate is 12 Mbps). Resolution: downscale and rescale through multiple intermediate resolutions. Colour drift: apply known colour-space transforms (gamma shifts, channel swaps, white-balance offsets) at calibrated magnitudes. Tooling: FFmpeg for the encoding and resolution perturbations, OpenCV for the colour-space and frame-duplication perturbations.
3. **Score the synthetic set with the trained generative model.** For each synthetic frame, run the generative model and record the reconstruction error. The result is a table with three columns: anomaly category, severity parameter, reconstruction error.
4. **Plot the recall curve per category.** For each category, plot the fraction of synthetic anomalies above the candidate threshold as a function of threshold value. The curve crosses 50% at the threshold where half the synthetic anomalies in that category are caught. The curve at the operationally relevant severity level (the level at which the anomaly is consequential to the broadcast operator) is the curve that matters.
5. **Choose the threshold against the multi-category recall target.** The threshold is the lowest value (highest sensitivity) at which the false-positive rate on normal frames remains within the operator's capacity. Typical broadcast quality control deployments target a false-positive rate below 0.5% on normal frames — corresponding to the 99.5th percentile of the normal distribution — with the recall constraint requiring that the chosen threshold catch at least 90% of synthetic anomalies at the operationally relevant severity level for each category. If the constraints conflict (the threshold required for recall produces an unacceptable false-positive rate), the model must be retrained with a stronger normal-frame characterisation rather than the threshold being tuned further.
6. **Validate against any available real anomaly data.** Real anomalies, even in small numbers, are higher-fidelity calibration data than synthetic ones. As real anomalies accumulate from production operation, the threshold should be re-validated against them and adjusted if recall on real anomalies diverges from recall on synthetic anomalies of comparable severity.
7. **Document the calibration as a release artefact.** The chosen threshold, the validated false-positive rate at that threshold on the held-out normal set, and the per-category recall rates at the chosen threshold should be released alongside the model. A model deployed without this calibration documentation has undocumented operational behaviour.
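Step 2 is scriptable with the tooling named in the list. A sketch for two of the categories, assuming hypothetical file paths (`normal_sample.mp4`, `normal_frame.png`); the bitrate ladder is the one from the compression-artefact example, while the gamma values are illustrative magnitudes, not calibrated recommendations.

```python
# Sketch of step 2: synthesising severity sweeps of anomaly frames.
# File paths are placeholders; gamma values are illustrative.
import subprocess
import cv2
import numpy as np

def reencode_at_bitrate(src_path: str, dst_path: str, bitrate: str) -> None:
    """Re-encode a clip at a lower bitrate to synthesise compression artefacts."""
    subprocess.run(
        ["ffmpeg", "-y", "-i", src_path,
         "-c:v", "libx264", "-b:v", bitrate, "-an", dst_path],
        check=True,
    )

def gamma_shift(frame: np.ndarray, gamma: float) -> np.ndarray:
    """Apply a gamma shift of known magnitude to an 8-bit BGR frame
    (colour-drift category)."""
    lut = ((np.arange(256) / 255.0) ** gamma * 255.0).astype(np.uint8)
    return cv2.LUT(frame, lut)

# Severity sweep for the compression-artefact category (normal stream: 12 Mbps).
for bitrate in ["8M", "4M", "2M", "1M"]:
    reencode_at_bitrate("normal_sample.mp4", f"synthetic_{bitrate}.mp4", bitrate)

# Severity sweep for the colour-drift category.
frame = cv2.imread("normal_frame.png")
for gamma in [0.8, 0.9, 1.1, 1.25]:
    cv2.imwrite(f"synthetic_gamma_{gamma}.png", gamma_shift(frame, gamma))
```

Each output file is tagged with its severity parameter, which becomes the severity column in the calibration table of step 3.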
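Steps 3–5 then reduce to bookkeeping over reconstruction errors. A sketch, assuming the errors have already been computed (for example with `anomaly_scores` from the first code block) and grouped per category at the operationally relevant severity; the function name is ours, and the 99.5th-percentile and 90% recall defaults are the figures from step 5.

```python
# Sketch of steps 3-5: threshold selection against a per-category recall target.
import numpy as np

def calibrate_threshold(normal_errors: np.ndarray,
                        synthetic_errors: dict[str, np.ndarray],
                        fp_percentile: float = 99.5,
                        recall_target: float = 0.90):
    """Set the threshold at a high percentile of the normal-error
    distribution, then check recall per anomaly category.

    normal_errors:    reconstruction errors on a held-out normal set.
    synthetic_errors: category -> errors of synthetic anomalies at the
                      operationally relevant severity level.
    Returns (threshold, per-category recall, categories below target).
    """
    threshold = float(np.percentile(normal_errors, fp_percentile))
    recall = {cat: float(np.mean(errs > threshold))
              for cat, errs in synthetic_errors.items()}
    shortfalls = {cat: r for cat, r in recall.items() if r < recall_target}
    # Per step 5: if shortfalls is non-empty at an acceptable false-positive
    # rate, the fix is retraining the model, not further threshold tuning.
    return threshold, recall, shortfalls
```

The `shortfalls` output is the calibration matrix condensed to its actionable form: every category listed there fails the recall constraint at the chosen false-positive rate.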
The synthetic data approach is particularly well-suited to broadcast quality control because the anomaly types of interest are well-characterised technically (encoding standard violations, format compliance failures, timing errors) and can be synthesised with known severity levels. The approach transfers less cleanly to anomaly detection problems where the anomalies are semantic rather than technical (a person performing an unusual action in a surveillance feed); for those, real anomaly accumulation matters more and the synthetic step is supplementary rather than primary.

### Latency and deployment architecture

Broadcast and media pipelines typically require real-time or near-real-time anomaly detection — a detection delay measured in frames rather than seconds. The generative model's inference cost must fit within the frame processing budget. For a standard autoencoder architecture, the encoder and decoder passes can be parallelised across multiple frames using batch inference. As a planning heuristic from our broadcast CV engagements (not a benchmarked industry rate): for a 30 fps stream with a 5-frame latency budget, batch sizes of 5 frames are compatible with standard GPU inference throughput on frames of broadcast resolution.

The key deployment decision is whether to run inference on-premise alongside the ingest pipeline or offload to a cloud-attached GPU. On-premise deployment eliminates network latency and keeps the content under the broadcast operator's control; cloud deployment provides elastic capacity for burst traffic.

The reconstruction difference map that the generative model produces as a byproduct of inference is directly useful as an operational output: operators see not just an anomaly score but a frame-level map showing where the deviation from normal is concentrated. This makes the alert immediately interpretable rather than requiring a separate attribution step.
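Both the 5-frame batching heuristic and the difference map fall out of a single forward pass. A sketch, reusing the autoencoder interface from the first code block; the function name, min-max normalisation, and JET colormap are illustrative choices for operator display, not the deployed system's rendering.

```python
# Sketch: batched inference returning anomaly scores plus operator-facing
# difference maps. A batch of 5 frames matches the latency-budget heuristic.
import torch
import torch.nn as nn
import numpy as np
import cv2

@torch.no_grad()
def score_batch_with_maps(model: nn.Module, frames: torch.Tensor):
    """frames: (N, 3, H, W) float tensor in [0, 1].

    Returns (scores, maps): per-frame reconstruction error, and one
    uint8 BGR heatmap per frame showing where the deviation concentrates.
    """
    model.eval()
    recon = model(frames)
    per_pixel = ((frames - recon) ** 2).mean(dim=1)   # (N, H, W) error map
    scores = per_pixel.mean(dim=(1, 2))               # (N,) anomaly scores
    maps = []
    for m in per_pixel.cpu().numpy():
        norm = cv2.normalize(m, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
        maps.append(cv2.applyColorMap(norm, cv2.COLORMAP_JET))
    return scores, maps
```

The heatmap is the same tensor the anomaly score is computed from, reduced over channels instead of over the whole frame, which is why the attribution comes at no extra inference cost.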
### What remained imperfect

The generative anomaly detection system met its operational targets, but two limitations were intrinsic to the approach and remain worth naming.

First, the generative model's reconstruction error is sensitive to deviation from normal in general, not specifically to the anomaly categories that matter operationally. A frame that is genuinely normal but unusually composed (a content type the broadcaster transmits rarely — a sponsor card, an emergency alert template) can produce elevated reconstruction error simply because it is rare in the training distribution, not because it is anomalous in the operational sense. The system handled this with a small allowlist of "rare-but-normal" frame templates that were excluded from alerting, but the allowlist required manual maintenance and was a recurring source of operator-facing edge cases.

Second, the threshold calibration on synthetic anomalies is only as good as the synthesis fidelity. Real-world transmission artefacts have distributional properties (specific bitrate profiles, specific encoder behaviour under load) that synthetic perturbations approximate but do not reproduce. The recall figures from the synthetic calibration set were systematically optimistic relative to the recall measured on real anomaly events accumulated over the first six months of production. Threshold tightening based on the real-anomaly evidence reduced the gap, but the synthetic-vs-real recall delta remained the principal source of calibration uncertainty for the deployment.

A Production CV Readiness Assessment for broadcast evaluates whether a planned anomaly detection deployment has a calibration procedure of the kind described here — or is relying on threshold tuning against an unmeasured anomaly population.