CASE STUDY

Barcode Detection for Autonomous Retail

For a multinational startup operating autonomous shopping carts in North American grocery retail, we built a camera-based barcode detection and decoding pipeline that runs on video captured inside a moving cart. It reached 86.7% video-level accuracy against a Dynamsoft baseline of 80% on the same 30-video test set.

YOLOv7 · Ensemble Decoding · Multi-Frame Polling

The Challenge

Barcode detection in a cart context is different from barcode scanning at a checkout counter. The camera is moving. The product is moving. The angle is variable. Autofocus systems designed for static close-range scanning struggle at the distances and motion profiles of in-cart capture. Commercial barcode libraries — designed for clean, stable input — fail at useful rates under these conditions.

Autofocus lag, reflections, and motion blur.

In-cart camera footage includes frames with autofocus transitions, specular reflections on product packaging, and motion blur from both the cart and the customer's hand. Any single frame may be undecodable. A system that evaluates frames independently will miss barcodes that are present but momentarily obscured.

Commercial libraries have high precision but poor recall.

Libraries such as Pyzbar decode reliably when they decode — but they fail to detect the barcode region in the first place at cart-camera distances with degraded image quality. The precision/recall imbalance means they produce correct outputs rarely, rather than useful outputs reliably.

Barcode type diversity requires multiple decoding strategies.

A grocery store carries products with EAN-13, UPC-A, and other barcode formats in varying print quality and orientation. No single decoder performs consistently across the full type distribution — a robust pipeline needs multiple decoding strategies and a way to aggregate across them.

In-cart camera view of a packaged product showing a printed barcode under variable lighting

Project Timeline

From YOLO localisation to a multi-frame polling pipeline that beat commercial baselines

Commercial Baseline Characterisation

Ran Dynamsoft and Pyzbar against the 30-video test set to establish commercial baselines before building a custom pipeline. Dynamsoft achieved 80% video-level accuracy. Pyzbar demonstrated high precision but poor recall — it decoded correctly when it decoded, but rarely decoded in the first place under cart-camera conditions.

YOLOv7 Detection Stage

Trained a YOLOv7 model to localise the barcode region within each frame, served via a Flask localhost HTTP endpoint. Localisation narrows the region of interest before decoding, substantially improving recall for downstream decoders that would otherwise fail on the full frame.

Crop Postprocessing

Applied Hough-based rotation correction and image enhancement to the detected crop before decoding — improving decodability on frames where the barcode is skewed or has degraded contrast.
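The geometry behind that correction can be sketched in a few lines. In production the line angles would come from a Hough transform over an edge map of the crop (e.g. OpenCV's HoughLinesP); the `deskew_angle` helper and its angle-folding rule below are illustrative assumptions, not the deployed code.

```python
from statistics import median

def deskew_angle(line_angles_deg):
    """Estimate the skew of a barcode crop from detected line angles.

    Barcode bars are parallel, so Hough lines over the crop cluster around
    a single orientation; the median is robust to a few spurious lines.
    Angles are folded into [-45, 45) degrees so the result is the smallest
    rotation that re-aligns the bars with the image axes.
    """
    folded = [((a + 45.0) % 90.0) - 45.0 for a in line_angles_deg]
    return median(folded)

# Rotating the crop by -deskew_angle(...) — e.g. with cv2.warpAffine and a
# matrix from cv2.getRotationMatrix2D — would then straighten the bars.
```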

Ensemble Decoder

Assembled an ensemble decoder — Pyzbar, an EAN-13 reader, and type-specific CNN decoders backed by a barcode database — applied to each localised crop. Each decoder contributes a candidate; the ensemble aggregates across strategies rather than failing when any single decoder cannot read the barcode.
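A minimal sketch of the aggregation logic, assuming each decoder exposes a uniform crop-to-(value, confidence) interface — that interface and the pick-highest-confidence rule are assumptions for illustration, not the production code:

```python
def decode_with_ensemble(crop, decoders):
    """Try every decoding strategy on one crop; return the best candidate.

    `decoders` is a list of callables, each mapping a crop to a
    (value, confidence) pair, or None when that strategy cannot read it.
    Returns the highest-confidence candidate, or None if all strategies fail.
    """
    candidates = [r for r in (decode(crop) for decode in decoders)
                  if r is not None]
    if not candidates:
        return None  # the ensemble passes, rather than erroring out
    return max(candidates, key=lambda vc: vc[1])
```

The point of the structure is that one decoder failing costs nothing: the crop still gets a candidate if any other strategy can read it.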

Multi-Frame Polling Aggregation

Rather than returning the first decoded result, the pipeline aggregates decode attempts across the full video clip and returns the most probable prediction — weighted by decode frequency and confidence. This is the mechanism that converts unreliable per-frame decoding into reliable video-level accuracy.
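The polling step can be sketched as a confidence-weighted vote. The scoring rule below — summing per-decode confidences for each candidate value — is one plausible weighting consistent with the description above, not the exact production formula:

```python
from collections import defaultdict

def poll_video(frame_candidates):
    """Aggregate per-frame decode candidates into one video-level answer.

    `frame_candidates` holds (value, confidence) pairs collected across the
    whole clip; frames that failed to decode contribute nothing. Summing
    confidences weights each value by both how often it was decoded and how
    confidently, so one blurry misread cannot outvote many consistent reads.
    """
    scores = defaultdict(float)
    for value, confidence in frame_candidates:
        scores[value] += confidence
    if not scores:
        return None  # nothing decoded anywhere in the clip
    return max(scores, key=scores.get)
```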

The Solution

A four-stage pipeline: detect the barcode region, correct for rotation and image degradation, apply an ensemble of decoding strategies, and aggregate across frames before returning a result. Each stage addresses a distinct failure mode. Removing any one of them reduces accuracy.

YOLOv7 Detection + Crop Preprocessing

Pyzbar has excellent precision on a clean, aligned barcode crop and poor recall when searching the full frame. Putting a YOLOv7 localiser in front of it converts the recall problem into a detection problem the network is good at — and the same crop benefits every downstream decoder. Hough-based rotation correction and image enhancement are applied to the crop before decoding. The decomposition — localiser in front, decoders behind — recurs across our computer vision deployments.

Multi-Strategy Ensemble Decoder

No single decoder performs consistently across EAN-13, UPC-A, and the other barcode types present in a typical grocery catalogue. Pyzbar, an EAN-13 reader, and type-specific CNN decoders backed by a barcode database each contribute a candidate per crop. The pipeline selects the most probable result rather than failing when any individual strategy cannot decode — a structural advantage that compounds across the dataset.
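One reason format-aware decoding helps: EAN-13 and UPC-A carry a check digit, so candidates can be validated cheaply before they reach the vote. The helper below implements the standard EAN-13 checksum; using it as a candidate filter is our illustrative suggestion, not a documented detail of the deployed pipeline.

```python
def ean13_check_digit(digits12):
    """Check digit implied by the first 12 digits of an EAN-13 code.

    Positions alternate weights 1, 3, 1, 3, ... from the left; the check
    digit brings the weighted sum up to a multiple of 10.
    """
    weighted = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(digits12))
    return (10 - weighted % 10) % 10

def is_valid_ean13(code):
    """True if `code` is a 13-digit string whose check digit is correct."""
    return (len(code) == 13 and code.isdigit()
            and int(code[-1]) == ean13_check_digit(code[:12]))

# is_valid_ean13("4006381333931") is True; flip the last digit and it fails.
```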

Multi-Frame Polling Aggregation

Single-frame barcode decoding is unreliable at cart-camera distances — any given frame may be motion-blurred, partially occluded, or mid-autofocus. Aggregating decode candidates across the full clip, weighted by frequency and confidence, converts unreliable per-frame accuracy into reliable video-level accuracy — the metric that actually matters for checkout, where the cart has many seconds to recognise the product, not one frame.

Technical Specifications

Detection: YOLOv7 served via Flask localhost HTTP
Crop correction: Hough-based rotation + image enhancement
Ensemble decoders: Pyzbar, EAN-13 reader, type-specific CNN decoders backed by a barcode database
Deployment path: Android / TFLite evaluated as feasible (YOLOv7 → TFLite conversion confirmed); edge hardware not finalised
Test set: 30 videos, in-cart capture conditions
Evaluation metric: video-level detect+decode accuracy (not per-frame)
Failure modes covered: autofocus lag, specular reflections, motion blur, barcode type diversity
Dynamsoft baseline: 80% video-level accuracy on the same 30-video test set
Pyzbar baseline: high precision but poor recall — rarely decoded under cart-camera conditions
TechnoLynx pipeline: 86.7% video-level accuracy; 93.3% at top-5 aggregation
Branded packaging photographed at cart-camera distance, illustrating the input the decoder pipeline operates on

The Outcome

The pipeline reached 86.7% video-level detect-and-decode accuracy on the 30-video test set, against a Dynamsoft commercial baseline of 80% measured under identical conditions. At top-5 aggregation it reached 93.3%. Three compounding changes drove the improvement: YOLOv7 localisation gave downstream decoders the clean crops they perform well on; the ensemble decoder handled barcode type diversity that no single library covers consistently; and multi-frame polling converted variable per-frame reliability into consistent video-level accuracy.

Two boundaries are worth naming. The 30-video test set is a meaningful comparison set, not a national deployment population — the gap to Dynamsoft is the directly measured one, on the same input. And the pipeline still depends on the cart having many seconds with the product in view; pure single-frame accuracy is not what this architecture optimises for. This workstream sits inside a broader multi-year smart retail engagement, providing a complementary product-identification modality alongside camera-based SKU recognition.

Key Achievements

86.7% video-level accuracy on the 30-video test set — versus the 80% Dynamsoft commercial baseline under identical conditions

93.3% accuracy at top-5 multi-frame aggregation

YOLOv7 detection stage improved downstream decoder recall by isolating barcode region before decoding

Ensemble decoder (Pyzbar + EAN-13 + CNN decoders) — no single decoder covers the full barcode-type distribution consistently; the ensemble handles what each alone cannot

Multi-frame polling aggregation — the mechanism that converts unreliable per-frame decoding into reliable video-level accuracy

Part of a Broader Perception Stack

Computer Vision Services

Our services cover classical computer vision, human-supervised system design for legal compliance, video pipeline optimisation with tools like FFmpeg, custom adaptable models, and explainable AI for ethical transparency.

Computer vision

Retail AI Solutions

We build production-ready CV systems for smart retail environments — in-cart perception, shelf analytics, SKU recognition, and security — all deployable on existing camera infrastructure without costly hardware upgrades.

Retail

GPU Performance Engineering

We deliver GPU-accelerated inference pipelines optimised for constrained edge hardware and high-throughput server deployments — profiling-led, architecture-first, with measurable performance outcomes.

GPU

Decoding Barcodes from Camera Video?

Reading barcodes from in-the-field video is a different problem from scanning at a checkout counter. Detection-before-decoding, decoder ensembles, and multi-frame aggregation usually decide whether the pipeline holds up under the conditions a moving camera actually delivers.