Streamlining Sorting and Counting Processes with AI

Counting and sorting items accurately is a foundational task in manufacturing, food processing, and logistics. For decades it was manual, error-prone, and slow. Computer vision changed that — and the choice of which vision approach to use now matters more than the choice to automate at all.

This article walks through how AI-driven sorting and counting actually works on a line: where rule-based machine vision earns its place, where learned models like YOLOv8 and YOLO-World fit, and how to build a working prototype that grades fruit by size and counts items by visual class. The intent is practical: enough texture to choose the right tool for an inspection problem, with code you can run against your own images.

For the broader decision between a packaged machine-vision system (Keyence/Cognex-style, deterministic, hardware-bound) and a custom computer vision deployment, see our decision framework for machine vision versus computer vision in manufacturing inspection. This piece sits one level below: it assumes you’ve decided computer vision is in scope and you want to know how the counting-and-sorting layer is built.

What does “AI sorting and counting” actually mean?

The phrase covers three loosely related capabilities:

Counting — detecting discrete objects in an image or video frame and producing a tally, optionally broken down by class.
Sorting — deciding, per object, which downstream path it should take (accepted, rejected, routed by size, routed by colour, routed by defect).
Grading — assigning a continuous or ordinal score to each object (size, ripeness, defect severity) that downstream logic uses for sorting or counting.

In practice these collapse into the same vision pipeline: detect objects, extract per-object attributes, then aggregate. The interesting question is which model does the detection and which features drive the sorting decision.
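Here is a minimal sketch of that three-stage shape. `Detection`, `detect`, `extract` and `decide` are hypothetical stand-ins, not a library API. In a real system `detect` would wrap a YOLO model and `extract` would hold the OpenCV measurements; the toy functions at the bottom only show how the pieces plug together.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Detection:
    """One detected object: its class label plus any measured attributes."""
    label: str
    attributes: Dict[str, float] = field(default_factory=dict)


def run_pipeline(
    frame,
    detect: Callable[[object], List[Detection]],
    extract: Callable[[object, Detection], Dict[str, float]],
    decide: Callable[[Detection], str],
) -> Dict[str, Counter]:
    """Detect -> extract per-object attributes -> decide a route -> aggregate counts per route."""
    routes: Dict[str, Counter] = {}
    for det in detect(frame):
        det.attributes.update(extract(frame, det))   # e.g. area, hue, defect score
        route = decide(det)                          # e.g. "accept", "reject", "large"
        routes.setdefault(route, Counter())[det.label] += 1
    return routes


# Toy stand-ins (hypothetical) showing how the stages plug together
def toy_detect(frame):
    return [Detection("apple"), Detection("apple"), Detection("pear")]


def toy_extract(frame, det):
    return {"area_cm2": 40.0 if det.label == "apple" else 25.0}


def toy_decide(det):
    return "large" if det.attributes["area_cm2"] >= 30 else "small"


print(run_pipeline(None, toy_detect, toy_extract, toy_decide))
```

Keeping the three stages as separate functions is what lets you swap the detector without touching the sorting rules.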

Decision surface: which vision approach fits which task?

Task shape	Best fit	Why
Fixed part, fixed lighting, binary pass/fail	Rule-based machine vision (template matching, blob analysis)	Deterministic, auditable, low latency — observed pattern across high-throughput lines
Variable appearance (organic produce, textiles), known classes	Trained detector (YOLOv8, Mask R-CNN)	Tolerates variation; needs labelled data and revalidation when the input distribution shifts
Open vocabulary, classes change often, low-volume sorting	Zero-shot detector (YOLO-World, OWL-ViT)	No retraining for new classes; lower precision than a fine-tuned model on the same task
Defect detection with rare positives	Anomaly detection on top of a detector	Pure classification fails when defect examples are scarce; reconstruction-based methods do better
Continuous attribute (size, area, count of sub-features)	Instance segmentation + geometric measurement	Bounding boxes lose shape information; masks let you compute area, perimeter, aspect ratio

This is an observed pattern from sorting and grading deployments we’ve worked on, not a universal ranking. The “best fit” column shifts when throughput, regulatory audit requirements, or maintenance team skill change. A line that runs three SKUs forever does not need YOLO-World. A produce sorter that swaps between berries and stone fruit by season probably does.

The vision stack: from pixels to decisions

A working sorting-and-counting system has four layers, and each carries its own failure modes.

Image acquisition. Camera, lens, and lighting choices matter more than the model. Inconsistent lighting kills learned models faster than it kills rule-based ones, because the training distribution rarely covers every lighting state of a real factory. Backlit setups, telecentric lenses, and polarised illumination exist for reasons — they remove ambiguity at the optical layer rather than asking the model to solve it.

Detection. This is where YOLO-family models, Mask R-CNN, and zero-shot detectors live. For counting and sorting work the choice between bounding-box detection (YOLOv8 detection head) and instance segmentation (YOLOv8-seg, Mask R-CNN) is driven by whether you need shape, not just presence. Counting apples? A box is enough. Grading apples by area? You need the mask.
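To see why a box is not enough for grading, compare box area with mask area for a round object. This small numpy sketch uses a synthetic disc as a stand-in for an apple (not real detector output). For a circle the mask covers about π/4 of its bounding box, so box-based "size" overstates area by roughly 27%, and two fruits with the same box but different shapes would get the same grade.

```python
import numpy as np


def disc_mask(size: int, radius: int) -> np.ndarray:
    """Binary mask of a filled disc centred in a size x size image (a stand-in for a round fruit)."""
    yy, xx = np.mgrid[:size, :size]
    centre = (size - 1) / 2
    return (xx - centre) ** 2 + (yy - centre) ** 2 <= radius ** 2


def box_area_from_mask(mask: np.ndarray) -> int:
    """Area of the tightest axis-aligned bounding box around the mask (what a detection head gives you)."""
    ys, xs = np.nonzero(mask)
    return int((ys.max() - ys.min() + 1) * (xs.max() - xs.min() + 1))


mask = disc_mask(size=201, radius=80)
mask_area = int(mask.sum())          # pixel count inside the segmentation mask
box_area = box_area_from_mask(mask)  # pixel count inside the bounding box
print(f"mask: {mask_area} px, box: {box_area} px, ratio: {mask_area / box_area:.3f}")
```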

Attribute extraction. Given a detection, what do you measure? Mask area in calibrated units, dominant colour in HSV space, texture features, sub-region defect probability. This is where lightweight OpenCV operations slot in cleanly between the detector’s output and the sorting decision.
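A sketch of per-object attribute extraction from a single mask, in plain numpy and the standard-library `colorsys`. It assumes an RGB image (OpenCV loads BGR, so convert first) and a calibration ratio in pixels per cm. The colour attribute here is the hue of the *mean* colour inside the mask, a simplification of "dominant colour"; a histogram mode would be more robust on multicoloured fruit.

```python
import colorsys
import math

import numpy as np


def mask_attributes(image_rgb: np.ndarray, mask: np.ndarray, pixels_per_cm: float) -> dict:
    """Per-object measurements from a binary mask.

    image_rgb: H x W x 3 uint8 array in RGB order.
    mask: H x W boolean (or 0/255) array covering one object.
    pixels_per_cm: calibration ratio, e.g. from a reference object in the frame.
    """
    m = mask.astype(bool)
    area_px = int(m.sum())
    area_cm2 = area_px / pixels_per_cm ** 2          # areas scale with the square of the ratio

    ys, xs = np.nonzero(m)
    height_px = ys.max() - ys.min() + 1
    width_px = xs.max() - xs.min() + 1

    # Mean colour inside the mask, converted to HSV (hue of the mean, not a histogram mode)
    r, g, b = image_rgb[m].mean(axis=0) / 255.0
    hue, sat, val = colorsys.rgb_to_hsv(r, g, b)

    return {
        "area_cm2": area_cm2,
        "equiv_diameter_cm": 2 * math.sqrt(area_cm2 / math.pi),  # circle with the same area
        "aspect_ratio": width_px / height_px,
        "hue_deg": hue * 360.0,
        "saturation": sat,
        "value": val,
    }
```

Each value is a plain number, so the downstream sorting rule can be a readable threshold rather than another model.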

Decision and aggregation. The sorting logic itself is usually simple — a threshold, a sort, a class lookup. The harder part is aggregation across frames: not double-counting an apple that appears in three consecutive frames, handling occlusion, and reconciling counts when objects enter and leave the field of view. Tracking-by-detection (ByteTrack, BoT-SORT) is the standard answer here.
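Once a tracker assigns persistent IDs (Ultralytics, for example, exposes ByteTrack and BoT-SORT through `model.track(...)` with a `tracker` config such as `bytetrack.yaml`), counting reduces to counting each ID once. A common rule is to count an object when its centre crosses a virtual line. The sketch below assumes a conveyor moving downward in the image and takes hypothetical `(track_id, class_name, cx, cy)` tuples per frame rather than any specific tracker's output format.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# One tracked detection: (track_id, class_name, centre_x, centre_y). Hypothetical input format.
TrackedDet = Tuple[int, str, float, float]


class LineCrossingCounter:
    """Count each tracked object once, when its centre crosses the horizontal line y = line_y."""

    def __init__(self, line_y: float):
        self.line_y = line_y
        self.last_y: Dict[int, float] = {}   # last seen centre-y per track id
        self.counted: set = set()            # track ids already counted
        self.counts: Counter = Counter()     # per-class totals

    def update(self, frame_dets: Iterable[TrackedDet]) -> None:
        for track_id, cls, _cx, cy in frame_dets:
            prev = self.last_y.get(track_id)
            self.last_y[track_id] = cy
            if prev is None or track_id in self.counted:
                continue
            # Crossing: previous centre above the line, current centre on or below it
            if prev < self.line_y <= cy:
                self.counted.add(track_id)
                self.counts[cls] += 1
```

An apple that stays in view for ten frames is counted once, and an object that first appears already below the line is never counted. That second behaviour is a deliberate choice here; lines where objects can enter mid-frame may need a different rule.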

Where AI counting and sorting is deployed

The list below is illustrative rather than exhaustive — these are application shapes that recur across our engagements and in published case studies.

Automotive assembly

Fastener counting and defect detection on assembly lines: robots equipped with cameras run a detector (typically a CNN-based model in the YOLO family) over each station, classify fasteners by type, flag defects, and feed counts back to the line control system. The throughput requirement is high (low milliseconds per frame), the part catalogue is fixed, and audit traceability matters — which pushes the design toward a fine-tuned detector with deterministic post-processing rather than a zero-shot approach.
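"Deterministic post-processing" can be as simple as comparing the detector's per-class tally with the station's expected parts list. The sketch below is one way to do that; part names and the expected-count dictionary are hypothetical. An exact match passes, and any mismatch is reported per part so the line control system can log exactly what was missing or extra.

```python
from collections import Counter
from typing import Dict, Iterable


def check_station(detected_labels: Iterable[str], expected: Dict[str, int]) -> Dict[str, object]:
    """Pass only if detected counts match the station's expected counts exactly, part by part."""
    found = Counter(detected_labels)
    mismatches = {
        part: {"expected": expected.get(part, 0), "found": found.get(part, 0)}
        for part in set(expected) | set(found)
        if expected.get(part, 0) != found.get(part, 0)
    }
    return {"pass": not mismatches, "mismatches": mismatches}
```

Because the rule is plain code with no learned parameters, the same detections always produce the same verdict, which is what the audit trail needs.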

Adjacent reading: AI is reshaping the automotive industry.

Traffic management

IoT edge cameras count vehicles, classify them by type (car/truck/motorcycle/bus), and send the counts to the cloud for aggregation. The vision stack is straightforward; the engineering challenge is edge deployment: running a detector on a power-constrained device with limited or intermittent connectivity. Edge inference reduces end-to-end latency and avoids streaming video to the cloud, which is a bandwidth and privacy win.
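The bandwidth argument is easiest to see in the message format. Here is a sketch, assuming one already-deduplicated label per vehicle and a hypothetical JSON schema: the edge device collapses a time window into a few hundred bytes, and the cloud only merges these summaries.

```python
import json
from collections import Counter
from typing import Iterable, List


def edge_summary(camera_id: str, window_start: int, window_s: int, labels: Iterable[str]) -> str:
    """Runs on the edge device: collapse one time window of per-vehicle labels into a small JSON message."""
    return json.dumps({
        "camera_id": camera_id,
        "window_start": window_start,   # Unix seconds
        "window_s": window_s,
        "counts": dict(Counter(labels)),
    }, sort_keys=True)


def cloud_aggregate(messages: List[str]) -> Counter:
    """Runs in the cloud: merge per-camera summaries into totals per vehicle class."""
    total = Counter()
    for msg in messages:
        total.update(json.loads(msg)["counts"])
    return total
```

Small, self-describing messages are also easy to queue locally and resend when the connection drops, which matters more than raw size on unreliable links.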

Adjacent reading: AI’s role in smart solutions for traffic and transportation.

Pharmaceutical packaging

Pill counting and inspection in blister packs and bottles, often with 360-degree multi-camera rigs. The regulatory regime (GMP, FDA validation) drives the design more than the vision technology does: every decision must be auditable, every model change must be revalidated, and false-negative tolerance is effectively zero. This pushes pharma inspection toward deterministic rule-based machine vision with learned models in a supporting role, not the other way around.
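As an illustration of the rule-based end of the spectrum, here is a minimal blob count: threshold, label 4-connected regions, and discard specks below a minimum area. In practice you would call `cv2.connectedComponentsWithStats`; the explicit flood fill is written out here only to make the rule visible. It assumes bright pills on a darker tray, and the threshold and minimum area are arbitrary example values.

```python
from collections import deque

import numpy as np


def count_blobs(grey: np.ndarray, threshold: int = 128, min_area: int = 20) -> int:
    """Rule-based pill count: threshold, label 4-connected foreground regions, drop specks below min_area.

    Deterministic: the same image and parameters always give the same count.
    """
    fg = grey >= threshold                 # pills assumed brighter than the background
    seen = np.zeros_like(fg, dtype=bool)
    h, w = fg.shape
    count = 0
    for y in range(h):
        for x in range(w):
            if not fg[y, x] or seen[y, x]:
                continue
            # Flood-fill one connected region and measure its area in pixels
            area = 0
            queue = deque([(y, x)])
            seen[y, x] = True
            while queue:
                cy, cx = queue.popleft()
                area += 1
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and fg[ny, nx] and not seen[ny, nx]:
                        seen[ny, nx] = True
                        queue.append((ny, nx))
            if area >= min_area:
                count += 1
    return count
```

Touching pills merge into one blob in this sketch, which is one reason the optical and mechanical setup carries so much of the load in pharma inspection.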

Adjacent reading: AI in pharmaceutics — automating meds.

Agriculture and livestock

Drone-mounted detectors that count and classify livestock over large areas, applying instance segmentation to distinguish individuals in close groups. The class catalogue is small but the visual variation (lighting, pose, partial occlusion) is large, which is exactly the regime where learned models beat rule-based pipelines.

Adjacent reading: smart farming and AI in livestock management.

Food processing

Counting and grading produce at speed. The examples we work through below (apple grading by size and apple counting by ripeness) are stripped-down versions of the same shape. The food and beverage AI market is forecast to reach roughly USD 214 billion by 2033 (Precedence Research; a directional, industry-scale macro estimate, not an operational benchmark for any single deployment).

Adjacent reading: how the food industry is reconfigured by AI and edge computing.

Worked example: grading apples by size with YOLOv8-seg

The goal is to detect apples in a still image, compute the area of each apple’s mask in calibrated units, and surface only the largest 50% — a simple size-grade sort.

1. Install and import

pip install ultralytics opencv-contrib-python

from ultralytics import YOLO
import numpy as np
from pathlib import Path
import cv2

2. Load the instance segmentation model

YOLOv8 segmentation weights pretrained on COCO already know what an apple looks like, so no fine-tuning is needed for the prototype.

model = YOLO('yolov8n-seg.pt')

3. Calibrate pixels to physical units

This is the step most prototypes skip and then regret. Without calibration the “size” you compute is in pixels, which means nothing once camera distance or zoom changes. A reference object of known size in the frame fixes this; here we hard-code a ratio for clarity.
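Deriving the ratio from a reference object is a single division. Here is a sketch, assuming a flat reference of known width lying in the same plane as the fruit and ignoring perspective and lens distortion. A hypothetical 10 cm calibration card that spans 780 pixels gives 78 px/cm, the value hard-coded in the next snippet.

```python
def pixels_per_cm(reference_width_px: float, reference_width_cm: float) -> float:
    """Linear calibration from a reference object of known width in the same plane as the fruit."""
    if reference_width_cm <= 0:
        raise ValueError("reference width must be positive")
    return reference_width_px / reference_width_cm


def px_area_to_cm2(area_px: float, ratio_px_per_cm: float) -> float:
    """Areas scale with the square of the linear ratio."""
    return area_px / ratio_px_per_cm ** 2


ratio = pixels_per_cm(780, 10)        # hypothetical 10 cm card measured at 780 px
print(ratio, px_area_to_cm2(6084, ratio))
```

Re-measuring the card whenever the camera is moved or refocused keeps the size grades comparable over time.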

RATIO_PIXEL_TO_CM = 78                              # pixels per cm at this camera distance and resolution
RATIO_PIXEL_TO_SQUARE_CM = RATIO_PIXEL_TO_CM ** 2   # pixels per cm² (6084)

4. Run prediction and iterate over detections

apple_id = next(k for k, v in model.names.items() if v == 'apple')   # 'apple' is a COCO class
results = model.predict('path/to/image', classes=[apple_id])          # keep apple detections only
area_list = []   # (area_cm2, (x1, y1, x2, y2)) per apple, so the sort can be mapped back to boxes

for r in results:
    img = np.copy(r.orig_img)

    for c in r:
        # Rasterise this apple's segmentation polygon into a binary mask
        b_mask = np.zeros(img.shape[:2], np.uint8)
        contour = c.masks.xy.pop().astype(np.int32).reshape(-1, 1, 2)
        cv2.drawContours(b_mask, [contour], -1, 255, cv2.FILLED)

        # Area of the mask itself (not a thresholded crop), converted to cm² via the calibration ratio
        area_cm = cv2.countNonZero(b_mask) / RATIO_PIXEL_TO_SQUARE_CM

        x1, y1, x2, y2 = c.boxes.xyxy.cpu().numpy().squeeze().astype(np.int32)
        area_list.append((round(area_cm, 2), (x1, y1, x2, y2)))

        cv2.putText(img, f"Size: {area_cm:.2f} cm2", (x1, y1),
                    cv2.FONT_HERSHEY_PLAIN, 1, (255, 0, 255), 2)

5. Sort and filter to the largest 50%

area_list.sort(reverse=True)
half_index = max(1, len(area_list) // 2)
largest_50_percent = area_list[:half_index]

A second pass over the detections renders only the apples whose area falls in largest_50_percent — that’s the sort decision, surfaced visually.
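One way to make that second pass robust is to carry each detection's box alongside its area and select by rank, instead of matching boxes back to rounded area values (which breaks when two apples round to the same area). Here is a minimal sketch, assuming each detection was stored as an `(area, (x1, y1, x2, y2))` pair.

```python
from typing import List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2) in pixels


def largest_half(detections: Sequence[Tuple[float, Box]]) -> List[Box]:
    """Return the boxes of the largest 50% of detections by area (at least one if any exist)."""
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    keep = max(1, len(ranked) // 2) if ranked else 0
    return [box for _area, box in ranked[:keep]]
```

Each returned box can then be drawn with `cv2.rectangle(img, (x1, y1), (x2, y2), (0, 255, 0), 2)` on a fresh copy of the image.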

The thing worth noticing: the heavy lifting is done by the detector, but the sorting logic is plain Python and OpenCV. The model gives you per-object masks; the rest is arithmetic. This separation is what makes the pipeline auditable.

Worked example: counting apples by ripeness with YOLO-World

YOLO-World is a zero-shot detector — you specify the classes you want to find as text prompts, no fine-tuning required. It is not as accurate on a fixed task as a model fine-tuned on that task, but it is dramatically faster to deploy when classes change or training data is scarce.

# Requires: pip install ultralytics supervision opencv-python
from ultralytics import YOLOWorld
import supervision as sv
import cv2

model = YOLOWorld('yolov8s-world.pt')
model.set_classes(["Red Apple", "Green Apple"])   # text prompts define the classes

img = cv2.imread("Image.png")
if img is None:
    raise FileNotFoundError("Image.png could not be read")
results = model.predict(img)

detections = sv.Detections.from_ultralytics(results[0])
detection_list = detections.data['class_name']

red_count = sum(1 for item in detection_list if item == "Red Apple")
green_count = sum(1 for item in detection_list if item == "Green Apple")

font = cv2.FONT_HERSHEY_SIMPLEX
cv2.putText(img, f'Ripe Apples: {red_count}', (10, 30), font, 1, (255, 255, 0), 2, cv2.LINE_AA)
cv2.putText(img, f'Unripe Apples: {green_count}', (10, 60), font, 1, (255, 255, 0), 2, cv2.LINE_AA)
cv2.imwrite("counted.png", img)   # output path is arbitrary
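A slightly more general tally uses `collections.Counter` over every prompt at once and drops low-confidence detections, which zero-shot models produce more of. Here is a sketch, assuming `(class_name, confidence)` pairs. With supervision these could come from `zip(detections.data['class_name'], detections.confidence)`. The 0.25 default is an arbitrary example, not a recommended value.

```python
from collections import Counter
from typing import Iterable, Tuple


def count_by_class(detections: Iterable[Tuple[str, float]], min_confidence: float = 0.25) -> Counter:
    """Tally detections per class, ignoring those below min_confidence."""
    return Counter(name for name, conf in detections if conf >= min_confidence)
```

Tune the threshold on labelled images from your own line, since the right value varies by prompt and lighting.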

Two things to flag for anyone moving this from prototype to production:

Ripeness via colour alone is a crude proxy. Commercial produce sorters often use spectral imaging (for example near-infrared bands), because chlorophyll content is a better ripeness indicator than RGB hue. The colour-based version is a useful teaching example, not a deployment design.
Zero-shot accuracy on agricultural produce varies substantially by class, lighting, and background. Validate on your own images before assuming the prompt works.
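A simple starting metric for that validation is per-class mean absolute count error against hand-labelled counts. Here is a sketch with one dictionary of counts per image (a hypothetical format). Per-class errors show which prompt is failing rather than hiding it in an overall average.

```python
from typing import Dict, List


def count_errors(predicted: List[Dict[str, int]], labelled: List[Dict[str, int]]) -> Dict[str, float]:
    """Per-class mean absolute count error across a labelled validation set (one dict per image)."""
    if len(predicted) != len(labelled):
        raise ValueError("need one prediction per labelled image")
    classes = set().union(*predicted, *labelled)   # every class seen in either set
    return {
        cls: sum(abs(p.get(cls, 0) - t.get(cls, 0)) for p, t in zip(predicted, labelled)) / len(labelled)
        for cls in sorted(classes)
    }
```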

Where this fits — and where it breaks

The two code walk-throughs above demonstrate the shape of an AI sorting-and-counting pipeline, but they do not represent what a production system looks like. A few honest boundaries:

Single-frame inference is not enough on a moving line. You need tracking to avoid double-counting and to handle occlusion. Add ByteTrack or BoT-SORT on top of the detector.
Lighting is half the problem. The model will look brilliant in the lab and fail at 3am when the warehouse lighting changes. Controlled illumination is not optional for high-precision sorting.
Validation must be against your own data. COCO-pretrained weights know apples but they do not know your apples, your conveyor, or your camera angle. Plan for a labelled validation set from day one.
Auditability is a design constraint, not a feature. In regulated industries (pharma, aerospace, food safety) every sorting decision must be traceable to a model version, a training set, and a validation report. Build the lineage system before you scale the model.
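At its simplest, lineage means every decision log line carries the identifiers of the model and evidence behind it. Here is a minimal sketch. All field names and identifiers are hypothetical, and the weights hash would normally be computed once at startup from the deployed weights file's bytes.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class DecisionRecord:
    """One sorting decision, traceable to the exact model and validation evidence behind it."""
    item_id: str
    decision: str              # e.g. "accept" / "reject" / "grade_A"
    model_version: str         # registry tag or weights file name
    weights_sha256: str        # hash of the deployed weights file
    training_set_id: str       # identifier of the dataset snapshot used for training
    validation_report_id: str  # identifier of the report that approved this model


def sha256_of_bytes(data: bytes) -> str:
    """Content hash, so a log line proves which weights were running, not just their name."""
    return hashlib.sha256(data).hexdigest()


def to_log_line(record: DecisionRecord) -> str:
    """Serialise with sorted keys so identical decisions produce identical log lines."""
    return json.dumps(asdict(record), sort_keys=True)
```

Hashing the weights rather than trusting a version string catches the case where a file was swapped without the tag being updated.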

What TechnoLynx does in this space

We design and deploy custom computer vision systems for sorting, counting, and inspection problems where off-the-shelf machine vision is too rigid and a generic ML model is too imprecise. Our engagements typically cover the full stack — optical setup, model selection and training, edge or GPU-accelerated deployment, and the validation harness that proves the system meets the production requirement. We work with manufacturing, food processing, and logistics teams who have already automated the easy parts and need help with the visually ambiguous ones.

If you’re at the point of choosing between a packaged machine-vision vendor and a custom CV build, our decision framework for machine vision versus computer vision in manufacturing inspection is the right next read. If the decision is made and the question is how to architect the build, contact us.


Sources

Cheng, T., Song, L., Ge, Y., Liu, W., Wang, X., & Shan, Y. (2024). YOLO-World: Real-Time Open-Vocabulary Object Detection. arXiv preprint.
Dinh, D.-L., Nguyen, H.-N., Thai, H.-T., & Le, K.-H. (2021). Towards AI-Based Traffic Counting System with Edge Computing. Journal of Advanced Transportation, 2021, 5551976.
Precedence Research. AI in Food and Beverages Market.
Skalski, P., & Gallagher, J. (2024). YOLO-World: Real-Time, Zero-Shot Object Detection. Roboflow Blog.
Ultralytics. YOLOv8 Documentation.