MLOps for Hospitals - Building a Robust Staff Tracking System (Part 1)

Introduction

A hospital staff tracking system fails or succeeds long before any model is trained. It fails in the cameras, in the data pipeline, in the storage layout, in the question of who owns the labels. By the time someone is fine-tuning an object detector, the constraints that decide whether the system reaches production are already set. That is the part of MLOps most teams underweight on a first deployment — and the part this article covers.

Different staff members at a hospital. Source: Kate Dewhirst

We work with healthcare and life-sciences teams who have a working computer vision prototype — a notebook, a few hand-labelled clips, a model that runs locally — and no way to operationalise it. The gap between that notebook and a system a hospital can actually rely on is wider than it looks. Part 1 walks through how we set up the MLOps environment and the data pipeline for a real-time staff tracking system. Part 2 covers model training, deployment, and monitoring.

The global computer vision in healthcare market is projected to reach $22.2 billion by 2030 (Pragma Market Research, 2024). That is a directional, industry-scale figure, not an operational benchmark. What matters at the project level is much narrower: can a specific hospital, with a specific camera fleet and a specific clinical workflow, get a model into production and keep it there?

What MLOps actually means for a first deployment

MLOps is the practice that combines machine learning with the operational discipline of DevOps. The term is broad; for a first hospital deployment only a few capabilities matter, and the rest is overengineering you will regret paying for.

In our experience across applied vision engagements, the gap that kills first deployments is not the modelling — it is the absence of a repeatable data pipeline. A team can train a strong model from a curated clip set on a workstation. The same team cannot reliably retrain it next month when the cameras have moved, the lighting has changed, and the labels are stale. The MLOps environment exists to make that second iteration cheap.

For a first hospital project, the minimum viable stack is:

A way to ingest video continuously and version the resulting frames.
Object storage cheap enough to retain raw footage at clinical retention windows.
A processing layer that can run preprocessing reproducibly — same code, same inputs, same outputs.
A model registry so the model serving the floor today is identifiable, attributable, and can be rolled back.
Monitoring that watches the input data, not just the model output.
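The registry requirement in the list above can be made concrete with a small record. What follows is a minimal sketch, not the schema of any particular registry: `ModelRecord`, `fingerprint`, the field names, and the example values are all hypothetical, and a tool such as MLflow stores equivalent lineage in its own format. The point is only that a model version is tied to hashes of the weights, the dataset manifest, and the training config.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


def fingerprint(payload: dict) -> str:
    """Stable SHA-256 of a JSON-serialisable dict; keys are sorted so order does not matter."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class ModelRecord:
    """Hypothetical registry entry: enough lineage to identify and roll back a model."""
    model_version: str
    weights_sha256: str            # hash of the trained weights file
    dataset_manifest_sha256: str   # hash of the manifest listing the training frames
    training_config_sha256: str    # hash of hyperparameters and preprocessing settings
    owner: str                     # who is accountable for this version


# Illustrative values only; a real pipeline would hash actual files.
training_config = {"input_size": [640, 480], "sample_fps": 5}
record = ModelRecord(
    model_version="staff-tracker-0.1.0",
    weights_sha256=hashlib.sha256(b"placeholder for weights file bytes").hexdigest(),
    dataset_manifest_sha256=fingerprint({"frames": ["frame_000000.jpg"]}),
    training_config_sha256=fingerprint(training_config),
    owner="ml-team@example.org",
)
print(json.dumps(asdict(record), indent=2))
```

Because the record is frozen and built from content hashes, two records with the same hashes describe the same model, and a changed config or dataset shows up as a changed hash.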

Everything else (feature stores, automated retraining triggers, multi-region failover) is a deferred decision. In our experience, adding them to a first project is the most common way an MLOps stack becomes overengineered and never reaches production.

How does MLOps differ from DevOps in a hospital setting?

DevOps assumes the artefact under management is code, and code is deterministic. MLOps assumes the artefact is code plus a dataset plus a trained weights file, and the dataset drifts. That difference shows up in three places that matter for hospitals.

First, rollback. Rolling back a service in DevOps is a redeployment of an older container. Rolling back an ML model also requires retrieving the dataset and training config that produced it — otherwise the rollback is reproducible only by accident. Second, monitoring. DevOps monitors latency, error rates, and uptime. MLOps adds input drift: the distribution of camera frames at 03:00 on a Sunday is not the distribution at 14:00 on a Tuesday, and a tracker tuned to one can degrade silently on the other. Third, data governance. A hospital’s data is regulated; the MLOps pipeline has to make that auditable from raw frame to prediction, not just from container build to deploy.
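To illustrate the input-drift point, here is one simple way such a check could look. It is a sketch under stated assumptions: it summarises each frame by its mean brightness (0 to 255), compares a reference window against a current window with the Population Stability Index, and uses a rule-of-thumb alert threshold of 0.2 that would need tuning per site. The function names, bin edges, and sample values are hypothetical.

```python
import math
from typing import List, Sequence


def histogram(values: Sequence[float], edges: Sequence[float]) -> List[float]:
    """Fraction of values in each bin; the last bin includes its upper edge."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(counts)):
            is_last = i == len(counts) - 1
            if edges[i] <= v < edges[i + 1] or (is_last and v == edges[-1]):
                counts[i] += 1
                break
    total = max(sum(counts), 1)
    return [c / total for c in counts]


def psi(reference: Sequence[float], current: Sequence[float],
        edges: Sequence[float], eps: float = 1e-6) -> float:
    """Population Stability Index between two samples over shared bins."""
    ref = histogram(reference, edges)
    cur = histogram(current, edges)
    # eps avoids log(0) when a bin is empty in one of the samples.
    return sum((c - r) * math.log((c + eps) / (r + eps)) for r, c in zip(ref, cur))


EDGES = [0, 64, 128, 192, 255]
daytime = [150, 160, 170, 140, 155, 165]   # made-up mean brightness per frame, 14:00
night = [40, 50, 45, 60, 55, 35]           # made-up mean brightness per frame, 03:00

DRIFT_THRESHOLD = 0.2  # assumption: rule-of-thumb value, tune per deployment
score = psi(daytime, night, EDGES)
print("drift detected" if score > DRIFT_THRESHOLD else "input looks stable")
```

Mean brightness is a deliberately crude summary. The same comparison works for any per-frame statistic the team decides is worth watching, such as detections per frame.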

Choosing infrastructure that fits the hospital, not the demo

The three large cloud platforms — Amazon Web Services, Google Cloud Platform, Microsoft Azure — all offer enough to run this kind of system. AWS provides Amazon SageMaker for training and serving, S3 for object storage, and RDS for the structured metadata. GCP offers Vertex AI and Cloud Storage. Azure offers Azure Machine Learning and Blob Storage. Technically, any of the three works.

The decision is rarely technical. It is decided by what the hospital already runs. If the hospital’s electronic health record system is hosted on Azure, integration costs and procurement friction make Azure the default. If the IT team already operates an AWS landing zone with the right compliance controls, AWS wins. Choosing a platform because its ML tooling is marginally better, against the IT department’s existing investment, is a decision we have watched fail repeatedly.

Cloud machine learning platforms. Source: Datagrom

The minimum viable stack we deploy on a first project:

Layer	Purpose	Realistic choice
Object storage	Raw video, processed frames	S3, GCS, or Azure Blob — whichever matches IT
Structured DB	Camera registry, staff metadata, schedules	RDS, Cloud SQL, or Azure SQL — managed
Processing	Frame extraction, preprocessing, feature prep	Containers on the platform’s managed compute
Model registry	Versioned weights with training lineage	MLflow self-hosted, or the cloud-native equivalent
Orchestration	Pipeline runs, retraining jobs	A single workflow engine — Airflow, Argo, or the cloud-native one

Notice what is absent: a feature store, a separate experiment tracker, a dedicated drift-detection service. Those become useful on the second or third deployment, when the team can articulate a problem the existing stack does not solve. Adding them on day one is paying for tools you cannot yet use.

The data pipeline before the model

For a staff tracking system, the data pipeline carries more weight than the model. The model is an interchangeable component — a YOLO variant, a tracking-by-detection stack built on OpenCV and a re-identification network, possibly something more recent. The pipeline that feeds it is the project’s permanent infrastructure.

Staff being tracked in a hospital. Source: Houston Methodist

Data requirements

The system needs continuous video footage from cameras placed deliberately — not opportunistically. Each camera has a location identifier, a known field of view, and a timestamp synchronised across the fleet. Staff metadata — roles, departments, shift schedules — sits alongside the video in a structured database. Without that metadata, the tracker can find a person but cannot tell you whether they should be in that corridor.
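The role of the staff metadata can be sketched as a simple lookup. This example assumes the tracker, or a badge system alongside it, has already resolved a sighting to a staff identifier, which the article does not specify. `Shift`, `is_expected`, and the zone names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, time


@dataclass(frozen=True)
class Shift:
    """Hypothetical roster entry joined to a tracked sighting."""
    staff_id: str
    role: str
    zones: frozenset   # location identifiers the staff member is rostered to
    start: time
    end: time


def is_expected(shift: Shift, zone: str, seen_at: datetime) -> bool:
    """True if the sighting falls inside a rostered zone during the shift window."""
    t = seen_at.time()
    if shift.start <= shift.end:
        on_shift = shift.start <= t < shift.end
    else:
        # Shift crosses midnight, for example 22:00 to 06:00.
        on_shift = t >= shift.start or t < shift.end
    return on_shift and zone in shift.zones


night_shift = Shift(
    staff_id="nurse-017",
    role="nurse",
    zones=frozenset({"ward-3-corridor", "ward-3-station"}),
    start=time(22, 0),
    end=time(6, 0),
)
print(is_expected(night_shift, "ward-3-corridor", datetime(2024, 3, 10, 2, 30)))  # True
print(is_expected(night_shift, "pharmacy", datetime(2024, 3, 10, 2, 30)))         # False
```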

Two specifics that first projects routinely get wrong:

Timestamps must be synchronised to a single source, not to each camera’s local clock. Unsynchronised clocks drifting apart across a 200-camera fleet produce tracking errors that look like model failures and waste weeks of debugging.
Camera placement is a permanent decision. Moving a camera retrospectively invalidates the labels collected against its previous field of view. Treat the camera registry as ground truth and version it.
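Both points can be sketched in a few lines. The example below is illustrative only: `CameraRecord`, `move_camera`, `cameras_out_of_sync`, the camera identifiers, and the 100 ms tolerance are assumptions, not part of any specific camera or registry product. It shows a registry entry that is versioned rather than edited in place, and a check that flags cameras whose reported time differs from a reference clock.

```python
from dataclasses import dataclass, replace
from datetime import datetime, timedelta, timezone
from typing import Dict, List


@dataclass(frozen=True)
class CameraRecord:
    camera_id: str
    location_id: str
    registry_version: int   # bumped whenever placement or field of view changes


def move_camera(record: CameraRecord, new_location_id: str) -> CameraRecord:
    """Return a new record instead of mutating the old one, so labels can cite a version."""
    return replace(record, location_id=new_location_id,
                   registry_version=record.registry_version + 1)


def cameras_out_of_sync(reported: Dict[str, datetime], reference: datetime,
                        tolerance: timedelta) -> List[str]:
    """Camera ids whose reported clock differs from the reference by more than tolerance."""
    return sorted(cam for cam, ts in reported.items()
                  if abs(ts - reference) > tolerance)


reference = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
reported = {
    "cam-001": reference + timedelta(milliseconds=20),
    "cam-002": reference + timedelta(seconds=1.8),
    "cam-003": reference - timedelta(milliseconds=400),
}
# Assumption: 100 ms tolerance, roughly one frame interval at 10 fps.
print(cameras_out_of_sync(reported, reference, timedelta(milliseconds=100)))
# ['cam-002', 'cam-003']

original = CameraRecord("cam-001", "ward-3-corridor", registry_version=1)
moved = move_camera(original, "ward-3-station")
print(original.registry_version, moved.registry_version)  # 1 2
```

Labels collected before the move keep pointing at version 1, so they are not silently reinterpreted against the new field of view.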

Pipeline stages

The pipeline runs in four stages:

Ingest. Cameras stream to an edge node or directly to cloud object storage. Frames are extracted at a fixed rate — typically 5–10 fps for tracking, not the native 25–30 fps. Storing every frame is wasteful and runs into retention-cost problems quickly.
Preprocess. Resize to the model’s expected input size, convert colour space, optionally normalise lighting. Output is a versioned dataset, not a transient stream.
Store. Raw footage in object storage with a clinical retention policy. Processed frames and extracted features in a separate prefix with shorter retention. Metadata in the managed database.
Visualise. Tableau, Power BI, or a custom dashboard surfaces locations and movement patterns to the people who allocate staff. Without this layer the system produces telemetry no one reads.
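The fixed-rate extraction in the ingest stage comes down to choosing which frame indices to keep. A minimal sketch, assuming the native and target frame rates are known up front; `frames_to_keep` is a hypothetical helper, not an OpenCV function.

```python
from typing import List


def frames_to_keep(native_fps: float, target_fps: float, total_frames: int) -> List[int]:
    """Indices of frames to retain so the output rate approximates target_fps."""
    if native_fps <= 0 or target_fps <= 0:
        raise ValueError("frame rates must be positive")
    if target_fps >= native_fps:
        # Nothing to drop: the source is already at or below the target rate.
        return list(range(total_frames))

    step = native_fps / target_fps   # source frames per retained frame, may be fractional
    kept, next_keep = [], 0.0
    for index in range(total_frames):
        if index >= next_keep:
            kept.append(index)
            next_keep += step
    return kept


# One second of 25 fps footage sampled down to 5 fps keeps every fifth frame.
print(frames_to_keep(25, 5, 25))   # [0, 5, 10, 15, 20]
```

The storage effect is easy to estimate: at 5 fps a single camera produces 432,000 frames per day, against 2,160,000 at 25 fps.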

A minimal preprocessing routine in Python, using OpenCV:



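import os

import cv2

# Illustrative local paths; production code would take these as parameters.
video_path = 'path/to/video/file.mp4'
output_dir = 'processed_frames'
os.makedirs(output_dir, exist_ok=True)

cap = cv2.VideoCapture(video_path)
if not cap.isOpened():
    # VideoCapture does not raise on a bad path, so fail loudly here.
    raise RuntimeError(f'Could not open video source: {video_path}')

frame_count = 0
while True:
    ret, frame = cap.read()
    if not ret:
        # End of the stream, or a read error.
        break

    # Resize to the example input size, then convert BGR to greyscale.
    processed_frame = cv2.resize(frame, (640, 480))
    processed_frame = cv2.cvtColor(processed_frame, cv2.COLOR_BGR2GRAY)

    # Six-digit zero padding keeps long clips in order when files are sorted by name.
    frame_filename = os.path.join(output_dir, f'frame_{frame_count:06d}.jpg')
    cv2.imwrite(frame_filename, processed_frame)

    frame_count += 1

# No GUI windows were opened, so releasing the capture is the only cleanup needed.
cap.release()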

This script is illustrative — production code wraps it in a container, parameterises the input source, writes to object storage rather than a local directory, and emits a manifest that the model registry can reference. The substantive point is that preprocessing is code, versioned alongside the model that consumes its output. If the preprocessing changes, the model’s evaluation is no longer comparable to the previous run.
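The manifest mentioned above could be as simple as a list of frame files with content hashes. This is a sketch under assumptions: `build_manifest`, the manifest fields, and the version string are hypothetical, and the demo writes placeholder bytes to a temporary directory rather than real frames.

```python
import hashlib
import json
import os
import tempfile


def build_manifest(frame_dir: str, preprocessing_version: str) -> dict:
    """List every file in frame_dir with its SHA-256 so a training run can cite exact inputs."""
    entries = []
    for name in sorted(os.listdir(frame_dir)):
        path = os.path.join(frame_dir, name)
        if not os.path.isfile(path):
            continue
        with open(path, "rb") as handle:
            digest = hashlib.sha256(handle.read()).hexdigest()
        entries.append({"file": name, "sha256": digest})
    return {
        "preprocessing_version": preprocessing_version,
        "frame_count": len(entries),
        "frames": entries,
    }


# Demo with placeholder bytes standing in for processed frames.
with tempfile.TemporaryDirectory() as demo_dir:
    for name, payload in [("frame_000000.jpg", b"frame-a"), ("frame_000001.jpg", b"frame-b")]:
        with open(os.path.join(demo_dir, name), "wb") as handle:
            handle.write(payload)
    manifest = build_manifest(demo_dir, preprocessing_version="resize640x480-grey-v1")

print(json.dumps(manifest, indent=2))
```

If the preprocessing code changes, the version string and the frame hashes change with it, which makes it visible that two evaluation runs were not fed the same inputs.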

Why most ML models never reach production

The widely cited claim that most ML models never reach production matches what we see in practice, and the reasons cluster around MLOps gaps, not modelling weaknesses. The notebook-to-production gap is structural. A notebook is an exploratory environment optimised for the author’s current question. Production is a different environment optimised for reproducibility, observability, and rollback. Bridging the two requires the data pipeline, the storage layer, the model registry, and the monitoring described above to exist before the model is finalised.

The observed pattern across our engagements is that the first deployment costs disproportionately more than the second. The first deployment pays for the entire MLOps stack. The second deployment reuses it. Hospitals that operationalise one model successfully tend to operationalise the next one in a fraction of the time, because the infrastructure decision is already made. That payoff is the real return on the first project.

How TechnoLynx works on first MLOps deployments

We engage with healthcare and life-sciences teams that have a prototype and need a production system. Our work covers computer vision, GPU acceleration, and the MLOps scaffolding that makes a vision model maintainable. We bias toward smaller stacks on first deployments — fewer moving parts, clearer ownership boundaries, less to retrain when the team’s understanding of the problem improves. If a first MLOps project is on your roadmap, get in touch.

FAQ

What does MLOps actually mean for an organisation that has never operationalised a model?

It means building the infrastructure that lets a model live outside a notebook: versioned data, reproducible preprocessing, a model registry, and monitoring. On a first project the goal is not sophistication — it is a repeatable path from raw data to a deployed prediction.

Which MLOps capabilities does a first project genuinely need, and which are overengineering?

Needed: object storage, a processing layer, a model registry, basic input and output monitoring, and an orchestration tool. Overengineering on a first project: feature stores, automated retraining triggers, separate experiment trackers, multi-region failover. Defer until a concrete need names them.

Which MLOps tools and frameworks are realistic for a first deployment, and which assume mature data engineering?

Realistic: managed cloud services on whichever platform the hospital’s IT already operates, plus MLflow or the cloud-native registry, plus a single workflow engine. Assumes mature data engineering: Kubeflow on self-managed Kubernetes, custom feature stores, bespoke drift-detection stacks. Pick those only when the team has the operations capacity behind them.

What is the smallest viable MLOps stack that still produces a production-quality deployment?

Object storage, a managed structured database, a containerised preprocessing and serving layer, a model registry, and a dashboard. Five components, all managed where possible. Anything smaller cannot roll back or audit; anything larger is paying for unused capacity.

How does MLOps differ from DevOps in the data-pipeline, drift, and rollback dimensions?

DevOps manages code; MLOps manages code plus data plus weights. Rollback in MLOps requires retrieving the training dataset and config, not just the previous container. Drift monitoring tracks input distribution, not only output errors. Data governance has to be auditable from raw input to prediction, which DevOps pipelines do not handle natively.

Why do most ML models never reach production, and which MLOps gaps cause that?

The recurring causes are absent data pipelines, absent reproducibility, and absent ownership of the production environment. The model is the easy part; the surrounding infrastructure is what determines whether the model survives contact with the hospital’s actual operating conditions.

Conclusion

Setting up the MLOps environment and the data pipeline is the part of a hospital staff tracking project that determines whether anything else happens. Done well, the second deployment costs a fraction of the first. Done badly, the model stays in the notebook and the business case never materialises.

The next article in this series — Part 2: Model Training, Deployment, and Monitoring — covers what runs on top of this foundation: training the staff tracking model, deploying it, and keeping it useful as the hospital and its data change.

Sources for the images

Dedmari, M. A. (2021) ‘Demystifying MLOps: part 1’, NetApp, 17 June.
Dewhirst, K. (2017) ‘Professional Staff Leaves of Absence – is your hospital prepared?’, Kate Dewhirst Health Law, 2 October.
McCalley, E. (2020) ‘Released in 2020: Meet AWS SageMaker Studio, Azure Machine Learning Studio, & GCP AI Platform’, Datagrom, 12 October.
Osztrogonacz, P., Chinnadurai, P. and Lumsden, A.B. (2023) ‘Emerging applications for computer vision and Artificial Intelligence in management of the cardiovascular patient’, Methodist DeBakey Cardiovascular Journal, 19(4), pp. 17–23.

References

Pragma Market Research (2024) ‘Global Computer Vision in Healthcare Market — Forecast to 2030’, PMR Research.
Nayak, Y. (2021) ‘A Gentle Introduction to MLOps’, Towards Data Science.
Smith, A. (2020) ‘End-to-end Machine Learning Platforms Compared’, Towards Data Science, 13 July.