Infrastructure complexity should follow deployment maturity

Teams building their first ML model often adopt enterprise-grade MLOps infrastructure before they have anything to maintain. The result is months of platform work before any model delivers value. The opposite extreme (no infrastructure discipline at all) creates fragility that becomes expensive to unwind once a model becomes business-critical. The workable middle is incremental: adopt infrastructure components in response to a demonstrated need, not in anticipation of one.

In our experience working with teams shipping their first production model, the components that look most essential on a reference architecture diagram are usually the ones a small team gets the least value from in the first six months.

What does MLOps infrastructure actually consist of?

MLOps infrastructure decomposes into four functional layers: compute (for training and serving), storage (for data, features, artifacts, and model versions), orchestration (for running pipelines), and monitoring (for both data and model behaviour in production). Each layer has a sensible minimal form and a sensible mature form, and the gap between them is where most overbuilding happens.

Training compute covers the GPU or CPU instances on which models are trained. Options range from local hardware through general-purpose cloud (AWS EC2 / SageMaker, GCP Vertex, Azure ML) to specialised GPU providers (Lambda, CoreWeave, RunPod). Spot instances are 60–80% cheaper than on-demand and are a fit for batch training where interruption is acceptable; they are not appropriate for production serving. Memory per model class and multi-GPU coordination (NCCL, FSDP, DeepSpeed) only become design constraints once models exceed single-GPU capacity.

Serving compute runs inference. It has a different shape from training: low and predictable latency, consistent availability, and a load profile that benefits from auto-scaling.
A serving framework (FastAPI for lightweight cases, TorchServe or NVIDIA Triton when GPU batching and model versioning matter) sits on top of this layer.

Storage, orchestration, and monitoring

The storage layer covers four distinct concerns that are often collapsed prematurely into a single platform.

| Component | What it stores | Example tools |
| --- | --- | --- |
| Data lake / warehouse | Raw and processed training data | S3, GCS, Snowflake, BigQuery |
| Feature store | Computed, versioned feature values | Feast, Tecton, Vertex Feature Store |
| Artifact store | Model weights, evaluation metrics, plots | MLflow, S3, GCS |
| Model registry | Versioned models with metadata and deployment status | MLflow Registry, SageMaker |

Orchestration runs and manages ML pipelines: training, evaluation, deployment. For most teams we work with, a workflow tool such as Prefect or Airflow, or the cloud-native equivalent (SageMaker Pipelines, Vertex Pipelines), is more than enough. Kubernetes plus Kubeflow Pipelines is appropriate when the team already runs on Kubernetes and has dedicated platform engineering support to maintain it; it is rarely justified on the strength of ML workloads alone.

Monitoring is two layers in disguise. Data monitoring detects when input distributions shift away from training distributions, the failure mode that quietly degrades models long before any error rate moves. Tools include Evidently AI, WhyLabs, Great Expectations, and custom Grafana dashboards. Model monitoring tracks prediction distributions, error rates, and latency using standard observability stacks (Prometheus and Grafana, Datadog) augmented with ML-specific metrics. Both need to exist before a model is considered “in production” in any meaningful sense.

An incremental adoption roadmap

The decision-grade view is not “what is the complete MLOps stack” but “what does this team need to add next”.
Across the engagements we run, the staging looks like this:

| Stage | What to add | What to defer |
| --- | --- | --- |
| First POC | Local compute, S3 for artifacts, MLflow for experiment tracking | Everything else |
| First production model | Cloud serving instance, MLflow Registry, basic Grafana dashboard | Feature store, orchestration |
| 3–5 production models | Orchestration pipeline, data drift monitoring | Kubernetes, feature store |
| 10+ models | Feature store, dedicated platform team | Add incrementally |

The numbers attached to each stage are observed-pattern figures, not benchmarks.

A team with one production model is usually fine on a single compute instance and a serving framework; setup time is two to three days for an experienced engineer and the running cost lands somewhere between $50 and $500 per month depending on whether the model needs GPU inference.

By the time five to ten models are in production, the picture changes: a container orchestrator (Kubernetes or ECS), a model registry, an experiment tracker, automated training pipelines (Airflow or Prefect), and a monitoring dashboard become genuinely necessary. Setup is now two to four weeks of platform work and the platform cost moves into the $1,000–$5,000 per month range.

Beyond ten models, MLOps stops being infrastructure and starts being a product. Feature stores, managed ML platforms (SageMaker, Vertex AI), drift detection, and A/B testing infrastructure all earn their place. So does a dedicated platform team, typically two to four engineers supporting ten to thirty data scientists.

What are the common infrastructure anti-patterns?

Three patterns account for most of the wasted effort we encounter.

Over-investing in feature stores before having features. Feature stores solve a real problem (training-serving skew and feature reuse across teams), but they are operationally heavy. A team with one or two models and a handful of simple features extracts no measurable value from one, and pays the maintenance cost regardless.

Kubernetes before scale.
Kubernetes is the right answer for hundreds of services, multiple teams, and a strong platform engineering function. A team running three models with five engineers does not have the problem Kubernetes solves; they pay the complexity tax without the corresponding benefit.

Building instead of buying. Cloud-managed MLOps services (SageMaker, Vertex AI) cover most infrastructure concerns for most teams. Building custom MLOps infrastructure is defensible only when cloud costs become prohibitive at sustained scale, or when there is a hard constraint the managed service cannot meet.

For the deeper organisational arc (what it takes for a team that has never deployed a model to reach a sustainable production pipeline), see MLOps for organisations that have never operationalised a model.

FAQ

What does MLOps actually mean for an organisation that has never operationalised a model?

For a first-time team, MLOps means the minimum set of practices and infrastructure that lets one model run reliably in production: a serving endpoint, request logging, health checks, alerting, and a manual but documented deployment process. Everything else (pipelines, registries, feature stores) is added when a second or third model makes the absence painful.

Which MLOps capabilities does a first project genuinely need?

A first project needs experiment tracking (MLflow is the common default), an artifact store (S3 or GCS), a serving instance with a framework such as FastAPI or Triton, and a basic monitoring dashboard. CI/CD for models, automated retraining, and a feature store are deferrable until there is more than one model to coordinate.

Which MLOps tools are realistic for a first deployment, and which assume mature data engineering already in place?

Realistic for a first deployment: MLflow, S3/GCS, a managed serving option (SageMaker endpoints, Vertex endpoints, or a single EC2 instance with FastAPI), and Grafana for monitoring.
Tools that assume mature data engineering is already in place include Feast and Tecton (feature stores, including Tecton-style real-time feature platforms) and Kubeflow (full Kubernetes-native pipelines).

What is the smallest viable MLOps stack that still produces a production-quality deployment?

A compute instance with a serving framework, MLflow for experiment tracking and the model registry, S3 for artifacts, structured request logging, and a Grafana or Datadog dashboard with alerts. That stack is sufficient to run one production model with operational discipline, and it leaves room to grow without forcing a rewrite.

How does MLOps differ from DevOps in the data-pipeline, drift, and rollback dimensions?

DevOps assumes that a deployed artifact behaves the same way over time given the same code. MLOps cannot make that assumption: input distributions drift, label distributions drift, and model behaviour can change without any code change. The data-pipeline layer, drift monitoring, and the ability to roll back to a previous model version (not just previous code) are the three places MLOps extends the DevOps playbook.

Why do most ML models never reach production, and which MLOps gaps cause that?

The dominant gap is the absence of a serving and monitoring path from the notebook onward. Teams build models in a development environment with no plan for how the model will be packaged, deployed, observed, or updated, and by the time those questions are asked the model is treated as a research artifact rather than a candidate for production. A minimal MLOps spine (serving framework, registry, monitoring, deployment runbook) closes that gap before it forms.
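
The drift monitoring mentioned in the answers above does not need a dedicated platform to get started. As an illustration only, the sketch below computes a Population Stability Index (PSI) between a training sample and a live sample of one numeric feature, using nothing beyond the Python standard library. PSI is one common drift statistic, not a method this article prescribes; the function name `psi`, the `bins` parameter, the `1e-6` floor, and the 0.1 / 0.25 thresholds (a widely quoted rule of thumb) are all assumptions chosen for the example.

```python
import math
import random
from typing import List, Sequence


def psi(reference: Sequence[float], live: Sequence[float], bins: int = 10) -> float:
    """Population Stability Index of `live` against `reference`.

    Bins are equal-width over the range of the reference sample
    (an assumption; quantile bins are another common choice).
    """
    lo, hi = min(reference), max(reference)
    if hi == lo:
        raise ValueError("reference sample has no spread")
    width = (hi - lo) / bins

    def fractions(values: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for v in values:
            # Values outside the reference range are clamped into the edge bins.
            idx = min(max(int((v - lo) / width), 0), bins - 1)
            counts[idx] += 1
        # A small floor keeps empty bins from producing log(0).
        return [max(c / len(values), 1e-6) for c in counts]

    ref_frac = fractions(reference)
    live_frac = fractions(live)
    return sum((l - r) * math.log(l / r) for r, l in zip(ref_frac, live_frac))


def drift_level(score: float) -> str:
    """Map a PSI score to a coarse label using rule-of-thumb thresholds."""
    if score < 0.1:
        return "stable"
    if score < 0.25:
        return "moderate"
    return "significant"


if __name__ == "__main__":
    rng = random.Random(0)
    training = [rng.gauss(0.0, 1.0) for _ in range(5000)]
    shifted = [rng.gauss(1.0, 1.0) for _ in range(5000)]  # simulated drifted input
    score = psi(training, shifted)
    print(f"PSI={score:.3f} level={drift_level(score)}")
```

A check like this could run on a schedule against logged request features and export the score as a gauge to whatever dashboard already exists; tools such as Evidently AI or WhyLabs become worthwhile once the number of features and models makes a hand-rolled check hard to maintain.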