Data Science Team Structure for AI Projects

One team structure does not fit all AI projects

Organisations often stand up data science teams with headcount benchmarked against tech companies or consulting firms, without accounting for the actual scope and nature of their AI work. A team of 8 attempting a proof-of-concept moves slowly. A team of 2 maintaining 12 production models is overwhelmed. The right structure depends on what the team is actually doing: exploring, shipping a first model, or maintaining a portfolio.

This piece sits underneath the broader “build internal AI team or hire AI consultants” decision. Once that build-vs-buy call has been made in favour of internal capability, the next question is what that capability looks like in terms of roles, headcount, and phase-by-phase staffing.

Core roles in a data science team

Role	Primary responsibility	Necessary for
Data Engineer	Data pipelines, data warehouse, feature engineering infrastructure	Production ML, reliable training data
ML Engineer	Model training, optimisation, deployment (PyTorch, ONNX, TensorRT)	Production model development
Data Scientist	Analysis, exploratory modelling, business translation	Problem definition, prototyping
MLOps Engineer	Training and serving pipelines, monitoring, infrastructure (Kubernetes, MLflow)	Multiple production models
Domain Expert	Business and domain knowledge	Ensuring models solve the right problem

These are not interchangeable. An ML engineer who is excellent at building serving infrastructure may be poor at translating business problems. A data scientist who is excellent at exploratory analysis may have no production deployment experience. Treating the five roles as a single fungible pool of “AI people” is the structural error behind most stalled programmes we see.

Team sizing by project stage

The most useful sizing heuristic is not headcount-per-revenue or headcount-per-engineer; it is headcount per stage of maturity. A team that fits a first proof-of-concept will be wrong, in either direction, for a portfolio of production models.

Prototype or POC (1–3 months). Minimum viable: 1 data scientist plus 1 data engineer, with domain-expert involvement that may be a business stakeholder rather than a hire. No MLOps engineer is needed here — production infrastructure is not yet the goal, and adding that role early dilutes focus.

First production model. Add 1 ML engineer for deployment. Total: 3–4 people. An MLOps engineer is optional if the ML engineer can handle basic pipeline and monitoring needs. This is where teams typically discover whether their data engineer was actually a data engineer or a senior analyst with a Python notebook.

Multiple production models (three or more). Add a dedicated MLOps engineer. Infrastructure complexity now justifies specialisation: training pipelines, model registry, drift monitoring, rollback procedures. Total: 5–7 people for a team maintaining 3–5 models.

Platform (10+ models, multiple consuming teams). Requires a platform team (MLOps focused) separated from development teams. ML engineers and data scientists sit in product-aligned squads; the platform team provides tooling, standards, and shared services like feature stores and serving infrastructure.
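The stage thresholds above can be sketched as a small lookup. This is an illustrative sketch, not a sizing tool: `StageAdvice` and `stage_for` are hypothetical names, treating one or two production models as the "first production model" stage is our assumption, and headcount is returned only where the text states a figure.

```python
from typing import NamedTuple, Optional, Tuple


class StageAdvice(NamedTuple):
    stage: str
    core_roles: Tuple[str, ...]
    # (min, max) total people as stated in the text; None where no figure is given
    headcount: Optional[Tuple[int, int]]


def stage_for(production_models: int) -> StageAdvice:
    """Hypothetical helper: map a count of production models to a stage."""
    if production_models < 0:
        raise ValueError("production_models must be non-negative")
    base = ("data scientist", "data engineer")
    if production_models == 0:
        # Prototype: minimum viable pair; the domain expert may be a stakeholder.
        return StageAdvice("prototype", base, (2, 2))
    if production_models < 3:
        # Assumption: 1-2 models still counts as the "first production model" stage.
        return StageAdvice("first production model", base + ("ML engineer",), (3, 4))
    if production_models < 10:
        roles = base + ("ML engineer", "MLOps engineer")
        # The text gives 5-7 people only for 3-5 models; beyond that we return None.
        headcount = (5, 7) if production_models <= 5 else None
        return StageAdvice("multiple production models", roles, headcount)
    # Platform stage: the text gives no headcount, only the team split.
    return StageAdvice("platform", base + ("ML engineer", "MLOps platform team"), None)
```

Calling `stage_for(4)` returns the "multiple production models" stage with a dedicated MLOps engineer in the core roles. Between 6 and 9 models the sketch deliberately returns no headcount, because the figures above do not cover that range.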

What are the common skill gaps?

The gaps that cause the most production problems, across our engagements:

No data engineering. Data scientists struggle to build reliable training pipelines. Data quality remains the primary cause of model failure in production, and a notebook-bound data scientist cannot fix it alone.
No MLOps. ML engineers deploy models but have no monitoring. Models degrade silently for months before someone notices.
No domain expert. Models optimise the wrong thing because the business problem was not correctly specified, and the team had no one in the room to push back on the framing.
ML engineer vs data scientist confusion. Hiring one when you need the other, often because the job description was written by someone who treats the titles as synonyms.

How do you structure a team for different project phases?

AI projects have distinct phases with different team structure requirements. Using the same team throughout either under-resources early phases or over-resources later ones.

Exploration phase (4–8 weeks). A small team of 1–2 senior data scientists explores the data, establishes baselines, and determines whether the problem is tractable. This phase requires technical breadth — the ability to try multiple approaches quickly with PyTorch, scikit-learn, or whatever fits — rather than depth. Adding more people at this stage slows progress through coordination overhead.

Development phase (8–16 weeks). The team expands to include ML engineers alongside data scientists. Data scientists develop model architectures and training procedures. ML engineers, working with data engineering, build the data pipelines, training infrastructure, and serving systems that will eventually run in production. The two roles work in parallel but with different focuses.

Deployment phase (4–8 weeks). ML engineers and MLOps or platform engineers dominate. The model is containerised (Docker), deployed to production infrastructure (often Kubernetes), integrated with monitoring, and stress-tested under sustained, realistic load rather than an artificial peak burst. Data scientists remain available for model tuning and quality evaluation but are not the primary contributors.

Operations phase (ongoing). A smaller team maintains the production system: monitoring performance, investigating alerts, retraining models, and deploying updates. One ML engineer can typically maintain 3–5 production models, depending on complexity and retraining frequency. This is an observed-pattern range across our engagements, not a benchmarked rate; the actual number drops sharply if the models share little infrastructure or use very different serving stacks.
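As a rough worked example of the 3–5 models-per-engineer range, the arithmetic can be sketched as below. The function names are hypothetical, and the 3 and 5 defaults are the observed-pattern bounds from the paragraph above, not benchmarked rates.

```python
def engineers_needed(models: int, models_per_engineer: int) -> int:
    """Ceiling division: engineers required to cover `models` at a given rate."""
    if models < 0 or models_per_engineer <= 0:
        raise ValueError("models must be >= 0 and models_per_engineer > 0")
    return -(-models // models_per_engineer)


def ops_staffing_range(models: int, best_rate: int = 5, worst_rate: int = 3):
    """Return (optimistic, conservative) ML engineer counts for operations.

    best_rate / worst_rate are the 5 and 3 models-per-engineer bounds from the
    text; lower worst_rate if the models share little infrastructure.
    """
    return engineers_needed(models, best_rate), engineers_needed(models, worst_rate)
```

For a portfolio of 12 models, `ops_staffing_range(12)` gives a range of 3 to 4 ML engineers, which is why a team of 2 in that position is overwhelmed.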

The mistake we see most often: staffing the exploration phase with a large team (5+ people) that produces redundant exploratory work, then under-staffing the deployment phase because the budget was consumed during exploration. Right-sizing each phase keeps the project on budget while ensuring adequate resources at every stage.

Organisational readiness factors

Technical capability is necessary but not sufficient for successful AI deployment. Organisational readiness — the ability to define clear business problems, provide quality data, staff appropriate roles, and sustain commitment through the learning curve — determines whether technical capability translates into business value.

We assess organisational readiness across four dimensions: data maturity (is the required data accessible, documented, and of known quality?), process clarity (can stakeholders define what success looks like in business terms?), technical foundation (does the team have the infrastructure and skills to support AI operations?), and leadership commitment (will the organisation sustain investment through the 6–18 months typically required to reach production value?). The 6–18 month figure is an observed pattern across our engagements, not a guarantee for any one programme.

Teams that score low on data maturity but high on everything else should start with a data quality initiative, not a model-building project. Teams with strong data but unclear business objectives benefit more from a problem-definition workshop than from hiring ML engineers. The most expensive mistake is hiring a full AI team before confirming that the organisation can feed them useful work.
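The triage logic in this section can be sketched as a simple rule check. Everything specific here is an assumption for illustration: the 1–5 scoring scale, the `low_threshold` of 2, the `first_step` name, and the choice to check data maturity before process clarity when both are low.

```python
from typing import Mapping

DIMENSIONS = (
    "data_maturity",
    "process_clarity",
    "technical_foundation",
    "leadership_commitment",
)


def first_step(scores: Mapping[str, int], low_threshold: int = 2) -> str:
    """Suggest a starting point from readiness scores (assumed 1-5 scale)."""
    missing = [d for d in DIMENSIONS if d not in scores]
    if missing:
        raise ValueError("missing scores for: " + ", ".join(missing))
    # Low data maturity: fix the data before building models.
    if scores["data_maturity"] <= low_threshold:
        return "data quality initiative"
    # Strong data but unclear objectives: define the problem before hiring.
    if scores["process_clarity"] <= low_threshold:
        return "problem-definition workshop"
    # Remaining dimensions have no named remedy in the text; just flag them.
    weak = [d for d in DIMENSIONS if scores[d] <= low_threshold]
    if weak:
        return "address gaps first: " + ", ".join(weak)
    return "proceed to staffing"
```

A team scoring 1 on data maturity and 4 elsewhere is pointed at a data quality initiative rather than a model-building project, which mirrors the rule above. The scores themselves still come from a human assessment; the sketch only makes the ordering of the rules explicit.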

Build vs outsource

Not every role needs to be a full-time hire. Data engineering and MLOps are often candidates for outsourcing early — hire a specialist firm or contractor — because the practices are relatively standardised across organisations: pipelines, registries, monitoring, CI/CD for models. ML engineering for production-critical models usually benefits from internal ownership, since the failure modes are specific to the model and the data. Data science is most effective as internal capability where domain knowledge matters and where the iteration loop with stakeholders is tight.

The trap is staff-augmentation by default: external headcount with internal technical direction, which gives you external cost without external accountability. For the framework that sits behind this choice, see the companion piece “build internal AI team or hire AI consultants”.

FAQ

When should we build an internal AI team versus hire AI consultants?

Build internally when the capability is strategic, the timeline allows for a 6–18 month ramp, and the work will continue past the first project. Engage consultants when outcome ownership is what you are buying, the scope is bounded, and capability transfer is part of the engagement.

Which capabilities require permanent in-house ownership, and which are safe to outsource?

Domain-aware data science and production-critical ML engineering generally need to be in-house. Data engineering and MLOps are more standardised and can start as outsourced specialist work, then be brought in-house once the volume of production models justifies it.

How does the build-vs-hire decision shift as the organisation matures from first project to portfolio of AI work?

Early on, consultants compress time-to-first-model and reduce the cost of being wrong about scope. As the portfolio grows, the economics flip: a permanent team amortises across many models, and the institutional knowledge of your data becomes the differentiator that no external firm can replicate quickly.

What is the realistic cost of building an internal AI team — hiring, retention, ramp time — versus engaging consultants?

The fully loaded cost of a five-person internal team — salaries, retention, tooling, plus the 6–18 month ramp before reliable production value — is typically larger than a focused consulting engagement on a single problem. The internal team wins on cost only at scale, once it carries multiple projects in parallel.

How do we structure a hybrid model so consultants augment rather than replace internal capability?

Pair every external contributor with an internal counterpart who owns the artefact after handover. Define capability-transfer milestones in the statement of work, not just deliverables. Avoid arrangements where consultants own the model code without an internal engineer who can modify and redeploy it.

Which warning signs indicate that an outsourced engagement is creating long-term dependency instead of transferring skill?

The internal team cannot redeploy or retrain the model without the consultants present. Documentation describes what was built but not why. Renewal conversations focus on extending scope rather than narrowing it. If any of these appear, the engagement has slid into staff-augmentation regardless of how it was sold.