## AI hiring is harder than software hiring for structural reasons

Software engineering hiring has decades of established practices: coding interviews, system design, past project assessment. AI/ML hiring is less mature, the role definitions are blurred, and the skills that matter most are often the hardest to assess in an interview setting. Organizations underestimate these challenges and end up hiring for the wrong role, at the wrong level, for the wrong problem.

## Role definitions that actually matter

The four core AI engineering roles are distinct and should not be treated as interchangeable:

| Role | Core skill | What they build | What they don't do |
| --- | --- | --- | --- |
| ML Engineer | Model training, deployment, optimization | Production models, serving infrastructure | Data pipelines from scratch; research |
| Data Scientist | Analysis, modeling, business translation | Exploratory analysis, model prototypes | Production deployment; infrastructure |
| AI Researcher | Novel algorithms, academic methods | New techniques, papers | Production systems (typically) |
| MLOps Engineer | Pipelines, monitoring, infrastructure | Training/serving pipelines, monitoring | Model development |

In our experience, the most common hiring mistake is expecting a data scientist to build production ML systems (that is ML engineering) or expecting an ML engineer to scope and prioritize business problems (that is data science). These are different skills that are rarely combined well in one person.

## What interviews typically miss

**Standard coding interviews assess the wrong things.** LeetCode-style problems test algorithmic thinking but are poor predictors of ML engineering quality. An ML engineer who cannot implement a binary search tree in 20 minutes may still be excellent at building production serving infrastructure.

**Model accuracy is the wrong success metric.** Interviewers commonly test whether a candidate can describe how to improve a model's accuracy.
Production ML success is more often about debugging data pipelines, handling distribution shift, and building reliable monitoring than about model architecture choices.

**Communication with non-technical stakeholders is rarely assessed.** Data scientists in particular need to translate between technical findings and business decisions.

## What actually predicts success

In our experience across AI hiring engagements, the factors that most reliably predict success are:

- **Production vs research experience:** Has the candidate deployed models that other people depend on? Deployment surfaces the concerns (monitoring, fallbacks, drift) that academic or research experience does not.
- **Debugging portfolio:** Can they describe a real debugging problem they solved, not a textbook example but a messy production failure?
- **Data quality instincts:** Do they ask about data quality early, or do they assume the data is clean?
- **Opinion on trade-offs:** Strong candidates have opinions about when to use different approaches. Candidates who answer "it depends" to everything, without follow-through, often lack depth.

## Organisational readiness factors

Technical capability is necessary but not sufficient for successful AI deployment. Organisational readiness (the ability to define clear business problems, provide quality data, staff appropriate roles, and sustain commitment through the learning curve) determines whether technical capability translates into business value.

We assess organisational readiness across four dimensions:

- **Data maturity:** Is the required data accessible, documented, and of known quality?
- **Process clarity:** Can stakeholders define what success looks like in business terms?
- **Technical foundation:** Does the team have the infrastructure and skills to support AI operations?
- **Leadership commitment:** Will the organisation sustain investment through the 6–18 months typically required to reach production value?
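As an illustration, the mapping from these four readiness dimensions to a recommended starting point could be sketched in code. This is a hypothetical sketch only: the `recommend_starting_point` function, the 1–5 score scale, and the thresholds are illustrative assumptions, not part of any actual assessment tooling.

```python
# Hypothetical sketch: map readiness scores (1-5) on the four dimensions
# to a recommended starting point. Thresholds are illustrative assumptions.

def recommend_starting_point(data_maturity: int, process_clarity: int,
                             technical_foundation: int,
                             leadership_commitment: int) -> str:
    """Return a suggested first engagement based on readiness scores."""
    if leadership_commitment < 3:
        # Without sustained commitment, no initiative reaches production value.
        return "defer: secure leadership commitment first"
    if data_maturity < 3:
        # Weak data undermines any model-building effort.
        return "data quality initiative"
    if process_clarity < 3:
        # Strong data but unclear objectives: define the problem first.
        return "problem-definition workshop"
    if technical_foundation < 3:
        return "infrastructure and MLOps build-out"
    return "model-building project"

print(recommend_starting_point(2, 4, 4, 4))  # data maturity is the gap
```

The ordering of the checks encodes the priorities discussed in the text: commitment and data quality gate everything else, and a model-building project is only recommended when all four dimensions clear the bar.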
Teams that score low on data maturity but high on everything else should start with a data quality initiative, not a model-building project. Teams with strong data but unclear business objectives benefit more from a problem-definition workshop than from hiring ML engineers. The most expensive mistake is hiring a full AI team before confirming that the organisation can feed them useful work.

## Contractor vs full-time for AI talent

For specific, time-bounded projects (model training, dataset labeling, a specific deployment), contractors with narrow expertise are often more cost-effective. For ongoing production ownership (model maintenance, monitoring, retraining), full-time hires provide continuity. The "build internal AI team or hire consultants" framework covers the broader organizational decision: when to build internal capability versus when to engage external expertise.

## What interview practices actually predict on-the-job AI performance?

Traditional technical interviews (LeetCode-style algorithm problems, textbook ML theory questions, whiteboard system design) have low predictive validity for AI engineering roles. They test preparation for the interview format rather than the ability to deliver AI projects.

More predictive interview practices: take-home projects using realistic data, pair programming on a representative task, and portfolio review of previous work. Each tests a different aspect of job performance.

**Take-home projects** (4–8 hours, compensated) with a realistic dataset test the candidate's end-to-end workflow: data exploration, feature engineering, model selection, evaluation methodology, and result communication. We provide a dataset and problem statement that mirror the complexity of actual work, and evaluate the submission on methodology rigour (not just accuracy), code quality, and the written explanation of decisions.

**Pair programming sessions** (60–90 minutes) test real-time problem-solving and collaboration.
We use a task from our actual codebase (anonymised if necessary): debugging a data pipeline issue, extending a model evaluation script, or implementing a new feature in the serving layer. This reveals the candidate's ability to navigate unfamiliar code, ask useful questions, and produce working solutions under realistic conditions.

**Portfolio review** evaluates the candidate's ability to complete projects and communicate results. We look for evidence of end-to-end delivery (not just model training but deployment, monitoring, and iteration) and clear communication of technical decisions and trade-offs.

These practices require more interviewer time than standardised coding interviews but produce better hiring decisions. Our 6-month retention rate for AI engineers hired through this process is 92%, compared to an industry average below 80%.
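One way to keep take-home evaluations consistent across reviewers is to combine the three criteria above into a simple weighted rubric. The sketch below is illustrative only: the `score_submission` helper and the weights are assumptions, not a description of our actual scoring process.

```python
# Hypothetical sketch: combine take-home evaluation criteria into one score.
# The criteria come from the text (methodology rigour, code quality, written
# explanation); the weights are illustrative assumptions.

WEIGHTS = {
    "methodology_rigour": 0.5,   # weighted highest, per the emphasis above
    "code_quality": 0.3,
    "written_explanation": 0.2,
}

def score_submission(ratings: dict[str, float]) -> float:
    """Weighted average of per-criterion ratings on a 1-5 scale."""
    if set(ratings) != set(WEIGHTS):
        raise ValueError(f"expected ratings for {sorted(WEIGHTS)}")
    return sum(WEIGHTS[c] * ratings[c] for c in WEIGHTS)

score = score_submission({
    "methodology_rigour": 4,
    "code_quality": 3,
    "written_explanation": 5,
})
print(round(score, 2))  # 3.9
```

A fixed rubric like this does not replace reviewer judgement, but it makes disagreements between reviewers visible per criterion rather than hiding them in a single gut-feel score.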