GxP Systems: What Qualifies and What the Classification Means for Software

Not every system in a pharma facility is a GxP system

A GxP system is a computerised system that creates, modifies, maintains, archives, retrieves, or transmits data affecting pharmaceutical product quality, patient safety, or data integrity. The classification is determined by what the system does (specifically, whether its data or outputs influence quality-affecting decisions), not by where it physically resides or who owns it. We often see teams trip over this distinction: they treat physical location as a proxy for regulatory scope and end up either over-validating routine IT or missing a quality-critical system entirely.

An MES (Manufacturing Execution System) controlling batch manufacturing is a GxP system. A visitor management kiosk in the facility lobby is not, even though both operate within the same facility perimeter. The distinction matters because GxP classification triggers specific regulatory obligations: validation, audit trails, change control, access controls, data integrity measures, and periodic reviews under EU GMP Annex 11 and 21 CFR Part 11.

What qualifies a system as GxP-relevant?

The qualifying question is not “is this system regulated?” but “does this system’s behaviour, data, or output reach a quality decision?” Four sub-questions decompose that cleanly, and any single “yes” puts the system in scope.

Question	If yes	If no
Does the system create or modify GMP/GLP/GCP/GDP data?	→ GxP system	→ Next question
Does the system control a GxP process (e.g., process parameters)?	→ GxP system	→ Next question
Does the system’s output influence quality decisions?	→ GxP system	→ Next question
Could system failure affect product quality or patient safety?	→ GxP system	→ Not GxP

Systems that answer “yes” to any of these questions fall under GxP scope and inherit regulatory obligations proportionate to their risk classification. Systems that answer “no” to all four are not GxP-relevant and do not require GxP validation — regardless of being deployed inside a pharmaceutical facility. In our experience, the third question is the one that catches misclassified systems most often: a dashboard that “just visualises” data can still influence quality decisions if a QA reviewer relies on it for batch disposition.
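The four-question screen above can be sketched as a small record plus an "any yes" check. This is a minimal illustration, not a validated tool: the class name, field names, and example systems are hypothetical and simply mirror the table rows.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SystemAssessment:
    """Answers to the four scoping questions (field names are illustrative)."""
    name: str
    creates_or_modifies_gxp_data: bool
    controls_gxp_process: bool
    output_influences_quality_decisions: bool
    failure_affects_quality_or_safety: bool


def triggering_questions(a: SystemAssessment) -> List[str]:
    """Return the questions answered 'yes', so the rationale can be documented."""
    answers = {
        "creates or modifies GxP data": a.creates_or_modifies_gxp_data,
        "controls a GxP process": a.controls_gxp_process,
        "output influences quality decisions": a.output_influences_quality_decisions,
        "failure could affect quality or patient safety": a.failure_affects_quality_or_safety,
    }
    return [question for question, yes in answers.items() if yes]


def is_gxp_relevant(a: SystemAssessment) -> bool:
    """Any single 'yes' puts the system in GxP scope."""
    return bool(triggering_questions(a))
```

For example, a dashboard assessed as `SystemAssessment("QA dashboard", False, False, True, False)` comes back in scope because of the third question alone, which matches the misclassification pattern described above.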

Common GxP systems by type

The table below is the working reference we use when scoping a regulatory analysis. The “Typical obligations” column is a starting point — final scope still depends on the system’s specific risk profile under GAMP 5 Second Edition.

System type	GxP relevance	Typical obligations
MES / batch record systems	High — direct quality impact	Full validation, audit trails, Part 11/Annex 11 compliance
LIMS	High — analytical data integrity	Full validation, instrument integration qualification
SCADA / process control	High — controls manufacturing parameters	IQ/OQ/PQ, alarm management validation
ERP (quality modules)	Medium — quality and materials management	Module-level validation, change control
Document management	Medium — SOPs, batch record templates	Workflow validation, electronic signatures
Environmental monitoring	High — cleanroom and sterile area data	Continuous monitoring validation, alert configuration
AI/ML models (quality-affecting)	High — decision support or automation	Continuous validation, performance monitoring, drift detection
AI/ML models (non-quality)	Low or none — operational efficiency	Proportionate assurance or no GxP validation needed

The bottom two rows are where current debates are concentrated. An ML model that flags potential out-of-specification results for QA review is quality-affecting even if a human signs off — its outputs shape what gets investigated. An ML model that optimises warehouse picking routes is not. The boundary is the decision path, not the technology stack.

The risk-based validation consequence

Once a system is classified as GxP-relevant, the next question is how much validation effort is appropriate. The GAMP 5 framework provides the answer through its software category classification and risk-based approach. A configured LIMS (Category 4) requires different validation activities than a custom AI model (Category 5). Both are GxP systems, but the validation effort is proportionate to the complexity and risk of each — this is the observed pattern across the engagements we have run, and it is also what GAMP 5 Second Edition (2022) and ISPE’s emerging AI guidance explicitly intend.

Understanding the full scope of GxP compliance requirements for software in pharma is the prerequisite for making accurate classification decisions. Over-classification wastes validation resources. Under-classification creates regulatory exposure. The goal is accurate classification followed by proportionate validation — which is exactly what the current regulatory frameworks expect.

How does system classification affect the software development lifecycle?

GxP classification determines the level of documentation, testing, and change control required throughout the software lifecycle. Non-GxP systems follow standard software engineering practices. GxP-regulated systems require formal validation activities at each lifecycle phase, with documented evidence that each requirement has been implemented and verified.
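One way to picture "documented evidence that each requirement has been implemented and verified" is a traceability check that lists requirements with no passing test linked to them. The sketch below assumes a simple in-memory representation; the dictionary keys (`requirement_ids`, `passed`) and the ID formats are hypothetical, and a real project would pull this from its requirements and test management tools.

```python
from typing import Dict, Iterable, List


def untraced_requirements(
    requirement_ids: Iterable[str],
    test_results: Iterable[Dict],
) -> List[str]:
    """Return requirement IDs that have no passing test linked to them.

    Each test result is assumed to be a dict with:
      - "requirement_ids": the requirements the test claims to verify
      - "passed": whether the executed test passed
    """
    verified = set()
    for result in test_results:
        if result["passed"]:
            verified.update(result["requirement_ids"])
    # Anything not covered by a passing test is a traceability gap.
    return sorted(set(requirement_ids) - verified)
```

An empty return value means every listed requirement is covered by at least one passing test; a failed test does not count as verification, so its requirements stay in the gap list until the test is re-executed successfully.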

The classification decision cascades through the entire project: team structure (a Quality Assurance representative must be involved), documentation requirements (formal requirements specifications, design documents, and test protocols), change management (every change requires an impact assessment and re-validation proportionate to its risk), and operational procedures (incident handling, backup verification, and periodic review follow documented SOPs).

For GAMP Category 4 systems (configured products), the validation burden is moderate — vendors provide baseline validation packages, and the implementing organisation validates the specific configuration. For Category 5 systems (custom applications), the full validation lifecycle applies: requirements specification, functional specification, design specification, code review, unit testing with documented evidence, integration testing, and user acceptance testing — all with formal sign-off and traceability. For ML components, the picture shifts again: training data, model versions, and retraining pipelines have to be controlled with the same rigour as code, which is why MLflow-style experiment tracking and DVC-style data versioning now appear inside validation packages we review.
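The category-to-activity relationship described above can be written down as a lookup. This is a sketch for planning discussions only: the activity strings paraphrase the paragraph, the function and constant names are hypothetical, and the lists are not a complete GAMP 5 deliverables set. Categories not discussed in the paragraph are deliberately rejected rather than guessed.

```python
from typing import Dict, List

# Activities paraphrased from the text; illustrative, not exhaustive.
CATEGORY_ACTIVITIES: Dict[int, List[str]] = {
    4: [
        "review vendor baseline validation package",
        "validate site-specific configuration",
    ],
    5: [
        "requirements specification",
        "functional specification",
        "design specification",
        "code review",
        "unit testing with documented evidence",
        "integration testing",
        "user acceptance testing",
    ],
}

# Items an ML component adds to change control, per the text.
ML_CONTROLLED_ITEMS: List[str] = [
    "training data",
    "model versions",
    "retraining pipelines",
]


def validation_plan(category: int, has_ml_component: bool = False) -> List[str]:
    """Return the activity list for a GAMP category, extended for ML components."""
    if category not in CATEGORY_ACTIVITIES:
        raise ValueError(f"No activity list defined here for category {category}")
    plan = list(CATEGORY_ACTIVITIES[category])
    if has_ml_component:
        plan += [f"place under change control: {item}" for item in ML_CONTROLLED_ITEMS]
    return plan
```

Calling `validation_plan(5, has_ml_component=True)` returns the seven custom-application activities followed by the three ML control items.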

We help clients right-size the validation effort based on accurate system classification. Over-classification (treating a Category 3 system as Category 5) wastes resources on unnecessary documentation. Under-classification (treating a Category 5 system as Category 3) creates regulatory risk when inspectors review the validation evidence. In our experience the initial classification assessment is a 2–3 day exercise that prevents weeks of misdirected validation work downstream — this is an observed planning heuristic from our pharma engagements, not a benchmarked rate.

The classification decision should be documented and reviewed by a cross-functional team including IT, Quality, and the system’s end users. Each stakeholder brings a different perspective on the system’s impact: IT understands the technical architecture, Quality understands the regulatory implications, and end users understand which business processes depend on the system. A classification decision made by IT alone risks underestimating the regulatory impact; a decision made by Quality alone risks overestimating the technical complexity. The collaborative assessment produces a classification that is both technically accurate and regulatorily defensible.

FAQ

How is AI/ML software classified under GAMP 5: Category 3, 4, 5, or something new?

Classical GAMP 5 categories were drafted for deterministic software, so ML systems do not map cleanly. In practice we treat a hosted/standard ML service as Category 3-like, a configured ML platform as Category 4, and a custom-trained model as Category 5. However, the GAMP 5 Second Edition and the ISPE AI guidance push toward risk-based classification on top of the category, because the model’s training data and retraining cadence matter as much as the code.

What does a GAMP 5 validation lifecycle look like for a continuously-retrained AI model?

The V-model still applies, but each retrain is a controlled change. That means versioned training data, a documented retraining trigger (drift, schedule, new label set), automated regression tests against a frozen validation set, and a re-qualification step proportionate to the change’s risk rather than a full re-validation each time.
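The "automated regression tests against a frozen validation set" step can be sketched as a gate that compares a retrained candidate with the previously approved baseline. The metric (plain accuracy), the `max_drop` tolerance, and all function names are assumptions chosen for illustration; a real acceptance criterion would come from the model's approved specification.

```python
from typing import Callable, Hashable, Sequence


def accuracy(
    predict: Callable[[object], Hashable],
    inputs: Sequence[object],
    labels: Sequence[Hashable],
) -> float:
    """Fraction of frozen validation examples the model labels correctly."""
    if not labels or len(inputs) != len(labels):
        raise ValueError("inputs and labels must be non-empty and the same length")
    correct = sum(1 for x, y in zip(inputs, labels) if predict(x) == y)
    return correct / len(labels)


def retrain_passes_regression(
    candidate: Callable[[object], Hashable],
    baseline_accuracy: float,
    frozen_inputs: Sequence[object],
    frozen_labels: Sequence[Hashable],
    max_drop: float = 0.01,  # assumed tolerance, not a regulatory figure
) -> bool:
    """True if the candidate is no worse than baseline minus the tolerance."""
    candidate_accuracy = accuracy(candidate, frozen_inputs, frozen_labels)
    return candidate_accuracy >= baseline_accuracy - max_drop
```

A failing gate would block promotion of the retrained model and route the change to a fuller re-qualification, which keeps routine retrains lightweight while still treating each one as a controlled change.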

Why is continuous validation needed for AI/ML, and how does it differ from one-shot validation?

One-shot validation assumes the system’s behaviour does not change after release. ML systems can drift even without retraining, because input distributions shift. Continuous validation adds live performance monitoring, drift detection, and a defined intervention threshold to the standard validation package.
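As an illustration of drift detection with a defined intervention threshold, the sketch below computes a Population Stability Index (PSI) for one numeric input feature. PSI is one common drift statistic chosen here for the example; the text does not prescribe a specific metric. The 0.2 threshold is a widely quoted rule of thumb, not a regulatory value, and the function names are hypothetical.

```python
import bisect
import math
from typing import List, Sequence


def _bin_fractions(values: Sequence[float], edges: List[float], eps: float = 1e-6) -> List[float]:
    """Share of values per bin; out-of-range values fall into the edge bins."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        idx = bisect.bisect_right(edges, v) - 1
        idx = min(max(idx, 0), len(counts) - 1)
        counts[idx] += 1
    # eps avoids log(0) and division by zero for empty bins.
    return [max(c / len(values), eps) for c in counts]


def population_stability_index(
    reference: Sequence[float], live: Sequence[float], n_bins: int = 10
) -> float:
    """PSI between a reference sample (e.g. validation data) and live inputs."""
    if not reference or not live:
        raise ValueError("reference and live samples must be non-empty")
    lo, hi = min(reference), max(reference)
    if lo == hi:
        raise ValueError("reference sample has no spread to bin")
    width = (hi - lo) / n_bins
    edges = [lo + i * width for i in range(n_bins + 1)]
    ref = _bin_fractions(reference, edges)
    cur = _bin_fractions(live, edges)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))


def needs_intervention(psi: float, threshold: float = 0.2) -> bool:
    """Flag drift once PSI reaches the (assumed) intervention threshold."""
    return psi >= threshold
```

In a monitoring job, the reference sample would be frozen at validation time and the live sample drawn from a recent production window; the threshold and the action it triggers belong in the validation package so that the response to drift is predefined rather than improvised.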

What evidence is required at each GAMP 5 V-model phase when the system under test is a model?

The user requirements specification (URS) and functional specification (FS) still apply, but the design specification (DS) expands to cover model architecture, training data lineage, and acceptance metrics. Installation qualification (IQ) verifies the runtime and the integrity of the model artefact (by hash). Operational qualification (OQ) tests model behaviour against the frozen validation set. Performance qualification (PQ) tests against live or representative production data with the operational monitoring in place.
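The IQ artefact-integrity check mentioned above can be as simple as comparing the deployed file's hash with the hash recorded at approval. The sketch below uses SHA-256 from Python's standard library; the choice of SHA-256 and the function names are assumptions for illustration.

```python
import hashlib
import hmac


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 digest of a file, read in chunks to handle large models."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def artefact_is_intact(path: str, approved_sha256: str) -> bool:
    """True if the deployed artefact matches the hash recorded at approval."""
    expected = approved_sha256.strip().lower()
    return hmac.compare_digest(sha256_of_file(path), expected)
```

The approved hash would typically be recorded in the release record for the model version, so the IQ evidence is the comparison result together with the two hash values.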

How do GAMP 5’s risk-based controls map onto AI-specific risks (data drift, hallucination, training-data quality)?

Each AI-specific risk maps to a control class GAMP 5 already recognises: data drift maps to ongoing performance qualification, training-data quality maps to data integrity controls (ALCOA+), and hallucination/over-confidence maps to output-review controls, typically a human-in-the-loop checkpoint for quality-affecting decisions.

Where does the ISPE GAMP AI guidance change the classic GAMP 5 categorisation for ML software?

The ISPE AI guidance does not replace categories; it overlays a risk dimension that accounts for autonomy, learning behaviour, and decision impact. A Category 4 platform running an autonomous quality-affecting model carries a higher effective validation burden than a Category 4 platform running a static rules engine: the category is the same, the risk profile is not.
