The category determines the validation effort

GAMP software categories are the classification framework that determines how much validation effort a computerised system requires in a pharmaceutical environment. The rule is simple in shape: more complex and more configurable software requires more thorough validation. The difficulty lies in applying it to modern software, particularly AI and machine learning, where a single deployed system mixes commercial frameworks, pre-trained model weights, custom training pipelines, and configured inference infrastructure.

The ISPE’s GAMP 5 framework defines four active software categories: 1, 3, 4, and 5 (Category 2, formerly firmware, has been retired). We treat the category not as a label but as a budget: it tells the validation team how much evidence to gather and where to focus risk-based testing.

Category definitions and validation requirements

| Category | Name | Description | Validation approach | Examples |
| --- | --- | --- | --- | --- |
| 1 | Infrastructure software | Provides the computing environment | Qualification: verify installation and configuration | Operating systems, databases, virtualisation platforms, network firmware |
| 3 | Non-configured products | Used as delivered, without configuration | Verification of intended use, vendor documentation review | Laboratory instruments with embedded firmware, standard calculators |
| 4 | Configured products | Configured for the specific application | Configuration verification, functional testing of configured features | ERP, LIMS, MES, SCADA, CRM with workflow configuration |
| 5 | Custom applications | Developed specifically for the intended use | Full lifecycle validation: requirements, design, code review, testing | Bespoke manufacturing control systems, custom analytics applications |

Category 1 systems require documented evidence that they are installed correctly and operate as expected, without detailed functional testing.
Category 5 systems require full lifecycle documentation: user requirements, functional specifications, design specifications, code review, unit testing, integration testing, and user acceptance testing. Categories 3 and 4 sit between those poles, with the configured surface area governing how much functional testing is appropriate.

How is AI/ML software classified under GAMP 5: Category 3, 4, 5, or something new?

Machine learning systems do not fit cleanly into a single traditional category, and forcing them into one is the most common classification mistake we see when reviewing pharmaceutical AI projects.

Consider a computer vision model for pharmaceutical quality inspection. The components decompose as follows:

- A commercial ML framework (PyTorch, TensorFlow): Category 1 infrastructure.
- A pre-trained model architecture (ResNet, YOLO): Category 3 when used unmodified; once it is fine-tuned on company data, the fine-tuning component moves toward Category 5.
- A training pipeline written in-house that ingests facility data, applies augmentation, and produces model weights: Category 5 custom development.
- Inference hardware with configured drivers (CUDA, TensorRT, container runtime): Category 4 configured product, on top of Category 1 infrastructure.

The system spans multiple categories simultaneously. The GAMP 5 Second Edition resolves this by directing teams to classify based on the system’s overall risk to product quality rather than forcing the whole system into a single bucket. In practice, and this is a pattern observed in our own validation engagements rather than a formal rule, the training pipeline and the resulting model are classified as Category 5, the underlying framework and infrastructure stay at their respective lower categories, and the validation plan is built around the highest-risk component rather than the lowest.
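The component breakdown above can be sketched as a small data structure. This is a minimal illustration, not a GAMP-defined algorithm: the `Component` class, the `fine_tuned` flag, and the `plan_drivers` helper are hypothetical names, and the sketch makes two simplifying assumptions, namely that a fine-tuned Category 3 model is treated as Category 5 outright and that the highest category stands in for the highest risk. A real classification would rest on a documented risk assessment.

```python
from dataclasses import dataclass

ACTIVE_CATEGORIES = (1, 3, 4, 5)  # Category 2 is no longer used


@dataclass(frozen=True)
class Component:
    name: str
    category: int
    fine_tuned: bool = False  # hypothetical flag: weights retrained on company data

    def effective_category(self) -> int:
        if self.category not in ACTIVE_CATEGORIES:
            raise ValueError(f"{self.name}: {self.category} is not an active GAMP category")
        # Assumption: a fine-tuned Category 3 model is treated as Category 5.
        if self.category == 3 and self.fine_tuned:
            return 5
        return self.category


def plan_drivers(components):
    """Return (highest category, names of the components at that category)."""
    top = max(c.effective_category() for c in components)
    return top, [c.name for c in components if c.effective_category() == top]


# Components taken from the quality-inspection example in the text.
inspection_system = [
    Component("ML framework (PyTorch)", 1),
    Component("Pre-trained architecture (ResNet)", 3, fine_tuned=True),
    Component("In-house training pipeline", 5),
    Component("Configured inference stack", 4),
]

top_category, drivers = plan_drivers(inspection_system)
print(top_category, drivers)
```

Keeping the per-component categories alongside the derived result preserves the lower classifications for the framework and infrastructure, so they can still be qualified at their own level while the validation plan focuses on the components returned by `plan_drivers`.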
The detailed methodology, including how the ISPE GAMP AI guidance reframes this classification when models retrain continuously, is covered in how to classify and validate AI/ML software under GAMP 5.

Common classification mistakes

A short diagnostic checklist for teams reviewing their own GAMP classification:

- Classifying commercial ML platforms as Category 3 across the board. A pre-trained model used without modification may genuinely be Category 3. The same model fine-tuned on company data has a Category 5 component: the fine-tuning step itself. The platform classification cannot absorb the training pipeline.
- Treating all custom code as Category 5. A Python script that reformats CSV data is technically custom software, but its risk to product quality may not warrant full Category 5 validation. GAMP 5 explicitly supports risk-proportional treatment; documentation depth follows risk, not file count.
- Ignoring infrastructure classification. The GPU hardware, CUDA drivers, and container runtime that an ML model runs on are Category 1 infrastructure. They still require qualification: a model validated on one GPU configuration is not automatically valid on another, because the inference behaviour can shift with kernel version, driver, or numerical precision setting.
- Static classification for systems that retrain. A model that updates weekly on new production data is not a snapshot. Treating it as a one-time Category 5 deliverable misses the entire continuous-validation problem the GAMP AI guidance was written to address.

The practical decision

Classification is not an academic exercise. It determines how much time, effort, and documentation the validation team must produce before the system can be used in production. Over-classification (treating every component as Category 5) wastes resources and slows deployment. Under-classification (treating a custom-trained model as Category 3) creates regulatory exposure that surfaces during inspection.
The answer is accurate classification based on system architecture, a documented risk assessment, and the specific GAMP 5 guidance, followed by proportionate validation effort.

We see two patterns consistently. Teams that start with the artifact (an architecture diagram, a data-flow map) and assign categories per component land closer to a defensible classification. Teams that start with a procurement label (“it’s a SaaS product, so Category 4”) tend to misclassify the parts that matter most for ML: the training data and the model weights themselves.

How do you handle systems that span multiple GAMP categories?

Modern pharmaceutical systems frequently combine components from multiple GAMP categories. An MES (Manufacturing Execution System) typically includes Category 1 infrastructure components (operating system, database), Category 4 configured software (the MES platform), and Category 5 custom components (site-specific business logic, integrations with other systems).

Our validation approach for mixed-category systems: assess each component against its applicable category, but validate the system as an integrated whole. Component-level testing verifies individual functions. Integration testing verifies that components interact correctly. System-level testing (OQ, PQ) verifies end-to-end workflows that span multiple components.

The risk assessment for mixed-category systems focuses on the interfaces between components, where failures are most likely. A misconfigured integration between the MES and the LIMS may result in incorrect test results being associated with the wrong batch, a high-impact failure that occurs at the interface rather than within either system individually. This is an observed pattern across our regulated-systems engagements rather than a benchmarked failure rate.

We document mixed-category systems using a system architecture diagram that maps each component to its GAMP category and identifies the interfaces between components.
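A map of this kind can also be kept as data next to the drawing. The sketch below is illustrative only: the component names follow the MES example, while the interface list, the test IDs (`IT-001` and so on), and the `affected_tests` helper are hypothetical placeholders rather than part of any GAMP template.

```python
# Hypothetical component-to-category map for the MES example.
categories = {
    "operating system": 1,
    "database": 1,
    "MES platform": 4,
    "LIMS": 4,
    "site-specific business logic": 5,
}

# Each interface is an unordered pair of components, mapped to the
# (hypothetical) integration tests that exercise it.
interfaces = {
    frozenset({"MES platform", "LIMS"}): ["IT-001"],
    frozenset({"MES platform", "database"}): ["IT-002"],
    frozenset({"MES platform", "site-specific business logic"}): ["IT-003", "IT-004"],
    frozenset({"database", "operating system"}): ["IT-005"],
}

# Every interface endpoint must be a classified component.
for pair in interfaces:
    unknown = pair - categories.keys()
    if unknown:
        raise ValueError(f"unclassified components in interface: {sorted(unknown)}")


def affected_tests(changed_component):
    """List the integration tests on every interface touching the changed component."""
    if changed_component not in categories:
        raise KeyError(f"unknown component: {changed_component}")
    return sorted(
        test
        for pair, tests in interfaces.items()
        if changed_component in pair
        for test in tests
    )


print(affected_tests("LIMS"))
print(affected_tests("MES platform"))
```

Storing interfaces as unordered pairs means a change on either side of an interface surfaces the same tests, which matches the point that failures tend to occur between systems rather than inside one of them.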
This diagram becomes a key input to the risk assessment and a reference document for change control: when a change is proposed to one component, the diagram shows which interfaces, and therefore which integration tests, may be affected.

FAQ