AI vs Real Face: Anti-Spoofing, Liveness Detection, and When Custom CV Models Are Necessary

Pretrained face detection and recognition models are trained to answer the question: is this a human face, and whose face is it? They are not designed to answer a different question: is this a real, live face, or a presentation attack? The distinction matters enormously in deployment contexts where access control or identity verification is at stake.

A presentation attack is any attempt to defeat a face recognition system using a representation of a face rather than the live subject: a printed photograph, a digital display showing a face image or video, a 3D mask, or an AI-generated synthetic face image. Standard pretrained face recognition pipelines — including high-quality commercial APIs — fail against many of these attacks because they were never optimised to distinguish them. We see this gap repeatedly when teams treat anti-spoofing as a checkbox on top of a recognition stack rather than as a separate engineering problem with its own data and evaluation protocol.

For the broader question of when to build custom versus use off-the-shelf CV models, see the custom vs off-the-shelf decision framework.

Why does anti-spoofing break the pretrained-model assumption?

Recognition and liveness are two different classification problems that happen to share an input image. A pretrained face recognition model — typically a ResNet, ArcFace, or transformer backbone trained on millions of identity-labelled face crops — learns a metric over identity. It is explicitly trained to be invariant to lighting, expression, and minor occlusion. That invariance is precisely what an attacker exploits: a high-quality print or a phone-screen replay produces a face crop that the model is designed to treat as the same identity as the genuine subject.
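To make the point concrete, here is a minimal sketch of the matching step in a typical recognition pipeline, assuming embeddings are compared by cosine similarity against a fixed threshold. The 4-dimensional vectors and the 0.6 threshold are invented for illustration (real embeddings usually have hundreds of dimensions, and thresholds are tuned per model). Note that nothing in the decision asks whether the probe came from a live face.

```python
import math

# Hypothetical decision threshold; real systems tune this per model and use case.
MATCH_THRESHOLD = 0.6


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def is_same_identity(probe, enrolled, threshold=MATCH_THRESHOLD):
    """Identity decision only: there is no liveness signal in this check."""
    return cosine_similarity(probe, enrolled) >= threshold


# Toy embeddings (illustrative values, not output from any real model).
enrolled = [0.90, 0.10, 0.30, 0.20]      # enrolled template
live_probe = [0.88, 0.12, 0.31, 0.18]    # genuine live capture
replay_probe = [0.87, 0.11, 0.33, 0.21]  # same face replayed on a phone screen
other_person = [0.10, 0.90, 0.20, 0.40]  # different identity

print(is_same_identity(live_probe, enrolled))
print(is_same_identity(replay_probe, enrolled))
print(is_same_identity(other_person, enrolled))
```

In this toy setup the replayed face lands as close to the enrolled template as the live capture does, so a separate liveness decision has to come from somewhere else in the pipeline.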

Across the deployments we have reviewed, one pattern recurs: anti-spoofing performance on a vendor's internal evaluation sets is essentially uncorrelated with performance against attacks the vendor never trained on. Treating the recognition model and the liveness model as a single integrated product hides this gap.

How AI-generated faces defeat detection

Generative models (GANs, diffusion models) produce face images that human observers often cannot distinguish from real photographs and that, in many cases, pretrained face recognition models cannot distinguish either. The failure mode is specific:

Recognition model behaviour: a high-quality AI-generated face image of a person will produce feature embeddings close to that person’s genuine photos. If an attacker synthesises a face image closely resembling a target, the recognition model may match it against the enrolled identity.

Liveness detector behaviour: classical liveness detectors are trained on attacks available at the time of training — typically printed photos and replay attacks from consumer-grade displays. Diffusion-model-generated faces printed at high resolution or displayed on high-resolution screens may evade detectors trained on earlier attack distributions.

The cat-and-mouse dynamic: every published liveness detector reveals the features its classifier relies on, which enables adaptive attacks that modify the attack medium to avoid those features. This is not theoretical: it has been demonstrated repeatedly in the literature and in adversarial challenges.

Liveness detection: what it actually involves

Liveness detection (anti-spoofing in the face recognition domain) adds a check that the face presented is from a live person in the current moment, not a static representation. The main approaches:

Method	How It Works	Limitations
Texture analysis	Detects print artifacts, moiré patterns, display refresh lines	Defeated by high-quality print/display; not robust to GAN- or diffusion-generated faces
Depth estimation (passive)	Infers 3D structure from single image	Poor on flat surfaces; computationally expensive
Structured light (active)	Projects IR pattern, verifies 3D face geometry	Requires specific hardware; works well against flat attacks
Time-of-flight (active)	Measures depth using IR pulse timing	Reliable depth; hardware requirement limits deployment contexts
Challenge-response	Asks user to blink, turn head, smile	Defeated by video replay; adds UX friction
Remote PPG	Detects blood flow variation from subtle colour changes	Defeated by video replay with accurate colour reproduction
Multi-spectral	Detects skin-specific spectral properties	Reliable; requires specialised camera hardware

In our experience, passive texture-based liveness detection (software-only approaches built on OpenCV preprocessing and a PyTorch or ONNX-deployed classifier) provides meaningful protection against low-effort attacks such as standard printed photos and basic video replay, but is not reliable against high-quality attacks. Active hardware approaches (structured light, time-of-flight) provide substantially stronger guarantees but require depth-sensing hardware that standard CCTV cameras and most smartphone front cameras lack.
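As an illustration of what a passive texture feature looks like, the sketch below computes a local binary pattern (LBP) histogram, a classical texture descriptor that has been widely used in anti-spoofing research. It is a stand-in chosen for brevity, not a description of any particular vendor's method: a production pipeline would more likely use OpenCV preprocessing and a learned classifier, as noted above. The function name and the nested-list image format are assumptions made to keep the example dependency-free.

```python
def lbp_histogram(gray):
    """Normalised 256-bin histogram of 8-neighbour LBP codes.

    gray: 2D list of pixel intensities (rows of ints), at least 3x3.
    Border pixels are skipped because they lack a full neighbourhood.
    """
    height, width = len(gray), len(gray[0])
    if height < 3 or width < 3:
        raise ValueError("image must be at least 3x3")

    # Neighbour offsets in clockwise order, starting at the top-left.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    hist = [0] * 256
    for r in range(1, height - 1):
        for c in range(1, width - 1):
            center = gray[r][c]
            code = 0
            for bit, (dr, dc) in enumerate(offsets):
                # Set the bit when the neighbour is at least as bright as the centre.
                if gray[r + dr][c + dc] >= center:
                    code |= 1 << bit
            hist[code] += 1
    total = sum(hist)
    return [count / total for count in hist]


# A flat patch versus a patch with a fine periodic pattern (toy data).
flat_patch = [[100] * 5 for _ in range(5)]
striped_patch = [[0 if c % 2 == 0 else 255 for c in range(5)] for _ in range(5)]

flat_hist = lbp_histogram(flat_patch)
striped_hist = lbp_histogram(striped_patch)
```

The histogram is only a feature vector. A classifier trained on live and attack samples still has to decide what counts as suspicious, and a high-quality print or display may simply not leave the periodic artifacts this kind of feature picks up.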

Practical comparison: fine-tune the vendor model, or build custom?

Fine-tuning a pretrained anti-spoofing model suffices when:

The attack distribution is known and stable (e.g., defending against specific document-photo attacks in a KYC context)
The deployment hardware is fixed and well-characterised
Training data for the specific attack type is available
The deployment environment (lighting, distance, camera type) matches the fine-tuning domain

Custom model development is necessary when:

The attack space includes high-quality synthetic faces from generative models
The deployment requires hardware-independent liveness detection across diverse camera types
The adversarial threat model includes adaptive attackers who can probe the system
The deployment is high-assurance (financial access, border control, secure facility entry)
The available pretrained models fail evaluation on held-out attack samples from the deployment environment

The key diagnostic: test the off-the-shelf or fine-tuned model against the actual attack types you need to defend against, not against the benchmark datasets the model was evaluated on. NUAA, CASIA-FASD, and Replay-Attack are standard public benchmarks, but performance on them does not predict performance against novel high-quality attacks. In the deployments we have observed, field results consistently fall short of benchmark numbers.
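One way to run this diagnostic is to report error rates separately for each attack type rather than as a single pooled number. The sketch below does that, using the APCER/BPCER terminology from ISO/IEC 30107-3 (attack presentations wrongly accepted, bona fide presentations wrongly rejected). The function name, the label strings, the scores, and the convention that a higher score means "more likely live" are all assumptions for illustration.

```python
from collections import defaultdict


def pad_error_rates(samples, threshold):
    """Per-attack-type APCER and overall BPCER at a fixed threshold.

    samples: iterable of (score, label) where label is "bona_fide" or an
             attack-type name; higher score means "more likely live".
    Returns (apcer_by_type, bpcer).
    """
    attack_total = defaultdict(int)
    attack_accepted = defaultdict(int)
    bona_total = 0
    bona_rejected = 0

    for score, label in samples:
        accepted = score >= threshold
        if label == "bona_fide":
            bona_total += 1
            if not accepted:
                bona_rejected += 1
        else:
            attack_total[label] += 1
            if accepted:
                attack_accepted[label] += 1

    apcer_by_type = {
        kind: attack_accepted[kind] / attack_total[kind] for kind in attack_total
    }
    bpcer = bona_rejected / bona_total if bona_total else float("nan")
    return apcer_by_type, bpcer


# Toy scores (invented): the model handles plain prints but not newer attacks.
samples = [
    (0.9, "bona_fide"), (0.8, "bona_fide"), (0.4, "bona_fide"), (0.7, "bona_fide"),
    (0.1, "print"), (0.2, "print"),
    (0.3, "replay"), (0.6, "replay"),
    (0.7, "diffusion_print"), (0.8, "diffusion_print"),
]
apcer_by_type, bpcer = pad_error_rates(samples, threshold=0.5)
worst_case_apcer = max(apcer_by_type.values())
```

Reporting the worst-case attack type rather than the average keeps a weak spot (here the hypothetical `diffusion_print` category) from being hidden by good results on easy attacks.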

Custom model development for liveness detection

Building a custom liveness detection model requires:

Attack data collection: you cannot train a liveness detector without negative examples (spoofing attacks). This means deliberately generating attack samples — printing photos, recording replay videos, creating 3D masks if relevant to your threat model, and generating synthetic face images if your threat model includes them.

Domain-specific real data: liveness detectors are sensitive to the specific camera, lighting, and distance of the deployment. A model trained on data from a different camera or lighting condition will not transfer reliably. Collect real-face data under your deployment conditions.

Evaluation protocol: standard split evaluation (train/test from the same dataset) overestimates real-world performance. Cross-dataset evaluation — training on one dataset and testing on another — is a better proxy for deployment robustness. We use this as a gating check before any custom anti-spoofing model is considered ready for staging.
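A minimal sketch of the cross-dataset check described above, assuming liveness scores where higher means "more likely live". The threshold is chosen on the source dataset at the point where false acceptance and false rejection are closest (an approximate equal-error-rate threshold), then frozen and applied to the target dataset. The half total error rate (HTER) is the mean of the two error rates. The helper names and all scores are invented for illustration.

```python
def far_frr(live_scores, attack_scores, threshold):
    """False acceptance rate (attacks accepted) and false rejection rate (live rejected)."""
    far = sum(s >= threshold for s in attack_scores) / len(attack_scores)
    frr = sum(s < threshold for s in live_scores) / len(live_scores)
    return far, frr


def eer_threshold(live_scores, attack_scores):
    """Pick the observed score at which FAR and FRR are closest."""
    candidates = sorted(set(live_scores) | set(attack_scores))

    def gap(threshold):
        far, frr = far_frr(live_scores, attack_scores, threshold)
        return abs(far - frr)

    return min(candidates, key=gap)


def hter(live_scores, attack_scores, threshold):
    """Half total error rate at a fixed threshold."""
    far, frr = far_frr(live_scores, attack_scores, threshold)
    return (far + frr) / 2


# Source dataset: the model separates live from attack cleanly (toy scores).
source_live = [0.6, 0.7, 0.8, 0.9]
source_attack = [0.1, 0.2, 0.3, 0.4]

# Target dataset: different camera and attack media (toy scores).
target_live = [0.55, 0.7, 0.5, 0.8]
target_attack = [0.65, 0.7, 0.2, 0.9]

threshold = eer_threshold(source_live, source_attack)        # tuned on source only
source_hter = hter(source_live, source_attack, threshold)
target_hter = hter(target_live, target_attack, threshold)    # threshold is NOT re-tuned
```

The important design choice is that the threshold is never re-tuned on the target data. Re-tuning would turn the test back into an intra-dataset evaluation and hide exactly the gap the protocol is meant to expose.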

Attack-surface checklist for liveness system design

Printed photo attacks (various paper quality and print resolution)
Display replay attacks (phone, tablet, monitor)
High-resolution display replay attacks (4K, HDR)
AI-generated face image attacks (GAN and diffusion model outputs)
3D mask attacks (silicone, resin) — if relevant to threat model
Occlusion attacks (glasses, hats, masks reducing face detection confidence)
Lighting manipulation attacks (obscuring liveness cues)
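The checklist above can also be enforced mechanically before an evaluation run. The sketch below flags attack categories that are missing or under-represented in an evaluation manifest. The category identifiers, the manifest format of `(sample_id, category)` pairs, and the function name are hypothetical; adapt them to however your attack samples are actually labelled.

```python
from collections import Counter

# Hypothetical identifiers mirroring the checklist items.
REQUIRED_ATTACK_CATEGORIES = {
    "print",
    "display_replay",
    "high_res_display_replay",
    "ai_generated",
    "mask_3d",               # drop if 3D masks are outside your threat model
    "occlusion",
    "lighting_manipulation",
}


def coverage_gaps(manifest, min_samples=1, required=REQUIRED_ATTACK_CATEGORIES):
    """Return the sorted list of required categories with fewer than min_samples entries.

    manifest: iterable of (sample_id, category) pairs.
    """
    counts = Counter(category for _, category in manifest)
    return sorted(cat for cat in required if counts[cat] < min_samples)


# Toy manifest covering only two categories.
manifest = [
    ("s001", "print"),
    ("s002", "print"),
    ("s003", "display_replay"),
]
gaps = coverage_gaps(manifest)
gaps_strict = coverage_gaps(manifest, min_samples=2)
```

An empty result only shows that each category is represented. It says nothing about whether the samples are high-quality enough to stress the detector.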

The deployment reality

In our experience, teams underestimate the difficulty of liveness detection and overestimate the protection provided by standard “add anti-spoofing” integrations. Commercial anti-spoofing APIs provide meaningful protection in consumer identity verification contexts, where the attack population is predominantly low-effort. In higher-assurance contexts (secure facility access, financial authentication, government use), the threat model requires hardware-enforced liveness or custom model development with ongoing adversarial evaluation.

The honest answer to “can we use off-the-shelf anti-spoofing for our access control system?” is: it depends on what you are protecting against and what the consequence of a successful attack is. For most office access control, commercial solutions are adequate. For secure facility access or high-value financial transactions, custom evaluation against your specific threat model is necessary before deployment. That evaluation should include a migration path from the initial integrated vendor solution to a custom model trained on your camera and lighting distribution.