AI vs Real Face: Anti-Spoofing, Liveness Detection, and When Custom CV Models Are Necessary

When synthetic faces defeat pretrained detectors: anti-spoofing challenges, liveness detection requirements, and when custom CV models are unavoidable.

Written by TechnoLynx Published on 07 May 2026

Pretrained face detection and recognition models are trained to answer the question: is this a human face, and whose face is it? They are not designed to answer a different question: is this a real, live face, or a presentation attack? The distinction matters enormously in deployment contexts where access control or identity verification is at stake.

A presentation attack is any attempt to defeat a face recognition system using a representation of a face rather than the live subject: a printed photograph, a digital display showing a face image or video, a 3D mask, or an AI-generated synthetic face image. Standard pretrained face recognition pipelines — including high-quality commercial APIs — fail against many of these attacks because they were never optimised to distinguish them. We see this gap repeatedly when teams treat anti-spoofing as a checkbox on top of a recognition stack rather than as a separate engineering problem with its own data and evaluation protocol.

For the broader question of when to build custom versus use off-the-shelf CV models, see the custom vs off-the-shelf decision framework.

Why does anti-spoofing break the pretrained-model assumption?

Recognition and liveness are two different classification problems that happen to share an input image. A pretrained face recognition model (typically a ResNet or transformer backbone trained with a margin-based loss such as ArcFace on millions of identity-labelled face crops) learns a metric over identity. It is explicitly trained to be invariant to lighting, expression, and minor occlusion. That invariance is precisely what an attacker exploits: a high-quality print or a phone-screen replay produces a face crop that the model is designed to treat as the same identity as the genuine subject.
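
A minimal sketch of why an identity metric cannot act as a liveness check, assuming cosine similarity over embeddings and a fixed match threshold. The 4-dimensional vectors and the 0.6 threshold are invented for illustration; real embeddings come from the recognition backbone and are much larger.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def same_identity(enrolled, probe, threshold=0.6):
    """Identity decision only: says nothing about whether the probe is live.

    The threshold value is a hypothetical choice for this sketch.
    """
    return cosine_similarity(enrolled, probe) >= threshold

# Toy embeddings (assumed values, not model outputs).
enrolled = [0.90, 0.10, 0.30, 0.20]
live_probe = [0.88, 0.12, 0.31, 0.18]     # genuine subject in front of the camera
replay_probe = [0.86, 0.14, 0.28, 0.22]   # same face replayed on a phone screen
other_person = [0.10, 0.90, 0.20, 0.40]

live_match = same_identity(enrolled, live_probe)
replay_match = same_identity(enrolled, replay_probe)
other_match = same_identity(enrolled, other_person)
```

In this toy setup both the live probe and the replay probe match the enrolled identity, while the other person does not. The recognition score alone gives no signal that separates the first two cases.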

Across the deployments we have reviewed, we see a consistent pattern: anti-spoofing performance on a vendor's internal evaluation sets is essentially uncorrelated with anti-spoofing performance against attacks the vendor never trained on. Treating the recognition model and the liveness model as a single integrated product hides this gap.

How AI-generated faces defeat detection

Generative models (GANs, diffusion models) produce face images that are often indistinguishable from real photographs to the human eye and, in many cases, to pretrained face recognition models as well. The failure mode is specific:

Recognition model behaviour: a high-quality AI-generated face image of a person can produce feature embeddings close to those of that person’s genuine photos. If an attacker synthesises a face image closely resembling a target, the recognition model may match it against the enrolled identity.

Liveness detector behaviour: classical liveness detectors are trained on attacks available at the time of training — typically printed photos and replay attacks from consumer-grade displays. Diffusion-model-generated faces printed at high resolution or displayed on high-resolution screens may evade detectors trained on earlier attack distributions.

The cat-and-mouse dynamic: every published liveness detector reveals the feature space that the discriminator relies on, which enables adaptive attacks that modify the attack medium to avoid those features. This is not theoretical — it has been demonstrated repeatedly in the literature and in adversarial challenges.

Liveness detection: what it actually involves

Liveness detection (anti-spoofing in the face recognition domain) adds a check that the face presented is from a live person in the current moment, not a static representation. The main approaches:

| Method | How it works | Limitations |
|---|---|---|
| Texture analysis | Detects print artifacts, moiré patterns, display refresh lines | Defeated by high-quality print/display; not robust to GANs |
| Depth estimation (passive) | Infers 3D structure from single image | Poor on flat surfaces; computationally expensive |
| Structured light (active) | Projects IR pattern, verifies 3D face geometry | Requires specific hardware; works well against flat attacks |
| Time-of-flight (active) | Measures depth using IR pulse timing | Reliable depth; hardware requirement limits deployment contexts |
| Challenge-response | Asks user to blink, turn head, smile | Defeated by video replay; adds UX friction |
| Remote PPG | Detects blood flow variation from subtle colour changes | Defeated by video replay with accurate colour reproduction |
| Multi-spectral | Detects skin-specific spectral properties | Reliable; requires specialised camera hardware |
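
To make the texture-analysis approach concrete, here is a minimal sketch of one classic texture descriptor, the 8-neighbour local binary pattern (LBP) histogram. It is written in pure Python on a tiny grayscale grid so it runs anywhere. It is an illustrative stand-in, not a production feature extractor, and the toy images are assumptions. In practice the histogram would feed a trained classifier, which is not shown.

```python
def lbp_code(img, r, c):
    """8-bit local binary pattern for the pixel at (r, c).

    Each neighbour contributes a 1 bit when it is >= the centre pixel.
    """
    center = img[r][c]
    neighbours = [
        img[r - 1][c - 1], img[r - 1][c], img[r - 1][c + 1], img[r][c + 1],
        img[r + 1][c + 1], img[r + 1][c], img[r + 1][c - 1], img[r][c - 1],
    ]
    code = 0
    for bit, value in enumerate(neighbours):
        if value >= center:
            code |= 1 << bit
    return code

def lbp_histogram(img):
    """Normalised 256-bin histogram of LBP codes over interior pixels."""
    hist = [0] * 256
    for r in range(1, len(img) - 1):
        for c in range(1, len(img[0]) - 1):
            hist[lbp_code(img, r, c)] += 1
    total = sum(hist)
    return [h / total for h in hist]

# Toy 6x6 grayscale patches (assumed values).
flat_patch = [[128] * 6 for _ in range(6)]                                # uniform
striped_patch = [[200 if c % 2 == 0 else 50 for c in range(6)] for _ in range(6)]  # stripe pattern

flat_hist = lbp_histogram(flat_patch)
striped_hist = lbp_histogram(striped_patch)
```

The uniform patch puts all of its mass in a single bin, while the striped patch splits across two bins. A classifier trained on such histograms learns which distributions are typical of printed or displayed faces, which is also why it stops helping once the attack medium no longer produces those patterns.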

In our experience, passive texture-based liveness detection (software-only approaches built on OpenCV preprocessing and a PyTorch or ONNX-deployed classifier) provides meaningful protection against low-effort attacks such as standard printed photos and basic video replay, but is not reliable against high-quality attacks. Active hardware approaches (structured light, time-of-flight) provide substantially stronger guarantees but require specific camera hardware that is absent from standard CCTV cameras and from most smartphone front cameras.
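
One reason depth hardware helps against flat attacks can be sketched as a planarity check: fit a plane to depth samples over the face region and look at the residual. This is a simplified illustration, assuming a depth sensor that returns (x, y, z) samples. The synthetic depth values and the 3.0 threshold are hypothetical, and a check like this would not stop a 3D mask.

```python
import math

def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def plane_fit_rms(points):
    """Least-squares fit of z = a*x + b*y + c; returns the RMS residual in z units."""
    n = len(points)
    sx = sum(x for x, _, _ in points)
    sy = sum(y for _, y, _ in points)
    sz = sum(z for _, _, z in points)
    sxx = sum(x * x for x, _, _ in points)
    syy = sum(y * y for _, y, _ in points)
    sxy = sum(x * y for x, y, _ in points)
    sxz = sum(x * z for x, _, z in points)
    syz = sum(y * z for _, y, z in points)
    normal = [[sxx, sxy, sx], [sxy, syy, sy], [sx, sy, n]]
    rhs = [sxz, syz, sz]
    d = det3(normal)
    if abs(d) < 1e-12:
        raise ValueError("points are collinear or too few to fit a plane")
    coeffs = []
    for col in range(3):  # Cramer's rule: replace one column with rhs at a time
        m = [row[:] for row in normal]
        for r in range(3):
            m[r][col] = rhs[r]
        coeffs.append(det3(m) / d)
    a, b, c = coeffs
    squared = sum((z - (a * x + b * y + c)) ** 2 for x, y, z in points)
    return math.sqrt(squared / n)

# Synthetic depth samples on a 5x5 grid (assumed values, depth in mm).
grid = [(x, y) for x in range(5) for y in range(5)]
flat_screen = [(x, y, 400 + 2.0 * x + 0.5 * y) for x, y in grid]  # tilted flat surface
face_like = [(x, y, 400 - 30 * math.exp(-((x - 2) ** 2 + (y - 2) ** 2) / 2))
             for x, y in grid]                                    # central protrusion

FLATNESS_THRESHOLD = 3.0  # hypothetical; would be calibrated per sensor and distance
flat_rms = plane_fit_rms(flat_screen)
face_rms = plane_fit_rms(face_like)
```

The tilted flat surface fits a plane almost exactly, while the surface with a central protrusion leaves a residual well above the threshold. Fitting a plane rather than measuring raw depth spread matters because a print held at an angle still has a wide depth range but remains planar.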

Practical comparison: fine-tune the vendor model, or build custom?

Fine-tuning a pretrained anti-spoofing model suffices when:

  • The attack distribution is known and stable (e.g., defending against specific document-photo attacks in a KYC context)
  • The deployment hardware is fixed and well-characterised
  • Training data for the specific attack type is available
  • The deployment environment (lighting, distance, camera type) matches the fine-tuning domain

Custom model development is necessary when:

  • The attack space includes high-quality synthetic faces from generative models
  • The deployment requires hardware-independent liveness detection across diverse camera types
  • The adversarial threat model includes adaptive attackers who can probe the system
  • The deployment is high-assurance (financial access, border control, secure facility entry)
  • The available pretrained models fail evaluation on held-out attack samples from the deployment environment

The key diagnostic: test the off-the-shelf or fine-tuned model against the actual attack types you need to defend against, not against the benchmark datasets the model was originally evaluated on. NUAA, CASIA-FASD, and Replay-Attack are standard public benchmarks, but performance on them does not predict performance against novel high-quality attacks. In the deployments we have observed, field results consistently fall short of benchmark numbers.
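
A sketch of that diagnostic as a per-attack-type error report. It uses the error rates from presentation attack detection testing: APCER (attacks wrongly accepted as live, reported per attack type) and BPCER (genuine users wrongly rejected). The score convention (higher means more likely live), the label names, and the sample scores are assumptions for illustration.

```python
from collections import defaultdict

def pad_error_rates(samples, threshold):
    """Compute per-attack-type APCER and overall BPCER.

    samples: iterable of (liveness_score, label), where label is "bona_fide"
    or the name of an attack type. A sample is accepted as live when its
    score is >= threshold.
    """
    attack_total = defaultdict(int)
    attack_accepted = defaultdict(int)
    bona_total = 0
    bona_rejected = 0
    for score, label in samples:
        accepted = score >= threshold
        if label == "bona_fide":
            bona_total += 1
            bona_rejected += 0 if accepted else 1
        else:
            attack_total[label] += 1
            attack_accepted[label] += 1 if accepted else 0
    apcer = {name: attack_accepted[name] / attack_total[name] for name in attack_total}
    bpcer = bona_rejected / bona_total if bona_total else float("nan")
    return apcer, bpcer

# Toy scores (assumed): the detector handles prints well but not the newer attack.
samples = [
    (0.90, "bona_fide"), (0.80, "bona_fide"), (0.40, "bona_fide"), (0.95, "bona_fide"),
    (0.10, "print"), (0.20, "print"), (0.30, "print"), (0.05, "print"),
    (0.70, "diffusion_4k_display"), (0.60, "diffusion_4k_display"),
    (0.30, "diffusion_4k_display"), (0.80, "diffusion_4k_display"),
]
apcer, bpcer = pad_error_rates(samples, threshold=0.5)
worst_case_apcer = max(apcer.values())
```

Reporting the worst attack type rather than a pooled average is the point of the exercise: a pooled number across these toy samples would hide the fact that three out of four of the high-quality attacks are accepted.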

Custom model development for liveness detection

Building a custom liveness detection model requires:

Attack data collection: you cannot train a liveness detector without negative examples (spoofing attacks). This means deliberately generating attack samples — printing photos, recording replay videos, creating 3D masks if relevant to your threat model, and generating synthetic face images if your threat model includes them.

Domain-specific real data: liveness detectors are sensitive to the specific camera, lighting, and distance of the deployment. A model trained on data from a different camera or lighting condition will not transfer reliably. Collect real-face data under your deployment conditions.

Evaluation protocol: standard split evaluation (train/test from the same dataset) overestimates real-world performance. Cross-dataset evaluation — training on one dataset and testing on another — is a better proxy for deployment robustness. We use this as a gating check before any custom anti-spoofing model is considered ready for staging.
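
A minimal sketch of a cross-dataset check, assuming liveness scores where higher means more likely live. The threshold is fixed on the source dataset at the point where false acceptance and false rejection are closest, and is then applied unchanged to a target dataset to compute the half total error rate (HTER). The score lists are invented for illustration.

```python
def far_frr(scores_live, scores_attack, threshold):
    """False acceptance rate (attacks accepted) and false rejection rate (live rejected)."""
    far = sum(s >= threshold for s in scores_attack) / len(scores_attack)
    frr = sum(s < threshold for s in scores_live) / len(scores_live)
    return far, frr

def eer_threshold(scores_live, scores_attack):
    """Pick the observed score that minimises |FAR - FRR| on the given data."""
    def gap(t):
        far, frr = far_frr(scores_live, scores_attack, t)
        return abs(far - frr)
    candidates = sorted(set(scores_live) | set(scores_attack))
    return min(candidates, key=gap)

def hter(scores_live, scores_attack, threshold):
    """Half total error rate: mean of FAR and FRR at a fixed threshold."""
    far, frr = far_frr(scores_live, scores_attack, threshold)
    return (far + frr) / 2

# Source dataset (used for training and threshold selection), toy values.
source_live = [0.90, 0.85, 0.80, 0.75]
source_attack = [0.10, 0.20, 0.30, 0.40]

# Target dataset (different camera and attack media), toy values.
target_live = [0.70, 0.80, 0.60, 0.90]
target_attack = [0.80, 0.50, 0.85, 0.30]

threshold = eer_threshold(source_live, source_attack)
source_hter = hter(source_live, source_attack, threshold)
target_hter = hter(target_live, target_attack, threshold)
```

In this toy case the model separates the source data perfectly and is at chance on the target data. The important design choice is that the threshold is never re-tuned on the target set, since re-tuning would reintroduce the optimism that cross-dataset evaluation is meant to remove.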

Attack-surface checklist for liveness system design

  • Printed photo attacks (various paper quality and print resolution)
  • Display replay attacks (phone, tablet, monitor)
  • High-resolution display replay attacks (4K, HDR)
  • AI-generated face image attacks (GAN and diffusion model outputs)
  • 3D mask attacks (silicone, resin) — if relevant to threat model
  • Occlusion attacks (glasses, hats, masks reducing face detection confidence)
  • Lighting manipulation attacks (obscuring liveness cues)
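
The checklist can be turned into a simple coverage check over the evaluation set, sketched below. The category names, the sample counts, and the minimum of 100 samples per category are hypothetical. Categories that are out of scope for a given threat model (for example 3D masks) would be removed from the required list rather than left empty.

```python
# Attack categories taken from the checklist above; the snake_case names are
# labels chosen for this sketch.
REQUIRED_ATTACKS = [
    "printed_photo",
    "display_replay",
    "high_res_display_replay",
    "ai_generated_face",
    "mask_3d",
    "occlusion",
    "lighting_manipulation",
]

def coverage_gaps(sample_counts, required=REQUIRED_ATTACKS, min_samples=100):
    """Return required attack categories with fewer than min_samples examples."""
    return [name for name in required if sample_counts.get(name, 0) < min_samples]

# Hypothetical counts of collected attack samples per category.
collected = {
    "printed_photo": 450,
    "display_replay": 300,
    "high_res_display_replay": 40,
    "occlusion": 120,
}
gaps = coverage_gaps(collected)
```

Here the check reports the under-sampled high-resolution replay category along with the three categories that have no samples at all.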

The deployment reality

In our experience, teams underestimate the difficulty of liveness detection and overestimate the protection provided by standard “add anti-spoofing” integrations. Commercial anti-spoofing APIs provide meaningful protection in consumer identity verification contexts, where the attack population is predominantly low-effort. In higher-assurance contexts — access control, financial authentication, government use — the threat model requires hardware-enforced liveness or custom model development with ongoing adversarial evaluation.

The honest answer to “can we use off-the-shelf anti-spoofing for our access control system?” is: it depends on what you are protecting against and what the consequence of a successful attack is. For most office access control, commercial solutions are adequate. For secure facility access or high-value financial transactions, custom evaluation against your specific threat model is necessary before deployment. That evaluation should include planning a migration path from the initial integrated vendor solution to a custom model trained on your camera and lighting distribution.
