TensorFlow failing to find a GPU is one of the most common setup problems we see in ML engineering. The symptom is consistent: training runs, but only on CPU, and nothing in the training output flags it. The causes are specific and diagnosable. This article covers the verification commands, the common failure modes, and a systematic diagnostic approach that gets you from "TensorFlow can't find my GPU" to a working configuration without reinstalling the OS.

The pattern matters because silent CPU fallback is wasteful in a way that hides itself. A model still trains. Loss still decreases. The only signal that something is wrong is that an epoch takes hours instead of minutes, and by then you've already burned compute time on the wrong device.

Verifying GPU Detection

The definitive check is tf.config.list_physical_devices:

    import tensorflow as tf

    gpus = tf.config.list_physical_devices('GPU')
    print(f"GPUs available: {len(gpus)}")
    for gpu in gpus:
        print(f"  {gpu}")

If this returns an empty list, TensorFlow cannot see any GPU. If it returns GPU devices, TensorFlow has successfully initialised the CUDA runtime and detected hardware.

A secondary check that also shows TensorFlow's CUDA build configuration:

    print(tf.test.is_built_with_cuda())   # True if TF was built with CUDA support
    print(tf.test.is_gpu_available())     # Deprecated but still informative
    print(tf.sysconfig.get_build_info())  # CUDA and cuDNN versions TF was built against

For confirming actual GPU execution (not just detection):

    with tf.device('/GPU:0'):
        a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
        b = tf.constant([[5.0, 6.0], [7.0, 8.0]])
        c = tf.matmul(a, b)
    print(c)
    print(c.device)  # Device that holds the result; should end in GPU:0

If this executes without error and the reported device is the GPU rather than the CPU, GPU execution is confirmed.

The five common failure modes

Across the engagements where we've helped teams unblock TensorFlow GPU setup, the same five causes account for nearly every case. The order below is rough frequency, highest first.

1. CUDA Version Mismatch

This is the most frequent cause.
TensorFlow has strict CUDA version requirements: a TF binary built against CUDA 11.8 will not work with CUDA 12.x installed, and vice versa. The runtime loader fails to bind the GPU kernels, usually with nothing more than a warning in the import log, and TensorFlow falls back to CPU.

Check what TensorFlow expects:

    import tensorflow as tf

    build_info = tf.sysconfig.get_build_info()
    print(f"CUDA version TF built with: {build_info['cuda_version']}")
    print(f"cuDNN version TF built with: {build_info['cudnn_version']}")

Check what's installed:

    nvcc --version                    # CUDA toolkit version
    nvidia-smi                        # Driver version and supported CUDA
    cat /usr/local/cuda/version.txt   # Installed CUDA version (version.json on newer toolkits)
    ls /usr/local/cuda-*/             # All installed CUDA versions

TensorFlow CUDA compatibility matrix (key recent versions):

    TensorFlow   Python      CUDA   cuDNN
    2.13         3.8–3.11    11.8   8.6
    2.14         3.9–3.11    11.8   8.7
    2.15         3.9–3.11    12.2   8.9
    2.16         3.9–3.12    12.3   8.9
    2.17         3.9–3.12    12.3   8.9

If there's a mismatch, either install the TF version matching your CUDA, or install the CUDA version matching your TF. The simplest resolution on Linux is tensorflow[and-cuda], which pulls in the correct CUDA libraries automatically for TF 2.14+:

    pip install "tensorflow[and-cuda]"

2. Driver Version Too Old

The NVIDIA driver must be new enough to support the installed CUDA toolkit. CUDA 12.0, for example, requires driver ≥525.85 on Linux. Installing a new CUDA toolkit without updating the driver is a common mistake: the toolkit installs cleanly, but the runtime cannot initialise.

Check driver version:

    nvidia-smi --query-gpu=driver_version --format=csv,noheader

Minimum driver versions:

    CUDA Version   Min Linux Driver   Min Windows Driver
    11.8           520.61             522.06
    12.0           525.85             527.41
    12.2           535.54             536.25
    12.3           545.23             545.84
    12.4           550.54             551.61

3. TensorFlow Not Built with GPU Support

The tensorflow package on PyPI for some platforms or CPU architectures is the CPU-only build.
Verify:

    pip show tensorflow | grep -i "version\|location"
    python -c "import tensorflow as tf; print(tf.test.is_built_with_cuda())"

If is_built_with_cuda() returns False, you have the CPU-only package. On Linux x86_64, install tensorflow or tensorflow[and-cuda] from PyPI (GPU builds are default). On Apple Silicon Macs, install the tensorflow-metal plugin alongside tensorflow instead of looking for a CUDA build.

4. CUDA Libraries Not on the Library Path

The CUDA runtime libraries (libcuda.so, libcudnn.so, libcublas.so) must be findable by the dynamic linker. If they are installed in non-standard locations, check where the linker is looking:

    ldconfig -p | grep libcuda     # Check if libcuda is in linker cache
    ldconfig -p | grep libcudnn    # Check cuDNN
    ls /usr/local/cuda/lib64/      # Check default CUDA lib location

Fix if libraries exist but aren't found:

    export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH

For a permanent fix, add an entry to /etc/ld.so.conf.d/ and run ldconfig.

5. GPU Not Visible in Container

In Docker or Kubernetes environments, GPU visibility requires two things working together:

- NVIDIA Container Toolkit installed on the host
- A GPU request when the container starts: the --gpus all or --gpus "device=0" flag in the docker run command, or deploy.resources.reservations.devices in docker-compose.yml

To test that layer on its own (substitute an nvidia/cuda image tag that matches your CUDA version):

    docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi

If that command does not list your GPU, the container runtime cannot pass the device through and TensorFlow inside the container will never see it. No library-path or driver fix will help until this layer works.

What does it mean when nvidia-smi works but TensorFlow still doesn't see the GPU?

This is the most diagnostic pattern in the whole space. nvidia-smi succeeding tells you the driver is loaded and the device is visible to the kernel. TensorFlow still failing to detect the GPU after that means the failure is above the driver, almost always one of: a CUDA toolkit mismatch (failure mode 1), a CPU-only TensorFlow build (failure mode 3), or a missing library on the linker path (failure mode 4). It is rarely a driver problem at that point.
The corollary: if nvidia-smi itself fails, stop debugging TensorFlow until you have fixed the driver. Nothing higher in the stack will work.

Systematic Diagnostic Checklist

Work this list in order. Each step rules out a class of failure before the next.

1. nvidia-smi runs and shows the GPU? (If not: driver issue, stop here)
2. nvcc --version shows expected CUDA version? (If not: toolkit not installed or wrong PATH)
3. tf.test.is_built_with_cuda() returns True? (If not: wrong TF package)
4. tf.sysconfig.get_build_info()['cuda_version'] matches installed CUDA? (If not: version mismatch)
5. Driver version meets the minimum for that CUDA version?
6. ldconfig -p | grep libcuda finds the library? (If not: library path issue)
7. In a container: --gpus all flag present? NVIDIA Container Toolkit installed on host?
8. tf.config.list_physical_devices('GPU') returns devices? (Final confirmation)

Enabling GPU Memory Growth

After confirming detection, enabling memory growth prevents TensorFlow from pre-allocating all available VRAM at startup, which would block other processes from using the same GPU:

    import tensorflow as tf

    # Must run before any operation initialises the GPUs.
    gpus = tf.config.list_physical_devices('GPU')
    for gpu in gpus:
        tf.config.experimental.set_memory_growth(gpu, True)

This is the right default for multi-process or multi-model environments: shared inference servers, notebooks running alongside training jobs, or any setup where more than one Python process touches the same device.

Where this sits in the broader profiling workflow

Detection is the precondition, not the goal. Once TensorFlow sees the GPU, the next question is whether it is actually being used efficiently, and that requires profiling, not just configuration. A model can be running on the GPU at 100% utilisation according to nvidia-smi and still be memory-bound, host-bound, or starved by an I/O pipeline that can't keep the kernels fed. We cover the structural approach to that in how to profile GPU kernels to find the real bottleneck.

FAQ