Case-Study: Performance Modelling of AI Inference on GPUs (Under NDA)

Learn how TechnoLynx helps reduce inference costs for trained neural networks and real-time applications including natural language processing, video games, and large language models.

15/05/2023

Problem

Our client was heavily involved in the development and use of AI applications in various sectors. As their AI models became more complex, the cost of inference (running models to generate results) became a critical issue for the company. The client, highly experienced with AI models, sought to reduce these costs by optimising their use of graphics processing units (GPUs). They wanted to better understand how different GPU topologies affect performance, including factors like clock speeds and ray tracing capabilities.

The client was particularly concerned about the efficiency of their machine learning models. They were running multiple models across a wide range of GPU architectures, mostly discrete GPUs (dedicated graphics cards). Each type of GPU had different strengths and weaknesses, and the client needed to allocate its resources strategically. They wanted a way to predict the inference performance of various models on different GPU topologies so that they could reduce running costs without sacrificing performance.

Solution

Our task was to model the performance of various AI operations on different GPU architectures and provide the client with clear insights into the performance implications of each. We needed to examine popular AI model operations, such as convolutions, which are central to tasks like image recognition and video analysis. Our approach involved recreating several of these operations and modelling them on a low-level GPU system.
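To make the idea of modelling an operation concrete, here is a minimal Python sketch that counts the arithmetic work and the minimum memory traffic of a single 2D convolution. It is an illustration only, not the model built for the client: the function name `conv2d_cost`, its parameters, and the assumption that every tensor is read or written exactly once are all hypothetical simplifications.

```python
def conv2d_cost(batch, in_ch, out_ch, in_h, in_w, k_h, k_w,
                stride=1, padding=0, bytes_per_elem=4):
    """Estimate FLOPs and minimum memory traffic for one 2D convolution.

    Assumption: each input, weight and output element crosses the memory
    bus exactly once (perfect reuse), so bytes_moved is a lower bound.
    """
    # Standard output-size formula for a strided, padded convolution.
    out_h = (in_h + 2 * padding - k_h) // stride + 1
    out_w = (in_w + 2 * padding - k_w) // stride + 1

    # One multiply-accumulate per (output element, input channel, kernel tap).
    macs = batch * out_ch * out_h * out_w * in_ch * k_h * k_w
    flops = 2 * macs  # a multiply-accumulate counts as two operations

    input_elems = batch * in_ch * in_h * in_w
    weight_elems = out_ch * in_ch * k_h * k_w
    output_elems = batch * out_ch * out_h * out_w
    bytes_moved = bytes_per_elem * (input_elems + weight_elems + output_elems)

    return {
        "out_shape": (batch, out_ch, out_h, out_w),
        "flops": flops,
        "bytes_moved": bytes_moved,
        # FLOPs per byte: higher values lean towards being compute-bound.
        "arithmetic_intensity": flops / bytes_moved,
    }


if __name__ == "__main__":
    # Example: a 7x7, stride-2 convolution on a 224x224 RGB image.
    print(conv2d_cost(1, 3, 64, 224, 224, 7, 7, stride=2, padding=3))
```

The ratio of operations to bytes is what links an operation to the hardware parameters discussed in this case study: a low ratio tends to stress memory bandwidth, a high ratio tends to stress compute throughput.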

We used Python and OpenCL for this task. Python provided flexibility in coding and testing, while OpenCL gave us the ability to work closely with the underlying GPU hardware. This allowed us to model the exact behaviours of the GPU as it executed complex machine learning tasks.

The core of our solution involved creating a performance model that could predict how well certain GPU topologies would perform with different types of AI workloads. This model took into account various GPU parameters such as:

Clock speeds: Higher clock speeds typically lead to faster processing, but they can also increase power consumption and heat generation.
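Memory bandwidth: This determines how quickly data can be moved between the GPU’s compute units and its own on-board memory. Transfers to and from the system’s main memory are limited separately by the host interconnect.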
Parallel processing: Many AI models, particularly deep learning models, require large amounts of data to be processed simultaneously. GPUs excel at this because they can handle multiple calculations in parallel.
Compute units: These are the individual processing units inside the GPU, which determine how many tasks it can handle at once.
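One simple way to combine the parameters above is a roofline-style estimate, in which an operation takes as long as the slower of its compute phase and its memory phase. The Python sketch below is an assumption-laden illustration and not the client's model: the `GpuSpec` class, the `flops_per_cu_per_cycle` field, and the example numbers are hypothetical, and real GPUs rarely reach their theoretical peaks.

```python
from dataclasses import dataclass


@dataclass
class GpuSpec:
    """Hypothetical description of a GPU for a roofline-style estimate."""
    name: str
    compute_units: int
    clock_mhz: float
    flops_per_cu_per_cycle: float  # assumption: depends on the architecture
    mem_bandwidth_gbs: float       # on-board memory bandwidth in GB/s

    @property
    def peak_flops(self) -> float:
        # Theoretical peak: units x cycles per second x operations per cycle.
        return self.compute_units * self.clock_mhz * 1e6 * self.flops_per_cu_per_cycle


def estimate(gpu: GpuSpec, flops: float, bytes_moved: float) -> dict:
    """Return a lower-bound run time and which resource limits it."""
    compute_s = flops / gpu.peak_flops
    memory_s = bytes_moved / (gpu.mem_bandwidth_gbs * 1e9)
    return {
        "time_s": max(compute_s, memory_s),
        "bound": "compute" if compute_s >= memory_s else "memory",
    }


if __name__ == "__main__":
    # Made-up device: 40 compute units at 1.5 GHz, 128 FLOPs per unit per cycle.
    gpu = GpuSpec("example-gpu", 40, 1500.0, 128.0, 448.0)
    print(estimate(gpu, flops=2.4e8, bytes_moved=3.8e6))
```

Taking the maximum of the two times assumes that computation and memory traffic overlap perfectly, which makes the result an optimistic bound rather than a prediction of measured run time.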

We also designed a tool to measure the characteristics of any OpenCL-capable GPU the client was using. This tool could analyse the GPU’s performance on specific tasks and provide detailed feedback on how it would handle different AI models.
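As a rough illustration of the first step such a tool might take, the Python sketch below reads the static properties that OpenCL reports for a device. It assumes the third-party `pyopencl` package and is not the tool delivered to the client; the helper names `describe_device` and `list_opencl_gpus` are hypothetical. Reported values such as the maximum clock frequency are nominal, so measuring achieved throughput or bandwidth would additionally require timed micro-benchmarks, which are not shown here.

```python
def describe_device(dev) -> dict:
    """Summarise the static properties of an OpenCL device object.

    `dev` is expected to expose the attributes of a pyopencl.Device.
    """
    return {
        "name": dev.name,
        "compute_units": dev.max_compute_units,
        "clock_mhz": dev.max_clock_frequency,         # nominal maximum, in MHz
        "global_mem_gib": dev.global_mem_size / 2**30,
        "local_mem_kib": dev.local_mem_size / 2**10,
    }


def list_opencl_gpus() -> list:
    """Describe every GPU visible through OpenCL (requires pyopencl)."""
    import pyopencl as cl  # third-party; imported lazily so the helper above stays usable

    summaries = []
    for platform in cl.get_platforms():
        for dev in platform.get_devices():
            # Device type is a bitfield, so test the GPU bit.
            if dev.type & cl.device_type.GPU:
                summaries.append(describe_device(dev))
    return summaries


if __name__ == "__main__":
    for summary in list_opencl_gpus():
        print(summary)
```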

Performance Modelling in GPUs

Performance modelling of GPUs is an important part of optimising AI systems. Modern GPUs are highly specialised hardware designed to handle tasks like 3D graphics, virtual reality, and machine learning. They are far more efficient at these tasks than central processing units (CPUs) because they have hundreds or even thousands of cores that can process data simultaneously.

In this case, we focused on discrete GPUs, which are separate from the system’s main CPU and memory. These dedicated graphics cards have their own memory and processing power, making them ideal for high-intensity tasks like AI inference. However, discrete GPUs vary in their ability to handle different AI models, and understanding which GPU was best suited for the client’s needs was critical to optimising their system.

For instance, the client had a variety of video cards at their disposal, including models that supported advanced features like ray tracing for 3D graphics. However, these features, while useful in areas like virtual reality, didn’t always provide a performance boost for their specific AI tasks. Our model allowed the client to identify which features were essential for their work and which were unnecessary, saving them valuable resources.

Predicting GPU Performance for AI Models

The predictive aspect of the performance model was key to helping the client reduce costs. By analysing the characteristics of a GPU—such as its clock speeds, memory bandwidth, and parallel processing capabilities—the client could predict how efficiently it would run their AI models.

For example, the client often used machine learning algorithms that involved multiple layers of convolution and matrix multiplication. These operations are highly parallelisable, meaning they run best on GPUs with a large number of cores and high memory bandwidth. On the other hand, certain workloads, such as running very large models or large batches, may require GPUs with high memory capacity rather than just raw processing power.

With the model we developed, the client was able to forecast how different AI models would perform on various GPU architectures. This allowed them to choose the most cost-effective GPU for each specific task, significantly reducing their inference costs. Additionally, by knowing which features were essential for their work, they could avoid purchasing more expensive GPUs with unnecessary capabilities.
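Once a predicted latency is available for each candidate GPU, choosing the most cost-effective one reduces to a small selection problem. The Python sketch below shows one possible formulation, assuming an hourly price per GPU and a latency limit per task. The function name `cheapest_gpu`, the dictionary keys, and all figures are hypothetical and do not come from the project.

```python
def cheapest_gpu(candidates, max_latency_s):
    """Pick the GPU with the lowest cost per inference that meets the latency limit.

    Each candidate is a dict with hypothetical keys:
      "name", "latency_s" (predicted time per inference), "price_per_hour".
    Returns the chosen candidate, or None if none is fast enough.
    """
    feasible = [c for c in candidates if c["latency_s"] <= max_latency_s]
    if not feasible:
        return None

    def cost_per_inference(c):
        # The hourly price is spread over the inferences that fit in one hour.
        return c["latency_s"] * c["price_per_hour"] / 3600.0

    return min(feasible, key=cost_per_inference)


if __name__ == "__main__":
    # Made-up predictions and prices, for illustration only.
    gpus = [
        {"name": "fast-gpu", "latency_s": 0.010, "price_per_hour": 2.0},
        {"name": "budget-gpu", "latency_s": 0.020, "price_per_hour": 0.8},
    ]
    choice = cheapest_gpu(gpus, max_latency_s=0.050)
    print(choice["name"] if choice else "no GPU meets the latency limit")
```

With a relaxed latency limit the slower but cheaper card wins; tightening the limit below its predicted latency forces the choice back to the faster card, which is the kind of trade-off the predictive model is meant to expose.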

Results

The final result of our work was a detailed performance model that not only helped the client predict how well their AI models would perform on different GPU architectures, but also provided them with valuable insights into how their graphics cards worked on a low level. This knowledge was crucial for their development team, enabling them to optimise their use of GPUs in the long term.

The model we provided was sophisticated enough to predict performance across a wide range of GPU architectures. The client was now able to test various AI models on GPUs with different configurations, identifying the best possible setup for their needs.

The tools we developed also helped the client measure the performance of their discrete GPUs. By analysing the clock speeds, memory usage, and other parameters, the client was able to make informed decisions about which GPU to use for different types of tasks.

The most significant benefit, however, was the cost savings. By optimising their use of GPU resources, the client reduced the amount of time and money they spent on AI inference. This not only improved the performance of their models but also allowed them to reallocate resources to other areas of their business.

Educational Value

While the performance model was primarily designed to optimise the client’s AI systems, it also offered invaluable insights into how GPUs function at a fundamental level.

Through our reports and workshops, the client’s development team gained a deeper understanding of how their GPUs worked, enabling them to make better use of the hardware in future projects. The client valued this educational benefit, which helped them strengthen their AI capabilities over time.

Conclusion

Our performance modelling project helped the client tackle the growing costs associated with AI inference by optimising their use of GPUs. By building a model that could predict the performance of various AI models on different GPU architectures, we enabled the client to make better-informed decisions and save on GPU resources.

As artificial intelligence (AI) continues to grow in use, demands on computational power rise sharply. This applies across many sectors, from natural language processing to video games. In real-time applications including financial forecasting and user behaviour tracking, delays can cause serious issues.

By combining the performance model with data from trained neural networks, the client can now adjust GPU usage on the fly. This real-time adaptability ensures faster output, lower energy use, and better overall reliability. It also helps when working with large language models, which require steady, efficient processing. The flexibility gained made future scaling far easier.

In the long run, the performance model proved to be not just a tool for improving efficiency, but also a valuable educational resource for the client’s team. This project highlighted the importance of understanding the intricate relationship between AI workloads and GPU performance, enabling the client to build more cost-effective, high-performance systems for the future.

At TechnoLynx, we specialise in helping businesses optimise their AI workflows. Whether you’re looking to improve your GPU performance, reduce costs, or develop new AI solutions, our team can provide the tools and expertise you need to succeed.
