Problem

Our client develops and deploys practical AI applications at a scale where the cost of inference had become a leading concern for their firm. To tackle the problem at a more strategic level, they asked us to model how different network topologies perform on GPUs with given hardware parameters.

Solution

We set out to recreate several operations popular in AI models, e.g., different kinds of convolutions, and provided the client with a step-by-step explanation of how they execute at a low level.
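
To give a flavor of the kind of demonstration involved, below is a minimal sketch of a naive direct 2D convolution written with pyopencl. It is an illustration under our own assumptions; the kernel, buffer, and variable names are not taken from the project's actual code.

```python
import numpy as np
import pyopencl as cl

KERNEL_SRC = """
__kernel void conv2d_naive(__global const float *img,
                           __global const float *flt,
                           __global float *out,
                           const int W, const int H, const int K)
{
    // One work-item per output pixel; "valid" convolution only.
    int x = get_global_id(0);
    int y = get_global_id(1);
    int outW = W - K + 1;
    int outH = H - K + 1;
    if (x >= outW || y >= outH) return;

    float acc = 0.0f;
    for (int ky = 0; ky < K; ++ky)
        for (int kx = 0; kx < K; ++kx)
            acc += img[(y + ky) * W + (x + kx)] * flt[ky * K + kx];
    out[y * outW + x] = acc;
}
"""

ctx = cl.create_some_context()
queue = cl.CommandQueue(ctx)
prog = cl.Program(ctx, KERNEL_SRC).build()

W = H = 256                      # input image size
K = 3                            # filter size
img = np.random.rand(H, W).astype(np.float32)
flt = np.random.rand(K, K).astype(np.float32)
out = np.empty((H - K + 1, W - K + 1), dtype=np.float32)

mf = cl.mem_flags
img_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=img)
flt_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=flt)
out_buf = cl.Buffer(ctx, mf.WRITE_ONLY, out.nbytes)

# Global work size is (width, height) of the output.
prog.conv2d_naive(queue, out.shape[::-1], None, img_buf, flt_buf, out_buf,
                  np.int32(W), np.int32(H), np.int32(K))
cl.enqueue_copy(queue, out, out_buf)
```

Deliberately naive kernels like this make the memory-access pattern visible line by line, which is exactly what a step-by-step, low-level walkthrough needs.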

The resulting demonstration software was written entirely in Python and OpenCL. Alongside it, the performance model shipped with two tools: one for predicting inference performance for a given topology, and one for measuring the characteristics of a given (OpenCL-capable) GPU.
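
As an illustration of how two such tools might fit together, here is a hedged sketch: pyopencl device queries on the measurement side, and a simple roofline-style lower bound on the prediction side. The function names, the peak-throughput figures, and the roofline model itself are assumptions made for this example, not the client's actual model.

```python
import pyopencl as cl

def gpu_characteristics():
    """Query a few basic device properties via pyopencl.
    A real measurement tool would also run microbenchmarks."""
    dev = cl.create_some_context().devices[0]
    return {
        "name": dev.name,
        "compute_units": dev.max_compute_units,
        "clock_mhz": dev.max_clock_frequency,
        "global_mem_bytes": dev.global_mem_size,
    }

def predict_layer_time(flops, bytes_moved, peak_flops, peak_bw):
    """Roofline lower bound: a layer is limited either by arithmetic
    throughput or by memory bandwidth, whichever takes longer."""
    return max(flops / peak_flops, bytes_moved / peak_bw)

# Example: a 3x3 valid convolution over a 256x256 single-channel image,
# on a hypothetical GPU with 5 TFLOP/s and 300 GB/s peaks.
flops = 2 * 254 * 254 * 9                       # one multiply-add per tap
bytes_moved = (256 * 256 + 9 + 254 * 254) * 4   # read image + filter, write output
print(gpu_characteristics())
print(predict_layer_time(flops, bytes_moved, peak_flops=5e12, peak_bw=300e9))
```

Summing such per-layer estimates over a topology gives a crude but instructive end-to-end prediction, and comparing it against measured runtimes shows where a kernel falls short of the hardware's limits.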

Results

Although the model was designed to carry a certain level of predictive power, its lasting value lay in the point of view it conveyed. Along with the reports and workshops we delivered, it ultimately served as internal educational material for the client's team.
