
Embedded Video Coding on GPU

Problem

Solution

Result

Our client was a post-funding startup with a novel idea for both the application and the implementation of embedded video coding. They had found, however, that existing encoders could not be customized to the extent needed to interact with the rest of their pipeline. At the same time, their proposed solution gave them no control over the decoder side, so the output still had to adhere to well-established coding standards. In addition, the embedded system they were targeting had a competent GPU and a CPU that was already at risk of being overloaded, so running the encoder on the GPU became an important constraint. As a result, our client was looking for a niche mix of GPU and video compression expertise.

Our client already had other contractors and a competent in-house team, mostly focusing on the video coding side, so unlike our typical projects, this was a closer day-to-day collaboration. Even so, relatively distinct modules and appropriate APIs between them could be defined, so we retained the level of autonomy needed to do an expert's work.

TechnoLynx focused on implementing transform- and prediction-level functionality, which ultimately affects the compression efficiency an encoder delivers. Starting from a state-of-the-art baseline agreed with our client, we iteratively improved the proposed solution and its implementation.

For this project we used cross-platform C++, CMake and, for the most part, CUDA. We addressed issues of inconsistent behaviour across operating systems and GPU environments as they came up, until the code was sufficiently stable.

Ultimately, the team delivered an encoding solution that met the client's requirements for customization and pipeline integration, and at the same time beat the set benchmark in compression efficiency. The runtime performance of the solution was also satisfactory.
