Case-Study: Text-to-Speech

Problem

Our client was a small, government-backed startup looking to expand multi-platform screen reading capabilities to the Kazakh language on consumer devices.

Solution

The main difficulty of the project was the tight deadline, which this time ruled out training custom models or even re-training existing ones. The available pre-trained Kazakh models were mostly distributed as PyTorch checkpoints, so converting them to more efficient runtime formats, e.g. ONNX and CoreML, proved to be a challenge.

A further problem was that, although open-source application-layer solutions were available as a starting point, many of them relied on outdated build systems and incomplete dependencies, so bringing them back to life also turned out to be difficult.

Results

In the end, our team delivered solutions of good enough quality for Android and Windows, using efficient real-time models. On iOS devices, however, the same model turned out to be too memory-intensive to run as part of the screen reader framework, so we had to fall back to a standalone application.

This was due to an artificial memory limit on AVSpeechSynthesisProviderAudioUnit, which prevents it from using memory that is otherwise available on the device once its usage exceeds around 80MB. Our team proposed future steps to overcome this in a possible continuation project.
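To get a feel for what a limit of around 80MB means in practice, the rough sketch below estimates how much memory a model's weights alone would need at different numeric precisions. The parameter count and the runtime overhead used here are made-up illustrative values, and the helper names are hypothetical; the case study does not state the size of the model or which steps were proposed. Real usage also includes activations and runtime buffers, so this is a lower bound rather than a measurement.

```python
# Rough, illustrative estimate of weight memory versus a fixed budget.
# All model figures below are assumptions, not numbers from the project.

BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int8": 1}
MEMORY_LIMIT_MB = 80  # approximate limit mentioned in the case study


def weights_size_mb(num_params: int, dtype: str) -> float:
    """Size of the weights alone, in MiB, for the given precision."""
    return num_params * BYTES_PER_PARAM[dtype] / (1024 * 1024)


def fits_budget(num_params: int, dtype: str,
                overhead_mb: float = 0.0,
                limit_mb: float = MEMORY_LIMIT_MB) -> bool:
    """True if weights plus an assumed fixed overhead stay within the limit."""
    return weights_size_mb(num_params, dtype) + overhead_mb <= limit_mb


if __name__ == "__main__":
    num_params = 30_000_000  # hypothetical parameter count
    overhead_mb = 30.0       # hypothetical runtime overhead
    for dtype in BYTES_PER_PARAM:
        size = weights_size_mb(num_params, dtype)
        ok = fits_budget(num_params, dtype, overhead_mb)
        print(f"{dtype}: weights ~{size:.1f} MB, fits with overhead: {ok}")
```

Under these assumed numbers, the same model can sit on either side of the limit depending on precision and overhead, which is why a model that runs comfortably in a standalone application may still be rejected inside the extension.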

