Problem
Our client was a small, government-backed startup looking to extend multi-platform screen reading capabilities to the Kazakh language on consumer devices.
Solution
The main difficulty of the project was the tight deadline, which this time ruled out building custom models or even retraining existing ones. The available pre-trained Kazakh models were mostly distributed as PyTorch checkpoints, so converting them to more efficient runtime formats, such as ONNX and Core ML, proved to be a challenge.
In addition, although open-source application-layer solutions were available as a starting point, many of them relied on outdated build systems and incomplete dependencies, so bringing them back to life also turned out to be difficult.
Results
In the end, our team delivered solutions of sufficient quality for Android and Windows, using efficient real-time models. On iOS devices, however, the same model turned out to be too memory-intensive to run as part of the screen reader framework, so we had to ship it as a standalone application instead.
This was due to an artificial memory limit imposed on AVSpeechSynthesisProviderAudioUnit: once usage exceeds roughly 80 MB, the component cannot use memory that is otherwise available on the device. Our team proposed next steps for overcoming this limit in a possible follow-up project.
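To make the roughly 80 MB budget more concrete, the following is a minimal back-of-the-envelope sketch in Python, not the tooling used in the project. It estimates whether a model's weights fit under a memory budget at different numeric precisions. The parameter count, the fixed runtime overhead, and the function and field names are all hypothetical, chosen only for illustration; the estimate also ignores activation memory, so real usage would be higher.

```python
from dataclasses import dataclass

# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int8": 1}

MB = 1_000_000  # decimal megabytes, matching the "around 80 MB" figure


@dataclass
class FootprintEstimate:
    dtype: str
    weights_mb: float
    total_mb: float
    fits: bool


def estimate_footprint(num_params, dtype, overhead_mb=0.0, budget_mb=80.0):
    """Estimate weight memory plus a fixed overhead and compare to a budget.

    overhead_mb is an assumed constant for runtime and buffers; it is a
    placeholder, not a measured value.
    """
    weights_mb = num_params * BYTES_PER_PARAM[dtype] / MB
    total_mb = weights_mb + overhead_mb
    return FootprintEstimate(dtype, weights_mb, total_mb, total_mb <= budget_mb)


if __name__ == "__main__":
    # Hypothetical model size; the actual parameter count is not stated.
    hypothetical_params = 30_000_000
    for dtype in BYTES_PER_PARAM:
        est = estimate_footprint(hypothetical_params, dtype, overhead_mb=15.0)
        verdict = "fits" if est.fits else "exceeds budget"
        print(f"{est.dtype}: {est.total_mb:.1f} MB total, {verdict}")
```

Under these assumed numbers, the float32 estimate comes to 135 MB and exceeds the budget, while float16 (75 MB) and int8 (45 MB) fall below it. This only illustrates how precision affects the weight footprint; whether a reduced-precision model keeps acceptable speech quality would have to be measured separately.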