[Development] Qt TextToSpeech: Adding dependency to Qt Multimedia - ok?

Volker Hilsheimer volker.hilsheimer at qt.io
Mon Jan 23 10:58:56 CET 2023


Hi,

I recently prototyped a few frequently requested features for Qt TextToSpeech, in particular the ability to capture the generated audio data as a QByteArray with the PCM bits. However, a QByteArray with PCM bits isn’t very usable unless we also inform the client code which format those PCM bits are in - sample rate, channel count, and sample format. We have a class in Qt that nicely encapsulates this kind of information: QAudioFormat from Qt Multimedia.
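To illustrate why the raw bytes alone are not enough, here is a minimal sketch in plain C++. It assumes a hypothetical `PcmFormat` struct that stands in for the three pieces of information mentioned above; `PcmFormat`, `SampleFormat`, and `durationMs` are made-up names for illustration only, not Qt API and not the prototype's code. The same byte count maps to very different playback durations depending on the format.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the format information discussed above
// (sample rate, channel count, sample format). Not a Qt type.
enum class SampleFormat { UInt8, Int16, Int32, Float32 };

struct PcmFormat {
    int sampleRate;            // frames per second, e.g. 16000
    int channelCount;          // 1 = mono, 2 = stereo
    SampleFormat sampleFormat; // how a single sample is encoded
};

// Size in bytes of one sample in the given encoding.
constexpr std::size_t bytesPerSample(SampleFormat f) {
    switch (f) {
    case SampleFormat::UInt8:   return 1;
    case SampleFormat::Int16:   return 2;
    case SampleFormat::Int32:   return 4;
    case SampleFormat::Float32: return 4;
    }
    return 0;
}

// Size in bytes of one frame (one sample for every channel).
constexpr std::size_t bytesPerFrame(const PcmFormat &fmt) {
    return fmt.channelCount <= 0
        ? 0
        : bytesPerSample(fmt.sampleFormat) * static_cast<std::size_t>(fmt.channelCount);
}

// Playback duration in whole milliseconds for byteCount bytes of PCM data.
// Returns 0 if the format is invalid.
constexpr std::int64_t durationMs(std::size_t byteCount, const PcmFormat &fmt) {
    const std::size_t frameSize = bytesPerFrame(fmt);
    if (frameSize == 0 || fmt.sampleRate <= 0)
        return 0;
    const std::int64_t frames = static_cast<std::int64_t>(byteCount / frameSize);
    return frames * 1000 / fmt.sampleRate;
}
```

With this sketch, 32000 bytes interpreted as 16 kHz mono Int16 cover a full second, while the same bytes interpreted as 48 kHz stereo Float32 cover well under a tenth of a second, so client code cannot do anything sensible with the buffer unless the format travels with it.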

However, Qt TextToSpeech as reintroduced in Qt 6.4 does not generally depend on Qt Multimedia. It depends on it only for the Windows Media engine (“winrt”) and for the flite engine on Linux, since those engines hand us the PCM data anyway, which we then play ourselves via QAudioSink.

For the other engines - the two engines on macOS, the Android engine, the SAPI engine on Windows, and the “speech-dispatcher” engine on Linux - we don’t depend on Qt Multimedia today. So the dependency on Qt Multimedia is listed as an “OPTIONAL_COMPONENT” in the build system of Qt TextToSpeech. Adding an API that uses QAudioFormat to the public QTextToSpeech class, or adding the dependency to all engines, would turn this into a “REQUIRED” dependency though.

This is a binary compatibility breakage of sorts. Applications that were linked against Qt 6.4 or Qt 6.5 and want to run against Qt 6.6 won’t work unless Qt Multimedia is present.

The question is whether this is a significant problem in practice. On Linux distributions, we can probably assume that Qt Multimedia is present if Qt TextToSpeech is present. Applications that get deployed with a bundled Qt don’t have a problem anyway - they can include Qt Multimedia in that bundle, or continue to ship the old Qt TextToSpeech.

The alternatives are either to not use Qt Multimedia types where they aren’t used already, which essentially means duplicating QAudioFormat (and perhaps other types in the future), or to craft a separate “Qt TextToPCM” module with the new functionality. Both seem rather silly.

What do you think? I’m not aware of any precedent for this particular scenario in recent history, but maybe there have been cases. Perhaps we only want to promise binary compatibility for Qt as a whole (on https://wiki.qt.io/Qt-Version-Compatibility, which is somewhat outdated, we limit the promises to Qt Essentials, which Qt TextToSpeech is not).


Volker


