[Development] RFC: Speech Recognition API

Turunen Tuukka tuukka.turunen at theqtcompany.com
Sat Sep 19 07:14:45 CEST 2015


We'll do this as well, but typically closer to (or after) the release of the first binaries.

At this point there is only the code. The goal is to make it straightforward to use automatic speech recognition (ASR) and text-to-speech (TTS) functionality from Qt apps and devices. This helps with accessibility and enables hands-free operation. The engines come from third parties, and the architecture is planned to make it easy to use different engines. The list of current and planned features by Tuomas also gives an overview of what is intended to be available first.
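
To make the discussion a bit more concrete, here is a minimal sketch of the general shape described in the feature list quoted below: engines selected by name, application calls that only enqueue tasks, multiple grammars, and a mute flag. It is plain C++ and not the actual QtSpeechRecognition API. Every name in it (SpeechEngine, EchoEngine, engineRegistry, Recognizer, runDemo) is hypothetical, the "engine" only echoes the active grammar instead of decoding audio, and the queue is drained by an explicit call standing in for the engine's own thread.

```cpp
#include <functional>
#include <map>
#include <memory>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Hypothetical engine interface; NOT the actual QtSpeechRecognition API.
class SpeechEngine {
public:
    virtual ~SpeechEngine() = default;
    // Stand-in for decoding recorded audio against the active grammar.
    virtual std::string recognize(const std::string &grammar) = 0;
};

// Toy engine that echoes the grammar name instead of decoding audio.
class EchoEngine : public SpeechEngine {
public:
    std::string recognize(const std::string &grammar) override {
        return "echo:" + grammar;
    }
};

using EngineFactory = std::function<std::unique_ptr<SpeechEngine>()>;

// Name-to-factory registry standing in for plug-in discovery (assumption).
std::map<std::string, EngineFactory> &engineRegistry() {
    static std::map<std::string, EngineFactory> registry;
    return registry;
}

class Recognizer {
public:
    explicit Recognizer(std::unique_ptr<SpeechEngine> engine)
        : m_engine(std::move(engine)) {}

    // Application-side calls only enqueue work and return immediately.
    void setGrammar(const std::string &grammar) {
        m_tasks.push([this, grammar] { m_grammar = grammar; });
    }
    void setMuted(bool muted) {
        m_tasks.push([this, muted] { m_muted = muted; });
    }
    void listen() {
        m_tasks.push([this] {
            if (!m_muted)  // muted sessions produce no result
                m_results.push_back(m_engine->recognize(m_grammar));
        });
    }

    // Runs queued tasks in order; a real engine would do this on its own thread.
    void processPending() {
        while (!m_tasks.empty()) {
            std::function<void()> task = std::move(m_tasks.front());
            m_tasks.pop();
            task();
        }
    }

    const std::vector<std::string> &results() const { return m_results; }

private:
    std::unique_ptr<SpeechEngine> m_engine;
    std::queue<std::function<void()>> m_tasks;
    std::string m_grammar;
    bool m_muted = false;
    std::vector<std::string> m_results;
};

// Example session: wake-up grammar, a muted attempt, then a command grammar.
std::vector<std::string> runDemo() {
    engineRegistry()["echo"] = []() -> std::unique_ptr<SpeechEngine> {
        return std::unique_ptr<SpeechEngine>(new EchoEngine);
    };
    Recognizer recognizer(engineRegistry().at("echo")());
    recognizer.setGrammar("wake-up");
    recognizer.listen();
    recognizer.setMuted(true);
    recognizer.listen();  // dropped because recognition is muted
    recognizer.setMuted(false);
    recognizer.setGrammar("commands");
    recognizer.listen();
    recognizer.processPending();
    return recognizer.results();
}
```

In this sketch runDemo() yields two results, one for the wake-up grammar and one for the command grammar; the listen() call made while muted produces nothing. Because every call goes through the same queue, ordering between grammar switches, mute changes and listen requests is preserved no matter which engine sits behind the interface.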



> "rpzrpzrpz at gmail.com" <rpzrpzrpz at gmail.com> wrote on 18.9.2015 at 18.29:
> Tuukka:
> Instead of pointing us at source code to inspect, maybe you could make a
> YouTube video outlining in a more visual way your current plan of attack
> and some of the considerations, issues, and goals that you feel are
> relevant given your current state of research.
> What are the Android vs. iOS tradeoffs?
> I would be much happier to comment on that higher-level discussion
> because I would like to see how voice input might "fit" into apps across
> the different platforms.
> That way you have a higher-level set of concerns driven by users instead
> of abstract source code which is just being laid down as we speak.
> md
>> On 9/18/2015 6:24 AM, Turunen Tuukka wrote:
>> Hi,
>> There have been quite few comments and little feedback on the new speech APIs. It would be great to have people look into this and provide comments already during early development. We hope to have a technology preview available later this year, but getting feedback only after that phase is not optimal. Even if you are not an expert on speech recognition but are just interested in it, please take a look and provide feedback to Tuomas and the others developing this.
>> Yours,
>>    Tuukka
>>> -----Original Message-----
>>> From: development-bounces+tuukka.turunen=theqtcompany.com at qt-
>>> project.org [mailto:development-
>>> bounces+tuukka.turunen=theqtcompany.com at qt-project.org] On Behalf Of
>>> Tuomas Tuononen
>>> Sent: 10 September 2015 16:18
>>> To: development at qt-project.org
>>> Subject: [Development] RFC: Speech Recognition API
>>> Hi all,
>>> We are developing a new QtSpeechRecognition API for Qt.
>>> The initial stage of development is nearly ready, and we would like to get
>>> feedback from the experts on the Qt development list.
>>> You can find the sources under review in the qtspeech project, branch
>>> wip/speech-recognition:
>>> https://codereview.qt-project.org/#/q/project:qt/qtspeech+branch:wip/speech-recognition,n,z
>>> The initial implementation was already merged into wip/speech-recognition, but
>>> is still open for discussion.
>>> Main features:
>>> - Speech recognition engines are loaded as plug-ins.
>>> - The engine is controlled asynchronously, putting only minimal load on the
>>> application thread.
>>> - A built-in task queue makes plug-in development easier and enforces unified
>>> behavior across engine integrations.
>>> - The engine integration handles the audio recording, making it easy to use
>>> from the application.
>>> - The application can create multiple grammars and switch between them.
>>> - Setting mute temporarily disables speech recognition, allowing co-operation
>>> with audio output (speech prompts or audio cues).
>>> - Includes an integration with the PocketSphinx engine (latest codebase) as a
>>> reference.
>>> Current limitations:
>>> - The recognition control only supports manual start/stop of audio recording.
>>> - Only supports reading the grammars and dictionaries from files.
>>> - The grammar and dictionary formats are engine-specific.
>>> - Only a transcription of what was said is returned (locale-specific).
>>> - Switching the grammar always aborts the current session.
>>> Other notes:
>>> - QML API probably needs improvement, as I'm not an expert in that area.
>>> Maybe an extra QML plug-in layer?
>>> Future development:
>>> 1) Support for automatic endpointing (auto-stop) and continuously listening
>>> grammars.
>>> 2) Engine-independent run-time grammar API (support for updating the whole
>>> grammar or only part of the grammar).
>>> 3) Support for adding words to the dictionary at run time (probably with
>>> engine-specific phonetic data).
>>> 4) Support for switching the grammar without interrupting audio recording (e.g.
>>> switching from wake-up grammar to command grammar).
>>> 5) Engine-independent grammar file format.
>>> 6) Support for locale-independent results (grammar meta-data).
>>> Best Regards,
>>> Tuomas Tuononen,
>>> Senior Software Engineer,
>>> Code-Q Oy
>>> _______________________________________________
>>> Development mailing list
>>> Development at qt-project.org
>>> http://lists.qt-project.org/mailman/listinfo/development
> -- 
> No spell checkers were harmed during the creation of this message.
