#speech #google #cognitive #recognition #synthesizing

google-cognitive-apis

Library wrapping the Google speech-to-text, text-to-speech and Dialogflow APIs. Provides a high-level API layer that hides the underlying complexity of gRPC.

3 releases

0.1.2 Jun 30, 2021
0.1.1 Jun 30, 2021
0.1.0 May 28, 2021

#276 in Audio

MIT/Apache

1.5MB
16K SLoC

Google Cognitive APIs


Asynchronous Rust bindings for the Google Cloud Platform cognitive gRPC APIs. They provide high-level interfaces that hide the complexity of the low-level gRPC implementation. Bidirectional gRPC streaming is supported through two alternative approaches:

  • tokio.rs channels
  • asynchronous streams, facilitated by the async-stream crate
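
The channel-based approach can be pictured with the minimal sketch below. To stay free of dependencies it deliberately uses `std::sync::mpsc` and a thread as a stand-in for tokio channels and an async task, and it does not use this crate's real client types. The names `spawn_recognizer` and `stream_audio` are hypothetical and chosen only for illustration.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical stand-in for a streaming recognizer: it consumes audio chunks
/// from one channel and reports progress messages on another.
fn spawn_recognizer(
    audio_rx: mpsc::Receiver<Vec<u8>>,
    result_tx: mpsc::Sender<String>,
) -> thread::JoinHandle<()> {
    thread::spawn(move || {
        let mut total = 0usize;
        // This loop ends once every sender of the audio channel is dropped.
        for chunk in audio_rx {
            total += chunk.len();
            // Ignore send errors: the caller may have stopped listening.
            let _ = result_tx.send(format!("received {} bytes", total));
        }
        let _ = result_tx.send(format!("final: {} bytes", total));
    })
}

/// Pushes all chunks into the request channel, closes it, and collects
/// every message from the response channel.
fn stream_audio(chunks: Vec<Vec<u8>>) -> Vec<String> {
    let (audio_tx, audio_rx) = mpsc::channel();
    let (result_tx, result_rx) = mpsc::channel();
    let worker = spawn_recognizer(audio_rx, result_tx);

    for chunk in chunks {
        audio_tx.send(chunk).expect("recognizer stopped early");
    }
    // Dropping the sender tells the recognizer that no more audio will arrive.
    drop(audio_tx);

    // The iterator ends when the recognizer thread drops its result sender.
    let results: Vec<String> = result_rx.iter().collect();
    worker.join().expect("recognizer thread panicked");
    results
}

fn main() {
    for line in stream_audio(vec![vec![0u8; 320], vec![0u8; 160]]) {
        println!("{}", line);
    }
}
```

With tokio the shape would be similar: one channel carries requests towards the gRPC call and another carries responses back, so the caller never touches the gRPC stream directly.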

The following APIs are currently supported:

| Cognitive API  | Feature name   | Status      |
|----------------|----------------|-------------|
| Dialogflow ES  | dialogflow     | In progress |
| Speech-to-text | speech-to-text | Complete    |
| Text-to-speech | text-to-speech | Complete    |
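
As a hedged illustration of how the feature names above might be used, the `Cargo.toml` fragment below enables two of them. It assumes the listed names are opt-in Cargo features of the `google-cognitive-apis` crate and that version 0.1.2 is the one wanted; check the crate documentation for the actual default feature set.

```toml
[dependencies]
# Assumption: features are opt-in; pick only the APIs you need.
google-cognitive-apis = { version = "0.1.2", features = ["speech-to-text", "text-to-speech"] }
```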

Google API proto definitions

The Google proto definitions have been taken from this repo.

Limitations

  • Only a limited subset of the Google cognitive APIs is supported. Feel free to raise a PR with new additions!
  • Dialogflow CX is not yet supported.
  • For Dialogflow, only SessionClient is currently supported (the purpose of this library is not to support the various Dialogflow management APIs).
  • REST APIs are supported for a single purpose: to define structs that enable deserialization of JSON config structures and their conversion into gRPC counterparts. Full support for REST APIs will not be introduced.
  • Dialogflow detect intent streaming (i.e. receiving audio data, then performing speech-to-text followed by intent detection) is not fully supported. It seems the Google APIs require a half-close operation on the audio stream to learn that no more data will arrive and to initiate intent detection. Details can be found here. As a result, after all audio bytes have been streamed in, the API simply times out, complaining that data needs to arrive promptly. If you know how to implement half-close with the Rust/Tonic toolset, let me know (raise a PR or issue)! Until then, use the speech-to-text streaming API and the Dialogflow detect intent API separately to achieve the same result.
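
The REST-to-gRPC conversion mentioned in the list above can be sketched as follows. This is only an illustration of the pattern, not the crate's actual types: `RestRecognitionConfig`, `GrpcAudioEncoding` and `GrpcRecognitionConfig` are hypothetical names, and the JSON deserialization step (which would typically rely on a crate such as serde) is left out so the example has no dependencies.

```rust
/// Hypothetical REST-style config, shaped the way a JSON payload might be.
#[derive(Debug, Clone, PartialEq)]
struct RestRecognitionConfig {
    encoding: String,
    sample_rate_hertz: Option<i32>,
    language_code: String,
}

/// Hypothetical enum mirroring a protobuf audio encoding enum.
#[derive(Debug, Clone, Copy, PartialEq)]
enum GrpcAudioEncoding {
    Unspecified = 0,
    Linear16 = 1,
    Flac = 2,
    Mulaw = 3,
}

/// Hypothetical gRPC-style config; generated protobuf structs commonly
/// store enum fields as plain integers.
#[derive(Debug, Clone, PartialEq)]
struct GrpcRecognitionConfig {
    encoding: i32,
    sample_rate_hertz: i32,
    language_code: String,
}

impl From<RestRecognitionConfig> for GrpcRecognitionConfig {
    fn from(rest: RestRecognitionConfig) -> Self {
        // Map the textual REST value onto the numeric enum value.
        let encoding = match rest.encoding.as_str() {
            "LINEAR16" => GrpcAudioEncoding::Linear16,
            "FLAC" => GrpcAudioEncoding::Flac,
            "MULAW" => GrpcAudioEncoding::Mulaw,
            _ => GrpcAudioEncoding::Unspecified,
        };
        GrpcRecognitionConfig {
            encoding: encoding as i32,
            // A missing optional REST field falls back to the protobuf default.
            sample_rate_hertz: rest.sample_rate_hertz.unwrap_or(0),
            language_code: rest.language_code,
        }
    }
}

fn main() {
    let rest = RestRecognitionConfig {
        encoding: "LINEAR16".to_string(),
        sample_rate_hertz: Some(16000),
        language_code: "en-US".to_string(),
    };
    let grpc: GrpcRecognitionConfig = rest.into();
    println!("{:?}", grpc);
}
```

Keeping the REST structs separate from the generated gRPC structs lets configuration be stored as JSON while the gRPC layer still receives the types it expects.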

Examples

You can find all examples here.

License

Licensed under either Apache-2.0 or MIT license.

Dependencies

~23MB
~527K SLoC