We provide two integration modes: Audio-Only for devices like IoT products, and Audio+Image for devices that support image uploads, such as smartphones and AR glasses.
Notes for MediaTek DaVinci Integration
...
OA laptops: OA laptops can access the service, so testing is possible from within the company's network. If you encounter an SSL error, check this certificate.
OA Server: Direct access from OA servers to the service is currently restricted due to network limitations, which affects the ability to connect to Azure's STT/TTS services for real-time processing. (Previously, batch processing was achievable, but real-time processing is not supported at this time.)
...
WebSocket Connection
i. Obtaining ASSISTANT_ID and API_KEY
Follow the instructions in the README.
ii. Connection URL:
Audio-Only Mode:
...
URL:
wss://assistant-audio-stag.dvcbot.net/ws?vision=1&assistant_id=<ASSISTANT_ID>
iii. Specifying Subprotocols:
Subprotocols:
['proto', 'API_KEY']
Note: The client specifies one or more desired subprotocols by including the Sec-WebSocket-Protocol header in the handshake request.
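The connection parameters above can be assembled as in the following minimal sketch. It uses only the Python standard library and opens no connection. The function names are hypothetical, and it assumes that the literal 'API_KEY' entry in the subprotocol list is a placeholder for the actual key.

```python
from urllib.parse import urlencode

# Host, path, and query parameters copied from the connection URL shown above.
BASE_URL = "wss://assistant-audio-stag.dvcbot.net/ws"


def build_ws_url(assistant_id: str) -> str:
    """Return the connection URL shown above with ASSISTANT_ID filled in."""
    query = urlencode({"vision": 1, "assistant_id": assistant_id})
    return f"{BASE_URL}?{query}"


def build_subprotocols(api_key: str) -> list[str]:
    """Return the subprotocol list to offer during the handshake."""
    # Assumption: the 'API_KEY' entry is replaced by the real key.
    return ["proto", api_key]


def subprotocol_header(subprotocols: list[str]) -> str:
    """Format the Sec-WebSocket-Protocol header value (comma-separated)."""
    return ", ".join(subprotocols)


if __name__ == "__main__":
    url = build_ws_url("my-assistant-id")      # hypothetical ID
    protos = build_subprotocols("my-api-key")  # hypothetical key
    print(url)
    print("Sec-WebSocket-Protocol:", subprotocol_header(protos))
```

Most WebSocket client libraries accept the subprotocol list directly and write the header themselves. For example, the third-party websockets package takes a subprotocols argument on connect.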
...
Handling Frames in Packets
The DaVinci Voice Engine (DVE) server serializes and deserializes packets with protobuf. Packets sent and received by the client must therefore use the same protobuf encoding, so that both sides interpret the data structure identically.
...
Packet Types
The client can send and receive different types of frames, distinguished by protobuf's oneof feature.
...
Frame
TextFrame: Used for transmitting non-speech information, such as commands to interrupt and stop audio playback.
    message TextFrame {
      uint64 id = 1;
      string name = 2;
      string text = 3;
    }
AudioRawFrame: Contains the actual audio data that is to be played out.
    message AudioRawFrame {
      uint64 id = 1;
      string name = 2;
      bytes audio = 3;
      uint32 sample_rate = 4;
      uint32 num_channels = 5;
    }
ImageRawFrame: (Audio+Image mode only) Contains image data that can be sent alongside audio data.
    message ImageRawFrame {
      uint64 id = 1;
      string name = 2;
      bytes image = 3;
      repeated uint32 size = 4;  // Width, Height
      string format = 5;         // e.g., "JPEG", "PNG"
    }
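To illustrate what these definitions mean on the wire, the sketch below hand-encodes a TextFrame using the standard protobuf wire format (varint tags and length-delimited strings). It is for illustration only: real clients would normally use protoc-generated classes, and the outer Frame wrapper that carries the oneof is not shown here, so this covers the inner message alone. The function names and the sample values "cmd" and "stop" are hypothetical.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a protobuf base-128 varint."""
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # more bytes follow
        else:
            out.append(low7)
            return bytes(out)


def encode_string_field(field_number: int, text: str) -> bytes:
    """Encode a string field (wire type 2: length-delimited)."""
    payload = text.encode("utf-8")
    tag = encode_varint((field_number << 3) | 2)
    return tag + encode_varint(len(payload)) + payload


def encode_text_frame(frame_id: int, name: str, text: str) -> bytes:
    """Serialize the TextFrame fields: id = 1, name = 2, text = 3."""
    out = b""
    # proto3 serializers omit fields that hold their default value.
    if frame_id:
        out += encode_varint((1 << 3) | 0) + encode_varint(frame_id)
    if name:
        out += encode_string_field(2, name)
    if text:
        out += encode_string_field(3, text)
    return out


if __name__ == "__main__":
    print(encode_text_frame(1, "cmd", "stop").hex())
```

Seeing the raw bytes can help when debugging a client whose frames the server rejects, since field numbers and types must match the definitions exactly.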
...
Decode/Encode w/ ProtoBuf
Protocol buffers support generated code in C++, C#, Dart, Go, Java, Kotlin, Objective-C, Python, and Ruby. With proto3, you can also work with PHP.
...
To send an image alongside audio data, the image (in PNG or JPG format) must be encoded into bytes and then packed into a protobuf message for serialization. See the Python sample code for sending an image.
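As a rough sketch of the preparation step, the code below collects the ImageRawFrame field values for a PNG image. It assumes that the image field carries the encoded file bytes as-is and that size holds [width, height], as the comment in the message definition suggests. The helper names are hypothetical, and the id value is left to the caller because its meaning is not specified here.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(data: bytes) -> tuple[int, int]:
    """Read (width, height) from the IHDR chunk at the start of a PNG file."""
    if len(data) < 24 or data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG file")
    # Width and height are big-endian 32-bit integers at bytes 16..23.
    width, height = struct.unpack(">II", data[16:24])
    return width, height


def image_frame_fields(png_bytes: bytes, name: str = "", frame_id: int = 0) -> dict:
    """Collect values for the ImageRawFrame fields as a plain dict."""
    width, height = png_size(png_bytes)
    return {
        "id": frame_id,
        "name": name,
        "image": png_bytes,       # assumption: encoded file bytes, not raw pixels
        "size": [width, height],  # Width, Height
        "format": "PNG",
    }
```

With protoc-generated Python code, a dict like this can typically be passed as keyword arguments to the generated ImageRawFrame class, which is then serialized with SerializeToString().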
...
Sample Code
Audio-Only Mode
i. Python (w/o interruption handling)
...