Page Comparison

We provide two integration modes: Audio-Only for devices like IoT products, and Audio+Image for devices that support image uploads, such as smartphones and AR glasses.

Notes for MediaTek DaVinci Integration

...

OA laptops: OA laptops can access the service, so testing is possible from within the company's network. If you run into an SSL error, check this certificate.
OA servers: Direct access from OA servers to the service is currently restricted by network limitations, which prevents connections to Azure's STT/TTS services for real-time processing. (Batch processing was previously achievable, but real-time processing is not supported at this time.)

...

WebSocket Connection

i. Obtaining ASSISTANT_ID and API_KEY

Follow the instructions in the README.

ii. Connection URL:

Audio-Only Mode:

...

URL: wss://assistant-audio-stag.dvcbot.net/ws?vision=1&assistant_id=<ASSISTANT_ID>

iii. Specifying Subprotocols:

Subprotocols: ['proto', 'API_KEY']
Note: The client requests one or more subprotocols by including the Sec-WebSocket-Protocol header in the handshake request.
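
A minimal sketch of assembling the handshake parameters from steps ii and iii. It assumes the URL shown above, and that 'API_KEY' in the subprotocol list stands for the actual key value. The function name build_handshake is hypothetical, and no particular WebSocket client library is implied.

```python
from urllib.parse import urlencode

# Endpoint taken from the URL shown in this section.
BASE_URL = "wss://assistant-audio-stag.dvcbot.net/ws"


def build_handshake(assistant_id, api_key):
    """Return (url, subprotocols, header_value) for the WebSocket handshake.

    Assumption: the literal 'API_KEY' in the documented subprotocol list
    is a placeholder for the real key.
    """
    query = urlencode({"vision": 1, "assistant_id": assistant_id})
    url = f"{BASE_URL}?{query}"
    subprotocols = ["proto", api_key]
    # Multiple subprotocols are sent as one comma-separated header value.
    header_value = ", ".join(subprotocols)
    return url, subprotocols, header_value


if __name__ == "__main__":
    url, subprotocols, header_value = build_handshake("<ASSISTANT_ID>", "<API_KEY>")
    print(url)
    print("Sec-WebSocket-Protocol:", header_value)
```

Most WebSocket clients accept the subprotocol list directly and build the Sec-WebSocket-Protocol header themselves, so the header value is returned here only for inspection.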

...

Handling Frames in Packets

The DaVinci Voice Engine (DVE) server serializes and deserializes packets with protobuf, so every packet the client sends or receives must use protobuf encoding as well. This keeps the data structured identically on both the server and the client.

...

Packet Types

The client can send and receive different types of frames, distinguished by protobuf's oneof feature.

...

Frame

TextFrame: Used for transmitting non-speech information, such as commands to interrupt and stop audio playback.
message TextFrame {
    uint64 id = 1;
    string name = 2;
    string text = 3;
}

AudioRawFrame: Contains the actual audio data that is to be played out.

message AudioRawFrame {
    uint64 id = 1;
    string name = 2;
    bytes audio = 3;
    uint32 sample_rate = 4;
    uint32 num_channels = 5;
}

ImageRawFrame (Audio+Image mode only): Contains image data that can be sent alongside audio data.

message ImageRawFrame {
    uint64 id = 1;
    string name = 2;
    bytes image = 3;
    repeated uint32 size = 4; // Width, Height
    string format = 5; // e.g., "JPEG", "PNG"
}
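
To illustrate what protobuf encoding means at the byte level, the sketch below hand-encodes an AudioRawFrame using only the Python standard library. It is an illustration under assumptions, not the recommended path: in practice you would use code generated by protoc. The helper names are hypothetical, proto3-style omission of zero or empty fields is assumed, and the outer Frame wrapper (not shown in this section) is not included, so the result is the inner message only.

```python
def encode_varint(value):
    """Encode a non-negative integer as a protobuf base-128 varint."""
    if value < 0:
        raise ValueError("only unsigned values are used by these messages")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def varint_field(field_number, value):
    """Wire type 0 (varint). Zero is omitted, as proto3 serializers do."""
    if value == 0:
        return b""
    return encode_varint((field_number << 3) | 0) + encode_varint(value)


def bytes_field(field_number, data):
    """Wire type 2 (length-delimited). Empty payloads are omitted."""
    if not data:
        return b""
    return encode_varint((field_number << 3) | 2) + encode_varint(len(data)) + data


def encode_audio_raw_frame(frame_id, name, audio, sample_rate, num_channels):
    """Serialize the AudioRawFrame fields in field-number order."""
    return (
        varint_field(1, frame_id)
        + bytes_field(2, name.encode("utf-8"))
        + bytes_field(3, audio)
        + varint_field(4, sample_rate)
        + varint_field(5, num_channels)
    )


if __name__ == "__main__":
    payload = encode_audio_raw_frame(1, "a", b"\x00\x01", 16000, 1)
    print(payload.hex())
```

TextFrame follows the same pattern, with its text field encoded as a length-delimited UTF-8 string in field 3.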

...

Decode/Encode w/ ProtoBuf

Protocol buffers support generated code in C++, C#, Dart, Go, Java, Kotlin, Objective-C, Python, and Ruby. With proto3, you can also work with PHP.

...

To send an image alongside audio data, read the image (PNG or JPG) as bytes and pack those bytes into a protobuf message for serialization. See the Python sample code for sending an image.
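
The first half of that step, turning an image into field values for ImageRawFrame, might look like the sketch below. It handles PNG only, reads the width and height from the PNG header, and returns a plain dict whose keys mirror the message fields. The function names are hypothetical, and packing the values into the generated protobuf class is left out because the generated module is not shown here.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_dimensions(data):
    """Read (width, height) from the IHDR chunk at the start of a PNG."""
    if len(data) < 24 or data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG image")
    # Width and height are big-endian 32-bit integers at bytes 16..23.
    width, height = struct.unpack(">II", data[16:24])
    return width, height


def image_frame_fields(image_bytes, frame_id=0, name=""):
    """Build the field values for an ImageRawFrame from raw PNG bytes."""
    width, height = png_dimensions(image_bytes)
    return {
        "id": frame_id,
        "name": name,
        "image": image_bytes,      # bytes image = 3
        "size": [width, height],   # repeated uint32 size = 4 (Width, Height)
        "format": "PNG",           # string format = 5
    }


if __name__ == "__main__":
    # Hypothetical header-only input; a real caller would read a file,
    # e.g. pathlib.Path("photo.png").read_bytes().
    header = PNG_SIGNATURE + struct.pack(">I", 13) + b"IHDR" + struct.pack(">II", 640, 480)
    print(image_frame_fields(header)["size"])
```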

Compared versions: old version 3, new version 4

Notes for MediaTek DaVinci Integration

WebSocket Connection

Handling Frames in Packets

Packet Types

Decode/Encode w/ ProtoBuf

Sample Code

Audio-Only Mode

i. Python (w/o interruption handling)