We provide two integration modes: Audio-Only for devices like IoT products, and Audio+Image for devices that support image uploads, such as smartphones and AR glasses.

Notes for Mediatek DaVinci Integration

...

  • OA laptops: OA laptops can access the service, so testing is possible from within the company network. If you hit an SSL error, check this certificate.

  • OA servers: Direct access from OA servers to the service is currently restricted by network limitations, which prevents connecting to Azure's STT/TTS services for real-time processing. (Batch processing was previously possible; real-time processing is not supported at this time.)

...

WebSocket Connection

i. Obtaining ASSISTANT_ID and API_KEY

Follow the instructions in the README.

ii. Connection URL:

  • Audio-Only Mode:

...

URL: wss://assistant-audio-stag.dvcbot.net/ws?vision=1&assistant_id=<ASSISTANT_ID>

iii. Specifying Subprotocols:

  • Subprotocols: ['proto', 'API_KEY']

  • Note: The client requests one or more subprotocols by including the Sec-WebSocket-Protocol header in the handshake request.
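The connection parameters above can be assembled as in the following minimal sketch. It assumes that 'API_KEY' in the subprotocol list is a placeholder for the actual key, and it reproduces only the URL shown above. The function name `build_connection` is hypothetical and not part of any DaVinci SDK.

```python
from typing import List, Tuple
from urllib.parse import urlencode

# Host and path copied from the URL shown above; treat them as environment-specific.
BASE_URL = "wss://assistant-audio-stag.dvcbot.net/ws"


def build_connection(assistant_id: str, api_key: str) -> Tuple[str, List[str], str]:
    """Return (url, subprotocols, header_value) for the WebSocket handshake.

    Hypothetical helper: assumes the second subprotocol entry is the API key itself.
    """
    query = urlencode({"vision": 1, "assistant_id": assistant_id})
    url = f"{BASE_URL}?{query}"
    subprotocols = ["proto", api_key]
    # RFC 6455: Sec-WebSocket-Protocol carries the values as a comma-separated list.
    header_value = ", ".join(subprotocols)
    return url, subprotocols, header_value


if __name__ == "__main__":
    url, subprotocols, header = build_connection("my-assistant", "my-key")
    print(url)
    print("Sec-WebSocket-Protocol:", header)
```

Most WebSocket clients set the header themselves once given the list. With the Python `websockets` package, for instance, it can be passed as `websockets.connect(url, subprotocols=subprotocols)`.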

...

Handling Frames in Packets

The DaVinci Voice Engine (DVE) server serializes and deserializes packets with protobuf, so every packet the client sends or receives must use the same protobuf encoding. This keeps the data structure consistent between server and client during communication.

...

Packet Types

The client can send and receive different types of frames, distinguished by protobuf's oneof feature.

...

  • Frame

    • TextFrame: Used for transmitting non-speech information, such as commands to interrupt and stop audio playback.

      message TextFrame {
          uint64 id = 1;
          string name = 2;
          string text = 3;
      }
    • AudioRawFrame: Contains the actual audio data that is to be played out.

      message AudioRawFrame {
          uint64 id = 1;
          string name = 2;
          bytes audio = 3;
          uint32 sample_rate = 4;
          uint32 num_channels = 5;
      }
    • ImageRawFrame (Audio+Image mode only): Contains image data that can be sent alongside audio data.

      message ImageRawFrame {
          uint64 id = 1;
          string name = 2;
          bytes image = 3;
          repeated uint32 size = 4; // Width, Height
          string format = 5; // e.g., "JPEG", "PNG"
      }
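To make the wire format concrete, the sketch below hand-encodes a TextFrame using the standard protobuf encoding rules (varint for `uint64`, length-delimited for `string`). It is an illustration only: real clients would normally use classes generated by `protoc`, and this covers just the inner TextFrame message, not the enclosing oneof wrapper, whose definition is not shown here. All function names are hypothetical.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a protobuf base-128 varint."""
    if value < 0:
        raise ValueError("only non-negative integers are supported here")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_tag(field_number: int, wire_type: int) -> bytes:
    """A field tag is the varint of (field_number << 3) | wire_type."""
    return encode_varint((field_number << 3) | wire_type)


def encode_text_frame(frame_id: int, name: str, text: str) -> bytes:
    """Serialize the TextFrame fields: id = 1, name = 2, text = 3."""
    out = bytearray()
    if frame_id:  # proto3 omits fields that hold their default value
        out += encode_tag(1, 0) + encode_varint(frame_id)
    for number, value in ((2, name), (3, text)):
        data = value.encode("utf-8")
        if data:
            out += encode_tag(number, 2) + encode_varint(len(data)) + data
    return bytes(out)


if __name__ == "__main__":
    print(encode_text_frame(1, "a", "hi").hex())
```

Each field is written as a tag followed by its payload, which is why both sides need the same field numbers to interpret the bytes.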

...

Decode/Encode w/ ProtoBuf

Protocol buffers support generated code in C++, C#, Dart, Go, Java, Kotlin, Objective-C, Python, and Ruby. With proto3, you can also work with PHP.

...

To send an image alongside audio data, encode the image (PNG or JPG) into bytes and pack it into a protobuf message for serialization. See the Python sample code for sending an image.
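As a rough sketch of the first half of that step, the code below takes the raw bytes of a PNG file, reads the width and height from its IHDR header, and collects values keyed by the ImageRawFrame field names. The helper names and the default `name` value are hypothetical. Copying the values into the generated protobuf class and serializing it is assumed to happen afterwards and is not shown.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(data: bytes) -> tuple:
    """Return (width, height) read from the IHDR chunk of a PNG byte string."""
    if len(data) < 24 or data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG image")
    # IHDR data starts at offset 16: two big-endian 32-bit integers.
    width, height = struct.unpack(">II", data[16:24])
    return width, height


def image_frame_fields(data: bytes, frame_id: int = 0, name: str = "image") -> dict:
    """Collect values keyed by the ImageRawFrame field names (hypothetical helper)."""
    width, height = png_size(data)
    return {
        "id": frame_id,
        "name": name,  # assumption: the expected name value is not specified here
        "image": data,
        "size": [width, height],  # Width, Height
        "format": "PNG",
    }


if __name__ == "__main__":
    # Header of a 640x480 PNG, enough for the size lookup.
    header = PNG_SIGNATURE + struct.pack(">I", 13) + b"IHDR" + struct.pack(">II", 640, 480)
    print(image_frame_fields(header)["size"])
```

For JPEG input the size lookup would differ; an imaging library is the more practical choice there.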

...

Sample Code

Audio-Only Mode

i. Python (w/o interruption handling)

...