Python (SDK)

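from mka1 import SDK


with SDK(
    bearer_auth="<YOUR_BEARER_TOKEN_HERE>",
) as sdk, open("example.wav", "rb") as audio:

    # Speaker diarization (include_speaker_data=True) requires WAV/PCM input.
    # prompt and temperature are legacy parameters, used only by the
    # fallback path for non-WAV uploads.
    res = sdk.llm.speech.transcribe(
        file={
            "file_name": "example.wav",
            "content": audio,
        },
        model="auto",
        language="en",
        include_speaker_data=True,
        prompt="This is a technical podcast about machine learning.",
        temperature=0.2,
    )

    # Handle response
    print(res)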

{
  "text": "Hello, this is a sample transcription of the audio file.",
  "language": "en",
  "confidence": 0.95,
  "speakers": [
    {
      "speaker": "Speaker-1",
      "text": "Hello, this is a sample transcription of the audio file.",
      "confidence": 0.95,
      "offset_ms": 0,
      "duration_ms": 2100
    }
  ]
}

Speech

Speech to text transcription

Convert audio to text using advanced speech recognition.

Complete File Upload (Standard)

Use Content-Type: multipart/form-data to upload the complete audio file in one request. Maximum file size: 25MB.

Example:

curl -X POST "http://localhost:3000/api/v1/llm/speech/transcriptions?language=en" \
  -H "Authorization: Bearer <mka1-api-key>" \
  -F "file=@audio.flac"
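To make the request shape concrete, here is a minimal Python sketch of the multipart/form-data body that a form upload of this kind produces. The helper name encode_multipart_file is hypothetical, and the per-part Content-Type of application/octet-stream is an assumption; only the form field name file comes from this page.

```python
import uuid
from typing import Tuple


def encode_multipart_file(field_name: str, file_name: str, content: bytes) -> Tuple[bytes, str]:
    """Encode a single file as a multipart/form-data body.

    Returns (body, content_type). The boundary is random, so the returned
    content_type must be sent as the request's Content-Type header.
    """
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field_name}"; filename="{file_name}"\r\n'
        # Assumption: a generic binary type; the server may accept a specific audio type.
        "Content-Type: application/octet-stream\r\n"
        "\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + content + tail, f"multipart/form-data; boundary={boundary}"


body, content_type = encode_multipart_file("file", "audio.flac", b"demo-bytes")
print(content_type.split(";")[0])  # multipart/form-data
```

In practice an HTTP client library usually builds this body for you; the sketch only shows what travels on the wire.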

Chunked Upload (Streaming)

Use the Transfer-Encoding: chunked header to stream audio data in chunks as it is being recorded. You do not need to know the total file size upfront. The server buffers the chunks until the upload is complete and only then starts processing. Maximum total size: 25MB.

Example:

curl -X POST "http://localhost:3000/api/v1/llm/speech/transcriptions?language=en" \
  -H "Authorization: Bearer <mka1-api-key>" \
  -H "Transfer-Encoding: chunked" \
  -H "Content-Type: multipart/form-data" \
  --data-binary @audio.flac
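On the client side, chunked streaming amounts to reading the audio source piece by piece and handing each piece to the HTTP client. The following is a minimal sketch of that reading loop, with a local guard for the total size limit. The name iter_audio_chunks, the 64 KiB chunk size, and the reading of "25MB" as 25 MiB are assumptions, not part of the API.

```python
import io
from typing import BinaryIO, Iterator

MAX_TOTAL_BYTES = 25 * 1024 * 1024  # assumption: "25MB" interpreted as 25 MiB


def iter_audio_chunks(source: BinaryIO, chunk_size: int = 64 * 1024,
                      max_total: int = MAX_TOTAL_BYTES) -> Iterator[bytes]:
    """Yield successive chunks from source; raise once the total exceeds max_total."""
    sent = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            return  # end of stream
        sent += len(chunk)
        if sent > max_total:
            raise ValueError(f"audio exceeds {max_total} bytes")
        yield chunk


chunks = list(iter_audio_chunks(io.BytesIO(b"x" * 10), chunk_size=4))
print([len(c) for c in chunks])  # [4, 4, 2]
```

Some HTTP clients, for example the requests library when given a generator as the request body, send such an iterator with Transfer-Encoding: chunked automatically. Whether this endpoint expects raw audio bytes or a multipart envelope inside the chunked body is not stated here, so treat the framing as something to verify.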

Supported Formats: FLAC, MP3, MP4, MPEG, MPGA, M4A, OGG, WAV, WebM

Query Parameters:

model (optional): Transcription model identifier. Defaults to ‘auto’.
language (optional): ISO-639-1 or BCP-47 language code (e.g., “en”, “en-US”). Auto-detects if not specified.
prompt (optional): Legacy prompt parameter retained for backward compatibility.
temperature (optional): Legacy temperature parameter retained for backward compatibility.
include_speaker_data (optional): When true, the response includes speaker diarization data and the input must be WAV/PCM. When false (the default), transcription uses the standard compatibility path.

Response: Returns transcribed text in JSON format.

POST /api/v1/llm/speech/transcriptions

Authorizations

Authorization (string, header, required): Gateway auth. Send Authorization: Bearer <mka1-api-key>. For multi-user server-side integrations, you can also send X-On-Behalf-Of: <external-user-id>.

Headers

X-On-Behalf-Of (string): Optional external end-user identifier forwarded by the API gateway.
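The two headers described above can be assembled as in this small sketch. The helper name build_auth_headers and the sample values are hypothetical; the header names and the Bearer scheme are taken from this page.

```python
from typing import Dict, Optional


def build_auth_headers(api_key: str, on_behalf_of: Optional[str] = None) -> Dict[str, str]:
    """Return request headers for gateway auth, with an optional end-user id."""
    if not api_key:
        raise ValueError("api_key is required")
    headers = {"Authorization": f"Bearer {api_key}"}
    if on_behalf_of:
        # Only sent for multi-user server-side integrations.
        headers["X-On-Behalf-Of"] = on_behalf_of
    return headers


print(build_auth_headers("demo-key", on_behalf_of="user-42"))
# {'Authorization': 'Bearer demo-key', 'X-On-Behalf-Of': 'user-42'}
```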

Query Parameters

model (string, default: auto): Transcription model identifier. Defaults to 'auto', which selects the best available model. Ignored when speaker diarization is requested.
Example: "auto"

language (string): The language of the input audio in ISO-639-1 or BCP-47 format (for example 'en' or 'en-US'). If not specified, the transcription service auto-detects the language.
Example: "en"

include_speaker_data (boolean, default: false): Whether to include speaker-segment data. When true, the response includes a speakers array split by detected speaker.
Example: true

prompt (string): Legacy prompt parameter retained for backward compatibility. Used only by the fallback path for non-WAV uploads.
Example: "This is a technical podcast about machine learning."

temperature (number): Legacy temperature parameter retained for backward compatibility. Used only by the fallback path for non-WAV uploads. Required range: 0 <= x <= 1.
Example: 0.2
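As an illustration of how these query parameters combine, here is a sketch that builds the request URL and checks the temperature range locally. The function name build_transcription_url is hypothetical, the base URL is the localhost address from the curl examples, and serializing the boolean as the lowercase string true is an assumption based on the example value shown for include_speaker_data.

```python
from typing import Optional
from urllib.parse import urlencode

# Taken from the curl examples on this page; replace with your deployment's host.
BASE_URL = "http://localhost:3000/api/v1/llm/speech/transcriptions"


def build_transcription_url(model: str = "auto", language: Optional[str] = None,
                            include_speaker_data: bool = False,
                            prompt: Optional[str] = None,
                            temperature: Optional[float] = None) -> str:
    """Build the endpoint URL, omitting parameters that were not set."""
    if temperature is not None and not 0 <= temperature <= 1:
        raise ValueError("temperature must satisfy 0 <= x <= 1")
    params = {"model": model}
    if language:
        params["language"] = language
    if include_speaker_data:
        params["include_speaker_data"] = "true"  # assumption: lowercase boolean
    if prompt:
        params["prompt"] = prompt
    if temperature is not None:
        params["temperature"] = str(temperature)
    return f"{BASE_URL}?{urlencode(params)}"


print(build_transcription_url(language="en-US", temperature=0.2))
# http://localhost:3000/api/v1/llm/speech/transcriptions?model=auto&language=en-US&temperature=0.2
```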

Body

multipart/form-data

file (required): Audio file to transcribe.

File Requirements:

Maximum size: 25MB
Supported formats: FLAC, MP3, MP4, MPEG, MPGA, M4A, OGG, WAV, WebM
Speaker diarization currently requires WAV / PCM input
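The file requirements above can be checked before uploading. This is a sketch under stated assumptions: the helper name check_upload is hypothetical, the format list is mapped to lowercase file extensions (the server may inspect file content instead), "25MB" is read as 25 MiB, and the WAV / PCM requirement is approximated by a .wav extension.

```python
import os
from typing import List

MAX_FILE_BYTES = 25 * 1024 * 1024  # assumption: "25MB" interpreted as 25 MiB
# Assumption: each supported format maps to its usual file extension.
SUPPORTED_EXTENSIONS = {".flac", ".mp3", ".mp4", ".mpeg", ".mpga",
                        ".m4a", ".ogg", ".wav", ".webm"}


def check_upload(file_name: str, size_bytes: int,
                 include_speaker_data: bool = False) -> List[str]:
    """Return a list of problems; an empty list means the local checks passed."""
    problems = []
    ext = os.path.splitext(file_name)[1].lower()
    if ext not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported format: {ext or '(none)'}")
    if size_bytes > MAX_FILE_BYTES:
        problems.append("file exceeds 25MB")
    if include_speaker_data and ext != ".wav":
        problems.append("speaker diarization requires WAV/PCM input")
    return problems


print(check_upload("meeting.mp3", 1_000_000, include_speaker_data=True))
# ['speaker diarization requires WAV/PCM input']
```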

Upload Options:

Standard: Upload complete file using multipart/form-data
Chunked: Stream file chunks using Transfer-Encoding: chunked header (useful for real-time recording)

Note: For chunked uploads, the server buffers all chunks before processing. Transcription begins only after the complete file is received.

Response

200 - application/json

Response from the transcription endpoint containing the transcribed text, detected language, confidence score, and optional speaker segments.

text (string, required): The transcribed text from the audio file.

language (string): The detected or specified language code in ISO-639-1 format (e.g., 'en', 'es', 'fr').

confidence (number): Confidence score from 0 to 1, where 1 indicates the highest confidence in transcription accuracy.

speakers (object[]): Speaker diarization segments, returned only when include_speaker_data is true.
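A client might parse this response into typed objects as in the sketch below. The class and function names are hypothetical. The segment fields (speaker, text, confidence, offset_ms, duration_ms) follow the sample response on this page, and treating every field except text as optional is an assumption.

```python
import json
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SpeakerSegment:
    speaker: str
    text: str
    confidence: Optional[float] = None
    offset_ms: Optional[int] = None
    duration_ms: Optional[int] = None


@dataclass
class Transcription:
    text: str
    language: Optional[str] = None
    confidence: Optional[float] = None
    speakers: List[SpeakerSegment] = field(default_factory=list)


def parse_transcription(payload: str) -> Transcription:
    """Parse the JSON response; speakers is empty when diarization was not requested."""
    data = json.loads(payload)
    segments = [
        SpeakerSegment(
            speaker=seg["speaker"],
            text=seg["text"],
            confidence=seg.get("confidence"),
            offset_ms=seg.get("offset_ms"),
            duration_ms=seg.get("duration_ms"),
        )
        for seg in data.get("speakers") or []
    ]
    return Transcription(text=data["text"], language=data.get("language"),
                         confidence=data.get("confidence"), speakers=segments)


sample = ('{"text": "Hello.", "language": "en", "confidence": 0.95, "speakers": '
          '[{"speaker": "Speaker-1", "text": "Hello.", "confidence": 0.95, '
          '"offset_ms": 0, "duration_ms": 2100}]}')
result = parse_transcription(sample)
for seg in result.speakers:
    end_ms = seg.offset_ms + seg.duration_ms
    print(f"{seg.speaker} [{seg.offset_ms}-{end_ms} ms]: {seg.text}")
# Speaker-1 [0-2100 ms]: Hello.
```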
