Speech to text transcription
Convert audio to text using advanced speech recognition.
Complete File Upload (Standard)
Use Content-Type: multipart/form-data to upload the complete audio file in one request. Maximum file size: 25MB.
Example:
curl -X POST "http://localhost:3000/api/v1/llm/speech/transcriptions?language=en" \
-F "file=@audio.flac"
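The same standard upload can be sketched in Python with only the standard library. This is a minimal illustration, not an official client: `build_multipart` and `transcribe_file` are hypothetical helper names, and the form field name `file` is taken from the curl example above.

```python
import json
import os
import urllib.request
import uuid


def build_multipart(field_name, filename, data, content_type="application/octet-stream"):
    """Encode one file as a multipart/form-data body.

    Returns (body_bytes, content_type_header_value).
    """
    boundary = uuid.uuid4().hex  # random boundary, unlikely to occur in the audio bytes
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field_name}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n"
        "\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def transcribe_file(path, url, extra_headers=None):
    """POST a complete audio file and return the decoded JSON response."""
    with open(path, "rb") as f:
        body, content_type = build_multipart("file", os.path.basename(path), f.read())
    headers = {"Content-Type": content_type, **(extra_headers or {})}
    request = urllib.request.Request(url, data=body, headers=headers, method="POST")
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Usage would look like `transcribe_file("audio.flac", "http://localhost:3000/api/v1/llm/speech/transcriptions?language=en")`, assuming a server is listening at that address.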
Chunked Upload (Streaming)
Use the Transfer-Encoding: chunked header to stream audio data in chunks as it is being recorded. The total file size does not need to be known upfront. The server buffers the chunks and starts processing only once the upload is complete. Maximum total size: 25MB.
Example:
curl -X POST "http://localhost:3000/api/v1/llm/speech/transcriptions?language=en" \
-H "Transfer-Encoding: chunked" \
-H "Content-Type: multipart/form-data" \
--data-binary @audio.flac
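A chunked upload can be sketched in Python with `http.client`, which chunk-encodes an iterable body when `encode_chunked=True` is passed alongside a Transfer-Encoding header. The helper names `iter_chunks` and `stream_transcription` are hypothetical, the headers mirror the curl example above, and the 25MB limit is assumed to mean 25 * 1024 * 1024 bytes.

```python
import http.client

MAX_BYTES = 25 * 1024 * 1024  # assumption: "25MB" means 25 * 1024 * 1024 bytes


def iter_chunks(stream, chunk_size=64 * 1024, limit=MAX_BYTES):
    """Yield chunks from a binary stream, stopping with an error past the size limit."""
    sent = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            return
        sent += len(chunk)
        if sent > limit:
            raise ValueError(f"audio exceeds the {limit}-byte upload limit")
        yield chunk


def stream_transcription(stream, host="localhost", port=3000,
                         path="/api/v1/llm/speech/transcriptions?language=en"):
    """Stream audio with chunked transfer encoding; returns (status, body_bytes)."""
    conn = http.client.HTTPConnection(host, port)
    try:
        conn.request(
            "POST",
            path,
            body=iter_chunks(stream),
            headers={
                "Transfer-Encoding": "chunked",
                "Content-Type": "multipart/form-data",  # as in the curl example
            },
            encode_chunked=True,  # let http.client frame each yielded chunk
        )
        response = conn.getresponse()
        return response.status, response.read()
    finally:
        conn.close()
```

Because the generator checks the running total, an oversized recording fails on the client before the whole stream has been sent.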
Supported Formats: FLAC, MP3, MP4, MPEG, MPGA, M4A, OGG, WAV, WebM
Query Parameters:
- model (optional): Transcription model identifier. Defaults to 'auto'.
- language (optional): ISO-639-1 or BCP-47 language code (e.g., "en", "en-US"). Auto-detects if not specified.
- prompt (optional): Legacy prompt parameter retained for backward compatibility.
- temperature (optional): Legacy temperature parameter retained for backward compatibility.
- include_speaker_data (optional): When true, include speaker diarization data and require WAV/PCM input. Otherwise transcription uses the standard compatibility path.
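As a rough sketch, the query string can be assembled with `urllib.parse.urlencode` so that only the parameters actually set are sent. `build_transcription_url` is a hypothetical helper, and serializing the boolean as the lowercase string `true`/`false` is an assumption about what the server accepts.

```python
from urllib.parse import urlencode

BASE_URL = "http://localhost:3000/api/v1/llm/speech/transcriptions"


def build_transcription_url(model=None, language=None, include_speaker_data=None,
                            prompt=None, temperature=None, base_url=BASE_URL):
    """Return the endpoint URL with only the supplied query parameters attached."""
    params = {}
    if model is not None:
        params["model"] = model
    if language is not None:
        params["language"] = language
    if include_speaker_data is not None:
        # Assumption: booleans are sent as lowercase "true" / "false".
        params["include_speaker_data"] = "true" if include_speaker_data else "false"
    if prompt is not None:
        params["prompt"] = prompt
    if temperature is not None:
        params["temperature"] = str(temperature)
    query = urlencode(params)  # percent-encodes values such as the prompt text
    return f"{base_url}?{query}" if query else base_url
```

Omitting a parameter entirely leaves the server to apply its documented default, such as model 'auto' or language auto-detection.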
Response: Returns transcribed text in JSON format.
Authorizations
Gateway auth: send Authorization: Bearer <mka1-api-key>. For multi-user server-side integrations, you can also send X-On-Behalf-Of: <external-user-id>.
Headers
X-On-Behalf-Of: Optional external end-user identifier forwarded by the API gateway.
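The authorization headers can be built in one place, for instance as below. `auth_headers` is a hypothetical helper; the header names come from the Authorizations text above.

```python
def auth_headers(api_key, on_behalf_of=None):
    """Return the gateway auth headers, adding X-On-Behalf-Of only when given."""
    if not api_key:
        raise ValueError("an API key is required")
    headers = {"Authorization": f"Bearer {api_key}"}
    if on_behalf_of:
        # Only relevant for multi-user server-side integrations.
        headers["X-On-Behalf-Of"] = on_behalf_of
    return headers
```

The returned dictionary can be merged into the headers of whichever HTTP client sends the request.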
Query Parameters
model (optional): Transcription model identifier. Defaults to 'auto', which selects the best available model. Ignored when speaker diarization is requested.
Example: "auto"
language (optional): The language of the input audio in ISO-639-1 or BCP-47 format (for example 'en' or 'en-US'). If not specified, the transcription service auto-detects the language.
Example: "en"
include_speaker_data (optional): Whether to include speaker-segment data. Defaults to false. When true, the response includes a speakers array split by detected speaker.
Example: true
prompt (optional): Legacy prompt parameter retained for backward compatibility. Used only by the fallback path for non-WAV uploads.
Example: "This is a technical podcast about machine learning."
temperature (optional): Legacy temperature parameter retained for backward compatibility. Used only by the fallback path for non-WAV uploads.
Required range: 0 <= x <= 1
Example: 0.2
Body
file: Audio file to transcribe.
File Requirements:
- Maximum size: 25MB
- Supported formats: FLAC, MP3, MP4, MPEG, MPGA, M4A, OGG, WAV, WebM
- Speaker diarization currently requires WAV / PCM input
Upload Options:
- Standard: Upload complete file using multipart/form-data
- Chunked: Stream file chunks using the Transfer-Encoding: chunked header (useful for real-time recording)
Note: For chunked uploads, the server buffers all chunks before processing. Transcription begins only after the complete file is received.
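The file requirements above can be checked on the client before uploading. The sketch below is illustrative: `check_audio` is a hypothetical helper, it assumes the file extension reflects the actual format, and it assumes 25MB means 25 * 1024 * 1024 bytes.

```python
import os
from typing import List

MAX_BYTES = 25 * 1024 * 1024  # assumption: "25MB" means 25 * 1024 * 1024 bytes
SUPPORTED = {"flac", "mp3", "mp4", "mpeg", "mpga", "m4a", "ogg", "wav", "webm"}


def check_audio(filename: str, size_bytes: int, include_speaker_data: bool = False) -> List[str]:
    """Return a list of problems; an empty list means the file looks uploadable."""
    problems = []
    ext = os.path.splitext(filename)[1].lower().lstrip(".")
    if ext not in SUPPORTED:
        problems.append(f"unsupported format: {ext or '(none)'}")
    if size_bytes > MAX_BYTES:
        problems.append(f"file is {size_bytes} bytes; the limit is {MAX_BYTES}")
    if include_speaker_data and ext != "wav":
        # Speaker diarization currently requires WAV / PCM input.
        problems.append("speaker diarization requires WAV input")
    return problems
```

For example, `check_audio("meeting.mp3", 1024, include_speaker_data=True)` reports only the diarization problem, since MP3 is otherwise a supported format.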
Response
OK
Response from the transcription endpoint containing the transcribed text, detected language, confidence score, and optional speaker segments.
The transcribed text from the audio file
The detected or specified language code in ISO-639-1 format (e.g., 'en', 'es', 'fr')
Confidence score from 0 to 1 where 1 indicates highest confidence in transcription accuracy
Speaker diarization segments (the speakers array), returned only when include_speaker_data is true.
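A response body could be parsed along these lines. Note that the JSON keys `text`, `language`, and `confidence` are assumptions made for illustration, since this page describes the fields without naming them; only `speakers` is named in the parameter description. `Transcription` and `parse_transcription` are hypothetical names.

```python
import json
from dataclasses import dataclass, field
from typing import Any, List, Optional


@dataclass
class Transcription:
    text: str
    language: Optional[str] = None
    confidence: Optional[float] = None
    speakers: List[Any] = field(default_factory=list)


def parse_transcription(raw: str) -> Transcription:
    """Decode a JSON response body; missing optional fields fall back to defaults."""
    data = json.loads(raw)
    return Transcription(
        text=data.get("text", ""),              # assumed key name
        language=data.get("language"),          # assumed key name
        confidence=data.get("confidence"),      # assumed key name
        speakers=data.get("speakers") or [],    # present only with include_speaker_data
    )
```

Check the key names against a real response before relying on this shape.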