API Documentation

Overview

POST /api/synthesize

Convert plain text or Markdown into spoken audio and WebVTT subtitles.

Request

Send JSON to generate audio

Headers

Content-Type: application/json
Accept: application/json

Body

{
  "text": "Your text or markdown content here"
}

Send a JSON payload with a single text field containing the plain text or Markdown to synthesize. For Markdown input, front matter is parsed automatically.
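As a minimal sketch of the request described above, the following Python builds the POST request using only the standard library. The helper name build_synthesize_request is hypothetical, and the URL is taken from the cURL example in this document. Nothing is sent until the returned object is passed to urllib.request.urlopen.

```python
import json
import urllib.request

# Endpoint copied from the cURL example in this document.
API_URL = "https://voxify-labs.gaidot.net/api/synthesize"


def build_synthesize_request(text: str, url: str = API_URL) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying a single "text" field.

    Hypothetical helper for illustration; not part of the API itself.
    """
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Accept": "application/json",
        },
    )


request = build_synthesize_request("Hello, this is a test.")
print(request.get_method(), request.full_url)
```

Using json.dumps rather than string formatting keeps quotes and newlines in Markdown input correctly escaped.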

Response

Receive audio and subtitles

JSON Payload

{
  "audio_base64": "SUQzBAAAAAAAI1RTU0UAAAAPAAADTGF2ZjYx...",
  "vtt": "WEBVTT\n\n00:00.000 --> 00:01.000\nHello"
}

The audio_base64 field contains the MP3 audio encoded as Base64 (truncated in the sample above). The vtt field contains the subtitle cues in WebVTT format.
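A client typically needs to turn these two fields back into usable data. The sketch below assumes the response body has already been parsed into a Python dict. The function name decode_synthesis_response is hypothetical, and the sample payload is a stand-in that holds only the three bytes "ID3", not real audio.

```python
import base64


def decode_synthesis_response(payload: dict) -> tuple[bytes, str]:
    """Return (mp3_bytes, vtt_text) from a parsed response payload.

    Hypothetical helper; assumes both documented fields are present.
    """
    # validate=True rejects characters outside the Base64 alphabet
    # instead of silently skipping them.
    audio = base64.b64decode(payload["audio_base64"], validate=True)
    return audio, payload["vtt"]


# Stand-in payload for illustration only (not real MP3 data).
sample = {
    "audio_base64": base64.b64encode(b"ID3").decode("ascii"),
    "vtt": "WEBVTT\n\n00:00.000 --> 00:01.000\nHello",
}
audio, vtt = decode_synthesis_response(sample)
print(len(audio), vtt.splitlines()[0])
```

The returned bytes can be written to a file opened in binary mode ("wb") and the subtitle text to a file opened in text mode; any file names are up to the caller.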

Notes

The API currently returns the full audio payload in a single JSON response. Base64 encoding adds roughly one third to the size of the MP3 data, so large inputs produce large responses and take longer to process.
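To get a feel for response sizes, the following sketch computes the length of standard padded Base64 for a given number of raw bytes. The function name is hypothetical and the 3 MB figure is an arbitrary illustration, not a measurement of this API. The estimate covers only the audio_base64 string, not the vtt field or the JSON framing.

```python
def base64_encoded_length(raw_bytes: int) -> int:
    """Length in characters of padded Base64 for raw_bytes bytes of input.

    Base64 emits 4 characters for every group of 3 input bytes,
    padding the final group if it is incomplete.
    """
    return 4 * ((raw_bytes + 2) // 3)


# An illustrative 3,000,000-byte MP3 and its encoded size in characters.
mp3_size = 3_000_000
print(mp3_size, "->", base64_encoded_length(mp3_size))
```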

Example

cURL

curl -X POST https://voxify-labs.gaidot.net/api/synthesize \
  -H "Content-Type: application/json" \
  -H "Accept: application/json" \
  -d '{
    "text": "Hello, this is a text-to-speech synthesis test."
  }'