audio-processing

Server path: /audio-processing | Type: Embedded | PCID required: No

Tools

Tool	Description
`audio-processing_text_to_speech`	Convert text to speech using the minimax/speech-02-turbo model. Generates high-quality audio from text input with configurable voice settings. Returns a URL to the generated audio file.
`audio-processing_transcribe_audio_or_video`	Transcribe audio or video files to text using Deepgram AI. Supports speaker diarization, automatic punctuation, summaries, topic extraction, and sentiment analysis. Returns the transcription text and metadata.

audio-processing_text_to_speech

Convert text to speech using the minimax/speech-02-turbo model. Generates high-quality audio from text input with configurable voice settings. Returns a URL to the generated audio file.

Parameters:

Parameter	Type	Required	Default	Description
`text`	string	Yes	—	Text to convert to speech, up to 5000 characters. Insert `<#x#>` between words to add a pause of x seconds (0.01 to 99.99).
`voice_id`	string	No	`"Wise_Woman"`	Voice ID for text-to-speech generation
`pitch`	number	No	`0`	Speech pitch, from -12 to 12
`speed`	number	No	`1`	Speech speed multiplier, from 0.5 to 2
`volume`	number	No	`1`	Speech volume level, from 0 to 10
`emotion`	string	No	`"auto"`	Emotion to apply to the speech
`sample_rate`	number	No	`32000`	Audio sample rate
`language_boost`	string	No	`"None"`	Language enhancement for better pronunciation
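
The pause syntax described for `text` can be sketched with a small helper. This is only an illustration: the function names `pause` and `join_with_pauses` are hypothetical, the two-decimal formatting and the spaces around the marker are assumptions, and whether the marker itself counts toward the 5000-character limit is not stated on this page (the sketch assumes it does).

```python
def pause(seconds: float) -> str:
    """Return a pause marker such as <#0.50#> for the given duration."""
    if not 0.01 <= seconds <= 99.99:
        raise ValueError("pause must be between 0.01 and 99.99 seconds")
    return f"<#{seconds:.2f}#>"


def join_with_pauses(parts: list[str], seconds: float) -> str:
    """Join text fragments, placing the same pause marker between them."""
    text = f" {pause(seconds)} ".join(parts)
    # Assumption: markers are counted as part of the 5000-character limit.
    if len(text) > 5000:
        raise ValueError("text exceeds 5000 characters")
    return text


print(join_with_pauses(["Welcome back.", "Here is today's summary."], 1.5))
```

The resulting string would be passed as the `text` parameter.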

inputSchema:

{
  "type": "object",
  "properties": {
    "text": {
      "type": "string",
      "description": "Text to convert to speech. Maximum 5000 characters. Use <#x#> between words to control pause duration (0.01-99.99s)"
    },
    "voice_id": {
      "type": "string",
      "enum": [
        "Wise_Woman",
        "Friendly_Person",
        "Inspirational_girl",
        "Deep_Voice_Man",
        "Calm_Woman",
        "Casual_Guy",
        "Lively_Girl",
        "Patient_Man",
        "Young_Knight",
        "Determined_Man",
        "Lovely_Girl",
        "Decent_Boy",
        "Imposing_Manner",
        "Elegant_Man",
        "Abbess",
        "Sweet_Girl_2",
        "Exuberant_Girl"
      ],
      "default": "Wise_Woman",
      "description": "Voice ID for text-to-speech generation"
    },
    "pitch": {
      "type": "number",
      "default": 0,
      "description": "Speech pitch (-12 to 12, default: 0)"
    },
    "speed": {
      "type": "number",
      "default": 1,
      "description": "Speech speed multiplier (0.5 to 2, default: 1)"
    },
    "volume": {
      "type": "number",
      "default": 1,
      "description": "Speech volume level (0 to 10, default: 1)"
    },
    "emotion": {
      "type": "string",
      "enum": [
        "auto",
        "neutral",
        "happy",
        "sad",
        "angry",
        "fearful",
        "disgusted",
        "surprised"
      ],
      "default": "auto",
      "description": "Emotion to apply to speech (default: auto)"
    },
    "sample_rate": {
      "type": "number",
      "default": 32000,
      "description": "Audio sample rate (default: 32000)"
    },
    "language_boost": {
      "type": "string",
      "enum": [
        "None",
        "Automatic",
        "Chinese",
        "Chinese,Yue",
        "English",
        "Arabic",
        "Russian",
        "Spanish",
        "French",
        "Portuguese",
        "German",
        "Turkish",
        "Dutch",
        "Ukrainian",
        "Vietnamese",
        "Indonesian",
        "Japanese",
        "Italian",
        "Korean",
        "Thai",
        "Polish",
        "Romanian",
        "Greek",
        "Czech",
        "Finnish",
        "Hindi"
      ],
      "default": "None",
      "description": "Language enhancement for better pronunciation"
    }
  },
  "required": [
    "text"
  ]
}
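
As a rough sketch of how a client might call this tool, the snippet below builds a JSON-RPC `tools/call` request in the shape used by the Model Context Protocol. The helper name `build_tts_request` is hypothetical, and how the request is transported to `/audio-processing` depends on the client and is not covered here. The schema above declares no `minimum` or `maximum`, so the range checks are taken from the parameter descriptions and may not match what the server enforces.

```python
import json

# Ranges taken from the parameter descriptions, not from schema constraints.
TTS_RANGES = {"pitch": (-12, 12), "speed": (0.5, 2), "volume": (0, 10)}


def build_tts_request(text: str, request_id: int = 1, **options) -> dict:
    """Build a tools/call request for audio-processing_text_to_speech."""
    if not text or len(text) > 5000:
        raise ValueError("text must contain 1 to 5000 characters")
    for name, (low, high) in TTS_RANGES.items():
        if name in options and not low <= options[name] <= high:
            raise ValueError(f"{name} must be between {low} and {high}")
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "audio-processing_text_to_speech",
            "arguments": {"text": text, **options},
        },
    }


request = build_tts_request("Hello there.", voice_id="Calm_Woman", speed=1.2)
print(json.dumps(request, indent=2))
```

Optional parameters that are left out fall back to the defaults listed in the schema.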

audio-processing_transcribe_audio_or_video

Transcribe audio or video files to text using Deepgram AI. Supports speaker diarization, automatic punctuation, summaries, topic extraction, and sentiment analysis. Returns the transcription text and metadata.

Parameters:

Parameter	Type	Required	Default	Description
`fileUrl`	string	Yes	—	URL of the audio or video file to transcribe
`model`	string	No	`"nova-3"`	Deepgram model to use. `nova-3` is the latest and most accurate.
`languageCode`	string	No	—	Language code (e.g., `"en"`, `"es"`, `"fr"`). Auto-detected when omitted.
`enableDiarization`	boolean	No	`false`	Identify different speakers in the audio
`diarizationSpeakerCount`	number	No	—	Expected number of speakers. Required when `enableDiarization` is true.
`enableParagraphs`	boolean	No	`false`	Format output into paragraphs
`enableSummary`	boolean	No	`false`	Generate AI summary of the content
`enableTopics`	boolean	No	`false`	Extract key topics discussed
`enableSentiment`	boolean	No	`false`	Analyze sentiment (positive/negative/neutral)
`redact`	string[]	No	—	Redaction options to remove sensitive information. Common options: `pci` (credit card info), `pii` (personally identifiable info), `phi` (protected health info), `numbers` (numerical entities), `ssn` (social security numbers). Specific entity types such as `credit_card` or `email_address` are also accepted for pre-recorded audio; the inputSchema lists every allowed value.
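
The table marks `diarizationSpeakerCount` as optional, yet its description makes it required once `enableDiarization` is true. A client-side check for that dependency might look like the sketch below; `check_transcribe_args` is a hypothetical helper, and the server's own validation behavior is not documented on this page.

```python
def check_transcribe_args(args: dict) -> list[str]:
    """Return a list of problems found in the arguments (empty if none)."""
    problems = []
    if not args.get("fileUrl"):
        problems.append("fileUrl is required")
    # Conditional requirement taken from the parameter description.
    if args.get("enableDiarization") and "diarizationSpeakerCount" not in args:
        problems.append(
            "diarizationSpeakerCount is required when enableDiarization is true"
        )
    return problems


# https://example.com/... is a placeholder URL, not a real recording.
print(check_transcribe_args(
    {"fileUrl": "https://example.com/meeting.mp4", "enableDiarization": True}
))
```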

inputSchema:

{
  "type": "object",
  "properties": {
    "fileUrl": {
      "type": "string",
      "description": "URL of the audio or video file to transcribe"
    },
    "model": {
      "type": "string",
      "enum": [
        "nova-3",
        "nova-2",
        "enhanced",
        "base"
      ],
      "default": "nova-3",
      "description": "Deepgram model to use. nova-3 is the latest and most accurate (default)"
    },
    "languageCode": {
      "type": "string",
      "description": "Language code (e.g., \"en\", \"es\", \"fr\"). Default: auto-detect"
    },
    "enableDiarization": {
      "type": "boolean",
      "default": false,
      "description": "Identify different speakers in the audio"
    },
    "diarizationSpeakerCount": {
      "type": "number",
      "description": "Expected number of speakers (required if enableDiarization is true)"
    },
    "enableParagraphs": {
      "type": "boolean",
      "default": false,
      "description": "Format output into paragraphs"
    },
    "enableSummary": {
      "type": "boolean",
      "default": false,
      "description": "Generate AI summary of the content"
    },
    "enableTopics": {
      "type": "boolean",
      "default": false,
      "description": "Extract key topics discussed"
    },
    "enableSentiment": {
      "type": "boolean",
      "default": false,
      "description": "Analyze sentiment (positive/negative/neutral)"
    },
    "redact": {
      "type": "array",
      "items": {
        "type": "string",
        "enum": [
          "pci",
          "pii",
          "phi",
          "numbers",
          "ssn",
          "aggressive_numbers",
          "credit_card",
          "credit_card_expiration",
          "cvv",
          "email_address",
          "phone_number",
          "account_number",
          "age",
          "date",
          "date_interval",
          "dob",
          "driver_license",
          "healthcare_number",
          "ip_address",
          "location",
          "location_address",
          "location_zip",
          "location_coordinate",
          "money",
          "numerical_pii",
          "passport_number",
          "password",
          "time",
          "vehicle_id",
          "statistics",
          "bank_account",
          "routing_number"
        ]
      },
      "description": "Redaction options to remove sensitive information. Common: pci (credit card info), pii (personally identifiable info), phi (protected health info), numbers (numerical entities), ssn (social security numbers). Supports all options plus specific entity types for pre-recorded audio."
    }
  },
  "required": [
    "fileUrl"
  ]
}
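
For illustration, the arguments for a two-speaker recording with card and social security numbers redacted could be assembled as follows. The helper `transcribe_arguments` and the example URL are hypothetical; the sketch simply keeps `enableDiarization` and `diarizationSpeakerCount` in step and leaves every other option at its schema default.

```python
import json


def transcribe_arguments(file_url: str, speaker_count=None, redact=()) -> dict:
    """Assemble arguments for audio-processing_transcribe_audio_or_video."""
    args = {"fileUrl": file_url}
    if speaker_count is not None:
        # Set both fields together so the conditional requirement is met.
        args["enableDiarization"] = True
        args["diarizationSpeakerCount"] = speaker_count
    if redact:
        # Values should come from the redact enum in the schema above.
        args["redact"] = list(redact)
    return args


arguments = transcribe_arguments(
    "https://example.com/support-call.mp3",  # placeholder URL
    speaker_count=2,
    redact=["pci", "ssn"],
)
print(json.dumps(arguments, indent=2))
```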
