Server path: /audio-processing | Type: Embedded | PCID required: No

Tools

| Tool | Description |
| --- | --- |
| audio-processing_text_to_speech | Convert text to speech using the minimax/speech-02-turbo model. Generates high-quality audio from text input with configurable voice settings. Returns a URL to the generated audio file. |
| audio-processing_transcribe_audio_or_video | Transcribe audio or video files to text using Deepgram AI. Supports speaker diarization, automatic punctuation, summaries, topic extraction, and sentiment analysis. Returns the transcription text and metadata. |

audio-processing_text_to_speech

Convert text to speech using the minimax/speech-02-turbo model. Generates high-quality audio from text input with configurable voice settings. Returns a URL to the generated audio file.

Parameters:

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| text | string | Yes | | Text to convert to speech. Maximum 5000 characters. Insert `<#x#>` between words to add a pause, where x is the pause duration in seconds (0.01-99.99) |
| voice_id | string | No | "Wise_Woman" | Voice ID for text-to-speech generation |
| pitch | number | No | | Speech pitch (-12 to 12, default: 0) |
| speed | number | No | | Speech speed multiplier (0.5 to 2, default: 1) |
| volume | number | No | | Speech volume level (0 to 10, default: 1) |
| emotion | string | No | | Emotion to apply to speech (default: auto) |
| sample_rate | number | No | | Audio sample rate (default: 32000) |
| language_boost | string | No | | Language enhancement for better pronunciation |
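
The limits listed above can be checked on the client before the tool is called. The following is a minimal sketch, not part of the server: the helper names `pause` and `build_tts_arguments` and the `TTS_LIMITS` table are hypothetical, and only the parameter names, ranges, and the `"Wise_Woman"` default come from this page. It assumes pause markers count toward the 5000-character limit and that a two-decimal duration such as `<#0.50#>` is accepted. How the resulting arguments are sent to the server is not covered here.

```python
from typing import Any, Dict, Optional

MAX_TEXT_CHARS = 5000  # limit stated for the `text` parameter

# Inclusive (low, high) ranges copied from the parameter table.
TTS_LIMITS = {"pitch": (-12, 12), "speed": (0.5, 2), "volume": (0, 10)}


def pause(seconds: float) -> str:
    """Return a pause marker in the <#x#> form, e.g. <#0.50#>.

    Formatting to two decimals is an assumption of this sketch.
    """
    if not 0.01 <= seconds <= 99.99:
        raise ValueError("pause must be between 0.01 and 99.99 seconds")
    return f"<#{seconds:.2f}#>"


def build_tts_arguments(
    text: str,
    voice_id: str = "Wise_Woman",
    pitch: Optional[float] = None,
    speed: Optional[float] = None,
    volume: Optional[float] = None,
) -> Dict[str, Any]:
    """Validate inputs and return an arguments dict for the tool call."""
    if not text:
        raise ValueError("text is required")
    if len(text) > MAX_TEXT_CHARS:
        raise ValueError(f"text exceeds {MAX_TEXT_CHARS} characters")

    args: Dict[str, Any] = {"text": text, "voice_id": voice_id}
    supplied = {"pitch": pitch, "speed": speed, "volume": volume}
    for name, value in supplied.items():
        if value is None:
            continue  # omit the key so the server-side default applies
        low, high = TTS_LIMITS[name]
        if not low <= value <= high:
            raise ValueError(f"{name} must be between {low} and {high}")
        args[name] = value
    return args


if __name__ == "__main__":
    text = "Welcome back." + pause(0.5) + "Here is today's summary."
    print(build_tts_arguments(text, speed=1.25))
```

Leaving optional keys out of the dict, rather than sending explicit defaults, keeps the request small and lets the server apply whatever defaults it currently documents.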

audio-processing_transcribe_audio_or_video

Transcribe audio or video files to text using Deepgram AI. Supports speaker diarization, automatic punctuation, summaries, topic extraction, and sentiment analysis. Returns the transcription text and metadata.

Parameters:

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| fileUrl | string | Yes | | URL of the audio or video file to transcribe |
| model | string | No | | Deepgram model to use. nova-3, the latest and most accurate model, is the default |
| languageCode | string | No | | Language code (e.g., "en", "es", "fr"). Default: auto-detect |
| enableDiarization | boolean | No | | Identify different speakers in the audio |
| diarizationSpeakerCount | number | No | | Expected number of speakers (required if enableDiarization is true) |
| enableParagraphs | boolean | No | | Format output into paragraphs |
| enableSummary | boolean | No | | Generate an AI summary of the content |
| enableTopics | boolean | No | | Extract key topics discussed |
| enableSentiment | boolean | No | | Analyze sentiment (positive/negative/neutral) |
| redact | string[] | No | | Redaction options for removing sensitive information. Common values: pci (credit card information), pii (personally identifiable information), phi (protected health information), numbers (numerical entities), ssn (social security numbers). For pre-recorded audio, all of these options are supported, along with specific entity types |
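
The dependency between enableDiarization and diarizationSpeakerCount is easy to miss, so a small client-side builder can enforce it. This is a sketch under stated assumptions: the function `build_transcription_arguments` and the `FEATURE_FLAGS` tuple are hypothetical helpers, the example URL is a placeholder, and only the parameter names come from this page. Redaction values are passed through unchecked because the full list of accepted entity types is not given here.

```python
from typing import Any, Dict, Iterable, Optional

# Boolean feature switches named in the parameter table.
FEATURE_FLAGS = (
    "enableParagraphs",
    "enableSummary",
    "enableTopics",
    "enableSentiment",
)


def build_transcription_arguments(
    file_url: str,
    *,
    language_code: Optional[str] = None,
    speaker_count: Optional[int] = None,
    features: Iterable[str] = (),
    redact: Optional[Iterable[str]] = None,
) -> Dict[str, Any]:
    """Return an arguments dict for the transcription tool call."""
    if not file_url:
        raise ValueError("fileUrl is required")

    args: Dict[str, Any] = {"fileUrl": file_url}

    # Omitting languageCode leaves language auto-detection in effect.
    if language_code is not None:
        args["languageCode"] = language_code

    # The speaker count is required whenever diarization is enabled,
    # so this sketch sets both keys together or neither.
    if speaker_count is not None:
        if speaker_count < 1:
            raise ValueError("speaker_count must be at least 1")
        args["enableDiarization"] = True
        args["diarizationSpeakerCount"] = speaker_count

    for flag in features:
        if flag not in FEATURE_FLAGS:
            raise ValueError(f"unknown feature flag: {flag}")
        args[flag] = True

    if redact:
        args["redact"] = list(redact)  # passed through as string[]
    return args


if __name__ == "__main__":
    print(
        build_transcription_arguments(
            "https://example.com/meeting.mp3",  # placeholder URL
            speaker_count=2,
            features=["enableSummary", "enableTopics"],
            redact=["pii", "pci"],
        )
    )
```

Deriving enableDiarization from the presence of a speaker count means a caller cannot enable diarization without supplying the count it requires.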