Scade knowledge base

Transcribe audio to text (Whisper)

Whisper is a speech recognition model that converts spoken audio into text.

  • You can provide your audio in two ways:

    • From the Internet: Provide a direct link (URL) to the audio file.

    • From Your Computer: Upload the audio file directly from your device.

  • File String: Pass the audio content as a string instead of uploading a file, if needed.

  • Priority Queue: Set the priority with which your request is processed in the queue.

  • File URL: Enter the direct URL of the audio file.

  • Group Segments: Decide if you want to group transcription segments together.

  • Transcript Output Format:

    • Words: Get a plain text transcript containing only words.

    • Segments: Get a transcript with timestamps for creating subtitles.

  • Num Speakers: Specify the number of speakers in the recording, or let Whisper detect it automatically.

  • Language: Select the language of the audio file, or let Whisper automatically detect it for you.

  • Prompt: Add specific words or phrases that occur in your audio (for example, names or technical terms) to make the transcription more accurate.

  • Offset Seconds: Set a starting point, in seconds, if you need the transcription to begin at a specific point in the audio.
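
The Segments output format is described above as useful for creating subtitles. As a rough illustration, the Python sketch below turns a list of timestamped segments into SRT subtitle text. The segment shape (a dictionary with "start", "end" and "text" keys, times in seconds) is an assumption made for this example, not the documented output of the node, so check the actual output of your flow and adjust the keys accordingly. The function names and the shift_seconds parameter are also hypothetical helpers.

```python
def to_srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments, shift_seconds: float = 0.0) -> str:
    """Build SRT subtitle text from timestamped segments.

    Assumed segment shape (hypothetical): {"start": float, "end": float, "text": str}.
    shift_seconds is added to every timestamp. It may be useful if the
    timestamps turn out to be relative to the Offset Seconds starting point,
    which is an assumption to verify against your own output.
    """
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = to_srt_timestamp(seg["start"] + shift_seconds)
        end = to_srt_timestamp(seg["end"] + shift_seconds)
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}")
    # SRT entries are separated by a blank line.
    return "\n\n".join(blocks) + "\n"


# Example segments, invented for illustration only.
example_segments = [
    {"start": 0.0, "end": 2.5, "text": " Hello there. "},
    {"start": 2.5, "end": 4.0, "text": "Welcome."},
]
print(segments_to_srt(example_segments))
```

If you only need plain text, the Words format avoids this step entirely. The conversion is worth doing only when the timestamps matter, for example when adding subtitles to a video.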


Last updated 10 months ago