5 minute challenge: summarize audio

This guide will show you how to swiftly build from scratch a workflow that transcribes and summarizes audio recordings

  1. Go to https://app.scade.pro/flow/ and click the New flow button and then choose Start with blank

  1. There will already be Start end End nodes on the canvas. Place everything else between them. Go to the Settings of the Start Node Form and click Configure fields to start adding your data

  1. Add one field, call it, for example, audio, and change the input type to String/URI to make it suitable for storing files. Click Save

  1. You can upload the audio file from your computer or, alternatively, use the Set url button if you have a link to the file. We'll use Carl Sagan's "The pale blue dot" speech as an example file. Here's the link: https://tile.loc.gov/storage-services/media/ls/sagan/1958124-3-1.mp3

  1. Hit the Execute button and then click Save and execute to run this node and generate the output. We need it to move further.

  1. Use the search panel on the left to find a processor that transcribes audio. If you already know the name of your desired processor, type it. Alternatively, you can type what you want to do — like audio to text there and pick by the name and description. whisperx-a40-large looks promising, drag it to the canvas

  1. Connect the audio output of the Start node with the input of the whisperx node and click the Execute button on the latter. It will take a while to transcribe an audio file

  1. Drag the ChatGPT processor to your canvas, connect the output of the whisperx with the ChatGPT's input and go to the settings of the latter. Review the instructions in paragraphs 6-8 if needed. You can move your nodes around whatever you like to create a convinient workspace

Locate the Messages section in the settings and click the pencil to add instructions for the ChatGPT

  1. Click Add message. Change the type of the message to System as recommended by OpenAI to emphasize that it's a high-level instruction and to guide the model's behaviour

The prompt here is rather straightforward: summarize a following audio transcript

  1. Add another message and click the # symbol. The Expression editor will open. In case of the ChatGPT node we need it to get data from other nodes

There is a list of the nodes on the left. Click on the whisprex-a40-large to see its output data. Keep going until you end up with segments — that's where the transcription is stored. Drag the segments to the expression field

The result should look similar to this. Don't forget to click the Save buttons in the Expression editor and before closing the node settings.

Click the Execute button on the ChatGPT node after you set everything

  1. In principle, you are good to go now: the ChatGPT node generates summary, and you can copy and use it. But let's do some housekeeping to make sure that this workflow will be reusable and suitable to API integrations. Open settings of the End Node Form

Add a field and name it, there's no need to change the field type now

Connect the success output of the ChatGPT node with the newly generated input of the End node and execute the End node

  1. Now you have your result in the End node. Why it's important? We'll cover this later If you hover the text you'll see the button to copy and take it elsewhere

  1. Once you have the workflow built, there's no need to run nodes one by one anymore. Simply change the file in the Start node and use the Play button on the top panel to launch the entire workflow.

  1. And one last thing. It's a good idea to give your workflow a comprehensive name. Locate the cog in the top bar, fill in the suitable Flow name and click Submit

Last updated