
Speech To Text

This task transcribes the audio files listed in an XML input file and writes the resulting text to an XML output file.

Settings

Language

Optional
The language of the input audio. Supplying the input language in ISO-639-1 format will improve accuracy and latency.

This setting will be overridden if the Language attribute is provided in the input file.
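The sketch below shows how the per-file Language attribute can override the task-level setting. It assumes that one input file may contain several Transcription elements, each with its own Options element, and that any other ISO-639-1 code (here fr) is passed through to the model. The file paths are hypothetical.

```xml
<?xml version="1.0"?>
<Transcriptions>
    <Options Model="whisper-1" />
    <Transcription Id="1">
        <!-- French audio: overrides the task's Language setting for this file only -->
        <Options Language="fr" />
        <File>C:\Audio\appel_client.mp3</File>
    </Transcription>
    <Transcription Id="2">
        <!-- English audio: each Transcription declares its own ISO-639-1 code -->
        <Options Language="en" />
        <File>C:\Audio\customer_call.mp3</File>
    </Transcription>
</Transcriptions>
```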

Model

Required
The ID of the model to use for the transcription. Please refer to the model endpoint compatibility table for details on which models work with the Audio API. Currently, only whisper-1 is supported.

This setting will be overridden if the Model attribute is provided in the input file.

Prompt

Optional
Optional text to guide the model's style or to continue a previous audio segment. The prompt should be in the same language as the audio. More guidance on writing prompts can be found in the OpenAI speech to text documentation.

This setting will be overridden if the Prompt attribute is provided in the input file.
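As an illustration of continuing a previous audio segment, the sketch below assumes a recording has been split into two hypothetical files and that the first half has already been transcribed. The end of the first half's transcript is supplied as the Prompt for the second half, so the model picks up in the same style and language. The file path and split are assumptions for illustration only.

```xml
<?xml version="1.0"?>
<Transcriptions>
    <Options Model="whisper-1" />
    <Transcription Id="2">
        <!-- Hypothetical second half of a split recording.
             The prompt repeats the last sentence of the first half's transcript. -->
        <Options Language="en" Prompt="It looks like he's on the other line at the moment, can I get him to give you a call back?" />
        <File>C:\Audio\sample_dialog_part2.mp3</File>
    </Transcription>
</Transcriptions>
```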

Temperature

Optional
The sampling temperature to use, between 0 and 1. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. If set to 0, the model will use log probability to automatically increase the temperature until certain thresholds are hit.

This setting will be overridden if the Temperature attribute is provided in the input file. It will default to 0 if not specified here or in the input file.
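The sketch below contrasts an explicit Temperature with the fallback behaviour described above. It assumes an Options element may omit the Temperature attribute, in which case the task's Temperature setting is used, or 0 if that is not set either. The file paths are hypothetical.

```xml
<?xml version="1.0"?>
<Transcriptions>
    <Options Model="whisper-1" />
    <Transcription Id="1">
        <!-- Low temperature: more focused and deterministic output -->
        <Options Language="en" Temperature="0.2" />
        <File>C:\Audio\meeting_notes.mp3</File>
    </Transcription>
    <Transcription Id="2">
        <!-- No Temperature attribute: falls back to the task setting, or 0 if unset -->
        <Options Language="en" />
        <File>C:\Audio\voicemail.mp3</File>
    </Transcription>
</Transcriptions>
```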

Connection

Required
The ChatGPT connection to use. See Connecting to ChatGPT.

Input File

Required
The XML file listing the audio files to transcribe and the options to use for each. An example of the XML format is shown below.

Output File

Required
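The XML file to save the results of the operation to. It uses the same XML format as the input file, with the results added to each Transcription element: the transcribed Segments, the full Text, and any Errors.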

Zynk Settings

See Common Task Settings

Examples

A sample input file is shown below. For full documentation see Chat GPT Speech To Text XML.

<?xml version="1.0"?>
<Transcriptions>
    <Options Model="whisper-1" />
    <Transcription Id="1">
        <Options Language="en" Prompt="The transcript is about Zynk, a piece of software used to automate business processes." Temperature="0.5" />
        <File>C:\Users\admin\Downloads\sample_dialog.mp3</File>
    </Transcription>
</Transcriptions>

A sample output file is shown below.

<?xml version="1.0"?>
<Transcriptions xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
    <Options Model="whisper-1" />
    <Transcription Id="1">
        <Options Language="en" Prompt="The transcript is about Zynk, a piece of software used to automate business processes." Temperature="0.5" />
        <File>C:\Users\admin\Downloads\sample_dialog.mp3</File>
        <Segments>
            <Segment> Hello, Ben speaking, how can I help?</Segment>
            <Segment> Hi, is John available please?</Segment>
            <Segment> I'll just check for you, can I ask who's calling?</Segment>
            <Segment> uh yeah it's Reece.</Segment>
            <Segment> Right, bear with me just a moment Reece.</Segment>
            <Segment> Thanks.</Segment>
            <Segment> It looks like he's on the other line at the moment, can I get him to give you a call back?</Segment>
            <Segment> Sure, let me give you our number. 0123 456 7890</Segment>
            <Segment> No problem at all. Shouldn't be too long..</Segment>
            <Segment> Alright, anyway thanks for helping me.</Segment>
            <Segment> OK, cheers man.</Segment>
        </Segments>
        <Text>Hello, Ben speaking, how can I help? Hi, is John available please? I'll just check for you, can I ask who's calling? uh yeah it's Reece. Right, bear with me just a moment Reece. Thanks. It looks like he's on the other line at the moment, can I get him to give you a call back? Sure, let me give you our number. 0123 456 7890 No problem at all. Shouldn't be too long.. Alright, anyway thanks for helping me. OK, cheers man.</Text>
        <Errors />
    </Transcription>
</Transcriptions>