AWS #Transcribe Speech to Text using C#

AWS has a Transcribe service that converts audio containing speech to text. It is tightly integrated with the AWS ecosystem, so it’s probably best suited to systems that already use AWS for other services: specifically S3 for storing the audio, and perhaps CloudWatch and Lambda for post-processing.

The Transcribe service takes an audio file that is already in an S3 bucket and produces text output, which is placed in another bucket. The process is asynchronous, so it’s best to have another event (e.g. CloudWatch + Lambda) handle the output.

First off, you need the NuGet package AWSSDK.TranscribeService installed in your project (“Install-Package AWSSDK.TranscribeService”). You should also have your local dev environment set up to access AWS via the CLI (aws configure). That last step is optional, but the code below assumes you have done it.

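using Amazon;
using Amazon.TranscribeService;
using Amazon.TranscribeService.Model;

// Credentials are picked up from the local AWS profile (aws configure)
var client = new AmazonTranscribeServiceClient(RegionEndpoint.EUWest1);

// Queue the job; run this inside an async method so the call can be awaited
var job = await client.StartTranscriptionJobAsync(new StartTranscriptionJobRequest
{
	LanguageCode = LanguageCode.EnUS,
	Media = new Media
	{
		MediaFileUri = "s3://audioBucket/message.mp3" // input audio in S3
	},
	MediaFormat = MediaFormat.Mp3,
	OutputBucketName = "aws.serverless.2", // bucket that receives the JSON transcript
	TranscriptionJobName = "message" // must be unique within the account
});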

Here, we specify the input S3 URI, which points to an MP3 file in US English. I also specify the output bucket and the job name, which Transcribe uses to name the output file.
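The TranscriptionJobName set in the request can also be used to look the job up later, which is handy for quick experiments where no CloudWatch + Lambda hook is in place yet. Below is a minimal sketch of a polling loop. JobPoller and WaitForJobAsync are names I made up for illustration; the status check is passed in as a delegate so the loop itself has no AWS dependency. The wiring to GetTranscriptionJobAsync shown in the comment is my assumption of how you would connect it, not something from the original code.

```csharp
using System;
using System.Threading.Tasks;

public static class JobPoller
{
    // Calls getStatus until it reports COMPLETED or FAILED, waiting `delay`
    // between attempts. Throws if the job is still running after maxAttempts.
    public static async Task<string> WaitForJobAsync(
        Func<Task<string>> getStatus, TimeSpan delay, int maxAttempts)
    {
        for (var attempt = 1; attempt <= maxAttempts; attempt++)
        {
            var status = await getStatus();
            if (status == "COMPLETED" || status == "FAILED")
                return status;

            // No point sleeping after the final attempt.
            if (attempt < maxAttempts)
                await Task.Delay(delay);
        }
        throw new TimeoutException("Transcription job did not finish in time.");
    }
}

// Assumed wiring to the Transcribe client from the snippet above:
//
// var status = await JobPoller.WaitForJobAsync(async () =>
// {
//     var response = await client.GetTranscriptionJobAsync(
//         new GetTranscriptionJobRequest { TranscriptionJobName = "message" });
//     return response.TranscriptionJob.TranscriptionJobStatus.Value;
// }, TimeSpan.FromSeconds(10), 60);
```

Polling like this is fine for a console test, but for anything long-running the event-driven approach is still the better fit, since it doesn’t keep a process waiting.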

The call returns as soon as the job has been queued. Some time later, a file appears in the output bucket with contents such as:

{
   "jobName":"hotline2",
   "accountId":"005445879168",
   "results":{
      "transcripts":[
         {
            "transcript":"thank you for Collins in his bank, the first American bank designed for international customers. Please leave your message and we will return your call shortly."
         }
      ],
      "items":[
         {
            "start_time":"0.44",
            "end_time":"0.79",
            "alternatives":[
               {
                  "confidence":"1.0",
                  "content":"thank"
               }
            ],
            "type":"pronunciation"
         },
         {
            "start_time":"0.79",
            "end_time":"0.88",
            "alternatives":[
               {
                  "confidence":"1.0",
                  "content":"you"
               }
            ],
            "type":"pronunciation"
         },
         {
            "start_time":"0.88",
            "end_time":"1.01",
            "alternatives":[
               {
                  "confidence":"1.0",
                  "content":"for"
               }
            ],
            "type":"pronunciation"
         },
         {
            "start_time":"1.01",
            "end_time":"1.52",
            "alternatives":[
               {
                  "confidence":"0.9214",
                  "content":"Collins"
               }
            ],
            "type":"pronunciation"
         },
         {
            "start_time":"1.52",
            "end_time":"1.65",
            "alternatives":[
               {
                  "confidence":"0.9884",
                  "content":"in"
               }
            ],
            "type":"pronunciation"
         },
         {
            "start_time":"1.65",
            "end_time":"1.82",
            "alternatives":[
               {
                  "confidence":"0.9662",
                  "content":"his"
               }
            ],
            "type":"pronunciation"
         },
         {
            "start_time":"1.82",
            "end_time":"2.48",
            "alternatives":[
               {
                  "confidence":"1.0",
                  "content":"bank"
               }
            ],
            "type":"pronunciation"
         },
         {
            "alternatives":[
               {
                  "confidence":"0.0",
                  "content":","
               }
            ],
            "type":"punctuation"
         },
         {
            "start_time":"2.51",
            "end_time":"2.77",
            "alternatives":[
               {
                  "confidence":"1.0",
                  "content":"the"
               }
            ],
            "type":"pronunciation"
         },
         {
            "start_time":"2.78",
            "end_time":"3.12",
            "alternatives":[
               {
                  "confidence":"1.0",
                  "content":"first"
               }
            ],
            "type":"pronunciation"
         },
         {
            "start_time":"3.12",
            "end_time":"3.63",
            "alternatives":[
               {
                  "confidence":"1.0",
                  "content":"American"
               }
            ],
            "type":"pronunciation"
         },
         {
            "start_time":"3.63",
            "end_time":"3.99",
            "alternatives":[
               {
                  "confidence":"0.996",
                  "content":"bank"
               }
            ],
            "type":"pronunciation"
         },
         {
            "start_time":"4.0",
            "end_time":"4.58",
            "alternatives":[
               {
                  "confidence":"1.0",
                  "content":"designed"
               }
            ],
            "type":"pronunciation"
         },
         {
            "start_time":"4.58",
            "end_time":"4.74",
            "alternatives":[
               {
                  "confidence":"1.0",
                  "content":"for"
               }
            ],
            "type":"pronunciation"
         },
         {
            "start_time":"4.74",
            "end_time":"5.39",
            "alternatives":[
               {
                  "confidence":"0.9987",
                  "content":"international"
               }
            ],
            "type":"pronunciation"
         },
         {
            "start_time":"5.39",
            "end_time":"6.16",
            "alternatives":[
               {
                  "confidence":"0.9995",
                  "content":"customers"
               }
            ],
            "type":"pronunciation"
         },
         {
            "alternatives":[
               {
                  "confidence":"0.0",
                  "content":"."
               }
            ],
            "type":"punctuation"
         },
         {
            "start_time":"6.54",
            "end_time":"6.94",
            "alternatives":[
               {
                  "confidence":"1.0",
                  "content":"Please"
               }
            ],
            "type":"pronunciation"
         },
         {
            "start_time":"6.94",
            "end_time":"7.12",
            "alternatives":[
               {
                  "confidence":"1.0",
                  "content":"leave"
               }
            ],
            "type":"pronunciation"
         },
         {
            "start_time":"7.12",
            "end_time":"7.26",
            "alternatives":[
               {
                  "confidence":"1.0",
                  "content":"your"
               }
            ],
            "type":"pronunciation"
         },
         {
            "start_time":"7.26",
            "end_time":"7.86",
            "alternatives":[
               {
                  "confidence":"1.0",
                  "content":"message"
               }
            ],
            "type":"pronunciation"
         },
         {
            "start_time":"7.87",
            "end_time":"8.06",
            "alternatives":[
               {
                  "confidence":"1.0",
                  "content":"and"
               }
            ],
            "type":"pronunciation"
         },
         {
            "start_time":"8.06",
            "end_time":"8.17",
            "alternatives":[
               {
                  "confidence":"1.0",
                  "content":"we"
               }
            ],
            "type":"pronunciation"
         },
         {
            "start_time":"8.17",
            "end_time":"8.36",
            "alternatives":[
               {
                  "confidence":"1.0",
                  "content":"will"
               }
            ],
            "type":"pronunciation"
         },
         {
            "start_time":"8.36",
            "end_time":"8.78",
            "alternatives":[
               {
                  "confidence":"0.5229",
                  "content":"return"
               }
            ],
            "type":"pronunciation"
         },
         {
            "start_time":"8.78",
            "end_time":"8.95",
            "alternatives":[
               {
                  "confidence":"1.0",
                  "content":"your"
               }
            ],
            "type":"pronunciation"
         },
         {
            "start_time":"8.95",
            "end_time":"9.33",
            "alternatives":[
               {
                  "confidence":"1.0",
                  "content":"call"
               }
            ],
            "type":"pronunciation"
         },
         {
            "start_time":"9.34",
            "end_time":"10.05",
            "alternatives":[
               {
                  "confidence":"1.0",
                  "content":"shortly"
               }
            ],
            "type":"pronunciation"
         },
         {
            "alternatives":[
               {
                  "confidence":"0.0",
                  "content":"."
               }
            ],
            "type":"punctuation"
         }
      ]
   },
   "status":"COMPLETED"
}

As you can see from the result, it can make some errors; for example, here it used the word “Collins” instead of “Calling”, so the process is not perfect. However, the word-by-word breakdown is really useful for any other post-processing you may want to do.
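One simple kind of post-processing is to flag the words Transcribe was least sure about, so a human can review them. The sketch below assumes the output JSON has already been downloaded into a string, and uses System.Text.Json; TranscriptReview and LowConfidenceWords are hypothetical names chosen for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text.Json;

public static class TranscriptReview
{
    // Returns the spoken words whose first alternative scored below the threshold.
    public static List<string> LowConfidenceWords(string json, double threshold)
    {
        var flagged = new List<string>();
        using var doc = JsonDocument.Parse(json);
        var items = doc.RootElement.GetProperty("results").GetProperty("items");

        foreach (var item in items.EnumerateArray())
        {
            // Punctuation items have no timing and a confidence of 0.0, so skip them.
            if (item.GetProperty("type").GetString() != "pronunciation")
                continue;

            var best = item.GetProperty("alternatives")[0];

            // Numbers are stored as strings in the output, so parse them explicitly.
            var confidence = double.Parse(
                best.GetProperty("confidence").GetString() ?? "0",
                CultureInfo.InvariantCulture);

            if (confidence < threshold)
                flagged.Add(best.GetProperty("content").GetString() ?? "");
        }
        return flagged;
    }
}
```

Run against the sample output above with a threshold of 0.95, this flags “Collins” (0.9214) and “return” (0.5229). Note that a low score is only a hint: “return” was transcribed correctly despite its score.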

This would be excellent for generating subtitles from movie audio, for example.
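As a rough illustration of the subtitle idea, here is a sketch that turns the items array into SRT text, starting a new cue after each sentence-ending punctuation mark. SubtitleBuilder, ToSrt and SrtTime are names invented for this example, and splitting on sentences is just one simple choice; real subtitles would normally also be split by length or duration.

```csharp
using System;
using System.Globalization;
using System.Text;
using System.Text.Json;

public static class SubtitleBuilder
{
    // Formats seconds as an SRT timestamp (HH:MM:SS,mmm).
    public static string SrtTime(decimal seconds)
    {
        long ms = (long)Math.Round(seconds * 1000m);
        return string.Format(CultureInfo.InvariantCulture,
            "{0:00}:{1:00}:{2:00},{3:000}",
            ms / 3600000, ms / 60000 % 60, ms / 1000 % 60, ms % 1000);
    }

    // Builds one SRT cue per sentence from the Transcribe output JSON.
    public static string ToSrt(string json)
    {
        using var doc = JsonDocument.Parse(json);
        var items = doc.RootElement.GetProperty("results").GetProperty("items");

        var srt = new StringBuilder();
        var words = new StringBuilder();
        decimal start = 0, end = 0;
        int cue = 0;

        // Writes the words collected so far as a numbered cue.
        void Flush()
        {
            if (words.Length == 0) return;
            cue++;
            srt.Append(cue).Append('\n')
               .Append(SrtTime(start)).Append(" --> ").Append(SrtTime(end)).Append('\n')
               .Append(words.ToString()).Append("\n\n");
            words.Clear();
        }

        foreach (var item in items.EnumerateArray())
        {
            var content = item.GetProperty("alternatives")[0]
                              .GetProperty("content").GetString() ?? "";

            if (item.GetProperty("type").GetString() == "pronunciation")
            {
                var s = decimal.Parse(item.GetProperty("start_time").GetString() ?? "0",
                                      CultureInfo.InvariantCulture);
                end = decimal.Parse(item.GetProperty("end_time").GetString() ?? "0",
                                    CultureInfo.InvariantCulture);
                if (words.Length == 0) start = s; else words.Append(' ');
                words.Append(content);
            }
            else
            {
                // Punctuation has no timing; attach it to the previous word.
                words.Append(content);
                if (content == "." || content == "?" || content == "!") Flush();
            }
        }

        Flush(); // emit trailing words that had no closing punctuation
        return srt.ToString();
    }
}
```

For the sample output above, this yields two cues: the first running from 00:00:00,440 to 00:00:06,160, and the second from 00:00:06,540 to 00:00:10,050. Timestamps are parsed as decimal rather than double to avoid rounding surprises when converting to milliseconds.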
