Audio #Transcription (#Speech to #Text) With #Bing #CognitiveServices

Converting speech to text is a challenge for computers, and the results are far from perfect. However, voice control is a feature that many consumers have come to expect, and companies like Google, IBM and Microsoft offer it as a service that you can plug into. This article explores Microsoft’s Cognitive Services in C#.

You will need an API key from Microsoft Cognitive Services to run this code, and you will need to run

Install-Package Microsoft.Bing.Speech

on a project that targets at least .NET 4.5. This code example runs on VS2013 and above, and is designed for synchronous operation.

First you need to write a class to authenticate you against Microsoft:

using System;
using System.Net;
using System.Threading.Tasks;
using Microsoft.Bing.Speech;

public sealed class CognitiveServicesAuthorizationProvider : IAuthorizationProvider
{
    /// <summary>
    /// The fetch token URI
    /// </summary>
    private const string FetchTokenUri = "https://api.cognitive.microsoft.com/sts/v1.0/issueToken";

    /// <summary>
    /// The subscription key
    /// </summary>
    private readonly string subscriptionKey;

    /// <summary>
    /// Initializes a new instance of the <see cref="CognitiveServicesAuthorizationProvider"/> class.
    /// </summary>
    /// <param name="subscriptionKey">The subscription identifier.</param>
    public CognitiveServicesAuthorizationProvider(string subscriptionKey)
    {
        this.subscriptionKey = subscriptionKey;
    }

    /// <summary>
    /// Gets the authorization token asynchronously.
    /// </summary>
    /// <returns>
    /// A task that represents the asynchronous read operation. The value of the TResult parameter contains the authorization token.
    /// </returns>
    /// <remarks>
    /// This method should always return a valid authorization token at the time it is called.
    /// </remarks>
    public Task<string> GetAuthorizationTokenAsync()
    {
        Func<string> fn = () =>
        {
            // WebClient is disposable, so release it once the token has been fetched.
            using (var client = new WebClient())
            {
                client.Headers.Add("Ocp-Apim-Subscription-Key", subscriptionKey);
                // POST an empty body; the response body is the token itself.
                string strToken = client.UploadString(FetchTokenUri, "");
                return strToken;
            }
        };
        // Run the blocking request on the thread pool.
        return Task.Run(fn);
    }
}

This class makes an HTTP POST request to https://api.cognitive.microsoft.com/sts/v1.0/issueToken, sending your API key in a header named Ocp-Apim-Subscription-Key. The speech client calls it internally; the provider is handed to the client in the SpeechAPI class below:
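Before moving on to that class, the token request can be sketched on its own. The following is a minimal sketch, not the author's code: it swaps WebClient for HttpClient (available from .NET 4.5), and the names TokenRequestSketch, BuildTokenRequest and FetchTokenAsync are hypothetical, introduced only for illustration. It assumes, as the class above does, that the response body is the token string.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

// Hypothetical helper, for illustration only.
public static class TokenRequestSketch
{
    private const string FetchTokenUri = "https://api.cognitive.microsoft.com/sts/v1.0/issueToken";

    // Builds the POST request without sending it, so it can be inspected.
    public static HttpRequestMessage BuildTokenRequest(string subscriptionKey)
    {
        var request = new HttpRequestMessage(HttpMethod.Post, FetchTokenUri);
        // The API key travels in this header; the body is empty.
        request.Headers.Add("Ocp-Apim-Subscription-Key", subscriptionKey);
        request.Content = new StringContent(string.Empty);
        return request;
    }

    // Sends the request and returns the response body (assumed to be the token).
    public static async Task<string> FetchTokenAsync(HttpClient client, string subscriptionKey)
    {
        using (var request = BuildTokenRequest(subscriptionKey))
        using (var response = await client.SendAsync(request).ConfigureAwait(false))
        {
            response.EnsureSuccessStatusCode();
            return await response.Content.ReadAsStringAsync().ConfigureAwait(false);
        }
    }
}
```

Separating BuildTokenRequest from FetchTokenAsync makes it possible to check the method, URL and header without a network call or a real key.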

using System;
using System.Collections.Generic;
using System.Linq;
using System.Net;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Bing.Speech;

public class SpeechAPI
{
    private TimeSpan Timeout = TimeSpan.FromSeconds(30);

    // Written by the recognition callback and read by the polling loop,
    // which run on different threads, hence volatile.
    private volatile string result = "";

    /// <summary>
    /// Invoked when the speech client receives a phrase recognition result(s) from the server.
    /// </summary>
    /// <param name="args">The recognition result.</param>
    /// <returns>
    /// A task
    /// </returns>
    public Task OnRecognitionResult(RecognitionResult args)
    {
        var response = args;
        Console.WriteLine();
        var words = new List<string>();
        if (response.Phrases != null)
        {
            foreach (var phrase in response.Phrases)
            {
                // Merge the phrases, keeping only words not seen so far.
                var strNewWords = phrase.LexicalForm.Split(new[] { ' ' });
                words.AddRange(strNewWords.Where(w => !words.Contains(w)));
            }
        }
        var strCompleteText = string.Join(" ", words);
        this.result = strCompleteText;
        return Task.FromResult(true);
    }

    public string RecognizeAudio(string AudioUrl)
    {
        // Clear any result left over from a previous call on this instance.
        result = "";
        var ShortPhraseUrl = new Uri(@"wss://speech.platform.bing.com/api/service/recognition");
        var preferences = new Preferences("en-US", ShortPhraseUrl, new CognitiveServicesAuthorizationProvider("{{YOUR KEY HERE}}"));
        var speechClient = new SpeechClient(preferences);
        speechClient.SubscribeToRecognitionResult(OnRecognitionResult);
        // Keep the download stream open until polling finishes, then release it.
        using (var wc = new WebClient())
        using (var audio = wc.OpenRead(AudioUrl))
        {
            var deviceMetadata = new DeviceMetadata(DeviceType.Near, DeviceFamily.Desktop, NetworkType.Ethernet, OsName.Windows, "1607", "Dell", "T3600");
            var applicationMetadata = new ApplicationMetadata("SampleApp", "1.0.0");
            var requestMetadata = new RequestMetadata(Guid.NewGuid(), deviceMetadata, applicationMetadata, "SampleAppService");
            var input = new SpeechInput(audio, requestMetadata);
            var cts = new CancellationTokenSource();
            speechClient.RecognizeAsync(input, cts.Token);
            DateTime dtStart = DateTime.Now;
            // Poll every 100 ms until a result arrives or the timeout elapses.
            while (true)
            {
                if (DateTime.Now - dtStart > Timeout)
                {
                    return "";
                }
                Thread.Sleep(100);
                if (result != "")
                {
                    return result;
                }
            }
        }
    }
}

The core method of this class is RecognizeAudio, which accepts the URL of a hosted audio file and returns the transcription as a string. Under the hood, RecognizeAsync runs asynchronously, and this method polls at 100ms intervals until the recognition is complete. If no result arrives within the 30-second timeout, it returns an empty string.
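As an alternative to the polling loop, the wait can be expressed with a TaskCompletionSource and a timeout. This is a different technique from the one used above, shown only as a hedged sketch: ResultWaiter and WaitOrDefaultAsync are hypothetical names, and the sketch deliberately leaves out the speech library, so the recognition callback is assumed to be the place where TrySetResult would be called.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical helper, for illustration only.
public static class ResultWaiter
{
    // Returns the result of `pending` if it completes within `timeout`,
    // otherwise returns `fallback` (an empty string by default).
    public static async Task<string> WaitOrDefaultAsync(
        Task<string> pending, TimeSpan timeout, string fallback = "")
    {
        // Task.WhenAny completes as soon as either task finishes.
        var finished = await Task.WhenAny(pending, Task.Delay(timeout)).ConfigureAwait(false);
        return finished == pending
            ? await pending.ConfigureAwait(false)
            : fallback;
    }
}

public static class ResultWaiterDemo
{
    public static string Run()
    {
        var tcs = new TaskCompletionSource<string>();
        // Stand-in for a recognition callback delivering text after 200 ms.
        Task.Delay(200).ContinueWith(_ => tcs.TrySetResult("hello world"));
        // Block synchronously, mirroring the synchronous style of RecognizeAudio.
        return ResultWaiter.WaitOrDefaultAsync(tcs.Task, TimeSpan.FromSeconds(30))
            .GetAwaiter().GetResult();
    }
}
```

Compared with polling, this returns as soon as the result is set rather than up to 100ms later, and it needs no shared field between the callback and the waiting thread.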

The class is called with code similar to:

var speechAPI = new SpeechAPI();
var strText = speechAPI.RecognizeAudio(strAudioUrl);
Console.WriteLine("Recognised as : " + strText);

This code is used in http://www.cloudansweringmachine.com to transcribe voicemails, a feature that will be available in the next version of the app.
