OCR, or Optical Character Recognition, is the process of extracting text from an image. Microsoft Azure offers a service called "Computer Vision", which includes a free tier that you can use to run small batches of OCR on images.

Here's some sample code to use it in C#. I've used the NuGet package Newtonsoft.Json for JSON processing. I've also omitted the subscription key, which you can get from Azure.

using System;
using System.Net;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;

private static string OcrUsingAzure(string url)
{
    const string strUrl = "https://westeurope.api.cognitive.microsoft.com/vision/v1.0/ocr?language=unk&detectOrientation=true&enhanced=True";
    var wc = new WebClient();
    wc.Headers["Ocp-Apim-Subscription-Key"] = "xxxxxxx"; // your key from Azure
    // Post the image URL as a JSON body and parse the JSON response
    var jPost = new { url = url };
    var strPost = JsonConvert.SerializeObject(jPost, Formatting.Indented);
    var strJson = wc.UploadString(strUrl, "POST", strPost);
    var jObject = JObject.Parse(strJson);
    var strOutput = "";
    // The response nests regions > lines > words; emit one output line per OCR line
    foreach (var region in jObject["regions"])
    {
        foreach (var line in region["lines"])
        {
            foreach (var word in line["words"])
                strOutput += word["text"] + " ";
            strOutput += Environment.NewLine;
        }
    }
    return strOutput.Trim();
}

You pass in the URL of an image that contains some text, and the method returns the recognized text, one line of output per line found in the image.
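To see what the nested loops are doing without calling the service, here is a small offline sketch that runs the same regions > lines > words walk over a hard-coded response. The JSON is made up for illustration and trimmed to the fields the sample code reads (a real response carries more), and the names OcrResponseDemo, SampleJson and ExtractText are my own, not part of any Azure library.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Newtonsoft.Json.Linq;

public static class OcrResponseDemo
{
    // Made-up response, reduced to the regions > lines > words nesting
    // that the sample code walks. Real responses contain more fields.
    public const string SampleJson = @"{
      ""regions"": [
        { ""lines"": [
            { ""words"": [ { ""text"": ""Hello"" }, { ""text"": ""world"" } ] },
            { ""words"": [ { ""text"": ""Second"" }, { ""text"": ""line"" } ] }
        ] }
      ]
    }";

    // Hypothetical helper: returns one output line per OCR line.
    public static string ExtractText(string json)
    {
        var lines = new List<string>();
        foreach (var region in JObject.Parse(json)["regions"])
        {
            foreach (var line in region["lines"])
            {
                // Join the words of one OCR line with single spaces.
                var words = line["words"].Select(w => (string)w["text"]);
                lines.Add(string.Join(" ", words));
            }
        }
        return string.Join(Environment.NewLine, lines);
    }

    public static void Main()
    {
        Console.WriteLine(ExtractText(SampleJson));
    }
}
```

Running Main prints "Hello world" and "Second line" on separate lines. Collecting the lines in a list and joining them at the end avoids the trailing space and the final Trim() that the string concatenation version needs.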

If you know the language of the document in advance, e.g. English, you can improve the accuracy by changing the language parameter in the query string from unk (unknown, i.e. auto-detect) to the matching language code, such as language=en.
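If you switch languages often, one option is to build the request URL from a parameter instead of hard-coding it. The following is only a sketch: OcrUrlBuilder and BuildOcrUrl are hypothetical names of my own, the endpoint is the one from the sample code (the region prefix will depend on where your Azure resource lives), and the enhanced=True flag is carried over unchanged from that sample.

```csharp
using System;

public static class OcrUrlBuilder
{
    // Endpoint copied from the sample code; adjust the region to match your resource.
    private const string BaseUrl =
        "https://westeurope.api.cognitive.microsoft.com/vision/v1.0/ocr";

    // Hypothetical helper. The default "unk" leaves language detection to the service.
    public static string BuildOcrUrl(string language = "unk", bool detectOrientation = true)
    {
        return BaseUrl
            + "?language=" + Uri.EscapeDataString(language)
            + "&detectOrientation=" + (detectOrientation ? "true" : "false")
            + "&enhanced=True"; // kept as in the original sample
    }
}
```

Calling OcrUrlBuilder.BuildOcrUrl("en") gives the same URL as the sample, with language=en in place of language=unk.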


