
Archive for October, 2021

Asynchronous Pre-Request script in #Postman

A pre-request script in Postman lets you change the request dynamically (for example, its body) using any JavaScript you provide. This is sometimes used to calculate an HMAC for the request, so that you don’t have to calculate it manually before every request.
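For reference, the HMAC calculation such a script typically performs can be sketched in Node.js. This is a minimal sketch that assumes HMAC-SHA256 over the raw request body with a hex-encoded digest; the real algorithm, encoding and secret depend on the API you are calling. Inside Postman’s sandbox you would use the bundled CryptoJS (`CryptoJS.HmacSHA256`) rather than Node’s `crypto` module.

```javascript
const crypto = require("crypto");

// Assumed scheme (hypothetical): HMAC-SHA256 of the UTF-8 body, hex digest.
function signBody(body, secret) {
  return crypto.createHmac("sha256", secret).update(body, "utf8").digest("hex");
}

// Example with a hypothetical body and secret.
console.log(signBody('{"ip":"203.0.113.10"}', "hypothetical-secret"));
```

In a pre-request script, the resulting signature would then usually be sent in a header whose name is defined by the API being called.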

However, it’s not obvious how to create an asynchronous pre-request script, that is, one that does not return instantly but has to wait for some external event (e.g. an HTTP request) before the main request is sent.

Here is a simple example, where we want to inject the user’s IP address into the HTTP request body before sending:

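new Promise((resolve, reject) => {
    pm.sendRequest("http://checkip.amazonaws.com/", function (err, res) {
        if (err) {
            // Settle the promise instead of leaving it pending forever
            console.error(err);
            reject(err);
            return;
        }
        // checkip.amazonaws.com returns the IP followed by a newline, so trim it
        pm.request.body.update(JSON.stringify(res.text().trim()));
        resolve();
    });
});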

The main request will not be sent until the promise created in the pre-request script is resolved.
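The same pattern can be tried outside Postman. In this sketch, `fakeSendRequest` is a hypothetical stand-in for `pm.sendRequest` that calls back asynchronously with `(err, response)`, and `fetchIp` wraps it in a Promise that settles only once the response (or an error) arrives, so the body is set only after that.

```javascript
// Hypothetical stand-in for pm.sendRequest: calls back asynchronously
// with (err, response), where response.text() returns the body as a string.
function fakeSendRequest(url, callback) {
  setTimeout(() => callback(null, { text: () => "203.0.113.10\n" }), 10);
}

// Wrap the callback API in a Promise that settles only when the callback fires.
function fetchIp(sendRequest, url) {
  return new Promise((resolve, reject) => {
    sendRequest(url, (err, res) => {
      if (err) {
        reject(err);
        return;
      }
      resolve(res.text().trim()); // drop the trailing newline
    });
  });
}

// Hypothetical request object; its body is only set once the IP has arrived.
const request = { body: null };
fetchIp(fakeSendRequest, "http://checkip.amazonaws.com/").then(ip => {
  request.body = JSON.stringify(ip);
  console.log(request.body); // → "203.0.113.10"
});
```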

Categories: Uncategorized

Firewall rules to allow internal connection to #AWS Elastic IP

This is probably a very specific problem that few people will run into, but I’ll share the problem and solution here, since the fix isn’t obvious.

Consider a piece of software with a config file containing a DSN. I want the DSN to be the same on Dev as on Production, so that there are no “Works on my machine” errors.

My server on AWS has an Elastic IP, and Windows Firewall restricts remote access to the database port to a limited set of IP addresses.

On Dev, we point the DSN to the Elastic IP, and all is good. On Prod (the server itself), the same Elastic IP times out. Help!!

SQLCMD LOCALHOST -> Works

SQLCMD PRIVATE IP -> Works

SQLCMD ELASTIC IP -> Times out (only on same machine)

Obviously “LOCALHOST” and “PRIVATE IP” are not going to work from DEV.

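So, the solution: add the ELASTIC IP to the Scope (remote IP addresses) of the firewall rule! Presumably this works because a connection from the server to its own Elastic IP is routed out through AWS and back in, so it arrives with the Elastic IP as its source address, which the rule did not allow.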
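To make the reasoning concrete, here is a rough JavaScript model of how a rule’s remote-address scope decides whether a connection is allowed. It is a sketch, not how Windows Firewall is implemented: IPv4 only, all addresses are hypothetical documentation addresses, and it assumes the connection from the server to its own Elastic IP arrives with the Elastic IP as its source.

```javascript
// Convert dotted IPv4 text to an unsigned 32-bit integer.
function ipToInt(ip) {
  const parts = ip.split(".").map(Number);
  if (parts.length !== 4 || parts.some(p => !Number.isInteger(p) || p < 0 || p > 255)) {
    throw new Error(`Invalid IPv4 address: ${ip}`);
  }
  return ((parts[0] << 24) | (parts[1] << 16) | (parts[2] << 8) | parts[3]) >>> 0;
}

// True if `ip` matches any scope entry (a single address or a CIDR block).
function inScope(ip, scope) {
  const addr = ipToInt(ip);
  return scope.some(entry => {
    const [base, bitsText] = entry.split("/");
    const bits = bitsText === undefined ? 32 : Number(bitsText);
    // JS shifts are mod 32, so a /0 mask must be special-cased.
    const mask = bits === 0 ? 0 : (~0 << (32 - bits)) >>> 0;
    return ((addr & mask) >>> 0) === ((ipToInt(base) & mask) >>> 0);
  });
}

const scope = ["198.51.100.0/24"]; // hypothetical office range allowed to reach the DB port
const elasticIp = "203.0.113.10";  // hypothetical Elastic IP
console.log(inScope(elasticIp, scope)); // false: the looped-back connection is dropped
scope.push(elasticIp);
console.log(inScope(elasticIp, scope)); // true once the Elastic IP is in the scope
```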


Comparing #OCR services on #handwritten text.

Results per image (filename, then the outputs of Tesseract, OCR.SPACE, Azure, IRON OCR, AWS Textract and AWS Textract (DDT), run together in that order):

1457617912-CROP.jpg | 1FTME1HL84DA439N1FTNELGLRDATNG1FT 4L87DA739141FT4L87DA73914
1457617924-CROP.jpg | MGPWUEWLMLDLEPLLL1R29LDSLLD4GP4412R296096LD4GP4412R296096
1457638629-CROP.jpg | 2STME20U071MEMJTNCR2000776480GL
1457643042-CROP.jpg | 5H63H5H8SFHTFMSHEKHSHESTMTHYBHGGKSH85FM7499BHGGKSH85FM7499
1457670471-CROP.jpg | JNFML3WFNSWSS3IN4IN4.
1457670537-CROP.jpg | LNEPALBAPEMWM()VE-IV 1ANDPALZAPLENZFRSTLNPAL3,LNPAL3,
1457677623-CROP.jpg | TUBM1FXMGW1AG1TTRMEX1WUS1720SJTJBMTEXIH5176297JTJBMTEXIH5176297
1457677635-CROP.jpg | MJHSHGB91JTTTRMTEXHES17963932JTJ8M7FX1H5176397JTJ8M7FX1H5176397
1457734011-CROP.jpg | FWHATSGSGCMDWW9UUASCSATATL1944A86509ALY71944A86509ALY7
EASY-CROP.jpg | WAUDH74F16N117113WAUDH74F16N117113WAUDH74F16N117113WAUDH74F16N117113WAUDH74F16N117113WAUDH74F16N117113

I ran 10 images (9 of handwritten text, 1 of printed text) through five different OCR services to compare accuracy. Every service read the printed text correctly, but none recognised the handwritten text accurately except AWS Textract.
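To put a single number on accuracy per image, one common option (a swapped-in technique, not necessarily how these results were judged) is the character error rate: the Levenshtein edit distance between the recognised text and the true VIN, divided by the VIN’s length. A minimal JavaScript sketch, assuming you know the true VIN for each image:

```javascript
// Levenshtein edit distance: minimum single-character insertions,
// deletions and substitutions needed to turn `a` into `b`.
function editDistance(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

// Character error rate relative to the true text (0 = perfect).
function characterErrorRate(recognised, truth) {
  if (truth.length === 0) throw new Error("truth must be non-empty");
  return editDistance(recognised, truth) / truth.length;
}

// An empty result scores as completely wrong.
console.log(characterErrorRate("", "WAUDH74F16N117113")); // → 1
```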

There were a few contenders that didn’t run, because I couldn’t get their APIs working in the time I allotted myself (one hour): FileStack, Mindee and Google Cloud Vision. These may have returned better results, but their APIs were too convoluted for a simple test.

First up is Tesseract, run locally with the following additional parameters:

-psm 8 -c tessedit_char_whitelist=ABCDEFGHJKLMNPRSTUVWXYZ0123456789

What does this mean? The text consists of handwritten VINs, which never include the letters O, I and Q because these are too easily confused with digits. The text is also upper case and forms a single word, hence page segmentation mode 8 (treat the image as a single word).
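The same character set can also be applied after the fact, to any service’s output. This JavaScript sketch checks whether a result is a plausible VIN (17 characters, no I, O or Q). The normalisation step, which maps O and Q to 0 and I to 1, is an assumption (a common OCR clean-up heuristic), not part of the Tesseract configuration above.

```javascript
// 17 characters from A-Z and 0-9, excluding I, O and Q
// (the same set as the Tesseract whitelist above).
const VIN_PATTERN = /^[A-HJ-NPR-Z0-9]{17}$/;

// Assumed heuristic: strip whitespace, upper-case, and map the
// excluded letters to the digits they resemble.
function normaliseVin(text) {
  return text
    .replace(/\s+/g, "")
    .toUpperCase()
    .replace(/[OQ]/g, "0")
    .replace(/I/g, "1");
}

function looksLikeVin(text) {
  return VIN_PATTERN.test(normaliseVin(text));
}

console.log(looksLikeVin("WAUDH74F16N117113")); // true
console.log(looksLikeVin("LD4GP4412R296096"));  // false: only 16 characters
```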

Tesseract made a good attempt and fared well against the commercial offerings, but in the end it was wrong on every example apart from the printed text.

OCR.SPACE is a free OCR API and was easy to get started with. You should get your own API key; this one is free, so I don’t mind it being public.

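// Requires: using System; using System.IO; using System.Linq; using System.Net;
// using System.Web; using Newtonsoft.Json.Linq;
private static string OcrSpace(string imageFileLocation)
{
	// Free OCR.SPACE API key (get your own); the image is passed by URL
	var postData = "apikey=b8fd788a8b88957";
	postData += "&url=" + HttpUtility.UrlEncode(imageFileLocation);
	const string strUrl = "https://api.ocr.space/parse/image";
	using (var web = new WebClient())
	{
		web.Headers.Add(HttpRequestHeader.ContentType, "application/x-www-form-urlencoded");
		string json;
		try
		{
			json = web.UploadString(strUrl, postData);
		}
		catch (WebException ex)
		{
			// Print the API's error body (if any) and treat it as "no text found"
			if (ex.Response != null)
			{
				using (var reader = new StreamReader(ex.Response.GetResponseStream()))
				{
					Console.WriteLine(reader.ReadToEnd());
				}
			}
			return "";
		}
		// Take the text of the first parsed result, or "" if there is none
		var result = JObject.Parse(json)["ParsedResults"]?.FirstOrDefault();
		return result?["ParsedText"]?.ToString() ?? "";
	}
}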

This code takes an image URL and returns the recognised text. It is very simple, but OCR.SPACE returns an empty string when it fails to recognise anything, which made it one of the worst performers.
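Because a failed request and a failed recognition both end up as an empty string, it can help to keep the response parsing separate and testable. A JavaScript sketch of the same parsing step, assuming only the two fields the C# code reads (`ParsedResults` and `ParsedText`):

```javascript
// Return ParsedText from the first entry in ParsedResults,
// or "" if the JSON is invalid or either field is missing.
function ocrSpaceText(json) {
  let parsed;
  try {
    parsed = JSON.parse(json);
  } catch {
    return "";
  }
  const first = Array.isArray(parsed?.ParsedResults) ? parsed.ParsedResults[0] : undefined;
  return typeof first?.ParsedText === "string" ? first.ParsedText : "";
}

console.log(ocrSpaceText('{"ParsedResults":[{"ParsedText":"WAUDH74F16N117113"}]}'));
```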

Microsoft Azure Computer Vision was also pretty useless with handwritten text, returning either nothing or complete garbage, although it was very fast.

// Requires: using System; using System.Net; using Newtonsoft.Json; using Newtonsoft.Json.Linq;
private static string Azure(string imageFileLocation)
{
	const string strUrl = "https://westeurope.api.cognitive.microsoft.com/vision/v1.0/ocr?language=unk&detectOrientation=true";
	using (var wc = new WebClient())
	{
		wc.Headers["Ocp-Apim-Subscription-Key"] = "**REDACTED**";
		// The OCR endpoint takes a JSON body containing the image URL
		wc.Headers[HttpRequestHeader.ContentType] = "application/json";
		var post = JsonConvert.SerializeObject(new { url = imageFileLocation });
		var json = wc.UploadString(strUrl, "POST", post);
		var jObject = JObject.Parse(json);
		var output = "";
		// The response is nested: regions -> lines -> words
		foreach (var region in jObject["regions"])
		{
			foreach (var line in region["lines"])
			{
				foreach (var word in line["words"])
				{
					output += word["text"] + " ";
				}
				output += Environment.NewLine;
			}
		}
		return output.Trim();
	}
}
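The nested loop is the only non-obvious part of the Azure code, so here is the same flattening as a JavaScript sketch. It assumes the regions, lines and words shape that the C# code reads, and it joins words with single spaces and lines with newlines (so, unlike the C# version, it leaves no trailing space on each line).

```javascript
// Flatten an OCR response of the form
// { regions: [ { lines: [ { words: [ { text } ] } ] } ] } into plain text.
function azureOcrText(response) {
  const lines = [];
  for (const region of response.regions ?? []) {
    for (const line of region.lines ?? []) {
      lines.push((line.words ?? []).map(w => w.text).join(" "));
    }
  }
  return lines.join("\n").trim();
}

console.log(azureOcrText({
  regions: [{ lines: [{ words: [{ text: "WAUDH74F16N" }, { text: "117113" }] }] }],
}));
```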

IRON OCR is also based on Tesseract and performed similarly to the local Tesseract install. It is very easy to use but comes with a price tag. Not having to upload the image to temporary storage is a plus.

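// Requires: using IronOcr;
private static string ironOCR(string filename)
{
	var engine = new IronTesseract
	{
		Configuration =
		{
			// Same VIN character set as the Tesseract whitelist (no I, O or Q)
			WhiteListCharacters = "ABCDEFGHJKLMNPRSTUVWXYZ0123456789"
		}
	};
	// Reads the local file directly, so no upload is needed
	return engine.Read(filename).Text;
}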

The winning service among those I tried was AWS Textract, which I tested using the online demo:

https://eu-west-1.console.aws.amazon.com/textract/home?region=eu-west-1#/demo

Here is the equivalent code, using the DetectDocumentText API:

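// Requires: using System.IO; using System.Linq; using Amazon.Textract; using Amazon.Textract.Model;
// Credentials and region come from the default AWS profile/environment
private static string Textract(string filename)
{
	using (var stream = new MemoryStream(File.ReadAllBytes(filename)))
	using (var client = new AmazonTextractClient())
	{
		var ddtRequest = new DetectDocumentTextRequest
		{
			Document = new Document
			{
				Bytes = stream
			}
		};
		// Synchronous call; .NET Core builds of the SDK only offer DetectDocumentTextAsync
		var detectDocumentTextResponse = client.DetectDocumentText(ddtRequest);
		// Keep WORD blocks only and join them without spaces, since each image is expected to hold one VIN
		var words = detectDocumentTextResponse.Blocks
			.Where(b => b.BlockType == BlockType.WORD)
			.Select(b => b.Text);
		return string.Join("", words);
	}
}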
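A DetectDocumentText response typically contains PAGE, LINE and WORD blocks, and a LINE block’s text keeps the spaces between its words. This JavaScript sketch mirrors the C# filtering above on a response-shaped object (`Blocks`, `BlockType`, `Text`), showing why joining only the WORD blocks with no separator returns a VIN split into two words in one piece.

```javascript
// Keep WORD blocks from a DetectDocumentText-shaped response
// and join their text with no separator.
function textractWords(response) {
  return (response.Blocks ?? [])
    .filter(b => b.BlockType === "WORD")
    .map(b => b.Text)
    .join("");
}

// Hypothetical response: one line recognised as two words.
const sample = {
  Blocks: [
    { BlockType: "PAGE" },
    { BlockType: "LINE", Text: "1FT 4L87DA73914" },
    { BlockType: "WORD", Text: "1FT" },
    { BlockType: "WORD", Text: "4L87DA73914" },
  ],
};
console.log(textractWords(sample)); // → 1FT4L87DA73914
```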
