Introduction
Form Recognizer has two types of models: prebuilt and custom. Custom models let us train on forms specific to our use case. There are three prebuilt models, namely Layout, Receipt, and Business Card, each aimed at a specific use case. Custom models are more expensive than prebuilt models; to learn more about pricing, go to the pricing page on the Azure website. You can train a custom model using the Form Recognizer Labelling Tool (to know how it’s done, go to the article []) or through the REST API (to learn more, go to the official documentation). Here we discuss how to use a custom-trained model with Form Recognizer v2.0.
Prerequisites
- An Azure account for creating an Azure Storage Account. (If you don’t have one, you can get a free account with ₹13,300 worth of credits from here. If you are a student, you can verify your student status to get it without entering credit card details; otherwise, credit card details are mandatory.)
- An Azure Form Recognizer resource for testing the tool. If you don’t know how to create a Storage Account, go to the official documentation. Get the Form Recognizer credentials, i.e. the endpoint and API key.
- A model ID obtained from the Sample Labelling Tool.
Usage
After training a model with the Sample Labelling Tool, we get a model ID. This is the connection between the trained model and the API request we send to analyze a form. Our aim is to connect to the model already created and produce results for different inputs. The results are saved to a file with a .json extension for future reference and so we can see what the output looks like.
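To make the role of the model ID concrete, here is a minimal sketch of how it ends up in the analyze request URL. The helper name build_analyze_url is hypothetical (it is not part of any SDK), and the path and API version are simply copied from the script later in this article, so adjust them to whatever your resource supports.

```python
def build_analyze_url(endpoint: str, model_id: str,
                      api_version: str = "v2.1-preview.1") -> str:
    """Build the analyze URL for a custom model (hypothetical helper).

    The path layout mirrors the one used in this article's script;
    the default api_version is an assumption taken from that script.
    """
    # Drop a trailing slash so the joined URL does not contain "//"
    base = endpoint.rstrip("/")
    return f"{base}/formrecognizer/{api_version}/custom/models/{model_id}/analyze"


# Placeholder values, same style as the script's configuration
url = build_analyze_url(
    "https://xxxxxx.cognitiveservices.azure.com/",
    "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
)
print(url)
```

Every request for this trained model goes to a URL of this shape; only the document you send changes.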
Steps
- Copy and paste the code below into a file and save it with a .py extension. You can use Google Colab or any local IDE to run the code.
- Replace the values of the PROCESSING_DIRECTORY and FILE_NAME variables with the directory path and file name of the input PDF/image. The JSON result is stored as a file in the same directory.
- Replace the value of the CONTENT_TYPE variable if you are not using a PDF as the input.
- Replace the values of the FORM_RECOGNIZER_ENDPOINT and FORM_RECOGNIZER_SUBSCRIPTION_KEY variables with the credentials from the Form Recognizer resource in the Azure portal.
- Replace the value of MODEL_ID with the model ID you got from the Sample Labelling Tool.
- Run the code. You will get the result in a file named ‘FRResult.json’ in the PROCESSING_DIRECTORY you have given. The result will also be printed in the console.
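For the CONTENT_TYPE step, a small lookup can pick the value from the input file's extension instead of editing it by hand. This is only a sketch: the helper content_type_for and the table below are my own illustration, and the list of types (PDF, JPEG, PNG, TIFF) is an assumption about what your Form Recognizer version accepts, so check it against the official documentation.

```python
import os

# Assumed mapping of file extensions to request content types
SUPPORTED_CONTENT_TYPES = {
    ".pdf": "application/pdf",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".tif": "image/tiff",
    ".tiff": "image/tiff",
}


def content_type_for(file_name: str) -> str:
    """Return the content type for file_name (hypothetical helper)."""
    # Compare extensions case-insensitively, so "scan.PDF" also works
    ext = os.path.splitext(file_name)[1].lower()
    if ext not in SUPPORTED_CONTENT_TYPES:
        raise ValueError("Unsupported input file type: %s" % file_name)
    return SUPPORTED_CONTENT_TYPES[ext]


print(content_type_for("invoice.pdf"))
print(content_type_for("receipt.JPG"))
```

With this in place, the script could set CONTENT_TYPE = content_type_for(FILE_NAME) rather than using a hard-coded string.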
import json
import os
import sys
import time

import requests

# Directory that holds the input file and receives FRResult.json
PROCESSING_DIRECTORY = '<Path to directory>'
FILE_NAME = '<Name of Input File>'
# Change this if the input is not a PDF (for example 'image/jpeg' or 'image/png')
CONTENT_TYPE = 'application/pdf'
FORM_RECOGNIZER_ENDPOINT = "https://xxxxxx.cognitiveservices.azure.com/"
FORM_RECOGNIZER_SUBSCRIPTION_KEY = "xxxxxxxxxxxxxx"
MODEL_ID = "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"

# Read the input document as raw bytes
with open(os.path.join(PROCESSING_DIRECTORY, FILE_NAME), "rb") as f:
    file_bytes = f.read()

# Strip a trailing slash from the endpoint to avoid "//" in the URL
post_url = FORM_RECOGNIZER_ENDPOINT.rstrip("/") + "/formrecognizer/v2.1-preview.1/custom/models/%s/analyze" % MODEL_ID
params = {
    "includeTextDetails": True
}
headers = {
    'Ocp-Apim-Subscription-Key': FORM_RECOGNIZER_SUBSCRIPTION_KEY,
    'Content-Type': CONTENT_TYPE
}

# Submit the document for analysis; a 202 response means it was accepted
try:
    resp = requests.post(url=post_url, data=file_bytes, headers=headers, params=params)
    if resp.status_code != 202:
        print("POST analyze failed:\n%s" % json.dumps(resp.json()))
        sys.exit(1)
    print("POST analyze succeeded:\n%s" % resp.headers)
    # This URL is polled in the next section to get the analysis result
    get_url = resp.headers["operation-location"]
except Exception as e:
    print("POST analyze failed:\n%s" % str(e))
    sys.exit(1)

# Get the analysis result, waiting longer between tries each time
n_tries = 15
n_try = 0
wait_sec = 5
max_wait_sec = 60
while n_try < n_tries:
    try:
        resp = requests.get(url=get_url,
                            headers={"Ocp-Apim-Subscription-Key": FORM_RECOGNIZER_SUBSCRIPTION_KEY})
        resp_json = resp.json()
        if resp.status_code != 200:
            print("GET analyze results failed:\n%s" % json.dumps(resp_json))
            sys.exit(1)
        status = resp_json["status"]
        if status == "succeeded":
            print("Analysis succeeded:\n%s" % json.dumps(resp_json))
            # Saving the result to a file is optional
            with open(os.path.join(PROCESSING_DIRECTORY, "FRResult.json"), "w") as out:
                out.write(json.dumps(resp_json, indent=4))
            break
        if status == "failed":
            print("Analysis failed:\n%s" % json.dumps(resp_json))
            sys.exit(1)
        # Analysis still running. Wait and retry.
        time.sleep(wait_sec)
        n_try += 1
        wait_sec = min(2 * wait_sec, max_wait_sec)
    except Exception as e:
        print("GET analyze results failed:\n%s" % str(e))
        sys.exit(1)
else:
    # Runs only when the loop used up all tries without a final status
    print("Analyze operation did not complete within the allocated time.")
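Once FRResult.json exists, you will usually want the labelled field values rather than the whole response. The sketch below assumes the v2.x response layout for labelled custom models, where fields sit under analyzeResult, then documentResults, then fields, each with a text and a confidence entry. Treat that layout as an assumption and compare it with your own FRResult.json. The helper extract_fields and the field names in the sample are made up for illustration.

```python
import json


def extract_fields(result: dict) -> dict:
    """Map each labelled field name to a (text, confidence) pair.

    Hypothetical helper; assumes the analyzeResult -> documentResults
    -> fields layout described above.
    """
    fields = {}
    analyze_result = result.get("analyzeResult", {})
    for doc in analyze_result.get("documentResults", []):
        for name, value in (doc.get("fields") or {}).items():
            if value is None:
                # Field was labelled in training but has no value here
                fields[name] = (None, None)
            else:
                fields[name] = (value.get("text"), value.get("confidence"))
    return fields


# Made-up sample in the assumed shape, not real service output
sample = {
    "status": "succeeded",
    "analyzeResult": {
        "documentResults": [
            {
                "fields": {
                    "InvoiceNumber": {"type": "string", "text": "INV-001", "confidence": 0.98},
                    "Total": None,
                }
            }
        ]
    },
}

for name, (text, confidence) in extract_fields(sample).items():
    print(name, text, confidence)

# With a real result file, load it first:
# with open("FRResult.json") as f:
#     print(extract_fields(json.load(f)))
```

Returning plain tuples keeps the sketch short; in a larger project a small dataclass per field would probably be easier to work with.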
About the Author:
AI Engineer | Blogger | Speaker | Mentor | Tech Enthusiast
Reference:
Jose, R. (2020). How To Use Custom Models Trained From Form Recognizer Labelling Tool. Available at: https://www.c-sharpcorner.com/article/how-to-use-custom-models-trained-from-form-recognizer-labelling-tool/ [Accessed: 11th January 2021].