Integrating with Cape Confidential OCR

This tutorial shows how to use the Cape API to make a Confidential OCR (Optical Character Recognition) request. Cape processes the OCR request inside an enclave, so neither the inputs nor the outputs can be seen by anyone but you.

Get a Cape API Key

This tutorial assumes you have an environment variable CAPE_API_KEY that contains an API key. You will need to sign up for a Cape account, and then you can get an API key here.
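Because every request depends on CAPE_API_KEY being set, it can help to fail early with a clear message when it is missing. The following is a minimal sketch; the helper name build_auth_headers is hypothetical and not part of any Cape SDK.

```python
import os


def build_auth_headers(env=None):
    """Return the Authorization header for the Cape API.

    `env` defaults to os.environ; passing a plain dict makes the helper
    easy to test. The function name is illustrative only.
    """
    env = os.environ if env is None else env
    api_key = env.get("CAPE_API_KEY", "").strip()
    if not api_key:
        # Fail here rather than sending a request with "Bearer None".
        raise RuntimeError(
            "CAPE_API_KEY is not set; export it before running this tutorial."
        )
    return {"Authorization": f"Bearer {api_key}"}
```

Calling build_auth_headers() with no arguments reads the key from the process environment.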

Using the Cape API to call Confidential OCR

Here's how to use Python to make a request to the Cape API OCR endpoint. Please see the API docs for more details.

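import os
import requests

# Set this to the Cape OCR endpoint URL from the API docs.
url = ""

# Authenticate with the API key stored in the CAPE_API_KEY environment variable.
headers = {
    "Authorization": f"Bearer {os.getenv('CAPE_API_KEY')}"
}

# Fetch the PDF to process; set this to the URL of your document.
file_response = requests.get("")

# Upload the PDF to the OCR endpoint as multipart form data.
response = requests.post(url, files={"file": ("credit_card_app.pdf", file_response.content, "application/pdf")}, headers=headers)

# Decode the JSON body of the response.
data = response.json()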

You should see some output like:

'West Texas National Bank\n\nVISA and MasterCard" Consumer Credit Card Application\n\nPLEASE CHOOSE CARD TYPE: a VISA Platinum\n\nOVISA Classic\n\nGold MasterCard\n(Co-Applicant Initials)\n\nAND\n\nPLEASE CHOOSE BENEFIT TYPE: E Preferred Points Card Low Rate Card\n\nEWE INTEND TO APPLY FOR JOINT CREDIT: JMS\n\n(Applicant Initials) JS\n\nIMPORTANT INFORMATION ABOUT PROCEDURES FOR OPENING AN ACCOUNT: To help the government fight the funding of terrorism and money\nlaundering activities, Federal law requires all financial institutions to obtain, verify, and record information that identifies each person who opens an account.\nWHAT THIS MEANS FOR YOU: When you open an account, we will ask for your name, address, date of birth, and other information that will allow us to identify\nMARRIED WI RESIDENTS: If you are applying for an individual account or aj joint account with...
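The request code above assumes the call succeeds. A slightly more defensive variant checks the HTTP status before decoding the body. This is a sketch only: parse_ocr_response is a hypothetical helper name, and the only assumption about the payload is that it is a JSON object, which matches the data["ocr_records"] access used in this tutorial.

```python
def parse_ocr_response(response):
    """Return the decoded JSON body, raising if the request failed.

    `response` is expected to behave like a requests.Response object.
    """
    # On a requests.Response, raise_for_status() raises requests.HTTPError
    # for 4xx and 5xx status codes (for example, a missing or invalid API key).
    response.raise_for_status()
    data = response.json()
    if not isinstance(data, dict):
        raise ValueError(f"expected a JSON object, got {type(data).__name__}")
    return data
```

With the variables from the request code, this would be used as data = parse_ocr_response(response).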

Bounding Boxes

The output also contains additional information about the bounding box of each detected word. You can access this data from data["ocr_records"]. Check out this documentation for more information.
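The field names inside each record are not listed in this tutorial, so a reasonable first step is to inspect the structure rather than guess at keys. The sketch below assumes only that data["ocr_records"] is a list; describe_records is a hypothetical helper name.

```python
def describe_records(records, limit=3):
    """Summarise up to `limit` records without assuming any field names."""
    lines = [f"{len(records)} record(s)"]
    for i, rec in enumerate(records[:limit]):
        if isinstance(rec, dict):
            # Show which keys each record carries (JSON keys are strings).
            lines.append(f"[{i}] keys: {sorted(rec)}")
        else:
            lines.append(f"[{i}] {type(rec).__name__}: {rec!r}")
    return "\n".join(lines)
```

Running print(describe_records(data["ocr_records"])) shows which keys are available before you write code that depends on them.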

OCR Model

Cape uses the docTR library for the OCR service. The OCR model consists of two steps: text detection and text recognition. More specifically, for detection, Cape uses a pre-trained DB ResNet-50 architecture, and for recognition, it uses a MobileNetV3 Small architecture. To learn more about the OCR accuracy of these two pre-trained models and how it compares with commercial solutions, you can consult the benchmarks provided by docTR.
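To get a feel for the two-step pipeline, a similar predictor can be built locally with docTR. This is a sketch under assumptions: the architecture names below are the docTR identifiers that appear to correspond to the models described above, Cape's exact configuration is not given here, and running this locally happens on your own machine rather than inside an enclave.

```python
# docTR architecture identifiers. That these match Cape's deployment exactly
# is an assumption based on the description above.
DET_ARCH = "db_resnet50"               # text detection
RECO_ARCH = "crnn_mobilenet_v3_small"  # text recognition


def build_predictor():
    """Build a two-stage docTR OCR predictor (detection, then recognition)."""
    # Imported lazily: docTR pulls in a deep learning backend and downloads
    # pre-trained weights the first time the predictor is created.
    from doctr.models import ocr_predictor

    return ocr_predictor(det_arch=DET_ARCH, reco_arch=RECO_ARCH, pretrained=True)


def ocr_pdf(path):
    """Run OCR locally on a PDF file and return the recognised text."""
    from doctr.io import DocumentFile

    pages = DocumentFile.from_pdf(path)  # one image per page
    result = build_predictor()(pages)
    return result.render()               # plain-text rendering of all pages
```

Calling ocr_pdf("credit_card_app.pdf") on a local copy of the sample document is one way to compare local results with the output of the Cape endpoint.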