Optical Character Recognition (OCR) from Cape.js and PyCape
Cape's secure OCR service confidentially detects text within PDF documents and returns a transcript and bounding boxes. This guide walks you through the steps required to get started quickly with Cape.js and PyCape. You can find end-to-end OCR code examples in the functions repository.
Install the Cape CLI
The Cape CLI is used to manage your Cape account and resources. In this guide, it will be used to sign up
for a Cape account and to create a personal access token for authentication.
The following command downloads the appropriate version for your OS and platform and places it under $HOME/.cape/bin:
curl -fsSL https://raw.githubusercontent.com/capeprivacy/cli/main/install.sh | sh
Note: sudo may be required when running this command.
Sign up for Cape
Sign up for Cape by running cape signup (or cape login if you've logged in before). Cape uses your GitHub account for login.
cape signup
If possible, the CLI opens a browser automatically (if it doesn't, open the link it prints manually). Finish signing up in the browser by confirming that the code shown there matches the code shown in your terminal.
Create a personal access token (PAT)
Your personal access token is used within your application to identify you when it makes requests to the Cape OCR service. Create your PAT with the following command:
cape token create --name my-token --description 'for use with js app calling Cape OCR'
This prints:
Success! Your token: <token string>
Note: The values of --name and --description can be anything that helps you identify and manage the token later.
Configure your JavaScript application
Install the Cape JavaScript SDK with your package manager of choice. With npm:
npm install @capeprivacy/cape-sdk
If you are using yarn:
yarn add @capeprivacy/cape-sdk
Or if you are using pnpm:
pnpm add @capeprivacy/cape-sdk
After installing the SDK, import it into your project in one of the following ways:
ES module style: use the import statement (recommended):
import { Cape } from "@capeprivacy/cape-sdk";
CommonJS style: use require:
const { Cape } = require("@capeprivacy/cape-sdk");
Script tag: use a script tag in an HTML document:
<script type="module">
import { Cape } from "https://cdn.skypack.dev/@capeprivacy/cape-sdk";
</script>
Then pass your token as the authToken parameter when you instantiate Cape, and set capeApiUrl to wss://ocr.capeprivacy.com, the endpoint that hosts the OCR service:
const authToken = "<your token>";
const client = new Cape({ authToken, capeApiUrl: 'wss://ocr.capeprivacy.com' });
OCR Model
Cape uses the docTR library for the OCR service. The OCR model works in two steps: text detection and text recognition. For detection, Cape uses a pre-trained DB ResNet-50 architecture; for recognition, it uses a pre-trained MobileNetV3-Small architecture. To learn more about the accuracy of these two pre-trained models and how they compare against commercial solutions, consult the benchmarks published by docTR.
Invoke the OCR service
Finally, use the function name capedocs/ocr-doctr-onnx-1.0 to specify the OCR service when calling Cape.
To make a single request, use client.run:
// data holds the contents of your PDF file
const result = await client.run({ id: "capedocs/ocr-doctr-onnx-1.0", data });
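Alternatively, to process several documents over one connection, call client.connect once and then client.invoke for each document:
try {
  await client.connect({ id: "capedocs/ocr-doctr-onnx-1.0" });
  // file1, file2 and file3 are placeholders for your PDF files;
  // results[i] holds the OCR output for the i-th file.
  const results = await Promise.all([
    client.invoke({ data: file1.binary }),
    client.invoke({ data: file2.binary }),
    client.invoke({ data: file3.binary }),
  ]);
} catch (error) {
  console.error("Something went wrong", error);
} finally {
  // Close the connection once all invocations have completed
  client.disconnect();
}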
The functions repository also contains examples that encrypt a PDF before invoking the OCR service.
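Both SDKs send the PDF to the service as raw bytes (file1.binary in the JavaScript snippet above, pdf in the PyCape example later in this guide). Because the service is described as working on PDF documents, it can be worth rejecting other files before sending them. The sketch below, written in Python to match the PyCape example, checks for the %PDF- signature that PDF files normally begin with. The helpers is_pdf and load_pdf are hypothetical names of our own and are not part of either Cape SDK.

```python
from pathlib import Path

# PDF files normally start with this signature, e.g. b"%PDF-1.7".
PDF_SIGNATURE = b"%PDF-"


def is_pdf(data: bytes) -> bool:
    """Return True if the bytes start with the PDF signature."""
    return data.startswith(PDF_SIGNATURE)


def load_pdf(path: str) -> bytes:
    """Read a file as raw bytes, refusing anything that is not a PDF.

    Hypothetical helper: neither Cape SDK provides it.
    """
    data = Path(path).read_bytes()
    if not is_pdf(data):
        raise ValueError(f"{path} does not look like a PDF (missing %PDF- header)")
    return data
```

With PyCape, you could pass load_pdf("./path/to/some-file.pdf") to cape.run in place of the manual open() call.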
OCR service output
After you invoke the OCR service with client.run or client.invoke, it returns a JSON object with two key-value pairs:
- ocr_transcript: the PDF transcript (all the text contained in the PDF).
- ocr_records: a JSON object with the bounding box of each word detected in the PDF, delimited by the 2D coordinates of its top-left and bottom-right corners. To learn how to interpret the bounding boxes and parse this output, consult the relevant section of the docTR documentation.
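The sketch below walks the bounding boxes word by word and converts them to pixel coordinates. It assumes that ocr_records follows the layout of docTR's Document.export() output: pages, then blocks, then lines, then words, where each word has a value, a confidence and a geometry made of two corners expressed relative to the page size, and each page's dimensions are the (height, width) of the page image in pixels. Check this against a real response before relying on it. The sketch only handles the straight two-corner boxes described above, not rotated polygons, and iter_words is a hypothetical name.

```python
import json


def iter_words(ocr_records):
    """Yield (page_idx, text, confidence, (xmin, ymin, xmax, ymax)) per word.

    Assumes the docTR Document.export() layout; the returned box is in
    pixels of the page image rather than relative [0, 1] coordinates.
    """
    # Defensive assumption: accept the records either already parsed or
    # still encoded as a JSON string.
    if isinstance(ocr_records, (str, bytes)):
        ocr_records = json.loads(ocr_records)
    for page_no, page in enumerate(ocr_records["pages"]):
        height, width = page["dimensions"]  # docTR stores (height, width)
        page_idx = page.get("page_idx", page_no)
        for block in page["blocks"]:
            for line in block["lines"]:
                for word in line["words"]:
                    # Relative corners: (xmin, ymin) top-left, (xmax, ymax) bottom-right
                    (xmin, ymin), (xmax, ymax) = word["geometry"]
                    box = (
                        round(xmin * width),
                        round(ymin * height),
                        round(xmax * width),
                        round(ymax * height),
                    )
                    yield page_idx, word["value"], word["confidence"], box


# Hand-written sample in the assumed layout (not real service output).
sample = {
    "pages": [{
        "page_idx": 0,
        "dimensions": [1000, 800],
        "blocks": [{"lines": [{"words": [{
            "value": "Invoice",
            "confidence": 0.98,
            "geometry": [[0.1, 0.05], [0.3, 0.08]],
        }]}]}],
    }]
}

for page_idx, text, confidence, box in iter_words(sample):
    print(page_idx, text, confidence, box)  # 0 Invoice 0.98 (80, 50, 240, 80)
```

With the PyCape example later in this guide, you could pass json.loads(result)['ocr_records'] to iter_words.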
Invoke the OCR service from PyCape
You can also invoke the OCR service from PyCape. Install PyCape by following its installation instructions, then call the OCR service as follows:
import json
from pycape import Cape
# Load your PDF as raw bytes
with open("./path/to/some-file.pdf", "rb") as pdf_file:
    pdf = pdf_file.read()
# Instantiate a Cape object with the URL "wss://ocr.capeprivacy.com".
# Setting this URL guarantees that the OCR model is deployed to
# larger instances with the required dependencies.
cape = Cape(url="wss://ocr.capeprivacy.com")
# Use a personal access token created in the UI or with the CLI:
#   cape token create --name ocr
t = cape.token("<your token>")
# Select the Cape function to invoke. To invoke the OCR service,
# set the function ID to "capedocs/ocr-doctr-onnx-1.0".
f = cape.function("capedocs/ocr-doctr-onnx-1.0")
# Invoke the OCR service
result = cape.run(f, t, pdf)
# Parse the JSON output once
output = json.loads(result)
# Print the transcript
print(f"OCR transcript: {output['ocr_transcript']}")
# Print the bounding boxes
print(f"OCR records: {output['ocr_records']}")
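The example above passes the token to cape.token as a literal string. A common alternative is to read it from an environment variable so the token never ends up in source control. The sketch below assumes a variable named CAPE_AUTH_TOKEN; that name is our own choice, not a Cape convention, and get_cape_token is a hypothetical helper.

```python
import os


def get_cape_token(var_name: str = "CAPE_AUTH_TOKEN") -> str:
    """Return the Cape personal access token stored in an environment variable.

    CAPE_AUTH_TOKEN is an illustrative name, not something Cape defines.
    """
    token = os.environ.get(var_name, "").strip()
    if not token:
        raise RuntimeError(
            f"Set {var_name} to the token printed by `cape token create`."
        )
    return token
```

After running export CAPE_AUTH_TOKEN=<token string> in your shell, you could write t = cape.token(get_cape_token()) instead of hard-coding the token.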
The functions repository also contains examples that encrypt a PDF before invoking the OCR service.
Next steps
Did you know that Cape also lets you create and host your own secure functions? The Cape CLI can deploy code that you write, which you can then invoke with the assurance that both the code and the data are protected. Learn more.
Join the community
Discord
Join our Cape Community Discord to ask questions, get answers, and hang out with other privacy-minded developers.
GitHub
Learn more about Cape’s implementation of confidential computing and see sample functions.