
Optical Character Recognition (OCR) from Cape.js and PyCape.

Cape's secure OCR service confidentially detects text within PDF documents and returns a transcript and bounding boxes. This guide walks you through the steps required to get started quickly with Cape.js and PyCape. You can find end-to-end OCR code examples in the function examples repository.

Install the Cape CLI

The Cape CLI is used to manage your Cape account and resources. In this guide, it will be used to sign up for a Cape account and to create a personal access token for authentication. The following command will download the appropriate version for your OS and platform, and place it under $HOME/.cape/bin:

curl -fsSL https://raw.githubusercontent.com/capeprivacy/cli/main/install.sh | sh

Note: sudo may be required when running this command.

Sign up for Cape

Sign up for Cape by running cape signup (or cape login if you've logged in before). Cape uses your GitHub account for login.

cape signup

If your terminal supports it, a browser opens automatically; otherwise, open the provided link manually. Finish the sign-up process in your browser by confirming that the code shown there matches the code in your terminal.

Create a personal access token (PAT)

Your personal access token identifies you when your application makes a request to the Cape OCR service. Create your PAT using the following command:

cape token create --name my-token --description 'for use with js app calling Cape OCR'

This produces:

Success! Your token: <token string>

Note: The --name and --description can be anything you'd like to help you identify and manage the token later.

Configure your JavaScript application

You can install the Cape JavaScript SDK with the package manager of your choice. If you are using npm:

npm install @capeprivacy/cape-sdk

If you are using yarn:

yarn add @capeprivacy/cape-sdk

Or if you are using pnpm:

pnpm add @capeprivacy/cape-sdk

After choosing your preferred method, import the SDK into your project in one of the following ways:

ES module style: Use the import statement (recommended):

import { Cape } from "@capeprivacy/cape-sdk";

CommonJS style: Use require:

const { Cape } = require("@capeprivacy/cape-sdk");

Script tag: Use the script tag in an HTML document:

<script type="module">
import { Cape } from "https://cdn.skypack.dev/@capeprivacy/cape-sdk";
</script>

Then use your token as the authToken parameter when you instantiate your Cape instance:

const authToken = "<your token>";
const client = new Cape({ authToken, capeApiUrl: 'wss://ocr.capeprivacy.com' });

OCR Model

Cape uses the docTR library for the OCR service. The OCR pipeline runs in two stages: text detection, then text recognition. For detection, Cape uses a pre-trained DB ResNet-50 architecture; for recognition, it uses a MobileNetV3 Small architecture. To learn more about the accuracy of these two pre-trained models and how they compare with commercial solutions, you can consult the benchmarks provided by docTR.

Invoke the OCR service

Finally, pass the function name capedocs/ocr-doctr-onnx-1.0 as the id to select the OCR service when calling Cape.

When invoking client.run, where data holds the contents of your PDF:

const result = await client.run({ id: "capedocs/ocr-doctr-onnx-1.0", data });

Or alternatively, call client.connect once and then client.invoke for each document:

try {
  await client.connect({ id: "capedocs/ocr-doctr-onnx-1.0" });

  // file1, file2, and file3 stand in for your own PDF documents.
  const results = await Promise.all([
    client.invoke({ data: file1.binary }),
    client.invoke({ data: file2.binary }),
    client.invoke({ data: file3.binary }),
  ]);
} catch (error) {
  console.error("Something went wrong", error);
}

You can also find examples that encrypt a PDF before invoking the OCR service in the functions repository.

OCR service output

After you invoke the OCR service with run, or with connect and invoke, it returns a JSON object. The JSON object contains two key-value pairs:

  • ocr_transcript: the PDF transcript (all the text contained in the PDF)
  • ocr_records: a JSON object with the bounding boxes (delimited by the 2D coordinates of the top-left and bottom-right corners) for each word detected in the PDF. To learn how to interpret the bounding boxes and parse this JSON output, you can consult this section of the docTR documentation.
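As a rough illustration of reading this output, the Python sketch below walks ocr_records and converts each word's box to pixel coordinates. It assumes ocr_records follows docTR's exported layout (pages, then blocks, lines, and words, with each word's geometry given as relative coordinates between 0 and 1 and each page's dimensions as height then width); check this against the output you actually receive. The helper name iter_words and the sample data are made up for the example.

```python
import json


def iter_words(ocr_records):
    """Yield (text, confidence, pixel_box) for each word in a docTR-style export.

    Assumption: ocr_records has the shape pages -> blocks -> lines -> words.
    """
    for page in ocr_records["pages"]:
        height, width = page["dimensions"]
        for block in page["blocks"]:
            for line in block["lines"]:
                for word in line["words"]:
                    (x0, y0), (x1, y1) = word["geometry"]
                    # Scale relative coordinates to pixels on this page.
                    box = (round(x0 * width), round(y0 * height),
                           round(x1 * width), round(y1 * height))
                    yield word["value"], word["confidence"], box


# Hand-written sample standing in for a real response (not actual service output).
sample = json.dumps({
    "ocr_transcript": "Hello Cape",
    "ocr_records": {"pages": [{
        "page_idx": 0,
        "dimensions": [1000, 800],
        "blocks": [{"lines": [{"words": [
            {"value": "Hello", "confidence": 0.99,
             "geometry": [[0.1, 0.2], [0.3, 0.25]]},
            {"value": "Cape", "confidence": 0.97,
             "geometry": [[0.35, 0.2], [0.5, 0.25]]},
        ]}]}],
    }]},
})

output = json.loads(sample)
words = list(iter_words(output["ocr_records"]))
for text, confidence, box in words:
    print(text, confidence, box)
```

Each box is (left, top, right, bottom) in pixels, which is usually the form needed to crop or highlight a word on a rendered page.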

Invoke the OCR service from PyCape

You can also invoke the OCR service from PyCape. You can install PyCape by following these instructions. Once PyCape is installed, you can call the OCR service as follows:

import json

from pycape import Cape

# Load your PDF
with open("./path/to/some-file.pdf", "rb") as pdf_file:
    pdf = pdf_file.read()

# Instantiate a Cape object with the URL "wss://ocr.capeprivacy.com".
# Setting the URL to wss://ocr.capeprivacy.com will guarantee the OCR model is
# deployed to larger instances with required dependencies.
cape = Cape(url="wss://ocr.capeprivacy.com")

# Get a personal access token from the UI or the CLI with
# cape token create --name ocr
t = cape.token("<your token>")

# Select the Cape function you would like to invoke.
# Since we want to invoke the OCR service, set the function ID
# to "capedocs/ocr-doctr-onnx-1.0"
f = cape.function("capedocs/ocr-doctr-onnx-1.0")

# Invoke the OCR service
result = cape.run(f, t, pdf)

# Parse the JSON response once
ocr_output = json.loads(result)

# Print the transcript
print(f"OCR transcript: {ocr_output['ocr_transcript']}")

# Print the bounding boxes
print(f"OCR records: {ocr_output['ocr_records']}")
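In a real application you may prefer not to hard-code the token, and it can help to reject non-PDF input before sending it. The sketch below shows one possible approach. The environment variable name CAPE_TOKEN and the helper names load_token and looks_like_pdf are assumptions made for this example, not part of PyCape.

```python
import os


def load_token(env_var="CAPE_TOKEN"):
    """Read the personal access token from an environment variable.

    CAPE_TOKEN is a name chosen for this example, not a Cape convention.
    """
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(f"Set {env_var} to your Cape personal access token")
    return token


def looks_like_pdf(data):
    """Quick sanity check: PDF files start with the bytes %PDF-.

    This does not validate the rest of the document.
    """
    return data.startswith(b"%PDF-")


# Possible use with the example above (commented out, since it needs PyCape
# and a real token):
#   t = cape.token(load_token())
#   if not looks_like_pdf(pdf):
#       raise ValueError("Input does not look like a PDF")
```

Failing early on a missing token or a wrong file type gives a clearer error than one raised during the remote call.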

You can also find examples that encrypt a PDF before invoking the OCR service in the functions repository.
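If you need to process several PDFs, a counterpart to the connect and invoke pattern shown for Cape.js can be sketched in Python as follows. This assumes PyCape's Cape object exposes connect, invoke, and close methods that behave like their Cape.js equivalents; the helper name ocr_many is hypothetical. Unlike the Promise.all example, this version sends the documents one after another.

```python
import json


def ocr_many(cape, function_ref, token, pdfs):
    """Run OCR on several PDFs over one connection and return parsed results.

    Assumption: cape provides connect(function_ref, token), invoke(data),
    and close().
    """
    cape.connect(function_ref, token)
    try:
        # Each invoke reuses the connection opened above.
        return [json.loads(cape.invoke(pdf)) for pdf in pdfs]
    finally:
        # Always release the connection, even if one invocation fails.
        cape.close()


# Possible use (commented out, since it needs PyCape and a real token):
#   from pycape import Cape
#   cape = Cape(url="wss://ocr.capeprivacy.com")
#   f = cape.function("capedocs/ocr-doctr-onnx-1.0")
#   t = cape.token("<your token>")
#   results = ocr_many(cape, f, t, [pdf1, pdf2])
#   print(results[0]["ocr_transcript"])
```

Taking the Cape object as a parameter keeps the helper easy to exercise with a stand-in object in tests.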

Next steps

Did you know that Cape also allows you to create and host your own secure functions? The Cape CLI can also deploy code that you write, and you can then invoke it with the assurance that both the code and the data will be protected. Learn more.
