
Using Vector Search

When you have a set of documents, or a very large document of which only a portion is useful for answering your query to an AI model, you will want to ensure that the content sent to the model is relevant and fits within its input limits.

This can be achieved by uploading the document to Cape's secure vector store and utilizing vector search.

Get an API Key

This tutorial assumes you have an environment variable CAPE_API_KEY that contains an API Key. See the previous tutorial if you have not done this.

Upload a document

To start, upload your file or text using the respective endpoint. Documents are organized by key, and you can search across all documents stored under the same key. For more information on document handling, including privacy measures and uploading text, check out the document tutorial.
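To make the keying model concrete, here is a toy sketch of a keyed document store, not Cape's implementation; the `add_document` and `documents_under` helpers are hypothetical names invented for this illustration:

```python
# Toy model of a keyed document store: all documents uploaded under the
# same key form one searchable collection. Function names are hypothetical.
def add_document(store, key, doc_id, filename):
    """Register a document under a key; documents sharing a key are searched together."""
    store.setdefault(key, []).append({"id": doc_id, "filename": filename})

def documents_under(store, key):
    """List the filenames that a search against this key would cover."""
    return [d["filename"] for d in store.get(key, [])]

store = {}
add_document(store, "search-tutorial", "doc-1", "wikipedia_LLM.pdf")
add_document(store, "search-tutorial", "doc-2", "wikipedia_ML.pdf")
```

A search against the key search-tutorial would then cover both files, while documents stored under other keys stay invisible to it.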

Here we are uploading the file wikipedia_LLM.pdf under the key search-tutorial. This is an export of Wikipedia's page on large language models.

import os
import requests

# The endpoint URL below is a placeholder; consult the Cape API reference
# for the exact upload path for the key "search-tutorial".
url = "https://<cape-api-host>/document/search-tutorial"

with open('wikipedia_LLM.pdf', 'rb') as f:
    resp = requests.post(
        url,
        headers={"Authorization": f"Bearer {os.getenv('CAPE_API_KEY')}"},
        files={"file": f},
    )
print(resp.json())

The response confirms the upload:

{
  "message": "document added to vectorstore",
  "document": {
    "id": "a14e217a-c0b4-4f3d-949c-5422295d0e15",
    "filename": "wikipedia_LLM.pdf",
    "tokens": 22619
  }
}

You can repeat this upload process using the same key for all the documents you wish to be able to search together. For example, we'll also upload the Wikipedia article on machine learning so that we can search across both documents.

Now that we have content uploaded, we can search it using either a similarity search or a max marginal relevance search.

  • Similarity Search
    • Searches for content most similar to the query
  • Max Marginal Relevance Search
    • Optimizes for similarity and diversity among selected documents

When you are searching against many very similar documents, a max marginal relevance search may improve your results, since it ensures the returned content is not only relevant but also diverse.
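The difference between the two retriever types can be sketched in a few lines of pure Python. This is a toy illustration, not Cape's implementation; the vectors, the `lam` weight, and the function names are assumptions made for the example:

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))

def similarity_search(query, docs, k):
    """Return the k document names whose vectors are closest to the query."""
    return sorted(docs, key=lambda d: cosine(query, docs[d]), reverse=True)[:k]

def mmr_search(query, docs, k, lam=0.3):
    """Max marginal relevance: trade off relevance to the query against
    redundancy with the documents already selected."""
    selected, candidates = [], list(docs)
    while candidates and len(selected) < k:
        best = max(
            candidates,
            key=lambda d: lam * cosine(query, docs[d])
            - (1 - lam) * max((cosine(docs[d], docs[s]) for s in selected), default=0.0),
        )
        selected.append(best)
        candidates.remove(best)
    return selected

# Two near-duplicate documents about fine-tuning plus one about reinforcement learning.
docs = {
    "fine-tuning": [1.0, 0.0],
    "instruction tuning": [0.99, 0.14],
    "reinforcement learning": [0.6, 0.8],
}
query = [1.0, 0.0]
```

Here `similarity_search(query, docs, 2)` returns the two near-duplicates, while `mmr_search(query, docs, 2)` swaps the second duplicate for the more diverse reinforcement-learning document.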

Here we will do a search using the similarity search tool to find content in the documents similar to our query of "training". The search results can be retrieved in either plaintext or redacted form using the optional format query parameter.

import os
import requests

# The endpoint URL below is a placeholder; consult the Cape API reference
# for the exact search path for the key "search-tutorial".
url = "https://<cape-api-host>/document/search-tutorial/search"

params = {
    "query": "training",
    "nb_chunks": 5,
    "retriever_type": "similarity-search",
    "format": "redacted",
}
resp = requests.get(
    url,
    headers={"Authorization": f"Bearer {os.getenv('CAPE_API_KEY')}"},
    params=params,
)
print(resp.json())

The response contains the matching chunks:

{
  "chunks": [
    "fine tune the model with additional task-specific training. It has subsequently been found that more powerful LLMs such as GPT-3\ncan solve tasks without additional training via ...",
    "trained on many examples of tasks formulated as natural language instructions, along with appropriate responses.\nVarious techniques for instruction...",
    "Reinforcement learning is an area of [OCCUPATION_1] concerned with how software agents ought to...",
    "good solutions to a given problem. In [OCCUPATION_1], genetic algorithms...",
    "it must perform a certain goal (such as driving a vehicle or playing a game against an\nopponent). As it navigates its problem space..."
  ],
  "count": 5
}

The example output above has been shortened for readability, but by comparing with the original documents we can see that two chunks originate from the LLM article and three from the machine learning article. Together they capture the relevant passages on training models, so the information can be amalgamated into a useful context for processing.
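Before forwarding the chunks to a model, you will typically join them into a single context string and keep it within the model's input budget. A minimal sketch, assuming a simple character budget (a real pipeline would count tokens instead); `build_context` is a hypothetical helper:

```python
def build_context(chunks, max_chars=2000):
    """Concatenate retrieved chunks, stopping before the character budget is exceeded."""
    parts, total = [], 0
    for chunk in chunks:
        if total + len(chunk) > max_chars:
            break  # drop the remaining, lower-ranked chunks
        parts.append(chunk)
        total += len(chunk)
    return "\n\n".join(parts)
```

Because search results come back ordered by relevance, truncating from the end discards the least useful chunks first.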

These chunks can then be sent to a model of your choice. For information on how to do so, check out the Integrating with OpenAI tutorial.