API Reference#

pycape.Cape#

This is the main class you instantiate to access the PyCape API.

Use it to authenticate with Cape Cloud and manage top-level resources such as `Project`.

delete_project(self, id) #

Delete a `Project` by ID.

Parameters:

| Name | Type | Description | Default |
| ---- | ---- | ----------- | ------- |
| `id` | `str` | ID of `Project`. | *required* |

Returns:

| Type | Description |
| ---- | ----------- |
| `int` | A success message written to output. |

Source code in pycape/api/cape/cape.py
def delete_project(self, id: str) -> int:
    """
    Delete a `Project` by ID.

    Arguments:
        id: ID of `Project`.
    Returns:
        A success message written to output.
    """
    self.__requester.archive_project(id=id)
    return self._out.write(f"Project ({id}) deleted" + "\n")

get_project(self, id=None, label=None) #

Query a Project by either ID or label.

Parameters:

| Name | Type | Description | Default |
| ---- | ---- | ----------- | ------- |
| `id` | `Optional[str]` | ID of `Project`. | `None` |
| `label` | `Optional[str]` | Unique `Project` label. | `None` |

Returns:

| Type | Description |
| ---- | ----------- |
| `Project` | A `Project` instance. |

Source code in pycape/api/cape/cape.py
def get_project(
    self, id: Optional[str] = None, label: Optional[str] = None
) -> Project:
    """
    Query a `Project` by either ID or label.

    Arguments:
        id: ID of `Project`.
        label: Unique `Project` label.
    Returns:
        A `Project` instance.
    """
    project = self.__requester.get_project(id=id, label=label)
    return Project(requester=self.__requester, user_id=self.__user_id, **project)

list_projects(self) #

Returns a list of all projects that the requesting user is a contributor of.

Returns:

| Type | Description |
| ---- | ----------- |
| `List[Project]` | A list of `Project` instances. |

Source code in pycape/api/cape/cape.py
def list_projects(self) -> List[Project]:
    """
    Returns a list of all projects that the requesting user is a contributor of.

    Returns:
        A list of `Project` instances.
    """
    projects = self.__requester.list_projects()
    get_project_values = [
        Project(requester=self.__requester, user_id=self.__user_id, **p)
        for p in projects
    ]
    format_projects = {
        "PROJECT ID": [x.id for x in get_project_values],
        "NAME": [x.name for x in get_project_values],
        "LABEL": [x.label for x in get_project_values],
    }
    self._out.write(tabulate(format_projects, headers="keys") + "\n")
    return get_project_values

login(self, token=None) #

Calls POST /v1/login. Authenticate with Cape Cloud in order to make subsequent requests.

Parameters:

| Name | Type | Description | Default |
| ---- | ---- | ----------- | ------- |
| `token` | `Optional[str]` | User authentication token. | `None` |

Returns:

| Type | Description |
| ---- | ----------- |
| `None` | A success message written to output. |

Source code in pycape/api/cape/cape.py
def login(self, token: Optional[str] = None) -> None:
    """
    Calls `POST /v1/login`. Authenticate with Cape Cloud in order to make subsequent requests.

    Arguments:
        token:  User authentication token.
    Returns:
        A success message written to output.
    """
    self.__user_id = self.__requester.login(token=token)
    self._out.write("Login successful\n")
    return

pycape.Project#

Projects are the business contexts in which you collaborate with other organizations or Cape users to train models.

Parameters:

| Name | Type | Description | Default |
| ---- | ---- | ----------- | ------- |
| `id` | `str` | ID of `Project`. | *required* |
| `name` | `str` | Name of `Project`. | *required* |
| `label` | `str` | Label of `Project`. | *required* |
| `description` | `str` | Description of `Project`. | *required* |
| `owner` | `dict` | Returned dictionary of fields related to the `Project` owner. | *required* |
| `organizations` | `list` | Returned list of fields related to the organizations associated with the `Project`. | *required* |
| `dataviews` | `list` | Returned list of `DataView`s added to the `Project`. | *required* |

create_dataview(self, name, uri, owner_id=None, owner_label=None, schema=None, development=False) #

Creates a `DataView` in Cape Cloud. Returns the created `DataView`.

Parameters:

| Name | Type | Description | Default |
| ---- | ---- | ----------- | ------- |
| `name` | `str` | A name for the `DataView`. | *required* |
| `uri` | `str` | URI location of the dataset. | *required* |
| `owner_id` | `Optional[str]` | The ID of the organization that owns this dataset. | `None` |
| `owner_label` | `Optional[str]` | The label of the organization that owns this dataset. | `None` |
| `schema` | `Union[pd.Series, List, None]` | The schema of the data that the `DataView` points to: a string value for each column's datatype. Possible datatypes: `string`, `integer`, `number`, `datetime`. | `None` |
| `development` | `bool` | Whether the created `DataView` is in development mode or not. | `False` |

Returns:

| Type | Description |
| ---- | ----------- |
| `DataView` | A `DataView` instance. |

Source code in pycape/api/project/project.py
def create_dataview(
    self,
    name: str,
    uri: str,
    owner_id: Optional[str] = None,
    owner_label: Optional[str] = None,
    schema: Union[pd.Series, List, None] = None,
    development: bool = False,
) -> DataView:
    """
    Creates a `DataView` in Cape Cloud. Returns the created `DataView`.

    Arguments:
        name: a name for the `DataView`.
        uri: URI location of the dataset.
        owner_id: The ID of the organization that owns this dataset.
        owner_label: The label of the organization that owns this dataset.
        schema: The schema of the data that `DataView` points to.
            A string value for each column's datatype. Possible datatypes:
                string
                integer
                number
                datetime
        development: Whether the created dataview is in development mode or not.
    Returns:
        A `DataView` instance.
    """
    parse_schema = DataView._validate_schema(schema)
    if not parse_schema:
        parse_schema = DataView._get_schema_from_uri(uri)

    validate_s3_location(uri)

    data_view_dict = self._requester.create_dataview(
        project_id=self.id,
        name=name,
        uri=uri,
        owner_id=owner_id,
        owner_label=owner_label,
        schema=parse_schema,
        development=development,
    )
    data_view = DataView(user_id=self._user_id, **data_view_dict)

    if hasattr(self, "dataviews"):
        self.dataviews.append(data_view)
    else:
        self.dataviews = [data_view]
    return data_view
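The datatype vocabulary accepted by `schema` can be illustrated with a small check. This is a sketch, assuming the schema is expressed as a mapping of column name to datatype string; the real `DataView._validate_schema` may accept other shapes (such as a pandas Series), and `check_schema` is a hypothetical helper, not part of pycape:

```python
# The four datatype strings documented for create_dataview's `schema`.
ALLOWED_TYPES = {"string", "integer", "number", "datetime"}

def check_schema(schema):
    """Return the column names whose declared datatype is not allowed.

    `schema` is assumed here to be a dict of column name -> datatype string.
    """
    return sorted(col for col, dtype in schema.items() if dtype not in ALLOWED_TYPES)

# "date" is not in the vocabulary, so it is flagged.
bad = check_schema({"age": "integer", "height": "number", "joined": "date"})
```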

delete_dataview(self, id) #

Remove a DataView by ID.

Parameters:

| Name | Type | Description | Default |
| ---- | ---- | ----------- | ------- |
| `id` | `str` | ID of `DataView`. | *required* |
Source code in pycape/api/project/project.py
def delete_dataview(self, id: str) -> None:
    """
    Remove a `DataView` by ID.

    Arguments:
        id: ID of `DataView`.
    """
    self._requester.delete_dataview(id=id)

    if hasattr(self, "dataviews"):
        self.dataviews = [x for x in self.dataviews if id != x.id]

    self._out.write(f"DataView ({id}) deleted" + "\n")

get_dataview(self, id=None, uri=None) #

Query a DataView for the scoped Project by DataView ID or URI.

Parameters:

| Name | Type | Description | Default |
| ---- | ---- | ----------- | ------- |
| `id` | `Optional[str]` | ID of `DataView`. | `None` |
| `uri` | `Optional[str]` | Unique `DataView` URI. | `None` |

Returns:

| Type | Description |
| ---- | ----------- |
| `DataView` | A `DataView` instance. |

Source code in pycape/api/project/project.py
def get_dataview(
    self, id: Optional[str] = None, uri: Optional[str] = None
) -> DataView:
    """
    Query a `DataView` for the scoped `Project` by `DataView` \
    ID or URI.

    Arguments:
        id: ID of `DataView`.
        uri: Unique `DataView` URI.
    Returns:
        A `DataView` instance.
    """
    data_view = self._requester.get_dataview(
        project_id=self.id, dataview_id=id, uri=uri
    )

    return DataView(user_id=self._user_id, **data_view[0])

get_job(self, id) #

Returns a Job given an ID.

Parameters:

| Name | Type | Description | Default |
| ---- | ---- | ----------- | ------- |
| `id` | `str` | ID of `Job`. | *required* |

Returns:

| Type | Description |
| ---- | ----------- |
| `Job` | A `Job` instance. |

Source code in pycape/api/project/project.py
def get_job(self, id: str) -> Job:
    """
    Returns a `Job` given an ID.

    Arguments:
        id: ID of `Job`.
    Returns:
        A `Job` instance.
    """
    job = self._requester.get_job(project_id=self.id, job_id=id, return_params="")

    return Job(**job, project_id=self.id, requester=self._requester)

list_dataviews(self) #

Returns a list of dataviews for the scoped Project.

Returns:

| Type | Description |
| ---- | ----------- |
| `List[DataView]` | A list of `DataView` instances. |

Source code in pycape/api/project/project.py
def list_dataviews(self) -> List[DataView]:
    """
    Returns a list of dataviews for the scoped `Project`.

    Returns:
        A list of `DataView` instances.
    """

    data_views = self._requester.list_dataviews(project_id=self.id)
    get_data_view_values = [
        DataView(user_id=self._user_id, **d) for d in data_views
    ]
    dv_ids = []
    dv_names = []
    dv_locations = []
    dv_owners = []

    for dv in get_data_view_values:
        dv_ids.append(dv.id)
        dv_names.append(dv.name)
        dv_locations.append(dv.location)
        dv_owner_label = dv.owner.get("label")
        if self._user_id in [x.get("id") for x in dv.owner.get("members", [])]:
            dv_owners.append(f"{dv_owner_label} (You)")
        else:
            dv_owners.append(dv_owner_label)

    format_data_views = {
        "DATAVIEW ID": dv_ids,
        "NAME": dv_names,
        "LOCATION": dv_locations,
        "OWNER": dv_owners,
    }
    self._out.write(tabulate(format_data_views, headers="keys") + "\n")
    return get_data_view_values
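The `(You)` suffix in the OWNER column comes from a membership check against the owning organization's `members` list. A standalone sketch of that logic (`owner_display` is a hypothetical helper, not part of pycape):

```python
def owner_display(owner, user_id):
    # Mirrors the labeling in list_dataviews: append "(You)" when the
    # requesting user appears among the owning organization's members.
    label = owner.get("label")
    member_ids = [m.get("id") for m in owner.get("members", [])]
    return f"{label} (You)" if user_id in member_ids else label

owner = {"label": "acme", "members": [{"id": "u-1"}, {"id": "u-2"}]}
# owner_display(owner, "u-1") -> "acme (You)"
# owner_display(owner, "u-3") -> "acme"
```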

list_jobs(self) #

Returns a list of Jobs for the scoped Project.

Returns:

| Type | Description |
| ---- | ----------- |
| `List[Job]` | A list of `Job` instances. |

Source code in pycape/api/project/project.py
def list_jobs(self) -> List[Job]:
    """
    Returns a list of `Jobs` for the scoped `Project`.

    Returns:
        A list of `Job` instances.
    """
    jobs = self._requester.list_jobs(project_id=self.id)
    get_job_values = [
        Job(project_id=self.id, requester=self._requester, **j) for j in jobs
    ]
    j_ids = []
    j_type = []
    j_status = []

    for j in get_job_values:
        j_ids.append(j.id)
        j_type.append(j.job_type)
        j_status.append(j.status)

    format_jobs = {
        "JOB ID": j_ids,
        "TYPE": j_type,
        "STATUS": j_status,
    }
    self._out.write(tabulate(format_jobs, headers="keys") + "\n")
    return get_job_values

list_organizations(self) #

Returns a list of all organizations that the requesting user is a contributor of.

Returns:

| Type | Description |
| ---- | ----------- |
| `List[Organization]` | A list of `Organization` instances. |

Source code in pycape/api/project/project.py
def list_organizations(self) -> List[Organization]:
    """
    Returns a list of all organizations that the requesting user is a contributor of.

    Returns:
        A list of `Organization` instances.
    """
    orgs = self._requester.get_project(id=self.id).get("organizations", [])
    get_org_values = [Organization(**o) for o in orgs]

    format_orgs = {
        "ORGANIZATION ID": [x.id for x in get_org_values],
        "NAME": [x.name for x in get_org_values],
        "LABEL": [x.label for x in get_org_values],
    }
    self._out.write(tabulate(format_orgs, headers="keys") + "\n")
    return get_org_values

submit_job(self, task, timeout=600) #

Submits a Job to be run by your Cape worker in collaboration with other organizations in your Project.

Parameters:

| Name | Type | Description | Default |
| ---- | ---- | ----------- | ------- |
| `task` | `Task` | Instance of a class that inherits from `Task`. | *required* |
| `timeout` | `float` | How long (in ms) a Cape Worker should run before canceling the `Job`. | `600` |

Returns:

| Type | Description |
| ---- | ----------- |
| `Job` | A `Job` instance. |

Source code in pycape/api/project/project.py
def submit_job(self, task: Task, timeout: float = 600) -> Job:
    """
    Submits a `Job` to be run by your Cape worker in \
    collaboration with other organizations in your `Project`.
    Arguments:
        task: Instance of class that inherits from `Task`.
        timeout: How long (in ms) a Cape Worker should run before canceling the `Job`.
    Returns:
        A `Job` instance.
    """
    created_job = self._create_job(task=task, timeout=timeout)

    return created_job

pycape.Organization#

Organization represents an organization in Cape.

Parameters:

| Name | Type | Description | Default |
| ---- | ---- | ----------- | ------- |
| `id` | `str` | ID of `Organization`. | *required* |
| `name` | `str` | Name of `Organization`. | *required* |
| `label` | `str` | Label of `Organization`. | *required* |

pycape.DataView#

DataViews store metadata about datasets, most notably a pointer to the dataset's location.

Parameters:

| Name | Type | Description | Default |
| ---- | ---- | ----------- | ------- |
| `id` | `str` | ID of `DataView`. | *required* |
| `name` | `str` | Name of `DataView`. | *required* |
| `schema` | `list` | Schema of the data that the `DataView` points to. | *required* |
| `location` | `str` | URI of `DataView`. | *required* |
| `owner` | `dict` | Dictionary of fields related to the `DataView` owner. | *required* |
| `user_id` | `str` | User ID of the requester. | *required* |
| `development` | `bool` | Whether this `DataView` is in development mode or not. | *required* |

pycape.Job#

Jobs track the status and eventually report the results of computation sessions run on Cape workers.

Parameters:

| Name | Type | Description | Default |
| ---- | ---- | ----------- | ------- |
| `id` | `str` | ID of `Job`. | *required* |
| `status` | `str` | Status of `Job`. | *required* |
| `project_id` | `str` | ID of `Project`. | *required* |

approve(self, org_id) #

Approve the Job on behalf of your organization. Once all organizations approve a job, the computation will run.

Parameters:

| Name | Type | Description | Default |
| ---- | ---- | ----------- | ------- |
| `org_id` | `str` | ID of `Organization`. | *required* |

Returns:

| Type | Description |
| ---- | ----------- |
| `Job` | A `Job` instance. |

Source code in pycape/api/job/job.py
def approve(self, org_id: str) -> "Job":
    """
    Approve the Job on behalf of your organization. Once all organizations \
    approve a job, the computation will run.

    Arguments:
        org_id: ID of `Organization`.

    Returns:
        A `Job` instance.
    """
    approved_job = self._requester.approve_job(job_id=self.id, org_id=org_id)

    return Job(
        project_id=self.project_id, **approved_job, requester=self._requester,
    )

get_results(self) #

Given the requester's project role and authorization level, returns the trained model's weights and metrics.

Returns:

| Type | Description |
| ---- | ----------- |
| `Tuple[Optional[numpy.ndarray], Dict]` | `weights`: a numpy array. `metrics`: a dictionary of different metric values. |

Source code in pycape/api/job/job.py
def get_results(self) -> Tuple[Optional[np.ndarray], Dict]:
    """
    Given the requester's project role and authorization level, returns the trained model's weights and metrics.

    Returns:
        weights: A numpy array.
        metrics: A dictionary of different metric values.
    """
    job_results = self._requester.get_job(
        project_id=self.project_id,
        job_id=self.id,
        return_params="model_metrics { name value }\nmodel_location",
    )

    # gql returns metrics in key/value pairs within an array
    # e.g. [{"name": "mse_result", "value": [1.0]}, {"name": "r_squared", "value": [1.0]}]
    # here we map to a more pythonic key, value
    # {
    #   "mse_result": [1.0],
    #   "r_squared": [1.0],
    # }
    gql_metrics = job_results.get("model_metrics", [])
    metrics = {}
    for m in gql_metrics:
        metrics[m["name"]] = m["value"]

    location = job_results.get("model_location", None)
    if location is None or location == "":
        return None, metrics

    # pull the bucket info if the regression weights were stored on s3
    # location will look like s3://my-bucket/<job_id>
    p = urlparse(location)
    if p.scheme != "s3":
        raise StorageSchemeException(scheme=p.scheme)

    tf = tempfile.NamedTemporaryFile()
    file_name = setup_boto_file(
        uri=p,
        temp_file_name=tf.name,
        download_path=p.path.lstrip("/") + "/regression_weights.csv",
    )

    # check whether the file that contains the weights has a header
    if header_exists(path=file_name):
        # return the weights (decoded to np) & metrics
        return (
            csv.read_csv(file_name).to_pandas().to_numpy().astype(np.float64),
            metrics,
        )

    # return the weights (decoded to np) & metrics
    return np.loadtxt(file_name, delimiter=","), metrics
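The metric reshaping described in the comments above can be shown in isolation. `metrics_to_dict` is a hypothetical helper mirroring what `get_results` does internally, not part of the pycape API:

```python
def metrics_to_dict(gql_metrics):
    # gql returns metrics as key/value pairs within an array,
    # e.g. [{"name": "mse_result", "value": [1.0]}, ...];
    # collapse them into a plain {name: value} dict.
    return {m["name"]: m["value"] for m in gql_metrics}

metrics = metrics_to_dict([
    {"name": "mse_result", "value": [1.0]},
    {"name": "r_squared", "value": [0.9]},
])
# metrics == {"mse_result": [1.0], "r_squared": [0.9]}
```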

get_status(self) #

Query the current status of the Cape Job.

Returns:

| Type | Description |
| ---- | ----------- |
| `str` | A `Job` status string. |

**Status Types:**

| Status | Description |
| ------ | ----------- |
| `Initialized` | Job has been initialized. |
| `NeedsApproval` | Job is awaiting approval by at least one party. |
| `Approved` | Job has been approved, the computation will commence. |
| `Rejected` | Job has been rejected, the computation will not run. |
| `Started` | Job has started. |
| `Completed` | Job has completed. |
| `Stopped` | Job has been stopped. |
| `Error` | Error in running Job. |
Source code in pycape/api/job/job.py
def get_status(self) -> str:
    """
    Query the current status of the Cape `Job`.

    Returns:
        A `Job` status string.

    **Status Types:**

    Status | Description
    ------ | ----------
    **`Initialized`** | Job has been initialized.
    **`NeedsApproval`** | Job is awaiting approval by at least one party.
    **`Approved`** | Job has been approved, the computation will commence.
    **`Rejected`** | Job has been rejected, the computation will not run.
    **`Started`** | Job has started.
    **`Completed`** | Job has completed.
    **`Stopped`** | Job has been stopped.
    **`Error`** | Error in running Job.
    """
    job = self._requester.get_job(
        project_id=self.project_id, job_id=self.id, return_params=""
    )
    return job.get("status", {}).get("code", "")
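A caller polling `get_status` usually wants to know when a job has reached a final state. A minimal sketch based on the status table above — `is_finished` and the choice of terminal states are assumptions for illustration, not part of the pycape API:

```python
# States from the status table after which the job will not progress further.
# (Assumed terminal set; pycape itself does not export such a constant.)
TERMINAL_STATES = {"Completed", "Stopped", "Error", "Rejected"}

def is_finished(status: str) -> bool:
    """Return True when a Job status string indicates no further progress."""
    return status in TERMINAL_STATES
```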

pycape.Task#

Tasks contain the instructions for how a Cape worker should run a job.

Parameters:

| Name | Type | Description | Default |
| ---- | ---- | ----------- | ------- |
| `model_location` | `str` | The AWS S3 bucket name to which Cape will write the output of the model training. | *required* |
| `model_owner` | `str` | The ID of the organization participating in the computation that will own the trained model. | *required* |

pycape.VerticallyPartitionedLinearRegressionJob#

Inherits from: Task.

Contains instructions for encrypted training of linear regression models using vertically-partitioned datasets.

Vertically-partitioned datasets refer to the joining of columns (i.e. features) from several parties.

Note

This task expects DataViews with floating-point inputs. Internally, values will be re-encoded by the Cape Worker into the fixed-point numbers necessary for encrypted computation.

Note

This task expects its input DataViews to be aligned by index (although indexing columns need not be present in either of the DataViews or their underlying datasets).

Note

This task expects its input DataViews to have max values scaled between 1.0 and 10.0.

Currently, input data must be scaled to single digits; for any floating-point vector c in the input data views x and y, c must be scaled such that 1.0 <= max(c) < 10.0. This bound allows the Cape Worker to allocate all of its precision for significant digits throughout the linear regression computation, while still maintaining the guarantee that fixed-point numbers won't overflow. For logarithmically-distributed vectors, we recommend applying a log-transform before scaling to this bound.
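The bound above can be met by dividing each column by a power of ten. A minimal sketch assuming strictly positive values (`scale_to_bound` is a hypothetical helper; for logarithmically-distributed data, apply a log-transform before this step):

```python
import math

def scale_to_bound(column):
    """Scale a column of positive floats so that 1.0 <= max(column) < 10.0."""
    # Dividing by 10**floor(log10(max)) shifts the decimal point so the
    # largest value lands in [1.0, 10.0) without changing relative scale.
    factor = 10.0 ** math.floor(math.log10(max(column)))
    return [v / factor for v in column]

scaled = scale_to_bound([0.02, 0.5, 3.4, 87.0])  # max becomes 8.7
```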

Parameters:

| Name | Type | Description | Default |
| ---- | ---- | ----------- | ------- |
| `x_train_dataview` | `DataView` | `DataView` that points to a dataset that contains training set values. | *required* |
| `y_train_dataview` | `DataView` | `DataView` that points to a dataset that contains target values. | *required* |
| `model_location` | `str` | The AWS S3 bucket name to which Cape will write the output of the model training. | *required* |
| `model_owner` | `str` | The ID of the organization participating in the computation that will own the trained model. | *required* |