Transformations

Transformations are functions that alter your data, ensuring it is free of sensitive information.

Cape Python has seven built-in transformation functions. This document describes what each one does and shows how to use it in your policy.

Date perturbation

The date-perturbation transformation adds random noise to dates. The amount of noise depends on the min and max values that you set in the policy.

- transform:
    type: date-perturbation
    frequency: <one of: 'year', 'month', 'day', 'hour', 'minute', 'second'>
    min: <int or float>
    max: <int or float>
    # Optional. The base number to initialize the random number generator.
    # Pandas only (Spark does not currently support seeding)
    seed: <int>
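
As a rough illustration of what min, max, and seed control, the sketch below shifts each date by a random offset with the unit fixed to days. It uses only the Python standard library. The helper perturb_dates and its parameter names are hypothetical, and drawing the noise from a uniform distribution is an assumption made for the example, not a description of Cape Python's implementation.

```python
import random
from datetime import datetime, timedelta


# Hypothetical helper for illustration; not part of Cape Python.
def perturb_dates(dates, min_days, max_days, seed=None):
    """Shift each date by a random number of days in [min_days, max_days]."""
    rng = random.Random(seed)  # a fixed seed makes the noise reproducible
    return [d + timedelta(days=rng.uniform(min_days, max_days)) for d in dates]


dates = [datetime(2020, 1, 15), datetime(2020, 6, 30)]
noisy = perturb_dates(dates, min_days=-3, max_days=3, seed=42)

for original, shifted in zip(dates, noisy):
    print(original, "->", shifted)
```

Calling the helper again with the same seed yields the same shifted dates, which is the role the optional seed plays in the policy.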

Date truncation

The date-truncation transformation truncates dates to a given unit, such as year or month, discarding any finer detail. Set the unit in frequency.

- transform:
    type: date-truncation
    frequency: <one of: 'year', 'month', 'day', 'hour', 'minute', 'second'>
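
To make the effect of frequency concrete, the following sketch truncates a timestamp by resetting every field finer than the chosen unit to its smallest value. The helper truncate_date and the lookup table are hypothetical names used for illustration, under the assumption that truncation means "start of the period"; Cape Python's own code may differ.

```python
from datetime import datetime

# For each unit, the finer-grained fields to reset to their smallest values.
_RESET = {
    "year": dict(month=1, day=1, hour=0, minute=0, second=0, microsecond=0),
    "month": dict(day=1, hour=0, minute=0, second=0, microsecond=0),
    "day": dict(hour=0, minute=0, second=0, microsecond=0),
    "hour": dict(minute=0, second=0, microsecond=0),
    "minute": dict(second=0, microsecond=0),
    "second": dict(microsecond=0),
}


# Hypothetical helper for illustration; not part of Cape Python.
def truncate_date(value, frequency):
    """Return value truncated to the start of the given unit."""
    return value.replace(**_RESET[frequency])


stamp = datetime(2021, 5, 17, 13, 45, 30)
print(truncate_date(stamp, "year"))   # 2021-01-01 00:00:00
print(truncate_date(stamp, "month"))  # 2021-05-01 00:00:00
```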

Numeric perturbation

The numeric-perturbation transformation adds random noise to numeric data. The amount of noise depends on the min and max values that you set in the policy.

- transform:
    type: numeric-perturbation
    dtype: <Pandas Series type or Spark Series type>
    min: <int or float>
    max: <int or float>
    # Optional. The base number to initialize the random number generator.
    seed: <int>
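
The idea can be sketched in a few lines of standard-library Python: each value receives an independent offset between min and max. The helper perturb_numbers and its parameter names are hypothetical, and the uniform distribution is an assumption for the example rather than a statement about how Cape Python draws its noise.

```python
import random


# Hypothetical helper for illustration; not part of Cape Python.
def perturb_numbers(values, min_noise, max_noise, seed=None):
    """Add an independent random offset in [min_noise, max_noise] to each value."""
    rng = random.Random(seed)  # a fixed seed makes the noise reproducible
    return [v + rng.uniform(min_noise, max_noise) for v in values]


salaries = [52000.0, 61000.0, 75000.0]
noisy = perturb_numbers(salaries, min_noise=-500, max_noise=500, seed=7)
print(noisy)
```

A narrow range keeps values close to the originals but hides less; a wide range hides more at the cost of accuracy.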

Numeric rounding

The numeric-rounding transformation rounds numeric values to a given number of decimal places. Use precision to set the number of decimal places.

- transform:
    type: numeric-rounding
    dtype: <Pandas Series type or Spark Series type>
    precision: <int>
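
As a small illustration of precision, the sketch below rounds a list of values with Python's built-in round. The helper round_values is a hypothetical name, and this is only an approximation of the behavior: Cape Python operates on Pandas or Spark columns, and its handling of ties may differ from the built-in function.

```python
# Hypothetical helper for illustration; not part of Cape Python.
def round_values(values, precision):
    """Round each value to the given number of decimal places."""
    return [round(v, precision) for v in values]


print(round_values([3.14159, 2.71828], 2))  # [3.14, 2.72]
```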

Tokenizer

The tokenizer transformation maps a string to a token to obfuscate it.

Warning

Linkable tokenization for sensitive data is vulnerable to privacy attacks. Cape Privacy does not recommend sharing tokenized data with preserved linkability with untrusted or outside parties. Cape Python does not support anonymized transformations.

- transform:
    type: tokenizer
    # Default is 64
    max_token_len: <int or bytes>
    # If unspecified, Cape Python uses a random byte string
    key: <string or byte string>
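
One common way to build this kind of tokenizer is a keyed hash. The sketch below uses HMAC-SHA256 from the Python standard library and truncates the hex digest to max_token_len characters. This is an assumption about the general technique, not Cape Python's implementation; the helper tokenize and the example key are hypothetical.

```python
import hashlib
import hmac


# Hypothetical helper for illustration; not part of Cape Python.
def tokenize(value, key, max_token_len=64):
    """Map a string to a hex token using a keyed hash (HMAC-SHA256)."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:max_token_len]  # a SHA-256 hex digest has 64 characters


key = b"example-secret-key"  # illustrative only; use a real secret in practice
a = tokenize("alice@example.com", key)
b = tokenize("alice@example.com", key)
c = tokenize("bob@example.com", key)

print(a == b)  # True: the same input and key always give the same token
print(a == c)  # False: different inputs give different tokens
```

The first comparison shows the linkability that the warning above describes: because equal inputs produce equal tokens, anyone holding the tokenized data can still tell which records refer to the same value.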

ReversibleTokenizer

The ReversibleTokenizer transformation maps a string to a token to obfuscate it. Unlike the tokenizer, its tokens can be reverted to their plaintext form by using the TokenReverser.

- transform:
    type: reversible-tokenizer
    # If unspecified, Cape Python uses a random byte string
    key: <string or byte string>

TokenReverser

The TokenReverser is designed to be used with the ReversibleTokenizer. The TokenReverser reverts tokens produced by the ReversibleTokenizer back to their plaintext form.

- transform:
    type: token-reverser
    # Must be the same key that the ReversibleTokenizer used
    key: <string or byte string>