2. Upload a version

Once you have registered a model on the platform, you can start uploading the versions you want to monitor.
You must define a version to start monitoring and gain visibility into your model.

🚧

One version at a time

Currently, the platform supports only one active version at a time. The same model can have different versions over time, but only one of them can be active at any given moment.

Upload a new version via the SDK

📘

Install the SDK

To install and get started with our SDK, visit our SDK docs.

To upload a version, you need to supply the collection of entities that belong to the version, along with their summary. The summary includes the data type and role of each entity, the importance of each feature (entities that are not features automatically get the value 0), and a statistical summary of each entity. This summary is used later to create the appropriate metrics per entity and to enable comparison between the baseline statistical behavior and production.

To automate the process of creating version summaries, the Superwise SDK requires only two parameters:

  • Baseline dataset
  • Role definition for each entity that is not a "Feature"

import requests
from io import StringIO

import pandas as pd 

from superwise.models.version import Version
from superwise.resources.superwise_enums import DataEntityRole

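# Download the sample baseline dataset and load it into a DataFrame
url = 'https://gitlab.com/superwise.ai-public/integration/-/raw/main/getting_started/data/baseline.csv?inline=false'
response = requests.get(url)
response.raise_for_status()  # stop here if the download failed
baseline_data = pd.read_csv(StringIO(response.text))

# `sw` is assumed to be the Superwise client and `diamond_model` the model registered in the previous step.
# Every column that is not listed in specific_roles gets the "Feature" role.
entities_collection = sw.data_entity.summarise(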
    data=baseline_data,
    specific_roles = {
      'id': DataEntityRole.ID,
      'ts': DataEntityRole.TIMESTAMP,
      'prediction': DataEntityRole.PREDICTION_VALUE,
      'price': DataEntityRole.LABEL
    }
)

new_version = Version(
    model_id=diamond_model.id,
    name="1.0.0",
    data_entities=entities_collection,
)

new_version = sw.version.create(new_version)
sw.version.activate(new_version.id)

🚧

File size limitation

The baseline data file can be up to 100MB. If your file is larger, split it into smaller files.
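To check a baseline against the size limit before uploading, something like the following pandas-only sketch could be used. The helper names `csv_size_bytes` and `split_by_csv_size` are hypothetical and not part of the Superwise SDK, "100MB" is assumed to mean 100 * 1024 * 1024 bytes, and the split is approximate because rows differ in length. How the resulting parts should be supplied to the platform is not covered here.

```python
import math

import pandas as pd

# Assumption: the documented "100MB" limit is treated as 100 * 1024 * 1024 bytes.
MAX_BYTES = 100 * 1024 * 1024


def csv_size_bytes(df):
    """Return the size of df when serialized as UTF-8 CSV without the index."""
    return len(df.to_csv(index=False).encode("utf-8"))


def split_by_csv_size(df, max_bytes=MAX_BYTES):
    """Split df into consecutive row chunks of roughly max_bytes each.

    The split assumes rows are of similar size, so individual chunks
    can end up slightly above or below max_bytes.
    """
    total = csv_size_bytes(df)
    if total <= max_bytes or len(df) <= 1:
        return [df]
    n_chunks = math.ceil(total / max_bytes)
    rows_per_chunk = max(1, len(df) // n_chunks)
    return [df.iloc[i:i + rows_per_chunk] for i in range(0, len(df), rows_per_chunk)]


# Small demonstration with a tiny limit so that the split is visible.
demo = pd.DataFrame({"a": range(1000), "b": ["x" * 10] * 1000})
chunks = split_by_csv_size(demo, max_bytes=4000)
print(len(chunks), [len(c) for c in chunks])
```

Re-checking each chunk with `csv_size_bytes` before use is a cheap safeguard when row sizes vary a lot.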

Given these two parameters, the SDK automatically infers the data type of each entity, the feature importance, and the baseline statistics of each entity. By default, the SDK assigns the "Feature" role to each entity detected in the baseline.
The code snippet below illustrates how you can explore the resulting entity collection and the summary of each entity.

# Collect the properties of every entity into a DataFrame for inspection
entities_summary = pd.DataFrame([entity.get_properties() for entity in entities_collection])
entities_summary.head()

Here is an example output of such a summary dataset:

[Image: example entities summary table]
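To get a feel for the kind of per-entity statistics a baseline summary is built from, here is a rough pandas-only approximation. It is a sketch, not the SDK's implementation: the function name `rough_baseline_summary`, the column names, and the sample data are all made up for illustration, and the real summary contains different and richer fields.

```python
import pandas as pd


def rough_baseline_summary(df):
    """Build one row of simple statistics per column of df (illustrative only)."""
    rows = []
    for name in df.columns:
        col = df[name]
        is_numeric = pd.api.types.is_numeric_dtype(col)
        rows.append({
            "name": name,
            "pandas_dtype": str(col.dtype),
            "missing_share": float(col.isna().mean()),
            "distinct_values": int(col.nunique(dropna=True)),
            # Range is only meaningful for numeric columns
            "min": col.min() if is_numeric else None,
            "max": col.max() if is_numeric else None,
        })
    return pd.DataFrame(rows)


# Hypothetical sample resembling a few baseline columns
sample = pd.DataFrame({
    "carat": [0.2, 0.5, None],
    "cut": ["Ideal", "Good", "Ideal"],
})
summary = rough_baseline_summary(sample)
print(summary)
```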

You can explore the inferred summary and, if needed, override any specific statistical element in the version baseline. For example, let's override the expected range for the entity named 'carat' to be between 0 and 3.

for entity in entities_collection:
    if entity.name == 'carat':
        entity.summary['range'] = {'from': 0, 'to': 3}
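Before overriding a range by hand, it can be useful to check how much of the baseline already falls outside the range you plan to set. The sketch below uses only pandas; the helper name `share_outside_range` and the sample values are hypothetical, and the range dictionary simply reuses the 'from'/'to' keys from the override above.

```python
import pandas as pd


def share_outside_range(values, expected_range):
    """Return the fraction of values outside [from, to].

    Missing values compare as False on both sides, so they count as inside.
    """
    lower, upper = expected_range["from"], expected_range["to"]
    outside = (values < lower) | (values > upper)
    return float(outside.mean())


# Hypothetical carat values; one of the five exceeds the upper bound of 3
carat = pd.Series([0.23, 0.31, 1.01, 2.5, 3.4])
outside_share = share_outside_range(carat, {"from": 0, "to": 3})
print(f"{outside_share:.0%} of baseline values fall outside the range")
```

A large share here suggests that the manual range is too narrow for the baseline data.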

To read more on how the Superwise SDK automatic inference works and how you can override each inferred element, take a look at our advanced configurations.

Advanced configuration

Override inferred data types

Superwise's SDK automatically infers the data type of each entity. To view the results and modify them if needed, use the dedicated infer_dtype API call.

Let's assume that a data entity ("carat" in the snippet below) was mistakenly identified as a numeric type instead of categorical. In this case, you can override the inferred type by updating the entities_dtypes variable. For example:

from superwise.controller.infer import infer_dtype

entities_dtypes = infer_dtype(baseline_data)
print(entities_dtypes)

entities_dtypes['carat'] = 'Categorical'

The next time you summarise, explicitly pass the entity types that you just modified:

entities_collection = sw.data_entity.summarise(
    data=baseline_data,
    entities_dtypes=entities_dtypes,
    specific_roles = {
      'id': DataEntityRole.ID,
      'ts': DataEntityRole.TIMESTAMP,
      'prediction': DataEntityRole.PREDICTION_VALUE,
      'price': DataEntityRole.LABEL
    }
)

new_version = Version(
    model_id=diamond_model.id,
    name="1.0.0",
    data_entities=entities_collection,
)

new_version = sw.version.create(new_version)
sw.version.activate(new_version.id)
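One way to spot columns that may deserve a manual type override is to look for numeric columns with only a handful of distinct values, since these are often codes rather than measurements. The following is a heuristic sketch with pandas only; the function name `low_cardinality_numeric_columns`, the `max_unique` threshold, and the sample columns are assumptions for illustration and are unrelated to how the SDK infers types.

```python
import pandas as pd


def low_cardinality_numeric_columns(df, max_unique=10):
    """List numeric columns with at most max_unique distinct non-null values."""
    numeric = df.select_dtypes(include="number")
    return [
        name for name in numeric.columns
        if numeric[name].nunique(dropna=True) <= max_unique
    ]


# Hypothetical data: store_code is numeric but behaves like a category
sample = pd.DataFrame({
    "store_code": [1, 2, 3] * 10,
    "carat": [i / 100 for i in range(30)],
    "cut": ["Ideal"] * 30,
})
candidates = low_cardinality_numeric_columns(sample, max_unique=5)
print(candidates)
```

Each candidate still needs a manual review before its type is changed to 'Categorical' in entities_dtypes.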

Override feature importance

The Superwise SDK infers the importance of each feature automatically. To achieve this, we build a proxy model and then use Shapley values to assess the attribution of each feature to the model output.
All feature importance values are then normalized to a scale of 0-100. The higher the importance, the greater the impact on the model output. To override the inferred feature importance with an explicit one, you can pass it as a parameter. For example:

entities_collection = sw.data_entity.summarise(
    data=baseline_data,
    entities_dtypes=entities_dtypes,
    specific_roles = {
      'id': DataEntityRole.ID,
      'ts': DataEntityRole.TIMESTAMP,
      'prediction': DataEntityRole.PREDICTION_VALUE,
      'price': DataEntityRole.LABEL
    },
    importance_mapping = {"carat": 20, "cut": 40, "color": 40}
)
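If you already have raw attribution scores from your own tooling, they need to be brought onto the 0-100 scale before being passed as an explicit mapping. The sketch below shows one plausible normalization, in which the absolute scores are rescaled to sum to 100. This is an assumption: the exact scheme used by the SDK is not described here, and the function name `normalize_importance` and the raw scores are hypothetical.

```python
def normalize_importance(raw_scores):
    """Rescale absolute raw scores so that they sum to 100.

    Assumption: "0-100 scale" is read as shares of the total attribution.
    """
    total = sum(abs(value) for value in raw_scores.values())
    if total == 0:
        return {name: 0.0 for name in raw_scores}
    return {name: 100 * abs(value) / total for name, value in raw_scores.items()}


# Hypothetical raw attribution scores, e.g. mean absolute Shapley values
raw = {"carat": 0.5, "cut": 1.0, "color": 1.0}
importance_mapping = normalize_importance(raw)
print(importance_mapping)
```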

To read more about the advanced options of the summarise API, please visit our SDK reference guide.

Uploading a new version based on the previous one

Usually, differences between versions of the same model are due to ad hoc schema changes. For example, you decided to add two new features to improve the model accuracy. But in some cases, a new version is a result of a retraining process on newer data to refit the model parameters under the same given schema.

Instead of redefining entities for the new version, you can leverage an existing version and reuse its entities. This simplifies version setup and, more importantly, lets you analyze and explore entity statistics holistically across versions over time. Continuity of entity definitions enables the platform and its monitoring engine to learn from an entity's history in a previous version and monitor the new version without any cold start.

In the example below, we first pull the definition of a previously created version named "1.0.0". Entities that appear in the baseline dataset and were already defined in version "1.0.0" automatically receive the same summary as in that version. Any new entity that appears only in the newer version gets a summary computed from the given baseline dataset.

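# Look up version "1.0.0" and keep the one that belongs to this model
prev_version = None
prev_versions = sw.version.get_by_name("1.0.0")
for version in prev_versions:
    if version.model_id == diamond_model.id:
        prev_version = version
if prev_version is None:
    raise ValueError('Version "1.0.0" was not found for this model')
print(prev_version.get_properties())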

entities_collection = sw.data_entity.summarise(
    data=baseline_data,
    entities_dtypes=entities_dtypes,
    specific_roles = {
      'id': DataEntityRole.ID,
      'ts': DataEntityRole.TIMESTAMP,
      'prediction': DataEntityRole.PREDICTION_VALUE,
      'price': DataEntityRole.LABEL
    },
    base_version=prev_version
)

new_version = Version(
    model_id=diamond_model.id,
    name="1.0.1",
    data_entities=entities_collection,
)

new_version = sw.version.create(new_version)
sw.version.activate(new_version.id)
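The reuse rule described above can be pictured with plain dictionaries: an entity that already exists in the previous version keeps its earlier summary, and an entity that is new gets the freshly computed one. This is a conceptual sketch only; `merge_entity_summaries` and the sample summaries are hypothetical, and the SDK performs this step internally when base_version is passed. Dropping entities that are absent from the new baseline is an assumption of the sketch.

```python
def merge_entity_summaries(previous, fresh):
    """Combine summaries for the entities present in the new baseline.

    previous: entity name -> summary from the earlier version
    fresh:    entity name -> summary computed from the new baseline
    Entities known from the earlier version keep their earlier summary.
    """
    return {name: previous.get(name, summary) for name, summary in fresh.items()}


# Hypothetical summaries: 'carat' existed before, 'clarity' is a new feature
previous_summaries = {"carat": {"range": {"from": 0, "to": 3}}}
fresh_summaries = {
    "carat": {"range": {"from": 0.2, "to": 5.0}},
    "clarity": {"categories": ["SI1", "VS2"]},
}
merged = merge_entity_summaries(previous_summaries, fresh_summaries)
print(merged)
```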

Validating the version baseline

After uploading a new version, go to the Superwise versions screen, where you'll see the newly created version. You can drill down further on the analytics page to view and analyze the version baseline statistics.