Datasets

A dataset is a collection of data entities that define the schema of a model version. Datasets allow you to compare your production data with different collections of data.

Dataset as a version baseline

A dataset acts as a baseline (a historical reference) that describes a model's behavior.
It contains the model's inputs, outputs, and, optionally, labels. Most often, the training or validation dataset is used as the initial baseline.
Superwise provides out-of-the-box metrics that measure issues such as drift and distribution shift.
These metrics are based on discrepancies between a model's production data and its baseline.
The Superwise platform leverages the baseline dataset to provide the following capabilities:

Automated schema discovery - Using the provided baseline, the Superwise SDK automatically defines the version schema.
Automated definition of model baseline - Superwise will automatically summarize the statistical properties and distribution of your baseline dataset, so you can easily compare it with your ongoing production status.
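
To make the two capabilities above concrete, here is a minimal sketch of what inferring a schema from a baseline and summarizing its distributions could look like. It is not the Superwise SDK and does not reproduce how Superwise does this internally. The function names `infer_schema` and `summarize_baseline`, the type labels, and the sample rows are all hypothetical, and it uses only the Python standard library.

```python
from collections import Counter
from statistics import mean, pstdev


def infer_schema(rows):
    """Infer a coarse type per entity from the non-missing baseline values."""
    schema = {}
    names = {name for row in rows for name in row}
    for name in sorted(names):
        values = [row[name] for row in rows if row.get(name) is not None]
        if values and all(isinstance(v, bool) for v in values):
            schema[name] = "boolean"
        elif values and all(
            isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
        ):
            schema[name] = "numeric"
        else:
            schema[name] = "categorical"
    return schema


def summarize_baseline(rows, schema):
    """Summarize each entity: basic statistics for numeric entities,
    category proportions for everything else."""
    summary = {}
    for name, kind in schema.items():
        values = [row[name] for row in rows if row.get(name) is not None]
        if kind == "numeric":
            summary[name] = {
                "mean": mean(values),
                "std": pstdev(values),
                "min": min(values),
                "max": max(values),
            }
        else:
            counts = Counter(values)
            total = sum(counts.values())
            summary[name] = {value: count / total for value, count in counts.items()}
    return summary


# Hypothetical baseline: two inputs ("age", "plan") and one optional label ("churned").
baseline = [
    {"age": 34, "plan": "basic", "churned": False},
    {"age": 51, "plan": "premium", "churned": True},
    {"age": 28, "plan": "basic", "churned": False},
    {"age": 45, "plan": "basic", "churned": None},
]

schema = infer_schema(baseline)
summary = summarize_baseline(baseline, schema)
```

The summary is computed once from the baseline and can then be reused whenever production data needs to be compared against it.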

Match entities

The same data entities can appear in multiple models and versions across a project. Moreover, each data entity can have its own distribution within different datasets. To analyze and monitor new production data more efficiently across the project, we use the match entities capability to match those entities that appear in different models across the same project.

The match entities capability uses distribution attributes that have already been calculated, which allows us to compare new data to existing data from different datasets on a like-for-like basis (i.e., apples to apples).
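
The idea of reusing already calculated distributions can be sketched as follows. This is an illustration only. It assumes entities are matched by name, which may not be how Superwise matches them. It also uses the Population Stability Index (PSI) as a stand-in comparison metric, not necessarily one that Superwise computes. The `known_distributions` store and all function names are hypothetical.

```python
import math
from collections import Counter

# Hypothetical store of distributions already calculated for entities in the
# project, keyed by entity name (category -> proportion).
known_distributions = {
    "plan": {"basic": 0.75, "premium": 0.25},
    "region": {"eu": 0.5, "us": 0.5},
}


def match_entities(new_entity_names, known):
    """Return the new entities that already exist in the project.
    Assumption: entities are matched by exact name."""
    return sorted(set(new_entity_names) & set(known))


def proportions(values):
    """Turn raw categorical values into category proportions."""
    counts = Counter(values)
    return {value: count / len(values) for value, count in counts.items()}


def psi(expected, observed, eps=1e-6):
    """Population Stability Index between two categorical distributions.
    Proportions are floored at eps to avoid taking the log of zero."""
    total = 0.0
    for category in set(expected) | set(observed):
        e = max(expected.get(category, 0.0), eps)
        o = max(observed.get(category, 0.0), eps)
        total += (o - e) * math.log(o / e)
    return total


# Hypothetical new production data: "plan" is already known, "tenure" is not.
new_data = {
    "plan": ["basic", "basic", "premium", "premium"],
    "tenure": [3, 12, 7, 24],
}

matched = match_entities(new_data, known_distributions)
scores = {
    name: psi(known_distributions[name], proportions(new_data[name]))
    for name in matched
}
```

Only the matched entity (`plan`) is scored here, and the stored distribution is reused without recomputing anything from the original dataset.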

By connecting new data entities to existing ones within a project, the match entities capability enables you to create segments and policies that are shared across models.