Dataset analytics

Superwise allows you to view statistics and analytics on your data (inference or datasets).
At the top of the screen, you can select the model or dataset whose analytics you want to see. You can also compare these analytics with those of another model or dataset, as described under Comparison mode at the end of this section.

Query builder

The first thing you need to select is the "Source" filter. Choose whether you want to view analytics for a specific dataset or the model's production data.

Dataset source - Lets you select a specific dataset from the project's list of datasets.

Production source - Refers to logged inference data. To tell Superwise which entities and data to present, choose the following additional parameters:
- Model - The model whose production data will be used.
- Version - The model version whose data should be used. The default value is all versions.
- Date - The start date from which the data should be fetched. By default, data from the last 30 days is used.
- Segment - If segments were defined for the data, you can choose specific ones to use. By default, the entire dataset is shown.
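
The Production source parameters and their defaults can be summarized in a small sketch. This is only an illustration of the filters described above, not part of any Superwise SDK: the class name `ProductionQuery`, its field names, and the use of `None` to stand for a default are all assumptions made for this example.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional


@dataclass
class ProductionQuery:
    """Hypothetical container mirroring the Production source filters."""

    model: str                            # model whose production data is used
    version: Optional[str] = None         # None stands for "all versions" (the default)
    start_date: Optional[date] = None     # None stands for "last 30 days" (the default)
    segments: Optional[List[str]] = None  # None stands for the entire dataset (the default)

    def resolved_start_date(self, today: date) -> date:
        """Return the explicit start date, or 30 days before `today`."""
        if self.start_date is not None:
            return self.start_date
        return today - timedelta(days=30)


# Only the model is chosen, so every other filter falls back to its default.
query = ProductionQuery(model="churn-model")
default_start = query.resolved_start_date(today=date(2024, 3, 31))
```

With only a model selected, the sketch covers all versions, the whole dataset, and the 30 days before the given date.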

Entities table

The entities table lists each entity's name, role, type, expected values, importance (features only), and distribution. The data presented for each entity includes:

Role - Determined according to the schema of the model's version/dataset. Read more about roles here.
Type - Determined according to the schema of the model's version/dataset. Read more about datatypes here.
Expected values - Superwise infers these automatically from the dataset values.
Importance - Superwise measures how much each feature impacts the model's predictions globally. Feature importance is calculated using SHAP values while the dataset is analyzed. Each feature in the data receives an importance score between 1 and 100, and the scores of all features sum to 100.
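
To illustrate how per-sample SHAP values can be turned into global scores that sum to 100, here is a minimal sketch. It assumes a common convention (the mean absolute SHAP value per feature, normalized to a total of 100). The exact aggregation, rounding, and lower bound that Superwise applies are not described here, so treat the function `importance_scores` and the sample numbers as hypothetical.

```python
def importance_scores(shap_values, feature_names):
    """Aggregate per-sample SHAP values into scores that sum to 100.

    shap_values: list of rows, one per sample; each row holds one SHAP
    value per feature, in the same order as feature_names.
    """
    n_samples = len(shap_values)
    # Global impact of a feature: average magnitude of its SHAP values.
    mean_abs = [
        sum(abs(row[j]) for row in shap_values) / n_samples
        for j in range(len(feature_names))
    ]
    total = sum(mean_abs)
    if total == 0:
        # No feature has any measured impact, so nothing can be normalized.
        return {name: 0.0 for name in feature_names}
    return {name: 100.0 * value / total for name, value in zip(feature_names, mean_abs)}


# Made-up SHAP values for two samples and three features.
example_shap = [
    [0.2, -0.1, 0.0],
    [-0.4, 0.1, 0.2],
]
scores = importance_scores(example_shap, ["age", "income", "gender"])
# scores is approximately {"age": 60, "income": 20, "gender": 20}
```

Using absolute values keeps positive and negative contributions from canceling each other out, which is why "age" dominates in this made-up example.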

You can click on the entity type filter to see more analytics for the entities based on their type:

Categorical entities - Expected values, Missing values, Unique values, Top frequent, Importance, Distribution
Numeric entities - Expected values, Min, Max, Mean, Missing values, Importance, Distribution
Boolean entities - Expected values, Missing values, Importance, Distribution
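
The type-specific statistics listed above can be approximated with a few lines of standard-library Python. This is a sketch of what such summaries typically contain, not Superwise's implementation; the function names, the use of `None` to mark a missing value, and the `top_n` parameter are assumptions for illustration.

```python
from collections import Counter
from statistics import mean


def categorical_stats(values, top_n=3):
    """Summarize a categorical column; None marks a missing value."""
    present = [v for v in values if v is not None]
    counts = Counter(present)
    return {
        "missing_values": len(values) - len(present),
        "unique_values": len(counts),
        "top_frequent": counts.most_common(top_n),
    }


def numeric_stats(values):
    """Summarize a numeric column; None marks a missing value."""
    present = [v for v in values if v is not None]
    return {
        "min": min(present) if present else None,
        "max": max(present) if present else None,
        "mean": mean(present) if present else None,
        "missing_values": len(values) - len(present),
    }


# Made-up columns for illustration.
city_stats = categorical_stats(["NY", "LA", "NY", None])
age_stats = numeric_stats([1, 3, None, 5])
```

A boolean entity can be summarized with `categorical_stats` as well, since its values are simply a two-category column.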

Comparison mode

You can click the Compare button to compare the analytics for two models, two versions, two datasets, or a dataset to production data. In comparison mode, a new column called "Distribution distance" is added.
This column shows how far each entity's distribution has drifted between the two timeframes or data sources being compared.
The following example compares two versions of the same model. You can see the change in the distribution of the "gender" feature between these two versions.
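
Separately from the product example, the idea behind a distribution distance can be sketched in code. The metric Superwise uses for the "Distribution distance" column is not specified in this section, so the sketch below substitutes total variation distance, a simple measure that ranges from 0 (identical distributions) to 1 (no overlap). The function name and the counts are made up for illustration.

```python
def total_variation_distance(counts_a, counts_b):
    """Distance between two categorical distributions given as value counts."""
    total_a = sum(counts_a.values())
    total_b = sum(counts_b.values())
    # Consider every category seen in either distribution.
    categories = set(counts_a) | set(counts_b)
    return 0.5 * sum(
        abs(counts_a.get(c, 0) / total_a - counts_b.get(c, 0) / total_b)
        for c in categories
    )


# Hypothetical counts of a "gender" feature in two model versions.
version_1 = {"female": 50, "male": 50}
version_2 = {"female": 80, "male": 20}
distance = total_variation_distance(version_1, version_2)
# distance is approximately 0.3
```

A value near 0 suggests the entity is distributed similarly in both sources, while a larger value points to drift worth investigating.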