Distribution metrics
Statistical measures for any data entity throughout your ML process (inputs, metadata, outputs, and labels)
![Distribution.png 1800](https://files.readme.io/96e3451-Distribution.png)
Entity-level statistical metrics are grouped by the data type of the entity:
Metric | Description | Categorical | Numeric | Boolean |
---|---|---|---|---|
Distribution shift | | β | β | β |
Top frequent percents | Frequency of the mode value. | β | - | - |
Unique values | The number of unique values. | β | - | - |
Entropy | The entropy of the distribution of a categorical entity. | β | - | - |
Min value | The lowest value. | - | β | - |
Max value | The highest value. | - | β | - |
Sum value | The sum of values. | - | β | - |
Mean value | The average value. | - | β | - |
Standard Deviation | The amount of variation in a numeric entity. | - | β | - |
Proportion | Percent of positive values. | - | - | β |
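To make the table concrete, here is a minimal sketch of how these entity-level statistics could be computed with the Python standard library. The function names, the use of base-2 logarithms for entropy, and the choice of population standard deviation are assumptions made for illustration; they are not taken from Superwise and may differ from its implementation.

```python
import math
import statistics
from collections import Counter


def categorical_metrics(values):
    """Summary statistics for a categorical entity (hypothetical helper)."""
    counts = Counter(values)
    total = len(values)
    # Share of rows taken by the most frequent (mode) value, as a percentage.
    top_frequent_percent = 100.0 * counts.most_common(1)[0][1] / total
    # Shannon entropy in bits; the log base is an assumption.
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return {
        "top_frequent_percent": top_frequent_percent,
        "unique_values": len(counts),
        "entropy": entropy,
    }


def numeric_metrics(values):
    """Summary statistics for a numeric entity (hypothetical helper)."""
    return {
        "min": min(values),
        "max": max(values),
        "sum": sum(values),
        "mean": statistics.fmean(values),
        # Population standard deviation; sample deviation is an equally valid choice.
        "std": statistics.pstdev(values),
    }


def boolean_metrics(values):
    """Percent of rows whose value is True (hypothetical helper)."""
    return {"proportion": 100.0 * sum(bool(v) for v in values) / len(values)}
```

For example, `categorical_metrics(["a", "a", "b", "c"])` reports three unique values and a mode that covers half of the rows.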
Feature Importance
Superwise measures the effect a feature has on a model. It calculates feature importance from SHAP values during baseline creation, or you can set the importance manually while configuring the schema.
Distribution shift & drift metrics:
- Distribution shift: How different the distribution in the selected data is from the baseline over time. The scale ranges from 0 to 100, where 0 indicates an identical distribution and 100 indicates an orthogonal distribution.
- Input drift: The average of the distribution shifts across all features.
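The sketch below illustrates the two definitions above on categorical data. The page does not state which distance Superwise uses, so the code substitutes total variation distance scaled to 0-100 as a stand-in: it returns 0 for identical distributions and 100 when the baseline and the selected data share no values. The function names and the dictionary-of-lists input format are hypothetical.

```python
from collections import Counter


def distribution_shift(baseline, current):
    """Illustrative 0-100 shift score: total variation distance times 100.

    Assumption: this is a stand-in metric, not the formula Superwise uses.
    """
    base_counts, cur_counts = Counter(baseline), Counter(current)
    n_base, n_cur = len(baseline), len(current)
    categories = set(base_counts) | set(cur_counts)
    # Half the sum of absolute differences between the relative frequencies.
    tvd = 0.5 * sum(
        abs(base_counts[c] / n_base - cur_counts[c] / n_cur) for c in categories
    )
    return 100.0 * tvd


def input_drift(baseline_features, current_features):
    """Average the per-feature shift scores across all features."""
    shifts = [
        distribution_shift(baseline_features[name], current_features[name])
        for name in baseline_features
    ]
    return sum(shifts) / len(shifts)
```

With one unchanged feature and one feature whose values are entirely new, `input_drift` returns 50. Numeric features would first need to be binned into categories before this particular sketch applies.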