A dataset consists of a collection of data entities (columns), each of which has a name, a type, and a role.
For example, "Country" is a categorical (data type) dataset feature (role).
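
A minimal sketch of how a schema entry could be modeled in Python. The names `DataEntity`, `DataType`, and `Role` are hypothetical illustrations, not part of the Superwise SDK; the enum members simply mirror the roles and data types described in this section.

```python
from dataclasses import dataclass
from enum import Enum


# Hypothetical types for illustration only; not the Superwise SDK.
class DataType(Enum):
    """Data entity types described in this section."""
    NUMERIC = "Numeric"
    CATEGORICAL = "Categorical"
    BOOLEAN = "Boolean"
    TIMESTAMP = "Timestamp"


class Role(Enum):
    """Data entity roles described in this section."""
    ID = "ID"
    TIMESTAMP = "Timestamp"
    FEATURE = "Feature"
    PREDICTION_PROBABILITY = "Prediction probability"
    PREDICTION_VALUE = "Prediction value"
    LABEL = "Label"
    LABEL_WEIGHT = "Label weight"
    METADATA = "Metadata"


@dataclass(frozen=True)
class DataEntity:
    """One dataset column: a name, a data type, and a role."""
    name: str
    dtype: DataType
    role: Role


# The example from the text: "Country" is a categorical feature.
country = DataEntity(name="Country", dtype=DataType.CATEGORICAL, role=Role.FEATURE)
print(country.dtype.value, country.role.value)  # Categorical Feature
```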

Entity role

To capture the different pieces of the ML decision process, Superwise supports the following roles:

ID: A unique identifier per row or prediction. Using the ID entity, you can later send labels, connect them to the previously sent predictions, and compute performance metrics. Each schema must include exactly one data entity with the 'ID' role.
Note: Both string and int32 data types are supported for the unique ID. However, we recommend using a string UUID to support high volumes of data (an int32 ID tops out at 2,147,483,647).
Timestamp: Indicates when the prediction took place. The timestamp column lets us present model metrics according to the actual prediction date rather than the time the data was sent to the platform. Each schema must include exactly one data entity with the 'Timestamp' role.
Feature: A data entity that the model uses as input.
Prediction probability: A probability value generated by the model, typically used in classification use cases.
Prediction value: The model's output value. In binary classification models it is typically a boolean, while in regression models it is numeric.
Label: The actual ground truth the model is trying to predict. The label should have the same data type as the prediction value.
Label weight: A weight that lets a single label represent more than one observation. This is useful, for example, in semi-supervised cases, where one labeled observation can stand for several cases.
Metadata: Available data that the model does not use directly as a feature. It is therefore not factored into model drift calculations, but it can be used for analytical purposes such as segmentation or dimension breakdowns.
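
The two "exactly one" rules for the ID and Timestamp roles can be checked before any data is sent. This is a sketch that assumes a schema is given as a plain dict from column name to role name; `validate_schema` and `new_prediction_id` are hypothetical helpers, not Superwise's own validation logic.

```python
import uuid
from collections import Counter

# Roles that must appear exactly once in every schema, per the rules above.
REQUIRED_UNIQUE_ROLES = ("ID", "Timestamp")


def validate_schema(roles_by_column: dict[str, str]) -> list[str]:
    """Return a list of problems; an empty list means the schema passes.

    roles_by_column maps each column name to its role name,
    e.g. {"prediction_id": "ID", "Country": "Feature"}.
    """
    counts = Counter(roles_by_column.values())
    problems = []
    for role in REQUIRED_UNIQUE_ROLES:
        if counts[role] != 1:
            problems.append(f"expected exactly one '{role}' entity, found {counts[role]}")
    return problems


def new_prediction_id() -> str:
    """Generate a random string UUID, the recommended ID format."""
    return str(uuid.uuid4())


schema = {
    "prediction_id": "ID",
    "Prediction_TS": "Timestamp",
    "Country": "Feature",
    "Fraud": "Prediction value",
}
print(validate_schema(schema))  # []
print(validate_schema({"Country": "Feature", "Prediction_TS": "Timestamp"}))
# ["expected exactly one 'ID' entity, found 0"]
print(new_prediction_id())  # a random 36-character UUID string
```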
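
The ID role is what allows labels that arrive later to be matched with earlier predictions, and a label weight lets one label count as several observations. The sketch below assumes in-memory dicts keyed by ID and uses weighted accuracy as an example metric; the `weighted_accuracy` helper is hypothetical, and the platform's actual metric computation may differ.

```python
def weighted_accuracy(predictions, labels):
    """Join labels to predictions on their ID and compute weighted accuracy.

    predictions: dict mapping ID -> predicted value
    labels: dict mapping ID -> (label value, label weight)
    Predictions without a label yet are skipped, since they cannot be scored.
    Returns NaN when no prediction has a label.
    """
    correct = 0.0
    total = 0.0
    for pred_id, predicted in predictions.items():
        if pred_id not in labels:
            continue  # label not received yet
        actual, weight = labels[pred_id]
        total += weight
        if predicted == actual:
            correct += weight
    return correct / total if total else float("nan")


predictions = {"a1": True, "b2": False, "c3": True}
labels = {"a1": (True, 3.0), "b2": (True, 1.0)}  # "c3" is still unlabeled
print(weighted_accuracy(predictions, labels))  # 0.75
```

Here the correct prediction "a1" carries a weight of 3 and the wrong prediction "b2" a weight of 1, so the result is 3 / 4 rather than the unweighted 1 / 2.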

Data entity type

Each data entity has its own data format. Based on the data type, our platform calculates the relevant metrics for each data entity to give you full observability. For example, entropy is computed automatically for categorical entities, while variance is computed for numeric entities (to view the full set of metrics for each data type, see here).
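
To make the entropy and variance examples concrete, here is a sketch of picking a metric by data type. It assumes Shannon entropy in bits and population variance; the platform's exact definitions (log base, sample vs. population variance) are not stated here and may differ, and `column_metric` is a hypothetical helper.

```python
import math
from collections import Counter
from statistics import pvariance


def categorical_entropy(values):
    """Shannon entropy (base 2) of the category distribution."""
    counts = Counter(values)
    n = len(values)
    # Sum of p * log2(1/p) over the observed categories.
    return sum((c / n) * math.log2(n / c) for c in counts.values())


def numeric_variance(values):
    """Population variance of a numeric column."""
    return pvariance(values)


def column_metric(values, dtype):
    """Pick the example metric that matches the entity's data type."""
    if dtype == "Categorical":
        return categorical_entropy(values)
    if dtype == "Numeric":
        return numeric_variance(values)
    raise ValueError(f"no example metric for data type {dtype!r}")


print(column_metric(["Red", "Blue", "Red", "Blue"], "Categorical"))  # 1.0
print(column_metric([50, 18, 63, 7], "Numeric"))  # 520.25
```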

Supported data formats:

Data type    Possible values            Examples
Numeric      Float or int               Age: 50, 18, 63, 7, ...
                                        Amount: 1011.0, 23.7, 674.3, ...
Categorical  Object or string           Color: Red, Blue, Yellow, ...
                                        Country: England, Israel, USA, ...
Boolean      0/1 or true/false          Fraud: true/false
                                        Is_active: 0/1
Timestamp    yyyy-mm-dd hh:mm:ss.SSS    Prediction_TS: '2021-12-04 18:27:20.213'

Note: Categorical entities with more than 200 categories (e.g., first name, street name, IP address) are treated as "Sparse" features, and no metric other than "Missing values" is calculated for them.
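
A sketch of handling some of these formats in Python. It assumes a hypothetical `is_sparse` check that applies the 200-category threshold from the note, a hypothetical `to_bool` helper for the two boolean encodings, and the `datetime.strptime` pattern `%Y-%m-%d %H:%M:%S.%f` as the equivalent of `yyyy-mm-dd hh:mm:ss.SSS`. None of these helpers come from Superwise.

```python
from datetime import datetime

# From the note: "more than 200 categories" means the entity is sparse.
SPARSE_THRESHOLD = 200

# Assumed Python equivalent of the documented yyyy-mm-dd hh:mm:ss.SSS format;
# %f accepts the three-digit millisecond part.
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S.%f"

# The two documented boolean encodings, normalized to lowercase strings.
_BOOL_VALUES = {"1": True, "true": True, "0": False, "false": False}


def is_sparse(values):
    """Return True if a categorical column has more than 200 distinct categories."""
    return len(set(values)) > SPARSE_THRESHOLD


def to_bool(value):
    """Map 0/1 or true/false (str, int, or bool) to a Python bool.

    Any other value raises KeyError.
    """
    return _BOOL_VALUES[str(value).strip().lower()]


def parse_prediction_ts(text):
    """Parse a timestamp string written in the documented format."""
    return datetime.strptime(text, TIMESTAMP_FORMAT)


colors = ["Red", "Blue", "Yellow"]
first_names = [f"name_{i}" for i in range(500)]
print(is_sparse(colors), is_sparse(first_names))  # False True
print(to_bool("true"), to_bool(0))  # True False

ts = parse_prediction_ts("2021-12-04 18:27:20.213")
print(ts.isoformat(timespec="milliseconds"))  # 2021-12-04T18:27:20.213
```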