A dataset is a collection of data entities (columns), each of which has a name, a data type, and a role.
For example, "Country" is a categorical (data type) feature (role).

Entity role

To cover the different pieces of data involved in the ML decision process, Superwise supports the following roles:

Role

Description

ID

Unique identifier per row or prediction. The ID entity lets you send labels later and have them matched to the previously sent predictions, so that performance metrics can be computed.
Each schema must include exactly one data entity with an 'ID' role.

Note: Both string and int32 data types are supported for the unique ID.
However, we recommend using a string UUID to support high volumes of data.

Timestamp

Indicates when the prediction took place. The timestamp column lets the platform present model metrics according to the actual prediction date rather than the time the data was sent to the platform.
Each schema must include exactly one data entity with a 'Timestamp' role.

Feature

A data entity that the model uses as input.

Prediction probability

A probability value generated by the model, used in classification use cases.

Prediction value

A model output value. In binary classification models it is typically a boolean, while in regression models it is numeric.

Label

The actual ground truth that the model is trying to predict. The label should have the same data type as the prediction value.

Label weight

Use a label weight when a single label should count as more than one observation, for example in semi-supervised cases where one labeled observation stands in for several cases.

Metadata

Available data that the model does not use directly as a feature. It is therefore not factored into model drift calculations, but it can be used for analytical purposes such as segmentation or dimension breakdowns.
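The role rules above can be sketched as a small check, mainly to make the "exactly one ID and exactly one Timestamp" constraint concrete. This is an illustrative sketch and not the Superwise SDK: the `DataEntity` class, the `validate_schema` function, the lowercase role identifiers, and the example column names are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass

# Role names follow the table above; the lowercase identifiers are assumptions.
ROLES = {
    "id", "timestamp", "feature", "prediction_probability",
    "prediction_value", "label", "label_weight", "metadata",
}
# Roles that must appear exactly once in every schema.
REQUIRED_ONCE = ("id", "timestamp")


@dataclass(frozen=True)
class DataEntity:
    """Hypothetical container for one column: name, data type, and role."""
    name: str
    data_type: str
    role: str


def validate_schema(entities):
    """Return a list of problems; an empty list means the schema passes."""
    problems = []
    for entity in entities:
        if entity.role not in ROLES:
            problems.append(f"{entity.name}: unknown role {entity.role!r}")
    counts = Counter(entity.role for entity in entities)
    for role in REQUIRED_ONCE:
        if counts[role] != 1:
            problems.append(
                f"expected exactly one {role!r} entity, found {counts[role]}"
            )
    return problems


# Example schema with hypothetical column names.
schema = [
    DataEntity("prediction_id", "String", "id"),  # string UUID, as recommended
    DataEntity("prediction_ts", "Timestamp", "timestamp"),
    DataEntity("country", "Categorical", "feature"),
    DataEntity("age", "Numeric", "feature"),
    DataEntity("fraud_score", "Numeric", "prediction_probability"),
    DataEntity("is_fraud", "Boolean", "prediction_value"),
    DataEntity("fraud_label", "Boolean", "label"),
    DataEntity("sales_channel", "Categorical", "metadata"),
]

print(validate_schema(schema))
```

Returning a list of problems instead of raising on the first one makes it easier to report every schema issue in a single pass.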

Data entity type

Each data entity has its own data type. Based on that type, the platform calculates the relevant metrics for each entity to provide full observability. For example, entropy is computed automatically for categorical entities, while variance is computed for numeric entities (to view the full set of metrics for each data type, see here).
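To make the two metrics named above concrete, here is a minimal sketch of entropy for a categorical column and variance for a numeric column. The exact definitions the platform uses (logarithm base, population versus sample variance) are not stated here, so base-2 Shannon entropy and population variance are assumptions, and `shannon_entropy` is a hypothetical helper.

```python
import math
from collections import Counter
from statistics import pvariance


def shannon_entropy(values):
    """Shannon entropy, in bits, of the observed category frequencies."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Toy columns for illustration only.
colors = ["Red", "Blue", "Red", "Yellow"]  # categorical entity
ages = [50, 18, 63, 7]                     # numeric entity

print("entropy of colors:", shannon_entropy(colors))
print("variance of ages:", pvariance(ages))
```

Entropy is higher when categories are spread evenly and drops toward zero when one category dominates, which is why it is a natural summary for a categorical entity.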

Supported data formats:

Data type

Possible values

Example

Numeric

Float or int

Age: 50, 18, 63, 7, ...
Amount: 1011.0, 23.7, 674.3, ...

Categorical

Object or string

Color: Red, Blue, Yellow, ...
Country: England, Israel, USA, ...

Note: Categorical entities with more than 200 categories (e.g., first name, street name, IP address) are treated as "Sparse" features, and no metric other than "Missing values" is calculated for them.

Boolean

0/1 or true/false

Fraud: true/false
Is_active: 0/1

Timestamp

yyyy-mm-dd hh:mm:ss.SSS

Prediction_TS: '2021-12-04 18:27:20.213'
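Two details from the table above are easy to get wrong when preparing data: the 200-category threshold for "Sparse" features and the millisecond timestamp format. The sketch below shows one way to check both before sending data. The helper names are hypothetical, the `strptime` pattern is an assumed equivalent of `yyyy-mm-dd hh:mm:ss.SSS`, and ignoring missing values when counting categories is also an assumption.

```python
from datetime import datetime

# From the note above: more than 200 categories counts as "Sparse".
SPARSE_THRESHOLD = 200
# Assumed Python equivalent of yyyy-mm-dd hh:mm:ss.SSS.
TS_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def is_sparse(values):
    """True if a categorical column has more than 200 distinct categories."""
    distinct = {v for v in values if v is not None}
    return len(distinct) > SPARSE_THRESHOLD


def format_timestamp(dt):
    """Render a datetime with millisecond precision."""
    # %f yields six digits (microseconds); drop the last three.
    return dt.strftime(TS_FORMAT)[:-3]


def parse_timestamp(text):
    """Parse a timestamp string such as '2021-12-04 18:27:20.213'."""
    return datetime.strptime(text, TS_FORMAT)


street_names = [f"street_{i}" for i in range(500)]
print(is_sparse(street_names))
print(format_timestamp(parse_timestamp("2021-12-04 18:27:20.213")))
```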

