Versions

"Just keep swimming."
-Finding Nemo

Deploying a model to production is only the first step: models require iterative improvement and ongoing updates. Versions may differ through ad hoc schema changes, or through retraining on a new data set to refit the model hyperparameters under the same schema.

The Superwise platform supports model versions so you can compare and contrast model behavior across any set of versions. You can toggle between model versions, but the platform enforces exactly one version per model on any given date.


🚧

File size limitation

The baseline data file can be up to 100MB. If your file is larger, split it into smaller files.
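
One way to do the split, sketched under assumptions: the baseline data is CSV text with a header row, no quoted field contains a line break, and every part should repeat the header. The function name `split_csv_text` and the `max_bytes` parameter are hypothetical and not part of any Superwise API.

```python
MAX_BYTES = 100 * 1024 * 1024  # the 100MB limit stated above


def split_csv_text(text, max_bytes=MAX_BYTES):
    """Split CSV text into parts of at most max_bytes, repeating the header.

    Assumes rows never span several lines (no newlines inside quoted fields).
    A single row larger than the limit still gets a part of its own.
    """
    lines = text.splitlines(keepends=True)
    if not lines:
        return []
    header, rows = lines[0], lines[1:]
    header_size = len(header.encode("utf-8"))

    parts = []
    current, size = [header], header_size
    for row in rows:
        row_size = len(row.encode("utf-8"))
        # Close the current part when adding this row would exceed the limit.
        if size + row_size > max_bytes and len(current) > 1:
            parts.append("".join(current))
            current, size = [header], header_size
        current.append(row)
        size += row_size
    if len(current) > 1:
        parts.append("".join(current))
    return parts
```

Each returned string can then be written to its own file. For files too large to hold in memory, the same logic can be applied while reading the source line by line.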

Version schema

Because each model version can introduce new input formats, each version requires an explicit schema definition. A schema is a collection of data entities (a.k.a. columns) that are part of a specific version of the machine learning decision process. Each data entity has its own data type and a specific role in the ML process.

Entity role

To represent the different parts of the ML decision process, Superwise supports the following roles:

| Role | Description |
| --- | --- |
| ID | Unique identifier per row or prediction. Using the ID entity, one can send labels to be connected to the previously sent predictions and compute performance metrics. Each schema must include exactly one data entity with an 'ID' role. |
| Timestamp | Indicates when the prediction took place. The timestamp column enables us to present model metrics according to the actual prediction date and not based on the time it was sent to the platform. Each schema must include exactly one data entity with a 'Timestamp' role. |
| Feature | A data entity that is being used as input by the model. |
| Prediction probability | A probability value that was generated by the model that can be used for classification use cases. |
| Prediction value | A model output value. Typically found in binary classification models as a boolean data type while for regression models it will be a numeric data type. |
| Label | The actual ground-truth the model is trying to predict. The label should be the same data type as the prediction value. |
| Label weight | The label weight can be used when we want a label's prediction to represent more than one observation. It can be used for semi-supervised cases, where one label observation can represent more than one case. |
| Metadata | Available data that is not used directly as a feature by the model and therefore won’t be factored into model drift calculations but can be used for analytical purposes such as segmentation or dimension breakdowns. |
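
The two "exactly one" rules above can be checked locally before a schema is submitted. The following is a minimal sketch, assuming the schema is held as a plain Python dict that maps column names to role names. The dict layout, the lowercase role strings, and the `validate_schema` helper are illustrative assumptions and not the Superwise SDK's actual representation.

```python
from collections import Counter

# Role names taken from the table above; the lowercase spelling is an assumption.
ROLES = (
    "id", "timestamp", "feature", "prediction probability",
    "prediction value", "label", "label weight", "metadata",
)
REQUIRED_EXACTLY_ONCE = ("id", "timestamp")


def validate_schema(schema):
    """Return a list of problems found in a {column_name: role} mapping."""
    problems = []
    for name, role in schema.items():
        if role not in ROLES:
            problems.append(f"{name}: unknown role {role!r}")
    counts = Counter(schema.values())
    for role in REQUIRED_EXACTLY_ONCE:
        if counts[role] != 1:
            problems.append(
                f"expected exactly one {role!r} entity, found {counts[role]}"
            )
    return problems


# Hypothetical schema for a fraud model.
schema = {
    "transaction_id": "id",
    "prediction_ts": "timestamp",
    "amount": "feature",
    "country": "feature",
    "fraud_score": "prediction probability",
    "is_fraud": "prediction value",
    "actual_fraud": "label",
    "customer_segment": "metadata",
}
problems = validate_schema(schema)
```

An empty list means the schema satisfies both constraints. Any other result lists each violation as a readable message.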

Data type

Each data entity has its own data format. Based on the data type, the platform calculates the relevant metrics for each data entity to provide you with full observability. For example, entropy is computed automatically for categorical entities, while variance is computed for numeric entities (to view the full set of metrics according to each data type, see here).
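
To make the two example metrics concrete, here is a small sketch of Shannon entropy for a categorical column and variance for a numeric one. It illustrates the definitions only. The platform's exact formulas (for instance the logarithm base, or population versus sample variance) are not stated here, so base 2 and population variance are assumptions.

```python
import math
from collections import Counter
from statistics import pvariance


def entropy(values):
    """Shannon entropy in bits of the empirical distribution of values."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


colors = ["Red", "Blue", "Red", "Blue"]  # categorical entity
ages = [50, 18, 63, 7]                   # numeric entity

color_entropy = entropy(colors)   # two equally likely categories give 1 bit
age_variance = pvariance(ages)    # population variance (assumed choice)
```

Entropy rises as the categories become more evenly spread, and variance rises as numeric values spread further from their mean, which is why both are useful for tracking how an entity's distribution changes over time.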

Supported data formats:

| Data type | Possible values | Example |
| --- | --- | --- |
| Numeric | Float or int | Age: 50, 18, 63, 7, ...; Amount: 1011.0, 23.7, 674.3, ... |
| Categorical | Object or string | Color: Red, Blue, Yellow, ...; Country: England, Israel, USA, ... |
| Boolean | 0/1 or true/false | Fraud: true/false; Is_active: 0/1 |
| Timestamp | yyyy-mm-dd hh:mm:ss.SSS | Prediction_TS: '2021-12-04 18:27:20.213' |
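
The timestamp format above uses millisecond precision, while Python's `%f` directive emits microseconds. The sketch below shows one way to produce and parse such strings with the standard library. The helper names are hypothetical, and time zone handling is left out because the format shown carries no zone information.

```python
from datetime import datetime

TS_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def format_prediction_ts(dt):
    """Format a datetime as yyyy-mm-dd hh:mm:ss.SSS."""
    # %f yields six digits (microseconds); drop the last three to keep
    # milliseconds. This truncates rather than rounds.
    return dt.strftime(TS_FORMAT)[:-3]


def parse_prediction_ts(text):
    """Parse a yyyy-mm-dd hh:mm:ss.SSS string back into a datetime."""
    # When parsing, %f accepts one to six fractional digits.
    return datetime.strptime(text, TS_FORMAT)


ts = format_prediction_ts(datetime(2021, 12, 4, 18, 27, 20, 213000))
parsed = parse_prediction_ts("2021-12-04 18:27:20.213")
```

With these inputs, `ts` matches the `Prediction_TS` example in the table, and `parsed` holds the same moment as a `datetime` object.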