Versions
"Just keep swimming."
-Finding Nemo
Deploying a model to production is only the first step: models require iterative improvement and ongoing updates. Differences between versions may be ad hoc schema changes, or retraining on a new dataset under the same schema.
The Superwise platform supports model versions, so you can compare and contrast model behavior across any set of versions. You can toggle between model versions, but only one version per model can be active on any given date.
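To make the one-version-per-date rule concrete, here is a minimal sketch of a version timeline. It is a hypothetical illustration, not the Superwise API: the `VersionTimeline` class and its `activate`/`active_on` methods are assumptions. Each activation means "from this date on, this version is active", so every date resolves to exactly one version, and activating a second version on the same date replaces the first.

```python
from __future__ import annotations

from bisect import bisect_right
from datetime import date
from typing import List, Optional


class VersionTimeline:
    """Hypothetical model of the "one version per model per date" rule."""

    def __init__(self) -> None:
        self._starts: List[date] = []    # activation dates, kept sorted
        self._versions: List[str] = []   # version names, parallel to _starts

    def activate(self, version: str, start: date) -> None:
        """Make `version` active from `start` until the next activation."""
        i = bisect_right(self._starts, start)
        if i > 0 and self._starts[i - 1] == start:
            # A version already starts on this date: replace it rather than
            # letting two versions share the same date
            self._versions[i - 1] = version
        else:
            self._starts.insert(i, start)
            self._versions.insert(i, version)

    def active_on(self, day: date) -> Optional[str]:
        """Return the version active on `day`, or None before the first activation."""
        i = bisect_right(self._starts, day)
        return self._versions[i - 1] if i > 0 else None


timeline = VersionTimeline()
timeline.activate("v1", date(2021, 1, 1))
timeline.activate("v2", date(2021, 3, 1))
print(timeline.active_on(date(2021, 2, 15)))  # v1
```

In this sketch, toggling back to an earlier version is just another `activate` call with a later start date.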
File size limitation
The baseline data file must not exceed 100MB. If your file is larger, split it into several smaller files.
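If the baseline is a CSV, one way to split it is sketched below using only the Python standard library. The CSV format, the `split_csv` helper, the `<prefix>_<n>.csv` naming, and reading "100MB" as 100,000,000 bytes (the conservative interpretation) are all assumptions; check the platform's upload requirements before relying on them. Each part repeats the header row so it can be read on its own.

```python
import csv
import io
from typing import List

# Conservative cap: "100MB" read as 100,000,000 bytes (assumption)
MAX_BYTES = 100_000_000


def _encode_row(row: List[str]) -> bytes:
    """Serialize one CSV row exactly as csv.writer would write it."""
    buf = io.StringIO()
    csv.writer(buf).writerow(row)
    return buf.getvalue().encode("utf-8")


def split_csv(path: str, out_prefix: str, max_bytes: int = MAX_BYTES) -> List[str]:
    """Split a CSV into part files of at most max_bytes each, repeating the header.

    A single row larger than max_bytes still gets its own (oversized) part.
    """
    parts: List[str] = []
    with open(path, newline="", encoding="utf-8") as src:
        reader = csv.reader(src)
        header = _encode_row(next(reader))
        out = None
        size = 0
        for row in reader:
            encoded = _encode_row(row)
            if out is None or size + len(encoded) > max_bytes:
                # Current part is full (or none is open yet): start a new one
                if out is not None:
                    out.close()
                part_path = f"{out_prefix}_{len(parts) + 1}.csv"
                out = open(part_path, "wb")
                out.write(header)
                size = len(header)
                parts.append(part_path)
            out.write(encoded)
            size += len(encoded)
        if out is not None:
            out.close()
    return parts
```

Usage: `split_csv("baseline.csv", "baseline_part")` returns the list of part file paths. Rows are re-serialized with `csv.writer`, so quoting may differ from the source file while the values stay the same.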
Version schema
Because each model version can introduce new input formats, each version requires an explicit schema definition. A schema is a collection of data entities (a.k.a. columns) that take part in that version's machine learning decision process. Each data entity has its own data type and a specific role in the ML process.
Entity role
To capture the different parts of the ML decision process, Superwise supports the following roles:
Role | Description |
---|---|
ID | Unique identifier per row or prediction. Using the ID entity, one can send labels to be connected to the previously sent predictions and compute performance metrics. Each schema must include exactly one data entity with an 'ID' role. |
Timestamp | Indicates when the prediction took place. The timestamp column enables us to present model metrics according to the actual prediction date and not based on the time it was sent to the platform. Each schema must include exactly one data entity with a 'Timestamp' role. |
Feature | A data entity that is being used as input by the model. |
Prediction probability | A probability score generated by the model, typically used in classification use cases. |
Prediction value | A model output value. Typically a boolean for binary classification models and a numeric value for regression models. |
Label | The actual ground-truth the model is trying to predict. The label should be the same data type as the prediction value. |
Label weight | Used when a label (and its prediction) should represent more than one observation, for example in semi-supervised cases where one labeled observation stands for several cases. |
Metadata | Available data that is not used directly as a feature by the model and therefore won't be factored into model drift calculations, but can be used for analytical purposes such as segmentation or dimension breakdowns. |
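The role rules in the table above can be checked before a schema is registered. The sketch below is hypothetical and not the Superwise SDK: the `Role` enum, the `{column: (data_type, role)}` shape, the lowercase data type names, and the example column names are illustrative assumptions. It enforces exactly one ID entity, exactly one Timestamp entity, and a label whose data type matches the prediction value.

```python
from collections import Counter
from enum import Enum
from typing import Dict, List, Tuple


class Role(Enum):
    ID = "ID"
    TIMESTAMP = "Timestamp"
    FEATURE = "Feature"
    PREDICTION_PROBABILITY = "Prediction probability"
    PREDICTION_VALUE = "Prediction value"
    LABEL = "Label"
    LABEL_WEIGHT = "Label weight"
    METADATA = "Metadata"


# Hypothetical schema shape: column name -> (data type, role)
Schema = Dict[str, Tuple[str, Role]]


def validate_schema(schema: Schema) -> List[str]:
    """Return human-readable violations of the role rules."""
    errors: List[str] = []
    counts = Counter(role for _, role in schema.values())
    for required in (Role.ID, Role.TIMESTAMP):
        if counts[required] != 1:
            errors.append(
                f"expected exactly one '{required.value}' entity, found {counts[required]}"
            )
    # The label should have the same data type as the prediction value
    prediction_types = {t for t, r in schema.values() if r is Role.PREDICTION_VALUE}
    label_types = {t for t, r in schema.values() if r is Role.LABEL}
    if prediction_types and label_types and prediction_types != label_types:
        errors.append(
            f"label type(s) {sorted(label_types)} differ from prediction value type(s) "
            f"{sorted(prediction_types)}"
        )
    return errors


example_schema: Schema = {
    "transaction_id": ("categorical", Role.ID),
    "prediction_ts": ("timestamp", Role.TIMESTAMP),
    "age": ("numeric", Role.FEATURE),
    "country": ("categorical", Role.FEATURE),
    "fraud_probability": ("numeric", Role.PREDICTION_PROBABILITY),
    "fraud_prediction": ("boolean", Role.PREDICTION_VALUE),
    "fraud_label": ("boolean", Role.LABEL),
    "channel": ("categorical", Role.METADATA),
}

print(validate_schema(example_schema))  # []
```

Returning a list of messages, rather than raising on the first problem, lets all schema issues be reported at once.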
Data type
Each data entity has its own data type. Based on that type, our platform calculates the relevant metrics per data entity to provide you with full observability. For example, entropy is computed automatically for categorical entities, and variance for numeric entities (to view the full set of metrics for each data type, see here).
Supported data types:
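For intuition, the two example metrics can be sketched as follows. This is an illustration, not the platform's implementation: the log base (bits) and the use of population rather than sample variance are assumptions. Entropy is $H = \sum_i p_i \log_2(1/p_i)$, where $p_i$ is the share of rows with category $i$; it is 0 when every row has the same category and grows as values spread more evenly across categories.

```python
import math
from collections import Counter
from statistics import pvariance
from typing import Hashable, Iterable, Sequence


def categorical_entropy(values: Iterable[Hashable]) -> float:
    """Shannon entropy, in bits, of the empirical category distribution."""
    counts = Counter(values)
    n = sum(counts.values())
    # H = sum_i p_i * log2(1 / p_i), with p_i = count_i / n
    return sum((c / n) * math.log2(n / c) for c in counts.values())


def numeric_variance(values: Sequence[float]) -> float:
    """Population variance (divides by n, not n - 1)."""
    return float(pvariance(values))


print(categorical_entropy(["Red", "Blue", "Red", "Blue"]))  # 1.0
print(numeric_variance([1, 2, 3, 4]))                       # 1.25
```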
Data type | Possible values | Example |
---|---|---|
Numeric | Float or int | Age: 50, 18, 63, 7, ...; Amount: 1011.0, 23.7, 674.3, ... |
Categorical | Object or string | Color: Red, Blue, Yellow, ...; Country: England, Israel, USA, ... |
Boolean | 0/1 or true/false | Fraud: true/false; Is_active: 0/1 |
Timestamp | yyyy-mm-dd hh:mm:ss.SSS | Prediction_TS: '2021-12-04 18:27:20.213' |
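Before uploading, it can help to check that raw values conform to the formats in the table above. The sketch below is hypothetical: the `parse_value` helper and the lowercase type names are assumptions, and accepting "True"/"FALSE" case-insensitively is a leniency the table does not state. The timestamp format string maps `yyyy-mm-dd hh:mm:ss.SSS` onto Python's `strptime` codes, where `%f` accepts the three-digit millisecond part.

```python
from datetime import datetime
from typing import Callable, Dict, Union

# yyyy-mm-dd hh:mm:ss.SSS; %f accepts the 3-digit millisecond part
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def parse_numeric(raw: str) -> Union[int, float]:
    """Keep ints as ints (e.g. Age) and fall back to float (e.g. Amount)."""
    try:
        return int(raw)
    except ValueError:
        return float(raw)


def parse_boolean(raw: str) -> bool:
    """Accept 0/1 and true/false (case-insensitive, an assumption)."""
    normalized = raw.strip().lower()
    if normalized in ("1", "true"):
        return True
    if normalized in ("0", "false"):
        return False
    raise ValueError(f"expected 0/1 or true/false, got {raw!r}")


def parse_timestamp(raw: str) -> datetime:
    return datetime.strptime(raw, TIMESTAMP_FORMAT)


# Hypothetical mapping from the data type names in the table to parsers
PARSERS: Dict[str, Callable[[str], object]] = {
    "numeric": parse_numeric,
    "categorical": str,
    "boolean": parse_boolean,
    "timestamp": parse_timestamp,
}


def parse_value(data_type: str, raw: str) -> object:
    return PARSERS[data_type](raw)


print(parse_value("timestamp", "2021-12-04 18:27:20.213"))  # 2021-12-04 18:27:20.213000
```

A value that does not match its declared type raises `ValueError`, which makes malformed rows easy to find before they reach the platform.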