Having seen many different ML implementations, we based the superwise system on several basic concepts that can be applied to any ML process.
A model version is a specific iteration of a given model. One version may differ from another in its schema or in having been retrained on a new data set. Our system supports switching between model versions, but enforces a single version per model on any given day.
The baseline is a historical reference data set describing a model's behavior. Various system calculations, such as drifts, are based on the difference between a model's current data and its baseline. The baseline contains the model's input, output, and labels (optional), for a historical time period. Training or validation data sets are often used as baselines. For more details, see our section on Getting Started.
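As a rough illustration of comparing current data against a baseline, the sketch below computes the Population Stability Index (PSI) for one categorical feature. PSI is a common drift measure chosen here only as an example; it is not necessarily the calculation superwise uses, and the function names and sample data are hypothetical.

```python
import math
from collections import Counter

def category_shares(values, categories, eps=1e-6):
    """Share of each category in `values`, floored at eps to avoid log(0).

    Assumes `values` is non-empty.
    """
    counts = Counter(values)
    total = len(values)
    return {c: max(counts[c] / total, eps) for c in categories}

def psi(baseline, current):
    """Population Stability Index between two samples of a categorical feature."""
    categories = set(baseline) | set(current)
    b = category_shares(baseline, categories)
    c = category_shares(current, categories)
    return sum((c[k] - b[k]) * math.log(c[k] / b[k]) for k in categories)

# Hypothetical baseline (e.g. training data) and current production data
baseline_country = ["US"] * 50 + ["DE"] * 30 + ["FR"] * 20
current_country = ["US"] * 20 + ["DE"] * 30 + ["FR"] * 50

print(psi(baseline_country, baseline_country))  # identical samples give 0.0
print(psi(baseline_country, current_country))   # shifted shares give a positive value
```

The larger the value, the further the current distribution has moved from the baseline.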
An entity refers to each of the data elements sent to superwise. Each entity is defined by a role and a data type.
Role - Describes how the entity is used in the ML process.
- ID - Unique identifier per row. There can only be one ID entity per schema.
- Timestamp - Indicates when the prediction was made. The timestamp format is yyyy-mm-dd hh:mm:ss.
- Feature - The data entity being used as input by the model.
- Prediction probability - Optional data that can be sent for classification models, with valid values between 0 and 1.
- Prediction value - The model's decisions. Depending on the model type, these can be Boolean (binary classification), categorical (multiclass classification), or numeric (regression).
- Label - The actual ground truth the model is trying to predict. The label should be the same type as the prediction value entity.
- Label timestamp - Indicates when the label (feedback) arrived. The label timestamp format is yyyy-mm-dd hh:mm:ss.
- Label weight - Typically used for exploration purposes. A label weight lets a single labeled prediction represent more than one observation.
- Metadata - Available data that was not used in a model's training, but can be used for analytical purposes like segmentation or dimension breakdown. Metadata behaves like a feature with 0 importance.
Data type - Describes the format of the entity's values.
- Numeric - float or int
- Boolean - 0/1 or true/false
- Timestamp - yyyy-mm-dd hh:mm:ss
- Unknown - can be used for empty columns sent to us.
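The role and data type rules above can be sketched as a small schema check. This is a minimal illustration, not the superwise SDK: the `Entity`, `Role`, and `DataType` classes, the `validate_schema` and `validate_row` helpers, and the column names are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum

class Role(Enum):
    ID = "id"
    TIMESTAMP = "timestamp"
    FEATURE = "feature"
    PREDICTION_PROBABILITY = "prediction probability"
    PREDICTION_VALUE = "prediction value"
    LABEL = "label"
    LABEL_TIMESTAMP = "label timestamp"
    LABEL_WEIGHT = "label weight"
    METADATA = "metadata"

class DataType(Enum):
    NUMERIC = "numeric"
    BOOLEAN = "boolean"
    TIMESTAMP = "timestamp"
    UNKNOWN = "unknown"

@dataclass(frozen=True)
class Entity:
    name: str
    role: Role
    data_type: DataType

TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"  # yyyy-mm-dd hh:mm:ss

def validate_schema(entities):
    """Return a list of schema-level problems (empty if none)."""
    problems = []
    id_count = sum(1 for e in entities if e.role is Role.ID)
    if id_count > 1:
        problems.append(f"only one ID entity is allowed, found {id_count}")
    preds = {e.data_type for e in entities if e.role is Role.PREDICTION_VALUE}
    labels = {e.data_type for e in entities if e.role is Role.LABEL}
    if preds and labels and preds != labels:
        problems.append("label and prediction value must share a data type")
    return problems

def validate_row(row, entities):
    """Return a list of value-level problems; timestamps are assumed to be strings."""
    problems = []
    for e in entities:
        value = row.get(e.name)
        if value is None:
            continue
        if e.role in (Role.TIMESTAMP, Role.LABEL_TIMESTAMP):
            try:
                datetime.strptime(value, TIMESTAMP_FORMAT)
            except ValueError:
                problems.append(f"{e.name}: bad timestamp {value!r}")
        elif e.role is Role.PREDICTION_PROBABILITY and not 0 <= value <= 1:
            problems.append(f"{e.name}: probability {value} outside [0, 1]")
    return problems

schema = [
    Entity("row_id", Role.ID, DataType.NUMERIC),
    Entity("ts", Role.TIMESTAMP, DataType.TIMESTAMP),
    Entity("age", Role.FEATURE, DataType.NUMERIC),
    Entity("score", Role.PREDICTION_PROBABILITY, DataType.NUMERIC),
    Entity("approved", Role.PREDICTION_VALUE, DataType.BOOLEAN),
    Entity("repaid", Role.LABEL, DataType.BOOLEAN),
]

good_row = {"row_id": 1, "ts": "2023-01-15 08:30:00", "age": 41, "score": 0.83, "approved": 1}
bad_row = {"row_id": 2, "ts": "15/01/2023", "age": 29, "score": 1.7, "approved": 0}

print(validate_schema(schema))        # []
print(validate_row(good_row, schema)) # []
print(validate_row(bad_row, schema))  # two problems: timestamp and probability
```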
A metric is a function applied to a data set of entities, such as min, max, mean, or distribution drift. The available functions vary according to the data type involved.
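One way to picture how the available functions depend on the data type is a lookup table keyed by type. The table below is an assumption made for illustration; the actual set of functions offered per data type is not listed here, and `METRICS_BY_TYPE` and `compute_metrics` are hypothetical names.

```python
from statistics import mean

# Hypothetical mapping from data type to the functions that make sense for it
METRICS_BY_TYPE = {
    "numeric": {"min": min, "max": max, "mean": mean},
    "boolean": {"mean": mean},  # for 0/1 values, the mean is the share of 1s
}

def compute_metrics(values, data_type):
    """Apply every function registered for `data_type`, skipping missing values."""
    present = [v for v in values if v is not None]
    if not present:
        return {}
    functions = METRICS_BY_TYPE.get(data_type, {})
    return {name: fn(present) for name, fn in functions.items()}

print(compute_metrics([12.0, 7.5, None, 9.0], "numeric"))
# {'min': 7.5, 'max': 12.0, 'mean': 9.5}
print(compute_metrics([1, 0, 1, 1], "boolean"))
# {'mean': 0.75}
```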
A KPI is a metric-entity combination, for example the distribution drift of a specific feature, or the maximal value of a label.
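A metric-entity combination can be modeled as a simple pair that is evaluated against a batch of rows. The sketch below is illustrative only: the `Kpi` class, the `METRICS` table, and the sample column names are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean

# Hypothetical registry of metric functions
METRICS = {"min": min, "max": max, "mean": mean}

@dataclass(frozen=True)
class Kpi:
    metric: str  # name of a function in METRICS
    entity: str  # name of the column the function is applied to

    def evaluate(self, rows):
        """Apply the metric to the entity's non-missing values."""
        values = [r[self.entity] for r in rows if r.get(self.entity) is not None]
        return METRICS[self.metric](values)

rows = [
    {"amount": 120.0, "label": 3.0},
    {"amount": 80.0, "label": 8.5},
    {"amount": 100.0, "label": 5.0},
]

max_label = Kpi(metric="max", entity="label")
mean_amount = Kpi(metric="mean", entity="amount")

print(max_label.evaluate(rows))    # 8.5
print(mean_amount.evaluate(rows))  # 100.0
```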
The way superwise organizes model monitoring. A combination of rules and logic applied to selected KPIs with the goal of pointing out business-critical ML or technical issues.
A segment is a subset of the population to which the model applies. It can be used to refine monitoring and analytical definitions. For example, you can analyze the data drift for a specific country, or monitor the model performance only for people under the age of 30.
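The under-30 example above can be sketched as a row filter followed by a performance calculation on the filtered rows. The helper names, columns, and data are hypothetical, and accuracy is used only as a stand-in performance measure.

```python
def select_segment(rows, predicate):
    """Keep only the rows that belong to the segment described by `predicate`."""
    return [r for r in rows if predicate(r)]

def accuracy(rows):
    """Share of labeled rows where the prediction matches the label."""
    labeled = [r for r in rows if r.get("label") is not None]
    if not labeled:
        return None
    return sum(r["prediction"] == r["label"] for r in labeled) / len(labeled)

rows = [
    {"age": 24, "country": "US", "prediction": 1, "label": 1},
    {"age": 27, "country": "DE", "prediction": 0, "label": 1},
    {"age": 45, "country": "US", "prediction": 1, "label": 1},
    {"age": 52, "country": "FR", "prediction": 0, "label": 0},
]

under_30 = select_segment(rows, lambda r: r["age"] < 30)

print(accuracy(rows))      # 0.75 on the whole population
print(accuracy(under_30))  # 0.5 on the under-30 segment
```

A drop that is invisible in the overall number can show up clearly once the data is narrowed to a segment.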