Drift metrics
Distribution distance functions that yield a single, clear value
Detecting changes in the distribution of a feature, model probability, or any data entity is an important part of the ML process. To track such changes, one could use a long list of statistical parameters such as distribution mean, min value, max value, variance, and more (all available in the product). However, to reduce the noise level and determine whether there was a change, Superwise uses a unique set of distribution distance functions to yield a single, unambiguous value, allowing change to be assessed over time.
Drift calculations
Distribution change functions quantify the statistical distance (i.e., the level of change) between two distributions, or between two samples from which the empirical distributions are inferred. Different functions are used for different entity types. The metric ranges from 0 to 100, where 0 indicates identical distributions and 100 indicates orthogonal distributions (no overlap).
Distribution change for categorical or boolean entities
We use a symmetric chi-square distance function for categorical or boolean data entities.
Definition:
[Image: definition of the symmetric chi-square distance]
Where P and Q are two distributions or samples of a random variable, and P(i) and Q(i) are the probabilities of value i in the corresponding sample.
Example:
Let's assume we have two samples, P and Q, from some categorical feature with the following distribution:
Value | Sample 1 (P) | Sample 2 (Q) |
---|---|---|
A | 100 (20%) | 75 (30%) |
B | 250 (50%) | 75 (30%) |
C | 150 (30%) | 100 (40%) |
[Images: worked calculation of the distance for the example above]
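The exact definition on this page survives only as an image, so the following is a minimal sketch under an assumption: it uses one common form of the symmetric chi-square distance, half the sum of (P(i) - Q(i))^2 / (P(i) + Q(i)) over all values i, scaled by 100 to match the 0 to 100 range described earlier. The function name `symmetric_chi_square` and the scaling are illustrative and are not taken from Superwise's implementation, so the result may differ from the value the product reports.

```python
def symmetric_chi_square(p_counts, q_counts):
    """Assumed form: 100 * 0.5 * sum((p - q)^2 / (p + q)) over all categories.

    p_counts and q_counts map each category to its observed count.
    """
    p_total = sum(p_counts.values())
    q_total = sum(q_counts.values())
    if p_total == 0 or q_total == 0:
        return None  # no data received, so no distance can be computed

    distance = 0.0
    for category in set(p_counts) | set(q_counts):
        # Convert counts to probabilities; a missing category counts as 0.
        p = p_counts.get(category, 0) / p_total
        q = q_counts.get(category, 0) / q_total
        if p + q > 0:
            distance += (p - q) ** 2 / (p + q)
    return 100 * 0.5 * distance


# Counts from the example table above.
sample_p = {"A": 100, "B": 250, "C": 150}
sample_q = {"A": 75, "B": 75, "C": 100}
score = symmetric_chi_square(sample_p, sample_q)
print(round(score, 2))
```

Under this assumed formula, identical samples score 0 and samples with no shared categories score 100, which is consistent with the scale described for the metric.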
Distribution change for numeric entities
For numeric data entities, we use a normalized version of the Wasserstein distance function (also known as the earth mover's distance). This distance function quantifies the amount of "work" required to convert distribution P into distribution Q. The Wasserstein distance between the distributions P and Q can be calculated from their CDFs (cumulative distribution functions):
If U and V are the respective CDFs of P and Q, then
$$W_1(P, Q) = \int_{-\infty}^{+\infty} \left| U(x) - V(x) \right| \, dx$$
Our calculation is based on the SciPy implementation, with a normalization step that bounds the distance at 100. Hence, the distance between two distributions with no overlap is 100, regardless of how far apart the two distributions actually are.
Note: If no data is received, the distance is None.
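As an illustration of the CDF-based calculation, here is a minimal plain-Python sketch that integrates the absolute difference between the two empirical CDFs. For one-dimensional samples this is the same quantity that `scipy.stats.wasserstein_distance` computes. The normalization step is not specified on this page, so it is left out here: the function returns the raw, unnormalized distance. The names `empirical_cdf` and `wasserstein_1d` are illustrative.

```python
from bisect import bisect_right


def empirical_cdf(sorted_sample, x):
    """Fraction of sample values that are <= x."""
    return bisect_right(sorted_sample, x) / len(sorted_sample)


def wasserstein_1d(p_sample, q_sample):
    """Integrate |U(x) - V(x)| over x, where U and V are empirical CDFs."""
    if not p_sample or not q_sample:
        return None  # mirrors the note that missing data yields None

    p_sorted = sorted(p_sample)
    q_sorted = sorted(q_sample)
    # Both CDFs are step functions that only change at observed values,
    # so the integral is a sum over the gaps between consecutive points.
    points = sorted(set(p_sorted) | set(q_sorted))

    total = 0.0
    for left, right in zip(points, points[1:]):
        gap = abs(empirical_cdf(p_sorted, left) - empirical_cdf(q_sorted, left))
        total += gap * (right - left)
    return total


print(wasserstein_1d([0, 1, 3], [5, 6, 8]))
```

Because the raw distance grows with the gap between the samples, a normalization step such as the one described above is what keeps the reported metric within 0 to 100.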
Read more
For more information about how to configure drift metrics: configure drift metrics