Drift metrics

Distribution distance functions that yield a single, clear value

Detecting changes in the distribution of a feature, model probability, or any data entity is an important part of the ML process. To track such changes, one could use a long list of statistical parameters such as distribution mean, min value, max value, variance, and more (all available in the product). However, to reduce the noise level and determine whether there was a change, Superwise uses a unique set of distribution distance functions to yield a single, unambiguous value, allowing change to be assessed over time.

Drift calculations

Distribution change functions quantify the statistical distance (i.e., the level of change) between two distributions, or between two samples from which the empirical distributions are inferred. Different functions are used for different entity types. The metric ranges from 0 to 100, where 0 indicates identical distributions and 100 indicates orthogonal distributions (no overlap).

Distribution change for categorical or boolean entities

We use a symmetric chi-square distance function for categorical or boolean data entities.

Definition:

[Formula: symmetric chi-square distance between P and Q]

Where P and Q are two distributions or samples of a random variable, and P(i) and Q(i) are the probabilities of value i in the corresponding samples.
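
For orientation, one common way to write a symmetric chi-square distance on a 0-100 scale is sketched here. The factor of 1/2 and the multiplication by 100 are assumptions, chosen so that identical distributions score 0 and distributions with no shared values score 100; the product's exact definition may differ.

$$D(P, Q) = 100 \cdot \frac{1}{2} \sum_{i} \frac{\left(P(i) - Q(i)\right)^2}{P(i) + Q(i)}$$

Terms with P(i) + Q(i) = 0 are skipped. When P and Q share no values, each term reduces to P(i) or Q(i), the sum equals 2, and D is 100.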

Example:

Let's assume we have two samples, P and Q, from some categorical feature with the following distribution:

| Value | Sample 1 (P) | Sample 2 (Q) |
|-------|--------------|--------------|
| A     | 100 (20%)    | 75 (30%)     |
| B     | 250 (50%)    | 75 (30%)     |
| C     | 150 (30%)    | 100 (40%)    |
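
The example samples can be run through a short Python sketch. The function name `symmetric_chi_square` is hypothetical, and the formula it uses, 100 · ½ · Σ (P(i) − Q(i))² / (P(i) + Q(i)), is an assumed form of the symmetric chi-square distance, scaled so that samples with no shared values score 100. It is not confirmed as the product's exact calculation.

```python
def symmetric_chi_square(p_counts, q_counts):
    """Hypothetical helper: symmetric chi-square distance on an assumed 0-100 scale.

    p_counts and q_counts map each category to its observed count.
    """
    p_total = sum(p_counts.values())
    q_total = sum(q_counts.values())
    if p_total == 0 or q_total == 0:
        raise ValueError("both samples must contain at least one observation")

    distance = 0.0
    for category in set(p_counts) | set(q_counts):
        # Convert counts to probabilities; a category missing from a sample has probability 0.
        p = p_counts.get(category, 0) / p_total
        q = q_counts.get(category, 0) / q_total
        if p + q > 0:
            distance += (p - q) ** 2 / (p + q)

    # The raw sum lies in [0, 2]; halving and scaling by 100 is an assumption.
    return 100.0 * 0.5 * distance


# Counts taken from the example table.
sample_p = {"A": 100, "B": 250, "C": 150}
sample_q = {"A": 75, "B": 75, "C": 100}
drift = symmetric_chi_square(sample_p, sample_q)
print(round(drift, 2))
```

Under this assumed scaling the example samples score roughly 4.2, a small drift, which matches the intuition that the two distributions are similar but not identical.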

Distribution change for numeric entities

For numeric data entities, we use a normalized version of the Wasserstein distance function (also known as the earth mover's distance). This distance function quantifies the amount of “work” required to transform distribution P into distribution Q. The Wasserstein distance between the distributions P and Q can be calculated from their CDFs (cumulative distribution functions).
If U and V are the respective CDFs of P and Q, then

$$W_1(P, Q) = \int_{-\infty}^{+\infty} \lvert U(x) - V(x) \rvert \, dx$$

Our calculation is based on the SciPy implementation, with a normalization step that bounds the distance at 100. Hence, the distance between two distributions with no overlap is always 100, regardless of how far apart the two distributions actually are.
Note: If no data is received, the distance is None.
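
To make the CDF-based definition and the bounding behavior concrete, here is a minimal pure-Python sketch. The function names `wasserstein_from_cdfs` and `normalized_drift` are hypothetical. The normalization shown, dividing by the mean absolute difference over all pairs of values, is only one assumed way to get a score that is 0 for identical samples and 100 for samples with no overlap. The text does not specify the product's actual normalization step, and the production calculation relies on SciPy rather than this hand-written integral.

```python
def wasserstein_from_cdfs(p, q):
    """Wasserstein-1 distance between two samples, as the integral of |U - V|.

    U and V are the empirical CDFs of p and q. Both are step functions, so the
    integral is a sum over the intervals between consecutive observed values.
    """
    p, q = sorted(p), sorted(q)
    points = sorted(set(p) | set(q))
    total = 0.0
    i = j = 0
    for left, right in zip(points, points[1:]):
        # Count how many observations in each sample are <= left.
        while i < len(p) and p[i] <= left:
            i += 1
        while j < len(q) and q[j] <= left:
            j += 1
        u = i / len(p)  # U(left)
        v = j / len(q)  # V(left)
        total += abs(u - v) * (right - left)
    return total


def normalized_drift(p, q):
    """Hypothetical 0-100 drift score; the normalization is an assumption."""
    if not p or not q:
        return None  # no data received
    w1 = wasserstein_from_cdfs(p, q)
    # Mean |x - y| over all pairs is an upper bound on W1 and equals W1
    # when the samples do not overlap. Quadratic cost, so small samples only.
    mean_gap = sum(abs(x - y) for x in p for y in q) / (len(p) * len(q))
    if mean_gap == 0:
        return 0.0  # both samples are the same single value
    return 100.0 * w1 / mean_gap


separated = normalized_drift([0, 1, 2], [10, 11, 12])
overlapping = normalized_drift([0, 1, 2, 3], [2, 3, 4, 5])
```

With this assumed normalization, `separated` comes out at 100 (up to floating-point rounding) however far apart the two samples are, while `overlapping` falls strictly between 0 and 100.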

Read more

For more information about how to configure drift metrics: configure drift metric