Sagify integration


This guide is for Sagify users who want to use the Superwise platform to define workflows that automatically monitor data drift, performance degradation, data integrity, model activity, or any other custom monitoring use case.

Step 1: Create a Superwise Account

Go to Superwise and click the Account button to create an account. Using the free tier, you can monitor up to three models.

Step 2: Add your model

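The model is registered in Superwise with the Superwise SDK, from the training code in Step 6, so the SDK needs credentials. On the Superwise dashboard, click User profile and then select Personal tokens to create an access token. Its client ID and secret are the values you pass to the Superwise client later in this guide.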
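The code in this guide passes the client ID and secret to Superwise(...) as literal placeholders. If you prefer not to keep secrets in source files, one option is to read them from environment variables. This is a minimal sketch, assuming the hypothetical variable names SUPERWISE_CLIENT_ID and SUPERWISE_SECRET; it is not part of the Superwise SDK.

```python
import os


def load_superwise_credentials(env=None):
    """Read Superwise credentials from environment variables.

    SUPERWISE_CLIENT_ID and SUPERWISE_SECRET are hypothetical names chosen
    for this sketch; use whatever names fit your deployment.
    """
    env = os.environ if env is None else env
    names = ("SUPERWISE_CLIENT_ID", "SUPERWISE_SECRET")
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError("Missing Superwise credentials: " + ", ".join(missing))
    # Keys match the keyword arguments passed to Superwise(...) in this guide.
    return {"client_id": env["SUPERWISE_CLIENT_ID"], "secret": env["SUPERWISE_SECRET"]}
```

You could then write sw = Superwise(**load_superwise_credentials()). For sagify local train and sagify local deploy, the variables would also have to be made available inside the Docker containers, which this sketch does not handle.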

Step 3: Initialize sagify

To initialize sagify, run the following command:

sagify init

Enter iris-model as the SageMaker app name, and answer y to the prompt "Are you starting a new project?". Next, choose Python version 3 and the AWS profile and region you want to use. When prompted with "Type in the path to requirements.txt", enter requirements.txt.
A module called sagify_base is created under the src directory. The module’s structure is as follows:

sagify_base/
    local_test/
        test_dir/
            input/
                config/
                    hyperparameters.json
                data/
                    training/
            model/
            output/
        deploy_local.sh
        train_local.sh
    prediction/
        __init__.py
        nginx.conf
        predict.py
        prediction.py
        predictor.py
        serve
        wsgi.py
    training/
        __init__.py
        train
        training.py
    __init__.py
    build.sh
    Dockerfile
    executor.sh
    push.sh

Step 4: Initialize the requirements.txt

Make sure the requirements.txt at the root of the project has the following content:

awscli
flake8
Flask
joblib
pandas
s3transfer
sagify>=0.18.0
scikit-learn
superwise

Step 5: Download the Iris data set

Download the Iris data set and save it in a file named "iris.data" under src/sagify_base/local_test/test_dir/input/data/training/.
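Because training.py reads iris.data with header=None and five fixed column names, a malformed download only shows up as a failure inside the training container. A quick local check can catch this earlier. The sketch below is a hypothetical helper, assuming the standard comma-separated layout of four numeric measurements followed by a class label; it skips blank lines, as pandas.read_csv does by default.

```python
import csv
from collections import Counter


def check_iris_file(path):
    """Sanity-check an iris.data file and return label frequencies.

    Expects 5 comma-separated columns per row (four numeric features and a
    label) and no header row, matching how training.py reads the file.
    """
    labels = Counter()
    with open(path, newline="") as f:
        for line_no, row in enumerate(csv.reader(f), start=1):
            if not row:
                continue  # skip blank lines, e.g. at the end of the file
            if len(row) != 5:
                raise ValueError(f"line {line_no}: expected 5 columns, got {len(row)}")
            for value in row[:4]:
                try:
                    float(value)
                except ValueError:
                    raise ValueError(
                        f"line {line_no}: non-numeric feature value {value!r}"
                    ) from None
            labels[row[4]] += 1
    if not labels:
        raise ValueError("no data rows found")
    return labels
```

Running check_iris_file("src/sagify_base/local_test/test_dir/input/data/training/iris.data") on the standard Iris data set should report 50 rows for each of the three classes.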

Step 6: Implement the training logic

In the src/sagify_base/training/training.py file, replace the TODOs in the train(...) function with the following code:

input_file_path = os.path.join(input_data_path, 'iris.data')

# Load the raw Iris data (the file has no header row)
df = pd.read_csv(
    input_file_path,
    header=None,
    names=['feature1', 'feature2', 'feature3', 'feature4', 'label']
)
# Superwise needs a timestamp and a unique ID for every record.
# Use a naive UTC timestamp to match datetime.utcnow() in prediction.py.
df['date_time'] = pd.Timestamp.now(tz='UTC').tz_localize(None)
df['id'] = [str(uuid.uuid4()) for _ in range(len(df))]
df_train, df_test = train_test_split(df, test_size=0.3, random_state=42)

features_train_df = df_train[['feature1', 'feature2', 'feature3', 'feature4']]
labels_train_df = df_train[['label']]

features_train = features_train_df.values
labels_train = labels_train_df.values.ravel()

features_test_df = df_test[['feature1', 'feature2', 'feature3', 'feature4']]
labels_test_df = df_test[['label']]

features_test = features_test_df.values
labels_test = labels_test_df.values.ravel()

# Train a linear support vector classifier
clf = SVC(gamma='auto', kernel="linear")
clf.fit(features_train, labels_train)

###### Report Testing Data ######
test_predictions = clf.predict(features_test)
accuracy = accuracy_score(labels_test, test_predictions)

# Save the trained model and its test accuracy to the model directory
output_model_file_path = os.path.join(model_save_path, 'model.pkl')
joblib.dump(clf, output_model_file_path)

accuracy_report_file_path = os.path.join(model_save_path, 'report.txt')
with open(accuracy_report_file_path, 'w') as _out:
    _out.write(str(accuracy))

# Register a project and a model in Superwise
project = Project(
    name="Demo Project",
    description="Demo Project"
)
project = sw.project.create(project)

model = Model(
    name="Demo Model",
    description="Iris Model",
    project_id=project.id
)
model = sw.model.create(model)

# Score the full data set and save it as the dataset file for Superwise
df["prediction"] = clf.predict(df[['feature1', 'feature2', 'feature3', 'feature4']])
df.to_csv("Iris Data.csv", index=False)

## Create a dataset
dataset = Dataset(name="Demo Dataset", files="Iris Data.csv", project_id=project.id)
dataset = sw.dataset.create(dataset)

## Create and activate a Superwise version
version = Version(model_id=model.id, name="V1", dataset_id=dataset.id)
active_version = sw.version.create(version)
sw.version.activate(active_version.id)

File size limitation: a dataset file can contain at most 30K rows. If your file is bigger, split it into several smaller files.
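If your dataset file exceeds the 30K-row limit, it can be split before upload. The sketch below is a hypothetical stdlib helper, assuming the limit counts data rows (not the header) and that each part should keep the header row. How the parts are then registered with Superwise depends on the Superwise SDK and is not covered here.

```python
import csv
import os


def split_csv(path, max_rows=30_000, out_dir=None):
    """Split a CSV file into parts holding at most max_rows data rows each.

    The header row is copied into every part, and parts are named
    <stem>_part1<ext>, <stem>_part2<ext>, ... Returns the paths written.
    """
    if max_rows < 1:
        raise ValueError("max_rows must be at least 1")
    out_dir = out_dir or os.path.dirname(os.path.abspath(path))
    stem, ext = os.path.splitext(os.path.basename(path))
    written = []
    out = None
    try:
        with open(path, newline="") as src:
            reader = csv.reader(src)
            header = next(reader, None)
            if header is None:
                return written  # empty file: nothing to split
            count = max_rows  # forces a new part on the first data row
            for row in reader:
                if count == max_rows:
                    if out is not None:
                        out.close()
                    part_path = os.path.join(out_dir, f"{stem}_part{len(written) + 1}{ext}")
                    out = open(part_path, "w", newline="")
                    writer = csv.writer(out)
                    writer.writerow(header)
                    written.append(part_path)
                    count = 0
                writer.writerow(row)
                count += 1
    finally:
        if out is not None:
            out.close()
    return written
```

For example, split_csv("Iris Data.csv") writes Iris Data_part1.csv, Iris Data_part2.csv, and so on, next to the original file.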

At the top of training.py, add the following imports and Superwise client setup, replacing the placeholders with your client ID and secret:

import uuid

import joblib
import os

import pandas as pd
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from superwise import Superwise
from superwise.models.project import Project
from superwise.models.model import Model
from superwise.models.version import Version
from superwise.models.dataset import Dataset

sw = Superwise(
    client_id="<my client_id>",
    secret="<my secret>",
)

Step 7: Implement the prediction logic

In the file src/sagify_base/prediction/prediction.py, replace the body of the predict(...) function with the following:

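model_input = json_input['features']
prediction = ModelService.predict(model_input)
# get_by_name returns a list of models; use the first match
model = ModelService.get_superwise_model(MODEL_NAME)[0]

# Build one record per input row, each with its own prediction
records = []
for features, pred in zip(model_input, prediction):
    records.append({
        "date_time": str(datetime.utcnow()),
        "id": str(uuid.uuid4()),
        "prediction": pred.item(),  # convert the NumPy scalar to a plain Python value
        "feature1": features[0],
        "feature2": features[1],
        "feature3": features[2],
        "feature4": features[3]
    })

# Log the records for this request against the model's active version
transaction_id = sw.transaction.log_records(
    model_id=model.id,
    version_id=model.active_version_id,
    records=records
)
print(f"Created transaction: {transaction_id}")

# Assumes a single feature row per request, as in the curl example in Step 9
return {
    "prediction": prediction.item()
}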
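The important detail in the logging loop is pairing each input row with its own prediction, so a request carrying several feature rows produces one record per row. The sketch below isolates that step as a pure function that can be tested without the Superwise SDK. The helper name build_records and the FEATURE_NAMES constant are hypothetical; the record keys mirror the ones used in this guide.

```python
import uuid
from datetime import datetime, timezone

FEATURE_NAMES = ["feature1", "feature2", "feature3", "feature4"]


def build_records(model_input, predictions, now=None):
    """Pair each feature row with its prediction in the record shape used above.

    model_input: list of feature rows, e.g. [[0.34, 0.45, 0.45, 0.3]]
    predictions: one prediction per row, in the same order.
    """
    if len(model_input) != len(predictions):
        raise ValueError("expected one prediction per input row")
    if now is None:
        # Naive UTC timestamp, formatted like str(datetime.utcnow())
        now = datetime.now(timezone.utc).replace(tzinfo=None)
    records = []
    for row, pred in zip(model_input, predictions):
        if len(row) != len(FEATURE_NAMES):
            raise ValueError(f"expected {len(FEATURE_NAMES)} features, got {len(row)}")
        record = {"date_time": str(now), "id": str(uuid.uuid4()), "prediction": pred}
        record.update(zip(FEATURE_NAMES, row))
        records.append(record)
    return records
```

Keeping this logic out of predict(...) makes it easy to check the payload shape in a unit test before any data is sent to Superwise.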

Within the ModelService class in the same file, replace the body of the get_model() function with the following:

if cls.model is None:
    cls.model = joblib.load(os.path.join(_MODEL_PATH, 'model.pkl'))
return cls.model

Then add a new class method, get_superwise_model(), to the ModelService class:

@classmethod
def get_superwise_model(cls, model_name):
    """Get the Superwise models named model_name (a list) using the Superwise SDK."""
    return sw.model.get_by_name(model_name)
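As written, get_superwise_model() queries Superwise on every request, whereas get_model() loads the sklearn model only once. If that lookup latency matters, the result could be cached in the same spirit. This is a sketch under stated assumptions: fetch stands in for sw.model.get_by_name and is injected so the example runs without the SDK, and make_cached_lookup is a hypothetical helper.

```python
import functools


def make_cached_lookup(fetch):
    """Wrap a model lookup so each name is fetched at most once per process.

    fetch stands in for a call such as sw.model.get_by_name; it is injected
    here so the sketch runs without the Superwise SDK.
    """
    @functools.lru_cache(maxsize=None)
    def lookup(model_name):
        return fetch(model_name)

    return lookup
```

One trade-off: a cached result keeps the active_version_id it was fetched with, so if you activate a new version later, the serving process would keep logging to the old one until it restarts.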

Finally, add the following imports and setup code to the top of prediction.py, replacing the placeholders with your client ID and secret:

from superwise import Superwise
import joblib
import os
import pandas as pd
from datetime import datetime
import uuid

sw = Superwise(
    client_id="<my client_id>",
    secret="<my secret>",
)
# Must match the name given to Model(...) in training.py
MODEL_NAME = "Demo Model"
_MODEL_PATH = os.path.join('/opt/ml/', 'model')  # Directory where SageMaker places the trained model artifacts

Step 8: Build and train the ML model

When you're ready to build the Docker image and train the ML model locally, run:

sagify build
sagify local train
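Training writes the test-set accuracy to report.txt next to model.pkl, so a quick way to check a local run is to read that file. This sketch assumes sagify's local training mounts src/sagify_base/local_test/test_dir as /opt/ml inside the container, so /opt/ml/model/report.txt would appear under local_test/test_dir/model/ on your machine; adjust REPORT_PATH if your setup differs.

```python
from pathlib import Path

# Assumed host location of /opt/ml/model/report.txt after `sagify local train`
REPORT_PATH = Path("src/sagify_base/local_test/test_dir/model/report.txt")


def read_accuracy(path=REPORT_PATH):
    """Return the test-set accuracy that training.py wrote to report.txt."""
    accuracy = float(Path(path).read_text().strip())
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError(f"unexpected accuracy value: {accuracy}")
    return accuracy
```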

Step 9: Call the inference REST API

To serve the model's REST API locally, run:

sagify local deploy

Once the container is running, call the inference endpoint with the following curl command:

curl -X POST \
http://localhost:8080/invocations \
-H 'Cache-Control: no-cache' \
-H 'Content-Type: application/json' \
-d '{
    "features":[[0.34, 0.45, 0.45, 0.3]]
}'
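The same call can be made from Python, for example in a smoke test. This sketch uses only the standard library and sends the same URL, headers, and JSON body as the curl command; the helper names build_invocation_request and invoke are hypothetical.

```python
import json
import urllib.request

INVOCATIONS_URL = "http://localhost:8080/invocations"


def build_invocation_request(features, url=INVOCATIONS_URL):
    """Build the same POST request as the curl command above."""
    body = json.dumps({"features": features}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json", "Cache-Control": "no-cache"},
        method="POST",
    )


def invoke(features, url=INVOCATIONS_URL, timeout=10):
    """Send the request to the local endpoint and decode the JSON response."""
    request = build_invocation_request(features, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the container from sagify local deploy running, invoke([[0.34, 0.45, 0.45, 0.3]]) should return a dictionary with a prediction key.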

You should now be able to see data coming in on the Superwise dashboards.
