Collecting data from GCS
This guide covers the steps required to build a fully integrated pipeline with Superwise using Google Cloud Storage and a Google Cloud Function.
Steps
- Create a cloud function
- To invoke the cloud function on each new file, add an Eventarc trigger and select the following:
- Event provider - Cloud storage
- Event - google.cloud.storage.object.v1.finalized
- Runtime, build, and connections settings
a. Increase the allocated memory to 1 GiB and the timeout to 300 seconds
b. Add the client ID and secret as environment variables (read more about Generating tokens)
- Click Next and move to the Code section:
a. Select Python 3.10 as the runtime for the function
b. Add the superwise package to the requirements.txt file
c. Insert the following code in the main.py file
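import re

import functions_framework
from superwise import Superwise

# Created once per function instance. Superwise() is called without arguments,
# so the client ID and secret are expected to come from the environment
# variables configured in the runtime settings.
sw_client = Superwise()


@functions_framework.cloud_event
def hello_gcs(cloud_event):
    data = cloud_event.data
    print(f"Event ID: {cloud_event['id']} | Event type: {cloud_event['type']} | Created: {data['timeCreated']}")

    # The model and version IDs are parsed from the object path.
    t_id = sw_client.transaction.log_from_gcs(
        model_id=int(re.findall(r"model_id=(\d+)/", data['name'])[0]),
        version_id=int(re.findall(r"version_id=(\d+)/", data['name'])[0]),
        file_path=f"gs://{data['bucket']}/{data['name']}"
    )
    print(t_id)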
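The runtime settings step stores the client ID and secret as environment variables. If they are missing, the failure may only surface when the client is created. The following is a minimal sketch of a fail-fast check that could run before the client is created. The variable names SUPERWISE_CLIENT_ID and SUPERWISE_SECRET and the helper functions are assumptions for illustration; use the names you actually configured.

```python
import os

# Assumed variable names; replace with the names set in the runtime settings.
REQUIRED_ENV_VARS = ("SUPERWISE_CLIENT_ID", "SUPERWISE_SECRET")


def missing_env_vars(env=os.environ, required=REQUIRED_ENV_VARS):
    """Return the required variable names that are unset or empty."""
    return [name for name in required if not env.get(name)]


def check_credentials(env=os.environ):
    """Raise early, with a clear message, if credentials are not configured."""
    missing = missing_env_vars(env)
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
```

Calling check_credentials() at the top of main.py would make a misconfigured deployment fail with an explicit message in the function logs.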
Folder structure
The code above assumes the following path convention:
gs://<bucket_name>/model_id=<model_id>/version_id=<version_id>/.../file_name.parquet
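Because the trigger fires for every finalized object in the bucket, files that do not follow this convention would make the re.findall(...)[0] lookups in main.py raise an IndexError. Below is a minimal sketch of a stricter parser that returns None for such paths so the function can skip them. The helper name parse_object_name is hypothetical, and the pattern assumes the object name starts with model_id=, as in the convention above.

```python
import re

# Matches model_id=<digits>/version_id=<digits>/ at the start of the object name.
PATH_PATTERN = re.compile(r"^model_id=(\d+)/version_id=(\d+)/")


def parse_object_name(name):
    """Return (model_id, version_id), or None if the name breaks the convention."""
    match = PATH_PATTERN.match(name)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))
```

In the handler, the result for data['name'] could be checked first, returning early when it is None instead of calling log_from_gcs.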
For on-prem users
- Initiate Superwise with an additional argument, superwise_host (see Python SDK).
- Use log_file instead of log_from_gcs; log_file supports sending files from your cloud provider's blob storage.
In addition, if Superwise is deployed in a VPC and not exposed to the internet, the Cloud Function must be connected to the VPC in order to send data to Superwise.
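One way to keep the same main.py for both hosted and on-prem deployments is to pass superwise_host only when it is configured. The sketch below builds the keyword arguments from an environment variable. The variable name SUPERWISE_HOST and the helper superwise_client_kwargs are assumptions for illustration, and the expected format of the host value depends on your deployment.

```python
import os


def superwise_client_kwargs(env=os.environ):
    """Build keyword arguments for Superwise(), adding superwise_host when set."""
    kwargs = {}
    host = env.get("SUPERWISE_HOST")  # assumed variable name, not part of the SDK
    if host:
        kwargs["superwise_host"] = host
    return kwargs


# Usage inside main.py (requires the superwise package):
# sw_client = Superwise(**superwise_client_kwargs())
```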
That's it! Now you are integrated with Superwise!