Istio Service Mesh, Canary Release Routing Strategies for ML Deployments in a Kubernetes Cluster

Posted October 16, 2021 by Gowri Shankar ‐ 13 min read

Change is the only constant in this universe. Our data changes and causes data drift; then our understanding of nature changes and causes concept drift. Yet we believe that building State of the Art (SOA), One of a Kind (OAK), and First of its Time (FOT) in-silico intelligence will take us to a state of nirvana and place us next to the hearts that are liberated from the cycle of life and death. Constructing a model is just the end of the beginning; the real trials and the excruciating pain of managing change still await us. Shall we plan well ahead, with a conscious focus on a minimum viable product that promises a quicker time to market through a fail-fast approach? Our ego doesn't allow it, because we no longer consider software development cool; we believe building intelligence alone makes us worth our salt.

Today anyone can claim to be a data scientist, for 2 reasons. Reason 1 - Until 2020 we wrote SQL queries for a living. Then came 2021: the Covid bug bit and mutated us, and surviving the variants and waves naturally upgraded the SQL developer within to a data scientist (an evolutionary process). Reason 2 - With all due respect, one man, Dr. Andrew Ng, through his hard work and perseverance, made us believe we are all data scientists. By the way, they say ignorance is bliss, and we can continue building our SOA, OAK, and FOT models forever at the expense of someone else's cash. BTW, has anyone noticed that Andrew is moving away from model-centric AI to data-centric AI? He is a genius, and he will take us to the place where we truly belong.

In this post, I would like to pitch in a few critical concepts on model serving and deployment for making robust machine learning releases and upgrades. It also carries a practical guide to load balancing and routing schemes for running experiments on a subset of users in the production environment. This is the first post on MLOps, where we study a few tools and technologies for rapid and continuous deployment.


This post is inspired by the GCP guide for MLOps that focuses on canary deployment, refer here


The prime objective of this post is to understand the canary model serving strategy for deploying machine learning models. In that quest, we shall learn the following:

  • Prepare a Google Kubernetes Engine(GKE) cluster
  • Istio service mesh
  • Deploying models using TF Serving
  • Configuring Istio Ingress gateway, services, and rules
  • Configuring weight and content-based traffic routing strategies


Our models are nothing but a manifestation of the data we provided to make meaning out of a confined universe. This universe is continually changing for a simple reason: we cannot account for all the confounders, which leads to drift from the initial assumptions and presumptions we made. However, the changes can be monitored through the statistical properties of the features, the predictions made from those features, and the correlations between them. Model drift refers to the degradation of a model's performance due to changes in the universe; these changes are caused by one or both of the following drifts:

  1. Data Drift:
    A change in the distribution of the data is a clear indicator of data drift. For example, a marketing campaign for a particular product that targeted teens with an average age of 18 resulted in a loss of revenue from the adult group. A refocus of the target audience is then suggested.

  2. Concept Drift:
    Concept drift is a change in our understanding of the confounders. For example, for an alcoholic beverage company, the potential customer base is aged 18 and above. If the federal government then raises the legal age limit for consuming alcohol to 25, the age group between 18 and 25 drops completely out of consideration in the recommendation system.
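
One common way to watch for data drift is to compare the distribution of a feature at training time with the same feature in production. The sketch below uses the Population Stability Index (PSI), which is my choice for illustration and not something the GCP guide prescribes. The age samples are made up, and the frequently quoted rule of thumb that a PSI above roughly 0.2 signals a significant shift is a convention, not a hard threshold.

```python
import math
from typing import List, Sequence


def psi(expected: Sequence[float], actual: Sequence[float], bins: int = 10) -> float:
    """Population Stability Index between a reference sample and a live sample."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # guard against a constant feature

    def histogram(values: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for v in values:
            # Values outside the reference range are clamped into the edge bins.
            idx = min(max(int((v - lo) / width), 0), bins - 1)
            counts[idx] += 1
        # A small floor avoids log(0) for empty bins.
        return [max(c / len(values), 1e-6) for c in counts]

    e, a = histogram(expected), histogram(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


# Hypothetical customer ages at training time versus in production.
training_ages = [18 + (i % 30) for i in range(3000)]  # spread over 18..47
serving_ages = [25 + (i % 30) for i in range(3000)]   # same shape, shifted by 7 years

print(f"identical samples: {psi(training_ages, training_ages):.3f}")
print(f"shifted samples  : {psi(training_ages, serving_ages):.3f}")
```

Concept drift is harder to detect this way, because the feature distribution can stay the same while the relationship between features and labels changes.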

Canary Deployment Strategy

In the olden days, coal miners used a simple tactic to detect toxic gases: they sent canaries in before stepping into the mines themselves. This risk-reduction strategy inspires software deployment and upgrade processes, where a subset of users is assigned to the new deployment. That is, whenever there is an upgrade, a portion of the users is routed to the new release and the rest of the traffic is sent to the stable deployment. Once the stability of the new release is confirmed, the other users are brought onto the new release gradually or in one shot.
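
To make the idea of "a portion of the users" concrete, here is a minimal sketch that assigns users to the canary group by hashing their id into one of 100 buckets. This is a generic illustration and not how Istio splits traffic in this post (Istio's weighted routes pick a destination per request). The user ids and the 10% figure are assumptions.

```python
import hashlib


def is_canary_user(user_id: str, canary_percent: int) -> bool:
    """Deterministically decide whether a user belongs to the canary group."""
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100  # stable bucket in 0..99 for this user
    return bucket < canary_percent


users = [f"user-{i}" for i in range(10_000)]  # hypothetical user ids
canary_users = [u for u in users if is_canary_user(u, 10)]
print(f"{len(canary_users)} of {len(users)} users see the new release")
```

Because the bucket depends only on the user id, the same user keeps seeing the same release while the percentage is raised step by step.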

GKE Cluster

Google developed Kubernetes and open-sourced it in 2014. GKE is Google Cloud's fully managed Kubernetes environment, which combines the simplicity of PaaS with the flexibility of IaaS. The following video explains GKE in detail.

This guide uses the GKE cluster to deploy our machine learning models.

from IPython.display import YouTubeVideo
YouTubeVideo('Rl5M1CzgEH4', width="100%")

Istio Service Mesh

A service mesh is a dedicated networking layer that transparently handles service-to-service communication between services or microservices, using proxies.

A service mesh consists of network proxies paired with each service in an application
and a set of task management processes. The proxies are called the data plane and the 
management processes are called the control plane. The data plane intercepts calls 
between different services and “processes” them; the control plane is the brain of the 
mesh that coordinates the behavior of proxies and provides APIs for operations and 
maintenance personnel to manipulate and observe the entire network

- Wikipedia

This guide uses Istio as the service mesh to expose the deployed models as microservices. Using Istio, we can easily manage the Kubernetes services and expose them to potential consumers.

YouTubeVideo('8oLX5P4ctmY', width="100%")

TF Serving

If there is one area where TensorFlow is undoubtedly superior to its competitors (especially PyTorch), it is model serving. TF Serving is TensorFlow’s flexible, high-performance serving system for machine learning models, designed with the needs of the production environment in mind. We are all familiar with the SavedModel format, which packages a complete TF program, including trained parameters and computation. That is, it does not require the code that was used to build the model.

YouTubeVideo('4mqFDwIdKh0', width="100%")

GKE Cluster, Canary Deployment and Routing Strategies

In this section, we shall make a canary deployment on a GKE cluster, with Istio as the service-to-service communication layer, in a step-by-step manner. The following are the tasks we will accomplish by the end of this section.

  1. Activate Cloud Shell
  2. Create a GKE Cluster
  3. Install Istio package
  4. Deploy ML models using TF Serving
  5. Configure Istio Ingress Gateway
  6. Configure Istio Virtual Services
  7. Configure Istio Service Rules
  8. Configure weight-based routing
  9. Configure content-based routing

To do this in vivo, you need a Google Cloud account and a project.

All instructions and the configuration files(*.yaml) can be found here

Activate Cloud Shell and Get Sources

GCloud provides Cloud Shell free of cost and it can be activated by clicking the “Activate Cloud Shell” button in the developer console. Cloud shell also provides a free editor, a VS code server instance for code development.

Authorize Cloud Shell

gcloud auth list

Credentialed Accounts

ACCOUNT: *************

To set the active account, run:
    $ gcloud config set account `ACCOUNT`

gcloud config set account *****

- Updated property [core/account].

Get Source Files From GCloud Repo.

kpt pkg get tfserving-canary

Package "tfserving-canary":
 * branch            master     -> FETCH_HEAD
 * [new branch]      master     -> origin/master
Adding package "workshops/mlep-qwiklabs/tfserving-canary-gke".

Fetched 1 package(s).

cd tfserving-canary

Creating GKE Cluster

Update compute zone, set the project id and cluster name

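gcloud config set compute/zone us-central1-f
PROJECT_ID=$(gcloud config get-value project)
CLUSTER_NAME=canary-cluster

Create GKE Cluster with Istio Add On

gcloud services enable
gcloud beta container clusters create $CLUSTER_NAME --project=$PROJECT_ID --addons=Istio --istio-config=auth=MTLS_PERMISSIVE --cluster-version=latest --machine-type=n1-standard-4 --num-nodes=3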

WARNING: Currently VPC-native is the default mode during cluster creation for versions greater than 1.21.0-gke.1500. To create advanced routes based clusters, please pass the `--no-enable-ip-alias` flag
WARNING: Starting with version 1.18, clusters will have shielded GKE nodes by default.
WARNING: Your Pod address range (`--cluster-ipv4-cidr`) can accommodate at most 1008 node(s).
WARNING: Starting with version 1.19, newly created clusters and node-pools will have COS_CONTAINERD as the default node image when no image type is specified.
Creating cluster canary-cluster in us-central1-f...done.     
Created [].
To inspect the contents of your cluster, go to:
kubeconfig entry generated for canary-cluster.
NAME: canary-cluster
LOCATION: us-central1-f
MASTER_VERSION: 1.21.4-gke.2300
MACHINE_TYPE: n1-standard-4
NODE_VERSION: 1.21.4-gke.2300

If you bump into any quota issues, check this and this

Verify the Cluster

gcloud container clusters get-credentials $CLUSTER_NAME

Verify the Istio Services

kubectl get service -n istio-system

istio-citadel ClusterIP  8060/TCP,15014/TCP 3m13s
istio-galley ClusterIP  443/TCP,15014/TCP,9901/TCP 3m13s
istio-ingressgateway LoadBalancer 15020:32067/TCP,80:31368/TCP,443:30274/TCP,31400:31329/TCP,15029:31974/TCP,15030:31896/TCP,15031:32040/TCP,15032:31023/TCP,15443:30193/TCP 3m12s
istio-pilot ClusterIP  15010/TCP,15011/TCP,8080/TCP,15014/TCP 3m12s
istio-policy ClusterIP  9091/TCP,15004/TCP,15014/TCP 3m11s
istio-sidecar-injector ClusterIP  443/TCP,15014/TCP 3m11s
istio-telemetry ClusterIP  9091/TCP,15004/TCP,15014/TCP,42422/TCP 3m11s
istiod-istio-1611 ClusterIP  15010/TCP,15012/TCP,443/TCP,15014/TCP,853/TCP 89s
prometheus ClusterIP  9090/TCP 89s
promsd ClusterIP  9090/TCP 3m11s

Verify the Kubernetes Pods and Containers are Deployed and Running

kubectl get pods -n istio-system

istio-citadel-76685f699d-cgrsw 1/1 Running 0 6m43s
istio-galley-58d48bcb98-4cds6 1/1 Running 0 6m43s
istio-ingressgateway-5fb67c59c4-vpq5f 1/1 Running 0 6m43s
istio-pilot-dc6499cf7-t5kxq 2/2 Running 0 6m42s
istio-policy-676cd7984-v6jfd 2/2 Running 2 6m42s
istio-security-post-install-1.4.10-gke.17-ngldz 0/1 Completed 0 6m11s
istio-sidecar-injector-6bcb464d69-255wf 1/1 Running 0 6m42s
istio-telemetry-75ff96df6f-qswvt 2/2 Running 2 6m42s
istiod-istio-1611-8859565d6-lswrk 1/1 Running 0 5m2s
prometheus-7bd69d7dd-vxdxw 2/2 Running 0 5m2s
promsd-6d88cd87-9pjpr 2/2 Running 1 6m41s

Configuring Automatic Sidecar Injection

Every pod in the Istio mesh must run a sidecar proxy to take full advantage of Istio's capabilities. Labeling the namespace enables automatic sidecar injection for the pods created in it. More info here

kubectl label namespace default istio-injection=enabled

Model Deployment

Acquire the SavedModel Files

export MODEL_BUCKET=${PROJECT_ID}-bucket
gsutil mb gs://${MODEL_BUCKET}

gsutil cp -r gs://workshop-datasets/models/resnet_101 gs://${MODEL_BUCKET}

Copying gs://workshop-datasets/models/resnet_101/1/saved_model.pb [Content-Type=application/octet-stream]...
Copying gs://workshop-datasets/models/resnet_101/1/variables/ [Content-Type=application/octet-stream]...
Copying gs://workshop-datasets/models/resnet_101/1/variables/variables.index [Content-Type=application/octet-stream]...
- [3 files][173.7 MiB/173.7 MiB]
Operation completed over 3 objects/173.7 MiB.

gsutil cp -r gs://workshop-datasets/models/resnet_50 gs://${MODEL_BUCKET}

Copying gs://workshop-datasets/models/resnet_50/1/saved_model.pb [Content-Type=application/octet-stream]...
Copying gs://workshop-datasets/models/resnet_50/1/variables/ [Content-Type=application/octet-stream]...
Copying gs://workshop-datasets/models/resnet_50/1/variables/variables.index [Content-Type=application/octet-stream]...
\ [3 files][ 99.4 MiB/ 99.4 MiB]
Operation completed over 3 objects/99.4 MiB.

Config Map

Update the config map files with your bucket name, using the Cloud Shell editor.

  • File 1: tfserving-canary/tf-serving/configmap-resnet101.yaml
  • File 2: tfserving-canary/tf-serving/configmap-resnet50.yaml

# Copyright 2020 Google Inc. All Rights Reserved.
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# See the License for the specific language governing permissions and
# limitations under the License.

apiVersion: v1
kind: ConfigMap
metadata: # kpt-merge: /resnet50-configs
  name: resnet50-configs
data:
  MODEL_NAME: image_classifier
  MODEL_PATH: gs://vf-core-1-bucket/resnet_50  # replace with your bucket name

Model deployment has 3 steps,

  1. Configure the deployment via the configmap-*.yaml file. This file has the SavedModel location and name.
  2. Deploy the model using the deployment-*.yaml file. This file has the specs for containers, ports, replica details, etc.
  3. Expose the deployed model as a service using service.yaml. This step exposes a stable IP address and DNS name.

kubectl apply -f tf-serving/configmap-resnet50.yaml

- configmap/resnet50-configs created

kubectl apply -f tf-serving/deployment-resnet50.yaml

- deployment.apps/image-classifier-resnet50 created

kubectl get deployments

NAME                        READY   UP-TO-DATE   AVAILABLE   AGE
image-classifier-resnet50   1/1     1            1           27s

# service.yaml
apiVersion: v1
kind: Service
metadata:
  name: image-classifier
  namespace: default
  labels:
    app: image-classifier
    service: image-classifier
spec:
  type: ClusterIP
  ports:
  - port: 8500
    protocol: TCP
    name: tf-serving-grpc
  - port: 8501
    protocol: TCP
    name: tf-serving-http
  selector:
    app: image-classifier

The selector field refers to the app: image-classifier label. 
What it means is that the service will load balance across all pods annotated 
with this label. At this point these are the pods comprising the ResNet50 
deployment. The service type is ClusterIP. The IP address exposed by the 
service is only visible within the cluster.

- GCloud QWik Labs
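
The selector behaviour described in the quote can be sketched in a few lines: a service selects every pod whose labels contain all of the selector's key/value pairs. The pod names and the `version` label below are assumptions made for illustration; only the `app: image-classifier` label comes from the service definition above.

```python
from typing import Dict

Labels = Dict[str, str]


def selects(selector: Labels, pod_labels: Labels) -> bool:
    """True when every key/value pair of the selector is present on the pod."""
    return all(pod_labels.get(key) == value for key, value in selector.items())


# Hypothetical pods; names and the "version" label are illustrative only.
pods: Dict[str, Labels] = {
    "image-classifier-resnet50-abc": {"app": "image-classifier", "version": "resnet50"},
    "image-classifier-resnet101-def": {"app": "image-classifier", "version": "resnet101"},
    "prometheus-xyz": {"app": "prometheus"},
}

service_selector = {"app": "image-classifier"}
backends = [name for name, labels in pods.items() if selects(service_selector, labels)]
print(backends)
```

This is also why a second deployment that carries the same `app` label would start receiving traffic from the service unless the routing rules say otherwise.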

kubectl apply -f tf-serving/service.yaml

- service/image-classifier created

Configuring Istio Ingress Gateway

The Istio ingress gateway manages inbound and outbound traffic for the service mesh.

kubectl apply -f tf-serving/gateway.yaml

- created

Virtual services, along with destination rules are the key building blocks 
of Istio’s traffic routing functionality. A virtual service lets you configure 
how requests are routed to a service within an Istio service mesh. Each 
virtual service consists of a set of routing rules that are evaluated in 
order, letting Istio match each given request to the virtual service to a 
specific real destination within the mesh.

- GCloud, QWikLabs

kubectl apply -f tf-serving/virtualservice.yaml

- created

Access ResNet50 Model

export INGRESS_HOST=$(kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.status.loadBalancer.ingress[0].ip}')

export INGRESS_PORT=$(kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.spec.ports[?(@.name=="http2")].port}')

export GATEWAY_URL=$INGRESS_HOST:$INGRESS_PORT





curl -d @payloads/request-body.json -X POST http://$GATEWAY_URL/v1/models/image_classifier:predict

# CURL Output
{
    "predictions": [{
        "labels": ["military uniform", "pickelhaube", "suit", "Windsor tie", "bearskin"],
        "probabilities": [0.453408211, 0.209194973, 0.193582058, 0.0409308933, 0.0137334978]
    }]
}
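
For readers who prefer to call the endpoint from Python, the sketch below builds the same POST request with the standard library and picks the most probable label from a response. It assumes the response is a JSON object with a `predictions` list holding `labels` and `probabilities`, as the output above suggests. The gateway address and the placeholder body are illustrative; the post itself sends payloads/request-body.json.

```python
import json
import urllib.request


def build_predict_request(gateway_url: str, body: bytes) -> urllib.request.Request:
    """Build (but do not send) the POST request that the curl command issues."""
    url = f"http://{gateway_url}/v1/models/image_classifier:predict"
    headers = {"Content-Type": "application/json"}
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def top_label(response_text: str) -> str:
    """Return the label with the highest probability in the first prediction."""
    prediction = json.loads(response_text)["predictions"][0]
    probs = prediction["probabilities"]
    best = max(range(len(probs)), key=lambda i: probs[i])
    return prediction["labels"][best]


# 203.0.113.10 is a documentation-only address standing in for $GATEWAY_URL.
request = build_predict_request("203.0.113.10:80", b"{}")  # placeholder body
print(request.get_method(), request.full_url)

# Sending needs a live cluster:
#   with urllib.request.urlopen(request) as resp:
#       print(top_label(resp.read().decode("utf-8")))

sample_response = json.dumps({"predictions": [{
    "labels": ["military uniform", "pickelhaube", "suit", "Windsor tie", "bearskin"],
    "probabilities": [0.453408211, 0.209194973, 0.193582058, 0.0409308933, 0.0137334978],
}]})
print(top_label(sample_response))
```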

Deploying ResNet101 as a Canary Release

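The weight field on each route destination provides the traffic-splitting information. The resnet50 and resnet101 subsets used below are defined in a destination rule, which has to be applied before these routes can resolve.

kubectl apply -f tf-serving/destinationrule.yaml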

# virtualservice-weight-100.yaml
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: image-classifier
spec:
  hosts:
  - "*"
  gateways:
  - image-classifier-gateway
  http:
  - route:
    - destination:
        host: image-classifier
        subset: resnet50
        port:
          number: 8501
      weight: 100
    - destination:
        host: image-classifier
        subset: resnet101
        port:
          number: 8501
      weight: 0
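
The two destinations differ only in their weights, so the route list for any rollout stage can be derived from a single canary percentage. The helper below is a hypothetical sketch, not part of the lab's files: it renders the route entries as plain Python dictionaries and keeps the weights summing to 100. The rollout schedule in the loop is likewise an assumption.

```python
from typing import Dict, List


def weighted_routes(canary_percent: int, stable: str = "resnet50",
                    canary: str = "resnet101") -> List[Dict]:
    """Route entries that send canary_percent of the traffic to the canary subset."""
    if not 0 <= canary_percent <= 100:
        raise ValueError("canary_percent must be between 0 and 100")

    def destination(subset: str, weight: int) -> Dict:
        return {
            "destination": {
                "host": "image-classifier",
                "subset": subset,
                "port": {"number": 8501},
            },
            "weight": weight,
        }

    return [destination(stable, 100 - canary_percent),
            destination(canary, canary_percent)]


for percent in (0, 30, 100):  # a hypothetical rollout schedule
    routes = weighted_routes(percent)
    print(percent, [(r["destination"]["subset"], r["weight"]) for r in routes])
```

Starting at 0 keeps all traffic on the stable model while the canary deployment is still coming up, which is what the weight-100 file does.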

kubectl apply -f tf-serving/virtualservice-weight-100.yaml

- configured

kubectl apply -f tf-serving/configmap-resnet101.yaml

- configmap/resnet101-configs created

kubectl apply -f tf-serving/deployment-resnet101.yaml

- deployment.apps/image-classifier-resnet101 created

kubectl get deployments

NAME                         READY   UP-TO-DATE   AVAILABLE   AGE
image-classifier-resnet101   0/1     1            0           8m35s
image-classifier-resnet50    1/1     1            1           21m

Routing Split 70/30

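# virtualservice-weight-70.yaml
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: image-classifier
spec:
  hosts:
  - "*"
  gateways:
  - image-classifier-gateway
  http:
  - route:
    - destination:
        host: image-classifier
        subset: resnet50
        port:
          number: 8501
      weight: 70
    - destination:
        host: image-classifier
        subset: resnet101
        port:
          number: 8501
      weight: 30

kubectl apply -f tf-serving/virtualservice-weight-70.yaml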
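
With a 70/30 split, a single request may land on either model; the ratio only shows up over many requests. The simulation below mimics that statistically with a weighted random choice. It is only an illustration of the expected behaviour, not Istio's actual load-balancing code, and the request count and seed are arbitrary.

```python
import random
from collections import Counter

# Weights taken from the 70/30 virtual service.
weights = {"resnet50": 70, "resnet101": 30}


def pick_subset(rng: random.Random) -> str:
    """Pick a destination subset with probability proportional to its weight."""
    subsets = list(weights)
    return rng.choices(subsets, weights=[weights[s] for s in subsets], k=1)[0]


rng = random.Random(42)  # fixed seed so the run is repeatable
total_requests = 10_000
served = Counter(pick_subset(rng) for _ in range(total_requests))

for subset in weights:
    print(f"{subset}: {served[subset] / total_requests:.1%}")
```

In practice, repeating the curl command a number of times and watching which labels come back gives the same kind of evidence against the real cluster.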


Routing Split by User Group

The user-group header match specifies which user group is routed to which model. Here, requests carrying user-group: canary are routed to ResNet101, and all other requests go to ResNet50.

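# virtualservice-focused-routing.yaml
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: image-classifier
spec:
  hosts:
  - "*"
  gateways:
  - image-classifier-gateway
  http:
  - match:
    - headers:
        user-group:
          exact: canary
    route:
      - destination:
          host: image-classifier
          subset: resnet101
          port:
            number: 8501
  - route:
    - destination:
        host: image-classifier
        subset: resnet50
        port:
          number: 8501

kubectl apply -f tf-serving/virtualservice-focused-routing.yaml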

curl -d @payloads/request-body.json -H "user-group: canary" -X POST http://$GATEWAY_URL/v1/models/image_classifier:predict
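
The content-based rule can be read as "the first rule whose header conditions all match wins, otherwise fall through to the default route". The sketch below mirrors that logic in plain Python as a simplified model of the two rules, not of Istio's full matching semantics. The `RULES` structure and the `route` function are hypothetical names.

```python
from typing import Dict, List, Optional, Tuple

# (required headers, destination subset), evaluated top to bottom.
RULES: List[Tuple[Dict[str, str], str]] = [
    ({"user-group": "canary"}, "resnet101"),  # exact header match
    ({}, "resnet50"),                         # catch-all default route
]


def route(headers: Dict[str, str]) -> Optional[str]:
    """Return the subset of the first rule whose required headers all match."""
    # HTTP header names are case-insensitive; values are compared exactly.
    normalized = {name.lower(): value for name, value in headers.items()}
    for required, subset in RULES:
        if all(normalized.get(name) == value for name, value in required.items()):
            return subset
    return None


print(route({"user-group": "canary"}))
print(route({"Content-Type": "application/json"}))
```

Rule order matters here: if the catch-all entry came first, the canary rule would never be reached.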


gcloud container clusters delete canary-cluster

gsutil rm -r gs://vf-core-1-bucket

Removing gs://vf-core-1-bucket/resnet_101/1/saved_model.pb#1634378699740657...
Removing gs://vf-core-1-bucket/resnet_101/1/variables/
Removing gs://vf-core-1-bucket/resnet_101/1/variables/variables.index#1634378700932108...
Removing gs://vf-core-1-bucket/resnet_50/1/saved_model.pb#1634378703445890...
/ [4 objects]
==> NOTE: You are performing a sequence of gsutil operations that may
run significantly faster if you instead use gsutil -m rm ... Please
see the -m section under "gsutil help options" for further information
about when gsutil -m can be advantageous.

Removing gs://vf-core-1-bucket/resnet_50/1/variables/
Removing gs://vf-core-1-bucket/resnet_50/1/variables/variables.index#1634378704635678...
/ [6 objects]
Operation completed over 6 objects.
Removing gs://vf-core-1-bucket/...


MLOps is a critical area in machine learning and AI development. The right tools for the right job at the right time will save a significant amount of energy, reduce anxiety, and eventually lead to sound sleep at night. For example, a Kubernetes cluster and TF Serving are essentials, not luxuries, when we think about drift and continuous integration. The Istio service mesh makes our job easier by avoiding one more codebase for an application layer for serving (Flask and FastAPI are the popular candidates). Further, the canary deployment strategy enables quick upgrades and easy rollbacks when things go wrong.

This post was long pending. MLOps is a huge topic, and I guess I will be writing more on it, mainly focusing on automation strategies. I hope you all benefit from this post.



Dump of most of the commands I executed in the cloud shell today.

gcloud auth list
gcloud config set account `ACCOUNT`
gcloud config set account *****
gcloud config list project
pwd
cd
pwd
kpt pkg get tfserving-canary
ls
cd tfserving-canary/
gcloud config set compute/zone us-central1-f
PROJECT_ID=$(gcloud config get-value project)
CLUSTER_NAME=canary-cluster
gcloud beta container clusters create $CLUSTER_NAME --project=$PROJECT_ID --addons=Istio --istio-config=auth=MTLS_PERMISSIVE --cluster-version=latest --machine-type=n1-standard-4 --num-nodes=3
gcloud enable
gcloud services enable
gcloud beta container clusters create $CLUSTER_NAME --project=$PROJECT_ID --addons=Istio --istio-config=auth=MTLS_PERMISSIVE --cluster-version=latest --machine-type=n1-standard-4 --num-nodes=3
gcloud compute project-info describe --project $PROJECT_ID
gcloud compute regions describe region-name
gcloud compute regions describe us-central1-f
gcloud compute regions describe us-central1
gcloud config set compute/zone us-central1
gcloud config set compute/zone us-central1-f
gcloud beta container clusters create $CLUSTER_NAME --project=$PROJECT_ID --addons=Istio --istio-config=auth=MTLS_PERMISSIVE --cluster-version=latest --machine-type=n1-standard-4 --num-nodes=3
gcloud beta container clusters create $CLUSTER_NAME --project=$PROJECT_ID --addons=Istio --istio-config=auth=MTLS_PERMISSIVE --cluster-version=latest --machine-type=n1-standard-4 --num-nodes=2
gcloud container clusters get-credentials $CLUSTER_NAME
kubectl get service -n istio-system
kubectl get pods -n istio-system
kubectl label namespace default istio-injection=enabled
export MODEL_BUCKET=${PROJECT_ID}-bucket
gsutil mb gs://${MODEL_BUCKET}
gsutil cp -r gs://workshop-datasets/models/resnet_101 gs://${MODEL_BUCKET}
gsutil cp -r gs://workshop-datasets/models/resnet_50 gs://${MODEL_BUCKET}
echo $MODEL_BUCKET
kubectl apply -f tf-serving/configmap-resnet50.yaml
kubectl apply -f tf-serving/deployment-resnet50.yaml
kubectl get deployments
kubectl apply -f tf-serving/service.yaml
kubectl apply -f tf-serving/gateway.yaml
kubectl apply -f tf-serving/virtualservice.yaml
export INGRESS_HOST=$(kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
export INGRESS_PORT=$(kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.spec.ports[?(@.name=="http2")].port}')
echo $GATEWAY_URL
curl -d @payloads/request-body.json -X POST http://$GATEWAY_URL/v1/models/image_classifier:predict
ls
echo $GATEWAY_URL
kubectl apply -f tf-serving/virtualservice-weight-100.yaml
kubectl apply -f tf-serving/configmap-resnet101.yaml
kubectl apply -f tf-serving/deployment-resnet101.yaml
curl -d @payloads/request-body.json -X POST http://$GATEWAY_URL/v1/models/image_classifier:predict
kubectl apply -f tf-serving/virtualservice-weight-70.yaml
curl -d @payloads/request-body.json -X POST http://$GATEWAY_URL/v1/models/image_classifier:predict
kubectl apply -f tf-serving/destinationrule.yaml
curl -d @payloads/request-body.json -X POST http://$GATEWAY_URL/v1/models/image_classifier:predict
kubectl apply -f tf-serving/virtualservice-weight-100.yaml
kubectl apply -f tf-serving/configmap-resnet101.yaml
kubectl apply -f tf-serving/deployment-resnet101.yaml
curl -d @payloads/request-body.json -X POST http://$GATEWAY_URL/v1/models/image_classifier:predict
kubectl apply -f tf-serving/virtualservice-weight-70.yaml
curl -d @payloads/request-body.json -X POST http://$GATEWAY_URL/v1/models/image_classifier:predict
kubectl get deployments
kubectl apply -f tf-serving/virtualservice-focused-routing.yaml
curl -d @payloads/request-body.json -X POST http://$GATEWAY_URL/v1/models/image_classifier:predict
curl -d @payloads/request-body.json -H "user-group: canary" -X POST http://$GATEWAY_URL/v1/models/image_classifier:predict
kubectl get deployments
kubectl apply -f tf-serving/deployment-resnet101.yaml
history