KFServing
Animesh Singh, Tommy Li

[Kubeflow overview diagram: Jupyter Notebooks; workflow building (Kale, Fairing); pipelines and tools (TFX, KF Pipelines, HP Tuning, Tensorboard, Metadata); serving (KFServing, Seldon Core, TFServing); training operators (TensorFlow, PyTorch, XGBoost, MXNet, MPI); Prometheus]
Production Model Serving? How hard could it be?

❑  Rollouts: Is this rollout safe? How do I roll back? Can I test a change without swapping traffic?
❑  Protocol Standards: How do I make a prediction? gRPC? HTTP? Kafka?
❑  Cost: Is the model over- or under-scaled? Are resources being used efficiently?
❑  Monitoring: Are the endpoints healthy? What is the performance profile and request trace?
❑  Frameworks: How do I serve on TensorFlow? XGBoost? Scikit-learn? PyTorch? Custom code?
❑  Features: How do I explain the predictions? What about detecting outliers and skew? Bias detection? Adversarial detection?
❑  How do I wire up custom pre- and post-processing?
❑  How do I handle batch predictions?
❑  How do I leverage a standardized data plane protocol so that I can move my model across ML serving platforms?

[ML lifecycle diagram: Prepared and Analyzed Data → Untrained Model → Prepared Data → Trained Model → Deployed Model]
●  Seldon Core was pioneering graph inferencing.
●  IBM and Bloomberg were exploring serverless ML lambdas. IBM gave a talk on ML serving with Knative at the last KubeCon in Seattle.
●  Google had built a common TensorFlow HTTP API for models.
●  Microsoft was Kubernetizing its Azure ML stack.
Experts fragmented across industry
●  Kubeflow	created	the	conditions	for	collaboration.	
●  A	promise	of	open	code	and	open	community.	
●  Shared	responsibilities	and	expertise	across	multiple	companies.	
●  Diverse requirements from different customer segments.
Putting the pieces together
Introducing KFServing
KFServing
●  Founded by Google, Seldon, IBM, Bloomberg and Microsoft	
●  Part of the Kubeflow project
●  Focus on the 80% use case: single model rollout and update
●  KFServing 1.0 goals:
○  Serverless ML Inference
○  Canary rollouts
○  Model Explanations
○  Optional Pre/Post processing
KFServing Stack
Knative provides a set of building blocks that enable declarative, container-based, serverless workloads on Kubernetes. Knative Serving provides primitives for serving platforms such as:

•  Event-triggered functions on Kubernetes
•  Scale to and from zero
•  Queue-based autoscaling for GPUs and TPUs. Knative autoscaling by default scales on in-flight requests per pod.
•  Traditional CPU autoscaling if desired. Traditional scaling is hard for disparate devices (GPU, CPU, TPU).

Knative
IBM is the 2nd largest contributor
Connect: Traffic Control, Discovery,
Load Balancing, Resiliency
Observe: Metrics, Logging, Tracing
Secure: Encryption (TLS),
Authentication, and Authorization of
service-to-service communication
Control: Policy Enforcement
An open service mesh platform to connect, observe, secure, and control microservices. 	
Founded by Google, IBM and Lyft. IBM is the 2nd largest contributor	
Istio
Manages the hosting aspects of your models

•  InferenceService - manages the lifecycle of models
•  Configuration - manages the history of model deployments. Two configurations, for default and canary.
•  Revision - a snapshot of your model version
•  Route - endpoint and network traffic management

[Diagram: a KFService's Route sends 90% of traffic to the default Configuration (Revision 1 … Revision M) and 10% to the canary Configuration (Revision 1 … Revision N)]

KFServing: Default and Canary Configurations
Model Servers:
- TensorFlow
- Nvidia TRTIS
- PyTorch
- XGBoost
- SKLearn
- ONNX

Components:
- Predictor, Explainer, Transformer (pre-processor, post-processor)

Storage:
- AWS/S3
- GCS
- Azure Blob
- PVC

Supported Frameworks, Components and Storage Subsystems
The InferenceService architecture consists of a static graph of components that coordinate requests for a single model. Advanced features such as ensembling, A/B testing, and multi-armed bandits should compose InferenceServices together.
Inference Service Control Plane
KFServing Deployment View
-  Today's popular model servers, such as TFServing, ONNX, Seldon, and TRTIS, all communicate using similar but non-interoperable HTTP/gRPC protocols.
-  The KFServing v1 data plane protocol uses a TFServing-compatible HTTP API and introduces an explain verb to standardize between model servers, punting to v2 for gRPC and performance optimizations.
KFServing Data Plane Unification
API          Verb   Path                               Payload
List Models  GET    /v1/models                         [model_names]
Readiness    GET    /v1/models/<model_name>
Predict      POST   /v1/models/<model_name>:predict    Request: {instances:[]}, Response: {predictions:[]}
Explain      POST   /v1/models/<model_name>:explain    Request: {instances:[]}, Response: {predictions:[], explanations:[]}
KFServing Data Plane v1 protocol
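To make the table concrete, here is a minimal Python client for the v1 protocol. It is a sketch: the ingress host is a placeholder, and the 4-feature instances assume the sklearn-iris sample model shown later in this deck.

import requests

# Placeholder ingress host for an InferenceService named "sklearn-iris"
HOST = "http://sklearn-iris.default.example.com"

# Predict: POST /v1/models/<model_name>:predict with {"instances": [...]}
resp = requests.post(f"{HOST}/v1/models/sklearn-iris:predict",
                     json={"instances": [[6.8, 2.8, 4.8, 1.4]]})
print(resp.json())   # {"predictions": [...]}

# Explain: POST /v1/models/<model_name>:explain (requires an explainer)
resp = requests.post(f"{HOST}/v1/models/sklearn-iris:explain",
                     json={"instances": [[6.8, 2.8, 4.8, 1.4]]})
print(resp.json())   # {"predictions": [...], "explanations": [...]}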
apiVersion: "serving.kubeflow.org/v1alpha1"
kind: "InferenceService"
metadata:
  name: "sklearn-iris"
spec:
  default:
    sklearn:
      modelUri: "gs://kfserving-samples/models/sklearn/iris"

apiVersion: "serving.kubeflow.org/v1alpha1"
kind: "InferenceService"
metadata:
  name: "flowers-sample"
spec:
  default:
    tensorflow:
      modelUri: "gs://kfserving-samples/models/tensorflow/flowers"

apiVersion: "serving.kubeflow.org/v1alpha1"
kind: "InferenceService"
metadata:
  name: "pytorch-iris"
spec:
  default:
    pytorch:
      modelUri: "gs://kfserving-samples/models/pytorch/iris"

KFServing Examples
apiVersion: "serving.kubeflow.org/v1alpha1"
kind: "KFService"
metadata:
  name: "my-model"
spec:
  default:
    # 90% of traffic is sent to this model
    tensorflow:
      modelUri: "gs://mybucket/mymodel-2"
  canaryTrafficPercent: 10
  canary:
    # 10% of traffic is sent to this model
    tensorflow:
      modelUri: "gs://mybucket/mymodel-3"

Canary

apiVersion: "serving.kubeflow.org/v1alpha1"
kind: "KFService"
metadata:
  name: "my-model"
spec:
  default:
    tensorflow:
      modelUri: "gs://mybucket/mymodel-2"
  # Defaults to zero, so can also be omitted or explicitly set to zero.
  canaryTrafficPercent: 0
  canary:
    # Canary is created but no traffic is directly forwarded.
    tensorflow:
      modelUri: "gs://mybucket/mymodel-3"

Pinned

Canary/Pinned Examples
●  Flexible, high-performance serving system for TensorFlow
●  https://www.tensorflow.org/tfx/guide/serving
●  Stable; Google has been using it since 2016
●  SavedModel format and GraphDef
●  Written in C++; supports both REST and gRPC
●  KFServing lets you easily spin up an InferenceService with TFServing to serve your TensorFlow model on CPU or GPU, with serverless features like canary rollouts and autoscaling.

TensorFlow Serving (TFServing)
Inference Service with Transformer and TFServing
apiVersion: serving.kubeflow.org/v1alpha2
kind: InferenceService
metadata:
  name: bert-serving
spec:
  default:
    transformer:
      custom:
        container:
          image: bert-transformer:v1
    predictor:
      tensorflow:
        storageUri: s3://examples/bert
        runtimeVersion: 1.14.0-gpu
        resources:
          limits:
            nvidia.com/gpu: 1

[Pre/Post Processing → TensorFlow Model Server]
from typing import Dict

import kfserving
from transformers import BertTokenizer  # assumption: HuggingFace tokenizer

vocab_file = "vocab.txt"  # assumption: vocab shipped in the transformer image

class BertTransformer(kfserving.KFModel):
    def __init__(self, name: str):
        super().__init__(name)
        self.bert_tokenizer = BertTokenizer(vocab_file)

    def preprocess(self, inputs: Dict) -> Dict:
        # Tokenize the sentence pair (text_a, text_b) into BERT features
        encoded_features = self.bert_tokenizer.encode_plus(
            text=inputs["text_a"], text_pair=inputs["text_b"])
        return {"input_ids": encoded_features["input_ids"],
                "input_mask": encoded_features["attention_mask"],
                # HuggingFace calls the segment ids "token_type_ids"
                "segment_ids": encoded_features["token_type_ids"],
                "label_ids": 1}

    def postprocess(self, inputs: Dict) -> Dict:
        return inputs
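To run this transformer as its own service, the kfserving Python package bundles a Tornado-based model server. The entrypoint below is a minimal sketch, assuming the kfserving 0.x SDK; in a real deployment KFServing injects the predictor host via command-line arguments.

if __name__ == "__main__":
    # Host the transformer; KFServing routes /v1/models/bert-serving:predict
    # through preprocess -> remote predictor -> postprocess.
    transformer = BertTransformer("bert-serving")
    kfserving.KFServer(workers=1).start([transformer])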
●  NVIDIA's highly optimized model runtime for GPUs
●  https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-guide/docs
●  Supports model repositories and versioning
●  Dynamic batching
●  Concurrent model execution
●  Supports TensorFlow, PyTorch, and ONNX models
●  Written in C++; supports both REST and gRPC
●  The TensorRT optimizer can further reduce BERT inference latency
NVIDIA Triton Inference Server
Inference Service with Triton Inference Server
apiVersion: serving.kubeflow.org/v1alpha2
kind: InferenceService
metadata:
  name: bert-serving
spec:
  default:
    transformer:
      custom:
        container:
          image: bert-transformer:v1
          env:
            - name: STORAGE_URI
              value: s3://examples/bert_transformer
    predictor:
      tensorrt:
        storageUri: s3://examples/bert
        runtimeVersion: r20.02
        resources:
          limits:
            nvidia.com/gpu: 1

[Pre/Post Processing → Triton Inference Server]
# The transformer's predict call, using the legacy Triton (TensorRT
# Inference Server) client API
import numpy as np
from tensorrtserver.api import InferContext

infer_ctx = InferContext(url, protocol, model_name, model_version)

unique_ids = np.int32([1])
segment_ids = features["segment_ids"]
input_ids = features["input_ids"]
input_mask = features["input_mask"]

result = infer_ctx.run(
    {'unique_ids': (unique_ids,),
     'segment_ids': (segment_ids,),
     'input_ids': (input_ids,),
     'input_mask': (input_mask,)},
    {'end_logits': InferContext.ResultFormat.RAW,
     'start_logits': InferContext.ResultFormat.RAW},
    batch_size)
PyTorch Model Server
●  PyTorch model server maintained by KFServing
●  https://github.com/kubeflow/kfserving/tree/master/python/pytorchserver
●  Implemented in Python with a Tornado server
●  Loads the model state dict and the model class Python file
●  GPU inference is supported in the KFServing 0.3 release

●  Alternatively, you can export the PyTorch model in ONNX format and serve it on the TensorRT Inference Server or the ONNX Runtime Server.
ONNX Runtime Server
●  ONNX Runtime is a performance-focused inference engine for ONNX models
●  https://github.com/microsoft/onnxruntime
●  Supports TensorFlow and PyTorch models, which can be converted to ONNX
●  Written in C++; supports both REST and gRPC
●  ONNX Runtime has optimized the BERT transformer network to further reduce latency:
https://github.com/onnx/tutorials/blob/master/tutorials/Inference-TensorFlow-Bert-Model-for-High-Performance-in-ONNX-Runtime.ipynb
Inference Service with PyTorch/ONNX Runtime
apiVersion: serving.kubeflow.org/v1alpha2
kind: InferenceService
metadata:
  name: bert-serving
spec:
  default:
    transformer:
      custom:
        container:
          image: bert-transformer:v1
          env:
            - name: STORAGE_URI
              value: s3://examples/bert_transformer
    predictor:
      pytorch:
        storageUri: s3://examples/bert
        runtimeVersion: v0.3.0-gpu
        resources:
          limits:
            nvidia.com/gpu: 1

[Pre/Post Processing → PyTorch Model Server]

apiVersion: serving.kubeflow.org/v1alpha2
kind: InferenceService
metadata:
  name: bert-serving-onnx
spec:
  default:
    transformer:
      custom:
        container:
          image: bert-transformer:v1
          env:
            - name: STORAGE_URI
              value: s3://examples/bert_transformer
    predictor:
      onnx:
        storageUri: s3://examples/bert
        runtimeVersion: 0.5.1

[Pre/Post Processing → ONNX Runtime Server]
GPU Autoscaling - Knative solution

●  Scale based on # of in-flight requests against expected concurrency
●  Simple solution for heterogeneous ML inference autoscaling

[Diagram: API requests enter through the Ingress; when scale == 0 or when handling burst capacity, the Activator buffers requests; when scale > 0, requests flow directly to 0...N replicas, each with a Queue Proxy in front of the model server; the Autoscaler scales replicas based on metrics]
KFServing: Default, Canary and Autoscaler
But the Data Scientist Sees...
●  A pointer to a serialized model file
●  9 lines of YAML
●  A live model at an HTTP endpoint

●  Scale to Zero
●  GPU Autoscaling
●  Safe Rollouts
●  Optimized Serving Containers
●  Network Policy and Auth
●  HTTP APIs (gRPC soon)
●  Tracing
●  Metrics

apiVersion: "serving.kubeflow.org/v1alpha2"
kind: "InferenceService"
metadata:
  name: "flowers-sample"
spec:
  default:
    predictor:
      tensorflow:
        storageUri: "gs://kfserving-samples/models/tensorflow/flowers"

Production users include: Bloomberg
Summary
●  With KFServing, users can easily deploy services for inference on GPUs with performant, industry-leading model servers, while benefiting from all the serverless features.
●  It autoscales the inference workload based on QPS, for much better resource utilization.
●  gRPC can provide better performance than REST: it allows multiplexing, and protobuf is a more efficient, compact format than JSON.
●  Transformers work seamlessly with different model servers thanks to KFServing's data plane standardization.
●  GPUs benefit significantly from batching requests.
Model Serving is accomplished. Can the predictions be trusted?

[ML lifecycle diagram: Prepared and Analyzed Data → Untrained Model → Prepared Data → Trained Model → Deployed Model]

●  Can the model explain its predictions?
●  Are there concept drifts?
●  Is there an outlier?
●  Is the model vulnerable to adversarial attacks?
Production ML Architecture
[Diagram: the InferenceService's logger publishes request/response payloads to a Broker; Triggers route the events to Outlier Detection, Model Explainer, Adversarial Detection, and Concept Drift components, which feed an Alerting API]
Payload Logging
Why:
●  Capture payloads for analysis and future retraining of the model
●  Perform offline processing of the requests and responses
KFServing implementation (alpha):
●  Add to any InferenceService endpoint: Predictor, Explainer, Transformer
●  Log requests, responses, or both from the endpoint
●  Simply specify a URL to send the payloads to
●  The URL will receive CloudEvents
POST	/event	HTTP/1.0	
Host:	example.com	
Content-Type:	application/json	
ce-specversion:	1.0	
ce-type:	repo.newItem	
ce-source:	http://bigco.com/repo	
ce-id:	610b6dd4-c85d-417b-b58f-3771e532	
		
<payload>
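The logger URL can be any HTTP endpoint that accepts such CloudEvents. A minimal stdlib receiver, sketched here for illustration (the port and print-based handling are assumptions), would be:

from http.server import BaseHTTPRequestHandler, HTTPServer

class PayloadDumper(BaseHTTPRequestHandler):
    """Accepts CloudEvents POSTs from the KFServing logger and prints them."""

    def do_POST(self):
        # CloudEvents attributes arrive as ce-* headers; the body is the payload
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        print(self.headers.get("ce-type"), self.headers.get("ce-id"), body)
        self.send_response(202)
        self.end_headers()

if __name__ == "__main__":
    HTTPServer(("", 8080), PayloadDumper).serve_forever()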
Payload Logging
apiVersion: "serving.kubeflow.org/v1alpha2"
kind: "InferenceService"
metadata:
  name: "sklearn-iris"
spec:
  default:
    predictor:
      minReplicas: 1
      logger:
        url: http://message-dumper.default/
        mode: all
      sklearn:
        storageUri: "gs://kfserving-samples/models/sklearn/iris"
        resources:
          requests:
            cpu: 0.1
Payload Logging Architecture Examples
[Diagram: a model's InferenceService logger publishes payloads to a Broker; a Trigger routes them to an Outlier Detector, which feeds an Alerting API; alternatively, an HTTP-Kafka bridge forwards payloads into a Kafka cluster]
Machine Learning Explanations
KFServing Explanations
apiVersion: "serving.kubeflow.org/v1alpha2"
kind: "InferenceService"
metadata:
  name: "income"
spec:
  default:
    predictor:
      sklearn:
        storageUri: "gs://seldon-models/sklearn/income/model"
    explainer:
      alibi:
        type: AnchorTabular
        storageUri: "gs://seldon-models/sklearn/income/explainer"

apiVersion: "serving.kubeflow.org/v1alpha2"
kind: "InferenceService"
metadata:
  name: "moviesentiment"
spec:
  default:
    predictor:
      sklearn:
        storageUri: "gs://seldon-models/sklearn/moviesentiment"
    explainer:
      alibi:
        type: AnchorText
https://github.com/SeldonIO/alibi

State-of-the-art implementations:
•  Anchors
•  Counterfactuals
•  Contrastive explanations
•  Trust scores

Seldon Alibi:Explain
Giovanni Vacanti, Janis Klaise, Alexandru Coca, Arnaud Van Looveren
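As a rough sketch of how Anchors are used (exact signatures vary across alibi versions; the iris dataset and random forest here are placeholders, not what the income/moviesentiment services above use):

import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from alibi.explainers import AnchorTabular

data = load_iris()
clf = RandomForestClassifier(n_estimators=50).fit(data.data, data.target)

# An anchor is an if-then rule that "anchors" the prediction: instances
# matching the rule get the same prediction with high precision.
explainer = AnchorTabular(clf.predict, data.feature_names)
explainer.fit(data.data)
explanation = explainer.explain(data.data[0], threshold=0.95)
print(explanation)  # includes the anchor rule and its precision/coverage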
The AIX360 toolkit is an open-source library to help explain AI and machine learning models and their predictions. It includes three classes of algorithms: local post-hoc, global post-hoc, and directly interpretable explainers for models that use image, text, and structured/tabular data.

The AI Explainability 360 Python package includes a comprehensive set of explainers, at both the global and local level.

Toolbox: local post-hoc, global post-hoc, directly interpretable

http://aix360.mybluemix.net
AI Explainability 360 (AIX360)
https://github.com/IBM/AIX360

Explanations: Resources
AIX360
AIX360 Explainability in KFServing
ML Inference Analysis

Don't trust predictions on instances outside of the training distribution!
•  Outlier Detection
•  Adversarial Detection
•  Concept Drift
Don't trust predictions on instances outside of the training distribution!
→ Outlier Detection
Detector types:
-  stateful online vs. pretrained offline
-  feature vs. instance level detectors
Data types:
-  tabular, images & time series
Outlier types:
-  global, contextual & collective outliers
Outlier Detection
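As a toy illustration of an instance-level detector pretrained offline (an illustration only, not the Alibi:Detect API), the sketch below scores instances by Mahalanobis distance from the training distribution:

import numpy as np

rng = np.random.default_rng(0)
X_train = rng.normal(size=(1000, 4))  # stand-in for training data

mu = X_train.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(X_train, rowvar=False))

def mahalanobis(x):
    d = x - mu
    return float(np.sqrt(d @ cov_inv @ d))

# Fit the threshold offline, e.g. the 99th percentile of training distances
threshold = np.percentile([mahalanobis(x) for x in X_train], 99)

def is_outlier(x):
    return mahalanobis(x) > threshold

print(is_outlier(np.zeros(4)))       # False: near the training mean
print(is_outlier(10 * np.ones(4)))   # True: far outside the distribution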
Don't trust predictions on instances outside of the training distribution!
→ Adversarial Detection
-  Outliers w.r.t. the model prediction
-  Detect small input changes with a big impact on predictions!
Adversarial Detection
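For intuition, here is a toy FGSM-style sketch on a linear model (an illustration, not the ART or Alibi:Detect APIs), showing how a small, bounded input change flips the prediction; an adversarial detector looks for exactly such inputs:

import numpy as np

w = np.array([1.0, -2.0, 0.5])   # toy linear model: predict class 1 if w @ x > 0
x = np.array([0.2, 0.05, 0.1])   # w @ x = 0.15 -> class 1

# FGSM-style step: move each feature by eps against the score's gradient (= w)
eps = 0.1
x_adv = x - eps * np.sign(w)

print(w @ x)      # 0.15  -> class 1
print(w @ x_adv)  # -0.20 -> class 0, though no feature moved by more than 0.1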
Production data distribution != training distribution?
→ Concept Drift! Retrain!
Need to track the right distributions:
-  feature vs. instance level
-  continuous vs. discrete
-  online vs. offline training data
-  track the streaming number of outliers
Concept Drift
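A minimal feature-level, offline drift check is a two-sample Kolmogorov-Smirnov test between a training window and a production window; the sketch below uses synthetic data for illustration:

import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(1)
train_feature = rng.normal(0.0, 1.0, size=5000)  # training-time distribution
prod_feature = rng.normal(0.3, 1.0, size=1000)   # shifted production window

stat, p_value = ks_2samp(train_feature, prod_feature)
if p_value < 0.01:
    print(f"drift detected (KS={stat:.3f}, p={p_value:.2e}); consider retraining")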
Outlier Detection on CIFAR10
[Diagram: a CIFAR10 model's InferenceService logger publishes payloads to a Broker; a Trigger routes them to a CIFAR10 Outlier Detector, whose results are sent to a Message Dumper API]
Adversarial Detection Demos
KFServing MNIST Model with Alibi:Detect VAE Adversarial Detector
https://github.com/SeldonIO/alibi-detect/tree/master/integrations/samples/kfserving/ad-mnist

KFServing Traffic Signs Model with Alibi:Detect VAE Adversarial Detector
Adversarial Robustness 360 (ART)

ART is a library dedicated to adversarial machine learning. Its purpose is to allow rapid crafting and analysis of attack, defense, and detection methods for machine learning models. Applicable domains include finance, self-driving vehicles, etc.

The Adversarial Robustness Toolbox provides implementations of many state-of-the-art methods for attacking and defending classifiers.

https://github.com/IBM/adversarial-robustness-toolbox

Toolbox: attacks, defenses, and metrics
•  Evasion attacks
•  Defense methods
•  Detection methods
•  Robustness metrics

https://art-demo.mybluemix.net/
KFServing – Demo One https://www.youtube.com/watch?v=xg5ar6vSAXY&t
[Architecture diagram: store sites (Store 1…n) send telemetry over MQTT and HTTP into a data collection layer (NiFi, Debezium, Postgres, Minio, CDC); a Kafka event backbone feeds Spark data processing enriched with weather data; Kubeflow/KFServing predicts near failure to schedule maintenance; microservices, an application layer, model history in S3, and a data warehouse complete the Connect & Collect → Organize & Integrate → Analyze & Infuse flow; temperature checks raise alerts]

KFServing Pipeline – Demo
Demo Scenario: Intelligent Refrigeration Unit Monitoring
Problem Statement
•  The failure of a refrigeration unit is a common problem, with failures leading to spoiled goods which become a cost to the business. Performing manual checks on the units is time-consuming and costly, and typically is just a visual check to see that the unit is running and the temperature is in range.
•  As a general case, being able to automatically monitor the operation of the units and predict the likelihood of a unit failing, in advance of an actual failure, can save costs by reducing manual workload and, more significantly, by reducing the amount of spoiled goods, since preventative actions can be taken before the unit fails.
•  Such an intelligent monitoring solution has value across different industries, from supermarkets to healthcare, but perhaps one of the most significant opportunities comes with container shipping: monitoring refrigerated containers through their life cycle of being loaded at a supplier, transported to the dock, loaded onto a container ship, through the sea voyage, then unloaded and transported to the importer.
•  Being able to predict a likely failure before loading a container onto a ship enables preventive maintenance to be done before loading. This reduces the risk of goods being spoiled while at sea, where manual actions are limited. Such "intelligently" monitored containers incur lower insurance premiums for the voyage, as the risk of loss of goods is reduced.

Proposed Intelligent Refrigeration Monitoring System
•  The IT team is proposing the development of an intelligent application for monitoring and management of the refrigeration units in the containers. To accelerate development, the team proposes a cloud-native approach, building on a Kubernetes (OpenShift) based private cloud. The development team will follow the cloud-native architecture and application patterns for intelligent applications, which include:
1.  Device data collection
2.  Real-time intelligence with machine learning
3.  Event-driven applications with event-driven microservices
KFServing Pipeline – Demo Two
MNIST model end to end using Kubeflow Pipelines with Tekton. The notebook demonstrates how to compile and execute an end-to-end machine learning workflow that uses Katib, TFJob, KFServing, and a Tekton pipeline. The pipeline contains five steps: it finds the best hyperparameters using Katib, creates a PVC for storing models, processes the hyperparameter results, trains the model on TFJob with the best hyperparameters using more iterations, and finally serves the model using KFServing.

https://github.com/kubeflow/kfp-tekton/tree/master/samples/e2e-mnist
KFServing – Existing Features
❑  Crowd-sourced capabilities – contributions by AWS, Bloomberg, Google, Seldon, IBM, NVIDIA, and others
❑  Support for multiple pre-integrated runtimes (TFServing, NVIDIA Triton (GPU optimization), ONNX Runtime, SKLearn, PyTorch, XGBoost, custom models)
❑  Serverless ML inference and autoscaling: scale to zero (with no incoming traffic) and request-queue based autoscaling
❑  Canary and pinned rollouts: control traffic percentage and direction
❑  Pluggable pre-processor/post-processor via Transformer: plug in pre-processing/post-processing implementations and control routing and placement (e.g. pre-processor on CPU, predictor on GPU)
❑  Pluggable analysis algorithms: explainability, drift detection, anomaly detection, adversarial detection (contributed by Seldon), enabled by payload logging (built on the CloudEvents standardized eventing protocol)
❑  Batch predictions: batch prediction support for ML frameworks (TensorFlow, PyTorch, ...)
❑  Integration with the existing monitoring stack around the Knative/Istio ecosystem: Kiali (service placements, traffic, and graphs), Jaeger (request tracing), Grafana/Prometheus plug-ins for Knative
❑  Multiple clients: kubectl, Python SDK, Kubeflow Pipelines SDK

❑  Standardized Data Plane V2 protocol for prediction, explainability, etc.: already implemented by NVIDIA Triton
❑  MMS: multi-model serving for serving multiple models per KFService instance
❑  More Data Plane V2 API compliant servers: SKLearn, XGBoost, PyTorch, ...
❑  Multi-model graphs and pipelines: support chaining multiple models together in a pipeline
❑  PyTorch support via AWS TorchServe
❑  gRPC support for all model servers
❑  Support for multi-armed bandits
❑  Integration with IBM AIX360 for explainability, AIF360 for bias detection, and ART for adversarial detection
KFServing – Upcoming Features
Open Source Projects
●  ML Inference
○  KFServing
○  Seldon Core
https://github.com/kubeflow/kfserving
https://github.com/SeldonIO/seldon-core
●  Model Explanations
○  Seldon Alibi
○  IBM AI Explainability 360	
https://github.com/seldonio/alibi
https://github.com/IBM/AIX360
●  Outlier and Adversarial Detection and Concept Drift
○  Seldon Alibi-detect
https://github.com/seldonio/alibi-detect
●  Adversarial Attack, Detection and Defense
○  IBM Adversarial Robustness 360	
https://github.com/IBM/adversarial-robustness-toolbox
More Related Content

What's hot

MLOps - Build pipelines with Tensor Flow Extended & Kubeflow
MLOps - Build pipelines with Tensor Flow Extended & KubeflowMLOps - Build pipelines with Tensor Flow Extended & Kubeflow
MLOps - Build pipelines with Tensor Flow Extended & KubeflowJan Kirenz
 
Kubecon 2023 EU - KServe - The State and Future of Cloud-Native Model Serving
Kubecon 2023 EU - KServe - The State and Future of Cloud-Native Model ServingKubecon 2023 EU - KServe - The State and Future of Cloud-Native Model Serving
Kubecon 2023 EU - KServe - The State and Future of Cloud-Native Model ServingTheofilos Papapanagiotou
 
Grafana Mimir and VictoriaMetrics_ Performance Tests.pptx
Grafana Mimir and VictoriaMetrics_ Performance Tests.pptxGrafana Mimir and VictoriaMetrics_ Performance Tests.pptx
Grafana Mimir and VictoriaMetrics_ Performance Tests.pptxRomanKhavronenko
 
MLflow: Infrastructure for a Complete Machine Learning Life Cycle with Mani ...
 MLflow: Infrastructure for a Complete Machine Learning Life Cycle with Mani ... MLflow: Infrastructure for a Complete Machine Learning Life Cycle with Mani ...
MLflow: Infrastructure for a Complete Machine Learning Life Cycle with Mani ...Databricks
 
Machine Learning using Kubeflow and Kubernetes
Machine Learning using Kubeflow and KubernetesMachine Learning using Kubeflow and Kubernetes
Machine Learning using Kubeflow and KubernetesArun Gupta
 
Prometheus (Prometheus London, 2016)
Prometheus (Prometheus London, 2016)Prometheus (Prometheus London, 2016)
Prometheus (Prometheus London, 2016)Brian Brazil
 
MLOps and Reproducible ML on AWS with Kubeflow and SageMaker
MLOps and Reproducible ML on AWS with Kubeflow and SageMakerMLOps and Reproducible ML on AWS with Kubeflow and SageMaker
MLOps and Reproducible ML on AWS with Kubeflow and SageMakerProvectus
 
ONNX and MLflow
ONNX and MLflowONNX and MLflow
ONNX and MLflowamesar0
 
Introducing Kubeflow (w. Special Guests Tensorflow and Apache Spark)
Introducing Kubeflow (w. Special Guests Tensorflow and Apache Spark)Introducing Kubeflow (w. Special Guests Tensorflow and Apache Spark)
Introducing Kubeflow (w. Special Guests Tensorflow and Apache Spark)DataWorks Summit
 
OpenTelemetry For Developers
OpenTelemetry For DevelopersOpenTelemetry For Developers
OpenTelemetry For DevelopersKevin Brockhoff
 
Free GitOps Workshop + Intro to Kubernetes & GitOps
Free GitOps Workshop + Intro to Kubernetes & GitOpsFree GitOps Workshop + Intro to Kubernetes & GitOps
Free GitOps Workshop + Intro to Kubernetes & GitOpsWeaveworks
 
Kubeflow Pipelines (with Tekton)
Kubeflow Pipelines (with Tekton)Kubeflow Pipelines (with Tekton)
Kubeflow Pipelines (with Tekton)Animesh Singh
 
[Pgday.Seoul 2018] Greenplum의 노드 분산 설계
[Pgday.Seoul 2018]  Greenplum의 노드 분산 설계[Pgday.Seoul 2018]  Greenplum의 노드 분산 설계
[Pgday.Seoul 2018] Greenplum의 노드 분산 설계PgDay.Seoul
 
An intro to Kubernetes operators
An intro to Kubernetes operatorsAn intro to Kubernetes operators
An intro to Kubernetes operatorsJ On The Beach
 
Introduction to kubernetes
Introduction to kubernetesIntroduction to kubernetes
Introduction to kubernetesRishabh Indoria
 
[Giovanni Galloro] How to use machine learning on Google Cloud Platform
[Giovanni Galloro] How to use machine learning on Google Cloud Platform[Giovanni Galloro] How to use machine learning on Google Cloud Platform
[Giovanni Galloro] How to use machine learning on Google Cloud PlatformMeetupDataScienceRoma
 
Seldon: Deploying Models at Scale
Seldon: Deploying Models at ScaleSeldon: Deploying Models at Scale
Seldon: Deploying Models at ScaleSeldon
 

What's hot (20)

MLOps - Build pipelines with Tensor Flow Extended & Kubeflow
MLOps - Build pipelines with Tensor Flow Extended & KubeflowMLOps - Build pipelines with Tensor Flow Extended & Kubeflow
MLOps - Build pipelines with Tensor Flow Extended & Kubeflow
 
Kubecon 2023 EU - KServe - The State and Future of Cloud-Native Model Serving
Kubecon 2023 EU - KServe - The State and Future of Cloud-Native Model ServingKubecon 2023 EU - KServe - The State and Future of Cloud-Native Model Serving
Kubecon 2023 EU - KServe - The State and Future of Cloud-Native Model Serving
 
Grafana Mimir and VictoriaMetrics_ Performance Tests.pptx
Grafana Mimir and VictoriaMetrics_ Performance Tests.pptxGrafana Mimir and VictoriaMetrics_ Performance Tests.pptx
Grafana Mimir and VictoriaMetrics_ Performance Tests.pptx
 
MLOps in action
MLOps in actionMLOps in action
MLOps in action
 
MLflow: Infrastructure for a Complete Machine Learning Life Cycle with Mani ...
 MLflow: Infrastructure for a Complete Machine Learning Life Cycle with Mani ... MLflow: Infrastructure for a Complete Machine Learning Life Cycle with Mani ...
MLflow: Infrastructure for a Complete Machine Learning Life Cycle with Mani ...
 
Machine Learning using Kubeflow and Kubernetes
Machine Learning using Kubeflow and KubernetesMachine Learning using Kubeflow and Kubernetes
Machine Learning using Kubeflow and Kubernetes
 
Prometheus (Prometheus London, 2016)
Prometheus (Prometheus London, 2016)Prometheus (Prometheus London, 2016)
Prometheus (Prometheus London, 2016)
 
MLOps and Reproducible ML on AWS with Kubeflow and SageMaker
MLOps and Reproducible ML on AWS with Kubeflow and SageMakerMLOps and Reproducible ML on AWS with Kubeflow and SageMaker
MLOps and Reproducible ML on AWS with Kubeflow and SageMaker
 
What is MLOps
What is MLOpsWhat is MLOps
What is MLOps
 
ONNX and MLflow
ONNX and MLflowONNX and MLflow
ONNX and MLflow
 
Introducing Kubeflow (w. Special Guests Tensorflow and Apache Spark)
Introducing Kubeflow (w. Special Guests Tensorflow and Apache Spark)Introducing Kubeflow (w. Special Guests Tensorflow and Apache Spark)
Introducing Kubeflow (w. Special Guests Tensorflow and Apache Spark)
 
OpenTelemetry For Developers
OpenTelemetry For DevelopersOpenTelemetry For Developers
OpenTelemetry For Developers
 
Free GitOps Workshop + Intro to Kubernetes & GitOps
Free GitOps Workshop + Intro to Kubernetes & GitOpsFree GitOps Workshop + Intro to Kubernetes & GitOps
Free GitOps Workshop + Intro to Kubernetes & GitOps
 
Kubeflow Pipelines (with Tekton)
Kubeflow Pipelines (with Tekton)Kubeflow Pipelines (with Tekton)
Kubeflow Pipelines (with Tekton)
 
[Pgday.Seoul 2018] Greenplum의 노드 분산 설계
[Pgday.Seoul 2018]  Greenplum의 노드 분산 설계[Pgday.Seoul 2018]  Greenplum의 노드 분산 설계
[Pgday.Seoul 2018] Greenplum의 노드 분산 설계
 
Observability
ObservabilityObservability
Observability
 
An intro to Kubernetes operators
An intro to Kubernetes operatorsAn intro to Kubernetes operators
An intro to Kubernetes operators
 
Introduction to kubernetes
Introduction to kubernetesIntroduction to kubernetes
Introduction to kubernetes
 
[Giovanni Galloro] How to use machine learning on Google Cloud Platform
[Giovanni Galloro] How to use machine learning on Google Cloud Platform[Giovanni Galloro] How to use machine learning on Google Cloud Platform
[Giovanni Galloro] How to use machine learning on Google Cloud Platform
 
Seldon: Deploying Models at Scale
Seldon: Deploying Models at ScaleSeldon: Deploying Models at Scale
Seldon: Deploying Models at Scale
 

Similar to KFServing - Serverless Model Inferencing

Service Mesh - kilometer 30 in a microservice marathon
Service Mesh - kilometer 30 in a microservice marathonService Mesh - kilometer 30 in a microservice marathon
Service Mesh - kilometer 30 in a microservice marathonMichael Hofmann
 
ApacheCon NA - Apache Camel K: connect your Knative serverless applications w...
ApacheCon NA - Apache Camel K: connect your Knative serverless applications w...ApacheCon NA - Apache Camel K: connect your Knative serverless applications w...
ApacheCon NA - Apache Camel K: connect your Knative serverless applications w...Nicola Ferraro
 
Modern Cloud-Native Streaming Platforms: Event Streaming Microservices with A...
Modern Cloud-Native Streaming Platforms: Event Streaming Microservices with A...Modern Cloud-Native Streaming Platforms: Event Streaming Microservices with A...
Modern Cloud-Native Streaming Platforms: Event Streaming Microservices with A...confluent
 
Modern Cloud-Native Streaming Platforms: Event Streaming Microservices with K...
Modern Cloud-Native Streaming Platforms: Event Streaming Microservices with K...Modern Cloud-Native Streaming Platforms: Event Streaming Microservices with K...
Modern Cloud-Native Streaming Platforms: Event Streaming Microservices with K...confluent
 
Kubernetes connectivity to Cloud Native Kafka | Evan Shortiss and Hugo Guerre...
Kubernetes connectivity to Cloud Native Kafka | Evan Shortiss and Hugo Guerre...Kubernetes connectivity to Cloud Native Kafka | Evan Shortiss and Hugo Guerre...
Kubernetes connectivity to Cloud Native Kafka | Evan Shortiss and Hugo Guerre...HostedbyConfluent
 
What is a Service Mesh and what can it do for your Microservices
What is a Service Mesh and what can it do for your MicroservicesWhat is a Service Mesh and what can it do for your Microservices
What is a Service Mesh and what can it do for your MicroservicesMatt Turner
 
Envoy @ Lyft: developer productivity (kubecon 2.0)
Envoy @ Lyft: developer productivity (kubecon 2.0)Envoy @ Lyft: developer productivity (kubecon 2.0)
Envoy @ Lyft: developer productivity (kubecon 2.0)Jose Ulises Nino Rivera
 
Cloudify: Open vCPE Design Concepts and Multi-Cloud Orchestration
Cloudify: Open vCPE Design Concepts and Multi-Cloud OrchestrationCloudify: Open vCPE Design Concepts and Multi-Cloud Orchestration
Cloudify: Open vCPE Design Concepts and Multi-Cloud OrchestrationCloudify Community
 
Operating Kafka on AutoPilot mode @ DBS Bank (Arpit Dubey, DBS Bank) Kafka Su...
Operating Kafka on AutoPilot mode @ DBS Bank (Arpit Dubey, DBS Bank) Kafka Su...Operating Kafka on AutoPilot mode @ DBS Bank (Arpit Dubey, DBS Bank) Kafka Su...
Operating Kafka on AutoPilot mode @ DBS Bank (Arpit Dubey, DBS Bank) Kafka Su...confluent
 
Amazon EKS 그리고 Service Mesh (김세호 솔루션즈 아키텍트, AWS) :: Gaming on AWS 2018
Amazon EKS 그리고 Service Mesh (김세호 솔루션즈 아키텍트, AWS) :: Gaming on AWS 2018Amazon EKS 그리고 Service Mesh (김세호 솔루션즈 아키텍트, AWS) :: Gaming on AWS 2018
Amazon EKS 그리고 Service Mesh (김세호 솔루션즈 아키텍트, AWS) :: Gaming on AWS 2018Amazon Web Services Korea
 
Openshift serverless Solution
Openshift serverless SolutionOpenshift serverless Solution
Openshift serverless SolutionRyan ZhangCheng
 
Apache Big Data Europe 2015: Selected Talks
Apache Big Data Europe 2015: Selected TalksApache Big Data Europe 2015: Selected Talks
Apache Big Data Europe 2015: Selected TalksAndrii Gakhov
 
Managing Microservices With The Istio Service Mesh on Kubernetes
Managing Microservices With The Istio Service Mesh on KubernetesManaging Microservices With The Istio Service Mesh on Kubernetes
Managing Microservices With The Istio Service Mesh on KubernetesIftach Schonbaum
 
Optimizing, Profiling, and Deploying TensorFlow AI Models with GPUs - San Fra...
Optimizing, Profiling, and Deploying TensorFlow AI Models with GPUs - San Fra...Optimizing, Profiling, and Deploying TensorFlow AI Models with GPUs - San Fra...
Optimizing, Profiling, and Deploying TensorFlow AI Models with GPUs - San Fra...Chris Fregly
 
High Performance TensorFlow in Production - Big Data Spain - Madrid - Nov 15 ...
High Performance TensorFlow in Production - Big Data Spain - Madrid - Nov 15 ...High Performance TensorFlow in Production - Big Data Spain - Madrid - Nov 15 ...
High Performance TensorFlow in Production - Big Data Spain - Madrid - Nov 15 ...Chris Fregly
 
Hyper-Parameter Tuning Across the Entire AI Pipeline GPU Tech Conference San ...
Hyper-Parameter Tuning Across the Entire AI Pipeline GPU Tech Conference San ...Hyper-Parameter Tuning Across the Entire AI Pipeline GPU Tech Conference San ...
Hyper-Parameter Tuning Across the Entire AI Pipeline GPU Tech Conference San ...Chris Fregly
 
Open Source Networking Days- Service Mesh
Open Source Networking Days- Service MeshOpen Source Networking Days- Service Mesh
Open Source Networking Days- Service MeshCloudOps2005
 
High Performance Distributed TensorFlow with GPUs and Kubernetes
High Performance Distributed TensorFlow with GPUs and KubernetesHigh Performance Distributed TensorFlow with GPUs and Kubernetes
High Performance Distributed TensorFlow with GPUs and Kubernetesinside-BigData.com
 

Similar to KFServing - Serverless Model Inferencing (20)

Service Mesh - kilometer 30 in a microservice marathon
Service Mesh - kilometer 30 in a microservice marathonService Mesh - kilometer 30 in a microservice marathon
Service Mesh - kilometer 30 in a microservice marathon
 
ApacheCon NA - Apache Camel K: connect your Knative serverless applications w...
ApacheCon NA - Apache Camel K: connect your Knative serverless applications w...ApacheCon NA - Apache Camel K: connect your Knative serverless applications w...
ApacheCon NA - Apache Camel K: connect your Knative serverless applications w...
 
Modern Cloud-Native Streaming Platforms: Event Streaming Microservices with A...
Modern Cloud-Native Streaming Platforms: Event Streaming Microservices with A...Modern Cloud-Native Streaming Platforms: Event Streaming Microservices with A...
Modern Cloud-Native Streaming Platforms: Event Streaming Microservices with A...
 
Serving models using KFServing
Serving models using KFServingServing models using KFServing
Serving models using KFServing
 
Modern Cloud-Native Streaming Platforms: Event Streaming Microservices with K...
Modern Cloud-Native Streaming Platforms: Event Streaming Microservices with K...Modern Cloud-Native Streaming Platforms: Event Streaming Microservices with K...
Modern Cloud-Native Streaming Platforms: Event Streaming Microservices with K...
 
Kubernetes connectivity to Cloud Native Kafka | Evan Shortiss and Hugo Guerre...
Kubernetes connectivity to Cloud Native Kafka | Evan Shortiss and Hugo Guerre...Kubernetes connectivity to Cloud Native Kafka | Evan Shortiss and Hugo Guerre...
Kubernetes connectivity to Cloud Native Kafka | Evan Shortiss and Hugo Guerre...
 
What is a Service Mesh and what can it do for your Microservices
What is a Service Mesh and what can it do for your MicroservicesWhat is a Service Mesh and what can it do for your Microservices
What is a Service Mesh and what can it do for your Microservices
 
Envoy @ Lyft: developer productivity (kubecon 2.0)
Envoy @ Lyft: developer productivity (kubecon 2.0)Envoy @ Lyft: developer productivity (kubecon 2.0)
Envoy @ Lyft: developer productivity (kubecon 2.0)
 
Cloudify: Open vCPE Design Concepts and Multi-Cloud Orchestration
Cloudify: Open vCPE Design Concepts and Multi-Cloud OrchestrationCloudify: Open vCPE Design Concepts and Multi-Cloud Orchestration
Cloudify: Open vCPE Design Concepts and Multi-Cloud Orchestration
 
Operating Kafka on AutoPilot mode @ DBS Bank (Arpit Dubey, DBS Bank) Kafka Su...
Operating Kafka on AutoPilot mode @ DBS Bank (Arpit Dubey, DBS Bank) Kafka Su...Operating Kafka on AutoPilot mode @ DBS Bank (Arpit Dubey, DBS Bank) Kafka Su...
Operating Kafka on AutoPilot mode @ DBS Bank (Arpit Dubey, DBS Bank) Kafka Su...
 
Amazon EKS 그리고 Service Mesh (김세호 솔루션즈 아키텍트, AWS) :: Gaming on AWS 2018
Amazon EKS 그리고 Service Mesh (김세호 솔루션즈 아키텍트, AWS) :: Gaming on AWS 2018Amazon EKS 그리고 Service Mesh (김세호 솔루션즈 아키텍트, AWS) :: Gaming on AWS 2018
Amazon EKS 그리고 Service Mesh (김세호 솔루션즈 아키텍트, AWS) :: Gaming on AWS 2018
 
Openshift serverless Solution
Openshift serverless SolutionOpenshift serverless Solution
Openshift serverless Solution
 
Apache Big Data Europe 2015: Selected Talks
Apache Big Data Europe 2015: Selected TalksApache Big Data Europe 2015: Selected Talks
Apache Big Data Europe 2015: Selected Talks
 
Managing Microservices With The Istio Service Mesh on Kubernetes
Managing Microservices With The Istio Service Mesh on KubernetesManaging Microservices With The Istio Service Mesh on Kubernetes
Managing Microservices With The Istio Service Mesh on Kubernetes
 
Optimizing, Profiling, and Deploying TensorFlow AI Models with GPUs - San Fra...
Optimizing, Profiling, and Deploying TensorFlow AI Models with GPUs - San Fra...Optimizing, Profiling, and Deploying TensorFlow AI Models with GPUs - San Fra...
Optimizing, Profiling, and Deploying TensorFlow AI Models with GPUs - San Fra...
 
High Performance TensorFlow in Production - Big Data Spain - Madrid - Nov 15 ...
High Performance TensorFlow in Production - Big Data Spain - Madrid - Nov 15 ...High Performance TensorFlow in Production - Big Data Spain - Madrid - Nov 15 ...
High Performance TensorFlow in Production - Big Data Spain - Madrid - Nov 15 ...
 
Hyper-Parameter Tuning Across the Entire AI Pipeline GPU Tech Conference San ...
Hyper-Parameter Tuning Across the Entire AI Pipeline GPU Tech Conference San ...Hyper-Parameter Tuning Across the Entire AI Pipeline GPU Tech Conference San ...
Hyper-Parameter Tuning Across the Entire AI Pipeline GPU Tech Conference San ...
 
Envoy @ Lyft: Developer Productivity
Envoy @ Lyft: Developer ProductivityEnvoy @ Lyft: Developer Productivity
Envoy @ Lyft: Developer Productivity
 
Open Source Networking Days- Service Mesh
Open Source Networking Days- Service MeshOpen Source Networking Days- Service Mesh
Open Source Networking Days- Service Mesh
 
High Performance Distributed TensorFlow with GPUs and Kubernetes
High Performance Distributed TensorFlow with GPUs and KubernetesHigh Performance Distributed TensorFlow with GPUs and Kubernetes
High Performance Distributed TensorFlow with GPUs and Kubernetes
 

More from Animesh Singh

Machine Learning Exchange (MLX)
Machine Learning Exchange (MLX)Machine Learning Exchange (MLX)
Machine Learning Exchange (MLX)Animesh Singh
 
KFServing Payload Logging for Trusted AI
KFServing Payload Logging for Trusted AIKFServing Payload Logging for Trusted AI
KFServing Payload Logging for Trusted AIAnimesh Singh
 
KFServing and Kubeflow Pipelines
KFServing and Kubeflow PipelinesKFServing and Kubeflow Pipelines
KFServing and Kubeflow PipelinesAnimesh Singh
 
Defend against adversarial AI using Adversarial Robustness Toolbox
Defend against adversarial AI using Adversarial Robustness Toolbox Defend against adversarial AI using Adversarial Robustness Toolbox
Defend against adversarial AI using Adversarial Robustness Toolbox Animesh Singh
 
Hybrid Cloud, Kubeflow and Tensorflow Extended [TFX]
Hybrid Cloud, Kubeflow and Tensorflow Extended [TFX]Hybrid Cloud, Kubeflow and Tensorflow Extended [TFX]
Hybrid Cloud, Kubeflow and Tensorflow Extended [TFX]Animesh Singh
 
Trusted, Transparent and Fair AI using Open Source
Trusted, Transparent and Fair AI using Open SourceTrusted, Transparent and Fair AI using Open Source
Trusted, Transparent and Fair AI using Open SourceAnimesh Singh
 
AIF360 - Trusted and Fair AI
AIF360 - Trusted and Fair AIAIF360 - Trusted and Fair AI
AIF360 - Trusted and Fair AIAnimesh Singh
 
AI & Machine Learning Pipelines with Knative
AI & Machine Learning Pipelines with KnativeAI & Machine Learning Pipelines with Knative
AI & Machine Learning Pipelines with KnativeAnimesh Singh
 
Fabric for Deep Learning
Fabric for Deep LearningFabric for Deep Learning
Fabric for Deep LearningAnimesh Singh
 
Microservices, Kubernetes and Istio - A Great Fit!
Microservices, Kubernetes and Istio - A Great Fit!Microservices, Kubernetes and Istio - A Great Fit!
Microservices, Kubernetes and Istio - A Great Fit!Animesh Singh
 
How to build a Distributed Serverless Polyglot Microservices IoT Platform us...
How to build a Distributed Serverless Polyglot Microservices IoT Platform us...How to build a Distributed Serverless Polyglot Microservices IoT Platform us...
How to build a Distributed Serverless Polyglot Microservices IoT Platform us...Animesh Singh
 
How to build an event-driven, polyglot serverless microservices framework on ...
How to build an event-driven, polyglot serverless microservices framework on ...How to build an event-driven, polyglot serverless microservices framework on ...
How to build an event-driven, polyglot serverless microservices framework on ...Animesh Singh
 
As a Service: Cloud Foundry on OpenStack - Lessons Learnt
As a Service: Cloud Foundry on OpenStack - Lessons LearntAs a Service: Cloud Foundry on OpenStack - Lessons Learnt
As a Service: Cloud Foundry on OpenStack - Lessons LearntAnimesh Singh
 
Introducing Cloud Native, Event Driven, Serverless, Micrsoservices Framework ...
Introducing Cloud Native, Event Driven, Serverless, Micrsoservices Framework ...Introducing Cloud Native, Event Driven, Serverless, Micrsoservices Framework ...
Introducing Cloud Native, Event Driven, Serverless, Micrsoservices Framework ...Animesh Singh
 
Finding and-organizing Great Cloud Foundry User Groups
Finding and-organizing Great Cloud Foundry User GroupsFinding and-organizing Great Cloud Foundry User Groups
Finding and-organizing Great Cloud Foundry User GroupsAnimesh Singh
 
CAPS: What's best for deploying and managing OpenStack? Chef vs. Ansible vs. ...
CAPS: What's best for deploying and managing OpenStack? Chef vs. Ansible vs. ...CAPS: What's best for deploying and managing OpenStack? Chef vs. Ansible vs. ...
CAPS: What's best for deploying and managing OpenStack? Chef vs. Ansible vs. ...Animesh Singh
 
Building a PaaS Platform like Bluemix on OpenStack
Building a PaaS Platform like Bluemix on OpenStackBuilding a PaaS Platform like Bluemix on OpenStack
Building a PaaS Platform like Bluemix on OpenStackAnimesh Singh
 
Cloud foundry Docker Openstack - Leading Open Source Triumvirate
Cloud foundry Docker Openstack - Leading Open Source TriumvirateCloud foundry Docker Openstack - Leading Open Source Triumvirate
Cloud foundry Docker Openstack - Leading Open Source TriumvirateAnimesh Singh
 
Build Scalable Internet of Things Apps using Cloud Foundry, Bluemix & Cloudant
Build Scalable Internet of Things Apps using Cloud Foundry, Bluemix & CloudantBuild Scalable Internet of Things Apps using Cloud Foundry, Bluemix & Cloudant
Build Scalable Internet of Things Apps using Cloud Foundry, Bluemix & CloudantAnimesh Singh
 

More from Animesh Singh (20)

Machine Learning Exchange (MLX)
Machine Learning Exchange (MLX)Machine Learning Exchange (MLX)
Machine Learning Exchange (MLX)
 
KFServing Payload Logging for Trusted AI
KFServing Payload Logging for Trusted AIKFServing Payload Logging for Trusted AI
KFServing Payload Logging for Trusted AI
 
KFServing and Kubeflow Pipelines
KFServing and Kubeflow PipelinesKFServing and Kubeflow Pipelines
KFServing and Kubeflow Pipelines
 
KFServing and Feast
KFServing and FeastKFServing and Feast
KFServing and Feast
 
Defend against adversarial AI using Adversarial Robustness Toolbox
Defend against adversarial AI using Adversarial Robustness Toolbox Defend against adversarial AI using Adversarial Robustness Toolbox
Defend against adversarial AI using Adversarial Robustness Toolbox
 
Hybrid Cloud, Kubeflow and Tensorflow Extended [TFX]
Hybrid Cloud, Kubeflow and Tensorflow Extended [TFX]Hybrid Cloud, Kubeflow and Tensorflow Extended [TFX]
Hybrid Cloud, Kubeflow and Tensorflow Extended [TFX]
 
Trusted, Transparent and Fair AI using Open Source
Trusted, Transparent and Fair AI using Open SourceTrusted, Transparent and Fair AI using Open Source
Trusted, Transparent and Fair AI using Open Source
 
AIF360 - Trusted and Fair AI
AIF360 - Trusted and Fair AIAIF360 - Trusted and Fair AI
AIF360 - Trusted and Fair AI
 
AI & Machine Learning Pipelines with Knative
AI & Machine Learning Pipelines with KnativeAI & Machine Learning Pipelines with Knative
AI & Machine Learning Pipelines with Knative
 
Fabric for Deep Learning
Fabric for Deep LearningFabric for Deep Learning
Fabric for Deep Learning
 
Microservices, Kubernetes and Istio - A Great Fit!
Microservices, Kubernetes and Istio - A Great Fit!Microservices, Kubernetes and Istio - A Great Fit!
Microservices, Kubernetes and Istio - A Great Fit!
 
How to build a Distributed Serverless Polyglot Microservices IoT Platform us...
How to build a Distributed Serverless Polyglot Microservices IoT Platform us...How to build a Distributed Serverless Polyglot Microservices IoT Platform us...
How to build a Distributed Serverless Polyglot Microservices IoT Platform us...
 
How to build an event-driven, polyglot serverless microservices framework on ...
How to build an event-driven, polyglot serverless microservices framework on ...How to build an event-driven, polyglot serverless microservices framework on ...
How to build an event-driven, polyglot serverless microservices framework on ...
 
As a Service: Cloud Foundry on OpenStack - Lessons Learnt
As a Service: Cloud Foundry on OpenStack - Lessons LearntAs a Service: Cloud Foundry on OpenStack - Lessons Learnt
As a Service: Cloud Foundry on OpenStack - Lessons Learnt
 
Introducing Cloud Native, Event Driven, Serverless, Micrsoservices Framework ...
Introducing Cloud Native, Event Driven, Serverless, Micrsoservices Framework ...Introducing Cloud Native, Event Driven, Serverless, Micrsoservices Framework ...
Introducing Cloud Native, Event Driven, Serverless, Micrsoservices Framework ...
 
Finding and-organizing Great Cloud Foundry User Groups
Finding and-organizing Great Cloud Foundry User GroupsFinding and-organizing Great Cloud Foundry User Groups
Finding and-organizing Great Cloud Foundry User Groups
 
CAPS: What's best for deploying and managing OpenStack? Chef vs. Ansible vs. ...
CAPS: What's best for deploying and managing OpenStack? Chef vs. Ansible vs. ...CAPS: What's best for deploying and managing OpenStack? Chef vs. Ansible vs. ...
CAPS: What's best for deploying and managing OpenStack? Chef vs. Ansible vs. ...
 
Building a PaaS Platform like Bluemix on OpenStack
Building a PaaS Platform like Bluemix on OpenStackBuilding a PaaS Platform like Bluemix on OpenStack
Building a PaaS Platform like Bluemix on OpenStack
 
Cloud foundry Docker Openstack - Leading Open Source Triumvirate
Cloud foundry Docker Openstack - Leading Open Source TriumvirateCloud foundry Docker Openstack - Leading Open Source Triumvirate
Cloud foundry Docker Openstack - Leading Open Source Triumvirate
 
Build Scalable Internet of Things Apps using Cloud Foundry, Bluemix & Cloudant
Build Scalable Internet of Things Apps using Cloud Foundry, Bluemix & CloudantBuild Scalable Internet of Things Apps using Cloud Foundry, Bluemix & Cloudant
Build Scalable Internet of Things Apps using Cloud Foundry, Bluemix & Cloudant
 

Recently uploaded

Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embeddingZilliz
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 


KFServing - Serverless Model Inferencing

  • 11. Supported Frameworks, Components and Storage Subsystems
Model Servers: TensorFlow, NVIDIA TRTIS, PyTorch, XGBoost, SKLearn, ONNX
Components: Predictor, Explainer, Transformer (pre-processor, post-processor)
Storage: AWS/S3, GCS, Azure Blob, PVC
  • 12. Inference Service Control Plane
The InferenceService architecture consists of a static graph of components that coordinate requests for a single model. Advanced features such as ensembling, A/B testing, and multi-armed bandits should compose InferenceServices together.
  • 15. KFServing Data Plane v1 protocol

API          Verb   Path                               Payload
List Models  GET    /v1/models                         [model_names]
Readiness    GET    /v1/models/<model_name>
Predict      POST   /v1/models/<model_name>:predict    Request: {instances:[]}  Response: {predictions:[]}
Explain      POST   /v1/models/<model_name>:explain    Request: {instances:[]}  Response: {predictions:[], explanations:[]}
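As a concrete client-side sketch of the v1 protocol, the call below POSTs to the :predict verb from the table, assuming the sklearn-iris service defined on the next slide; the ingress address and Host header are placeholders for whatever your cluster reports.

import requests

# Placeholder ingress address and virtual host; substitute the values
# reported by your cluster for the InferenceService.
INGRESS = "http://<ingress-ip>"
HOST = "sklearn-iris.default.example.com"

# v1 data plane: POST /v1/models/<model_name>:predict with {"instances": [...]}
payload = {"instances": [[6.8, 2.8, 4.8, 1.4]]}
resp = requests.post(f"{INGRESS}/v1/models/sklearn-iris:predict",
                     json=payload, headers={"Host": HOST})
print(resp.json())  # {"predictions": [...]}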
  • 16. KFServing Examples

apiVersion: "serving.kubeflow.org/v1alpha1"
kind: "InferenceService"
metadata:
  name: "sklearn-iris"
spec:
  default:
    sklearn:
      modelUri: "gs://kfserving-samples/models/sklearn/iris"

apiVersion: "serving.kubeflow.org/v1alpha1"
kind: "InferenceService"
metadata:
  name: "flowers-sample"
spec:
  default:
    tensorflow:
      modelUri: "gs://kfserving-samples/models/tensorflow/flowers"

apiVersion: "serving.kubeflow.org/v1alpha1"
kind: "InferenceService"
metadata:
  name: "pytorch-iris"
spec:
  default:
    pytorch:
      modelUri: "gs://kfserving-samples/models/pytorch/iris"
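The same services can also be created programmatically. A minimal sketch with the kfserving Python SDK (v1alpha2 API generation, following the pattern in the project's samples; the namespace and resource names are placeholders):

from kubernetes import client
from kfserving import (KFServingClient, constants,
                       V1alpha2EndpointSpec, V1alpha2PredictorSpec,
                       V1alpha2TensorflowSpec, V1alpha2InferenceService,
                       V1alpha2InferenceServiceSpec)

# Default endpoint: a TensorFlow predictor pointing at the flowers model.
default_endpoint = V1alpha2EndpointSpec(
    predictor=V1alpha2PredictorSpec(
        tensorflow=V1alpha2TensorflowSpec(
            storage_uri="gs://kfserving-samples/models/tensorflow/flowers")))

isvc = V1alpha2InferenceService(
    api_version=constants.KFSERVING_GROUP + "/" + constants.KFSERVING_VERSION,
    kind=constants.KFSERVING_KIND,
    metadata=client.V1ObjectMeta(name="flowers-sample", namespace="default"),
    spec=V1alpha2InferenceServiceSpec(default=default_endpoint))

KFServingClient().create(isvc)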
  • 17. Canary/Pinned Examples

Canary:
apiVersion: "serving.kubeflow.org/v1alpha1"
kind: "KFService"
metadata:
  name: "my-model"
spec:
  default:
    # 90% of traffic is sent to this model
    tensorflow:
      modelUri: "gs://mybucket/mymodel-2"
  canaryTrafficPercent: 10
  canary:
    # 10% of traffic is sent to this model
    tensorflow:
      modelUri: "gs://mybucket/mymodel-3"

Pinned:
apiVersion: "serving.kubeflow.org/v1alpha1"
kind: "KFService"
metadata:
  name: "my-model"
spec:
  default:
    tensorflow:
      modelUri: "gs://mybucket/mymodel-2"
  # Defaults to zero, so can also be omitted or explicitly set to zero.
  canaryTrafficPercent: 0
  canary:
    # Canary is created but no traffic is directly forwarded.
    tensorflow:
      modelUri: "gs://mybucket/mymodel-3"
  • 18. TensorFlow Serving (TFServing)
●  Flexible, high-performance serving system for TensorFlow
●  https://www.tensorflow.org/tfx/guide/serving
●  Stable; Google has been using it in production since 2016
●  Supports the SavedModel format and GraphDef
●  Written in C++; supports both REST and gRPC
●  KFServing lets you easily spin up an InferenceService with TFServing to serve your TensorFlow model on CPU or GPU, with serverless features like canary rollout and autoscaling.
  • 19. Inference Service with Transformer and TFServing

apiVersion: serving.kubeflow.org/v1alpha2
kind: InferenceService
metadata:
  name: bert-serving
spec:
  default:
    transformer:          # Pre/Post Processing
      custom:
        container:
          image: bert-transformer:v1
    predictor:            # TensorFlow Model Server
      tensorflow:
        storageUri: s3://examples/bert
        runtimeVersion: 1.14.0-gpu
        resources:
          limits:
            nvidia.com/gpu: 1

class BertTransformer(kfserving.KFModel):
    def __init__(self, name):
        super().__init__(name)
        self.bert_tokenizer = BertTokenizer(vocab_file)

    def preprocess(self, inputs: Dict) -> Dict:
        # text_a / text_b are extracted from inputs (elided on the slide).
        encoded_features = self.bert_tokenizer.encode_plus(
            text=text_a, text_pair=text_b)
        return {"input_ids": encoded_features["input_ids"],
                "input_mask": encoded_features["attention_mask"],
                "segment_ids": encoded_features["segment_ids"],
                "label_ids": 1}

    def postprocess(self, inputs: Dict) -> Dict:
        return inputs
  • 20. NVIDIA Triton Inference Server
●  NVIDIA's highly-optimized model runtime on GPUs
●  https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-guide/docs
●  Supports model repository and versioning
●  Dynamic batching
●  Concurrent model execution
●  Supports TensorFlow, PyTorch, ONNX models
●  Written in C++; supports both REST and gRPC
●  The TensorRT Optimizer can further bring down BERT inference latency
  • 21. Inference Service with Triton Inference Server

apiVersion: serving.kubeflow.org/v1alpha2
kind: InferenceService
metadata:
  name: bert-serving
spec:
  default:
    transformer:          # Pre/Post Processing
      custom:
        container:
          image: bert-transformer:v1
          env:
          - name: STORAGE_URI
            value: s3://examples/bert_transformer
    predictor:            # Triton Inference Server
      tensorrt:
        storageUri: s3://examples/bert
        runtimeVersion: r20.02
        resources:
          limits:
            nvidia.com/gpu: 1

Client-side inference call from the transformer:

infer_ctx = InferContext(url, protocol, model_name, model_version)
unique_ids = np.int32([1])
segment_ids = features["segment_ids"]
input_ids = features["input_ids"]
input_mask = features["input_mask"]
result = infer_ctx.run(
    {'unique_ids': (unique_ids,),
     'segment_ids': (segment_ids,),
     'input_ids': (input_ids,),
     'input_mask': (input_mask,)},
    {'end_logits': InferContext.ResultFormat.RAW,
     'start_logits': InferContext.ResultFormat.RAW},
    batch_size)
  • 22. PyTorch Model Server
●  PyTorch model server maintained by KFServing
●  https://github.com/kubeflow/kfserving/tree/master/python/pytorchserver
●  Implemented in Python with the Tornado server
●  Loads the model state dict and the model class Python file
●  GPU inference is supported in the KFServing 0.3 release
●  Alternatively, you can export the PyTorch model in ONNX format and serve it on the TensorRT Inference Server or ONNX Runtime Server.
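Beyond the stock servers, the same kfserving Python package can host a custom predictor. A minimal sketch, assuming a TorchScript artifact whose file name (model.pt) is a placeholder:

from typing import Dict
import kfserving
import torch

class PyTorchCustomModel(kfserving.KFModel):
    def __init__(self, name: str, model_path: str):
        super().__init__(name)
        self.model_path = model_path
        self.model = None

    def load(self):
        # Placeholder path; a real service would resolve it from its storage URI.
        self.model = torch.jit.load(self.model_path)
        self.model.eval()
        self.ready = True

    def predict(self, request: Dict) -> Dict:
        inputs = torch.tensor(request["instances"])
        with torch.no_grad():
            outputs = self.model(inputs)
        return {"predictions": outputs.tolist()}

if __name__ == "__main__":
    model = PyTorchCustomModel("pytorch-custom", "model.pt")
    model.load()
    kfserving.KFServer(workers=1).start([model])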
  • 23. ONNX Runtime Server
●  ONNX Runtime is a performance-focused inference engine for ONNX models
●  https://github.com/microsoft/onnxruntime
●  Supports TensorFlow and PyTorch models, which can be converted to ONNX
●  Written in C++; supports both REST and gRPC
●  ONNX Runtime optimized the BERT transformer network to further bring down latency:
https://github.com/onnx/tutorials/blob/master/tutorials/Inference-TensorFlow-Bert-Model-for-High-Performance-in-ONNX-Runtime.ipynb
  • 24. Inference Service with PyTorch/ONNX Runtime

PyTorch Model Server (with Pre/Post Processing):
apiVersion: serving.kubeflow.org/v1alpha2
kind: InferenceService
metadata:
  name: bert-serving
spec:
  default:
    transformer:
      custom:
        container:
          image: bert-transformer:v1
          env:
          - name: STORAGE_URI
            value: s3://examples/bert_transformer
    predictor:
      pytorch:
        storageUri: s3://examples/bert
        runtimeVersion: v0.3.0-gpu
        resources:
          limits:
            nvidia.com/gpu: 1

ONNX Runtime Server (with Pre/Post Processing):
apiVersion: serving.kubeflow.org/v1alpha2
kind: InferenceService
metadata:
  name: bert-serving-onnx
spec:
  default:
    transformer:
      custom:
        container:
          image: bert-transformer:v1
          env:
          - name: STORAGE_URI
            value: s3://examples/bert_transformer
    predictor:
      onnx:
        storageUri: s3://examples/bert
        runtimeVersion: 0.5.1
  • 25. GPU Autoscaling – the KNative solution
(Diagram: API requests enter via the Ingress. When scale == 0, or when handling burst capacity, the Activator buffers requests; when scale > 0, traffic flows directly to 0...N replicas, each a Queue Proxy in front of the model server. The Autoscaler watches metrics from both paths and adjusts the replica count.)
●  Scale based on # of in-flight requests against expected concurrency
●  Simple solution for heterogeneous ML inference autoscaling
  • 27. But the Data Scientist Sees...
●  A pointer to a serialized model file
●  9 lines of YAML
●  A live model at an HTTP endpoint

apiVersion: "serving.kubeflow.org/v1alpha2"
kind: "InferenceService"
metadata:
  name: "flowers-sample"
spec:
  default:
    predictor:
      tensorflow:
        storageUri: "gs://kfserving-samples/models/tensorflow/flowers"

While the platform provides: scale to zero, GPU autoscaling, safe rollouts, optimized serving containers, network policy and auth, HTTP APIs (gRPC soon), tracing, metrics.
Production users include: Bloomberg
  • 28. Summary
●  With KFServing, users can easily deploy inference services on GPUs with performant, industry-leading model servers, while benefiting from all the serverless features.
●  The inference workload autoscales based on QPS, giving much better resource utilization.
●  gRPC can provide better performance than REST: it allows multiplexing, and protobuf is a more efficient, compact format than JSON.
●  Transformers work seamlessly with different model servers thanks to KFServing's data plane standardization.
●  GPUs benefit a lot from batching requests.
  • 29. Model Serving is accomplished. Can the predictions be trusted?
(ML lifecycle diagram: prepared data and an untrained model yield prepared/analyzed data and a trained model, which becomes the deployed model.)
●  Can the model explain its predictions?
●  Are there concept drifts?
●  Is there an outlier?
●  Is the model vulnerable to adversarial attacks?
  • 32. Payload Logging
Why:
●  Capture payloads for analysis and future retraining of the model
●  Perform offline processing of the requests and responses
KFServing implementation (alpha):
●  Add to any InferenceService endpoint: Predictor, Explainer, Transformer
●  Log requests, responses, or both from the endpoint
●  Simply specify a URL to send the payloads to
●  The URL will receive CloudEvents:

POST /event HTTP/1.0
Host: example.com
Content-Type: application/json
ce-specversion: 1.0
ce-type: repo.newItem
ce-source: http://bigco.com/repo
ce-id: 610b6dd4-c85d-417b-b58f-3771e532
<payload>
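For illustration, a payload sink can be as small as the standard-library sketch below; the ce-* header names follow the CloudEvents HTTP binding shown above, and the port is an arbitrary choice:

from http.server import BaseHTTPRequestHandler, HTTPServer

class PayloadSink(BaseHTTPRequestHandler):
    def do_POST(self):
        # The CloudEvents HTTP binding carries event metadata in ce-* headers.
        event_type = self.headers.get("ce-type")
        event_id = self.headers.get("ce-id")
        body = self.rfile.read(int(self.headers.get("Content-Length", 0)))
        print(f"event {event_id} ({event_type}): {body[:200]!r}")
        self.send_response(202)  # acknowledge receipt
        self.end_headers()

if __name__ == "__main__":
    HTTPServer(("", 8080), PayloadSink).serve_forever()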
  • 34. Payload Logging Architecture
(Diagram: an InferenceService's model logger posts payloads to a Knative Broker; Triggers route events to an Outlier Detector, which feeds an Alerting API, and to an HTTP-to-Kafka bridge feeding a Kafka cluster.)
  • 36. KFServing Explanations

apiVersion: "serving.kubeflow.org/v1alpha2"
kind: "InferenceService"
metadata:
  name: "income"
spec:
  default:
    predictor:
      sklearn:
        storageUri: "gs://seldon-models/sklearn/income/model"
    explainer:
      alibi:
        type: AnchorTabular
        storageUri: "gs://seldon-models/sklearn/income/explainer"

apiVersion: "serving.kubeflow.org/v1alpha2"
kind: "InferenceService"
metadata:
  name: "moviesentiment"
spec:
  default:
    predictor:
      sklearn:
        storageUri: "gs://seldon-models/sklearn/moviesentiment"
    explainer:
      alibi:
        type: AnchorText
  • 37. Seldon Alibi:Explain
https://github.com/SeldonIO/alibi
State-of-the-art implementations (Janis Klaise, Giovanni Vacanti, Alexandru Coca, Arnaud Van Looveren):
●  Anchors
●  Counterfactuals
●  Contrastive explanations
●  Trust scores
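Outside the hosted explainer, the same algorithms can be driven directly from Python. A minimal AnchorTabular sketch; the model and dataset here are stand-ins, not the income model above:

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from alibi.explainers import AnchorTabular

data = load_iris()
clf = RandomForestClassifier(n_estimators=50).fit(data.data, data.target)

# The explainer needs only a prediction function and feature names.
explainer = AnchorTabular(lambda x: clf.predict(x),
                          feature_names=data.feature_names)
explainer.fit(data.data)

explanation = explainer.explain(data.data[0])
print(explanation.anchor)  # human-readable rules that anchor the prediction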
  • 38. Explanations: Resources – AI Explainability 360 (AIX360)
The AIX360 toolkit is an open-source library to help explain AI and machine learning models and their predictions. It includes three classes of algorithms: local post-hoc, global post-hoc, and directly interpretable explainers for models that use image, text, and structured/tabular data.
The AI Explainability 360 Python package includes a comprehensive set of explainers, at both the global and local level.
http://aix360.mybluemix.net
https://github.com/IBM/AIX360
  • 41. ML Inference Analysis
Don't trust predictions on instances outside of the training distribution!
●  Outlier Detection
●  Adversarial Detection
●  Concept Drift
  • 42. Outlier Detection
Don't trust predictions on instances outside of the training distribution! → Outlier Detection
Detector types:
-  stateful online vs. pretrained offline
-  feature vs. instance level detectors
Data types:
-  tabular, images & time series
Outlier types:
-  global, contextual & collective outliers
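As one concrete instance of a pretrained offline, instance-level detector on tabular data, a minimal sketch with alibi-detect's isolation forest; the data and the 95% threshold are arbitrary stand-ins:

import numpy as np
from alibi_detect.od import IForest

# Synthetic "training distribution": a unit Gaussian blob.
X_train = np.random.randn(1000, 4).astype(np.float32)
od = IForest(n_estimators=100)
od.fit(X_train)
# Choose a threshold so ~5% of in-distribution instances score as outliers.
od.infer_threshold(X_train, threshold_perc=95)

# A batch mixing in-distribution points with an obvious shift.
X_test = np.concatenate([np.random.randn(5, 4),
                         10 + np.random.randn(5, 4)]).astype(np.float32)
print(od.predict(X_test)["data"]["is_outlier"])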
  • 43. Adversarial Detection
Don't trust predictions on instances outside of the training distribution! → Adversarial Detection
-  Outliers w.r.t. the model prediction
-  Detect small input changes with a big impact on predictions!
  • 44. Concept Drift
Production data distribution != training distribution? → Concept drift! Retrain!
Need to track the right distributions:
-  feature vs. instance level
-  continuous vs. discrete
-  online vs. offline training data
-  track the streaming number of outliers
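For continuous features tracked offline, a minimal drift check with alibi-detect's Kolmogorov-Smirnov detector; this assumes a recent alibi-detect release (where the reference data is the first argument), and the data here is synthetic:

import numpy as np
from alibi_detect.cd import KSDrift

X_ref = np.random.randn(1000, 4).astype(np.float32)  # training-time reference
cd = KSDrift(X_ref, p_val=0.05)                      # per-feature K-S tests

# A production batch with a mean shift in every feature.
X_prod = (np.random.randn(500, 4) + 0.5).astype(np.float32)
preds = cd.predict(X_prod)
print(preds["data"]["is_drift"])  # 1 if drift is detected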
  • 45. Outlier Detection on CIFAR10
(Demo architecture: a CIFAR10 model InferenceService logs payloads to a Knative Broker; a Trigger routes them to a CIFAR10 Outlier Detector, whose events land in a Message Dumper API.)
  • 47. Adversarial Robustness 360 (ART)
ART is a library dedicated to adversarial machine learning. Its purpose is to allow rapid crafting and analysis of attack, defense, and detection methods for machine learning models. Applicable domains include finance, self-driving vehicles, etc. The Adversarial Robustness Toolbox provides implementations of many state-of-the-art methods for attacking and defending classifiers.
Toolbox: attacks, defenses, and metrics – evasion attacks, defense methods, detection methods, robustness metrics
https://github.com/IBM/adversarial-robustness-toolbox
https://art-demo.mybluemix.net/
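To make the attack side concrete, a minimal evasion sketch with ART's Fast Gradient Method against a scikit-learn classifier; the dataset and eps value are arbitrary stand-ins:

import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from art.estimators.classification import SklearnClassifier
from art.attacks.evasion import FastGradientMethod

data = load_iris()
model = LogisticRegression(max_iter=1000).fit(data.data, data.target)

# Wrap the fitted model so ART can query predictions and gradients.
classifier = SklearnClassifier(model=model)
attack = FastGradientMethod(estimator=classifier, eps=0.5)
x_adv = attack.generate(x=data.data.astype(np.float32))

print("clean accuracy:", model.score(data.data, data.target))
print("adversarial accuracy:", model.score(x_adv, data.target))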
  • 48. Open Source Dojo: KFServing – Demo One
https://www.youtube.com/watch?v=xg5ar6vSAXY&t
  • 50. Demo Scenario: Intelligent Refrigeration Unit Monitoring
Problem statement:
●  The failure of a refrigeration unit is a common problem, with failures leading to spoiled goods that become a cost to the business. Performing manual checks on the units is time-consuming and costly, and typically is just a visual check that the unit is running and the temperature is in range.
●  Being able to automatically monitor the operation of the units, and to predict the likelihood of a unit failing in advance of an actual failure, can save costs by reducing manual workload and, more significantly, by reducing the amount of spoiled goods through preventative action before the unit fails.
●  Such an intelligent monitoring solution has value across industries, from supermarkets to healthcare, but perhaps one of the most significant opportunities comes with container shipping: monitoring refrigerated containers through their life cycle of being loaded at a supplier, transported to the dock, loaded onto a container ship, through the sea voyage, then unloaded and transported to the importer.
●  Predicting a likely failure before loading a container onto a ship enables preventive maintenance before loading, which reduces the risk of goods being spoiled at sea, where manual actions are limited. Such "intelligently" monitored containers incur lower insurance premiums for the voyage, as the risk of loss of goods is reduced.
Proposed intelligent refrigeration monitoring system:
●  The IT team is proposing the development of an intelligent application for monitoring and management of the refrigeration units in the containers. To accelerate development, the team is proposing a cloud-native approach, building on a Kubernetes (OpenShift) based private cloud. The development team will follow the cloud-native architecture and application patterns for intelligent applications, which include:
1.  Device data collection
2.  Real-time intelligence with machine learning
3.  Event-driven applications with event-driven microservices
  • 51. KFServing Pipeline – Demo Two
MNIST model end to end using Kubeflow Pipelines with Tekton. The notebook demonstrates how to compile and execute an end-to-end machine learning workflow that uses Katib, TFJob, KFServing, and a Tekton pipeline. The pipeline contains five steps: it finds the best hyperparameters using Katib, creates a PVC for storing models, processes the hyperparameter results, trains the model on TFJob with the best hyperparameters using more iterations, and finally serves the model using KFServing.
https://github.com/kubeflow/kfp-tekton/tree/master/samples/e2e-mnist
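The notable twist in that sample is the compiler: kfp-tekton swaps Tekton in for Argo as the execution backend. A minimal sketch of the pattern, with a trivial stand-in pipeline rather than the MNIST sample:

import kfp.dsl as dsl
from kfp_tekton.compiler import TektonCompiler

@dsl.pipeline(name="hello-pipeline", description="Trivial stand-in pipeline")
def hello_pipeline():
    # The real sample chains Katib, TFJob and KFServing steps here.
    dsl.ContainerOp(name="echo", image="alpine:3.12",
                    command=["echo", "hello"])

# Emits Tekton PipelineRun YAML instead of an Argo workflow.
TektonCompiler().compile(hello_pipeline, "hello_pipeline.yaml")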
  • 52. KFServing – Existing Features
❑  Crowd-sourced capabilities: contributions by AWS, Bloomberg, Google, Seldon, IBM, NVIDIA and others
❑  Support for multiple pre-integrated runtimes: TFServing, NVIDIA Triton (GPU optimization), ONNX Runtime, SKLearn, PyTorch, XGBoost, custom models
❑  Serverless ML inference and autoscaling: scale to zero (with no incoming traffic) and request-queue-based autoscaling
❑  Canary and pinned rollouts: control traffic percentage and direction, pinned rollouts
❑  Pluggable pre-processor/post-processor via Transformer: plug in pre-/post-processing implementations, control routing and placement (e.g. pre-processor on CPU, predictor on GPU)
❑  Pluggable analysis algorithms: explainability, drift detection, anomaly detection, adversarial detection (contributed by Seldon), enabled by payload logging (built using the CloudEvents standardized eventing protocol)
❑  Batch predictions: batch prediction support for ML frameworks (TensorFlow, PyTorch, ...)
❑  Integration with the existing monitoring stack around the Knative/Istio ecosystem: Kiali (service placements, traffic and graphs), Jaeger (request tracing), Grafana/Prometheus plug-ins for Knative
❑  Multiple clients: kubectl, Python SDK, Kubeflow Pipelines SDK
❑  Standardized Data Plane V2 protocol for prediction/explainability et al.: already implemented by NVIDIA Triton
  • 53. KFServing – Upcoming Features
❑  MMS: multi-model serving for serving multiple models per custom KFService instance
❑  More Data Plane v2 API compliant servers: SKLearn, XGBoost, PyTorch...
❑  Multi-model graphs and pipelines: support chaining multiple models together in a pipeline
❑  PyTorch support via AWS TorchServe
❑  gRPC support for all model servers
❑  Support for multi-armed bandits
❑  Integration with IBM AIX360 for explainability, AIF360 for bias detection and ART for adversarial detection
  • 54. Open Source Projects
●  ML Inference
    ○  KFServing: https://github.com/kubeflow/kfserving
    ○  Seldon Core: https://github.com/SeldonIO/seldon-core
●  Model Explanations
    ○  Seldon Alibi: https://github.com/seldonio/alibi
    ○  IBM AI Explainability 360: https://github.com/IBM/AIX360
●  Outlier and Adversarial Detection and Concept Drift
    ○  Seldon Alibi Detect: https://github.com/seldonio/alibi-detect
●  Adversarial Attack, Detection and Defense
    ○  IBM Adversarial Robustness 360: https://github.com/IBM/adversarial-robustness-toolbox