Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

AI & Machine Learning Pipelines with Knative

KubeCon 2018 North America AI ML Knative Talk

AI & Machine Learning Pipelines with Knative

  1. 1. Prepared and Analyzed Data Trained Model Deployed Model Prepared and Analyzed Data Initial Model Deployed Model AI & Machine Learning Pipelines with Knative @AnimeshSingh @Tomipli
  2. 2. ≈ Center for Open Source Data and AI Technologies (CODAIT) Code – Build and improve practical frameworks to enable more developers to realize immediate value. Content – Showcase solutions for complex and real-world AI problems. Community – Bring developers and data scientists to engage with IBM Gather Data Analyze Data Machine Learning Deep Learning Deploy Model Maintain Model Python Data Science Stack Fabric for Deep Learning (FfDL) Mleap + PFA Scikit-LearnPandas Apache Spark Apache Spark Jupyter Model Asset eXchange Keras + Tensorflow Improving Enterprise AI lifecycle in Open Source •  Team contributes to over 10 open source projects •  17 committers and many contributors in Apache projects •  Over 1100 JIRAs and 66,000 lines of code committed to Apache Spark itself; over 65,000 LoC into SystemML •  Over 25 product lines within IBM leveraging Apache Spark •  Speakers at over 100 conferences, meetups, unconferences and more CODAIT codait.org
  3. 3. Agenda 3 •  Progress in ML and DL •  AI and Cloud: Complimentary •  Why we need Knative to build AI Platform •  How to build a transparent and trusted AI platform leveraging Knative •  Demo CODAIT codait.org
  4. 4. 2011 IBM Watson Jeopardy 2017 AlphaGo Apple’s releases Siri 1997 … Facebook’s face recognition 2015 2016 Siri gets deep learning IBM Deep Blue chess AlexNet Progress in Deep Learning 2012 Introduced deep learning with GPUs 4 1997
  5. 5. 2011 IBM Watson Jeopardy 2017 AlphaGo Apple’s releases Siri 1997 … Facebook’s face recognition 2015 2016 Siri gets deep learning IBM Deep Blue chess AlexNet Progress in Deep Learning 2012 Introduced deep learning with GPUs 5 2011
  6. 6. 2011 IBM Watson Jeopardy 2017 AlphaGo Apple’s releases Siri 1997 … Facebook’s face recognition 2015 2016 Siri gets deep learning IBM Deep Blue chess AlexNet Progress in Deep Learning 2012 Introduced deep learning with GPUs 6 2017
  7. 7. 2011 IBM Watson Jeopardy 2017 AlphaGo Apple’s releases Siri 1997 … Facebook’s face recognition 2015 2016 Siri gets deep learning IBM Deep Blue chess AlexNet Progress in Deep Learning 2012 Introduced deep learning with GPUs 7 2017 2018
  8. 8. 2011 IBM Watson Jeopardy 2017 AlphaGo Apple’s releases Siri 1997 … Facebook’s face recognition 2015 2016 Siri gets deep learning IBM Deep Blue chess AlexNet Progress in Deep Learning 2012 Introduced deep learning with GPUs IBM Cloud / Watson and Cloud Platform / © 2018 IBM Corporation 8
  9. 9. A human brain has: •  200 billion neurons •  32 trillion connections between them •  25 million “neurons” •  100 million connections (parameters) Deep Learning = Training Artificial Neural Networks IBM Cloud / Watson and Cloud Platform / © 2018 IBM Corporation 9
  10. 10. Neural Network Design Workflow! domain data design neural network HPO •  neural network structure •  hyperparameters NO Performance meets needs? Start another experiment optimal hyperparameters
  11. 11. Neural Network Design Workflow! domain data HPO •  neural network structure •  hyperparameters NO yes Performance meets needs? Start another experiment trained model deployCloud optimal hyperparameters evaluate BAD Still good! design neural network
  12. 12. So AI in general and Deep Learning in particular are very iterative and repetitive. And they need Cloud. Why?
  13. 13. AI requires the strength of HPC & GPUs
  14. 14. Ability to scale AI workloads on demand
  15. 15. 15 ! ! 1.  Model/Data Parallelism! ! 2.  MPI! 3.  NCCL! 4.  …..! ! Ability to utilize various technologies and achieve high performance computing.
  16. 16. 16 ! ! Microservices! ! Containers! ! DevOps automation! To scale we need to go Cloud native for AI
  17. 17. API! UI! CLI! Kubernetes ! ! Master! Worker Node 1! Worker Node 2! Worker Node 3! Worker Node n! Registry •  Etcd •  API Server •  Controller Manager Server •  Scheduler Server And Cloud native means we use Kubernetes!!
  18. 18. 18 But is Kubernetes enough for Cloud Native platform?!
  19. 19. 19 Kubernetes is not the end game – Says who?!
  20. 20. 20 What else do we need? Let`s understand it from the context of an AI Lifecycle! AI PLATFORM
  21. 21. AIOps Prepared and Analyzed Data AI PLATFORM Initial Model Trained Model Deployed Model We need a Cloud native AI Platform to build, train, deploy and monitor Models!
  22. 22. AIOps Prepared and Analyzed Data AI PLATFORM Trained Model Deployed Model Also, we need a transparent and trusted AI Platform! Prepared and Analyzed Data AI PLATFORM Initial Model Deployed Model Is model training is showing increasing loss? Are model weights biased? Is the model vulnerable to adversarial attacks? Has the training data changed? Are the model predictions less accurate? Are Hyperparameters suboptimal? Are predictions biased? Is the dataset biased?
  23. 23. AIOps Prepared and Analyzed Data Trained Model Deployed Model Infact, we need a transparent, trusted and automated AI Pipeline! Prepared and Analyzed Data Initial Model Deployed Model Is model training is showing increasing loss? Are model weights biased? Is the model vulnerable to adversarial attacks? Has the training data changed? Are ate model predictions less accurate? Are Hyperparameters suboptimal? Are predictions biased? Is the dataset biased?
  24. 24. Trigger: Trained Model is showing increasing loss Prepare Data AIOps Prepared and Analyzed Data Initial Model Trained Model Deployed Model Train Deploy Harden Train Deploy Trigger: Model performance is suboptimal Implement Defense Train Deploy Trigger: Model is vulnerable to attack Trigger: Data changed Optimize Hyperparameters Train Trigger: Bias Detected Actions Transparent, trusted, automated and event driven AI Pipeline!
  25. 25. Trigger: Trained Model is showing increasing loss Prepare Data AIOps Prepared and Analyzed Data Initial Model Trained Model Deployed Model Train Deploy Harden Train Deploy Trigger: Model performance is suboptimal Implement Defense Train Deploy Trigger: Model is vulnerable to attack Trigger: Data changed Optimize Hyperparameters Train Trigger: Bias Detected Actions Transparent, trusted, automated, event driven and auditable AI Pipeline!
  26. 26. Trigger: Trained Model is showing increasing loss Prepare Data AIOps Prepared and Analyzed Data Initial Model Trained Model Deployed Model Train Deploy Debias Train Deploy Trigger: Model performance is biased Implement Defense Train Deploy Trigger: Model is vulnerable to attack Trigger: Data changed Optimize Hyperparameters Train Trigger: Bias Detected Data Scientist AIOps Engineer Actions Transparent, trusted, automated, event driven, auditable AI Pipeline as a Service!
  27. 27. 
 Training Pipe Model Validation Pipe If we translate it to logical architecture, it looks like this…! 
 
 Data Pipe Model Deployment Pipe Deployment Analysis Pipe AI Pipeline (Python Definition – Orchestrate and Track) AI Developer and Data Scientist AI Operator 
 Python Function 
 Python Function 
 Python Function 
 Python Function 
 Python Function
  28. 28. So we need more than Kubernetes. We need to be able to……! build Data Scientists Code orchestrate the ML code automate the ML workflow send event notifications Data Scientist AIOps Engineer
  29. 29. That means, we need the concepts of..! Build Serving Pipeline Eventing
  30. 30. So……We need KNative! Build Serving Pipeline Eventing
  31. 31. ● Build ● Eventing ● Serving ● Pipeline KNative provides a set of building blocks that enable modern, source-centric and container-based serverless workloads on Kubernetes. Uses K8S CRDs to define a new set of APIs around Well….what is Knative?! Build Serving Pipeline Eventing
  32. 32. Knative Build Components •  Build •  Builder •  BuildTemplate For example, you can write a build that uses Kubernetes-native resources to obtain your source code from a repository, build a container image, then run that image. •  A Build can include multiple steps where each step specifies a Builder. •  A Builder is a type of container image that you create to accomplish any task, whether that's a single step in a process, or the whole process itself. •  The steps in a Build can push to a repository. •  A BuildTemplate can be used to defined reusable parameterized templates. Knative Build! Build — Source-to-container build orchestration
  33. 33. KNative Eventing components ●  Sources — Sources that are firing events ●  Channels — A single event forwarding and persistence layer ●  Subscriptions — Deliver and forward events to channels/services. Knative Eventing is designed around the following goals: ●  Knative Eventing services are loosely coupled. ●  Event producers and event sources are independent. ●  Other services can be connected to the Eventing system.. ●  Ensure cross-service interoperability. Knative Eventing is consistent with the CloudEvents specification that is developed by the CNCF Serverless WG. Knative Eventing! Eventing — Delivery and management of events, universal subscription, binding services to event ecosystems,
  34. 34. Knative Pipeline! Knative Pipeline components ● Task — A collection of sequential steps you would want to run as part of your CI flow. It including the inputs/ outputs and steps ● Pipeline — A graph of Tasks to execute. ● Runs — To invoke a Pipeline or a Task. High level details of this design: •  Pipelines do not know what will trigger them, they can be triggered by events or by manually creating PipelineRuns •  Tasks can exist and be invoked completely independently of Pipelines •  Test results are a first class concept, being able to navigate test results easily is powerful •  Tasks can depend on artifacts, output and parameters created by other tasks. •  Resources are the artifacts used as inputs and outputs of TaskRuns. Pipeline— Configure and run CI/CD style pipelines for your kubernetes application
  35. 35. ●  Configuration ○  Desired current state of deployment (#HEAD) ○  Records both code and configuration (separated, ala 12 factor) ○  Stamps out builds / revisions as it is updated ●  Revision ○  Code and configuration snapshot ○  k8s infra: Deployment, ReplicaSet, Pods, etc ●  Route ○  Traffic assignment to Revisions (fractional scaling or by name) ○  Built using Istio ●  Service ○  Provides a simple entry point for UI and CLI tooling to achieve common behavior ○  Acts as a top-level controller to orchestrate Route and Configuration. Configuration Revision Revision Revision Route latest explicit creates Service creates creates Knative Serving! Serving — Request-driven compute model, scale to zero, autoscaling, routing and managing traffic Knative Serving components
  36. 36. Revision Serving Knative Serving!
  37. 37. Deployment Revision Serving Knative Serving Building Block!
  38. 38. Worker Node Worker Node Deployment Replica Set Revision Serving Knative Serving Building Block!
  39. 39. Worker Node Pod Worker Node Deployment Replica Set Revision Serving Knative Serving Building Block!
  40. 40. Worker Node Pod Worker Node Deployment Replica Set Revision Pod Serving Knative Serving Building Block!
  41. 41. Worker Node Worker Node Deployment Revision Replica Set Serving Knative Serving Building Block!
  42. 42. Revision 1 Revision 2 Revision 3 Revision 4 Serving Knative Serving Building Block!
  43. 43. Configuration Revision 1 Revision 2 Revision 3 Revision 4 Serving Knative Serving Building Block!
  44. 44. Configuration Revision 1 Revision 2 Revision 3 Revision 4 Serving Knative Serving Building Block!
  45. 45. Configuration Revision 1 Revision 2 Revision 3 Revision 4 Revision 5 Serving Knative Serving Building Block!
  46. 46. Configuration Revision 1 Revision 2 Revision 3 Revision 4 Revision 5 Route Serving Knative Serving Building Block!
  47. 47. Configuration Revision 1 Revision 2 Revision 3 Revision 4 Revision 5 Route Serving Knative Serving Building Block!
  48. 48. Configuration Revision 1 Revision 2 Revision 3 Revision 4 Revision 5 Route 5% 95% Serving Knative Serving Building Block!
  49. 49. Route Configuration Revision Creates References References Serving Knative Serving Building Block!
  50. 50. Route Configuration Revision Creates References References Service Knative Serving Building Block!
  51. 51. Route Configuration Revision Creates References References Creates Service Creates Knative Serving Building Block!
  52. 52. ● Build — Source-to-container build orchestration ● Eventing — Universal subscription, binding services to event ecosystems, delivery and management of events ● Serving — Request-driven compute model, scale to zero, autoscaling, routing and managing traffic ● Pipeline — Configure and run CI/CD style pipelines for your kubernetes application. . So Knative uses K8S CRDs to define a new set of APIs ! Build Serving Pipeline Eventing
  53. 53. AIOps Prepared and Analyzed Data Trained Model Deployed Model But do we really need something like Knative to build an ML Platform?! Prepared and Analyzed Data Initial Model Deployed Model
  54. 54. AIOps Prepared and Analyzed Data Trained Model Deployed Model Let`s go through different phases of AI Lifecycle through our use cases…! Many tools available to build initial models! Prepared and Analyzed Data Initial Model Deployed Model
  55. 55. AIOps Prepared and Analyzed Data Trained Model Deployed Model Many tools to train machine learning and deep learning models! Prepared and Analyzed Data Initial Model Deployed Model
  56. 56. AIOps Prepared and Analyzed Data Trained Model Deployed Model We need a multi framework ML- DL cloud native platform ! Prepared and Analyzed Data Initial Model Deployed Model e.g. Tensorflow is awesome , but has static graphs so PyTorch’s dynamic graphs are becoming more popular. How do we handle multiple Open Source Deep Learning frameworks in a consistent way. and leverage the power of cloud for distributed computing?
  57. 57. AIOps Prepared and Analyzed Data Trained Model Deployed Model Enter: Fabric for Deep Learning! Prepared and Analyzed Data Initial Model Deployed Model FfDL
  58. 58. Fabric for Deep Learning https://github.com/IBM/FfDL FfDL Github Page https://github.com/IBM/FfDL FfDL dwOpen Page https://developer.ibm.com/code/open/projects/ fabric-for-deep-learning-ffdl/ FfDL Announcement Blog http://developer.ibm.com/code/2018/03/20/ fabric-for-deep-learning FfDL Technical Architecture Blog http://developer.ibm.com/code/2018/03/20/ democratize-ai-with-fabric-for-deep-learning Deep Learning as a Service within Watson Studio https://www.ibm.com/cloud/deep-learning Research paper: “Scalable Multi-Framework Management of Deep Learning Training Jobs” http://learningsys.org/nips17/assets/papers/ paper_29.pdf •  Fabric for Deep Learning or FfDL (pronounced as ‘fiddle’ aims at making Deep Learning easily accessible to Data Scientists, and AI developers. •  FfDL Provides a consistent way to train and visualize Deep Learning jobs across multiple frameworks like TensorFlow, Caffe, PyTorch, Keras etc. FfDL 58 Community Partners FfDL is one of InfoWorld’s 2018 Best of Open Source Software Award winners for machine learning and deep learning!
  59. 59. Fabric for Deep Learning https://github.com/IBM/FfDL FfDL is built using Microservices architecture on Kubernetes •  FfDL platform uses a microservices architecture to offer resilience, scalability, multi-tenancy, and security without modifying the deep learning frameworks, and with no or minimal changes to model code. •  FfDL control plane microservices are deployed as pods on Kubernetes to manage this cluster of GPU- and CPU- enabled machines effectively •  Tested Platforms: Kube DIND, IBM Cloud Public, IBM Cloud Private, GPUs using both Kubernetes feature gate Accelerators and NVidia device plugins 59
  60. 60. source code training definition Access to elastic compute leveraging Kubernetes Auto-allocation means infrastructure is used only when needed Kubernetes container training artifacts compute cluster NVIDIA Tesla K80, P100, V100 Cloud Object Storage Training assets are managed and tracked. IBM Cloud / Watson and Cloud Platform / © 2018 IBM Corporation 60
  61. 61. NVIDIA GPUs Kubernetes container orchestration training runs containers Model training distributed across containers server cluster dataset Cloud Object Storage 61
  62. 62. OBJECT STORAGE Model Definition 
 Training Data Trained Models REST API Parameter Server Lifecycle Manager Job Monitor Training Data Mongo DB Trainer Service Launch Job Status Job Info " Prometheus Push Gateway Alert Manager Web UI Learner (e.g. TensorFlow, Caffe, PyTorch, Keras etc.) Controller Learner Pod Log Collector Training Job MOUNT OBJECT STORAGE BUCKET Elastic Searc h FfDL: Architecture - Current Release! Open MPI CLIs Browser
  63. 63. AIOps Prepared and Analyzed Data Trained Model Deployed Model Great , but we also need a batch scheduler for AI Workloads. Enter Kube-Batch! Prepared and Analyzed Data Initial Model Deployed Model kube-batch
  64. 64. OBJECT STORAGE Model Definition 
 Training Data Trained Models REST API Parameter Server Job Monitor Mongo DB Trainer Service Launch Job Job Info " Prometheus Push Gateway Alert Manager Web UI Learner (e.g. TensorFlow, Caffe, PyTorch, Keras etc.) Controller Learner Pod Log Collector Training Job MOUNT OBJECT STORAGE BUCKET Elastic Searc h Open MPI CLIs Browser Scheduling Engine Kube- Batch QueueJob Advanced Batch Scheduling with Kube-Batch !Advanced Batch Scheduling with Kube-Batch !
  65. 65. ! ! ! ! ! ! ! ! ! ! ! ! Kube-Batch: Filling gaps for FfDL and AI Workloads! •  Job Queuing for enabling job priority and preemption •  Holistic scheduling •  Support for job dynamic prioritization/budgeting •  User requested time based scheduling •  Preemption/resumption in a supported checkpoint/ restart env •  Topology aware placement (data intensive workloads requiring network topology aware placement) •  Partial preemption of elastic jobs •  Performance estimation •  Kube-Batch –  Kubernetes incubator project –  Introduces batch scheduling in Kubernetes –  Support for simple job definition and lifecycle management –  Support for job queuing –  Introduces queue-based quota management
  66. 66. 12 Watson services/apps represented as 800+ Kubernetes services IBM Watson workloads: Proven AI workload on IBM Cloud Kubernetes Service One deployment example: 3000+ pods on 500+ nodes “We no longer worry about managing the infrastructure because IBM Cloud Kubernetes Service takes care of that for us.” – Watson Project Team Jason McGee / © 2018 IBM Corporation
  67. 67. AIOps Prepared and Analyzed Data Trained Model Deployed Model Training is accomplished. Model is ready – Can we trust it?! Prepared and Analyzed Data Initial Model Deployed Model Can the model be trusted?
  68. 68. Is it fair? Is it easy to understand? Did anyone tamper with it? Is it accountable? #21, #32, #93 #21, #32, #93 What does it take to trust a decision made by a machine?! (Other than that it is 99% accurate)?!
  69. 69. AIOps Prepared and Analyzed Data Trained Model Deployed Model So let`s start with vulnerability detection of Models?! Prepared and Analyzed Data Initial Model Deployed Model Is the model vulnerable to adversarial attacks?
  70. 70. AIOps Prepared and Analyzed Data Trained Model Deployed Model Enter: Adversarial Robustness Toolbox! Prepared and Analyzed Data Initial Model Deployed Model ART
  71. 71. IBM Adversarial Robustness Toolbox ART ART is a library dedicated to adversarial machine learning. Its purpose is to allow rapid crafting and analysis of attack and defense methods for machine learning models. The Adversarial Robustness Toolbox provides an implementation for many state-of-the-art methods for attacking and defending classifiers. 71 https://github.com/IBM/adversarial-robustness- toolbox The Adversarial Robustness Toolbox contains implementations of the following attacks: Deep Fool (Moosavi-Dezfooli et al., 2015) Fast Gradient Method (Goodfellow et al., 2014) Jacobian Saliency Map (Papernot et al., 2016) Universal Perturbation (Moosavi-Dezfooli et al., 2016) Virtual Adversarial Method (Moosavi-Dezfooli et al., 2015) C&W Attack (Carlini and Wagner, 2016) NewtonFool (Jang et al., 2017) The following defense methods are also supported: Feature squeezing (Xu et al., 2017) Spatial smoothing (Xu et al., 2017) Label smoothing (Warde-Farley and Goodfellow, 2016) Adversarial training (Szegedy et al., 2013) Virtual adversarial training (Miyato et al., 2017)
  72. 72. AIOps Prepared and Analyzed Data Trained Model Deployed Model Robustness check accomplished. How do we check for bias throughout lifecycle?! Prepared and Analyzed Data Initial Model Deployed Model Are model weights and classifiers biased? Are predictions biased? Is the dataset biased?
  73. 73. AIOps Prepared and Analyzed Data Trained Model Deployed Model Enter: AI Fairness 360! Prepared and Analyzed Data Initial Model Deployed Model AIF360
  74. 74. AI Fairness 360 https://github.com/IBM/AIF360 AIF360AIF360 toolkit is an open-source library to help detect and remove bias in machine learning models. The AI Fairness 360 Python package includes a comprehensive set of metrics for datasets and models to test for biases, explanations for these metrics, and algorithms to mitigate bias in datasets and models. Toolbox Fairness metrics (30+) Fairness metric explanations Bias mitigation algorithms (9+) 74 Supported bias mitigation algorithms Optimized Preprocessing (Calmon et al., 2017) Disparate Impact Remover (Feldman et al., 2015) Equalized Odds Postprocessing (Hardt et al., 2016) Reweighing (Kamiran and Calders, 2012) Reject Option Classification (Kamiran et al., 2012) Prejudice Remover Regularizer (Kamishima et al., 2012) Calibrated Equalized Odds Postprocessing (Pleiss et al., 2017) Learning Fair Representations (Zemel et al., 2013) Adversarial Debiasing (Zhang et al., 2018) Supported fairness metrics Comprehensive set of group fairness metrics derived from selection rates and error rates Comprehensive set of sample distortion metrics Generalized Entropy Index (Speicher et al., 2018)
  75. 75. (d’Alessandro et al., 2017) AIF 360 detects for fairness in building and deploying models throughout AI Lifecycle!
  76. 76. dataset metric pre- processing algorithm in- processing algorithm post- processing algorithm classifier metric classifier metric explainer dataset metric explainer With Metrics, Algorithms, and Explainers!
  77. 77. AIOps Prepared and Analyzed Data Trained Model Deployed Model Model is trained, tested and validated. Just deploy as microservices on Kubernetes? Do we need anything else?! Prepared and Analyzed Data Initial Model Deployed Model
  78. 78. AIOps Prepared and Analyzed Data Trained Model Deployed Model Model is trained, tested and validated. Just deploy as microservices on Kubernetes? Do we need anything else?! Prepared and Analyzed Data Initial Model Deployed Model q  As I develop multiple versions of my models, how can I easily dark launch and shift traffic? q  How can I add and enforce policies on my model microservices? q  The network among microservices is not reliable. q  How can I monitor and trace my model microservices? q  How can I ensure the communication among microservices is secure?
  79. 79. AIOps Prepared and Analyzed Data Trained Model Deployed Model Enter Istio! Prepared and Analyzed Data Initial Model Deployed Model
  80. 80. An open service mesh platform to connect, observe, secure, and control microservices. Istio!
  81. 81. Connect: Traffic Control, Discovery, Load Balancing, Resiliency Observe: Metrics, Logging, Tracing Secure: Encryption (TLS), Authentication, and Authorization of service-to-service communication Control: Policy Enforcement Istio!
  82. 82. BA call How does it work?
 !
  83. 83. 1. Deploy a proxy (Envoy) beside your application (“sidecar deployment”) Env oy A Envoy Env oy B Envoy call How does it work?
 !
  84. 84. 2. Deploy Pilot to configure the sidecars Envoy Pilot config Env oy A Envoy Env oy B Envoy How does it work?
 !
  85. 85. 3. Deploy Telemetry to get telemetry Envoy Pilot Env oy A Envoy Env oy B Envoy Envoy Telemetry telemetry How does it work?
 !
  86. 86. 4. Deploy Citadel to assign identities and enable secure communication Envoy Pilot Env oy A Envoy Env oy B Envoy Envoy Telemetry Envoy Citadel How does it work?
 !
  87. 87. 5. Deploy Policy to enforce policies Envoy Pilot policy decisions Envoy A Envoy Envoy B Envoy Envoy Policy Envoy Telemetry Envoy Citadel How does it work?
 !
  88. 88. Pod Traffic Routing
  89. 89. 50% 50% Traffic Splitting Pod
  90. 90. user = jason Traffic Steering Pod
  91. 91. 100% Access Policy Pod
  92. 92. And….Istio is part of Knative!!!!
  93. 93. AIOps Model is deployed. We need to ensure triggers and actions can be defined based on any vulnerability detection, dataset changes, bias detection etc… ! Prepared and Analyzed Data Trained Model Deployed Model Prepared and Analyzed Data Initial Model Deployed Model
  94. 94. AIOps Enter Apache OpenWhisk! Prepared and Analyzed Data Trained Model Deployed Model Prepared and Analyzed Data Initial Model Deployed Model
  95. 95. Triggers (response) Rules Actions (code) Source (events) Results OpenWhisk Apache OpenWhisk! FaaS platform to execute code in response to events! Delivered as
 Open source via Apache openwhisk.org
  96. 96. OpenWhisk Apache OpenWhisk!
  97. 97. Event Provider Periodic IBM Cloudant Message Hub Mobile Push Github OpenWhisk IBM App Connect OpenWhisk!
  98. 98. 98 And….OpenWhisk is being developed to work with Knative!!!!
  99. 99. AIOps And last but not the least, what do we use for Pipeline orchestration?! Prepared and Analyzed Data Trained Model Deployed Model Prepared and Analyzed Data Initial Model Deployed Model
  100. 100. 100 •  Currently in IBM we use IBM DevOps toolchain pipeline, built on top of Jenkins •  On open source, we are investigating, Pachyderm, Argo and Airflow. •  Pachyderm and Argo are Dataflow and Workflow orchestrators on Kubernetes •  Both have strong merits, and provide a DevOps Operator view around pipeline execution times, logs, metrics, exit handlers, notification systems etc. •  As discussed, Knative is now evolving its own Pipelining capabilities, though in early stage AI Pipeline Orchestration!
  101. 101. AIOps Prepared and Analyzed Data Trained Model Deployed Model AIOps platform is ready. How do Data Scientists interact with it using Notebooks?! Prepared and Analyzed Data Initial Model Deployed Model Jupyter Enterprise Gateway
  102. 102. Jupyter Enterprise Gateway March 30 2018 / © 2018 IBM Corporation Jupyter Enterprise Gateway at IBM Code https://developer.ibm.com/code/openprojects/jupyter-enterprise-gateway/ Jupyter Enterprise Gateway source code at GitHub https://github.com/jupyter-incubator/enterprise_gateway Jupyter Enterprise Gateway Documentation http://jupyter-enterprise-gateway.readthedocs.io/en/latest/ 102 A lightweight, multi-tenant, scalable and secure gateway that enables Jupyter Notebooks to share resources across an Apache Spark or Kubernetes cluster for Enterprise/Cloud use cases Kernel Kernel Kernel Kernel Kernel KernelKernel
  103. 103. Jupyter Notebooks: Support for FfDL User Experience 103 •  User select where to run the experiment •  Job is packaged and submitted on behalf of user •  User has access to Job Console to monitor experiment
  104. 104. AIOps Prepared and Analyzed Data Trained Model Deployed Model Together: A Transparent, and trusted Open Source AI Pipeline using Knative Prepared and Analyzed Data Initial Model Deployed Model FfDL kube-batch Jupyter Enterprise Gateway MAX AIF360 AIF360 Istio OpenWhisk ART
  105. 105. DEMO 105 CODAIT codait.org
  106. 106. Demo Model: Gender Classification! Data UTKFace Simple convolutional neural network (CNN) with 3 convolutional layers and 2 fully connected layers Male Female SoftMax Confident Score
  107. 107. Demo Flow! Fabric for Deep Learning Data Robustness Check Model Deployment ADVERSARIAL ROBUSTNESS TOOLBOX Fairness Check AI Fairness 360 Model Revision 1 10% 90% Model Revision 2 Gender Classification Model Source to Container: Knative Build ML Code Training
  108. 108. To fully automate the workflow, Knative Pipeline can be used with Knative Eventing! Fabric for Deep Learning Data Robustness Check Model Deployment ADVERSARIAL ROBUSTNESS TOOLBOX Fairness Check AI Fairness 360 Model Revision 1 10% 90% Model Revision 2 Gender Classification Model Source to Container: Knative Build ML Code Training Eventing
  109. 109. Knative Eventing Sources, Channel and Subscription can be defined!
  110. 110. Knative Pipeline Tasks and Flow Definition can be defined!
  111. 111. For this demonstration, we used Argo Workflows for Pipelining!
  112. 112. •  Knative brings serveless actions, events, workflows, apps, containers, service-mesh in one single stack •  Knative eliminates the need for stitching various individual components together (events, actions, service mesh, pipeline etc.) •  Knative Build and Serving are simple to use and integrate with our existing containers. •  Traffic Routing and Canary testing on Knative can be done without any Istio knowledge. •  Built-in monitoring tools are very helpful to visualize the benefits of building services using Knative. •  Knative Eventing shows promise, needs to evolve more for production •  Knative Build-Pipeline is still in very early stage, and changing rapidly. •  Knative Build-Pipeline could get very complicated with many tasks and it needs a better debugging tool/GUI to monitor errors and visualize the pipeline flow. What we learnt!
  113. 113. AIOps Prepared and Analyzed Data Trained Model Deployed Model Bringing it together: A Secure, transparent, and trusted Open Source AI Pipeline Prepared and Analyzed Data Initial Model Deployed Model FfDL kube-batch Jupyter Enterprise Gateway MAX AIF360 AIF360 Istio OpenWhisk ART
  114. 114. Links:! •  Knative: https://github.com/knative •  Fabric for Deep Learning: https://github.com/IBM/FfDL •  Adversarial Robustness Toolbox: https://github.com/IBM/adversarial-robustness-toolbox •  AI Fairness 360: http://aif360.mybluemix.net/ •  Jupyter Enterprise Gateway: https://github.com/jupyter/enterprise_gateway •  Kube-Batch: https://github.com/kubernetes-sigs/kube-batch •  Istio: https://github.com/istio/istio •  OpenWhisk: https://github.com/apache/incubator-openwhisk •  Kiali: https://github.com/kiali/kiali •  Argo: https://github.com/argoproj/argo •  PyTorch: https://pytorch.org/
  115. 115. Prepared and Analyzed Data Trained Model Deployed Model Prepared and Analyzed Data Initial Model Deployed Model @AnimeshSingh @Tomipli Thank You! AI & Machine Learning Pipelines with Knative

×