Machine Learning, hype or hit?

Machine Learning 101, as presented at SAP TechEd Barcelona, Nov 2016

  1. ANP126 Machine Learning: Hype or Hit? Fred Verheul
  2. Agenda 1. Introduction: Hype or Hit?! 2. Machine Learning 1. Demo, SAP ICN 2. Skill set for aspiring ML experts 3. Take-aways
  3. Agenda 1. Introduction: Hype or Hit?! 2. Machine Learning 1. Demo, SAP ICN 2. Skill set for aspiring ML experts 3. Take-aways
  4. Machine Learning: “Field of study that gives computers the ability to learn without being explicitly programmed” (Arthur Samuel, 1959)
  5. What is Machine Learning? Traditional Programming: Data + Program → Computer → Output. Machine Learning: Data + Output → Computer → Program.
  6. Examples: Recommender systems
  7. Examples: Natural Language Processing (Siri, Google Translate)
  8. Examples, continued… Spam filtering, Handwriting recognition
  9. ML in the news: IBM Watson
  10. ML in the news: DeepMind’s AlphaGo
  11. ML in the news: business example
  12. Vendor Platforms…
  13. Tricking a neural network… A cat! Surely also a cat?! More examples and explanation by Julia Evans (@b0rk)
  14. Machine Learning gone wrong
  15. Data Mining Fail (by Carina C. Zona)
  16. Prediction is hard…
  17. Agenda 1. Introduction: Hype or Hit?! 2. Machine Learning 1. Demo, SAP ICN 2. Skill set for aspiring ML experts 3. Take-aways
  18. CRISP-DM: data mining process (two steps annotated “ML important”)
  19. Data: terminology: feature, target / label, instance
  20. Examples of ML tasks. Supervised learning: Regression → target is numeric; Classification → target is categorical. Unsupervised learning: Clustering, Dimensionality reduction
  21. Exploratory Data Analysis
  22. Data preparation • Data Cleaning • Missing Data • Feature Engineering • Normalization • Categorical data → Numerical features • Log-based features or target • Date/time-related features • Combine features, e.g. by +, -, x, /
  23. Modeling: so many algorithms…
  24. ML Algorithms: by Representation. Collection of candidate models/programs, aka hypothesis space. Examples: Decision trees, Instance-based, Neural networks, Model ensembles
  25. ML Algorithms: by Evaluation. Evaluation: quality measure for a model. Regression, example metric: Root Mean Squared Error, $\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(\hat{y}_i - y_i)^2}$, with $\hat{y}_i$ the predicted and $y_i$ the actual target. Binary classification: confusion matrix of true class (P / N) against predicted class (Positive / Negative), with cells True Positives (TP), False Negatives (FN), False Positives (FP) and True Negatives (TN). Example: medical test for a disease, with TP = 8, FN = 2, FP = 19, TN = 971. Accuracy: (8 + 971) / 1000 = 97.9%. Better evaluation metrics: • Precision: 8 / (8 + 19) • Recall: 8 / (8 + 2)
  26. ML Algorithms: by Optimization. Optimization: how the algorithm ‘learns’; depends on representation and evaluation. Examples: Greedy Search (an example of combinatorial optimization), Gradient Descent (or in general: Convex Optimization), Linear Programming (or in general: Constrained/Nonlinear Optimization)
  27. Algorithms by Evaluation: Heuristics • Hill climbing • Simulated Annealing • Nelder-Mead Simplex Method • Artificial Bee Colony Optimization • Genetic Algorithms • Particle Swarm Optimization • Ant Colony Optimization
  28. Choice of ML-algorithm, considerations • Size & Dimensionality of training set • Computational efficiency • Model building, no. of parameters • Eager vs lazy learning • Online vs batch • Interpretability
  29. Evaluation: training vs test data. 5-fold cross validation
  30. Training error vs test error
  31. Overfitting
  32. Distance metrics, illustrated for the plotted points P = (2,2) and Q = (6,5). Euclidean distance ($L_2$-norm): $\lVert P - Q \rVert_2 = \sqrt{(x_P - x_Q)^2 + (y_P - y_Q)^2}$, the length of the line through (2,2) and (6,5). Manhattan distance ($L_1$-norm): $\lVert P - Q \rVert_1 = |x_P - x_Q| + |y_P - y_Q|$, the length of the line y = 2 (between 2 and 6) plus the length of the vertical line x = 6 (between 2 and 5). Chebyshev distance ($L_\infty$-norm): $\lVert P - Q \rVert_\infty = \max(|x_P - x_Q|, |y_P - y_Q|)$, the number of moves of a King on a chessboard ;-) Many more: Cosine distance, Edit distance (aka Levenshtein distance), …
  33. Agenda 1. Introduction: Hype or Hit?! 2. Machine Learning 1. Demo, SAP ICN 2. Skill set for aspiring ML experts 3. Take-aways
  34. Agenda 1. Introduction: Hype or Hit?! 2. Machine Learning 1. Demo, SAP ICN 2. Skill set for aspiring ML experts 3. Take-aways
  35. So you want to be a Data Scientist?
  36. CRISP-DM: data mining process
  37. Hacking skills • Programming languages: • Libraries (examples): • TensorFlow, Caffe, Theano, Keras • SciPy & scikit-learn • Spark MLlib (Scala/Java/Python)
  38. Math skills: Statistics. Source: http://xkcd.com/552/
  39. More math skills that may be needed… Calculus, Linear Algebra
  40. Data Science for Business • Focuses more on general principles than specific algorithms • Not math-heavy, does contain some math • O’Reilly link: http://shop.oreilly.com/product/0636920028918.do • Book website: http://data-science-for-biz.com/DSB/Home.html
  41. Agenda 1. Introduction: Hype or Hit?! 2. Machine Learning 1. Demo, SAP ICN 2. Skill set for aspiring ML experts 3. Take-aways
  42. What has NOT been covered • Deep learning / Neural Networks • Specifics of ML-algorithms • Tools / Libraries / Code • SAP Products, like HANA / Predictive Analytics / Vora / … • Hardware • …
  43. Take-aways • Goal of ML: generalize from training data (not optimization!!) • Part of ‘Data Mining Process’, not a goal in and of itself • No magic! Just some clever algorithms… • Increasingly important non-technical aspects: • Ethics • Algorithmic transparency
  44. Thank You www.soapeople.com info@soapeople.com @SOAPEOPLE Fred Verheul Big Data Consultant +31 6 3919 2986 fred.verheul@soapeople.com
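
Worked example for slide 22 (data preparation): a minimal sketch of three of the listed steps, namely turning categorical data into numerical features, normalization, and a log-based feature. The deck does not show code, so the function names and the sample values below are hypothetical, and min-max scaling is only one of several possible normalizations.

```python
import math


def one_hot(values):
    """Encode categorical values as 0/1 columns, one per category."""
    categories = sorted(set(values))
    encoded = [[1 if v == c else 0 for c in categories] for v in values]
    return encoded, categories


def min_max_normalize(values):
    """Rescale numeric values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature carries no information; map it to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def log_feature(values):
    """Log-based feature; log1p keeps zero values well defined."""
    return [math.log1p(v) for v in values]


# Hypothetical sample data, for illustration only.
colours = ["red", "green", "red", "blue"]
encoded, categories = one_hot(colours)
print(categories)                       # ['blue', 'green', 'red']
print(encoded)                          # [[0, 0, 1], [0, 1, 0], [0, 0, 1], [1, 0, 0]]
print(min_max_normalize([10, 20, 30]))  # [0.0, 0.5, 1.0]
print(log_feature([0]))                 # [0.0]
```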
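
Worked example for slide 25 (evaluation metrics): a small sketch that recomputes accuracy, precision and recall from the confusion-matrix counts in the medical-test example (8 true positives, 2 false negatives, 19 false positives, 971 true negatives), plus a plain RMSE helper. The function names are illustrative, not taken from the deck.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - a) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def classification_metrics(tp, fn, fp, tn):
    """Accuracy, precision and recall from confusion-matrix counts."""
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
    }


# Counts from the medical-test example on slide 25.
metrics = classification_metrics(tp=8, fn=2, fp=19, tn=971)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
# accuracy: 0.979
# precision: 0.296
# recall: 0.800
```

The gap between 97.9% accuracy and roughly 30% precision is why the slide calls precision and recall the better metrics here: with only 10 diseased cases in 1,000, a test can be "accurate" mostly by predicting negative.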
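
Worked example for slide 29 (5-fold cross validation): a sketch of how the data can be split so that every instance is used for testing exactly once. The helper name `k_fold_indices` is hypothetical, and the split is deliberately unshuffled to keep it easy to follow; in practice the indices are usually shuffled first.

```python
def k_fold_indices(n_samples, k=5):
    """Yield (train_indices, test_indices) for each of k folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    # Spread the remainder so that fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


for fold, (train, test) in enumerate(k_fold_indices(10, k=5), start=1):
    print(f"fold {fold}: test={test}, train size={len(train)}")
```

A model is trained on each `train` part and scored on the matching `test` part; averaging the five scores gives an estimate of the test error.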
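
Worked example for slide 32 (distance metrics): a sketch computing the three distances for the points (2,2) and (6,5) that appear in the slide's plot. The function names are illustrative.

```python
import math


def euclidean(p, q):
    """L2 norm of p - q: straight-line distance."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def manhattan(p, q):
    """L1 norm of p - q: sum of the per-axis distances."""
    return sum(abs(a - b) for a, b in zip(p, q))


def chebyshev(p, q):
    """L-infinity norm of p - q: largest per-axis distance."""
    return max(abs(a - b) for a, b in zip(p, q))


P, Q = (2, 2), (6, 5)
print(euclidean(P, Q))  # 5.0
print(manhattan(P, Q))  # 7
print(chebyshev(P, Q))  # 4
```

The Chebyshev result matches the chessboard picture: a King needs 4 moves to get from (2,2) to (6,5), because a diagonal move covers both axes at once.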