Practical Machine Learning and Rails (Part 1)
Andrew Cantino
  VP Engineering, Mavenlink    @tectonic

Ryan Stout
  Founder, Agile Productions   @ryanstout
This talk will
- introduce machine learning
- make you ML-aware
- show examples
This talk will not
- give you a PhD
- implement algorithms
- cover collaborative filtering, optimization, clustering, advanced statistics,
  genetic algorithms, classical AI, NLP, ...
What is Machine Learning?
Many different algorithms that predict data from other data using applied statistics.
"Enhance and rotate 20 degrees"
What data?
The web is data.

- User decisions
- APIs
- A/B tests
- Databases
- Logs
- Streams
- Browser versions
- Reviews
- Clicktrails
Okay. We have data.
What do we do with it?

We classify it.
Classification

    :)      OR   :(
Classification
•   Documents
    o   Sort email (Gmail's importance filter)
    o   Route questions to the appropriate expert (Aardvark)
    o   Categorize reviews (Amazon)

•   Users
    o   Expertise; interests; pro vs. free; likelihood of paying;
        expected future karma

•   Events
    o   Abnormal vs. normal
Algorithms:
     Decision Tree Learning

                  Email contains word "viagra"?                     <- Features
                   no /                  \ yes
    Email contains word "Ruby"?      Email contains attachment?
       no /         \ yes               no /         \ yes
   P(Spam)=10%    P(Spam)=5%       P(Spam)=70%    P(Spam)=95%       <- Labels

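
The example tree can be read as a chain of yes/no questions about an email. A minimal Python sketch follows, assuming the three features have already been extracted as booleans; the function name `spam_probability` is made up for illustration, and a real decision tree learner would derive the questions and percentages from training data rather than hard-code them.

```python
def spam_probability(contains_viagra, contains_ruby, has_attachment):
    """Walk the example tree and return P(Spam) at the leaf that is reached."""
    if contains_viagra:
        # "yes" branch: the next feature tested is the attachment.
        return 0.95 if has_attachment else 0.70
    # "no" branch: the next feature tested is the word "Ruby".
    return 0.05 if contains_ruby else 0.10


# An email that mentions "viagra" and has no attachment lands on the 70% leaf.
print(spam_probability(contains_viagra=True, contains_ruby=False, has_attachment=False))
```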
Algorithms:
     Support Vector Machines (SVMs)




                          Graphics from Wikipedia
Algorithms:
           Naive Bayes

•   Break documents into words and treat each
    word as an independent feature

•   Surprisingly effective on simple text and
    document classification

•   Works well when you have lots of data



                                          Graphics from Wikipedia
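
"Break documents into words" can be sketched in a few lines of Python. This is only one possible tokenizer, assuming lowercase ASCII words are good enough; the helper name `word_features` and the sample sentence are hypothetical.

```python
import re
from collections import Counter


def word_features(document):
    """Lowercase the text, split it into words, and count each word."""
    return Counter(re.findall(r"[a-z0-9']+", document.lower()))


features = word_features("Hello, hello! Ruby on Rails meetup tonight.")
print(features["hello"], features["ruby"], features["viagra"])
```

Each distinct word becomes one feature, and Naive Bayes then treats those features as if they were independent of one another.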
Algorithms:
             Naive Bayes

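You received 100 emails, 70 of which were spam (so 30 were ham).

Word      Spam with this word   Ham with this word
viagra    42 (60%)              1 (3.3%)
ruby      7 (10%)               15 (50%)
hello     35 (50%)              24 (80%)

A new email contains hello and viagra. Treating the words as independent,
the probability that it is spam is:
P(S|hello,viagra) = P(S) * P(hello,viagra|S) / P(hello,viagra)
                  = P(S) * P(hello|S) * P(viagra|S) / (P(hello) * P(viagra))
                  = 0.7 * (0.5 * 0.6)        / (0.59 * 0.43)
                  ≈ 83%

where P(hello) = (35 + 24) / 100 = 0.59 and P(viagra) = (42 + 1) / 100 = 0.43.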
                                                      Graphics from Wikipedia
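
The arithmetic on this slide can be reproduced from the raw counts. The sketch below is an illustration, not the speakers' code: `slide_estimate` follows the slide's formula, where the denominator is approximated by the product of the overall word frequencies. `normalized_estimate` is an added comparison (not on the slide) that instead divides by the sum of the spam and ham scores, which is a common way to normalize Naive Bayes and gives a different number.

```python
total, spam, ham = 100, 70, 30
# word -> (spam emails containing it, ham emails containing it)
counts = {"viagra": (42, 1), "ruby": (7, 15), "hello": (35, 24)}


def slide_estimate(words):
    """P(S) * prod P(w|S) / prod P(w), as written on the slide."""
    likelihood, evidence = 1.0, 1.0
    for w in words:
        s, h = counts[w]
        likelihood *= s / spam        # P(w | spam)
        evidence *= (s + h) / total   # P(w) over all emails
    return (spam / total) * likelihood / evidence


def normalized_estimate(words):
    """Compare the spam score with the ham score and normalize the two."""
    spam_score, ham_score = spam / total, ham / total
    for w in words:
        s, h = counts[w]
        spam_score *= s / spam
        ham_score *= h / ham
    return spam_score / (spam_score + ham_score)


print(round(slide_estimate(["hello", "viagra"]), 3))       # 0.828
print(round(normalized_estimate(["hello", "viagra"]), 3))  # 0.963
```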
Algorithms:
               Neural Nets

     Input layer (features)  ->  Hidden layer  ->  Output layer (classification)

                                                      Graphics from Wikipedia
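
To make the three layers concrete, here is a minimal forward pass in plain Python. It is a sketch under stated assumptions: the sigmoid activation, the layer sizes (3 inputs, 2 hidden neurons, 1 output) and all weights are hand-picked for illustration and are not from the talk. A real network would learn its weights from training data.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def layer(inputs, weights, biases):
    """One fully connected layer: a weighted sum per neuron, then sigmoid."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def forward(features, hidden_w, hidden_b, output_w, output_b):
    """Input layer (features) -> hidden layer -> output layer."""
    hidden = layer(features, hidden_w, hidden_b)
    return layer(hidden, output_w, output_b)


# Hypothetical weights: one row of weights per neuron.
hidden_w = [[2.0, -1.0, 0.5], [-1.5, 1.0, 1.0]]
hidden_b = [0.0, 0.0]
output_w = [[1.0, -1.0]]
output_b = [0.0]

output = forward([1.0, 0.0, 1.0], hidden_w, hidden_b, output_w, output_b)
print(output)  # a single value between 0 and 1, read as the classification score
```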
Curse of Dimensionality

The more features and labels you have,
  the more data you need.




       http://www.iro.umontreal.ca/~bengioy/yoshua_en/research_files/CurseDimensionality.jpg
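
One way to see why more features demand more data: with k yes/no features there are 2^k possible feature combinations, and N training examples can cover at most N of them. The sketch below assumes binary features only, and the helper name `max_coverage` is made up for illustration.

```python
def max_coverage(num_examples, num_binary_features):
    """Upper bound on the fraction of feature combinations the data can contain."""
    combinations = 2 ** num_binary_features
    return min(1.0, num_examples / combinations)


# With a fixed 1,000 examples, coverage collapses as features are added.
for k in (5, 10, 20):
    print(k, max_coverage(1000, k))
```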
Overfitting
•   With enough parameters, anything is
    possible.

•   We want our algorithms to generalize and
    infer, not memorize specific training
    examples.

•   Therefore, we test our algorithms on
    different data than we train them on.
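
A small sketch of why held-out data matters, assuming a toy dataset and a deliberately bad "model" that only memorizes its training examples. The names `Memorizer`, `train_test_split` and `accuracy` are hypothetical helpers written for this example, not a library API.

```python
import random


def train_test_split(examples, test_fraction=0.25, seed=0):
    """Shuffle a copy of the data and split it into training and test sets."""
    shuffled = examples[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]


class Memorizer:
    """Stores every training example verbatim and learns no general rule."""

    def fit(self, examples):
        self.table = dict(examples)

    def predict(self, x):
        # Inputs never seen in training always get the same guess: False.
        return self.table.get(x, False)


def accuracy(model, examples):
    return sum(model.predict(x) == y for x, y in examples) / len(examples)


# Toy data: the label is True when the number is even.
data = [(n, n % 2 == 0) for n in range(100)]
train, test = train_test_split(data)

model = Memorizer()
model.fit(train)
train_acc = accuracy(model, train)  # perfect, because every answer was memorized
test_acc = accuracy(model, test)    # only the odd numbers are right, by luck of the default
print(train_acc, test_acc)
```

Scoring the model on its own training data would hide the problem entirely; the held-out set shows that it never learned the even/odd rule.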
