Introduction Machine Learning Dry is all theory: Live Demo SVMs and Kernels Beyond Binary Classification Python integration




             The SHOGUN Machine Learning Toolbox 2.0
                                       (and its python interface)


                Sören Sonnenburg, Gunnar Rätsch, Sebastian Henschel,
             Christian Widmer, Jonas Behr, Alexander Zien, Fabio De Bona,
                  Alexander Binder, Christian Gehl, and Vojtech Franc
            GSoC students: Sergey Lisitsyn, Heiko Strathmann, many more...








What is Shogun?


                               Machine Learning Toolkit
                                      Broad range of ML algorithms (600 classes)
                                      Large-scale algorithms (up to 50 million examples)
                                      Core written in C++ (> 190,000 lines of code)
                                      SWIG bindings (support for 8 target languages)



                               Used in many projects
                                      Gene starts: ARTS [7]
                                      Splice sites: mSplicer [5]
                                      Sensor fusion (private sector)
                                      ...many more (see Google Scholar)!





Architecture




              SWIG - Simplified Wrapper and Interface Generator
              Bindings to a growing number of languages!
              Typemaps!!




Shogun’s history




              Project started 1999
              Early focus on large-scale SVMs and Kernels
              GSoC significantly pushed project forward

Machine Learning - Learning from Data


What is Machine Learning and what can it do for you?


      What is ML?
      AIM: Learning from empirical data!

      Applications
              speech and handwriting recognition
              medical diagnosis, bioinformatics
              computer vision, object recognition
              stock market analysis
              network security, intrusion detection . . .



Support Vector Machines


Support Vector Machines (SVMs)




      SVM primal

      \[
        \min_{w} \;
        \underbrace{\frac{1}{2} \|w\|_2^2}_{\text{regularizer = robustness}}
        + C \sum_{i=1}^{n}
        \underbrace{\max\left(1 - y_i w^\top x_i,\, 0\right)}_{\text{loss = error on train data}}
      \]

              Training: Solve optimization problem
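
      To make the two terms of the primal concrete, here is a minimal NumPy sketch that only evaluates the objective for a given w; it does not solve the optimization problem. The function name, the row-per-example layout, and the toy data are assumptions made for this illustration and are not part of Shogun.

```python
import numpy as np

def svm_primal_objective(w, X, y, C):
    """Value of 0.5 * ||w||^2 + C * sum_i max(1 - y_i * w.x_i, 0).

    X holds one example per row (a choice made for this sketch);
    y contains labels in {-1, +1}.
    """
    margins = y * (X @ w)                    # y_i * w^T x_i for every example
    hinge = np.maximum(1.0 - margins, 0.0)   # per-example hinge loss
    return 0.5 * np.dot(w, w) + C * hinge.sum()

# Toy data: the first point is classified with a large margin,
# the second one lies inside the margin and contributes to the loss.
X = np.array([[2.0, 0.0], [-0.5, 0.0]])
y = np.array([1.0, -1.0])
w = np.array([1.0, 0.0])

objective = svm_primal_objective(w, X, y, C=10.0)
print(objective)
```

      A larger C puts more weight on the training error, a smaller C on the regularizer.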


SVM with Kernels




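      SVM dual

      \[
        \max_{\alpha} \; -\frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n}
          \alpha_i \alpha_j y_i y_j \overbrace{x_i^\top x_j}^{k(x_i, x_j)}
        + \sum_{i=1}^{n} \alpha_i
        \quad \text{s.t.} \quad 0 \le \alpha_i \le C \;\; \forall i \in \{1, \dots, n\}
      \]

              Kernel: Similarity measure; generalization of dot product
              Corresponds to dot product in higher-dimensional space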
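
      The kernel substitution can be sketched in a few lines of NumPy: build a Gram matrix and plug it into the dual objective in its standard form, sum_i alpha_i - 1/2 sum_ij alpha_i alpha_j y_i y_j k(x_i, x_j). The function names and the parameterization exp(-||x - x'||^2 / width) are assumptions of this sketch; it evaluates the objective only and is not a solver.

```python
import numpy as np

def gaussian_gram(X, width):
    """Gram matrix K[i, j] = exp(-||x_i - x_j||^2 / width); rows of X are examples."""
    sq_norms = (X ** 2).sum(axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq_dists, 0.0) / width)

def svm_dual_objective(alpha, y, K):
    """sum_i alpha_i - 0.5 * sum_ij alpha_i alpha_j y_i y_j K[i, j]."""
    ay = alpha * y
    return alpha.sum() - 0.5 * (ay @ K @ ay)

X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
y = np.array([1.0, -1.0, 1.0])
K = gaussian_gram(X, width=2.0)

alpha = np.array([0.5, 0.5, 0.0])   # any point with 0 <= alpha_i <= C
dual_value = svm_dual_objective(alpha, y, K)
print(dual_value)
```

      Replacing gaussian_gram by X @ X.T recovers the linear case, which is the sense in which a kernel generalizes the dot product.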




Demo:

              Support Vector Classification
                     Task: separate 2 clouds of points in 2D

      Simple code example: SVM Training
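      # Module paths as in Shogun 2.0's modular Python interface
      from shogun.Features import RealFeatures, BinaryLabels
      from shogun.Kernel import GaussianKernel
      from shogun.Classifier import LibSVM

      # Wrap NumPy arrays: labels in {-1, +1}, one example per column
      lab = BinaryLabels(labels)
      train_xt = RealFeatures(features)
      # Train a kernel SVM with regularization constant C = 10
      gk = GaussianKernel(train_xt, train_xt, width)
      svm = LibSVM(10.0, gk, lab)
      svm.train()

      # Predict labels for unseen examples
      test_examples = RealFeatures(test_features)
      out = svm.apply(test_examples)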
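
      The slide does not show where features, labels, and width come from. One way to produce the "2 clouds of points in 2D" is sketched below with NumPy only; the cloud centres, sample count, random seed, and the value of width are all hypothetical choices for illustration.

```python
import numpy as np

rng = np.random.RandomState(0)
num_per_class = 50

# One cloud around (-1.5, -1.5) and one around (+1.5, +1.5).
neg = rng.randn(2, num_per_class) - 1.5
pos = rng.randn(2, num_per_class) + 1.5

# Shogun's dense features expect one example per column,
# so the matrix has shape (dimensions, number of examples).
features = np.asfortranarray(np.hstack([neg, pos]))

# Binary labels in {-1, +1}, one per column of `features`.
labels = np.hstack([-np.ones(num_per_class), np.ones(num_per_class)])

width = 2.0  # hypothetical Gaussian kernel width

print(features.shape, labels.shape)
```

      Test data for svm.apply can be drawn the same way with a different seed.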





SVMs and Kernels

              Provides generic interface to 11 SVM solvers
                     Established implementations for solving SVMs with kernels
                     More recent developments: Fast linear SVM solvers

              Kernels for Real-valued Data (in demo)
                     Linear Kernel, Polynomial Kernel, Gaussian Kernel


              String Kernels
                     Applications in Bioinformatics [4, 8, 10]
                     Intrusion Detection

              Heterogeneous Data Sources
                     Combined kernel: $K(x, z) = \sum_{i=1}^{M} \beta_i \cdot K_i(x, z)$
                     $\beta_i$ can be learned using Multiple Kernel Learning [6, 2]
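
      The combined kernel is simply a weighted sum of Gram matrices. A minimal NumPy sketch with fixed weights follows; in Multiple Kernel Learning the weights would be learned, which this sketch does not attempt. The function name, the two base kernels, and the weights are assumptions for illustration.

```python
import numpy as np

def combined_kernel(gram_matrices, betas):
    """Weighted sum K = sum_i beta_i * K_i of precomputed Gram matrices."""
    betas = np.asarray(betas, dtype=float)
    if len(gram_matrices) != len(betas):
        raise ValueError("need exactly one weight per kernel")
    if np.any(betas < 0):
        raise ValueError("weights must be non-negative to keep K a valid kernel")
    return sum(b * K for b, K in zip(betas, gram_matrices))

# Two views of the same three examples (rows of X).
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
K_linear = X @ X.T                 # linear kernel
K_poly = (X @ X.T + 1.0) ** 2      # inhomogeneous polynomial kernel, degree 2

K = combined_kernel([K_linear, K_poly], betas=[0.7, 0.3])
print(K)
```

      With non-negative weights the result stays positive semi-definite, so it can be passed to any kernel method in place of a single kernel.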




Beyond Classification




             (a) GP regression              (b) Structured Output             (c) Multitask Learning


              Regression: Labels are real values (think least squares)
              Structured Output Learning: Predict complex structures
              Multitask Learning: Solve several related problems
              simultaneously




Multitask Learning
      Example: Learn movie user preferences

              Multitask Learning: Jointly learn models for different countries
              Couple related models more strongly




Regularization-based MTL

      Multitask Learning is often implemented using regularization:
              Graph-regularizer: $\sum_{s=1}^{T} \sum_{t=1}^{T} \|w_s - w_t\|^2 A_{s,t}$
                     Keeps model parameters similar
                     Based on given similarity matrix A

              $L_{2,1}$-regularizer: $\|W\|_{2,1} = \sum_{i=1}^{n} \|w_i\|_2$
                     Selects common sub-space
                     Allows any $w_t$ in that sub-space

              Clustered MTL:
                     Unknown task relationship
                     Identifies similar tasks
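
      Both regularizers are easy to evaluate directly. The NumPy sketch below assumes that W stores one task per column and one feature per row, so that w_s in the graph term is a column and w_i in the L2,1 term is a row; this layout and the function names are assumptions of the sketch, not Shogun API.

```python
import numpy as np

def graph_regularizer(W, A):
    """sum_{s,t} A[s, t] * ||w_s - w_t||^2 with one task per column of W."""
    num_tasks = W.shape[1]
    total = 0.0
    for s in range(num_tasks):
        for t in range(num_tasks):
            diff = W[:, s] - W[:, t]
            total += A[s, t] * np.dot(diff, diff)
    return total

def l21_norm(W):
    """Sum of the Euclidean norms of the rows of W (one row per feature)."""
    return np.linalg.norm(W, axis=1).sum()

# Three features, two tasks. The tasks agree on feature 0,
# differ on feature 1, and neither uses feature 2.
W = np.array([[3.0, 3.0],
              [4.0, 0.0],
              [0.0, 0.0]])
A = np.array([[0.0, 1.0],
              [1.0, 0.0]])   # the two tasks are declared similar

graph_penalty = graph_regularizer(W, A)
l21_penalty = l21_norm(W)
print(graph_penalty, l21_penalty)
```

      The graph term only penalizes the feature on which the tasks disagree, while the all-zero row costs nothing under the L2,1 norm, which is how it favours a shared subset of features.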




Multitask Learning:

      MTL Training
          feat, labels = ... # Shogun Data objects

          task_one = Task(0,10)
          task_two = Task(10,20)
          group = TaskGroup()
          group.append_task(task_one)
          group.append_task(task_two)

          mtlr = MultitaskL12(0.1,0.1,feat,labels,group)
          mtlr.train()


              Efficient LibLinear-style solver for graph-regularized SVMs [9]
              10 other MTL methods (based on SLEP [3] / MALSAR [1])




Structured Output Learning




              Complex outputs
              Similar framework, different loss function
              Bundle-methods: state of the art solvers!




Other methods




         (d) Sparse/L1 methods              (e) Gaussian processes                 (f) Dim-reduct




                              ...and much more I can’t talk about!




Python integration




      Python integration
              Serialization
              Matrix integration
              No-copy data wrapping
              Rapid prototyping with directors








Python integration
      Pythonic interaction with Shogun objects
              from numpy import array, float64
              from shogun.Features import RealFeatures  # Shogun 2.0 module path

              # column-major (Fortran-order) float64 matrix
              m_real = array(in_data, dtype=float64, order='F')
              f_real = RealFeatures(m_real)

              # slicing
              print(f_real[0:3, 1])

              # operators
              f_real += f_real
              f_real *= f_real
              f_real -= f_real

              # no copy
              a = RealFeatures()
              a.frombuffer(feats, False)
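
      The order='F' argument matters for the no-copy wrapping: Shogun's C++ core stores matrices column by column, so an array that already has this layout can be handed over without conversion. The NumPy-only sketch below illustrates when a layout conversion copies and when it does not; it makes no Shogun calls, and in_data is a made-up example.

```python
import numpy as np

in_data = [[1.0, 2.0, 3.0],
           [4.0, 5.0, 6.0]]

# Column-major layout: each column (one example) is contiguous in memory.
m_real = np.array(in_data, dtype=np.float64, order='F')

# Asking for Fortran order again returns a view on the same memory ...
same = np.asfortranarray(m_real)
shares_f = np.shares_memory(m_real, same)

# ... whereas a row-major (C-ordered) array has to be copied first.
c_ordered = np.array(in_data, dtype=np.float64, order='C')
converted = np.asfortranarray(c_ordered)
shares_c = np.shares_memory(c_ordered, converted)

print(m_real.flags['F_CONTIGUOUS'], shares_f, shares_c)
```

      Creating arrays in Fortran order up front therefore avoids a hidden copy when large data sets are passed across the language boundary.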




Python integration: Directors

      Simple code example: SVM Training
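      import numpy
      from shogun.Kernel import DirectorKernel   # Shogun 2.0 module paths
      from shogun.Classifier import SVMLight

      # A kernel written in Python; the C++ solver calls back into kernel_function
      class ExampleLinearKernel(DirectorKernel):
          def __init__(self):
              DirectorKernel.__init__(self, True)
          def kernel_function(self, idx_a, idx_b):
              # dot product of the two feature vectors selected by index
              seq1 = self.get_lhs().get_feature_vector(idx_a)
              seq2 = self.get_rhs().get_feature_vector(idx_b)
              return numpy.dot(seq1, seq2)

      k = ExampleLinearKernel()

      svm = SVMLight()
      svm.set_kernel(k)
      svm.train(train_data)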




How to get started



      Dive into Shogun
              Visit our website
              Source on github (fork-me!)
              Documentation available
              Many python examples (> 200)
              Debian Package, MacPorts
              Active Mailing-List







When is SHOGUN for you?



              You want to work with SVMs (11 solvers to choose from)
              You want to work with Kernels (35 different kernels)
              ⇒ Esp.: String Kernels / combinations of Kernels
              You’re interested in recent ML developments (MTL, Structured
              Output)
              You have large-scale computations to do (up to 50 million examples)
              You use one of the following languages:
              Python, Octave/MATLAB, R, Java, C#, Ruby, Lua, C++







Contributors

      Original authors:     Gunnar Raetsch, Soeren Sonnenburg, Christian Widmer,
      Alexander Binder, Alexander Zien, Marius Kloft, Sebastian Henschel, Christian Gehl,
      Jonas Behr.

      Integrated Code:
      Alex Smola (prloqo), Antoine Bordes (LaRank), Thorsten Joachims (SVMLight),
      Chih-Chung Chang and Chih-Jen Lin (LIBSVM), Chih-Jen Lin (LibLinear), Vojtech
      Franc (LibOCAS), Leon Bottou (SGD SVM), Vikas Sindhwani (SVMLin), Jieping Ye
      and Jun Liu (SLEP), Jiayu Zhou and Jieping Ye (MALSAR)

      GSoC alumni:
      Heiko Strathmann (both 2011 and 2012), Sergey Lisitsyn (both 2011 and 2012),
      Chiyuan Zhang (2012), Fernando Iglesias (2012), Viktor Gal (2012), Michal Uricar
      (2012), Jacob Walker (2012), Evgeniy Andreev (2012), Baozeng Ding (2011), Alesis
      Novik (2011), Shashwat Lal Das (2011)





Thank you!



                                   Thank you for your attention!!




      For more information, visit:
              Implementation http://www.shogun-toolbox.org
              More machine learning software http://mloss.org
              Machine Learning Data http://mldata.org






References I

        [1]  Jiayu Zhou, Jianhui Chen, and Jieping Ye.
             User Manual MALSAR: Multi-tAsk Learning via Structural
             Regularization.
             Technical report, Arizona State University, 2012.

        [2]  M. Kloft, U. Brefeld, S. Sonnenburg, P. Laskov, K.R. Müller, and A. Zien.
             Efficient and accurate lp-norm multiple kernel learning.
             Advances in Neural Information Processing Systems, 22(22):997–1005,
             2009.

        [3]  Jun Liu, Shuiwang Ji, and Jieping Ye.
             SLEP: Sparse Learning with Efficient Projections.
             2011.

References II

        [4]  G. Schweikert, A. Zien, G. Zeller, J. Behr, C. Dieterich, C.S. Ong,
             P. Philips, F. De Bona, L. Hartmann, A. Bohlen, et al.
             mGene: Accurate SVM-based gene finding with an application to
             nematode genomes.
             Genome Research, 19(11):2133, 2009.

        [5]  Gabriele Schweikert, Alexander Zien, Georg Zeller, Jonas Behr, Christoph
             Dieterich, Cheng Soon Ong, Petra Philips, Fabio De Bona, Lisa Hartmann,
             Anja Bohlen, Nina Krüger, Sören Sonnenburg, and Gunnar Rätsch.
             mGene: accurate SVM-based gene finding with an application to
             nematode genomes.
             Genome Research, 19(11):2133–43, November 2009.

        [6]  S. Sonnenburg, G. Rätsch, C. Schäfer, and B. Schölkopf.
             Large scale multiple kernel learning.
             The Journal of Machine Learning Research, 7:1565, 2006.

References III

        [7]  S. Sonnenburg, A. Zien, and G. Rätsch.
             ARTS: accurate recognition of transcription starts in human.
             Bioinformatics, 2006.

        [8]  S. Sonnenburg, A. Zien, and G. Rätsch.
             ARTS: accurate recognition of transcription starts in human.
             Bioinformatics, 22(14):e472, 2006.

        [9]  C. Widmer, M. Kloft, N. Görnitz, and G. Rätsch.
             Efficient Training of Graph-Regularized Multitask SVMs.
             In ECML 2012, 2012.

        [10] C. Widmer, J. Leiva, Y. Altun, and G. Rätsch.
             Leveraging Sequence Classification by Taxonomy-based Multitask
             Learning.
             In Research in Computational Molecular Biology, pages 522–534.
             Springer, 2010.

More Related Content

What's hot

Comparison of Semantic Similarity Measures for NDVC Detection Using Semantic ...
Comparison of Semantic Similarity Measures for NDVC Detection Using Semantic ...Comparison of Semantic Similarity Measures for NDVC Detection Using Semantic ...
Comparison of Semantic Similarity Measures for NDVC Detection Using Semantic ...Wesley De Neve
 
PixelCNN, Wavenet, Normalizing Flows - Santiago Pascual - UPC Barcelona 2018
PixelCNN, Wavenet, Normalizing Flows - Santiago Pascual - UPC Barcelona 2018PixelCNN, Wavenet, Normalizing Flows - Santiago Pascual - UPC Barcelona 2018
PixelCNN, Wavenet, Normalizing Flows - Santiago Pascual - UPC Barcelona 2018Universitat Politècnica de Catalunya
 
Deep learning for molecules, introduction to chainer chemistry
Deep learning for molecules, introduction to chainer chemistryDeep learning for molecules, introduction to chainer chemistry
Deep learning for molecules, introduction to chainer chemistryKenta Oono
 
Huge-Scale Molecular Dynamics Simulation of Multi-bubble Nuclei
Huge-Scale Molecular Dynamics Simulation of Multi-bubble NucleiHuge-Scale Molecular Dynamics Simulation of Multi-bubble Nuclei
Huge-Scale Molecular Dynamics Simulation of Multi-bubble NucleiHiroshi Watanabe
 
Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine Learningbutest
 
Multilayer Slides
Multilayer  SlidesMultilayer  Slides
Multilayer SlidesESCOM
 
Transformer Zoo (a deeper dive)
Transformer Zoo (a deeper dive)Transformer Zoo (a deeper dive)
Transformer Zoo (a deeper dive)Grigory Sapunov
 
Fundamental of deep learning
Fundamental of deep learningFundamental of deep learning
Fundamental of deep learningStanley Wang
 
Introduction to Chainer Chemistry
Introduction to Chainer ChemistryIntroduction to Chainer Chemistry
Introduction to Chainer ChemistryPreferred Networks
 
Java and Deep Learning (Introduction)
Java and Deep Learning (Introduction)Java and Deep Learning (Introduction)
Java and Deep Learning (Introduction)Oswald Campesato
 
Faster R-CNN: Towards real-time object detection with region proposal network...
Faster R-CNN: Towards real-time object detection with region proposal network...Faster R-CNN: Towards real-time object detection with region proposal network...
Faster R-CNN: Towards real-time object detection with region proposal network...Universitat Politècnica de Catalunya
 
Introduction to Chainer
Introduction to ChainerIntroduction to Chainer
Introduction to ChainerShunta Saito
 
[Update] PyTorch Tutorial for NTU Machine Learing Course 2017
[Update] PyTorch Tutorial for NTU Machine Learing Course 2017[Update] PyTorch Tutorial for NTU Machine Learing Course 2017
[Update] PyTorch Tutorial for NTU Machine Learing Course 2017Yu-Hsun (lymanblue) Lin
 

What's hot (20)

Lecture11 - neural networks
Lecture11 - neural networksLecture11 - neural networks
Lecture11 - neural networks
 
Comparison of Semantic Similarity Measures for NDVC Detection Using Semantic ...
Comparison of Semantic Similarity Measures for NDVC Detection Using Semantic ...Comparison of Semantic Similarity Measures for NDVC Detection Using Semantic ...
Comparison of Semantic Similarity Measures for NDVC Detection Using Semantic ...
 
PixelCNN, Wavenet, Normalizing Flows - Santiago Pascual - UPC Barcelona 2018
PixelCNN, Wavenet, Normalizing Flows - Santiago Pascual - UPC Barcelona 2018PixelCNN, Wavenet, Normalizing Flows - Santiago Pascual - UPC Barcelona 2018
PixelCNN, Wavenet, Normalizing Flows - Santiago Pascual - UPC Barcelona 2018
 
Deep learning for molecules, introduction to chainer chemistry
Deep learning for molecules, introduction to chainer chemistryDeep learning for molecules, introduction to chainer chemistry
Deep learning for molecules, introduction to chainer chemistry
 
Huge-Scale Molecular Dynamics Simulation of Multi-bubble Nuclei
Huge-Scale Molecular Dynamics Simulation of Multi-bubble NucleiHuge-Scale Molecular Dynamics Simulation of Multi-bubble Nuclei
Huge-Scale Molecular Dynamics Simulation of Multi-bubble Nuclei
 
Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine Learning
 
Multidimensional RNN
Multidimensional RNNMultidimensional RNN
Multidimensional RNN
 
Java and Deep Learning
Java and Deep LearningJava and Deep Learning
Java and Deep Learning
 
Multilayer Slides
Multilayer  SlidesMultilayer  Slides
Multilayer Slides
 
Transformer Zoo (a deeper dive)
Transformer Zoo (a deeper dive)Transformer Zoo (a deeper dive)
Transformer Zoo (a deeper dive)
 
Lecture8 - From CBR to IBk
Lecture8 - From CBR to IBkLecture8 - From CBR to IBk
Lecture8 - From CBR to IBk
 
Transformer Zoo
Transformer ZooTransformer Zoo
Transformer Zoo
 
Arvindsujeeth scaladays12
Arvindsujeeth scaladays12Arvindsujeeth scaladays12
Arvindsujeeth scaladays12
 
Fundamental of deep learning
Fundamental of deep learningFundamental of deep learning
Fundamental of deep learning
 
Introduction to Chainer Chemistry
Introduction to Chainer ChemistryIntroduction to Chainer Chemistry
Introduction to Chainer Chemistry
 
Introduction to Chainer
Introduction to ChainerIntroduction to Chainer
Introduction to Chainer
 
Java and Deep Learning (Introduction)
Java and Deep Learning (Introduction)Java and Deep Learning (Introduction)
Java and Deep Learning (Introduction)
 
Faster R-CNN: Towards real-time object detection with region proposal network...
Faster R-CNN: Towards real-time object detection with region proposal network...Faster R-CNN: Towards real-time object detection with region proposal network...
Faster R-CNN: Towards real-time object detection with region proposal network...
 
Introduction to Chainer
Introduction to ChainerIntroduction to Chainer
Introduction to Chainer
 
[Update] PyTorch Tutorial for NTU Machine Learing Course 2017
[Update] PyTorch Tutorial for NTU Machine Learing Course 2017[Update] PyTorch Tutorial for NTU Machine Learing Course 2017
[Update] PyTorch Tutorial for NTU Machine Learing Course 2017
 

Similar to Shogun Machine Learning Toolkit Demo

Python For Scientists
Python For ScientistsPython For Scientists
Python For Scientistsaeberspaecher
 
L Fu - Dao: a novel programming language for bioinformatics
L Fu - Dao: a novel programming language for bioinformaticsL Fu - Dao: a novel programming language for bioinformatics
L Fu - Dao: a novel programming language for bioinformaticsJan Aerts
 
Numba: Array-oriented Python Compiler for NumPy
Numba: Array-oriented Python Compiler for NumPyNumba: Array-oriented Python Compiler for NumPy
Numba: Array-oriented Python Compiler for NumPyTravis Oliphant
 
Baisc Deep Learning HandsOn
Baisc Deep Learning HandsOnBaisc Deep Learning HandsOn
Baisc Deep Learning HandsOnSean Yu
 
Keras on tensorflow in R & Python
Keras on tensorflow in R & PythonKeras on tensorflow in R & Python
Keras on tensorflow in R & PythonLonghow Lam
 
Deep Learning And Business Models (VNITC 2015-09-13)
Deep Learning And Business Models (VNITC 2015-09-13)Deep Learning And Business Models (VNITC 2015-09-13)
Deep Learning And Business Models (VNITC 2015-09-13)Ha Phuong
 
Deep Dive on Deep Learning (June 2018)
Deep Dive on Deep Learning (June 2018)Deep Dive on Deep Learning (June 2018)
Deep Dive on Deep Learning (June 2018)Julien SIMON
 
Introduction to Deep Learning and Tensorflow
Introduction to Deep Learning and TensorflowIntroduction to Deep Learning and Tensorflow
Introduction to Deep Learning and TensorflowOswald Campesato
 
Monteverdi 2.0 - Remote sensing software for Pleiades images analysis
Monteverdi 2.0 - Remote sensing software for Pleiades images analysisMonteverdi 2.0 - Remote sensing software for Pleiades images analysis
Monteverdi 2.0 - Remote sensing software for Pleiades images analysisotb
 
Secure Kernel Machines against Evasion Attacks
Secure Kernel Machines against Evasion AttacksSecure Kernel Machines against Evasion Attacks
Secure Kernel Machines against Evasion AttacksPluribus One
 
Camp IT: Making the World More Efficient Using AI & Machine Learning
Camp IT: Making the World More Efficient Using AI & Machine LearningCamp IT: Making the World More Efficient Using AI & Machine Learning
Camp IT: Making the World More Efficient Using AI & Machine LearningKrzysztof Kowalczyk
 
Deep Learning and TensorFlow
Deep Learning and TensorFlowDeep Learning and TensorFlow
Deep Learning and TensorFlowOswald Campesato
 
Machine_learning_internship_report_facemaskdetection.pptx
Machine_learning_internship_report_facemaskdetection.pptxMachine_learning_internship_report_facemaskdetection.pptx
Machine_learning_internship_report_facemaskdetection.pptxpratikpatil862906
 
Configuring Mahout Clustering Jobs - Frank Scholten
Configuring Mahout Clustering Jobs - Frank ScholtenConfiguring Mahout Clustering Jobs - Frank Scholten
Configuring Mahout Clustering Jobs - Frank Scholtenlucenerevolution
 
Shogun Machine Learning Toolkit Demo

  • 1. Introduction Machine Learning Dry is all theory: Live Demo SVMs and Kernels Beyond Binary Classification Python integration
      The SHOGUN Machine Learning Toolbox 2.0 (and its Python interface)
      Sören Sonnenburg, Gunnar Rätsch, Sebastian Henschel, Christian Widmer, Jonas Behr, Alexander Zien, Fabio De Bona, Alexander Binder, Christian Gehl, and Vojtech Franc
      GSoC students: Sergey Lisitsyn, Heiko Strathmann, many more...
  • 2. What is Shogun?
      Machine Learning Toolkit
        Broad range of ML algorithms (600 classes)
        Large-scale algorithms (up to 50 million examples)
        Core written in C++ (> 190,000 lines of code)
        SWIG bindings (support for 8 target languages)
      Used in many projects
        Gene starts: ARTS [7]
        Splice sites: mSplicer [5]
        Sensor fusion (private sector)
        ...many more (see Google Scholar)!
  • 3. Architecture
      SWIG: Simplified Wrapper and Interface Generator
      Bindings to a growing number of languages!
      Typemaps!!
  • 4. Shogun's history
      Project started in 1999
      Early focus on large-scale SVMs and kernels
      GSoC significantly pushed the project forward
  • 5. Machine Learning - Learning from Data
      What is Machine Learning and what can it do for you?
      What is ML?
        AIM: Learning from empirical data!
      Applications
        speech and handwriting recognition
        medical diagnosis, bioinformatics
        computer vision, object recognition
        stock market analysis
        network security, intrusion detection
        ...
  • 7. Support Vector Machines
      Support Vector Machines (SVMs)
      SVM primal:
        $$\min_{w} \; \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{n} \max\left(1 - y_i w^\top x_i,\; 0\right)$$
        First term: regularizer = robustness
        Second term: loss = error on train data
      Training: solve the optimization problem
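To make the primal objective on this slide concrete, here is a minimal NumPy sketch that evaluates it for a fixed weight vector. The function name `svm_primal_objective`, the row-per-example layout of `X`, and the toy numbers are illustrative assumptions, not part of Shogun or of the talk.

```python
import numpy as np

def svm_primal_objective(w, X, y, C):
    """Return 0.5 * ||w||^2 + C * sum_i max(1 - y_i * w^T x_i, 0).

    X has shape (n, d) with one example per row (an assumption of this
    sketch); y holds labels in {-1, +1}.
    """
    margins = y * (X @ w)                    # y_i * w^T x_i for every example
    hinge = np.maximum(1.0 - margins, 0.0)   # hinge loss per example
    return 0.5 * np.dot(w, w) + C * hinge.sum()

# Tiny hand-made example (values are illustrative only).
X = np.array([[2.0, 0.0], [0.0, 2.0], [-1.0, 0.0]])
y = np.array([1.0, 1.0, -1.0])
w = np.array([1.0, 0.0])
objective = svm_primal_objective(w, X, y, C=10.0)
```

Only the second example violates the margin here, so the loss term contributes a single hinge penalty scaled by C; a solver would search over `w` to minimize this value.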
  • 9. Support Vector Machines: SVM with Kernels
      SVM dual:
        $$\max_{\alpha} \; \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j \, k(x_i, x_j) \quad \text{s.t.} \quad 0 \le \alpha_i \le C \;\; \forall i \in \{1, \dots, n\}$$
        In the linear case, $k(x_i, x_j) = x_i^\top x_j$.
      Kernel: similarity measure; generalization of the dot product
      Corresponds to a dot product in a higher-dimensional space
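As a rough illustration of the kernelized dual, the sketch below builds a Gaussian kernel matrix with NumPy and evaluates the dual objective for a given `alpha`. The helper names and the parameterization `exp(-||x - z||^2 / width)` are assumptions made for this example; it does not call Shogun.

```python
import numpy as np

def gaussian_kernel_matrix(X, Z, width):
    """k(x, z) = exp(-||x - z||^2 / width); rows of X and Z are examples."""
    sq_dists = (
        (X ** 2).sum(axis=1)[:, None]
        + (Z ** 2).sum(axis=1)[None, :]
        - 2.0 * X @ Z.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq_dists, 0.0) / width)

def svm_dual_objective(alpha, y, K):
    """sum_i alpha_i - 0.5 * sum_ij alpha_i alpha_j y_i y_j K_ij."""
    ay = alpha * y
    return alpha.sum() - 0.5 * ay @ K @ ay

# Illustrative toy data; alpha is an arbitrary point with 0 <= alpha_i <= C.
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
y = np.array([1.0, -1.0, -1.0])
K = gaussian_kernel_matrix(X, X, width=2.0)
alpha = np.array([0.5, 0.25, 0.25])
dual_value = svm_dual_objective(alpha, y, K)
```

Swapping `K` for `X @ X.T` recovers the linear case, which is the sense in which a kernel generalizes the dot product.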
  • 10. Demo: Support Vector Classification
      Task: separate 2 clouds of points in 2D
      Simple code example: SVM Training
        # import path is an assumption; some builds expose these classes
        # under shogun.Features, shogun.Kernel and shogun.Classifier instead
        from modshogun import BinaryLabels, RealFeatures, GaussianKernel, LibSVM

        lab = BinaryLabels(labels)                       # labels in {-1, +1}
        train_xt = RealFeatures(features)                # training examples
        gk = GaussianKernel(train_xt, train_xt, width)   # kernel on training data
        svm = LibSVM(10.0, gk, lab)                      # C = 10.0
        svm.train()
        test_examples = RealFeatures(test_features)
        out = svm.apply(test_examples)                   # predictions on test data
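The demo code leaves `features`, `labels`, `width` and `test_features` undefined. One way to produce them is sketched below with NumPy; the cloud centres, sample counts and the value of `width` are arbitrary choices for illustration. The (dimension, examples) layout reflects Shogun's convention of one example per column in a dense feature matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
n_per_class = 50

def two_clouds(n):
    """Return a (2, 2n) matrix: n points around (-1, -1), n around (+1, +1)."""
    neg = rng.normal(loc=-1.0, scale=0.5, size=(2, n))
    pos = rng.normal(loc=+1.0, scale=0.5, size=(2, n))
    # One example per column, float64, Fortran (column-major) order.
    return np.asfortranarray(np.hstack([neg, pos]), dtype=np.float64)

features = two_clouds(n_per_class)
labels = np.concatenate([-np.ones(n_per_class), np.ones(n_per_class)])
test_features = two_clouds(n_per_class)
width = 2.0  # kernel width; an arbitrary value for this toy problem
```

These arrays can then be handed to the `BinaryLabels` and `RealFeatures` constructors used on the slide.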
  • 11. SVMs and Kernels
      Provides a generic interface to 11 SVM solvers
        Established implementations for solving SVMs with kernels
        More recent developments: fast linear SVM solvers
      Kernels for real-valued data (in demo)
        Linear Kernel, Polynomial Kernel, Gaussian Kernel
      String Kernels
        Applications in Bioinformatics [4, 8, 10]
        Intrusion Detection
      Heterogeneous Data Sources
        Combined kernel: $K(x, z) = \sum_{i=1}^{M} \beta_i \cdot K_i(x, z)$
        The weights $\beta_i$ can be learned using Multiple Kernel Learning [6, 2]
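The combined kernel is a weighted sum of precomputed kernel matrices, which the following NumPy sketch spells out. The function `combined_kernel` and the fixed weights are hypothetical; in Multiple Kernel Learning the weights would be learned rather than set by hand.

```python
import numpy as np

def combined_kernel(kernel_matrices, betas):
    """Return sum_i betas[i] * kernel_matrices[i]."""
    betas = np.asarray(betas, dtype=float)
    if len(kernel_matrices) != len(betas):
        raise ValueError("need exactly one weight per kernel matrix")
    return sum(b * K for b, K in zip(betas, kernel_matrices))

# Two kernels on the same toy data: linear and degree-2 polynomial.
X = np.array([[1.0, 0.0], [0.0, 1.0]])
K_linear = X @ X.T
K_poly = (X @ X.T + 1.0) ** 2
K_comb = combined_kernel([K_linear, K_poly], betas=[0.7, 0.3])
```

With non-negative weights the result is again a valid kernel matrix, so it can be used wherever a single kernel is expected.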
  • 12. Beyond Classification
      Figure panels: (a) GP regression, (b) Structured Output, (c) Multitask Learning
      Regression: labels are real values (think least squares)
      Structured Output Learning: predict complex structures
      Multitask Learning: solve several related problems simultaneously
  • 13. Multitask Learning
      Example: learn movie user preferences
      Multitask Learning: jointly learn models for different countries
      Couple related models more strongly
  • 17. Regularization-based MTL
      Multitask Learning is often implemented using regularization:
      Graph-regularizer: $\sum_{s=1}^{T} \sum_{t=1}^{T} A_{s,t} \|w_s - w_t\|^2$
        Keeps model parameters similar
        Based on a given similarity matrix $A$
      $L_{2,1}$-regularizer: $\|W\|_{2,1} = \sum_{i=1}^{n} \|w_i\|$
        Selects a common sub-space
        Allows any $w_t$ in that sub-space
      Clustered MTL:
        Unknown task relationship
        Identifies similar tasks
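Both regularizers can be computed directly, as in this small NumPy sketch. It assumes the task weight vectors are the columns of `W` and that the $L_{2,1}$ norm sums Euclidean norms over the rows (features) of `W`; the function names and the numbers are illustrative.

```python
import numpy as np

def graph_regularizer(W, A):
    """sum_{s,t} A[s, t] * ||w_s - w_t||^2, with w_t the t-th column of W."""
    num_tasks = W.shape[1]
    total = 0.0
    for s in range(num_tasks):
        for t in range(num_tasks):
            diff = W[:, s] - W[:, t]
            total += A[s, t] * np.dot(diff, diff)
    return total

def l21_norm(W):
    """Sum of the Euclidean norms of the rows of W."""
    return np.linalg.norm(W, axis=1).sum()

# Two tasks, two features; only the first feature is used by either task.
W = np.array([[3.0, 4.0],
              [0.0, 0.0]])
A = np.array([[0.0, 1.0],
              [1.0, 0.0]])   # the two tasks are declared similar
graph_penalty = graph_regularizer(W, A)
group_penalty = l21_norm(W)
```

The graph penalty shrinks as the two task vectors move closer together, while the $L_{2,1}$ penalty charges nothing for the all-zero feature row, which is how it favours a feature sub-space shared by all tasks.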
  • 18. Multitask Learning: MTL Training
        feat, labels = ...  # Shogun data objects
        task_one = Task(0, 10)
        task_two = Task(10, 20)
        group = TaskGroup()
        group.append_task(task_one)
        group.append_task(task_two)
        mtlr = MultitaskL12(0.1, 0.1, feat, labels, group)
        mtlr.train()
      Efficient LibLinear-style solver for Graph-reg SVM [9]
      10 other MTL methods (based on SLEP [3] / MALSAR [1])
  • 19. Structured Output Learning
      Complex outputs
      Similar framework, different loss function
      Bundle methods: state-of-the-art solvers!
  • 20. Other methods
      Figure panels: (d) Sparse/L1 methods, (e) Gaussian processes, (f) Dimensionality reduction
      ...and much more I can't talk about!
  • 21. Python integration
      Serialization
      Matrix integration
      No-copy data wrapping
      Rapid prototyping with directors
  • 22. Python integration
      Pythonic interaction with Shogun objects
        from numpy import array, float64

        m_real = array(in_data, dtype=float64, order='F')  # column-major matrix
        f_real = RealFeatures(m_real)
        # slicing
        print(f_real[0:3, 1])
        # operators
        f_real += f_real
        f_real *= f_real
        f_real -= f_real
        # no copy
        a = RealFeatures()
        a.frombuffer(feats, False)
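The `order='F'` argument and the "no copy" remark both concern memory layout. The NumPy-only sketch below shows what sharing memory means in practice; it does not call Shogun, and whether a particular Shogun wrapper shares the buffer depends on the call used (the slide uses `frombuffer`).

```python
import numpy as np

in_data = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]

# Column-major (Fortran-order) matrix, as created on the slide.
m_real = np.array(in_data, dtype=np.float64, order='F')

# np.asarray returns the same buffer when dtype and layout already match.
view = np.asarray(m_real)
view[0, 0] = 42.0            # the write is visible through m_real as well

# Converting a row-major (C-order) matrix to Fortran order forces a copy.
c_ordered = np.array(in_data, dtype=np.float64)
f_copy = np.asfortranarray(c_ordered)

shares_view = np.shares_memory(view, m_real)
shares_copy = np.shares_memory(c_ordered, f_copy)
```

Creating the array in the layout the library expects from the start is what makes a later no-copy hand-over possible.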
  • 23. Python integration: Directors
      Simple code example: SVM Training
        import numpy

        class ExampleLinearKernel(DirectorKernel):
            def __init__(self):
                DirectorKernel.__init__(self, True)

            def kernel_function(self, idx_a, idx_b):
                # fetch both feature vectors and return their dot product
                seq1 = self.get_lhs().get_feature_vector(idx_a)
                seq2 = self.get_rhs().get_feature_vector(idx_b)
                return numpy.dot(seq1, seq2)

        k = ExampleLinearKernel()
        svm = SVMLight()
        svm.set_kernel(k)
        svm.train(train_data)
  • 24. How to get started
      Dive into Shogun
        Visit our website
        Source on GitHub (fork me!)
        Documentation available
        Many Python examples (> 200)
        Debian package, MacPorts
        Active mailing list
  • 25. When is SHOGUN for you?
      You want to work with SVMs (11 solvers to choose from)
      You want to work with kernels (35 different kernels)
        ⇒ Esp.: string kernels / combinations of kernels
      You're interested in recent ML developments (MTL, Structured Output)
      You have large-scale computations to do (up to 50 million examples)
      You use one of the following languages: Python, Octave/MATLAB, R, Java, C#, Ruby, Lua, C++
  • 26. Contributors
      Original authors: Gunnar Raetsch, Soeren Sonnenburg, Christian Widmer, Alexander Binder, Alexander Zien, Marius Kloft, Sebastian Henschel, Christian Gehl, Jonas Behr.
      Integrated code: Alex Smola (prloqo), Antoine Bordes (LaRank), Thorsten Joachims (SVMLight), Chih-Chung Chang and Chih-Jen Lin (LIBSVM), Chih-Jen Lin (LibLinear), Vojtech Franc (LibOCAS), Leon Bottou (SGD SVM), Vikas Sindhwani (SVMLin), Jieping Ye and Jun Liu (SLEP), Jiayu Zhou and Jieping Ye (MALSAR)
      GSoC alumni: Heiko Strathmann (both 2011 and 2012), Sergey Lisitsyn (both 2011 and 2012), Chiyuan Zhang (2012), Fernando Iglesias (2012), Viktor Gal (2012), Michal Uricar (2012), Jacob Walker (2012), Evgeniy Andreev (2012), Baozeng Ding (2011), Alesis Novik (2011), Shashwat Lal Das (2011)
  • 27. Thank you!
      Thank you for your attention!!
      For more information, visit:
        Implementation: http://www.shogun-toolbox.org
        More machine learning software: http://mloss.org
        Machine Learning Data: http://mldata.org
  • 28. References I
      [1] Jiayu Zhou, Jianhui Chen, and Jieping Ye. User Manual MALSAR: Multi-tAsk Learning via Structural Regularization. Technical report, Arizona State University, 2012.
      [2] M. Kloft, U. Brefeld, S. Sonnenburg, P. Laskov, K.R. Müller, and A. Zien. Efficient and accurate lp-norm multiple kernel learning. Advances in Neural Information Processing Systems, 22(22):997–1005, 2009.
      [3] Jun Liu, Shuiwang Ji, and Jieping Ye. SLEP: Sparse Learning with Efficient Projections. 2011.
  • 29. References II
      [4] G. Schweikert, A. Zien, G. Zeller, J. Behr, C. Dieterich, C.S. Ong, P. Philips, F. De Bona, L. Hartmann, A. Bohlen, et al. mGene: Accurate SVM-based gene finding with an application to nematode genomes. Genome Research, 19(11):2133, 2009.
      [5] Gabriele Schweikert, Alexander Zien, Georg Zeller, Jonas Behr, Christoph Dieterich, Cheng Soon Ong, Petra Philips, Fabio De Bona, Lisa Hartmann, Anja Bohlen, Nina Krüger, Sören Sonnenburg, and Gunnar Rätsch. mGene: accurate SVM-based gene finding with an application to nematode genomes. Genome Research, 19(11):2133–43, November 2009.
      [6] S. Sonnenburg, G. Rätsch, C. Schäfer, and B. Schölkopf. Large scale multiple kernel learning. The Journal of Machine Learning Research, 7:1565, 2006.
  • 30. References III
      [7] S. Sonnenburg, A. Zien, and G. Rätsch. ARTS: accurate recognition of transcription starts in human. Bioinformatics, 2006.
      [8] S. Sonnenburg, A. Zien, and G. Rätsch. ARTS: accurate recognition of transcription starts in human. Bioinformatics, 22(14):e472, 2006.
      [9] C. Widmer, M. Kloft, N. Görnitz, and G. Rätsch. Efficient Training of Graph-Regularized Multitask SVMs. In ECML 2012, 2012.
      [10] C. Widmer, J. Leiva, Y. Altun, and G. Raetsch. Leveraging Sequence Classification by Taxonomy-based Multitask Learning. In Research in Computational Molecular Biology, pages 522–534. Springer, 2010.