Conditional Trees
or
Unbiased recursive partitioning
A conditional inference framework

Christoph Molnar
Supervisor: Stephanie Möst

Department of Statistics, LMU

18 December 2012
Overview


   Introduction and Motivation

   Algorithm for unbiased trees

   Conditional inference with permutation tests

   Examples

   Properties

   Summary




CART trees



      Model: Y = f (X )
      Structure of decision trees
      Recursive partitioning of covariable space X
      Split optimizes criterion (Gini, information gain, sum of squares) depending on scale of Y
      Split point search: exhaustive search procedure
      Avoid overfitting: Early stopping or pruning
      Usage: prediction and explanation
      Other tree types: ID3, C4.5, CHAID, . . .




What are conditional trees?




       Special kind of trees
       Recursive partitioning with binary splits and early stopping
       Constant models in terminal nodes
       Variable selection, early stopping and split point search based
       on conditional inference
       Uses permutation tests for inference
       Solves problems of CART trees




Why conditional trees?



   Helps to overcome problems of trees:
       Overfitting (can be solved with other techniques as well)
       Selection bias towards covariables with many possible splits (e.g. numeric or multi-categorial)
       Difficult interpretation due to selection bias
       Variable selection: no concept of statistical significance
       Not all scales of Y and X covered (ID3, C4.5, ...)




Simulation: selection bias



       Variable selection is unbiased ⇔ the probability of selecting a covariable
       that is independent of Y is the same for all independent covariables
       The measurement scale of a covariable shouldn't play a role
       Simulation illustrating the selection bias (an R sketch follows below):
       Y ∼ N(0, 1)
       X1 ∼ M(n, (1/2, 1/2))
       X2 ∼ M(n, (1/3, 1/3, 1/3))
       X3 ∼ M(n, (1/4, 1/4, 1/4, 1/4))
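A minimal sketch of this simulation using rpart, assuming n = 100 observations and 1000 replications (the slides do not state the exact settings, so the resulting frequencies will only roughly match the next slide):

library(rpart)

set.seed(1)
first_split <- replicate(1000, {
  n <- 100
  d <- data.frame(
    y  = rnorm(n),                               # response, pure noise
    x1 = factor(sample(1:2, n, replace = TRUE)), # 2 categories
    x2 = factor(sample(1:3, n, replace = TRUE)), # 3 categories
    x3 = factor(sample(1:4, n, replace = TRUE))  # 4 categories
  )
  fit <- rpart(y ~ ., data = d)
  ## variable used in the first split, or "none" for a root-only tree
  if (nrow(fit$frame) > 1) as.character(fit$frame$var[1]) else "none"
})
prop.table(table(first_split))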




Simulation: results
       Selection frequencies for the first split:
       X1: 0.128, X2: 0.302, X3: 0.556, none: 0.014
       [bar chart of these selection frequencies for X1, X2, X3 and "none"]
       Strongly biased towards variables with many possible splits
       Example of a tree:
       [tree diagram: root splits on x3 = 1,2; its "yes" branch is a leaf (−0.19),
        the "no" branch splits on x2 = 1,3 into leaves −0.098 and 0.36]
       Overfitting! (Note: complexity parameter not cross-validated)
       Desirable here: no split at all
       Problem source: exhaustive search through all variables and all possible split points
       Numeric/multi-categorial covariables have more split options ⇒ multiple comparison problem
Idea of conditional trees




       Variable selection and search for split point ⇒ two steps
       Embed all decisions into hypothesis tests
       All tests with conditional inference (permutation tests)




Ctree algorithm



    1   Stop criterion
            Test the global null hypothesis H0 of independence between Y and all Xj, with
            H0 = ∩_{j=1}^m H0^j and H0^j : D(Y|Xj) = D(Y)
            If H0 is not rejected ⇒ stop
    2   Select the variable Xj* with the strongest association
    3   Search the best split point for Xj* and partition the data
    4   Repeat steps 1), 2) and 3) for both of the new partitions
How can we test hypothesis of independence?




        Parametric tests depend on distribution assumptions
        Problem: unknown conditional distribution
        D(Y|X) = D(Y|X1, ..., Xm) = D(Y|f(X1), ..., f(Xm))
        Need for a general framework which can handle arbitrary scales
   Let the data speak ⇒ permutation tests!




Excursion: permutation tests




Permutation tests: simple example


      Possible treatments for a disease: A or B
      Numeric measurement (blood value)
      Question: Do blood values differ between treatments A and B? ⇔ µA = µB?
      Test statistic: T0 = µ̂A − µ̂B
      H0 : µA − µB = 0,   H1 : µA − µB ≠ 0
      Distribution unknown ⇒ permutation test
      [dot plot of the measurements, coloured by treatment A or B]
      T0 = µ̂A − µ̂B = 2.06 − 1.2 = 0.86
Permute

  Original data:

          B     B     B     B     A     A     A     A     B     A
          0.5   0.9   1.1   1.2   1.5   1.9   2.0   2.1   2.3   2.8

  One possible permutation:

          B     B     B     B     A     A     A     A     B     A
          2.8   2.3   1.1   1.9   1.2   2.1   1.5   0.5   0.9   2.0

      Permute the labels (A and B) relative to the numeric measurements
      Calculate the test statistic T for each permutation
      Do this for all possible permutations
      Result: the distribution of the test statistic, conditioned on the sample
P-value and decision


                     k = #{permutation samples : |µ̂A,perm − µ̂B,perm| > |µ̂A − µ̂B|}
                     p-value = k / #permutations
                     p-value < α = 0.05? ⇒ If yes, H0 can be rejected
      [density plot of the permutation distribution of the difference of means
       per treatment, with the original test statistic marked]
General algorithm for permutation tests


        Requirement: under H0, response and covariables are exchangeable
        Do the following:
          1   Calculate the test statistic T0
          2   Calculate the test statistic T for all permutations of the pairs (Y, X)
          3   Compute n_extreme: the number of T that are more extreme than T0
          4   p-value p = n_extreme / n_permutations
          5   Reject H0 if p < α, with significance level α
        If the number of possible permutations is too big, draw random permutations
        in step 2 (Monte Carlo sampling); see the sketch below
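A minimal R sketch of the Monte Carlo variant, using the toy data from the previous slides (10,000 random permutations is an assumption, not a value from the slides):

set.seed(42)
treatment <- c("B", "B", "B", "B", "A", "A", "A", "A", "B", "A")
blood     <- c(0.5, 0.9, 1.1, 1.2, 1.5, 1.9, 2.0, 2.1, 2.3, 2.8)

diff_means <- function(labels, values) {
  mean(values[labels == "A"]) - mean(values[labels == "B"])
}

T0 <- diff_means(treatment, blood)   # 2.06 - 1.2 = 0.86, as on the slide
## shuffling the labels is equivalent to permuting the pairs
T_perm <- replicate(10000, diff_means(sample(treatment), blood))

## two-sided Monte Carlo p-value
mean(abs(T_perm) >= abs(T0))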




Framework by Strasser and Weber



       General test statistic:
       Tj(Ln, w) = vec( Σ_{i=1}^n wi gj(Xij) h(Yi, (Y1, ..., Yn))^T ) ∈ R^{pj·q}
       h is called the influence function, gj is a transformation of Xj
       Choose gj and h depending on the scale
       It is possible to calculate µ and Σ of T
       Standardized test statistic: c(t, µ, Σ) = max_{k=1,...,pq} |(t − µ)_k| / sqrt((Σ)_kk)
       Why so complex? ⇒ To cover all cases: multi-categorial X or Y, different scales
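A sketch of the framework in the simplest case: one numeric covariable and a numeric response, with g and h the identity and all weights wi = 1. The closed forms for µ and Σ below are the univariate case of the formulas on the appendix slide; the variable names are illustrative, not from the party implementation:

set.seed(1)
n <- 50
x <- rnorm(n)
y <- 0.5 * x + rnorm(n)                 # some association with x

T_stat <- sum(x * y)                    # Tj = sum_i wi g(xi) h(yi)
mu     <- sum(x) * mean(y)              # E(Tj | permutations of y)
V_h    <- mean((y - mean(y))^2)         # V(h | permutations of y)
Sigma  <- n / (n - 1) * V_h * sum(x^2) - V_h * sum(x)^2 / (n - 1)
c_stat <- abs(T_stat - mu) / sqrt(Sigma)

## Monte Carlo permutation p-value for the standardized statistic
T_perm <- replicate(10000, sum(x * sample(y)))
p_value <- mean(abs(T_perm - mu) >= abs(T_stat - mu))
c(c = c_stat, p = p_value)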




End of excursion
Let's get back to business




Ctree algorithm with permutation tests


     1   Stop criterion
             Test the global null hypothesis H0 of independence between Y and all Xj, with
             H0 = ∩_{j=1}^m H0^j and H0^j : D(Y|Xj) = D(Y) (permutation tests for each Xj)
             If H0 is not rejected (no significance for any Xj) ⇒ stop
     2   Select the variable Xj* with the strongest association (smallest p-value)
     3   Search the best split point for Xj* (maximal test statistic c) and partition the data
     4   Repeat steps 1), 2) and 3) for both of the new partitions; a toy implementation
         of this loop is sketched below
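A toy end-to-end version of this loop for a numeric response and numeric covariables, with Monte Carlo permutation p-values. This is a didactic sketch, not the party implementation; the defaults (α = 0.05, 1000 permutations, minimum node size 10) and all function names are assumptions:

perm_pvalue <- function(x, y, nperm = 1000) {
  ## p-value for H0^j: D(y|x) = D(y), via the centered linear statistic
  T0 <- abs(sum(x * y) - sum(x) * mean(y))
  Tp <- replicate(nperm, abs(sum(x * sample(y)) - sum(x) * mean(y)))
  mean(Tp >= T0)
}

best_split <- function(x, y) {
  ## standardized statistic c for every candidate split x <= a vs. x > a
  cands <- head(unique(sort(x)), -1)
  cs <- sapply(cands, function(a) {
    g <- as.numeric(x <= a)
    n <- length(y)
    Vh <- mean((y - mean(y))^2)
    Sigma <- n / (n - 1) * Vh * sum(g^2) - Vh * sum(g)^2 / (n - 1)
    abs(sum(g * y) - sum(g) * mean(y)) / sqrt(Sigma)
  })
  cands[which.max(cs)]
}

grow <- function(data, yname, alpha = 0.05, minn = 10, depth = 0) {
  y <- data[[yname]]
  xvars <- setdiff(names(data), yname)
  pvals <- sapply(xvars, function(v) perm_pvalue(data[[v]], y))
  ## step 1: global test with Bonferroni adjustment; stop if not significant
  if (nrow(data) < minn || min(p.adjust(pvals, "bonferroni")) > alpha) {
    cat(strrep("  ", depth), "leaf: mean =", round(mean(y), 3), "\n")
    return(invisible(NULL))
  }
  v <- xvars[which.min(pvals)]          # step 2: strongest association
  a <- best_split(data[[v]], y)         # step 3: best split point
  cat(strrep("  ", depth), v, "<=", round(a, 3), "\n")
  grow(data[data[[v]] <= a, ], yname, alpha, minn, depth + 1)  # step 4
  grow(data[data[[v]] >  a, ], yname, alpha, minn, depth + 1)
}

## example: only x1 carries signal, x2 is noise
set.seed(7)
d <- data.frame(x1 = runif(80), x2 = runif(80))
d$y <- ifelse(d$x1 > 0.5, 1, 0) + rnorm(80, sd = 0.3)
grow(d, "y")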



Permutation tests for stop criterion




        Choose an influence function h for Y
        Choose a transformation function g for each Xj
        Test each variable Xj separately for association with Y
        (H0^j : D(Y|Xj) = D(Y), i.e. variable Xj has no influence on Y)
        Global H0 = ∩_{j=1}^m H0^j : no variable has influence on Y
        Testing the global H0 means multiple testing ⇒ adjust α (Bonferroni
        correction, ...); a small sketch of the decision rule follows below
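Sketch of the resulting stop decision in R; the per-variable p-values are made up for illustration:

pvals <- c(x1 = 0.30, x2 = 0.04, x3 = 0.21)   # permutation p-values per Xj
alpha <- 0.05
## reject the global H0 only if some Bonferroni-adjusted p-value is below alpha
any(p.adjust(pvals, method = "bonferroni") < alpha)   # FALSE here => stop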


Permutation tests for variable selection




        Choose the variable with the smallest p-value for the split
        Note: switching to a comparison of p-values removes the scaling problem




Test statistic for best split point




        Use a test statistic instead of Gini/SSE for the split point search
        Tj^A(Ln, w) = vec( Σ_{i=1}^n wi I(Xji ∈ A) · h(Yi, (Y1, ..., Yn))^T )
        Standardized test statistic: c = max_k |(Tj^A − µ)_k| / sqrt((Σ)_kk)
        Measures the discrepancy between {Yi | Xji ∈ A} and {Yi | Xji ∉ A}
        Calculate c for all possible splits; choose the split point with maximal c
        Covers different scales of Y and X; a small demonstration follows below
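A small demonstration for a numeric covariable, assuming unit weights and A = {x : x ≤ a}, i.e. g = I(x ≤ a); it mirrors best_split() from the earlier sketch:

set.seed(3)
x <- runif(100)
y <- ifelse(x > 0.6, 2, 0) + rnorm(100, sd = 0.5)   # change point at 0.6

c_of_split <- function(a) {
  g <- as.numeric(x <= a)
  n <- length(y)
  Vh <- mean((y - mean(y))^2)
  Sigma <- n / (n - 1) * Vh * sum(g^2) - Vh * sum(g)^2 / (n - 1)
  abs(sum(g * y) - sum(g) * mean(y)) / sqrt(Sigma)
}

cands <- head(unique(sort(x)), -1)     # candidate split points
cs <- sapply(cands, c_of_split)
cands[which.max(cs)]                   # should lie close to 0.6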
Usage examples with R
  - Let’s get the party started -




Bodyfat: example for continuous regression
       Example: bodyfat data
       Predict body fat with anthropometric measurements
       Data: Measurements of 71 healthy women
       Response Y : body fat measured by DXA (numeric)
       Covariables X : different body measurements (numeric)
       For example: waist circumference, breadth of the knee, ...
        h(Yi) = Yi
        g(Xi) = Xi
        Tj(Ln, w) = Σ_{i=1}^n wi Xij Yi

        c = |t − µ| / σ ∝ ( Σ_{i∈node} Xij Yi − n_node X̄j Ȳ ) /
            sqrt( Σ_{i∈node} (Yi − Ȳ)² · Σ_{i∈node} (Xij − X̄j)² )
        (the Pearson correlation coefficient)
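A quick numerical check of this proportionality: with unit weights, c works out to sqrt(n − 1) times the absolute Pearson correlation (toy data, illustrative names):

set.seed(2)
n <- 30
x <- rnorm(n)
y <- 1 + 0.8 * x + rnorm(n)

T_stat <- sum(x * y)
mu     <- sum(x) * mean(y)
Vh     <- mean((y - mean(y))^2)
Sigma  <- n / (n - 1) * Vh * sum(x^2) - Vh * sum(x)^2 / (n - 1)

c(c_stat         = abs(T_stat - mu) / sqrt(Sigma),
  pearson_scaled = sqrt(n - 1) * abs(cor(x, y)))   # identical values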
Bodyfat: R-code




   library("party")
   library("rpart")
   library("rpart.plot")
data(bodyfat, package = "mboost")  # in current package versions: package = "TH.data"
   ## conditional tree
   cond_tree <- ctree(DEXfat ~ ., data = bodyfat)
   ## normal tree
   classic_tree <- rpart(DEXfat ~ ., data = bodyfat)




Bodyfat: conditional tree

       plot(cond_tree)
       [tree plot; terminal nodes show boxplots of DEXfat. Structure:
        1) hipcirc, p < 0.001
           <= 108: 2) anthro3c, p < 0.001
              <= 3.76: 3) anthro3c, p = 0.001
                 <= 3.39: Node 4 (n = 13)
                 >  3.39: Node 5 (n = 12)
              >  3.76: 6) waistcirc, p = 0.003
                 <= 86: Node 7 (n = 13)
                 >  86: Node 8 (n = 7)
           >  108: 9) kneebreadth, p = 0.006
              <= 10.6: Node 10 (n = 19)
              >  10.6: Node 11 (n = 7)]
Bodyfat: CART tree

   rpart.plot(classic_tree)



   [tree plot; leaves show predicted DEXfat. Structure:
    waistcir < 88?
       yes: anthro3c < 3.4?
          yes: 17
          no:  hipcirc < 101?
             yes: 23
             no:  30
       no:  hipcirc < 110?
          yes: 35
          no:  45]

   ⇒ Structurally different trees!
Glaucoma: example for classification

      Predict Glaucoma (= eye disease) based on laser scanning
      measurements
      Response Y : Binary, y ∈ {Glaucoma, normal}
      Covariables X : Different volumes and areas of the eye (all
      numeric)
      h = eJ(Yi) = (1, 0)^T for Glaucoma, (0, 1)^T for normal
      g(Xij) = Xij
      Tj(Ln, w) = vec( Σ_{i=1}^n wi Xij eJ(Yi)^T )
                = ( n_Glaucoma · X̄j,Glaucoma , n_normal · X̄j,normal )^T
      c ∝ max_group n_group · | X̄j,group − X̄j,node | ,  group ∈ {Glaucoma, normal}
Glaucoma: R-code




  library("rpart")
  library("party")
  data("GlaucomaM", package = "ipred")
  cond_tree <- ctree(Class ~ ., data = GlaucomaM)
  classic_tree <- rpart(Class ~ ., data = GlaucomaM)
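A quick in-sample comparison of the two fits (a sketch, not from the slides): confusion matrices on the training data.

## fitted classes on the learning sample for both trees
table(predicted = predict(cond_tree), observed = GlaucomaM$Class)
table(predicted = predict(classic_tree, type = "class"),
      observed = GlaucomaM$Class)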




Glaucoma: conditional tree
   [tree plot; each node shows a bar chart of the class distribution
    (glaucoma vs. normal): Node 1 (n = 196), Node 2 (n = 87),
    Node 5 (n = 109), and terminal Nodes 3 (n = 79), 4 (n = 8),
    6 (n = 65) and 7 (n = 44); the split structure is printed below]
   ## 1) vari <= 0.059; criterion = 1, statistic = 71.475
   ##   2) vasg <= 0.066; criterion = 1, statistic = 29.265
   ##     3)* weights = 79
   ##   2) vasg > 0.066
   ##     4)* weights = 8
   ## 1) vari > 0.059
   ##   5) tms <= -0.066; criterion = 0.951, statistic = 11.221
   ##     6)* weights = 65
   ##   5) tms > -0.066
   ##     7)* weights = 44
Glaucoma: CART tree

  rpart.plot(classic_tree, cex = 1.5)

      [tree plot; structure:
       varg < 0.21?
          yes: glaucoma
          no:  mhcg >= 0.17?
             yes: glaucoma
             no:  vars < 0.064?
                yes: glaucoma
                no:  tms >= −0.066?
                   yes: eas < 0.45?
                      yes: glaucoma
                      no:  normal
                   no:  normal]
Appendix: Examples of other scales
       Y categorial, X categorial:
           h = eJ(Yi), g = eK(Xij)
           ⇒ T is the vectorized contingency table of Xj and Y
           [mosaic plot of a 3 × 3 contingency table of Y and Xj, shaded by
            Pearson residuals (1.64 to −2.08); p-value = 0.009]
       Y and Xj numeric: h = rg(Yi), g = rg(Xij) (ranks) ⇒ Spearman's rho
       Flexible T for different situations: multivariate regression, ordinal
       regression, censored regression, ...
Properties




        Prediction accuracy: not better than normal trees, but not worse either
        Computational considerations: same speed as normal trees
        Two possible interpretations of the significance level α:
              1. Pre-specified nominal level of the underlying association tests
              2. Simple hyperparameter determining the tree size
              A low α yields smaller trees




Summary conditional trees




       Not heuristics, but non-parametric models with a well-defined theoretical background
       Suitable for regression with arbitrary scales of Y and X
       Unbiased variable selection
       No overfitting
       Conditional trees are structurally different from trees partitioned with
       exhaustive search procedures




Literature and Software
        J. Friedman, T. Hastie, and R. Tibshirani.
        The Elements of Statistical Learning.
        Springer Series in Statistics, 2001.

        T. Hothorn, K. Hornik, and A. Zeileis.
        Unbiased recursive partitioning: A conditional inference framework.
        Journal of Computational and Graphical Statistics, 15(3):651–674, 2006.

        H. Strasser and C. Weber.
        On the asymptotic theory of permutation statistics.
        1999.

   R-packages (all available on CRAN):
        rpart: Recursive partitioning
        rpart.plot: Plot function for rpart
        party: A Laboratory for Recursive Partytioning
Appendix: Competitors



   Other partitioning algorithms in this area:
       CHAID: Nominal response, χ2 test, multiway splits, nominal
       covariables
       GUIDE: Continuous response only, p-value from χ2 test,
       categorizes continuous covariables
       QUEST: ANOVA F-Test for continuous response, χ2 test for
       nominal, compare on p-scale ⇒ reduces selection bias
       CRUISE: Multiway splits, discriminant analysis in each node,
       unbiased variable selection




Appendix: Properties of test statistic T
       µj = E(Tj(Ln, w) | S(Ln, w)) = vec( ( Σ_{i=1}^n wi gj(Xji) ) · E(h | S(Ln, w))^T )

       Σj = V(Tj(Ln, w) | S(Ln, w))
          =  w· / (w· − 1) · V(h | S(Ln, w)) ⊗ ( Σ_i wi gj(Xji) ⊗ wi gj(Xji)^T )
           − 1 / (w· − 1) · V(h | S(Ln, w)) ⊗ ( Σ_i wi gj(Xji) ) ⊗ ( Σ_i wi gj(Xji) )^T

       with w· = Σ_{i=1}^n wi and

       E(h | S(Ln, w)) = w·^{−1} Σ_i wi h(Yi, (Y1, ..., Yn)) ∈ R^q

       V(h | S(Ln, w)) = w·^{−1} Σ_i wi ( h(Yi, (Y1, ..., Yn)) − E(h | S(Ln, w)) )
                                        ( h(Yi, (Y1, ..., Yn)) − E(h | S(Ln, w)) )^T
