Decision Tree Intro [의사결정나무]

Easy-to-understand lecture slides on decision trees.


  1. 김현우 (a.k.a. 순록킴), yBigTa 9th cohort, Psychology & Computer Science
  2. Tree, Python, yBigTa
  3. the Tree. Contents: { decision tree, RSS, Gini, pruning };
  4. Tree, Random Forest
  5. Structure
  6. Node, Root Node, Leaf
  7. Tree as an Algorithm / Tree as a Data Structure
  8. Tree as a Data Structure
  9. Tree as a Data Structure: Binary Search Tree, Red-Black Tree, AVL Tree, 2-4 Tree, Heap [diagram: an example binary search tree] (a small BST sketch follows the slide list)
  10. Tree as an Algorithm
  11. Decision Tree, Random Forest
  12. Decision Tree: Regression & Classification
  13. Decision Tree: Regression
  14. Predicting baseball players’ salary (log-transformed)
  15. Predicting baseball players’ salary (log-transformed)
  16. Decision Tree: a top-down, greedy approach
  17. Decision Tree: greedy algorithm
  18. Decision Tree: greedy algorithm
  19. Decision Tree: greedy algorithm
  20. Decision?
  21. How it works [diagram: the predictor space partitioned into regions R1, R2, R3 and the corresponding tree]
  22. Splitting Criterion for Regression
  23. Splitting criterion, Regression: RSS, Residual Sum of Squares (the sum of squared residuals), also called SSE, Sum of Squared Errors of prediction; this is the quantity to minimize.
  24. Splitting criterion, Regression: RSS = Σ_j Σ_{i∈R_j} (y_i - ŷ_{R_j})², where ŷ_{R_j} is the mean of the observations in region R_j. Read it as: within one region, sum the squared differences between each data point and the region mean, then add those sums over all regions.
  25. How it works [diagram: regions R1, R2, R3 and the corresponding tree]
  26. Splitting criterion, Regression: RSS = Σ_j Σ_{i∈R_j} (y_i - ŷ_{R_j})² (the same formula revisited: per-region mean, per-point squared deviation, summed within and then across regions).
  27. Decision Tree, Regression (RSS). 1. Select a predictor X (variable) and cutpoint t that split the predictor space into the regions { X | X < t } and { X | X >= t }. 2. Select the pair that leads to the greatest possible reduction in RSS, i.e. pick the X and t that decrease RSS the most == among the resulting trees, select the one with the lowest RSS. Depending on which X and which t are chosen, many different tree scenarios are possible; the algorithm picks the scenario with the smallest RSS (a split-search sketch follows the slide list).
  28. How it works [diagram: the predictor space partitioned into regions R1 through R5]
  29. How it looks
  30. Decision Tree: Classification
  31. Predicting Iris data with 2 variables: petal length, petal width
  32. Predicting Iris data with 2 variables: petal length, petal width (a scikit-learn sketch follows the slide list)
  33. Splitting Criterion for Classification
  34. Splitting Criterion: Gini Index, Cross Entropy
  35. Splitting criterion, Classification: classification error rate (resubstitution error)
  36. Splitting criterion, Classification: Gini index, a measure of impurity. p_mk is the proportion of training observations in the m-th region that are from the k-th class; (1 - p_mk) is the proportion in region m that are not from class k.
  37. Splitting criterion, Classification: classification error rate (resubstitution error)
  38. Splitting criterion, Classification: Gini index, a measure of impurity
  39. Splitting criterion, Classification: Gini index, a measure of impurity; this is the quantity to minimize
  40. Splitting criterion, Classification: Gini index, a measure of impurity. p_mk: the proportion of training observations in the m-th region that are from the k-th class; (1 - p_mk): the proportion that are not.
  41. Splitting criterion, Classification: Gini index, a measure of impurity. What happens as p_mk gets closer to 0 or to 1?
  42. Splitting criterion, Classification: Gini index, a measure of impurity. Weight: the number of data points in the node as a proportion of the total number of data points.
  43. Break
  44. Splitting criterion, Classification: Gini index, a measure of impurity; minimize it (a Gini sketch follows the slide list)
  45. Splitting criterion, Classification: Gini index, a measure of impurity. Let's work one out by hand.
  46. Split on Gender vs. Split on Class (X / XI)
  47. Splitting criterion, Classification: Gini index. Split on Gender: 30 students in total, 15 play Overwatch (50%). Girls: 10 students, 2 play Overwatch (20%). Boys: 20 students, 13 play Overwatch (65%).
  48. Splitting criterion, Classification: Gini index. Split on Gender: (0.2 * 0.8 + 0.8 * 0.2) * 0.33 + (0.65 * 0.35 + 0.35 * 0.65) * 0.66 = 0.4. Girls: 10 students, 2 play (20%), weight 0.33; boys: 20 students, 13 play (65%), weight 0.66.
  49. Splitting criterion, Classification: Gini index. Split on Class: 30 students in total, 15 play Overwatch (50%). 10th grade: 14 students, 6 play (43%). 11th grade: 16 students, 9 play (56%).
  50. Splitting criterion, Classification: Gini index. Split on Class: (0.43 * 0.57 + 0.57 * 0.43) * 0.47 + (0.56 * 0.44 + 0.44 * 0.56) * 0.53 = 0.49.
  51. Splitting criterion, Classification: Gini index. Split on Gender (revisited): 30 students, 15 play Overwatch (50%); girls: 10 students, 2 play (20%); boys: 20 students, 13 play (65%). The gender split has the lower weighted Gini (0.4 < 0.49), so it is the better first split (this calculation is reproduced in a sketch after the slide list).
  52. Stopping Criterion
  53. Stopping criterion: 1. the node is pure; 2. the node has fewer observations than MinLeafSize; 3. the algorithm has already made MaxNumSplits splits (a scikit-learn analogue follows the slide list)
  54. A small problem
  55. Overfitting: the algorithm becomes too specific to the data it was trained on, so it cannot generalize well to data it has not seen before.
  56. Overfitting: Bias & Variance
  57. Overfitting & Accuracy
  58. Decision Tree: Regression
  59. Overfitting
  60. Pruning
  61. Pruning: Reduced Error Pruning, Cost-complexity Pruning
  62. Pruning methods: Cost-complexity Pruning (a pruning sketch follows the slide list)
  63. Tree constructing → stop → prune [diagram: a tree grown by repeated splits, then pruned]
  64. Model selection: training set / test set (a train/test sketch follows the slide list)
  65. Decision Trees: CART (Classification and Regression Tree), C5.0, CHAID
  66. Decision Trees: CART, CHAID
  67. Tree & Linearity: unlike linear models, trees map non-linear relationships quite well (a comparison sketch follows the slide list)
  68. Tree & Linearity
  69. Tree & Non-linearity: a set of logistic regressions
  70. Tree advantages: 1. easy to understand, you can take it apart and inspect every piece [white box]; 2. needs little data cleaning, just put the data in; 3. handles both numerical and categorical variables, just put them in; 4. handy for a quick look at what patterns the data contains, try it.
  71. Tree disadvantages: 1. overfitting; 2. a bit weak with continuous numeric values…
  72. Tree implementation
  73. Tree: used in more places than you might think
  74. Tree: a good starting point for machine learning. Thank you.
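A few short Python sketches follow; none of them come from the deck itself, they are just illustrations of the ideas on the slides. First, slides 8-9: the tree as a data structure. Below is a minimal (unbalanced) binary search tree; the node values roughly follow the slide-9 diagram, everything else is made up.

```python
class BSTNode:
    """One node of a binary search tree: smaller keys go left, larger go right."""

    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

    def insert(self, key):
        side = "left" if key < self.key else "right"
        child = getattr(self, side)
        if child is None:
            setattr(self, side, BSTNode(key))
        else:
            child.insert(key)

    def contains(self, key):
        if key == self.key:
            return True
        child = self.left if key < self.key else self.right
        return child is not None and child.contains(key)


root = BSTNode(6)
for k in [2, 8, 1, 4, 9]:
    root.insert(k)
print(root.contains(4), root.contains(7))  # True False
```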
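Slides 16-27 describe recursive binary splitting for regression: at each step, greedily pick the predictor X and cutpoint t whose split minimizes the total RSS. A rough sketch of one such greedy step, on made-up data (the feature and values below are hypothetical, not the baseball data from the slides):

```python
import numpy as np

def rss(y):
    """Residual sum of squares of one region around its own mean."""
    return float(np.sum((y - y.mean()) ** 2)) if len(y) else 0.0

def best_split(X, y):
    """Greedy step: the (predictor j, cutpoint t) whose split minimizes total RSS."""
    best = (None, None, np.inf)
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            left, right = y[X[:, j] < t], y[X[:, j] >= t]
            if len(left) == 0 or len(right) == 0:
                continue  # both regions must be non-empty
            total = rss(left) + rss(right)
            if total < best[2]:
                best = (j, t, total)
    return best

# Hypothetical data: years of experience vs. log-salary.
X = np.array([[1.0], [2.0], [3.0], [4.0], [10.0], [11.0], [12.0]])
y = np.array([4.5, 4.7, 4.6, 4.8, 6.0, 6.2, 6.1])
print(best_split(X, y))  # cuts between the low- and high-experience groups
```

A full regression tree simply repeats best_split inside each resulting region until a stopping criterion is met.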
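Slides 31-32 classify the Iris data using only petal length and petal width. The deck does not say which library it used; assuming scikit-learn is available, the same experiment looks roughly like this:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
X = iris.data[:, 2:4]   # keep only petal length and petal width
y = iris.target

clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
print(export_text(clf, feature_names=["petal length", "petal width"]))
print(clf.predict([[4.8, 1.6]]))  # a made-up flower, classified by the tree
```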
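Slides 36-44 define the Gini index of a region m as the sum over classes k of p_mk * (1 - p_mk), where p_mk is the proportion of observations in region m belonging to class k. A tiny sketch showing why proportions near 0 or 1 mean low impurity (slide 41's question):

```python
def gini(proportions):
    """Gini impurity of one node, given the class proportions p_mk inside it."""
    return sum(p * (1 - p) for p in proportions)

print(gini([0.5, 0.5]))  # 0.5  -> a 50/50 two-class node is maximally impure
print(gini([1.0, 0.0]))  # 0.0  -> a pure node
print(gini([0.2, 0.8]))  # 0.32 -> proportions near 0 or 1 mean low impurity
```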
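The hand calculation on slides 47-51 (30 students, split on gender vs. split on class) can be checked in a few lines; the counts come straight from the slides, only the helper names are mine:

```python
def two_class_gini(p):
    # p*(1-p) + (1-p)*p, exactly as written on slides 48 and 50
    return p * (1 - p) + (1 - p) * p

def weighted_gini(children):
    """children: list of (node size, proportion who play Overwatch)."""
    total = sum(n for n, _ in children)
    return sum(n / total * two_class_gini(p) for n, p in children)

gender = [(10, 0.20), (20, 0.65)]  # girls: 2/10 play, boys: 13/20 play
grade  = [(14, 0.43), (16, 0.56)]  # 10th: 6/14 play, 11th: 9/16 play

print(round(weighted_gini(gender), 2))  # ~0.41 (the slides round the weights and get 0.4)
print(round(weighted_gini(grade), 2))   # ~0.49
# The gender split has the lower impurity, so it is the better first split.
```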
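Slide 53's stopping rules use MATLAB-style parameter names (MinLeafSize, MaxNumSplits). A rough scikit-learn analogue, my mapping rather than the deck's: min_samples_leaf plays the role of MinLeafSize, and max_leaf_nodes indirectly caps the number of splits.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
clf = DecisionTreeClassifier(
    min_samples_leaf=5,   # analogue of MinLeafSize: no leaf smaller than 5 observations
    max_leaf_nodes=8,     # at most 8 leaves, i.e. at most 7 splits
    random_state=0,
).fit(X, y)
print(clf.get_n_leaves(), clf.get_depth())
```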
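Slides 55-59 and 64 warn that an unconstrained tree overfits and that a held-out test set exposes this. A sketch on a dataset and depths of my choosing (the deck used its own example):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

for depth in [None, 3]:  # None = grow until pure; 3 = a deliberately shallow tree
    clf = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X_tr, y_tr)
    print(depth, round(clf.score(X_tr, y_tr), 3), round(clf.score(X_te, y_te), 3))
# The unconstrained tree hits ~1.0 on the training set but is usually no better
# (often worse) than the shallow tree on the held-out test set.
```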
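Slides 60-63 grow a large tree and then prune it; slide 62 singles out cost-complexity pruning. scikit-learn exposes this via ccp_alpha (sklearn 0.22+). The sketch below picks the alpha that scores best on held-out data, which stands in for the validation procedure a real analysis would use:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Candidate alphas along the pruning path of the fully grown tree.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_tr, y_tr)

# Refit once per alpha and keep the tree that generalizes best
# (a real analysis would use a validation set or cross-validation here).
best = max(
    (DecisionTreeClassifier(random_state=0, ccp_alpha=a).fit(X_tr, y_tr)
     for a in path.ccp_alphas),
    key=lambda tree: tree.score(X_te, y_te),
)
print(best.get_n_leaves(), round(best.score(X_te, y_te), 3))
```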
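Slides 67-69 note that, unlike a single linear model, a tree captures non-linear (e.g. step-shaped) relationships. A toy comparison on synthetic data (entirely made up):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = np.where(X[:, 0] < 5, 1.0, 5.0) + rng.normal(0, 0.2, size=200)  # a step, plus noise

lin = LinearRegression().fit(X, y)
tree = DecisionTreeRegressor(max_depth=2).fit(X, y)
print(round(lin.score(X, y), 2), round(tree.score(X, y), 2))
# The tree recovers the step almost exactly; a single straight line cannot.
```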
