
4. ML Scikit-learn


  1. 4. ML Practice with Scikit-learn. "Victory awaits him who has everything in order - luck, people call it." 14 April 2019
  2. About scikit-learn: https://scikit-learn.org - a Python library that provides implementations of many machine learning and data mining algorithms.
  3. About scikit-learn
  4. The five main capabilities of scikit-learn: 1. Preprocessing 2. Regression 3. Classification 4. Dimensionality reduction 5. Clustering
  5. Installing scikit-learn: Anaconda already has scikit-learn built in. A quick check that it loads:
        from sklearn.datasets import load_iris
        iris_data = load_iris()
        print(iris_data.keys())
        print(iris_data.data)
  6. The basic scikit-learn program structure (a runnable sketch of these steps follows below):
     1. Read the data & preprocess: pd.read_csv('data.csv')
     2. Split into training and test sets: X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.2)
     3. Fit the model: model = svm.SVC(...); model.fit(X_train, y_train)
     4. Predict: pred_Y = model.predict(X_train); pred_Y = model.predict(X_test)
     5. Evaluate (the score may be an error value, an accuracy, etc.): model.score(X_train, y_train); model.score(X_test, y_test)
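
     A minimal end-to-end sketch of the five steps, using the built-in iris dataset in place of data.csv and leaving the SVC hyperparameters at their defaults (random_state is an added assumption for reproducibility):

        from sklearn import svm
        from sklearn.datasets import load_iris
        from sklearn.model_selection import train_test_split

        # 1. Load the data (the iris dataset stands in for pd.read_csv('data.csv'))
        iris = load_iris()
        x, y = iris.data, iris.target

        # 2. Split into training and test sets (80% train / 20% test)
        X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=0)

        # 3. Fit the model
        model = svm.SVC()
        model.fit(X_train, y_train)

        # 4. Predict
        pred_train = model.predict(X_train)
        pred_test = model.predict(X_test)

        # 5. Evaluate (SVC.score reports classification accuracy)
        print(model.score(X_train, y_train))
        print(model.score(X_test, y_test))
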
  7. Load the iris dataset from sklearn. Example notebook: /ML/sklearn/preprocess.ipynb
  8. Splitting into training and test sets: to randomly divide the data, sklearn provides a function called train_test_split(); see the short sketch below.
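
     A small sketch of train_test_split() on the iris data, mainly to show the resulting shapes (test_size=0.2 follows the earlier slide; random_state is an added assumption for reproducibility):

        from sklearn.datasets import load_iris
        from sklearn.model_selection import train_test_split

        iris = load_iris()
        X_train, X_test, y_train, y_test = train_test_split(
            iris.data, iris.target, test_size=0.2, random_state=42)

        # 150 samples are split into 120 for training and 30 for testing
        print(X_train.shape, X_test.shape)   # (120, 4) (30, 4)
        print(y_train.shape, y_test.shape)   # (120,) (30,)
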
  9. Data preprocessing: the algorithms can only operate on numeric values, so how do we handle non-numeric data? • Ordinal features (or labels) • Nominal/categorical features (or labels) • Does the difference in value ranges between features matter? 1. Temperature: minimum 0 degrees, maximum 40 degrees 2. Distance: minimum 0 m, maximum 10,000 m
  10. Encoding ordinal features (or labels): ordinal features have a natural order, so they can be mapped to numbers (or a new feature can be created using a mean or median value). For example: {Low, Medium, High} -> {0, 1, 2} or {5, 10, 15}. A small sketch follows below.
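
     A minimal sketch of ordinal encoding with an explicit mapping, assuming a hypothetical 'size' column; writing the mapping out by hand keeps control over the order, which is the point of ordinal encoding:

        import pandas as pd

        df = pd.DataFrame({'size': ['Low', 'High', 'Medium', 'Low']})

        # the explicit mapping preserves the order Low < Medium < High
        order = {'Low': 0, 'Medium': 1, 'High': 2}
        df['size_encoded'] = df['size'].map(order)
        print(df)
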
  11. Encoding categorical features (or labels): categorical features have no order or magnitude, e.g. blood types {"A", "B", "O", "AB"}, so use one-hot encoding: A -> [1,0,0,0], B -> [0,1,0,0], O -> [0,0,1,0], AB -> [0,0,0,1]. With one-hot vectors a correct match gives zero error, e.g. for A: RMSE terms (1-1)^2 + (0-0)^2 + (0-0)^2 + (0-0)^2 = 0, and likewise for B and O.
  12. Encoding categorical features: first map the categories to integers 0, 1, 2 with LabelEncoder(), then convert the integers to one-hot vectors (e.g. with to_categorical()). A scikit-learn sketch follows below.
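
     A sketch of the two-step encoding using scikit-learn's LabelEncoder followed by OneHotEncoder (to_categorical is the Keras equivalent of the second step and is not shown here):

        import numpy as np
        from sklearn.preprocessing import LabelEncoder, OneHotEncoder

        blood = np.array(['A', 'B', 'O', 'AB', 'A'])

        # step 1: categories -> integer codes (classes_ are sorted: A, AB, B, O)
        le = LabelEncoder()
        codes = le.fit_transform(blood)            # [0 2 3 1 0]
        print(le.classes_, codes)

        # step 2: integer codes -> one-hot vectors
        # (scikit-learn >= 1.2 uses sparse_output=False; older versions use sparse=False)
        ohe = OneHotEncoder(sparse_output=False)
        onehot = ohe.fit_transform(codes.reshape(-1, 1))
        print(onehot)
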
  13. Data scaling: StandardScaler(). A short sketch follows below.
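
     A short sketch of StandardScaler(): fit it on the training set only, then apply the same transform to the test set (the numbers below are made up, using the temperature/distance ranges from the preprocessing slide):

        import numpy as np
        from sklearn.preprocessing import StandardScaler

        # two features with very different ranges (temperature 0-40, distance 0-10000)
        X_train = np.array([[10., 200.], [25., 5000.], [38., 9800.]])
        X_test = np.array([[20., 3000.]])

        scaler = StandardScaler()
        X_train_std = scaler.fit_transform(X_train)   # learn the mean/std on the training data
        X_test_std = scaler.transform(X_test)         # reuse the same mean/std on the test data

        print(scaler.mean_, scaler.scale_)
        print(X_train_std)
        print(X_test_std)
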
  14. 5. Machine learning algorithms
  15. Supervised Learning – Regression
  16. Linear Regression
  17. Regression
  18. Single-feature regression: house size is the single feature X and house price is the label Y.
     House Size (X) | House Price (Y)
     50 | 102
     70 | 127
     32 | 65
     68 | 131
     93 | 190
     44 | 82
     56 | 120
     Model: Ŷ = β0 + β1·X; the fitted line is Ŷ = -0.5243 + 1.987·X, so Ŷ(85) = 168.37. A fitting sketch follows below.
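
     A sketch fitting the house-size data above with scikit-learn's LinearRegression; the coefficients should come out close to the Ŷ = -0.5243 + 1.987·X line on the slide:

        import numpy as np
        from sklearn.linear_model import LinearRegression

        X = np.array([[50], [70], [32], [68], [93], [44], [56]])   # house size
        y = np.array([102, 127, 65, 131, 190, 82, 120])            # house price

        model = LinearRegression()
        model.fit(X, y)

        print(model.intercept_, model.coef_)   # roughly -0.52 and 1.99
        print(model.predict([[85]]))           # prediction for a house of size 85
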
  19. Multiple regression: a multi-feature problem.
     House Size (X1) | Rooms (X2) | Floor (X3) | House Price (Y)
     50 | 2 | 5 | 102
     70 | 2 | 3 | 127
     32 | 1 | 3 | 65
     68 | 3 | 7 | 131
     93 | 4 | 10 | 190
     44 | 2 | 6 | 82
     56 | 3 | 1 | 120
     Model: Ŷ = β0 + β1·X1 + β2·X2 + β3·X3
     When building a multiple linear regression model, the more explanatory variables there are, the greater the chance that some pair of them is multicollinear. In that case you need to consider dropping one of the variables from the model, or even rethinking how the model is built. Collinearity: if X1 and X2 are not independent variables, then β1 also affects X2. A fitting sketch follows below.
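
     The same idea with the three features from the table; LinearRegression accepts multiple columns directly, and inspecting the coefficients is a first step before worrying about collinearity (this is only a sketch on the slide's seven rows):

        import numpy as np
        from sklearn.linear_model import LinearRegression

        # columns: house size (X1), rooms (X2), floor (X3)
        X = np.array([[50, 2, 5],
                      [70, 2, 3],
                      [32, 1, 3],
                      [68, 3, 7],
                      [93, 4, 10],
                      [44, 2, 6],
                      [56, 3, 1]])
        y = np.array([102, 127, 65, 131, 190, 82, 120])   # house price

        model = LinearRegression().fit(X, y)
        print(model.intercept_)              # beta0
        print(model.coef_)                   # beta1, beta2, beta3
        print(model.predict([[60, 3, 4]]))   # prediction for a made-up house
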
  20. Defining the loss function: a good regression line is the one with the smallest error, so how do we find it? L(yi, f(Xi)) is called the loss function (or cost function); the smaller the loss, the better the model fits. Every problem or model needs a clearly defined, appropriate loss function.
  21. Loss functions commonly used for regression: • MAE (Mean Absolute Error) • MSE (Mean Squared Error) • RMSE (Root Mean Squared Error). A short sketch follows below.
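
     A sketch computing the three values with sklearn.metrics on some made-up predictions; RMSE is simply the square root of MSE:

        import numpy as np
        from sklearn.metrics import mean_absolute_error, mean_squared_error

        y_true = np.array([102, 127, 65, 131])
        y_pred = np.array([99, 130, 70, 128])

        mae = mean_absolute_error(y_true, y_pred)
        mse = mean_squared_error(y_true, y_pred)
        rmse = np.sqrt(mse)

        print(mae, mse, rmse)
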
  22. How do we find the parameters that minimize the loss? Gradient descent can be used to find the optimal parameters β0, β1, β2, … that minimize the loss.
  23. SGD (Stochastic Gradient Descent): choose an initial parameter vector W and a learning rate η. Repeat until an approximate minimum is obtained: • Randomly shuffle the samples in the training set. • For each sample, update W <- W - η·∂L(W)/∂W computed on that single sample. A NumPy sketch follows below.
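
     A minimal NumPy sketch of this SGD loop for the linear-regression case, assuming a squared-error loss on each sample and a fixed learning rate (the names lr and n_epochs and their values are illustrative assumptions):

        import numpy as np

        def sgd_linear_regression(X, y, lr=1e-5, n_epochs=200):
            m, n = X.shape
            w = np.zeros(n)                      # initial parameter vector
            for epoch in range(n_epochs):
                idx = np.random.permutation(m)   # randomly shuffle the training samples
                for i in idx:                    # update on one sample at a time
                    error = np.dot(X[i], w) - y[i]
                    grad = 2 * error * X[i]      # gradient of the squared error for sample i
                    w = w - lr * grad
            return w

        # house-size example: a column of ones is added so w[0] acts as the intercept
        X = np.c_[np.ones(7), [50, 70, 32, 68, 93, 44, 56]]
        y = np.array([102, 127, 65, 131, 190, 82, 120])
        print(sgd_linear_regression(X, y))   # the slope term heads toward ~1.99
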
  24. The effect of re-scaling the data on the weights: without re-scaling, a feature with a much larger value range contributes much more to the loss (correcting W2 has a larger effect on the loss). With re-scaling, convergence is also faster.
  25. Polynomial Linear Regression (comparison figure: linear regression vs. polynomial regression). A sketch using PolynomialFeatures follows below.
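
     A sketch of polynomial regression in scikit-learn: expand the feature with PolynomialFeatures and fit an ordinary LinearRegression on the expanded columns (the degree and the toy data are assumptions for illustration):

        import numpy as np
        from sklearn.linear_model import LinearRegression
        from sklearn.pipeline import make_pipeline
        from sklearn.preprocessing import PolynomialFeatures

        # toy data following y = 0.5*x^2 + x + 2 with a little noise
        rng = np.random.RandomState(0)
        X = np.linspace(-3, 3, 30).reshape(-1, 1)
        y = 0.5 * X[:, 0] ** 2 + X[:, 0] + 2 + rng.normal(0, 0.2, 30)

        model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
        model.fit(X, y)

        print(model.named_steps['linearregression'].coef_)
        print(model.predict([[2.0]]))   # close to 0.5*4 + 2 + 2 = 6
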
  26. Lab: sklearnRegression Linear Regression-1.ipynb, sklearnRegression Linear Regression-2.ipynb
  27. Regularization
  28. Regularization: to limit the size of the weights and avoid overfitting, we add a penalty term J(f) to the loss function; this is called regularization. λ adjusts how much weight the regularization term gets. Beware of trading one problem for another (shrinking the weights at the cost of model accuracy).
  29. How large should λ be? If λ is very large, all of the W parameters become very small and lose their influence, which causes underfitting; if λ is very small, it is as if λ had not been added at all, and the overfitting problem remains.
  30. L1 and L2 regularizers: regularization comes in two kinds, L1 regularization (Lasso) and L2 regularization (Ridge). L2 (Ridge Regression) shrinks the coefficients toward zero; L1 (Lasso Regression) can also perform feature selection. A sketch comparing the two follows below.
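
     A sketch comparing Ridge and Lasso on the same data; alpha plays the role of λ, and its values here are arbitrary. Lasso can drive some coefficients exactly to zero, which is why it can act as feature selection:

        import numpy as np
        from sklearn.linear_model import Lasso, Ridge

        rng = np.random.RandomState(0)
        X = rng.normal(size=(100, 5))
        # only the first two features actually matter
        y = 3 * X[:, 0] + 2 * X[:, 1] + rng.normal(0, 0.1, 100)

        ridge = Ridge(alpha=1.0).fit(X, y)   # L2: shrinks coefficients toward zero
        lasso = Lasso(alpha=0.1).fit(X, y)   # L1: can zero out irrelevant coefficients

        print(ridge.coef_)
        print(lasso.coef_)
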
  31. Lab: lasso_regression.ipynb
  32. Regression metrics: MSE, MAE, R2 (R-squared). Metrics are used to monitor and measure the performance of a model (during training and testing) and do not need to be differentiable. However, if for some task the performance metric is differentiable, it can be used both as a loss function (perhaps with some regularization added to it) and as a metric; MSE is one such case.
  33. R2: a result of 1 means the model makes no errors; 0 means the model is about as good as blind guessing; a value between 0 and 1 indicates how good the model is; a negative value means the model is worse than blind guessing. https://en.wikipedia.org/wiki/Coefficient_of_determination A short sketch follows below.
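
     A sketch with r2_score from sklearn.metrics reproducing the cases above: a perfect prediction gives 1, predicting the mean gives 0, and a worse-than-mean prediction gives a negative value:

        import numpy as np
        from sklearn.metrics import r2_score

        y_true = np.array([3.0, 5.0, 7.0, 9.0])

        print(r2_score(y_true, y_true))                          # 1.0: no error
        print(r2_score(y_true, np.full(4, y_true.mean())))       # 0.0: same as predicting the mean
        print(r2_score(y_true, np.array([9.0, 3.0, 9.0, 3.0])))  # negative: worse than the mean
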
  34. Appendix: Gradient Descent Algorithm
  35. SGD (Stochastic Gradient Descent): choose an initial parameter vector W and a learning rate η. Repeat until an approximate minimum is obtained: • Randomly shuffle the samples in the training set. • For each sample, update W <- W - η·∂L(W)/∂W computed on that single sample.
  36. Gradients: Y = X·Wᵀ + b. The bias term can instead be absorbed by adding a column of ones to X:
        np.c_[np.ones((len(X), 1)), X]   # prepend a bias column of ones
        np.dot(X, w.T) + b               # prediction with an explicit bias term
        np.dot(X, w.T)                   # prediction once the bias is folded into w
  37. Gradients for the MSE loss function (the slide's shapes: X is 3x2, w.T is 2x1, so the loss vector is 3x1):
        loss = np.dot(X, w.T) - y                 # residuals, shape (m, 1)
        mse = 1/m * np.sum(np.square(loss))       # MSE loss
        w = w - l * 2/m * np.dot(X.T, loss).T     # gradient step; .T keeps w a row vector (l is the learning rate)
     A runnable version of this batch gradient descent loop follows below.
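
     Putting the snippets from these two slides together into a runnable batch gradient descent sketch for the MSE loss on the house-size data; lr stands in for the slide's l, and the learning rate and iteration count are assumptions:

        import numpy as np

        # house-size data with a bias column of ones prepended (as on the previous slide)
        X_raw = np.array([[50.], [70.], [32.], [68.], [93.], [44.], [56.]])
        y = np.array([[102.], [127.], [65.], [131.], [190.], [82.], [120.]])
        X = np.c_[np.ones((len(X_raw), 1)), X_raw]

        m = len(X)
        w = np.zeros((1, X.shape[1]))    # row vector [b, w1], so predictions are np.dot(X, w.T)
        lr = 1e-4

        for _ in range(5000):
            loss = np.dot(X, w.T) - y                    # residuals, shape (m, 1)
            mse = 1 / m * np.sum(np.square(loss))        # MSE loss
            w = w - lr * 2 / m * np.dot(X.T, loss).T     # gradient step

        print(mse, w)   # the slope heads toward ~1.99; the intercept converges slowly without re-scaling
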
  38. Gradient: the gradient is a vector containing the partial derivatives with respect to every variable; each value describes the slope that the function sees along that variable at a particular point. The gradient is the derivative of a function at a certain point: the magnitude of the gradient is the steepness, and its direction is the steepest direction. (Analogy from the slide: the gradient describes how strongly each person perceives where the bottom of the valley is, and the direction that is finally agreed on.)
