Workshop 210417 dhlee

Dongheon Lee
Assistant Professor
Department of Biomedical Engineering
Chungnam National University
1st Workshop of the Convergent Physician-Scientist Training Program
Introduction to Deep Learning
& Machine Learning
with Practical Exercise
2021. 04. 17

https://github.com/dhlee-jubilee/Workshop_210417
Download the lecture and hands-on materials

1950
Turing Test

2017
EBS documentary "Artificial Intelligence", Part 2: The Imitation Game

2020
Generative Pre-trained Transformer
Search Engine Watch / MIT Technology Review
• Dataset (300 billion tokens)
• 175 billion parameters
• Estimated cost of a single training run: KRW 5 to 15 billion

2020
https://blog.pingpong.us/gpt3-review/

https://openai.com/blog/dall-e/ https://openai.com/blog/clip/
“an armchair in the shape of avocado. An armchair imitating an avocado.”

https://www.korea.kr/special/policyCurationView.do?newsId=148868542

“In an AI-first world, we are rethinking all our products.”
Sundar Pichai (CEO of Google), 2017.05.18
http://www.newsis.com/view/?id=NISX20170518_0014902945

https://medium.com/syncedreview/nips-tickets-sell-out-in-less-than-12-minutes-e3aab37ab36a

https://venturebeat.com/2018/05/10/carnegie-mellon-university-starts-first-ai-degree-program-in-u-s/

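Where does progress in AI technology come from?
• Open-source software (AI)
• Open data
• Open papers
• Open competitions
• Collaborative projects
• Open AI Promotion Community
AIRI 400, “Overview, Value, and Limitations of Artificial Intelligence”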

https://blogs.nvidia.com/blog/2016/07/29/whats-difference-artificial-intelligence-machine-learning-deep-learning-ai/

https://hyeonjiwon.github.io/machine%20learning/ML-1/

Q. What is Learning?
• The process by which humans learn through a continuous series of experiences (David Kolb)
• Memorizing, adapting, and then generalizing what was learned
Q. Why do machines need to learn?
• Not everything can be programmed explicitly.
• It is impossible to write rules that cover every situation.
• Some tasks are hard to define as an algorithm.
AIRI 400, “Machine Learning Basics”

https://www.youtube.com/watch?v=1DmuFPVlITc

http://www.ciokorea.com/news/34370

1. Problem definition
2. Data collection and preprocessing
3. Feature selection and extraction
4. Algorithm selection
5. Training
6. Evaluation
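The six steps above can be sketched with scikit-learn. This is a minimal illustration and not the workshop's tutorial code. It assumes scikit-learn is installed and uses the library's bundled breast-cancer dataset as a stand-in for step 2. The choice of `SelectKBest`, `LogisticRegression`, and every parameter value is an illustrative assumption.

```python
# Minimal sketch of the six-step workflow (all model and parameter choices are illustrative)
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# 1. Problem definition: binary classification (malignant vs. benign tumor)
# 2. Data collection: load a bundled dataset and hold out a test set
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

pipeline = make_pipeline(
    StandardScaler(),                    # 2. preprocessing: standardize each feature
    SelectKBest(f_classif, k=10),        # 3. feature selection: keep the 10 best features
    LogisticRegression(max_iter=1000),   # 4. algorithm selection
)
pipeline.fit(X_train, y_train)           # 5. training
test_accuracy = accuracy_score(y_test, pipeline.predict(X_test))  # 6. evaluation
print(f"Test accuracy: {test_accuracy:.3f}")
```

Wrapping steps 2 to 4 in one pipeline ensures that the scaler and the feature selector are fit only on the training split. This keeps information from the test set out of training.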

Sea bass
Salmon
AIRI 400, “Pattern Recognition: Principles, Capabilities, and Limitations of Machine Learning”
Let's tell sea bass from salmon!

Most of the time is spent here...


• Which features?
• How many features?
Length
Lightness
Width
Number and shape of fins
Position of the mouth
…
= Feature vector of a fish
https://brilliant.org/wiki/feature-vector

‘Length’

‘Lightness’

‘Length & Lightness’

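• A study in NEJM (2012) analyzed the correlation between per-capita annual chocolate consumption and the number of Nobel laureates per 10 million population in major countries worldwide.
• The correlation was very strong (r = 0.791; an r of 0.7 or higher is conventionally regarded as a very strong correlation).
• Excluding Sweden, home of the Nobel committee, the coefficient rose further to 0.862.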
The NEJM, “Chocolate Consumption, Cognitive Function, and Nobel Laureates”
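The r quoted above is Pearson's correlation coefficient, r = cov(x, y) / (σx σy). It measures only how well the points fit a straight line and says nothing about causation. Because of this, a feature can be strongly correlated with a label and still be meaningless. Below is a short NumPy sketch. The arrays are made-up toy numbers and are not the NEJM data.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation: covariance divided by the product of standard deviations."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc, yc = x - x.mean(), y - y.mean()   # center both variables
    return float((xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc)))

# Hypothetical toy values (NOT the data from the NEJM paper)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
r = pearson_r(x, y)
print(round(r, 4))

# Cross-check against NumPy's built-in correlation matrix
assert np.isclose(r, np.corrcoef(x, y)[0, 1])
```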

• More features → better performance.
• Too many features → poor generalization capability.
→ ‘Curse of Dimensionality’
http://www.infme.com/curse-of-dimensionality-ml-big-data-ml-optimization-pca/
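One symptom of the curse of dimensionality is that distances between random points become nearly equal as the number of features grows. Once that happens, "near" and "far" stop being informative. The following is a minimal NumPy sketch, assuming points drawn uniformly from a unit hypercube (an illustrative assumption). It reports the relative gap between the farthest and nearest point from a query.

```python
import numpy as np

def distance_contrast(n_dims, n_points=500, seed=0):
    """(max - min) / min of distances from one random query to random points."""
    rng = np.random.default_rng(seed)
    points = rng.random((n_points, n_dims))   # uniform in [0, 1]^n_dims
    query = rng.random(n_dims)
    dists = np.linalg.norm(points - query, axis=1)
    return (dists.max() - dists.min()) / dists.min()

for d in (2, 10, 100, 1000):
    print(d, round(distance_contrast(d), 3))   # contrast shrinks as d grows
```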

https://www.datasciencecentral.com/profiles/blogs/a-tour-of-machine-learning-algorithms-1 (via https://statwith.tistory.com/693 [STATWITH])

Machine Learning Algorithms
1. Linear Model
2. Support Vector Machine (SVM)
3. Decision Tree → Random Forest → XGBoost
4. Artificial Neural Network (ANN) → Deep Learning

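Cost function
• ‘Learning’ is the process of finding the optimal parameters.
• Define a function that serves as the training criterion (the cost function).
• Train so that the cost is minimized.
• Loss function: for a continuous value (regression), e.g. the least squares method; for a categorical value (classification), e.g. cross-entropy
X: Data, h(X): Prediction, y: Label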
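The two loss families above, and the idea that "learning" means searching for the parameters that minimize the cost, can be sketched in NumPy. The sketch assumes a one-feature linear model h(X) = wX + b trained by plain gradient descent on synthetic data. The learning rate and iteration count are illustrative choices.

```python
import numpy as np

def mse(y_pred, y):
    """Least-squares cost for continuous targets."""
    return float(np.mean((y_pred - y) ** 2))

def binary_cross_entropy(p, y, eps=1e-12):
    """Cross-entropy cost for binary targets; p is the predicted P(y = 1)."""
    p = np.clip(p, eps, 1 - eps)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))

# Synthetic data from y = 3x + 1 plus noise (hypothetical example)
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=100)
y = 3 * X + 1 + 0.1 * rng.normal(size=100)

w, b, lr = 0.0, 0.0, 0.1
for _ in range(500):
    error = (w * X + b) - y              # h(X) - y
    w -= lr * 2 * np.mean(error * X)     # gradient of the MSE cost w.r.t. w
    b -= lr * 2 * np.mean(error)         # gradient of the MSE cost w.r.t. b
print(round(w, 2), round(b, 2), round(mse(w * X + b, y), 4))
```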

Hyperparameter Selection
ANN
• Learning rate
• Regularization constant
• Weight initialization
• Batch size
…
Random Forest
• Number of estimators
• Max depth
• Criterion (gini, entropy)
…
SVM
• Regularization (C)
• Kernel type (RBF)
…
https://medium.com/@cjl2fv/an-intro-to-hyper-parameter-optimization-using-grid-search-and-random-search-d73b9834ca0a
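Hyperparameters are set before training rather than learned from the data. Below is a minimal grid-search sketch with scikit-learn's `GridSearchCV`, tuning the two SVM hyperparameters from the list above (C and the RBF kernel's `gamma`). The grid values and the bundled iris dataset are illustrative assumptions.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Every (C, gamma) combination in the grid is scored with 5-fold cross-validation
param_grid = {"C": [0.1, 1, 10, 100], "gamma": [0.001, 0.01, 0.1, 1]}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
search.fit(X, y)

print(search.best_params_)            # best combination found in the grid
print(round(search.best_score_, 3))   # its mean cross-validated accuracy
```

Grid search cost grows multiplicatively with each added hyperparameter. For larger search spaces, random search (as in the linked article) samples combinations instead of trying every one.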

http://cs231n.stanford.edu/
Hyperparameter Tuning

Generalization
• Good performance even on data not seen during training (↔ Overfitting)

(1) Regularization e.g. Weight decay, Dropout
(2) Cross-Validation
http://cs231n.github.io/neural-networks-3/
Generalization

(1) Regularization - Weight Decay
Model:
Cost function:
Cost function (general form):
https://www.google.com/search?q=weight+decay&safe=active&source=lnms&tbm=isch&sa=X&ved=2ahUKEwjoqLHNgMfpAhVJUd4KHULxDswQ_AUoAXoECA0QAw&biw=1280&bih=688#imgrc=VeKNwCxx9J4BzM
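Weight decay adds an L2 penalty λ‖w‖² to the cost. As a result, each gradient step also shrinks the weights toward zero: for Cost = (1/m)‖Xw − y‖² + λ‖w‖², the gradient is (2/m)Xᵀ(Xw − y) + 2λw. The following is a minimal NumPy sketch, assuming a linear model trained by gradient descent on synthetic data. `lam` (λ) and the other values are illustrative.

```python
import numpy as np

def fit_linear(X, y, lam=0.0, lr=0.05, steps=2000):
    """Gradient descent on mean squared error + lam * ||w||^2 (weight decay)."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        grad = 2 * X.T @ (X @ w - y) / len(y)   # gradient of the data term
        grad += 2 * lam * w                     # gradient of the L2 penalty (weight decay)
        w -= lr * grad
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
true_w = np.array([2.0, -1.0, 0.5, 0.0, 0.0])
y = X @ true_w + 0.1 * rng.normal(size=50)

w_plain = fit_linear(X, y, lam=0.0)
w_decay = fit_linear(X, y, lam=1.0)
print(np.round(w_plain, 2))
print(np.round(w_decay, 2))   # shrunk toward zero compared with w_plain
```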

https://stanford.edu/~shervine/teaching/cs-229/cheatsheet-machine-learning-tips-and-tricks
(2) Cross-Validation
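K-fold cross-validation splits the data into K folds. It trains on K−1 folds, validates on the remaining one, rotates this K times, and averages the scores. Below is a minimal sketch with scikit-learn's `KFold` and `cross_val_score`, followed by the same loop written out by hand. The dataset, the decision-tree model, and K = 5 are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
kfold = KFold(n_splits=5, shuffle=True, random_state=0)

# One accuracy score per held-out fold
scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=kfold)
print(np.round(scores, 3), "mean:", round(scores.mean(), 3))

# The same procedure by hand: every sample is used for validation exactly once
manual = []
for train_idx, val_idx in kfold.split(X):
    model = DecisionTreeClassifier(random_state=0).fit(X[train_idx], y[train_idx])
    manual.append(model.score(X[val_idx], y[val_idx]))
```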

• Apply the trained model to the test dataset
• Use different metrics depending on the task (classification, detection, segmentation, etc.)
Area Under the Curve (AUC)
Confusion Matrix
CS 229 - Machine Learning
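For binary classification, the confusion matrix and AUC can both be computed with scikit-learn. The labels and scores below are hypothetical. AUC uses the predicted probabilities directly, while the confusion matrix first needs a decision threshold (0.5 here, an illustrative choice).

```python
import numpy as np
from sklearn.metrics import confusion_matrix, roc_auc_score

y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1])                    # hypothetical ground truth
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.7, 0.9, 0.3, 0.6])  # hypothetical P(y = 1)
y_pred = (y_score >= 0.5).astype(int)                           # threshold at 0.5

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
sensitivity = tp / (tp + fn)   # recall, true positive rate
specificity = tn / (tn + fp)   # true negative rate
auc = roc_auc_score(y_true, y_score)
print(tn, fp, fn, tp, sensitivity, specificity, round(auc, 4))
```

AUC here equals the fraction of (positive, negative) pairs in which the positive case receives the higher score, so it does not depend on the threshold.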

1. Problem definition
2. Data collection and preprocessing
3. Feature selection and extraction
4. Algorithm selection
5. Training
6. Evaluation
VS

Machine Learning
Algorithms
- Supervised Learning -

Linear Model
Linear Regression
Logistic Regression
Regression → Classification

Linear Model
Linear Regression
Non-linear Regression
Representation ↑
Complexity ↑
(More parameters)

Linear Model
Linear Regression
Non-linear Regression
Ridge Regression (L2 Reg.)
Lasso Regression (L1 Reg.)
ElasticNet (L1 Reg. + L2 Reg.)
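The three regularized linear regressions differ only in the penalty added to the least-squares cost. L2 (Ridge) shrinks coefficients, L1 (Lasso) can drive some of them exactly to zero, and ElasticNet mixes the two. Below is a scikit-learn sketch on synthetic data in which only 3 of 10 features matter. The `alpha` and `l1_ratio` values are illustrative.

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso, LinearRegression, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
true_coef = np.array([3.0, -2.0, 1.5] + [0.0] * 7)   # only 3 informative features
y = X @ true_coef + 0.5 * rng.normal(size=100)

models = {
    "Linear": LinearRegression(),
    "Ridge (L2)": Ridge(alpha=1.0),
    "Lasso (L1)": Lasso(alpha=0.1),
    "ElasticNet (L1+L2)": ElasticNet(alpha=0.1, l1_ratio=0.5),
}
for name, model in models.items():
    model.fit(X, y)
    n_zero = int(np.sum(model.coef_ == 0))   # coefficients set exactly to zero
    print(f"{name:20s} zero coefs: {n_zero}  coef: {np.round(model.coef_, 2)}")
```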

Support Vector Machine *Kernel: computes 1) the mapping to a high-dimensional space and 2) the inner product there in a single step
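The kernel trick can be verified in one check. For the degree-2 polynomial kernel k(x, z) = (x·z)², the kernel value equals the inner product of an explicit mapping φ(x) = (x1², √2·x1x2, x2²) into 3-D space, yet it is computed without ever building φ. Below is a minimal NumPy sketch for 2-D inputs. The polynomial kernel is an illustrative choice: the RBF kernel mentioned earlier corresponds to an infinite-dimensional mapping that cannot be written out this way.

```python
import numpy as np

def phi(x):
    """Explicit degree-2 feature map for a 2-D input."""
    x1, x2 = x
    return np.array([x1 ** 2, np.sqrt(2) * x1 * x2, x2 ** 2])

def poly_kernel(x, z):
    """Same quantity via the kernel: inner product in input space, then square."""
    return float(np.dot(x, z)) ** 2

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])
explicit = float(np.dot(phi(x), phi(z)))   # map to 3-D first, then take the inner product
via_kernel = poly_kernel(x, z)             # never builds the 3-D vectors
print(explicit, via_kernel)
```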

Decision Tree
• Learns the list of yes/no questions that reaches the answer fastest
• Yes/no split criterion: impurity measures (Entropy, Gini index, etc.)
• Preventing overfitting
1) Pre-pruning: stop growing the tree early
(max depth, max/min leaf nodes)
2) Post-pruning: after building the tree, delete or merge nodes
that contain few data points
Introduction to Machine Learning with Python
→ Overfitting still occurs nonetheless
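The impurity measures used to choose each yes/no split can be written in a few lines. Below is a NumPy sketch of the Gini index and entropy. It is followed by pre-pruning through scikit-learn's `max_depth` and `min_samples_leaf` parameters. The dataset and parameter values are illustrative.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

def gini(labels):
    """Gini impurity: 1 - sum_k p_k^2 (0 for a pure node)."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(1 - np.sum(p ** 2))

def entropy(labels):
    """Entropy in bits: -sum_k p_k * log2(p_k)."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

print(gini([0, 0, 1, 1]), entropy([0, 0, 1, 1]))   # maximally mixed two-class node
print(gini([1, 1, 1, 1]))                           # pure node

# Pre-pruning: limit depth and leaf size so the tree stops growing early
X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
full = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
pruned = DecisionTreeClassifier(max_depth=4, min_samples_leaf=5,
                                random_state=0).fit(X_tr, y_tr)
print("full:  ", full.get_depth(), full.score(X_tr, y_tr), full.score(X_te, y_te))
print("pruned:", pruned.get_depth(), pruned.score(X_tr, y_tr), pruned.score(X_te, y_te))
```

The unpruned tree typically fits the training set almost perfectly, which is the overfitting the slide warns about.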

Decision Tree Ensemble
1) Random Forest
• A collection of slightly different decision trees
: build many trees that each work well but overfit in different directions,
then vote on or average their results to reduce the amount of overfitting
• max depth, # of estimators, max features (randomness)
2) Gradient Boosting
• Builds trees sequentially, each compensating for the errors of the previous trees (no randomness)
• Chains together many weak learners of depth 1 to 5 built with pre-pruning
• # of estimators, learning rate
• Parameter tuning is difficult and training takes long, but performance is high
(e.g. XGBoost)
Introduction to Machine Learning with Python https://www.geeksforgeeks.org/ml-gradient-boosting/
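Both ensembles are available in scikit-learn, and the hyperparameters listed above map onto its constructor arguments. Below is a minimal side-by-side sketch. The dataset and values are illustrative. scikit-learn's `GradientBoostingClassifier` is used as a stand-in, since XGBoost lives in the separate `xgboost` package and is not used here.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Random Forest: many randomized trees (bootstrap samples + random feature subsets),
# with predicted probabilities averaged over the trees
forest = RandomForestClassifier(n_estimators=200, max_depth=None,
                                max_features="sqrt", random_state=0)
# Gradient Boosting: shallow trees added one at a time, each fitting the previous errors
boosting = GradientBoostingClassifier(n_estimators=200, learning_rate=0.1,
                                      max_depth=3, random_state=0)

for name, model in [("Random Forest", forest), ("Gradient Boosting", boosting)]:
    model.fit(X_tr, y_tr)
    print(f"{name}: test accuracy = {model.score(X_te, y_te):.3f}")
```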


Inputs X1, X2, …, Xn → hidden units h1, h2, …, hn+1 → output y
h1 = f(w11X1 + w12X2 + … + w1nXn)
h2 = f(w21X1 + w22X2 + … + w2nXn)
…
hn+1 = f(w(n+1)1X1 + w(n+1)2X2 + … + w(n+1)nXn)
y = f(wy1h1 + wy2h2 + … + wy(n+1)hn+1)
Loss = D(y, Y) → min(Loss)
• Input-to-hidden weights w11 … w(n+1)n: n x (n+1) parameters
• Hidden-to-output weights wy1 … wy(n+1): (n+1) parameters
• f(x): Activation function
• y: Prediction
• Y: Ground Truth
• D(y, Y): Loss function
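The equations above describe a forward pass through one hidden layer followed by the loss. Here they are in NumPy. This is a minimal sketch that assumes a sigmoid activation f, n = 3 inputs, n + 1 = 4 hidden units, random weights, and squared error as the loss D. Bias terms are omitted, as in the slide. None of these choices come from the slides themselves.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
n = 3
X = rng.normal(size=n)            # inputs X1..Xn
W = rng.normal(size=(n + 1, n))   # w_ij: weight from input j to hidden unit i
w_y = rng.normal(size=n + 1)      # wy1..wy(n+1): hidden -> output

h = sigmoid(W @ X)                # h_i = f(w_i1 X1 + ... + w_in Xn)
y = sigmoid(w_y @ h)              # y = f(wy1 h1 + ... + wy(n+1) h(n+1))

Y = 1.0                           # ground truth (hypothetical)
loss = (y - Y) ** 2               # D(y, Y): squared error
print(h.shape, float(y), float(loss))
```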

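Backpropagation: the gradient of the loss ℒ with respect to each weight, computed by applying the chain rule backward from the output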
http://tamaszilagyi.com/blog/2017/2017-11-11-animated_net/
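Backpropagation applies the chain rule from the loss back to each weight. Take a single sigmoid neuron y = σ(w·x) with loss ℒ = (y − Y)². The chain rule gives ∂ℒ/∂w = 2(y − Y) · y(1 − y) · x. The sketch below uses a hypothetical one-neuron example to compare this analytic gradient with a finite-difference estimate, and then takes one descent step.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss(w, x, Y):
    return (sigmoid(w @ x) - Y) ** 2

x = np.array([0.5, -1.0, 2.0])
w = np.array([0.1, 0.2, -0.3])
Y = 1.0

# Chain rule: dL/dw = dL/dy * dy/dz * dz/dw
y = sigmoid(w @ x)
grad_chain = 2 * (y - Y) * y * (1 - y) * x

# Numerical check with central differences
eps = 1e-6
grad_numeric = np.array([
    (loss(w + eps * e, x, Y) - loss(w - eps * e, x, Y)) / (2 * eps)
    for e in np.eye(len(w))
])
print(grad_chain, grad_numeric)

# One gradient-descent step (learning rate 0.5, illustrative)
w_new = w - 0.5 * grad_chain
```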

https://beamandrew.github.io/deeplearning/2017/02/23/deep_learning_101_part1.html

1998
http://yann.lecun.com/exdb/lenet/

Machine Learning = Feature “Engineering”
Deep Learning = Feature “Learning”

Challenges in Visual Recognition

http://joelouismarino.github.io/blog_posts/blog_VAE.html

Classification
Gulshan, Varun, et al. Jama, 2016
Esteva, Andre, et al. Nature 2017
Jin, Eun Hyo, Lee, Dongheon, et al. Gastroenterology 2020
Hannun, Awni Y., et al. Nature medicine 2019
Khosravi, Pegah, et al. medRxiv 2019.
Iizuka, Osamu, et al. Scientific reports 2020
X: Image y: Name
Lunit Insight

Detection X: Image y: Name, (Xmin, Ymin, Width, Height)
Yan, Chaochao, et al. International Conference on Bioinformatics, Computational Biology, and Health Informatics. 2018
Liu, Ming, Jue Jiang, and Zenan Wang. IEEE Access 2019

Segmentation
Bejnordi, Babak Ehteshami, et al. Jama 2017
X: Image y: Mask (Binary Image)
Ronneberger, Olaf, Philipp Fischer, and Thomas Brox. Springer, Cham, 2015.
Havaei, Mohammad, et al. Medical image analysis 2017 https://www.rsipvision.com/airways-segmentation/

Pose Estimation X: Image y: Keypoints (X1, Y1), (X2, Y2) … (Xn, Yn)
Du, Xiaofei, et al. IEEE transactions on medical imaging 2018
Martínez-González, Angel, et al. 2018 IROS
https://how2electronics.com/gesture-recognition-application-machine-learning/

Action Recognition X: Video y: Action
https://www.youtube.com/watch?v=hs_v3dv6OUI&ab_channel=PreferredNetworks%2CInc. https://endovissub2017-workflow.grand-challenge.org/

http://www.asiae.co.kr/news/view.htm?idxno=2018102511092600847
Unpaired image-to-image translation using cycle-consistent adversarial networks, ICCV, 2017
Glow: Generative flow with invertible 1x1 convolutions, NIPS, 2018
Generation (Image)

Generation (Image)
Rivenson, Yair, et al. Nature biomedical engineering 2019

https://www.youtube.com/watch?v=i2kqZXhA4Rw
Generation (Voice, Text)
https://dev.to/amananandrai/another-10-gems-of-gpt-3-2639

Generation (Voice, Text)
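Kim So-hee, who is hearing-impaired, listens through a sign-language interpreter to an explanation of how her voice was restored, at the KT Institute of Convergence Technology in Seoul on February 22. (Photo: KT) VUNO Med-DeepASR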
Speech to Text

Super Resolution
Kang, Eunhee, Junhong Min, and Jong Chul Ye. Medical physics 2017
Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network. CVPR. 2017

Softmax
Convolution Layer + Activation Function

* : ‘Convolution’ with an image filter (= kernel)
Convolution
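Here is what a convolution layer computes for one channel. The filter slides over the image, and at each position the overlapping values are multiplied elementwise and summed. Like most deep-learning libraries, this sketch does not flip the kernel, so strictly speaking it is cross-correlation. It assumes a single-channel image, stride 1, and an optional zero padding `pad`. The output size follows (W − F + 2P)/S + 1.

```python
import numpy as np

def conv2d(image, kernel, pad=0):
    """Single-channel 2-D convolution (no kernel flip), stride 1, zero padding `pad`."""
    image = np.pad(image, pad)                 # zero padding on all four sides
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.arange(25, dtype=float).reshape(5, 5)
edge = np.array([[1.0, 0.0, -1.0]] * 3)       # vertical-edge filter
print(conv2d(image, edge).shape)              # 5 - 3 + 1 = 3 -> (3, 3)
print(conv2d(image, edge, pad=1).shape)       # padding 1 keeps the size -> (5, 5)
print(conv2d(image, edge))
```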

Convolution Layer

Activation Function

Zero padding
https://stanford.edu/~shervine/teaching/cs-230/cheatsheet-convolutional-neural-networks#ct-architectures
Zero Padding

Pooling Layer
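A pooling layer downsamples each feature map. With 2×2 max pooling and stride 2, the largest value in each block is kept, which halves the height and width. Below is a NumPy sketch. The window size and stride are the common defaults, assumed here.

```python
import numpy as np

def max_pool2d(x, size=2):
    """Non-overlapping max pooling (stride = size); rows/cols that don't fit are dropped."""
    h, w = x.shape[0] // size, x.shape[1] // size
    x = x[:h * size, :w * size]
    # Split into (h, size, w, size) blocks and take the max inside each block
    return x.reshape(h, size, w, size).max(axis=(1, 3))

x = np.array([[1, 3, 2, 0],
              [4, 2, 1, 5],
              [0, 1, 7, 2],
              [6, 2, 3, 4]])
print(max_pool2d(x))
```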

Fully Connected Layer
https://stanford.edu/~shervine/teaching/cs-230/cheatsheet-convolutional-neural-networks#ct-architectures

Softmax
https://towardsdatascience.com/softmax-activation-function-explained-a7e1bc3ad60
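Softmax turns the final layer's raw scores into class probabilities that sum to 1: softmax(z)_i = exp(z_i) / Σ_j exp(z_j). Below is a NumPy sketch. Subtracting the maximum score before exponentiating is a standard numerical-stability step that does not change the result.

```python
import numpy as np

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    z = np.asarray(logits, dtype=float)
    z = z - z.max()          # stability: avoids overflow in exp, result unchanged
    e = np.exp(z)
    return e / e.sum()

probs = softmax([2.0, 1.0, 0.1])
print(np.round(probs, 3), probs.sum())
print(softmax([1000.0, 1001.0]))   # no overflow thanks to the max subtraction
```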

Softmax
Summary
Convolution Layer + Activation Function

Summary
e.g.
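# Define the CNN model (Keras Sequential API)
from tensorflow.keras import layers, models

model = models.Sequential()
# Convolution -> Batch Normalization -> ReLU activation
model.add(layers.Conv2D(32, (3, 3), input_shape=(28, 28, 1)))
model.add(layers.BatchNormalization())
model.add(layers.ReLU())
model.add(layers.MaxPooling2D((2, 2)))
# Each convolution is followed by an activation (ReLU)
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
# Flatten the feature maps and classify with fully connected layers
model.add(layers.Flatten())
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dense(10, activation='softmax'))  # probabilities for 10 classes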

Training Techniques
• Image Normalization
• Hyperparameter Optimization
• Augmentation
https://hugrypiggykim.com/2018/08/08/recurrent-batch-normalization/
• Batch Normalization
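Batch normalization standardizes each feature over the mini-batch and then rescales it with learnable γ and β: x̂ = (x − μ_B) / √(σ²_B + ε), y = γx̂ + β. Below is a NumPy sketch of the training-time forward pass only. The running statistics used at inference are omitted. γ = 1, β = 0, and ε = 1e-5 are illustrative defaults.

```python
import numpy as np

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    """Training-mode batch norm over axis 0 (the batch axis) of a (batch, features) array."""
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)   # zero mean, unit variance per feature
    return gamma * x_hat + beta               # learnable scale and shift

rng = np.random.default_rng(0)
batch = rng.normal(loc=5.0, scale=3.0, size=(64, 4))   # hypothetical activations
out = batch_norm(batch)
print(np.round(out.mean(axis=0), 3), np.round(out.std(axis=0), 3))
```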

Evolution of CNNs
VGG (2014) GoogLeNet (2014)
ResNet (2015) DenseNet (2016)

State of the Art CNN
EfficientNet-v2 (2021. 04)
AutoML - NASNet (2019)

http://www.yakup.com/news/index.html?mode=view&cat=14&nid=223555

Why Important?
Medical images account for
at least 90% of all medical data!
The largest data source in the health-care industry

Remarkably, A.I. “All eyes are on AI.” Nature Biomedical Engineering

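Ophthalmology (JAMA, 2016)
1) Goal: automated diagnosis of diabetic retinopathy
2) Deep learning model: Inception-v3
3) Dataset
3-1) Training images: approx. 130,000
3-2) Evaluation images
• Validated on public fundus image datasets
① EyePACS-1: 9,963 images from 4,997 patients
② Messidor-2: 1,748 images from 874 patients
• Validated against 7 to 8 ophthalmologists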
Gulshan, Varun, et al. "Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs." Jama, 2016

Dermatology (Nature, 2017)
Esteva, Andre, et al. "Dermatologist-level classification of skin cancer with deep neural networks." Nature 2017
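1) Goal: automated diagnosis of skin disease
2) Deep learning model: Inception-v3
3) Dataset
3-1) Training images: approx. 130,000
3-2) Evaluation images
① Keratinocyte carcinoma vs seborrheic keratosis (135 and 707 images)
② Malignant melanoma vs benign lesions
(standard images: 130 and 225 images)
③ Malignant melanoma vs benign lesions
(dermoscopic images: 111 and 1,010 images)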

https://www.youtube.com/watch?v=toK1OSLep3s

1) Goal: automated detection of lymph node metastases
2) Deep learning model: various models
3) Dataset (competition)
3-1) Training images: 270 slides
3-2) Evaluation images: 129 slides
Pathology (JAMA, 2017)
Bejnordi, Babak Ehteshami, et al. "Diagnostic assessment of deep learning algorithms for detection of lymph node metastases in women with breast cancer." Jama 2017
① Metastasis Identification
② Whole-Slide Image Classification

Endoscopy (Nature Biomedical Eng., 2018)
Wang, Pu, et al. Nature biomedical engineering 2.10 (2018): 741-748.
Courtesy of William E. Karnes, MD, AGAF.
Courtesy of Michael Byrne, MD, MA (Cantab), MRCP, FRCPC.
Courtesy of Sophie Xiao and Jing Jiu Liu, Wision AI.
Sharma, Prateek, Anjali Pante, and Seth A. Gross. "Artificial intelligence in endoscopy." Gastrointestinal endoscopy 91.4 (2020): 925-931.

Radiology
Montagnon, Emmanuel, et al. "Deep learning workflow in radiology: a primer." Insights into imaging 11.1 (2020): 22.
Sechopoulos, Ioannis, Jonas Teuwen, and Ritse Mann. Seminars in Cancer Biology. Academic Press, 2020.

Cardiology (Nature Medicine, 2019)
• 336 records (328 patients)
• Validated against 6 cardiologists
Hannun, Awni Y., et al. "Cardiologist-level arrhythmia detection and classification in ambulatory electrocardiograms using a deep neural network." Nature medicine 2019
1) Goal: automated arrhythmia diagnosis from a patch-type ECG device
2) Deep learning model: 1D-CNN (34 layers)
3) Dataset
3-1) Training data
• 12 arrhythmia classes (30,000 patients)
• ECG: Annotations / sec
(Sampling rate = 200Hz)

Cardiology (Nature, 2020)
• 1,277 test dataset, 2,895 external test dataset
Ouyang, David, et al. "Video-based AI for beat-to-beat assessment of cardiac function." Nature 580.7802 (2020): 252-256.
1) Goal: automated assessment of cardiac function from echocardiography
2) Deep learning model: EchoNet-Dynamic
3) Dataset
3-1) Training images
• 20,060 frames (at end systole and end diastole)

https://tkipf.github.io/graph-convolutional-networks/
Genomics (Nature Biotechnology, 2015)
Park, Yongjin, and Manolis Kellis. "Deep learning for regulatory genomics." Nature biotechnology 33.8 (2015): 825-826.

Drug Development (Nature Biotechnology, 2020)
Eisenstein, Michael. "Active machine learning helps drug hunters tackle biology." Nature biotechnology 38.5 (2020): 512.
Source: Ministry of Science and ICT (Korea)

Solares, Jose Roberto Ayala, et al. Journal of biomedical informatics 101, 2020
EMR (Journal of Biomedical Informatics, 2020)
→ Recurrent Neural Networks (RNNs) dominate
Shickel, Benjamin, et al. IEEE Journal of Biomedical and Health Informatics, 2017

Limitations of Medical AI

AI Challenges in Medicine
1) Dataset Bias
2) Lack of Labeled Dataset
3) Uncertainty
4) Explainability (Interpretability)
5) Convergence of Human & AI
6) Safety
7) Model Product Usability
8) Liability
9) Next Generation of AI
10) AI Education

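1) Dataset Bias
• Data acquired at a particular hospital with particular devices has a range of problems with representativeness, accuracy, and quality homogeneity.
→ Models are developed with big data, but some of the problems raised cannot be solved just by increasing the number of samples.
e.g. Melanoma is poorly diagnosed in Black patients
e.g. A health risk assessment algorithm widely used in the US (reported to have racial bias)
→ Prospective research, Multi-center / International Dataset
• Cases where AI picks up incidental factors in the training data and uses them for prediction
e.g. Surgical skin markings causing false-positive melanoma predictions
e.g. Diagnosing disease from incidental factors such as whether a ruler appears in the photo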
Winkler, Julia K., et al JAMA dermatology 2019
Lashbrook A. AI-driven dermatology could leave dark-skinned patients behind [Internet]: The Atlantic; 2018
Jung, Jin Sup. Korean Medical Education Review (2020)
Obermeyer Z, et.al, Science. 2019
Tomasev N, Glorot X, Rae JW, Zielinski M, Askham H, Saraiva A, et al. Nature. 2019
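e.g. Google DeepMind's acute kidney injury prediction model: the training data came from US Veterans Affairs hospitals, so women made up only 6.38% of patients, and prediction performance for women was significantly lower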

2) Lack of Labeled Datasets
• Augmentation
: Increasing the training data by applying transformations to it
• Weakly-supervised Learning
: Reusing (selected) predictions from a model trained on a small dataset
as additional training data
• Self-supervised Learning
: Learning from the surrounding context of unlabeled data
Doersch, Carl, Abhinav Gupta, and Alexei A. Efros. ICCV, 2015.
Jin, Eun Hyo, Lee, Dongheon, et al. Gastroenterology, 2020.
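The weakly-supervised idea described above works in three steps: train on a small labeled set, predict on unlabeled data, and feed the confident predictions back in as labels. This approach is often called self-training or pseudo-labeling. Below is a minimal scikit-learn sketch. It assumes a logistic-regression model, a 0.95 confidence threshold, and only 20 labeled samples, all of which are illustrative choices. It is not the method of the cited papers.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X = StandardScaler().fit_transform(X)
X_pool, X_test, y_pool, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Small labeled set: 10 samples per class; the rest of the pool is treated as unlabeled
rng = np.random.default_rng(0)
labeled = np.concatenate([rng.choice(np.flatnonzero(y_pool == c), size=10, replace=False)
                          for c in (0, 1)])
unlabeled = np.setdiff1d(np.arange(len(X_pool)), labeled)

# Step 1: train on the small labeled set
model = LogisticRegression(max_iter=1000).fit(X_pool[labeled], y_pool[labeled])

# Step 2: predict the unlabeled data and keep only confident predictions
proba = model.predict_proba(X_pool[unlabeled])
confident = proba.max(axis=1) >= 0.95
pseudo_X = X_pool[unlabeled][confident]
pseudo_y = proba.argmax(axis=1)[confident]   # pseudo-labels, not ground truth

# Step 3: retrain on labeled + pseudo-labeled data
X_aug = np.vstack([X_pool[labeled], pseudo_X])
y_aug = np.concatenate([y_pool[labeled], pseudo_y])
model2 = LogisticRegression(max_iter=1000).fit(X_aug, y_aug)

print("pseudo-labeled samples:", int(confident.sum()))
print("before:", model.score(X_test, y_test), "after:", model2.score(X_test, y_test))
```

Wrong pseudo-labels are fed back as if they were correct. For this reason, the confidence threshold is the main safeguard in this sketch.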

3) Uncertainty
Esteva, Andre, et al. Nature 2017 McGill, Sarah K., et al. Gut 62.12 (2013): 1704-1713.

→ Bayesian Deep Learning
Filos, A., Farquhar, S., Gomez, A. N., Rudner, T. G. J., et al. Benchmarking Bayesian Deep Learning with Diabetic Retinopathy Diagnosis.
dog
cat
unknown
3) Uncertainty
http://taeoh-kim.github.io/blog/bayesian-deep-learning-introduction/
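Here is one simple way to produce the "unknown" output shown above. Average the predictions from several stochastic forward passes, for example MC dropout or an ensemble (common approximations in Bayesian deep learning), and abstain when the predictive entropy is high. The sketch assumes the sampled softmax outputs are already available; the arrays are hypothetical. It uses an illustrative entropy threshold of 0.5 nats.

```python
import numpy as np

CLASSES = ["dog", "cat"]

def predict_or_abstain(sampled_probs, threshold=0.5):
    """sampled_probs: (n_samples, n_classes) softmax outputs from stochastic passes."""
    mean_p = np.mean(sampled_probs, axis=0)                    # predictive distribution
    entropy = float(-np.sum(mean_p * np.log(mean_p + 1e-12)))  # uncertainty in nats
    if entropy > threshold:
        return "unknown", entropy
    return CLASSES[int(np.argmax(mean_p))], entropy

# Hypothetical outputs: passes that agree (confident) vs. passes that disagree
confident_passes = np.array([[0.97, 0.03], [0.95, 0.05], [0.98, 0.02]])
conflicting_passes = np.array([[0.90, 0.10], [0.15, 0.85], [0.50, 0.50]])
print(predict_or_abstain(confident_passes))
print(predict_or_abstain(conflicting_passes))
```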

Source: Department of Defense, Advanced Research Projects Agency (DARPA)

"Grad-cam: Visual explanations from deep networks via gradient-based localization." ICCV, 2017. "Visualizing data using t-SNE." Journal of machine learning research 9.Nov, 2008
→ Explainable AI (XAI) techniques

5) Convergence of Human and AI
2nd Summer School on Deep Learning for Computer Vision Barcelona

Jin, Eun Hyo, Lee, Dongheon, et al. "Improved Accuracy in Optical Diagnosis of Colorectal Polyps Using Convolutional Neural Networks with Visual Explanations." Gastroenterology (2020).
[Figure: Test 1 and Test 2 (1 month apart, cases ‘Shuffle’d), showing the Result of CNN Diagnosis (Confidence value) and the Result of Grad-CAM]

6) Safety
• Although medical records are anonymized for AI training, individuals may still be identifiable from other information such as hospital visit dates and Google location data.
e.g. Lawsuit over privacy violations in an EMR prediction study by Google, UCLA, and the University of Chicago medical school
e.g. Personal data of 1.5 million SingHealth (Singapore) patients leaked
→ Privacy protection policies and security are essential when data is shared from hospitals to developers' servers
→ Use of ‘synthetic data’
Wakabayashi D. Google and the University of Chicago are sued over data sharing [Internet]. The New York Times; 2019
Singapore healthcare cyberattack: officials will not name hackers who targeted Prime Minister Lee Hsien Loong [Internet]. Hong Kong: South China Morning Post; 2019

6) Safety
Explaining and Harnessing Adversarial Examples, ICLR 2015
Adversarial Attack
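The Fast Gradient Sign Method (FGSM) from the cited ICLR 2015 paper perturbs an input by ε · sign(∇ₓ loss). This is a small change per feature that pushes the loss upward. Below is a self-contained NumPy sketch against a hand-set logistic-regression "model"; the weights, input, and ε are hypothetical. This setup lets the input gradient be written analytically: for sigmoid + cross-entropy, ∇ₓ loss = (p − y) · w.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cross_entropy(p, y):
    return float(-(y * np.log(p) + (1 - y) * np.log(1 - p)))

w = np.array([1.5, -2.0, 0.5, 1.0])   # hypothetical trained weights
b = 0.1
x = np.array([0.8, -0.4, 0.3, 0.5])    # input correctly classified as y = 1
y = 1.0

p = sigmoid(w @ x + b)
grad_x = (p - y) * w                    # gradient of the loss w.r.t. the input
epsilon = 0.3
x_adv = x + epsilon * np.sign(grad_x)   # FGSM step: move each feature by +/- epsilon

p_adv = sigmoid(w @ x_adv + b)
print(f"clean:       P(y=1)={p:.3f}  loss={cross_entropy(p, y):.3f}")
print(f"adversarial: P(y=1)={p_adv:.3f}  loss={cross_entropy(p_adv, y):.3f}")
print("max per-feature change:", np.max(np.abs(x_adv - x)))
```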

Han, Xintian, et al. "Deep learning models for electrocardiograms are susceptible to adversarial attack." Nature Medicine (2020): 1-4.
→ Adversarial Attack Strategies
Chakraborty, Anirban, et al. "Adversarial attacks and defences: A survey." arXiv preprint arXiv:1810.00069 (2018).

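7) AI Product Usability
• In 2019, the US FDA ran a pilot program with 9 companies. Instead of evaluating each SaMD product individually, the program assesses the trustworthiness of the developer's product development process, grants pre-certification, and then continuously evaluates performance after market release.
• However, many AI products currently cleared by the FDA reach the market either without the underlying technical evidence being published in papers (in the name of protecting proprietary technology) or with insufficient expert evaluation of the technology.
→ Concerns have been raised about relying on post-market evaluation, so safety and efficacy standards for AI products need to be reviewed.
• SaMD (Software as a Medical Device) can enhance the functions of existing medical devices through software solutions that are faster and easier to update than hardware. Companies that use or develop SaMD can improve product functions through rapid user feedback and shorten time to market.
• In addition, AI algorithms can change through continual learning. If continuous algorithm changes are allowed in a product that has already been approved, new criteria are needed to evaluate those changes.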
Digital Health Software Precertification (Pre-Cert) Program [Internet]. Silver Spring (MD): U.S. Food & Drug Administration; 2019
Szabo L. Artificial intelligence is rushing into patient care: and could raise risks [Internet]. New York (NY): Scientific American; 2019

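8) Liability
Price, W. Nicholson, Sara Gerke, and I. Glenn Cohen. "Potential liability for physicians using artificial intelligence." Jama 322.18 (2019): 1765-1766.
• As AI devices are applied across medicine, AI errors may give rise to a variety of legal issues.
→ This is a complex problem that can involve physicians and patients, the hospitals that decided to adopt the AI device, AI device manufacturers, and malpractice insurers.
• When the AI device makes the final decision (e.g. IDx-DR): the manufacturer is liable for technical errors.
• When the AI device makes probabilistic recommendations and the physician makes the final decision: at present, the key criterion for whether the physician is legally protected is whether care followed the generally accepted standard of care. Price WN 2nd, Gerke S, Cohen IG. Potential liability for physicians using artificial intelligence. JAMA. 2019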

AlphaFold2

Multi-task Learning Meta Learning
https://www.kakaobrain.com/blog/48
Multi-modal Learning
e.g.

Transformer
Self-supervised Learning Federated Learning
e.g.
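Han, Kai, et al. "A Survey on Visual Transformer." arXiv preprint arXiv:2012.12556 (2020).

10) Need for Change in Medical Education
→ Survey of physicians, residents, and medical students at Stanford School of Medicine
- Current training has been very helpful in preparing for new technologies in medicine (students and residents: 18%)
- Additional training is needed to better prepare for future medical innovation (students: 73%)
- Areas where training is needed: advanced statistics and data science (44%)
→ The US National Academy of Medicine, the American Medical Association House of Delegates, and NHS England have proposed educational content to respond to the changes AI brings to medicine (4 of the 7 items are AI-related):
1. The ability, including statistical knowledge, to understand how machine learning works, interpret the information it provides, recognize its limitations, and communicate it to patients effectively and critically
2. The ability to deliver more precise care based on the large amounts of data obtained from AI models, real-time monitoring, etc.
3. The ability to select and apply appropriate machine learning models in various clinical settings
4. The ability to communicate and collaborate with members of the healthcare system, including AI models, and to manage the complexity among them
Minor LB. The rise of the data-driven physician [Internet].
Stanford (CA): Stanford Medicine; 2020
Murphy B. AMA: take extra care when applying AI in medical education [Internet]. Chicago (IL): American Medical Association; 2019 NHS England.
→ Survey on the need for AI education at 2 medical schools in Busan: 97% of respondents (149/153) said it is needed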

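10) Direction of Change in Medical Education (1/2)
→ Data science education
: The goal of undergraduate medical education should not be proficiency in machine learning. It should be digital literacy: understanding and using machine learning. Given the limited teaching time in medical school, the curriculum should focus on the following rather than on teaching machine learning algorithms in detail:
1. How the machine learning used in medicine works
2. The characteristics of the data underlying machine learning, how to search it, and how to extract meaningful information from it
3. The knowledge to critically interpret the probabilities that machine learning proposes and to convey accurate information to patients
4. The use of information technology for population health, disease prediction, risk assessment, and management
5. The scope of machine learning applications in medicine and their technical, ethical, and legal issues. Through hands-on practice with AI tools that have already been developed, students should learn by what criteria AI should be used in medicine and what to watch out for when using it.
→ Elective courses have been reported at US medical schools (Boston, Harvard, Pittsburgh, Stanford, etc.) and in Korea (Seoul National University, Yonsei University, University of Ulsan, etc.).
→ One reason AI education is not yet systematic is that medical school faculty alone lack enough experts to teach the material. The American Medical Association recommends that medical schools recruit data science or software experts to provide systematic AI education.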
Kolachalama VB, Garg PS. Machine learning and medical education. NPJ Digit Med. 2018;1:54.
Murphy B. AMA: take extra care when applying AI in medical education [Internet]. Chicago (IL): American Medical Association; 2019

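10) Direction of Change in Medical Education (2/2)
→ Postgraduate education
e.g. The UK established a digital fellowship program that selected 17 trainees from among healthcare professionals (clinicians, pharmacists, hospital administrators, physiotherapists, clinical researchers, nurses, etc.). Alongside their regular duties, the trainees received the education needed to run programs for digitalization and innovation at their institutions and to train related staff.
e.g. In the US, a nonprofit organization was established that awards the American Board of Artificial Intelligence in Medicine certification to physicians and other healthcare personnel through medical AI education and examinations.
→ Clinicians must be able to:
• judge which kinds of AI algorithms to apply in various clinical settings, and how
• communicate and discuss this appropriately with patients and families
• evaluate the usefulness of AI in practice
• identify and appropriately resolve the problems that may arise
This requires systematic education on these topics even after graduation.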
Matheny M, Israni ST, Auerbach A, Beam A, Bleicher P, Chapman W, et al. Artificial intelligence in health care: the hope, the hype, the promise, the peril [Internet]. Washington (DC): National Academy of Medicine; 2019
NHS England. The Topol Review [Internet]. Leeds: NHS England
Boyko O, Chang A. American Board of Artificial Intelligence in Medicine (ABAIM) aims to educate and certify healthcare professionals in AI, and related technologies [Internet]. Beltsville (MD): CISION PRWeb; 2020

Summary
1) Dataset Bias → Prospective research, Multi-center / International Dataset
2) Lack of Labeled Dataset → Augmentation, Weakly-supervised Learning, Self-supervised Learning
3) Uncertainty → Bayesian Deep Learning
5) Convergence of Human & AI → Explainable AI (XAI) techniques
6) Safety → Security, synthetic data, Adversarial Attack Strategies
7) Model Product Usability → SaMD, Pre-certification (needs review)
8) Liability → Controversy (ongoing)
9) Next Generation of AI → e.g. Multi-task, Meta, Multi-modal, Self-supervised, Federated Learning, Transformer
10) AI Education → Data science education, etc.

Python Packages
• numpy: matrix and multidimensional array processing
• matplotlib: visualization
• pandas: tabular and time-series data processing
• scikit-learn: machine learning library
• tensorflow: deep learning library

Google Colaboratory
• Cloud-based development environment: basic packages provided
• Parallel processing resources: GPU, TPU (free, 12 hours)
• Data storage: large-capacity data
• Well suited to tutorials and prototype development

Google Colaboratory path
My Drive
> Workshop_210417-master
> Tutorial1
> Tutorial2
> Tutorial3
Tutorial1
> Tutorial 1-1. Python_numpy.ipynb
> Tutorial 1-2. pandas.ipynb
> Dataset
> COVID-19
> covid_19_data.csv
Tutorial2
> Tutorial 2-1. Heart Attack Prediction.ipynb
> Tutorial 2-2. Gene Expression Prediction.ipynb
> Dataset
> Heart_Disease
> heart_disease_dataset.csv
> mRNA
> cancerType-s-Categorical.csv
> microRNAScore-x.csv
> mRNA-y-poor-explained.csv
> mRNA-y-well-explained.csv
Tutorial3
> Tutorial 3-1. X-ray Classification.ipynb
> Tutorial 3-2. Spine Segmentation.ipynb
> Unet_upgrade.py
> Prediction_Check (empty folder)
> Augmentation_Check (empty folder)
> Model
> SpineCT_Unet_pretrained.h5
> Xray_Inceptv3_pretrained.h5
> Dataset
> CheXpert-v1.0-small
> train.csv
> test.csv
> train (sample) (unzip before use)
> test (unzip before use)
> Spineweb
> Sample
> X_train.npy
> X_test.npy
> y_train.npy
> y_test.npy
