SlideShare a Scribd company logo
1 of 23
Download to read offline
19 대 국회 뽀개기
이홍주
Aug 14, 2016
Who am I?
● Research Engineer at fintech startup.
– Fraud Detection System based on Machine
Learning techniques.
● Previous Employment
– Software Center at LG Electronics
– IT headquarters at KT
– Daum Communications
– NURI Telecom
Aug 14, 2016
질문
● 내가 만든 Python 코드는 대게 일회용이다 .
● Python 또는 다른 언어로 ML 프로그래밍을 해봤다 .
● Numpy / Pandas 의 slicing, indexing 에 익숙하다 .
● lambda expression 을 많이 사용하고 있다 .
Aug 14, 2016
Machine Learning
● Algorithm → learn data → model → prediction
– “The ability to learn without being explicitly
programmed”. (Authur Lee Samuel)
– Set of algorithms that can learn from and
perform predictive analysis on data.
– Such algorithms operate by building a model
from an example training set of input
observations in order to make data-driven
predictions or decisions expressed as outputs,
rather than following strictly static program
instructions.
Aug 14, 2016
ML Process
● Featurization → Feature Vectors
– Analyze Data
● Understand the information available that will be used to
develop a model.
– Prepare Data
● Discover and expose the structure in the dataset.
● Train Model → Model
● Evaluate Model
– Develop a robust test harness and baseline accuracy from
which to improve and spot check algorithms.
● Improve Results
– Leverage results to develop more accurate models.
Aug 14, 2016
ML Process
이름 특징 (feature) 성별
철수 넥타이 , 안경 , 수염 , 대머리 남
영희 스커트 , 안경 , 핸드백 , 긴머리 여
근혜 스커트 , 핸드백 , 닭머리 여
두환 군복 , 대머리 , 안경 남
홍주 긴머리 , 수염 남
… … …
Feature Vectors
(1, 0, 0, 0, 1, 0, 1, 1, 0)
(0, 1, 0, 1, 1, 1, 0, 0, 0)
(0, 1, 0, 0, 0, 1, 0, 0, 1)
(0, 1, 1, 0, 1, 0, 1, 0, 0)
(0, 0, 0, 1, 0, 0, 0, 1, 0)
(…)
Data Feature vectors Model(featurizing) (training)
Aug 14, 2016
What we don’t
● !Crawling
– already done and ready
– Not in this scope
– Thanks to 참여연대 , 팀포퐁 , OhmyNews
● !Evaluation
– We are not focusing on results but the
process.
● !Improving
– Just one cycle with no turning back.
Aug 14, 2016
What we do
● Featurization
– Analyze / Prepare data
– Pandas
● Train a model
– Unsupervised Learning : K-Means Clustering
– Scikit Learn, PySpark
● Presentation
– Matplotlib, LightingViz
● Subject
– 19 대 국회 표결 결과 / 정치자금 사용내역 / 국회 회의록
Aug 14, 2016
19 대 국회 의안표결
● Featurization
● Train model
– K-Means Clustering
● Presentation
Aug 14, 2016
19 대 국회 의안표결
● Train model
– K-Means Clustering
Aug 14, 2016
19 대 국회 의안표결
● https://github.com/midnightradio/pycon-apac-2016
Aug 14, 2016
19 대 국회 정치자금 사용내역
● Featurization
– Normalization / Standardizing
– Skewness
– Outliers
● Train model
● Presentation
Aug 14, 2016
19 대 국회 정치자금 사용내역
● Featurization
– Normalization / Standardizing
● Distance computation in k-means weights
each dimension equally and hence care must
be taken to ensure that unit of dimension
shouldn’t distort relative nearness of
observations.
Aug 14, 2016
19 대 국회 정치자금 사용내역
● Featurization
– Skewness
● Leads bias
Aug 14, 2016
19 대 국회 정치자금 사용내역
● Featurization
– Outliers
● Weakness of centroid based clustering (k-
means)
outlier outlier
Aug 14, 2016
19 대 국회 정치자금 사용내역
● Featurization
– Outliers
● IQR (Inter Quartile Range)
● 2-5 Standard Deviation
– Chicken and Eggs
● MAD (Median Absolute Deviation)
– Christophe Leys (2014). Detecting outliers: Do
not use standard deviation around the mean, use
absolute deviation around the median. Journal of
Experimental Social Psychology
Aug 14, 2016
19 대 국회 정치자금 사용내역
● https://github.com/midnightradio/pycon-apac-2016
Aug 14, 2016
19 대 국회 회의록
● Featurization
– Preprocessing
● Sentence Segmentation
– Can skip to tokenziation depending on use case
● Tokenization
– Find individual words
● Lemmatization
– Find the base of words (stemming)
● Removing stop words
– “the”, “a”, “is”, “at”, etc
● Some other refinements are needed occasionally.
● Vectorization
– We take an array of floating point values
Aug 14, 2016
19 대 국회 회의록
● Featurization
– Text vectorization
● How can we compose a feature vector of numbers
representing a text document of words?
– Bag of Words
– TF-IDF
● Each word in a text would be a dimension.
● How big would that vector be?
– Would it be as big as vocabulary of an article?
– Would it be as big as vocabulary of “all” article?
– Why not all vocabulary in the language?
Aug 14, 2016
19 대 국회 회의록
● Featurization
– Word vectorization
● Vector representation of words in a latent
context
● Learn semantic relations of words in documents
● ex) “I ate ____ in Korea”
– with a context “eat” it can be DimSum, Spagetti, or
KimChi without considering the place Korea.
– with a context “Korea” it can be Seoul, K-pop, which
represents the place better without considering
context “eating”.
Aug 14, 2016
19 대 국회 회의록
● https://github.com/midnightradio/pycon-apac-2016
Aug 14, 2016
Ending...
Aug 14, 2016
Contact me!
● lee.hongjoo@yandex.com
● http://www.linkedin.com/in/hongjoo-lee

More Related Content

Viewers also liked

Easy contributable internationalization process with Sphinx @ PyCon APAC 2016
Easy contributable internationalization process with Sphinx @ PyCon APAC 2016Easy contributable internationalization process with Sphinx @ PyCon APAC 2016
Easy contributable internationalization process with Sphinx @ PyCon APAC 2016Takayuki Shimizukawa
 
PyCon APAC 2016 Regular Expression[A-Z]+
PyCon APAC 2016 Regular Expression[A-Z]+PyCon APAC 2016 Regular Expression[A-Z]+
PyCon APAC 2016 Regular Expression[A-Z]+Minji Yang
 
Automating Django Functional Tests Using Selenium on Cloud
Automating Django Functional Tests Using Selenium on CloudAutomating Django Functional Tests Using Selenium on Cloud
Automating Django Functional Tests Using Selenium on CloudJonghyun Park
 
메일플러그 기업보안메일 〈메일 아카이빙〉
메일플러그 기업보안메일 〈메일 아카이빙〉메일플러그 기업보안메일 〈메일 아카이빙〉
메일플러그 기업보안메일 〈메일 아카이빙〉MAILPLUG
 
TOROS: Python Framework for Recommender System
TOROS: Python Framework for Recommender SystemTOROS: Python Framework for Recommender System
TOROS: Python Framework for Recommender SystemKwangseob Kim
 
Python의 계산성능 향상을 위해 Fortran, C, CUDA-C, OpenCL-C 코드들과 연동하기
Python의 계산성능 향상을 위해 Fortran, C, CUDA-C, OpenCL-C 코드들과 연동하기Python의 계산성능 향상을 위해 Fortran, C, CUDA-C, OpenCL-C 코드들과 연동하기
Python의 계산성능 향상을 위해 Fortran, C, CUDA-C, OpenCL-C 코드들과 연동하기Ki-Hwan Kim
 
[NEXT] Flask 로 Restful API 서버 만들기
[NEXT] Flask 로 Restful API 서버 만들기 [NEXT] Flask 로 Restful API 서버 만들기
[NEXT] Flask 로 Restful API 서버 만들기 YoungSu Son
 
[2016 데이터 그랜드 컨퍼런스] 2 5(빅데이터). 유비원 비정형데이터 중심의 big data 활용방안
[2016 데이터 그랜드 컨퍼런스] 2 5(빅데이터). 유비원 비정형데이터 중심의 big data 활용방안[2016 데이터 그랜드 컨퍼런스] 2 5(빅데이터). 유비원 비정형데이터 중심의 big data 활용방안
[2016 데이터 그랜드 컨퍼런스] 2 5(빅데이터). 유비원 비정형데이터 중심의 big data 활용방안K data
 
[9]ix.유엔진 사용자매뉴얼 gw리뉴얼_2차_110819
[9]ix.유엔진 사용자매뉴얼 gw리뉴얼_2차_110819[9]ix.유엔진 사용자매뉴얼 gw리뉴얼_2차_110819
[9]ix.유엔진 사용자매뉴얼 gw리뉴얼_2차_110819uEngine Solutions
 
파이썬을 배워야하는 이유 발표자료 - 김연수
파이썬을 배워야하는 이유 발표자료 - 김연수파이썬을 배워야하는 이유 발표자료 - 김연수
파이썬을 배워야하는 이유 발표자료 - 김연수Yeon Soo Kim
 
Python과 flask 입문(1)
Python과 flask 입문(1)Python과 flask 입문(1)
Python과 flask 입문(1)성천 이
 
PyCon 2015 - 업무에서 빠르게 활용하는 PyQt
PyCon 2015 - 업무에서 빠르게 활용하는 PyQtPyCon 2015 - 업무에서 빠르게 활용하는 PyQt
PyCon 2015 - 업무에서 빠르게 활용하는 PyQt덕규 임
 
파이썬 플라스크로 배우는 웹프로그래밍 #3 (ABCD)
파이썬 플라스크로 배우는 웹프로그래밍 #3 (ABCD)파이썬 플라스크로 배우는 웹프로그래밍 #3 (ABCD)
파이썬 플라스크로 배우는 웹프로그래밍 #3 (ABCD)성일 한
 
파이썬 플라스크로 배우는 웹프로그래밍 #4 (ABCD)
파이썬 플라스크로 배우는 웹프로그래밍 #4 (ABCD)파이썬 플라스크로 배우는 웹프로그래밍 #4 (ABCD)
파이썬 플라스크로 배우는 웹프로그래밍 #4 (ABCD)성일 한
 
파이썬 플라스크로 배우는 웹프로그래밍 #2 (ABCD)
파이썬 플라스크로 배우는 웹프로그래밍 #2 (ABCD)파이썬 플라스크로 배우는 웹프로그래밍 #2 (ABCD)
파이썬 플라스크로 배우는 웹프로그래밍 #2 (ABCD)성일 한
 
React Native를 사용한
 초간단 커뮤니티 앱 제작
React Native를 사용한
 초간단 커뮤니티 앱 제작React Native를 사용한
 초간단 커뮤니티 앱 제작
React Native를 사용한
 초간단 커뮤니티 앱 제작Taegon Kim
 
Ionic으로 모바일앱 만들기 #1
Ionic으로 모바일앱 만들기 #1Ionic으로 모바일앱 만들기 #1
Ionic으로 모바일앱 만들기 #1성일 한
 
Go로 새 프로젝트 시작하기
Go로 새 프로젝트 시작하기Go로 새 프로젝트 시작하기
Go로 새 프로젝트 시작하기Joonsung Lee
 

Viewers also liked (19)

Easy contributable internationalization process with Sphinx @ PyCon APAC 2016
Easy contributable internationalization process with Sphinx @ PyCon APAC 2016Easy contributable internationalization process with Sphinx @ PyCon APAC 2016
Easy contributable internationalization process with Sphinx @ PyCon APAC 2016
 
PyCon APAC 2016 Regular Expression[A-Z]+
PyCon APAC 2016 Regular Expression[A-Z]+PyCon APAC 2016 Regular Expression[A-Z]+
PyCon APAC 2016 Regular Expression[A-Z]+
 
Automating Django Functional Tests Using Selenium on Cloud
Automating Django Functional Tests Using Selenium on CloudAutomating Django Functional Tests Using Selenium on Cloud
Automating Django Functional Tests Using Selenium on Cloud
 
메일플러그 기업보안메일 〈메일 아카이빙〉
메일플러그 기업보안메일 〈메일 아카이빙〉메일플러그 기업보안메일 〈메일 아카이빙〉
메일플러그 기업보안메일 〈메일 아카이빙〉
 
TOROS: Python Framework for Recommender System
TOROS: Python Framework for Recommender SystemTOROS: Python Framework for Recommender System
TOROS: Python Framework for Recommender System
 
Python의 계산성능 향상을 위해 Fortran, C, CUDA-C, OpenCL-C 코드들과 연동하기
Python의 계산성능 향상을 위해 Fortran, C, CUDA-C, OpenCL-C 코드들과 연동하기Python의 계산성능 향상을 위해 Fortran, C, CUDA-C, OpenCL-C 코드들과 연동하기
Python의 계산성능 향상을 위해 Fortran, C, CUDA-C, OpenCL-C 코드들과 연동하기
 
[NEXT] Flask 로 Restful API 서버 만들기
[NEXT] Flask 로 Restful API 서버 만들기 [NEXT] Flask 로 Restful API 서버 만들기
[NEXT] Flask 로 Restful API 서버 만들기
 
[2016 데이터 그랜드 컨퍼런스] 2 5(빅데이터). 유비원 비정형데이터 중심의 big data 활용방안
[2016 데이터 그랜드 컨퍼런스] 2 5(빅데이터). 유비원 비정형데이터 중심의 big data 활용방안[2016 데이터 그랜드 컨퍼런스] 2 5(빅데이터). 유비원 비정형데이터 중심의 big data 활용방안
[2016 데이터 그랜드 컨퍼런스] 2 5(빅데이터). 유비원 비정형데이터 중심의 big data 활용방안
 
[9]ix.유엔진 사용자매뉴얼 gw리뉴얼_2차_110819
[9]ix.유엔진 사용자매뉴얼 gw리뉴얼_2차_110819[9]ix.유엔진 사용자매뉴얼 gw리뉴얼_2차_110819
[9]ix.유엔진 사용자매뉴얼 gw리뉴얼_2차_110819
 
업무가 빨라지는 그룹웨어, '다우오피스'
업무가 빨라지는 그룹웨어, '다우오피스'업무가 빨라지는 그룹웨어, '다우오피스'
업무가 빨라지는 그룹웨어, '다우오피스'
 
파이썬을 배워야하는 이유 발표자료 - 김연수
파이썬을 배워야하는 이유 발표자료 - 김연수파이썬을 배워야하는 이유 발표자료 - 김연수
파이썬을 배워야하는 이유 발표자료 - 김연수
 
Python과 flask 입문(1)
Python과 flask 입문(1)Python과 flask 입문(1)
Python과 flask 입문(1)
 
PyCon 2015 - 업무에서 빠르게 활용하는 PyQt
PyCon 2015 - 업무에서 빠르게 활용하는 PyQtPyCon 2015 - 업무에서 빠르게 활용하는 PyQt
PyCon 2015 - 업무에서 빠르게 활용하는 PyQt
 
파이썬 플라스크로 배우는 웹프로그래밍 #3 (ABCD)
파이썬 플라스크로 배우는 웹프로그래밍 #3 (ABCD)파이썬 플라스크로 배우는 웹프로그래밍 #3 (ABCD)
파이썬 플라스크로 배우는 웹프로그래밍 #3 (ABCD)
 
파이썬 플라스크로 배우는 웹프로그래밍 #4 (ABCD)
파이썬 플라스크로 배우는 웹프로그래밍 #4 (ABCD)파이썬 플라스크로 배우는 웹프로그래밍 #4 (ABCD)
파이썬 플라스크로 배우는 웹프로그래밍 #4 (ABCD)
 
파이썬 플라스크로 배우는 웹프로그래밍 #2 (ABCD)
파이썬 플라스크로 배우는 웹프로그래밍 #2 (ABCD)파이썬 플라스크로 배우는 웹프로그래밍 #2 (ABCD)
파이썬 플라스크로 배우는 웹프로그래밍 #2 (ABCD)
 
React Native를 사용한
 초간단 커뮤니티 앱 제작
React Native를 사용한
 초간단 커뮤니티 앱 제작React Native를 사용한
 초간단 커뮤니티 앱 제작
React Native를 사용한
 초간단 커뮤니티 앱 제작
 
Ionic으로 모바일앱 만들기 #1
Ionic으로 모바일앱 만들기 #1Ionic으로 모바일앱 만들기 #1
Ionic으로 모바일앱 만들기 #1
 
Go로 새 프로젝트 시작하기
Go로 새 프로젝트 시작하기Go로 새 프로젝트 시작하기
Go로 새 프로젝트 시작하기
 

Similar to Python 으로 19대 국회 뽀개기 (PyCon APAC 2016)

Analyze this
Analyze thisAnalyze this
Analyze thisAjay Ohri
 
Cole Napper: Orgnostic's people analytics and employee listening strategy to ...
Cole Napper: Orgnostic's people analytics and employee listening strategy to ...Cole Napper: Orgnostic's people analytics and employee listening strategy to ...
Cole Napper: Orgnostic's people analytics and employee listening strategy to ...Edunomica
 
Introduction to Cognitive Computing the science behind and use of IBM Watson
Introduction to Cognitive Computing the science behind and use of IBM WatsonIntroduction to Cognitive Computing the science behind and use of IBM Watson
Introduction to Cognitive Computing the science behind and use of IBM WatsonSubhendu Dey
 
Machine Learning: Inteligencia Artificial no es sólo un tema de Ciencia Ficci...
Machine Learning: Inteligencia Artificial no es sólo un tema de Ciencia Ficci...Machine Learning: Inteligencia Artificial no es sólo un tema de Ciencia Ficci...
Machine Learning: Inteligencia Artificial no es sólo un tema de Ciencia Ficci....NET Conf UY
 
Datascope runs on python
Datascope runs on pythonDatascope runs on python
Datascope runs on pythonbo_p
 
Sharing about my data science journey and what I do at Lazada
Sharing about my data science journey and what I do at LazadaSharing about my data science journey and what I do at Lazada
Sharing about my data science journey and what I do at LazadaEugene Yan Ziyou
 
Machine Learning: Artificial Intelligence isn't just a Science Fiction topic
Machine Learning: Artificial Intelligence isn't just a Science Fiction topicMachine Learning: Artificial Intelligence isn't just a Science Fiction topic
Machine Learning: Artificial Intelligence isn't just a Science Fiction topicRaúl Garreta
 
A few questions about large scale machine learning
A few questions about large scale machine learningA few questions about large scale machine learning
A few questions about large scale machine learningTheodoros Vasiloudis
 
How To Structure Your Search Team for Success
How To Structure Your Search Team for SuccessHow To Structure Your Search Team for Success
How To Structure Your Search Team for SuccessOpenSource Connections
 
Datascope Runs on Python - Chipy February 2016
Datascope Runs on Python - Chipy February 2016Datascope Runs on Python - Chipy February 2016
Datascope Runs on Python - Chipy February 2016Brian Lange
 
Adrian Colyer - Keynote: NoSQL matters - NoSQL matters Dublin 2015
Adrian Colyer - Keynote: NoSQL matters - NoSQL matters Dublin 2015Adrian Colyer - Keynote: NoSQL matters - NoSQL matters Dublin 2015
Adrian Colyer - Keynote: NoSQL matters - NoSQL matters Dublin 2015NoSQLmatters
 
A view from the ivory tower: Participating in Apache as a member of academia
A view from the ivory tower: Participating in Apache as a member of academiaA view from the ivory tower: Participating in Apache as a member of academia
A view from the ivory tower: Participating in Apache as a member of academiaMichael Mior
 
Data Science as a Commodity: Use MADlib, R, & other OSS Tools for Data Scienc...
Data Science as a Commodity: Use MADlib, R, & other OSS Tools for Data Scienc...Data Science as a Commodity: Use MADlib, R, & other OSS Tools for Data Scienc...
Data Science as a Commodity: Use MADlib, R, & other OSS Tools for Data Scienc...Sarah Aerni
 
Monitoria@reversim
Monitoria@reversimMonitoria@reversim
Monitoria@reversimSoluto
 
Building an Analytics CoE (Center of Excellence)
Building an Analytics CoE (Center of Excellence)Building an Analytics CoE (Center of Excellence)
Building an Analytics CoE (Center of Excellence)Rahul Saxena
 
Influence mapping Toolbox Presentation London 2015
Influence mapping Toolbox Presentation London 2015Influence mapping Toolbox Presentation London 2015
Influence mapping Toolbox Presentation London 2015Jun Julien Matsushita
 
Cole Napper: Are you ready for generative AI in people analytics?
Cole Napper: Are you ready for generative AI in people analytics?Cole Napper: Are you ready for generative AI in people analytics?
Cole Napper: Are you ready for generative AI in people analytics?Edunomica
 
xAPI: The Landscape
xAPI: The LandscapexAPI: The Landscape
xAPI: The LandscapeMegan Bowe
 
xAPI Camp - Learning Solutions
xAPI Camp - Learning SolutionsxAPI Camp - Learning Solutions
xAPI Camp - Learning SolutionsAaron Silvers
 

Similar to Python 으로 19대 국회 뽀개기 (PyCon APAC 2016) (20)

Analyze this
Analyze thisAnalyze this
Analyze this
 
Cole Napper: Orgnostic's people analytics and employee listening strategy to ...
Cole Napper: Orgnostic's people analytics and employee listening strategy to ...Cole Napper: Orgnostic's people analytics and employee listening strategy to ...
Cole Napper: Orgnostic's people analytics and employee listening strategy to ...
 
Introduction to Cognitive Computing the science behind and use of IBM Watson
Introduction to Cognitive Computing the science behind and use of IBM WatsonIntroduction to Cognitive Computing the science behind and use of IBM Watson
Introduction to Cognitive Computing the science behind and use of IBM Watson
 
Machine Learning: Inteligencia Artificial no es sólo un tema de Ciencia Ficci...
Machine Learning: Inteligencia Artificial no es sólo un tema de Ciencia Ficci...Machine Learning: Inteligencia Artificial no es sólo un tema de Ciencia Ficci...
Machine Learning: Inteligencia Artificial no es sólo un tema de Ciencia Ficci...
 
Datascope runs on python
Datascope runs on pythonDatascope runs on python
Datascope runs on python
 
Sharing about my data science journey and what I do at Lazada
Sharing about my data science journey and what I do at LazadaSharing about my data science journey and what I do at Lazada
Sharing about my data science journey and what I do at Lazada
 
Machine Learning: Artificial Intelligence isn't just a Science Fiction topic
Machine Learning: Artificial Intelligence isn't just a Science Fiction topicMachine Learning: Artificial Intelligence isn't just a Science Fiction topic
Machine Learning: Artificial Intelligence isn't just a Science Fiction topic
 
A few questions about large scale machine learning
A few questions about large scale machine learningA few questions about large scale machine learning
A few questions about large scale machine learning
 
How To Structure Your Search Team for Success
How To Structure Your Search Team for SuccessHow To Structure Your Search Team for Success
How To Structure Your Search Team for Success
 
Datascope Runs on Python - Chipy February 2016
Datascope Runs on Python - Chipy February 2016Datascope Runs on Python - Chipy February 2016
Datascope Runs on Python - Chipy February 2016
 
Adrian Colyer - Keynote: NoSQL matters - NoSQL matters Dublin 2015
Adrian Colyer - Keynote: NoSQL matters - NoSQL matters Dublin 2015Adrian Colyer - Keynote: NoSQL matters - NoSQL matters Dublin 2015
Adrian Colyer - Keynote: NoSQL matters - NoSQL matters Dublin 2015
 
A view from the ivory tower: Participating in Apache as a member of academia
A view from the ivory tower: Participating in Apache as a member of academiaA view from the ivory tower: Participating in Apache as a member of academia
A view from the ivory tower: Participating in Apache as a member of academia
 
Apache Mahout
Apache MahoutApache Mahout
Apache Mahout
 
Data Science as a Commodity: Use MADlib, R, & other OSS Tools for Data Scienc...
Data Science as a Commodity: Use MADlib, R, & other OSS Tools for Data Scienc...Data Science as a Commodity: Use MADlib, R, & other OSS Tools for Data Scienc...
Data Science as a Commodity: Use MADlib, R, & other OSS Tools for Data Scienc...
 
Monitoria@reversim
Monitoria@reversimMonitoria@reversim
Monitoria@reversim
 
Building an Analytics CoE (Center of Excellence)
Building an Analytics CoE (Center of Excellence)Building an Analytics CoE (Center of Excellence)
Building an Analytics CoE (Center of Excellence)
 
Influence mapping Toolbox Presentation London 2015
Influence mapping Toolbox Presentation London 2015Influence mapping Toolbox Presentation London 2015
Influence mapping Toolbox Presentation London 2015
 
Cole Napper: Are you ready for generative AI in people analytics?
Cole Napper: Are you ready for generative AI in people analytics?Cole Napper: Are you ready for generative AI in people analytics?
Cole Napper: Are you ready for generative AI in people analytics?
 
xAPI: The Landscape
xAPI: The LandscapexAPI: The Landscape
xAPI: The Landscape
 
xAPI Camp - Learning Solutions
xAPI Camp - Learning SolutionsxAPI Camp - Learning Solutions
xAPI Camp - Learning Solutions
 

Recently uploaded

Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...ZurliaSoop
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...amitlee9823
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Valters Lauzums
 
Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxolyaivanovalion
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...amitlee9823
 
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...amitlee9823
 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...SUHANI PANDEY
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionfulawalesam
 
ELKO dropshipping via API with DroFx.pptx
ELKO dropshipping via API with DroFx.pptxELKO dropshipping via API with DroFx.pptx
ELKO dropshipping via API with DroFx.pptxolyaivanovalion
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Researchmichael115558
 
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...amitlee9823
 
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Standamitlee9823
 
Probability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter LessonsProbability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter LessonsJoseMangaJr1
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxolyaivanovalion
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxolyaivanovalion
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangaloreamitlee9823
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Delhi Call girls
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...amitlee9823
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAroojKhan71
 

Recently uploaded (20)

Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFx
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interaction
 
ELKO dropshipping via API with DroFx.pptx
ELKO dropshipping via API with DroFx.pptxELKO dropshipping via API with DroFx.pptx
ELKO dropshipping via API with DroFx.pptx
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
 
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
 
Probability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter LessonsProbability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter Lessons
 
Anomaly detection and data imputation within time series
Anomaly detection and data imputation within time seriesAnomaly detection and data imputation within time series
Anomaly detection and data imputation within time series
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptx
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptx
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
 

Python 으로 19대 국회 뽀개기 (PyCon APAC 2016)

  • 1. 19 대 국회 뽀개기 이홍주
  • 2. Aug 14, 2016 Who am I? ● Research Engineer at fintech startup. – Fraud Detection System based on Machine Learning techniques. ● Previous Employment – Software Center at LG Electronics – IT headquarters at KT – Daum Communications – NURI Telecom
  • 3. Aug 14, 2016 질문 ● 내가 만든 Python 코드는 대게 일회용이다 . ● Python 또는 다른 언어로 ML 프로그래밍을 해봤다 . ● Numpy / Pandas 의 slicing, indexing 에 익숙하다 . ● lambda expression 을 많이 사용하고 있다 .
  • 4. Aug 14, 2016 Machine Learning ● Algorithm → learn data → model → prediction – “The ability to learn without being explicitly programmed”. (Authur Lee Samuel) – Set of algorithms that can learn from and perform predictive analysis on data. – Such algorithms operate by building a model from an example training set of input observations in order to make data-driven predictions or decisions expressed as outputs, rather than following strictly static program instructions.
  • 5. Aug 14, 2016 ML Process ● Featurization → Feature Vectors – Analyze Data ● Understand the information available that will be used to develop a model. – Prepare Data ● Discover and expose the structure in the dataset. ● Train Model → Model ● Evaluate Model – Develop a robust test harness and baseline accuracy from which to improve and spot check algorithms. ● Improve Results – Leverage results to develop more accurate models.
  • 6. Aug 14, 2016 ML Process 이름 특징 (feature) 성별 철수 넥타이 , 안경 , 수염 , 대머리 남 영희 스커트 , 안경 , 핸드백 , 긴머리 여 근혜 스커트 , 핸드백 , 닭머리 여 두환 군복 , 대머리 , 안경 남 홍주 긴머리 , 수염 남 … … … Feature Vectors (1, 0, 0, 0, 1, 0, 1, 1, 0) (0, 1, 0, 1, 1, 1, 0, 0, 0) (0, 1, 0, 0, 0, 1, 0, 0, 1) (0, 1, 1, 0, 1, 0, 1, 0, 0) (0, 0, 0, 1, 0, 0, 0, 1, 0) (…) Data Feature vectors Model(featurizing) (training)
  • 7. Aug 14, 2016 What we don’t ● !Crawling – already done and ready – Not in this scope – Thanks to 참여연대 , 팀포퐁 , OhmyNews ● !Evaluation – We are not focusing on results but the process. ● !Improving – Just one cycle with no turning back.
  • 8. Aug 14, 2016 What we do ● Featurization – Analyze / Prepare data – Pandas ● Train a model – Unsupervised Learning : K-Means Clustering – Scikit Learn, PySpark ● Presentation – Matplotlib, LightingViz ● Subject – 19 대 국회 표결 결과 / 정치자금 사용내역 / 국회 회의록
  • 9. Aug 14, 2016 19 대 국회 의안표결 ● Featurization ● Train model – K-Means Clustering ● Presentation
  • 10. Aug 14, 2016 19 대 국회 의안표결 ● Train model – K-Means Clustering
  • 11. Aug 14, 2016 19 대 국회 의안표결 ● https://github.com/midnightradio/pycon-apac-2016
  • 12. Aug 14, 2016 19 대 국회 정치자금 사용내역 ● Featurization – Normalization / Standardizing – Skewness – Outliers ● Train model ● Presentation
  • 13. Aug 14, 2016 19 대 국회 정치자금 사용내역 ● Featurization – Normalization / Standardizing ● Distance computation in k-means weights each dimension equally and hence care must be taken to ensure that unit of dimension shouldn’t distort relative nearness of observations.
  • 14. Aug 14, 2016 19 대 국회 정치자금 사용내역 ● Featurization – Skewness ● Leads bias
  • 15. Aug 14, 2016 19 대 국회 정치자금 사용내역 ● Featurization – Outliers ● Weakness of centroid based clustering (k- means) outlier outlier
  • 16. Aug 14, 2016 19 대 국회 정치자금 사용내역 ● Featurization – Outliers ● IQR (Inter Quartile Range) ● 2-5 Standard Deviation – Chicken and Eggs ● MAD (Median Absolute Deviation) – Christophe Leys (2014). Detecting outliers: Do not use standard deviation around the mean, use absolute deviation around the median. Journal of Experimental Social Psychology
  • 17. Aug 14, 2016 19 대 국회 정치자금 사용내역 ● https://github.com/midnightradio/pycon-apac-2016
  • 18. Aug 14, 2016 19 대 국회 회의록 ● Featurization – Preprocessing ● Sentence Segmentation – Can skip to tokenziation depending on use case ● Tokenization – Find individual words ● Lemmatization – Find the base of words (stemming) ● Removing stop words – “the”, “a”, “is”, “at”, etc ● Some other refinements are needed occasionally. ● Vectorization – We take an array of floating point values
  • 19. Aug 14, 2016 19 대 국회 회의록 ● Featurization – Text vectorization ● How can we compose a feature vector of numbers representing a text document of words? – Bag of Words – TF-IDF ● Each word in a text would be a dimension. ● How big would that vector be? – Would it be as big as vocabulary of an article? – Would it be as big as vocabulary of “all” article? – Why not all vocabulary in the language?
  • 20. Aug 14, 2016 19 대 국회 회의록 ● Featurization – Word vectorization ● Vector representation of words in a latent context ● Learn semantic relations of words in documents ● ex) “I ate ____ in Korea” – with a context “eat” it can be DimSum, Spagetti, or KimChi without considering the place Korea. – with a context “Korea” it can be Seoul, K-pop, which represents the place better without considering context “eating”.
  • 21. Aug 14, 2016 19 대 국회 회의록 ● https://github.com/midnightradio/pycon-apac-2016
  • 23. Aug 14, 2016 Contact me! ● lee.hongjoo@yandex.com ● http://www.linkedin.com/in/hongjoo-lee