Pyconmini Hiroshima 2018
1. Introduction to Data Science in Python
(Visualization ver.)
廻船孝行 (KAISEN Takayuki)
PyCon mini Hiroshima 2018
Contact: ksn0215@gmail.com
Ver. 3.1
Repository: https://github.com/ksnt/pycon_hiro_2018
2. Overview
1. Geographic Data Analysis
2. Interactive Graph and Application
3. Discussion
Development Environment
Ubuntu 17.10
Python 3.6.3
Pip 10.0.1
3. Notice
・ Injury due to an accident
・ Hard to pronounce some words
・ Thank you in advance for your cooperation!
4. 0. WHO ARE YOU?
⚫ Favorite Languages
⚪ Python, R, Scala
⚫ Interests
⚪SNA, CSS, CMC, ABM,
Complex Systems,
Data Science, ...
⚫ Python Conference Attendance
⚪PyCon mini JP (2010?)
⚪PyConJP 2011
⚪Tokyo.Scipy (2011?)
⚫ Love Online Learning
⚫ Oct 2017 - Hiroshima
Freeman, L. (2004). The development of social network analysis. A Study in the Sociology of Science, 1.
1. Social network analysis is motivated by a structural intuition based on ties linking social actors,
2. It is grounded in systematic empirical data,
3. It draws heavily on graphic imagery, and
4. It relies on the use of mathematical and/or computational models.
5. Takeaways: All I Want to Tell You
1. It is incredibly easy to make use of geographic data with Python (Folium)
2. It is incredibly easy to develop data-driven web applications with Python (Plotly and Dash)
6. 1. Geographic Data Analysis
2. Interactive Graph and Application
3. Discussion
7. How to use geographic data
GIS = Geographic Information System
“A geographic information system (GIS) is a system designed to capture,
store, manipulate, analyze, manage, and present spatial or geographic
data.” (Wikipedia)
YOU DON’T HAVE TO USE GIS!
Reference: “Python Geospatial Development – Third Edition”, Chapter 2
8. How to make use of Geographic
data in Python
・ArcGIS, QGIS (PyQGIS)
・Geopandas
・Folium (Leaflet)
・Geopy
・And so forth…
(reference) https://qiita.com/aimof/items/b4e4551d27abaf5bb258
⚪ No GIS knowledge required
⚪ Web maps are easy to create
9. How to install folium
$ pip install folium
or
$ conda install -c conda-forge folium
12. Data & Visualization(3)
import folium
from folium.plugins import HeatMap
from IPython.display import HTML

# df is assumed to be a DataFrame with 'LAT' and 'LON' columns, loaded beforehand
map_detroit = folium.Map(location=(42.40, -83.01))
# Build a list of (latitude, longitude) pairs for the heat map
data = list(zip(df['LAT'], df['LON']))
HeatMap(data, radius=9).add_to(map_detroit)
map_detroit.save('crimes.html')
# Show the saved map inside the notebook
HTML('<iframe width="800" height="500" frameborder="0" allowfullscreen '
     'src="./crimes.html"></iframe>')
https://nbviewer.jupyter.org/github/ksnt/Predictor-of-blights-in-Detroit/blob/master/Final_Report_1.1.ipynb
13. Data & Visualization(4)
How to visualize geographical data with folium
1. Data cleaning
2. Create map
3. Put the data into the map!
(4. Save the map as an HTML file)
(5. Display the HTML file)
HeatMap(data,radius=9).add_to(map_detroit)
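Step 1 (data cleaning) is not shown on the slides. A minimal sketch in plain Python, assuming the raw records are (latitude, longitude) pairs that may contain missing or malformed values; the function name clean_points and the sample rows are hypothetical and only illustrate the idea.

```python
import math


def clean_points(rows):
    """Keep only rows whose latitude and longitude are valid numbers.

    Hypothetical helper: rows is any iterable of (lat, lon) pairs.
    """
    cleaned = []
    for lat, lon in rows:
        try:
            lat, lon = float(lat), float(lon)
        except (TypeError, ValueError):
            # Missing (None) or non-numeric values are dropped
            continue
        if math.isnan(lat) or math.isnan(lon):
            continue
        # Drop coordinates outside the valid latitude/longitude ranges
        if -90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0:
            cleaned.append((lat, lon))
    return cleaned


if __name__ == "__main__":
    raw = [(42.40, -83.01), (None, -83.0), ("42.35", "-83.05"), (142.0, -83.0)]
    print(clean_points(raw))
```

The cleaned list has the same shape as the data list passed to HeatMap in step 3, so it could be handed over directly.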
14. 1. Geographic Data Analysis
2. Interactive Graph and Application
3. Discussion
16. How to Use Plotly and Dash
$ pip install plotly
$ pip install dash
$ python
>>> import plotly
>>> import dash
17. Interactive Graph For Scatter Plot
(Optional)
https://nbviewer.jupyter.org/gist/ksnt/340910aae39670202e4f790213e7afdc
18. Interactive Graph for Bar Plot
import pandas as pd
import plotly.offline
import plotly.graph_objs as go

df2 = pd.read_excel('globalterrorismdb_0616dist.xlsx', header=0)
# Number of incidents per country, 20 most frequent countries
top20 = df2["country_txt"].value_counts()[:20]
data = [go.Bar(x=top20.index, y=top20)]
layout = go.Layout(
    title="Top 20 Frequency of Terrorism Incidents 1970 - 2015",
    xaxis={"title": "Country"},
    yaxis={"title": "Occurrence of terrorism"},
)
fig = go.Figure(data=data, layout=layout)  # Preparation of plot by Plotly
plotly.offline.iplot(fig, filename='basic-bar')  # not online figure
You have to prepare this data file yourself!
https://nbviewer.jupyter.org/gist/ksnt/eb8ac99dd69ecc5dc5774bf673977ceb
19. Interactive Graph for Time Series Plot
layout = plotly.graph_objs.Layout(
    title="Occurrence of Terrorism Incidents",
    xaxis={"title": "Year"},
    yaxis={"title": "Occurrence of terrorism"},
)
# Incidents per year for one country, sorted by year
iraq_incidents = df2[df2["country_txt"] == "Iraq"]
iraq_incidents_count = iraq_incidents['iyear'].value_counts()
iraq_incidents_count = iraq_incidents_count.sort_index()
iraq = go.Scatter(
    x=iraq_incidents_count.index,
    y=iraq_incidents_count,
    name="Iraq",
    line=dict(color="black"),
    opacity=0.8)
year = [i for i in range(1970, 2016)]
# The other traces (us, pakistan, ...) are not defined on this slide;
# presumably they are built the same way as iraq (see the notebook linked below)
data = [iraq, us, pakistan, india, afghanistan, colombia, peru, phil, el, uk,
        turkey, spain, sri, somalia, nigeria, algeria, france, yemen, lebanon]
fig = plotly.graph_objs.Figure(data=data, layout=layout)
plotly.offline.iplot(fig, show_link=False,
                     config={"displaylogo": False,
                             "modeBarButtonsToRemove": ["sendDataToCloud"]})
https://nbviewer.jupyter.org/gist/ksnt/eb8ac99dd69ecc5dc5774bf673977ceb
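Note that value_counts() followed by sort_index() only returns the years in which at least one incident was recorded, so traces for different countries may cover different sets of years. One possible way to put every country on the same 1970 to 2015 axis is to fill the missing years with zero. The sketch below uses only the standard library; the function name yearly_counts and the (country, year) record layout are assumptions made for illustration, not part of the original notebook.

```python
from collections import Counter


def yearly_counts(records, country, years=range(1970, 2016)):
    """Count incidents per year for one country, with 0 for years without incidents.

    Hypothetical helper: records is an iterable of (country_name, year) pairs.
    """
    counts = Counter(year for name, year in records if name == country)
    return [counts.get(y, 0) for y in years]


if __name__ == "__main__":
    sample = [("Iraq", 2003), ("Iraq", 2003), ("Iraq", 2005), ("Peru", 2003)]
    # One value per year from 2003 to 2005
    print(yearly_counts(sample, "Iraq", years=range(2003, 2006)))
```

The resulting list could be used as the y values of a trace, with the shared list of years as x.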
20. Additional Example – MonteCarlo
Simulation (Optional)
https://nbviewer.jupyter.org/gist/ksnt/101a44cc21b0eb990f96dc1d640dbd42
21. Plotly Dash
“”” Dash is Shiny for Python “””
― Chris Parmer, Dash: Shiny for Python
https://youtu.be/5BAthiN0htc
“Dash is a Python framework for building web applications”
・ Flask
・ React.js
・ Ideal for building data visualization apps
23. Monte Carlo Simulator (2)
Krauth, W. (2006). Statistical mechanics: algorithms and computations (Vol. 13). OUP Oxford.
Points = {x, y}, x, y ~ U(-1, 1)
The square (side 2, since r = 1) is covered by points.
Num of all points → S(□) = 4
Num of points in the circle → S(○) = π
Count up these points!
(Num of points in the circle) / (Num of all points) ≈ S(○) / S(□) = π / 4
A useful article about Monte Carlo simulation (in Japanese):
Koji Hukushima (福島 孝治), "The Front Line of Monte Carlo Methods: How to Integrate by Rolling Dice" (モンテカルロ法の前線 サイコロを振って積分する方法)
https://www.smapip.is.tohoku.ac.jp/~smapip/2003/tutorial/presentation/koji-hukushima.pdf
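The counting argument on this slide can be sketched in a few lines of standard-library Python: draw x and y uniformly from (-1, 1), count the points that fall inside the unit circle, and multiply the resulting ratio by 4 to estimate π. The function name estimate_pi and the seed parameter are illustrative choices, not taken from the talk's simulator.

```python
import math
import random


def estimate_pi(n_points, seed=None):
    """Estimate pi from the fraction of random points that land in the unit circle."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(n_points):
        # x, y ~ U(-1, 1): a random point in the square of area 4
        x = rng.uniform(-1.0, 1.0)
        y = rng.uniform(-1.0, 1.0)
        if x * x + y * y <= 1.0:
            inside += 1
    # inside / n_points approximates S(circle) / S(square) = pi / 4
    return 4.0 * inside / n_points


if __name__ == "__main__":
    estimate = estimate_pi(100_000, seed=0)
    print(estimate, abs(estimate - math.pi))
```

The error shrinks roughly with the square root of the number of points, so many points are needed for each additional digit.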
Institution
26. Python vs R as Data Visualization Tool
        | Speed | Extensibility                 | Price | Packages/Libraries (for Data Analysis)
Python  | ○     | ◎                             | Free  | ML
R       | △     | △, OOP (S3, S4, R5 (>= 2.12)) | Free  | Statistics
Tableau | ?     | △? (For EDA?)                 | ¥18000/year, ¥51000/year, ¥102000/year; ¥0/year for students and teachers | ?
Kibana, Superset, Redash, Metabase, Splunk, KNIME, Google Charts, etc…
Julia, Matlab, Scilab, Octave, etc...
28. 3 – ε. I am looking for new
opportunities!
・ Places: Anywhere
・ Salary & Benefits: Negotiable
・ Feel free to talk to me!
・ Like: International, Diverse, Python,
Data analysis
29. 1. Geographic Data Analysis
2. Interactive Graph and Application
3. Discussion