SlideShare a Scribd company logo
1 of 38
Download to read offline
PyData &
Apache Spark
2017 / 2 / 10
Sapporo TechBar #7
@
▸ facebook : Ryuji Tamagawa
▸ Twitter : tamagawa_ryuji
▸ FB
techbar
▸ FB
▸ Twitter
5


Python
PyData
Apache
Spark
Jupyter
Notebook
2017
and the
future
Pandas
PyData
1 / 5 : PyData
1 / 5 : PyData
PyData.org
1 / 5 : PyData
PyData
Anaconda Python
Blaze NumPy and pandas interface to Big Data'. dask
Bokeh
Canopy Python
IPython
matplotlib PyData
nose
numba JIT
NumPy PyData
Scipy PyData
Statsmodels
SymPy
pandas NumPy SciPy
scikit-image
scikit-learn PyData


pandas
2 / 5 : pandas
pandas
▸ NumPy SciPy 

▸ DataFrame
▸
2 / 5 : pandas
pandas 

Wes McKinney
2 / 5 : pandas
DataFrame
2 / 5 : pandas
2 / 5 : pandas
▸ 

Python
▸
▸ PyData pandas


Jupyter Notebook
3 /5 : Jupyter Notebook
IPython Notebook
▸ Jupyter Notebook
▸ Julia Python R
▸ JupyterCon
3 /5 : Jupyter Notebook
3 /5 : Jupyter Notebook
3 /5 : Jupyter Notebook
pandas / matplotlib
3 /5 : Jupyter Notebook
Interactive Widget
3 /5 : Jupyter Notebook
▸ Learning Jupyter
Apache Spark
4 / 5 : Apache Spark
Hadoop
▸ MapReduce Spark
▸ 2010 Hadoop = MapReduce + HDFS
▸ Hadoop
OS
HDFS
Hive e.t.c.
HBaseMapReduce
YARN
Impala
e.t.c in-
memory SQL
engine
Spark
Spark Streaming, MLlib,
GraphX, Spark SQL)
Hadoop
HDFS S3 

YARN Mesos 

/
4 / 5 : Apache Spark
Apache Spark PyData pandas
Apache Spark pandas
JVM Python
× dask
I/O
Scala Java Python R

JVM
Python
4 / 5 : Apache Spark
Spark
▸
▸
▸ 1 PC 

Hadoop / MapReduce
4 / 5 : Apache Spark
DataFrame
4 / 5 : Apache Spark
▸
▸ SSD
▸ Spark Parquet
▸ Performance comparison of different file formats
and storage engines in the Hadoop ecosystem
▸ Parquet Python
4 / 5 : Apache Spark
Apache Spark
▸
▸ Parquet
▸
▸
Machine Learning
Machine Learning
▸
▸ scikit-learn
▸ Spark MLlib / ML
▸
▸ TensorFlow
▸ Python
2017 and the future
5/5 : 2017 and the future
PyData
▸
▸ Spark - pandas
▸ pandas → Spark …
5/5 : 2017 and the future
Wes blog
▸ pandas Apache Arrow
▸ Blog
▸ PyData Blog


Wes OK
▸ 2017 : pandas, Arrow, Feather, Parquet, Spark, Ibis

http://qiita.com/tamagawa-ryuji/items/deb3f63ed4c7c8065e81
5/5 : 2017 and the future
High speed Apache Parquet for Python
▸ Parquet
▸ Spark
▸ Python
▸ Fastparquet
▸ pyarrow
5/5 : 2017 and the future
: apache arrow
▸ apache arrow
▸ PyData / OSS
▸ /
20170210 sapporotechbar7

More Related Content

What's hot

Cassandra + Hadoop @ApacheCon
Cassandra + Hadoop @ApacheCon Cassandra + Hadoop @ApacheCon
Cassandra + Hadoop @ApacheCon Jeremy Hanna
 
Big Data Programming Using Hadoop Workshop
Big Data Programming Using Hadoop WorkshopBig Data Programming Using Hadoop Workshop
Big Data Programming Using Hadoop WorkshopIMC Institute
 
A complete hadoop stack
A complete hadoop stackA complete hadoop stack
A complete hadoop stackAbhra Pal
 
Hadoop 2 cluster architecture
Hadoop 2 cluster architectureHadoop 2 cluster architecture
Hadoop 2 cluster architectureSandeep Patil
 
Hadoop 101 - Big Data Technology
Hadoop 101 - Big Data TechnologyHadoop 101 - Big Data Technology
Hadoop 101 - Big Data TechnologyFirman Gautama
 
hbaseconasia2019 Spatio temporal Data Management based on Ali-HBase Ganos and...
hbaseconasia2019 Spatio temporal Data Management based on Ali-HBase Ganos and...hbaseconasia2019 Spatio temporal Data Management based on Ali-HBase Ganos and...
hbaseconasia2019 Spatio temporal Data Management based on Ali-HBase Ganos and...Michael Stack
 
Hadoop - Simple. Scalable.
Hadoop - Simple. Scalable.Hadoop - Simple. Scalable.
Hadoop - Simple. Scalable.elliando dias
 
Introduction to Big Data processing (FGRE2016)
Introduction to Big Data processing (FGRE2016)Introduction to Big Data processing (FGRE2016)
Introduction to Big Data processing (FGRE2016)Thomas Vanhove
 
Papyri.info's Linked Data Story
Papyri.info's Linked Data StoryPapyri.info's Linked Data Story
Papyri.info's Linked Data StoryHugh Cayless
 
Big Data - Fast Machine Learning at Scale + Couchbase
Big Data - Fast Machine Learning at Scale + CouchbaseBig Data - Fast Machine Learning at Scale + Couchbase
Big Data - Fast Machine Learning at Scale + CouchbaseFujio Turner
 
An introduction to Big-Data processing applying hadoop
An introduction to Big-Data processing applying hadoopAn introduction to Big-Data processing applying hadoop
An introduction to Big-Data processing applying hadoopAmir Sedighi
 
SparkR-Advance Analytic for Big Data
SparkR-Advance Analytic for Big DataSparkR-Advance Analytic for Big Data
SparkR-Advance Analytic for Big Datasamuel shamiri
 
HPCC Systems vs Hadoop
HPCC Systems vs HadoopHPCC Systems vs Hadoop
HPCC Systems vs HadoopFujio Turner
 

What's hot (20)

Cassandra + Hadoop @ApacheCon
Cassandra + Hadoop @ApacheCon Cassandra + Hadoop @ApacheCon
Cassandra + Hadoop @ApacheCon
 
Watch Your Log!
Watch Your Log!Watch Your Log!
Watch Your Log!
 
Big Data Programming Using Hadoop Workshop
Big Data Programming Using Hadoop WorkshopBig Data Programming Using Hadoop Workshop
Big Data Programming Using Hadoop Workshop
 
Hadoop 1 vs hadoop2
Hadoop 1 vs hadoop2Hadoop 1 vs hadoop2
Hadoop 1 vs hadoop2
 
A complete hadoop stack
A complete hadoop stackA complete hadoop stack
A complete hadoop stack
 
Mapreduce Tutorial
Mapreduce TutorialMapreduce Tutorial
Mapreduce Tutorial
 
Hadoop 2 cluster architecture
Hadoop 2 cluster architectureHadoop 2 cluster architecture
Hadoop 2 cluster architecture
 
Hadoop 101 - Big Data Technology
Hadoop 101 - Big Data TechnologyHadoop 101 - Big Data Technology
Hadoop 101 - Big Data Technology
 
Hadoop-BigData
Hadoop-BigDataHadoop-BigData
Hadoop-BigData
 
Hadoop
HadoopHadoop
Hadoop
 
Hadoop 101 v2
Hadoop 101 v2Hadoop 101 v2
Hadoop 101 v2
 
hbaseconasia2019 Spatio temporal Data Management based on Ali-HBase Ganos and...
hbaseconasia2019 Spatio temporal Data Management based on Ali-HBase Ganos and...hbaseconasia2019 Spatio temporal Data Management based on Ali-HBase Ganos and...
hbaseconasia2019 Spatio temporal Data Management based on Ali-HBase Ganos and...
 
Hadoop - Simple. Scalable.
Hadoop - Simple. Scalable.Hadoop - Simple. Scalable.
Hadoop - Simple. Scalable.
 
Alluxio
AlluxioAlluxio
Alluxio
 
Introduction to Big Data processing (FGRE2016)
Introduction to Big Data processing (FGRE2016)Introduction to Big Data processing (FGRE2016)
Introduction to Big Data processing (FGRE2016)
 
Papyri.info's Linked Data Story
Papyri.info's Linked Data StoryPapyri.info's Linked Data Story
Papyri.info's Linked Data Story
 
Big Data - Fast Machine Learning at Scale + Couchbase
Big Data - Fast Machine Learning at Scale + CouchbaseBig Data - Fast Machine Learning at Scale + Couchbase
Big Data - Fast Machine Learning at Scale + Couchbase
 
An introduction to Big-Data processing applying hadoop
An introduction to Big-Data processing applying hadoopAn introduction to Big-Data processing applying hadoop
An introduction to Big-Data processing applying hadoop
 
SparkR-Advance Analytic for Big Data
SparkR-Advance Analytic for Big DataSparkR-Advance Analytic for Big Data
SparkR-Advance Analytic for Big Data
 
HPCC Systems vs Hadoop
HPCC Systems vs HadoopHPCC Systems vs Hadoop
HPCC Systems vs Hadoop
 

Viewers also liked

Jeremy Nixon, Machine Learning Engineer, Spark Technology Center at MLconf AT...
Jeremy Nixon, Machine Learning Engineer, Spark Technology Center at MLconf AT...Jeremy Nixon, Machine Learning Engineer, Spark Technology Center at MLconf AT...
Jeremy Nixon, Machine Learning Engineer, Spark Technology Center at MLconf AT...MLconf
 
Deep Learning with Apache Spark and GPUs with Pierce Spitler
Deep Learning with Apache Spark and GPUs with Pierce SpitlerDeep Learning with Apache Spark and GPUs with Pierce Spitler
Deep Learning with Apache Spark and GPUs with Pierce SpitlerDatabricks
 
[db analytics showcase Sapporo 2017] A15: Pythonでの分散処理再入門 by 株式会社HPCソリューションズ ...
[db analytics showcase Sapporo 2017] A15: Pythonでの分散処理再入門 by 株式会社HPCソリューションズ ...[db analytics showcase Sapporo 2017] A15: Pythonでの分散処理再入門 by 株式会社HPCソリューションズ ...
[db analytics showcase Sapporo 2017] A15: Pythonでの分散処理再入門 by 株式会社HPCソリューションズ ...Insight Technology, Inc.
 
PYNQ単体でUIを表示してみる(PYNQまつり)
PYNQ単体でUIを表示してみる(PYNQまつり)PYNQ単体でUIを表示してみる(PYNQまつり)
PYNQ単体でUIを表示してみる(PYNQまつり)Kenta IDA
 
PYNQ祭りLT todotani
PYNQ祭りLT todotaniPYNQ祭りLT todotani
PYNQ祭りLT todotaniKenshi Kamiya
 
Pynqでカメラ画像をリアルタイムfastx コーナー検出
Pynqでカメラ画像をリアルタイムfastx コーナー検出Pynqでカメラ画像をリアルタイムfastx コーナー検出
Pynqでカメラ画像をリアルタイムfastx コーナー検出marsee101
 
Presto in my_use_case
Presto in my_use_casePresto in my_use_case
Presto in my_use_casewyukawa
 
PYNQ 祭り: Pmod のプログラミング
PYNQ 祭り: Pmod のプログラミングPYNQ 祭り: Pmod のプログラミング
PYNQ 祭り: Pmod のプログラミングryos36
 
APACHE TOREE: A JUPYTER KERNEL FOR SPARK by Marius van Niekerk
APACHE TOREE: A JUPYTER KERNEL FOR SPARK by Marius van NiekerkAPACHE TOREE: A JUPYTER KERNEL FOR SPARK by Marius van Niekerk
APACHE TOREE: A JUPYTER KERNEL FOR SPARK by Marius van NiekerkSpark Summit
 
PYNQで○○してみた!
PYNQで○○してみた!PYNQで○○してみた!
PYNQで○○してみた!aster_ism
 
Build a deep learning pipeline on apache spark for ads optimization
Build a deep learning pipeline on apache spark for ads optimizationBuild a deep learning pipeline on apache spark for ads optimization
Build a deep learning pipeline on apache spark for ads optimizationCraig Chao
 
Spark machine learning & deep learning
Spark machine learning & deep learningSpark machine learning & deep learning
Spark machine learning & deep learninghoondong kim
 
コンピュータエンジニアへのFPGAのすすめ
コンピュータエンジニアへのFPGAのすすめコンピュータエンジニアへのFPGAのすすめ
コンピュータエンジニアへのFPGAのすすめTakeshi HASEGAWA
 

Viewers also liked (15)

Jeremy Nixon, Machine Learning Engineer, Spark Technology Center at MLconf AT...
Jeremy Nixon, Machine Learning Engineer, Spark Technology Center at MLconf AT...Jeremy Nixon, Machine Learning Engineer, Spark Technology Center at MLconf AT...
Jeremy Nixon, Machine Learning Engineer, Spark Technology Center at MLconf AT...
 
Deep Learning with Apache Spark and GPUs with Pierce Spitler
Deep Learning with Apache Spark and GPUs with Pierce SpitlerDeep Learning with Apache Spark and GPUs with Pierce Spitler
Deep Learning with Apache Spark and GPUs with Pierce Spitler
 
Pynq祭り資料
Pynq祭り資料Pynq祭り資料
Pynq祭り資料
 
[db analytics showcase Sapporo 2017] A15: Pythonでの分散処理再入門 by 株式会社HPCソリューションズ ...
[db analytics showcase Sapporo 2017] A15: Pythonでの分散処理再入門 by 株式会社HPCソリューションズ ...[db analytics showcase Sapporo 2017] A15: Pythonでの分散処理再入門 by 株式会社HPCソリューションズ ...
[db analytics showcase Sapporo 2017] A15: Pythonでの分散処理再入門 by 株式会社HPCソリューションズ ...
 
PYNQ単体でUIを表示してみる(PYNQまつり)
PYNQ単体でUIを表示してみる(PYNQまつり)PYNQ単体でUIを表示してみる(PYNQまつり)
PYNQ単体でUIを表示してみる(PYNQまつり)
 
PYNQ祭りLT todotani
PYNQ祭りLT todotaniPYNQ祭りLT todotani
PYNQ祭りLT todotani
 
Pynqでカメラ画像をリアルタイムfastx コーナー検出
Pynqでカメラ画像をリアルタイムfastx コーナー検出Pynqでカメラ画像をリアルタイムfastx コーナー検出
Pynqでカメラ画像をリアルタイムfastx コーナー検出
 
Presto in my_use_case
Presto in my_use_casePresto in my_use_case
Presto in my_use_case
 
PYNQ 祭り: Pmod のプログラミング
PYNQ 祭り: Pmod のプログラミングPYNQ 祭り: Pmod のプログラミング
PYNQ 祭り: Pmod のプログラミング
 
APACHE TOREE: A JUPYTER KERNEL FOR SPARK by Marius van Niekerk
APACHE TOREE: A JUPYTER KERNEL FOR SPARK by Marius van NiekerkAPACHE TOREE: A JUPYTER KERNEL FOR SPARK by Marius van Niekerk
APACHE TOREE: A JUPYTER KERNEL FOR SPARK by Marius van Niekerk
 
PYNQで○○してみた!
PYNQで○○してみた!PYNQで○○してみた!
PYNQで○○してみた!
 
PYNQ祭り
PYNQ祭りPYNQ祭り
PYNQ祭り
 
Build a deep learning pipeline on apache spark for ads optimization
Build a deep learning pipeline on apache spark for ads optimizationBuild a deep learning pipeline on apache spark for ads optimization
Build a deep learning pipeline on apache spark for ads optimization
 
Spark machine learning & deep learning
Spark machine learning & deep learningSpark machine learning & deep learning
Spark machine learning & deep learning
 
コンピュータエンジニアへのFPGAのすすめ
コンピュータエンジニアへのFPGAのすすめコンピュータエンジニアへのFPGAのすすめ
コンピュータエンジニアへのFPGAのすすめ
 

Similar to 20170210 sapporotechbar7

Improving Python and Spark Performance and Interoperability: Spark Summit Eas...
Improving Python and Spark Performance and Interoperability: Spark Summit Eas...Improving Python and Spark Performance and Interoperability: Spark Summit Eas...
Improving Python and Spark Performance and Interoperability: Spark Summit Eas...Spark Summit
 
Improving Python and Spark (PySpark) Performance and Interoperability
Improving Python and Spark (PySpark) Performance and InteroperabilityImproving Python and Spark (PySpark) Performance and Interoperability
Improving Python and Spark (PySpark) Performance and InteroperabilityWes McKinney
 
Introduction to Spark: Or how I learned to love 'big data' after all.
Introduction to Spark: Or how I learned to love 'big data' after all.Introduction to Spark: Or how I learned to love 'big data' after all.
Introduction to Spark: Or how I learned to love 'big data' after all.Peadar Coyle
 
OpenStack in Action 4! Doug hellman - Intersection of OpenStack and python co...
OpenStack in Action 4! Doug hellman - Intersection of OpenStack and python co...OpenStack in Action 4! Doug hellman - Intersection of OpenStack and python co...
OpenStack in Action 4! Doug hellman - Intersection of OpenStack and python co...eNovance
 
Harry Potter & Apache iceberg format
Harry Potter & Apache iceberg formatHarry Potter & Apache iceberg format
Harry Potter & Apache iceberg formatTaras Fedorov
 
Accelerating Big Data beyond the JVM - Fosdem 2018
Accelerating Big Data beyond the JVM - Fosdem 2018Accelerating Big Data beyond the JVM - Fosdem 2018
Accelerating Big Data beyond the JVM - Fosdem 2018Holden Karau
 
Hadoop at ayasdi
Hadoop at ayasdiHadoop at ayasdi
Hadoop at ayasdiMohit Jaggi
 
Scaling Big Data with Hadoop and Mesos
Scaling Big Data with Hadoop and MesosScaling Big Data with Hadoop and Mesos
Scaling Big Data with Hadoop and MesosDiscover Pinterest
 
Python in big data ecosystem by Nicholas Lu
Python in big data ecosystem by Nicholas LuPython in big data ecosystem by Nicholas Lu
Python in big data ecosystem by Nicholas LuPYCON MY PLT
 
Spark: The Good, the Bad, and the Ugly
Spark: The Good, the Bad, and the UglySpark: The Good, the Bad, and the Ugly
Spark: The Good, the Bad, and the UglySarah Guido
 
APACHE SPARK PER IL MACHINE LEARNING: INTRODUZIONE ED UN CASO DI STUDIO_ Meet...
APACHE SPARK PER IL MACHINE LEARNING: INTRODUZIONE ED UN CASO DI STUDIO_ Meet...APACHE SPARK PER IL MACHINE LEARNING: INTRODUZIONE ED UN CASO DI STUDIO_ Meet...
APACHE SPARK PER IL MACHINE LEARNING: INTRODUZIONE ED UN CASO DI STUDIO_ Meet...Deep Learning Italia
 
Contributing to pandas (Korean)
Contributing to pandas (Korean)Contributing to pandas (Korean)
Contributing to pandas (Korean)Younggun Kim
 
Making the big data ecosystem work together with python apache arrow, spark,...
Making the big data ecosystem work together with python  apache arrow, spark,...Making the big data ecosystem work together with python  apache arrow, spark,...
Making the big data ecosystem work together with python apache arrow, spark,...Holden Karau
 
Making the big data ecosystem work together with Python & Apache Arrow, Apach...
Making the big data ecosystem work together with Python & Apache Arrow, Apach...Making the big data ecosystem work together with Python & Apache Arrow, Apach...
Making the big data ecosystem work together with Python & Apache Arrow, Apach...Holden Karau
 
Distributing Sage / Python Code, The Right Way
Distributing Sage / Python Code, The Right WayDistributing Sage / Python Code, The Right Way
Distributing Sage / Python Code, The Right Waymmasdeu
 
Lightning Fast Dataframes with Polars
Lightning Fast Dataframes with PolarsLightning Fast Dataframes with Polars
Lightning Fast Dataframes with PolarsAlberto Danese
 
5 Reasons why Spark is in demand!
5 Reasons why Spark is in demand!5 Reasons why Spark is in demand!
5 Reasons why Spark is in demand!Edureka!
 
Putting data science to work
Putting data science to workPutting data science to work
Putting data science to workAlex Breeze
 

Similar to 20170210 sapporotechbar7 (20)

Improving Python and Spark Performance and Interoperability: Spark Summit Eas...
Improving Python and Spark Performance and Interoperability: Spark Summit Eas...Improving Python and Spark Performance and Interoperability: Spark Summit Eas...
Improving Python and Spark Performance and Interoperability: Spark Summit Eas...
 
Improving Python and Spark (PySpark) Performance and Interoperability
Improving Python and Spark (PySpark) Performance and InteroperabilityImproving Python and Spark (PySpark) Performance and Interoperability
Improving Python and Spark (PySpark) Performance and Interoperability
 
Introduction to Spark: Or how I learned to love 'big data' after all.
Introduction to Spark: Or how I learned to love 'big data' after all.Introduction to Spark: Or how I learned to love 'big data' after all.
Introduction to Spark: Or how I learned to love 'big data' after all.
 
OpenStack in Action 4! Doug hellman - Intersection of OpenStack and python co...
OpenStack in Action 4! Doug hellman - Intersection of OpenStack and python co...OpenStack in Action 4! Doug hellman - Intersection of OpenStack and python co...
OpenStack in Action 4! Doug hellman - Intersection of OpenStack and python co...
 
Harry Potter & Apache iceberg format
Harry Potter & Apache iceberg formatHarry Potter & Apache iceberg format
Harry Potter & Apache iceberg format
 
Accelerating Big Data beyond the JVM - Fosdem 2018
Accelerating Big Data beyond the JVM - Fosdem 2018Accelerating Big Data beyond the JVM - Fosdem 2018
Accelerating Big Data beyond the JVM - Fosdem 2018
 
NYC_2016_slides
NYC_2016_slidesNYC_2016_slides
NYC_2016_slides
 
Hadoop at ayasdi
Hadoop at ayasdiHadoop at ayasdi
Hadoop at ayasdi
 
Scaling Big Data with Hadoop and Mesos
Scaling Big Data with Hadoop and MesosScaling Big Data with Hadoop and Mesos
Scaling Big Data with Hadoop and Mesos
 
Python in big data ecosystem by Nicholas Lu
Python in big data ecosystem by Nicholas LuPython in big data ecosystem by Nicholas Lu
Python in big data ecosystem by Nicholas Lu
 
Spark: The Good, the Bad, and the Ugly
Spark: The Good, the Bad, and the UglySpark: The Good, the Bad, and the Ugly
Spark: The Good, the Bad, and the Ugly
 
APACHE SPARK PER IL MACHINE LEARNING: INTRODUZIONE ED UN CASO DI STUDIO_ Meet...
APACHE SPARK PER IL MACHINE LEARNING: INTRODUZIONE ED UN CASO DI STUDIO_ Meet...APACHE SPARK PER IL MACHINE LEARNING: INTRODUZIONE ED UN CASO DI STUDIO_ Meet...
APACHE SPARK PER IL MACHINE LEARNING: INTRODUZIONE ED UN CASO DI STUDIO_ Meet...
 
Contributing to pandas (Korean)
Contributing to pandas (Korean)Contributing to pandas (Korean)
Contributing to pandas (Korean)
 
Making the big data ecosystem work together with python apache arrow, spark,...
Making the big data ecosystem work together with python  apache arrow, spark,...Making the big data ecosystem work together with python  apache arrow, spark,...
Making the big data ecosystem work together with python apache arrow, spark,...
 
Making the big data ecosystem work together with Python & Apache Arrow, Apach...
Making the big data ecosystem work together with Python & Apache Arrow, Apach...Making the big data ecosystem work together with Python & Apache Arrow, Apach...
Making the big data ecosystem work together with Python & Apache Arrow, Apach...
 
Distributing Sage / Python Code, The Right Way
Distributing Sage / Python Code, The Right WayDistributing Sage / Python Code, The Right Way
Distributing Sage / Python Code, The Right Way
 
Started with-apache-spark
Started with-apache-sparkStarted with-apache-spark
Started with-apache-spark
 
Lightning Fast Dataframes with Polars
Lightning Fast Dataframes with PolarsLightning Fast Dataframes with Polars
Lightning Fast Dataframes with Polars
 
5 Reasons why Spark is in demand!
5 Reasons why Spark is in demand!5 Reasons why Spark is in demand!
5 Reasons why Spark is in demand!
 
Putting data science to work
Putting data science to workPutting data science to work
Putting data science to work
 

More from Ryuji Tamagawa

hbstudy 74 Site Reliability Engineering
hbstudy 74 Site Reliability Engineeringhbstudy 74 Site Reliability Engineering
hbstudy 74 Site Reliability EngineeringRyuji Tamagawa
 
20161004 データ処理のプラットフォームとしてのpythonとpandas 東京
20161004 データ処理のプラットフォームとしてのpythonとpandas 東京20161004 データ処理のプラットフォームとしてのpythonとpandas 東京
20161004 データ処理のプラットフォームとしてのpythonとpandas 東京Ryuji Tamagawa
 
20160708 データ処理のプラットフォームとしてのpython 札幌
20160708 データ処理のプラットフォームとしてのpython 札幌20160708 データ処理のプラットフォームとしてのpython 札幌
20160708 データ処理のプラットフォームとしてのpython 札幌Ryuji Tamagawa
 
20160127三木会 RDB経験者のためのspark
20160127三木会 RDB経験者のためのspark20160127三木会 RDB経験者のためのspark
20160127三木会 RDB経験者のためのsparkRyuji Tamagawa
 
20151205 Japan.R SparkRとParquet
20151205 Japan.R SparkRとParquet20151205 Japan.R SparkRとParquet
20151205 Japan.R SparkRとParquetRyuji Tamagawa
 
Performant data processing with PySpark, SparkR and DataFrame API
Performant data processing with PySpark, SparkR and DataFrame APIPerformant data processing with PySpark, SparkR and DataFrame API
Performant data processing with PySpark, SparkR and DataFrame APIRyuji Tamagawa
 
足を地に着け落ち着いて考える
足を地に着け落ち着いて考える足を地に着け落ち着いて考える
足を地に着け落ち着いて考えるRyuji Tamagawa
 
ヘルシープログラマ・翻訳と実践
ヘルシープログラマ・翻訳と実践ヘルシープログラマ・翻訳と実践
ヘルシープログラマ・翻訳と実践Ryuji Tamagawa
 
BigQueryの課金、節約しませんか
BigQueryの課金、節約しませんかBigQueryの課金、節約しませんか
BigQueryの課金、節約しませんかRyuji Tamagawa
 
You might be paying too much for BigQuery
You might be paying too much for BigQueryYou might be paying too much for BigQuery
You might be paying too much for BigQueryRyuji Tamagawa
 
Google BigQueryについて 紹介と推測
Google BigQueryについて 紹介と推測Google BigQueryについて 紹介と推測
Google BigQueryについて 紹介と推測Ryuji Tamagawa
 
lessons learned from talking at rakuten technology conference
lessons learned from talking at rakuten technology conferencelessons learned from talking at rakuten technology conference
lessons learned from talking at rakuten technology conferenceRyuji Tamagawa
 
丸の内MongoDB勉強会#20LT 2.8のストレージエンジン動かしてみました
丸の内MongoDB勉強会#20LT 2.8のストレージエンジン動かしてみました丸の内MongoDB勉強会#20LT 2.8のストレージエンジン動かしてみました
丸の内MongoDB勉強会#20LT 2.8のストレージエンジン動かしてみましたRyuji Tamagawa
 
Mongo dbを知ろう devlove関西
Mongo dbを知ろう   devlove関西Mongo dbを知ろう   devlove関西
Mongo dbを知ろう devlove関西Ryuji Tamagawa
 
Seleniumをもっと知るための本の話
Seleniumをもっと知るための本の話Seleniumをもっと知るための本の話
Seleniumをもっと知るための本の話Ryuji Tamagawa
 
データベース勉強会 In 広島 mongodb
データベース勉強会 In 広島  mongodbデータベース勉強会 In 広島  mongodb
データベース勉強会 In 広島 mongodbRyuji Tamagawa
 
Invitation to mongo db @ Rakuten TechTalk
Invitation to mongo db @ Rakuten TechTalkInvitation to mongo db @ Rakuten TechTalk
Invitation to mongo db @ Rakuten TechTalkRyuji Tamagawa
 

More from Ryuji Tamagawa (20)

hbstudy 74 Site Reliability Engineering
hbstudy 74 Site Reliability Engineeringhbstudy 74 Site Reliability Engineering
hbstudy 74 Site Reliability Engineering
 
20161004 データ処理のプラットフォームとしてのpythonとpandas 東京
20161004 データ処理のプラットフォームとしてのpythonとpandas 東京20161004 データ処理のプラットフォームとしてのpythonとpandas 東京
20161004 データ処理のプラットフォームとしてのpythonとpandas 東京
 
20160708 データ処理のプラットフォームとしてのpython 札幌
20160708 データ処理のプラットフォームとしてのpython 札幌20160708 データ処理のプラットフォームとしてのpython 札幌
20160708 データ処理のプラットフォームとしてのpython 札幌
 
20160127三木会 RDB経験者のためのspark
20160127三木会 RDB経験者のためのspark20160127三木会 RDB経験者のためのspark
20160127三木会 RDB経験者のためのspark
 
20151205 Japan.R SparkRとParquet
20151205 Japan.R SparkRとParquet20151205 Japan.R SparkRとParquet
20151205 Japan.R SparkRとParquet
 
Performant data processing with PySpark, SparkR and DataFrame API
Performant data processing with PySpark, SparkR and DataFrame APIPerformant data processing with PySpark, SparkR and DataFrame API
Performant data processing with PySpark, SparkR and DataFrame API
 
Apache Sparkの紹介
Apache Sparkの紹介Apache Sparkの紹介
Apache Sparkの紹介
 
足を地に着け落ち着いて考える
足を地に着け落ち着いて考える足を地に着け落ち着いて考える
足を地に着け落ち着いて考える
 
ヘルシープログラマ・翻訳と実践
ヘルシープログラマ・翻訳と実践ヘルシープログラマ・翻訳と実践
ヘルシープログラマ・翻訳と実践
 
Google Big Query
Google Big QueryGoogle Big Query
Google Big Query
 
BigQueryの課金、節約しませんか
BigQueryの課金、節約しませんかBigQueryの課金、節約しませんか
BigQueryの課金、節約しませんか
 
You might be paying too much for BigQuery
You might be paying too much for BigQueryYou might be paying too much for BigQuery
You might be paying too much for BigQuery
 
Google BigQueryについて 紹介と推測
Google BigQueryについて 紹介と推測Google BigQueryについて 紹介と推測
Google BigQueryについて 紹介と推測
 
lessons learned from talking at rakuten technology conference
lessons learned from talking at rakuten technology conferencelessons learned from talking at rakuten technology conference
lessons learned from talking at rakuten technology conference
 
丸の内MongoDB勉強会#20LT 2.8のストレージエンジン動かしてみました
丸の内MongoDB勉強会#20LT 2.8のストレージエンジン動かしてみました丸の内MongoDB勉強会#20LT 2.8のストレージエンジン動かしてみました
丸の内MongoDB勉強会#20LT 2.8のストレージエンジン動かしてみました
 
Mongo dbを知ろう devlove関西
Mongo dbを知ろう   devlove関西Mongo dbを知ろう   devlove関西
Mongo dbを知ろう devlove関西
 
Seleniumをもっと知るための本の話
Seleniumをもっと知るための本の話Seleniumをもっと知るための本の話
Seleniumをもっと知るための本の話
 
データベース勉強会 In 広島 mongodb
データベース勉強会 In 広島  mongodbデータベース勉強会 In 広島  mongodb
データベース勉強会 In 広島 mongodb
 
Invitation to mongo db @ Rakuten TechTalk
Invitation to mongo db @ Rakuten TechTalkInvitation to mongo db @ Rakuten TechTalk
Invitation to mongo db @ Rakuten TechTalk
 
MongoDB tuning on AWS
MongoDB tuning on AWSMongoDB tuning on AWS
MongoDB tuning on AWS
 

Recently uploaded

5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdfWave PLM
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfkalichargn70th171
 
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerHow To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerThousandEyes
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...kellynguyen01
 
Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsArshad QA
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...Health
 
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...MyIntelliSource, Inc.
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsAlberto González Trastoy
 
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Steffen Staab
 
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...panagenda
 
Hand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxHand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxbodapatigopi8531
 
How To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsHow To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsAndolasoft Inc
 
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female serviceCALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female serviceanilsa9823
 
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AISyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AIABDERRAOUF MEHENNI
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providermohitmore19
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...MyIntelliSource, Inc.
 
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...OnePlan Solutions
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comFatema Valibhai
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsJhone kinadey
 

Recently uploaded (20)

5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
 
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerHow To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
 
Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview Questions
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
 
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
 
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
 
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
 
Hand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxHand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptx
 
How To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsHow To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.js
 
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female serviceCALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
 
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AISyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service provider
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
 
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS LiveVip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
 
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.com
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial Goals
 

20170210 sapporotechbar7

  • 1. PyData & Apache Spark 2017 / 2 / 10 Sapporo TechBar #7 @
  • 2. ▸ facebook : Ryuji Tamagawa ▸ Twitter : tamagawa_ryuji ▸ FB techbar ▸ FB ▸ Twitter
  • 3.
  • 4. 5
  • 5.
  • 8. 1 / 5 : PyData
  • 9. 1 / 5 : PyData PyData.org
  • 10. 1 / 5 : PyData PyData Anaconda Python Blaze NumPy and pandas interface to Big Data'. dask Bokeh Canopy Python IPython matplotlib PyData nose numba JIT NumPy PyData Scipy PyData Statsmodels SymPy pandas NumPy SciPy scikit-image scikit-learn PyData 

  • 12. 2 / 5 : pandas pandas ▸ NumPy SciPy 
 ▸ DataFrame ▸
  • 13. 2 / 5 : pandas pandas 
 Wes McKinney
  • 14. 2 / 5 : pandas DataFrame
  • 15. 2 / 5 : pandas
  • 16. 2 / 5 : pandas ▸ 
 Python ▸ ▸ PyData pandas 

  • 18. 3 /5 : Jupyter Notebook IPython Notebook ▸ Jupyter Notebook ▸ Julia Python R ▸ JupyterCon
  • 19. 3 /5 : Jupyter Notebook
  • 20. 3 /5 : Jupyter Notebook
  • 21. 3 /5 : Jupyter Notebook pandas / matplotlib
  • 22. 3 /5 : Jupyter Notebook Interactive Widget
  • 23. 3 /5 : Jupyter Notebook ▸ Learning Jupyter
  • 25. 4 / 5 : Apache Spark Hadoop ▸ MapReduce Spark ▸ 2010 Hadoop = MapReduce + HDFS ▸ Hadoop OS HDFS Hive e.t.c. HBaseMapReduce YARN Impala e.t.c in- memory SQL engine Spark Spark Streaming, MLlib, GraphX, Spark SQL) Hadoop HDFS S3 
 YARN Mesos 
 /
  • 26. 4 / 5 : Apache Spark Apache Spark PyData pandas Apache Spark pandas JVM Python × dask I/O Scala Java Python R
 JVM Python
  • 27. 4 / 5 : Apache Spark Spark ▸ ▸ ▸ 1 PC 
 Hadoop / MapReduce
  • 28. 4 / 5 : Apache Spark DataFrame
  • 29. 4 / 5 : Apache Spark ▸ ▸ SSD ▸ Spark Parquet ▸ Performance comparison of different file formats and storage engines in the Hadoop ecosystem ▸ Parquet Python
  • 30. 4 / 5 : Apache Spark Apache Spark ▸ ▸ Parquet ▸ ▸
  • 32. Machine Learning ▸ ▸ scikit-learn ▸ Spark MLlib / ML ▸ ▸ TensorFlow ▸ Python
  • 33. 2017 and the future
  • 34. 5/5 : 2017 and the future PyData ▸ ▸ Spark - pandas ▸ pandas → Spark …
  • 35. 5/5 : 2017 and the future Wes blog ▸ pandas Apache Arrow ▸ Blog ▸ PyData Blog 
 Wes OK ▸ 2017 : pandas, Arrow, Feather, Parquet, Spark, Ibis
 http://qiita.com/tamagawa-ryuji/items/deb3f63ed4c7c8065e81
  • 36. 5/5 : 2017 and the future High speed Apache Parquet for Python ▸ Parquet ▸ Spark ▸ Python ▸ Fastparquet ▸ pyarrow
  • 37. 5/5 : 2017 and the future : apache arrow ▸ apache arrow ▸ PyData / OSS ▸ /