Submit Search
Upload
20161215 python pandas-spark四方山話
•
7 likes
•
1,234 views
Ryuji Tamagawa
Follow
2016/12/15 インサイトテクノロジーさんの三木会でお話しした内容のスライドです。PythonとかPandasとかSparkとか。
Read less
Read more
Technology
Report
Share
Report
Share
1 of 26
Download now
Download to read offline
Recommended
20161004 データ処理のプラットフォームとしてのpythonとpandas 東京
20161004 データ処理のプラットフォームとしてのpythonとpandas 東京
Ryuji Tamagawa
20160708 データ処理のプラットフォームとしてのpython 札幌
20160708 データ処理のプラットフォームとしてのpython 札幌
Ryuji Tamagawa
PySparkの勘所(20170630 sapporo db analytics showcase)
PySparkの勘所(20170630 sapporo db analytics showcase)
Ryuji Tamagawa
20171012 found IT #9 PySparkの勘所
20171012 found IT #9 PySparkの勘所
Ryuji Tamagawa
20170210 sapporotechbar7
20170210 sapporotechbar7
Ryuji Tamagawa
20170927 pydata tokyo データサイエンスな皆様に送る分散処理の基礎の基礎、そしてPySparkの勘所
20170927 pydata tokyo データサイエンスな皆様に送る分散処理の基礎の基礎、そしてPySparkの勘所
Ryuji Tamagawa
Big Data Ecosystem after Spark
Big Data Ecosystem after Spark
bigdata trunk
Cpu analysis with flamegraphs
Cpu analysis with flamegraphs
Vinicius M Grippa
Recommended
20161004 データ処理のプラットフォームとしてのpythonとpandas 東京
20161004 データ処理のプラットフォームとしてのpythonとpandas 東京
Ryuji Tamagawa
20160708 データ処理のプラットフォームとしてのpython 札幌
20160708 データ処理のプラットフォームとしてのpython 札幌
Ryuji Tamagawa
PySparkの勘所(20170630 sapporo db analytics showcase)
PySparkの勘所(20170630 sapporo db analytics showcase)
Ryuji Tamagawa
20171012 found IT #9 PySparkの勘所
20171012 found IT #9 PySparkの勘所
Ryuji Tamagawa
20170210 sapporotechbar7
20170210 sapporotechbar7
Ryuji Tamagawa
20170927 pydata tokyo データサイエンスな皆様に送る分散処理の基礎の基礎、そしてPySparkの勘所
20170927 pydata tokyo データサイエンスな皆様に送る分散処理の基礎の基礎、そしてPySparkの勘所
Ryuji Tamagawa
Big Data Ecosystem after Spark
Big Data Ecosystem after Spark
bigdata trunk
Cpu analysis with flamegraphs
Cpu analysis with flamegraphs
Vinicius M Grippa
Beginner Apache Spark Presentation
Beginner Apache Spark Presentation
Nidhin Pattaniyil
StackStormを1年間データ基盤で使ってみてぶつかったトラブルとその解決策の共有
StackStormを1年間データ基盤で使ってみてぶつかったトラブルとその解決策の共有
Yoshiyasu SAEKI
Brug af Solr i IMPACT
Brug af Solr i IMPACT
IMPACT
Growing a Data Pipeline for Analytics
Growing a Data Pipeline for Analytics
Roberto Agostino Vitillo
Sparkler Presentation for Spark Summit East 2017
Sparkler Presentation for Spark Summit East 2017
Karanjeet Singh
Денис Головняк - Продвинутый поиск с помощью Search API
Денис Головняк - Продвинутый поиск с помощью Search API
LEDC 2016
Final_show
Final_show
Nitay Alon
ストリーム処理を支えるキューイングシステムの選び方
ストリーム処理を支えるキューイングシステムの選び方
Yoshiyasu SAEKI
Cassandra + Hadoop @ApacheCon
Cassandra + Hadoop @ApacheCon
Jeremy Hanna
Introduing spark
Introduing spark
Taotao Li
使用 Elasticsearch 及 Kibana 進行巨量資料搜尋及視覺化-曾書庭
使用 Elasticsearch 及 Kibana 進行巨量資料搜尋及視覺化-曾書庭
台灣資料科學年會
The Evolution of Hadoop at Spotify - Through Failures and Pain
The Evolution of Hadoop at Spotify - Through Failures and Pain
Rafał Wojdyła
MongoDB & Hadoop, Sittin' in a Tree
MongoDB & Hadoop, Sittin' in a Tree
MongoDB
ニュースパスのクローラーアーキテクチャとマイクロサービス
ニュースパスのクローラーアーキテクチャとマイクロサービス
mosa siru
Debugging PySpark: Spark Summit East talk by Holden Karau
Debugging PySpark: Spark Summit East talk by Holden Karau
Spark Summit
Fulfilling Apache Arrow's Promises: Pandas on JVM memory without a copy
Fulfilling Apache Arrow's Promises: Pandas on JVM memory without a copy
Uwe Korn
Apache Spark Super Happy Funtimes - CHUG 2016
Apache Spark Super Happy Funtimes - CHUG 2016
Holden Karau
Go, memcached, microservices
Go, memcached, microservices
mosa siru
Microsoft Azure + R
Microsoft Azure + R
Dmitry Petukhov
Fluentd - Flexible, Stable, Scalable
Fluentd - Flexible, Stable, Scalable
Shu Ting Tseng
Contributing to pandas (Korean)
Contributing to pandas (Korean)
Younggun Kim
data science toolkit 101: set up Python, Spark, & Jupyter
data science toolkit 101: set up Python, Spark, & Jupyter
Raj Singh
More Related Content
What's hot
Beginner Apache Spark Presentation
Beginner Apache Spark Presentation
Nidhin Pattaniyil
StackStormを1年間データ基盤で使ってみてぶつかったトラブルとその解決策の共有
StackStormを1年間データ基盤で使ってみてぶつかったトラブルとその解決策の共有
Yoshiyasu SAEKI
Brug af Solr i IMPACT
Brug af Solr i IMPACT
IMPACT
Growing a Data Pipeline for Analytics
Growing a Data Pipeline for Analytics
Roberto Agostino Vitillo
Sparkler Presentation for Spark Summit East 2017
Sparkler Presentation for Spark Summit East 2017
Karanjeet Singh
Денис Головняк - Продвинутый поиск с помощью Search API
Денис Головняк - Продвинутый поиск с помощью Search API
LEDC 2016
Final_show
Final_show
Nitay Alon
ストリーム処理を支えるキューイングシステムの選び方
ストリーム処理を支えるキューイングシステムの選び方
Yoshiyasu SAEKI
Cassandra + Hadoop @ApacheCon
Cassandra + Hadoop @ApacheCon
Jeremy Hanna
Introduing spark
Introduing spark
Taotao Li
使用 Elasticsearch 及 Kibana 進行巨量資料搜尋及視覺化-曾書庭
使用 Elasticsearch 及 Kibana 進行巨量資料搜尋及視覺化-曾書庭
台灣資料科學年會
The Evolution of Hadoop at Spotify - Through Failures and Pain
The Evolution of Hadoop at Spotify - Through Failures and Pain
Rafał Wojdyła
MongoDB & Hadoop, Sittin' in a Tree
MongoDB & Hadoop, Sittin' in a Tree
MongoDB
ニュースパスのクローラーアーキテクチャとマイクロサービス
ニュースパスのクローラーアーキテクチャとマイクロサービス
mosa siru
Debugging PySpark: Spark Summit East talk by Holden Karau
Debugging PySpark: Spark Summit East talk by Holden Karau
Spark Summit
Fulfilling Apache Arrow's Promises: Pandas on JVM memory without a copy
Fulfilling Apache Arrow's Promises: Pandas on JVM memory without a copy
Uwe Korn
Apache Spark Super Happy Funtimes - CHUG 2016
Apache Spark Super Happy Funtimes - CHUG 2016
Holden Karau
Go, memcached, microservices
Go, memcached, microservices
mosa siru
Microsoft Azure + R
Microsoft Azure + R
Dmitry Petukhov
Fluentd - Flexible, Stable, Scalable
Fluentd - Flexible, Stable, Scalable
Shu Ting Tseng
What's hot
(20)
Beginner Apache Spark Presentation
Beginner Apache Spark Presentation
StackStormを1年間データ基盤で使ってみてぶつかったトラブルとその解決策の共有
StackStormを1年間データ基盤で使ってみてぶつかったトラブルとその解決策の共有
Brug af Solr i IMPACT
Brug af Solr i IMPACT
Growing a Data Pipeline for Analytics
Growing a Data Pipeline for Analytics
Sparkler Presentation for Spark Summit East 2017
Sparkler Presentation for Spark Summit East 2017
Денис Головняк - Продвинутый поиск с помощью Search API
Денис Головняк - Продвинутый поиск с помощью Search API
Final_show
Final_show
ストリーム処理を支えるキューイングシステムの選び方
ストリーム処理を支えるキューイングシステムの選び方
Cassandra + Hadoop @ApacheCon
Cassandra + Hadoop @ApacheCon
Introduing spark
Introduing spark
使用 Elasticsearch 及 Kibana 進行巨量資料搜尋及視覺化-曾書庭
使用 Elasticsearch 及 Kibana 進行巨量資料搜尋及視覺化-曾書庭
The Evolution of Hadoop at Spotify - Through Failures and Pain
The Evolution of Hadoop at Spotify - Through Failures and Pain
MongoDB & Hadoop, Sittin' in a Tree
MongoDB & Hadoop, Sittin' in a Tree
ニュースパスのクローラーアーキテクチャとマイクロサービス
ニュースパスのクローラーアーキテクチャとマイクロサービス
Debugging PySpark: Spark Summit East talk by Holden Karau
Debugging PySpark: Spark Summit East talk by Holden Karau
Fulfilling Apache Arrow's Promises: Pandas on JVM memory without a copy
Fulfilling Apache Arrow's Promises: Pandas on JVM memory without a copy
Apache Spark Super Happy Funtimes - CHUG 2016
Apache Spark Super Happy Funtimes - CHUG 2016
Go, memcached, microservices
Go, memcached, microservices
Microsoft Azure + R
Microsoft Azure + R
Fluentd - Flexible, Stable, Scalable
Fluentd - Flexible, Stable, Scalable
Similar to 20161215 python pandas-spark四方山話
Contributing to pandas (Korean)
Contributing to pandas (Korean)
Younggun Kim
data science toolkit 101: set up Python, Spark, & Jupyter
data science toolkit 101: set up Python, Spark, & Jupyter
Raj Singh
PyData London 2017 – Efficient and portable DataFrame storage with Apache Par...
PyData London 2017 – Efficient and portable DataFrame storage with Apache Par...
Uwe Korn
Accelerating Big Data beyond the JVM - Fosdem 2018
Accelerating Big Data beyond the JVM - Fosdem 2018
Holden Karau
Apache Arrow and Pandas UDF on Apache Spark
Apache Arrow and Pandas UDF on Apache Spark
Takuya UESHIN
Wisely Chen Spark Talk At Spark Gathering in Taiwan
Wisely Chen Spark Talk At Spark Gathering in Taiwan
Wisely chen
Apache Spark Data Source V2 with Wenchen Fan and Gengliang Wang
Apache Spark Data Source V2 with Wenchen Fan and Gengliang Wang
Databricks
Overview of Apache Spark 2.3: What’s New? with Sameer Agarwal
Overview of Apache Spark 2.3: What’s New? with Sameer Agarwal
Databricks
Spark7
Spark7
poovarasu maniandan
Fluentd: Unified Logging Layer at CWT2014
Fluentd: Unified Logging Layer at CWT2014
N Masahiro
Spark Streamingによるリアルタイムユーザ属性推定
Spark Streamingによるリアルタイムユーザ属性推定
Yoshiyasu SAEKI
Docker and Fluentd
Docker and Fluentd
N Masahiro
Hands on with Apache Spark
Hands on with Apache Spark
Dan Lynn
Jump Start on Apache® Spark™ 2.x with Databricks
Jump Start on Apache® Spark™ 2.x with Databricks
Databricks
Jumpstart on Apache Spark 2.2 on Databricks
Jumpstart on Apache Spark 2.2 on Databricks
Databricks
Big data beyond the JVM - DDTX 2018
Big data beyond the JVM - DDTX 2018
Holden Karau
Data Science at Scale: Using Apache Spark for Data Science at Bitly
Data Science at Scale: Using Apache Spark for Data Science at Bitly
Sarah Guido
Penny coventry fiddler-spsbe23
Penny coventry fiddler-spsbe23
BIWUG
ApacheCon Europe Big Data 2016 – Parquet in practice & detail
ApacheCon Europe Big Data 2016 – Parquet in practice & detail
Uwe Korn
OSINT tools for security auditing with python
OSINT tools for security auditing with python
Jose Manuel Ortega Candel
Similar to 20161215 python pandas-spark四方山話
(20)
Contributing to pandas (Korean)
Contributing to pandas (Korean)
data science toolkit 101: set up Python, Spark, & Jupyter
data science toolkit 101: set up Python, Spark, & Jupyter
PyData London 2017 – Efficient and portable DataFrame storage with Apache Par...
PyData London 2017 – Efficient and portable DataFrame storage with Apache Par...
Accelerating Big Data beyond the JVM - Fosdem 2018
Accelerating Big Data beyond the JVM - Fosdem 2018
Apache Arrow and Pandas UDF on Apache Spark
Apache Arrow and Pandas UDF on Apache Spark
Wisely Chen Spark Talk At Spark Gathering in Taiwan
Wisely Chen Spark Talk At Spark Gathering in Taiwan
Apache Spark Data Source V2 with Wenchen Fan and Gengliang Wang
Apache Spark Data Source V2 with Wenchen Fan and Gengliang Wang
Overview of Apache Spark 2.3: What’s New? with Sameer Agarwal
Overview of Apache Spark 2.3: What’s New? with Sameer Agarwal
Spark7
Spark7
Fluentd: Unified Logging Layer at CWT2014
Fluentd: Unified Logging Layer at CWT2014
Spark Streamingによるリアルタイムユーザ属性推定
Spark Streamingによるリアルタイムユーザ属性推定
Docker and Fluentd
Docker and Fluentd
Hands on with Apache Spark
Hands on with Apache Spark
Jump Start on Apache® Spark™ 2.x with Databricks
Jump Start on Apache® Spark™ 2.x with Databricks
Jumpstart on Apache Spark 2.2 on Databricks
Jumpstart on Apache Spark 2.2 on Databricks
Big data beyond the JVM - DDTX 2018
Big data beyond the JVM - DDTX 2018
Data Science at Scale: Using Apache Spark for Data Science at Bitly
Data Science at Scale: Using Apache Spark for Data Science at Bitly
Penny coventry fiddler-spsbe23
Penny coventry fiddler-spsbe23
ApacheCon Europe Big Data 2016 – Parquet in practice & detail
ApacheCon Europe Big Data 2016 – Parquet in practice & detail
OSINT tools for security auditing with python
OSINT tools for security auditing with python
More from Ryuji Tamagawa
hbstudy 74 Site Reliability Engineering
hbstudy 74 Site Reliability Engineering
Ryuji Tamagawa
20160127三木会 RDB経験者のためのspark
20160127三木会 RDB経験者のためのspark
Ryuji Tamagawa
20151205 Japan.R SparkRとParquet
20151205 Japan.R SparkRとParquet
Ryuji Tamagawa
Performant data processing with PySpark, SparkR and DataFrame API
Performant data processing with PySpark, SparkR and DataFrame API
Ryuji Tamagawa
Apache Sparkの紹介
Apache Sparkの紹介
Ryuji Tamagawa
足を地に着け落ち着いて考える
足を地に着け落ち着いて考える
Ryuji Tamagawa
ヘルシープログラマ・翻訳と実践
ヘルシープログラマ・翻訳と実践
Ryuji Tamagawa
Google Big Query
Google Big Query
Ryuji Tamagawa
BigQueryの課金、節約しませんか
BigQueryの課金、節約しませんか
Ryuji Tamagawa
You might be paying too much for BigQuery
You might be paying too much for BigQuery
Ryuji Tamagawa
Google BigQueryについて 紹介と推測
Google BigQueryについて 紹介と推測
Ryuji Tamagawa
lessons learned from talking at rakuten technology conference
lessons learned from talking at rakuten technology conference
Ryuji Tamagawa
丸の内MongoDB勉強会#20LT 2.8のストレージエンジン動かしてみました
丸の内MongoDB勉強会#20LT 2.8のストレージエンジン動かしてみました
Ryuji Tamagawa
Mongo dbを知ろう devlove関西
Mongo dbを知ろう devlove関西
Ryuji Tamagawa
Seleniumをもっと知るための本の話
Seleniumをもっと知るための本の話
Ryuji Tamagawa
データベース勉強会 In 広島 mongodb
データベース勉強会 In 広島 mongodb
Ryuji Tamagawa
Invitation to mongo db @ Rakuten TechTalk
Invitation to mongo db @ Rakuten TechTalk
Ryuji Tamagawa
MongoDB tuning on AWS
MongoDB tuning on AWS
Ryuji Tamagawa
初めてのMongo db
初めてのMongo db
Ryuji Tamagawa
RDB経験者に送るMongoDBの勘所(db tech showcase tokyo 2013)
RDB経験者に送るMongoDBの勘所(db tech showcase tokyo 2013)
Ryuji Tamagawa
More from Ryuji Tamagawa
(20)
hbstudy 74 Site Reliability Engineering
hbstudy 74 Site Reliability Engineering
20160127三木会 RDB経験者のためのspark
20160127三木会 RDB経験者のためのspark
20151205 Japan.R SparkRとParquet
20151205 Japan.R SparkRとParquet
Performant data processing with PySpark, SparkR and DataFrame API
Performant data processing with PySpark, SparkR and DataFrame API
Apache Sparkの紹介
Apache Sparkの紹介
足を地に着け落ち着いて考える
足を地に着け落ち着いて考える
ヘルシープログラマ・翻訳と実践
ヘルシープログラマ・翻訳と実践
Google Big Query
Google Big Query
BigQueryの課金、節約しませんか
BigQueryの課金、節約しませんか
You might be paying too much for BigQuery
You might be paying too much for BigQuery
Google BigQueryについて 紹介と推測
Google BigQueryについて 紹介と推測
lessons learned from talking at rakuten technology conference
lessons learned from talking at rakuten technology conference
丸の内MongoDB勉強会#20LT 2.8のストレージエンジン動かしてみました
丸の内MongoDB勉強会#20LT 2.8のストレージエンジン動かしてみました
Mongo dbを知ろう devlove関西
Mongo dbを知ろう devlove関西
Seleniumをもっと知るための本の話
Seleniumをもっと知るための本の話
データベース勉強会 In 広島 mongodb
データベース勉強会 In 広島 mongodb
Invitation to mongo db @ Rakuten TechTalk
Invitation to mongo db @ Rakuten TechTalk
MongoDB tuning on AWS
MongoDB tuning on AWS
初めてのMongo db
初めてのMongo db
RDB経験者に送るMongoDBの勘所(db tech showcase tokyo 2013)
RDB経験者に送るMongoDBの勘所(db tech showcase tokyo 2013)
Recently uploaded
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
apidays
Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024
SynarionITSolutions
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
The Digital Insurer
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
Safe Software
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Principled Technologies
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
The Digital Insurer
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
Remote DBA Services
Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024
The Digital Insurer
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
Boston Institute of Analytics
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
apidays
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
ThousandEyes
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
wesley chun
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
The Digital Insurer
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
sudhanshuwaghmare1
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
Anna Loughnan Colquhoun
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
Khushali Kathiriya
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
MIND CTI
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
The Digital Insurer
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
DianaGray10
Recently uploaded
(20)
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
20161215 python pandas-spark四方山話
1.
Python, Pandas, Spark
2.0 Sky
2.
3.
• • Python 2000 (**) •
db tech showcase MongoDB • • FB: Ryuji Tamagawa • Twitter : tamagawa_ryuji
4.
5.
2017
6.
• Python Spark •
7.
• • Python /
Pandas • Spark 2.0
8.
Part 1 :
9.
• • • csv
10.
Python Pandas Python Jupyter Notebook Jenkins Spark
2.0
11.
• Spark API
RDD ~1.3 DataFrame / DataSet 1.4~ • DataFrame API RDD API Python Spark
12.
DataFrame • RDB / •
R Pandas Spark Spark R / Pandas Spark +
13.
Part 2 :
14.
CSV zip RDB Parquet Excel CSV Feather Spark Pandas / Spark
15.
• • CPU • • Pandas
read_csv zip CSV Pandas
16.
2 • CSV CPU Pandas
zip CSV CPU … • Parquet ! •
17.
: Parquet I/O • • Spark
Parquet • Python Parquet
18.
HDFS / S3 Parquet
Parquet
19.
SSD Parquet Parquet
20.
Parquet No No Yes HDD
21.
• • I/O Pandas •
Spark • DataFrame Pandas → Spark Spark → Pandas Pandas → Spark • Apache Arrow
22.
CPU ~2010 2010~ SSD CPU
23.
Apache Spark 2.0 •
1.x • 2.0 1.x • DataFrame API Python • databricks http://go.databricks.com/mastering-apache-spark-2.0 •
24.
Spark 2.0 • CPU •
CPU • SQL DataFrame • + SSD • CSV zip Pandas read_csv
25.
Python + Spark •
Python serialize • DataFrame API UDF UDF Scala/Java • http://www.slideshare.net/dragan10/performant-data-processing-with-pyspark-sparkr- and-dataframe-api Executor JVM DataFrame, Cached Python lambda items: items[0] == ‘abc’ transfer DataFrame, result transfer Driver
Download now