2. CONFIDENTIAL © 2019
Why I love Spark
(and all it does for IBM
products)
Jean Georges “JG” Perrin
February 11th 2019
v100
3. CONFIDENTIAL © 2019
JGP • Jean Georges Perrin
• @jgperrin
• Chapel Hill, NC
• I ♥ SW since 1983
• #Knowledge =
𝑓 ( ∑ (#SmallData, #BigData), #DataScience)
& #Software
• #IBMChampion x11 • #KeepLearning
• @ http://jgp.net
13. CONFIDENTIAL © 2019
An analytics operating system?
Hardware Hardware
OS OS
Distributed OS
Analytics OS
Apps
16. CONFIDENTIAL © 2019
There are two kinds of data
scientists:
1) Those who can extrapolate
from incomplete data.
-The Internet
17. CONFIDENTIAL © 2019
What kind of applications?
Unified API
Data Science • Data Engineering
• InfoSphere Information Analyzer
• Db2 Event Store
• Watson Knowledge Catalog
• Watson Data Studio
• DataStage Flow Designer
• …
• Watson Knowledge Catalog
• Cloud Private for Data
• …
• SparkBench
18. CONFIDENTIAL © 2018
Data Engineer
Develop, build, test, and operationalize datastores and large-scale processing systems.
DataOps is the new DevOps.
• Match architecture with business needs.
• Develop processes for data modeling, mining, and pipelines.
• Improve data reliability and quality.
Data Scientist
Clean, massage, and organize data.
Perform statistics and analysis to develop insights, build models, and search for innovative correlations.
• Prepare data for predictive models.
• Explore data to find hidden gems and patterns.
• Tell stories to key stakeholders.
Adapted from: https://www.datacamp.com/community/blog/data-scientist-vs-data-engineer
19. CONFIDENTIAL © 2018
Adapted from: https://www.datacamp.com/community/blog/data-scientist-vs-data-engineer
Data Engineer
Data Scientist
SQL
20. CONFIDENTIAL © 2019
Difference between machine learning and AI:
If it is written in Python,
it’s probably machine learning
If it is written in PowerPoint,
it’s probably AI
-Curt Simon Harlinghausen
21. CONFIDENTIAL © 2019
IBM’s communities and CODAIT
• IBM’s investment is not limited to products
• CODAIT (formerly Spark Technology
Center)
• IBM Communities
22. CONFIDENTIAL © 2019
Key takeaways
• IBM contributed to building a new kind of Operating System.
• IBM builds its new generation of data products on this
Operating System.
• Share the love.
• Use Java.
23. CONFIDENTIAL © 2019
Going even further
Spark in Action (MEAP)
by Jean Georges Perrin (@jgperrin)
published by Manning
http://jgp.net/sia
Two free books: sprkact-8D74, sprkact-2C72
40% off: ctwthink19
24. CONFIDENTIAL © 2019
Links
• Apache Spark
• http://spark.apache.org
• Spark in Action, 2e
• http://jgp.net/sia
• IBM Products
• https://dataplatform.cloud.ibm.com/docs/content/catalog/overview-wkc.html
• https://www.ibm.com/support/knowledgecenter/en/SSZJPZ_11.7.0/com.ibm.swg.im.iis.ds.fd.doc/topics/t_config_spark.html
• https://www.ibm.com/support/knowledgecenter/en/SSZJPZ_11.7.0/com.ibm.swg.im.iis.ia.administer.doc/topics/t_spark_job.html
• https://www.ibm.com/products/db2-event-store
• https://www.ibm.com/analytics/cloud-private-for-data
• https://developer.ibm.com/open/projects/spark-bench/
• https://research.spec.org/fileadmin/user_upload/documents/wg_bd/BD-20150401-spark_benchmark-v1.3-spec.pdf
• IBM Center for Open-Source Data & AI Technologies (Spark Technology Center)
• https://developer.ibm.com/code/open/centers/codait/about/