}  8:15 arrive, network, register for tutorial and camp
}  8:50-10:50 Tutorial: Introduction to R for Machine Learning
}  11:00 Camp Kickoff
}  Sponsors: ACM SIGKDD, PayPal, UCSC
}  11:25 Keynote: Spark for Data Science, Big & Small
}  12:25 Propose Sessions
◦  Ask for a “show of hands for interest” → Room Size
}  1:15 Lunch, post Session Matrix
}  2:00 Session 1 (50 min for session, 10 min break)
}  5:00 Session 4
}  6:00 Session Summary
◦  8:50 – 10:50am by
–  Joseph Rickert (Program Manager, Microsoft)
–  Robert Horton (Data Scientist, Microsoft)
◦  Rapid introduction to the R language, in enough depth to build machine learning models
–  randomForest, kernlab, caret
◦  Exploratory analysis, visualization, clustering, classification
◦  How to find R help and additional resources
◦  Big data capabilities of Microsoft’s RRE distribution of R
Morning Tutorial Starts Now
An ACM SF Bay Area Professional Chapter Event
Saturday, October 24, 2015
SFbayACM.org/event/silicon-valley-data-science-camp-2015
WiFi: conference Password: (none)
Twitter Tag #DSCAMP
Association for Computing Machinery (ACM)
◦  Principal technical, educational, scientific society for
computing professionals world-wide
–  Chapter representing SF Bay Area since 1957
◦  Membership/volunteer led, local dues only $20/yr
◦  Members get discounts with publishers, conferences
◦  Produces monthly free meetings
–  3rd Wed on General Computing topics
–  4th Mon on Data Science
◦  Details at www.SFbayACM.org
–  Suggest, Volunteer, Donate: humphrey@SFBayACM.org
}  10 Year Anniversary of Data Science SIG
}  Monday night, November 30 at eBay, San Jose
◦  Online Controlled Experiments: Lessons from Running A/B/n Tests for 12 Years
◦  Ronny Kohavi, Distinguished Engineer & General Manager, Analysis & Experimentation, Microsoft
}  Scala Professional Development Seminar
◦  Date: Sat, Nov 7, 8am-5pm
◦  Location: PayPal Town Hall (here)
◦  Speaker: Cay Horstmann, Computer Science, San Jose State University
◦  Author of “Scala for the Impatient”
◦  Interactive crash course in this language
◦  Bring your laptop (w/ Scala pre-loaded)
◦  Presentation / lab format
Q) What is Scala?
A) Object Oriented Meets Functional
http://www.scala-lang.org/
}  How many have been to an un-conference?
}  Goals and context of the un-conference
◦  Informal
◦  Share enthusiasm, curiosity, knowledge, questions
◦  Participate, make it happen!
◦  Share responsibility (i.e. leave session room after 50 min)
◦  Encourage session note takers to blog & share at end
◦  http://www.campsite.org/list/733
◦  Respect others – questions & brainstorms are “safe”
◦  Have FUN!
◦  Greg Makowski – DS SIG & Conference Chair
◦  Bill Bruns – SF Bay ACM Chair
◦  Stephen McInerney – DS SIG
◦  Steve Lazarus – web registration
–  Seeking replacement before retirement
◦  Greg Weinstein – general
◦  Liana Ye – volunteers, food, registration
◦  Liz Fraley – ACM Treasurer
Pictured: Bill, Liana, Greg W, Liz, Steve, Greg M, Stephen
}  SIGKDD: ACM SIG on Knowledge Discovery and Data Mining.
◦  Home of data miners, data scientists, and analytics professionals
}  KDD: the premier conference of the field
◦  Research Track, Industry/Government Track, Industry Practice Expo, Tutorials, Workshops, Invited Talks, Panels, KDD Cups
Expect 2,000 – 2,500 attendees
KDD Cup competition has been going since 2009
}  General Chairs: Balaji Krishnapuram (IBM), Mohak Shah (Bosch, USA)
}  Program Committee Chairs: Alex Smola (CMU), Charu Aggarwal (IBM)
}  Industry Chairs: Rajeev Rastogi (Amazon), Dou Shen (Baidu)
}  Associate GC: Shipeng Yu
}  Web Chairs: David Hazel, Derek Young
}  Social Network Chair: Ron Bekkerman
}  Proceedings Chair: Romer Rosales
}  Tutorials Chairs: Hanghang Tong, Vishy Vishwanathan
}  Panels Chair: Andrei Broder
}  Workshops Chairs: Quoc Le, Zhi-Hua Zhou
}  KDD Cup Co-chair: Shou-De Lin
}  Media & Publicity Chairs: Gabor Melli, Ankur Teredesai
}  Treasurer: Ying Li
}  Local Arrangements Chairs: Joaquin Quinonero Candela, Olivier Chapelle
}  Student Travel Awards Chair: Sofus Macskassy
2505 Augustine Drive, Santa Clara, CA 95054 

(near Freeway 101 off Great American Parkway)
http://www.ucsc-extension.edu/
◦  UCSC Extension offers professional technology courses for software, hardware, IT and Web professionals. Over 100 courses are available for enrollment each quarter.
◦  The certificate program in “Database and Data Analytics” is the fastest growing certificate in UCSC Extension. Courses cover big data, data science and database applications.


Annual Sponsor
Thank PayPal for use of the location
Soren Archibald
www.KDnuggets.com
A primary hub for data mining
Co-marketing sponsor
Gregory Piatetsky-Shapiro
STRONG FOUNDATION STRONG MOMENTUM
169 Million Active Customer Accounts
$8 Billion Revenue
4 Billion Payment Transactions
+19 Million Active Customer Accounts Gained in 2014
+17% Total Revenue Growth YoY
+24% Payment Transactions Growth YoY
$235 Billion Total Payment Volume
+25% Total Payment Volume Growth YoY
© 2014 PayPal Inc. All rights reserved. Confidential and proprietary.
KEY ENABLER OF OUR BUSINESS
SUPPORTS THE PAYPAL BRAND PROMISE
MAKES PAYPAL UNIQUE
Invest in Growth & Innovation
Improve Experience & Increase Revenue Simultaneously
Lowest Loss Rates
Secure
Customer Champion
Simple
Onboard Underserved Merchants
New Markets, Multiple Funding Types
Enroll Users Easily
Ongoing Innovation
Strong Foundation
Strong Front Door
11.5 MILLION PAYMENTS processed daily by PayPal
Next-level encryption on every PayPal transaction
PayPal never shares financial information with merchants
PayPal always verifies a person’s identity for payments
24/7 data analytics combined with human oversight to accurately and quickly spot suspicious activity
Constant innovation to advance our machine learning/data mining techniques
Seller and buyer protection offered for eligible transactions
Security & Fraud Services
Consistently ranked among the top in consumer trust & security
Consumers Trust PayPal to Help Protect Their Information
[Chart: % of consumers who trust these companies to protect their financial data and private information such as passwords or birthday. Panels: Financial Information, Consumer Privacy.]
Source: Javelin Strategy & Research: Gang of Five: Apple, Google, Amazon, Facebook, and PayPal-eBay: Threat of the Mobile Wallet Disruptors, 2013.
Industry Engagement
Founding member of the FIDO alliance
PayPal chairs the DMARC initiative to reduce phishing attacks against all Internet users
PayPal has been doing tokenization for 15+ years, securely storing customers’ financial information in the cloud.
}  Joseph Bradley is a Spark Committer working on MLlib at Databricks
}  Ph.D. in Machine Learning from Carnegie Mellon University in 2013
}  Spark allows fast, iterative analysis on laptop & cluster
}  Spark DataFrames allow data manipulation through an API inspired by R & Python Pandas
}  ML Pipelines facilitate ML workflows and model tuning
}  SparkR provides an API for R users to work with distributed data
}  Initial PMML support to export models to other tools
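The DataFrame and ML Pipelines points above can be illustrated with a small PySpark sketch. It is not taken from the keynote. It assumes a local Spark 2.x-or-later install with the `pyspark` package (the `SparkSession` entry point postdates this 2015 talk), and the function name `fit_pipeline_sketch`, the column names, and the toy rows are all made up for illustration.

```python
def fit_pipeline_sketch():
    """Filter a toy DataFrame, then fit a two-stage ML Pipeline on it.

    Imports live inside the function so the module still loads on a
    machine without Spark; every name and value below is illustrative.
    """
    from pyspark.ml import Pipeline
    from pyspark.ml.classification import LogisticRegression
    from pyspark.ml.feature import VectorAssembler
    from pyspark.sql import SparkSession

    # Local mode here; pointing master() at a cluster URL is the only
    # change needed to run the same code on a cluster.
    spark = (SparkSession.builder
             .master("local[2]")
             .appName("dscamp-sketch")
             .getOrCreate())
    try:
        # Hypothetical toy data: two numeric features and a binary label.
        df = spark.createDataFrame(
            [(0.0, 1.1, 0.0), (2.0, 1.0, 1.0), (0.1, 1.3, 0.0),
             (1.9, 0.8, 1.0), (-5.0, 0.0, 0.0)],
            ["x1", "x2", "label"],
        )

        # DataFrame API: column expressions in the style of R / pandas.
        clean = df.filter(df.x1 > -1.0)

        # ML Pipeline: feature assembly and a classifier chained as stages.
        assembler = VectorAssembler(inputCols=["x1", "x2"],
                                    outputCol="features")
        lr = LogisticRegression(featuresCol="features",
                                labelCol="label", maxIter=10)
        model = Pipeline(stages=[assembler, lr]).fit(clean)

        # transform() appends prediction columns to the input rows.
        return (model.transform(clean)
                .select("label", "prediction")
                .collect())
    finally:
        spark.stop()
```

Calling `fit_pipeline_sketch()` returns one row per record that survives the filter. Model tuning and PMML export are left out to keep the sketch short.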
Keynote Starts Now
Floor plan:
◦  Town Square A: main auditorium, largest sessions, summary session
◦  Town Square C: coffee, food, sponsors
◦  Bathrooms
◦  Entrance: registration, join ACM
◦  Courtyard: eat lunch
◦  Other rooms: Fireside A, Fireside B, Fireside C, Fireside D, Powwow, Talk Soup
◦  Stairs
}  Write a topic on a sheet of paper
◦  Facilitator’s name
}  60 seconds per suggestion!
◦  Ask for people to show hands for interest, count
◦  Ask for a time keeper (50 minutes for a session)
◦  Ask for a blogger, note taker or person to report
◦  http://www.campsite.org/list/733
}  Based on the level of interest, pick a session location and one of the 4 time slots
}  Pick what to attend per session:
◦  2:00, 3:00, 4:00, 5:00
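The steps above (count hands, then pick a room and one of four time slots) can be sketched in a few lines of Python. This is one possible greedy rule, not the organizers' actual procedure. The names `Proposal`, `session_slots`, and `build_session_matrix` are hypothetical, and the rule assumes the caller lists rooms from largest to smallest.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Proposal:
    topic: str
    facilitator: str
    interest: int  # show-of-hands count


def session_slots(first="14:00", count=4, session_min=50, break_min=10):
    """Start times for `count` sessions, each followed by a break."""
    start = datetime.strptime(first, "%H:%M")
    step = timedelta(minutes=session_min + break_min)
    return [(start + i * step).strftime("%H:%M") for i in range(count)]


def build_session_matrix(proposals, rooms, slots):
    """Greedy assignment (an assumed rule, for illustration only).

    `rooms` must be ordered largest first. Proposals are ranked by
    interest; the most popular ones fill the largest room across all
    time slots before the next room is used, so the biggest draws
    never run at the same time. Proposals beyond the available
    room/slot pairs are left unscheduled.
    """
    ranked = sorted(proposals, key=lambda p: p.interest, reverse=True)
    matrix = {}
    for i, proposal in enumerate(ranked[: len(rooms) * len(slots)]):
        room = rooms[i // len(slots)]
        slot = slots[i % len(slots)]
        matrix[(slot, room)] = proposal
    return matrix
```

With the agenda's 50-minute sessions and 10-minute breaks, `session_slots()` yields the four start times 14:00 through 17:00. Passing those slots and a room list such as `["Town Square A", "Fireside A"]` to `build_session_matrix` returns a dictionary keyed by `(slot, room)`.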
Session Proposals Start Now
Concurrent Sessions 1-3 for the Camp
Concurrent Sessions 4-6 for the Camp

Spark for Data Science Camp

  • 1.
      • 8:15 Arrive, network, register for tutorial and camp
      • 8:50-10:50 Tutorial: Introduction to R for Machine Learning
      • 11:00 Camp Kickoff
      • Sponsors: ACM SIGKDD, PayPal, UCSC
      • 11:25 Keynote: Spark for Data Science, Big & Small
      • 12:25 Propose Sessions
        Ask for a “show of hands for interest” → Room Size
      • 1:15 Lunch, post Session Matrix
      • 2:00 Session 1 (50 min for session, 10 min break)
      • 5:00 Session 4
      • 6:00 Session Summary
  • 2.
      ◦ 8:50 – 10:50am by
        – Joseph Rickert (Program Manager, Microsoft)
        – Robert Horton (Data Scientist, Microsoft)
      ◦ Rapid introduction to the R language, in enough depth to build machine learning models
        – randomForest, kernlab, caret
      ◦ Exploratory analysis, visualization, clustering, classification
      ◦ How to find R help and additional resources
      ◦ Big data capabilities of Microsoft’s RRE distribution of R
  • 3.
  • 5. An ACM SF Bay Area Professional Chapter Event
      Saturday, October 24, 2015
      SFbayACM.org/event/silicon-valley-data-science-camp-2015
      WiFi: conference, Password: (none)
      Twitter Tag #DSCAMP
  • 6. Association for Computing Machinery (ACM)
      ◦ Principal technical, educational, and scientific society for computing professionals worldwide
        – Chapter representing SF Bay Area since 1957
      ◦ Membership/volunteer led, local dues only $20/yr
      ◦ Members get discounts with publishers, conferences
      ◦ Produces free monthly meetings
        – 3rd Wed on General Computing topics
        – 4th Mon on Data Science
      ◦ Details at www.SFbayACM.org
        – Suggest, Volunteer, Donate: humphrey@SFBayACM.org
  • 7.
      • 10 Year Anniversary of Data Science SIG
      • Monday night, November 30 at eBay, San Jose
        ◦ Online Controlled Experiments: Lessons from Running A/B/n Tests for 12 Years
        ◦ Ronny Kohavi, Distinguished Engineer & General Manager, Analysis & Experimentation, Microsoft
  • 8.
      • Scala Professional Development Seminar
        ◦ Date: Sat, Nov 7, 8am-5pm
        ◦ Location: PayPal Town Hall (here)
        ◦ Speaker: Cay Horstmann, Computer Science, San Jose State University
        ◦ Author of “Scala for the Impatient”
        ◦ Interactive crash course in this language
        ◦ Bring your laptop (w/ Scala pre-loaded)
        ◦ Presentation / lab format
      Q) What is Scala?
      A) Object Oriented Meets Functional
      http://www.scala-lang.org/
  • 9.
      • How many have been to an un-conference?
      • Goals and context of the un-conference
        ◦ Informal
        ◦ Share enthusiasm, curiosity, knowledge, questions
        ◦ Participate, make it happen!
        ◦ Share responsibility (i.e. leave session room after 50 min)
        ◦ Encourage session note takers to blog & share at end
        ◦ http://www.campsite.org/list/733
        ◦ Respect others: questions & brainstorms are “safe”
        ◦ Have FUN!
  • 10.
      ◦ Greg Makowski – DS SIG & Conference Chair
      ◦ Bill Bruns – SF Bay ACM Chair
      ◦ Stephen McInerney – DS SIG
      ◦ Steve Lazarus – web registration
      ◦ Seeking replacement before retirement
      ◦ Greg Weinstein – general
      ◦ Liana Ye – volunteers, food, registration
      ◦ Liz Fraley – ACM Treasurer
      Bill, Liana, Greg W, Liz, Steve, Greg M, Stephen
  • 12.
      • SIGKDD: ACM SIG on Knowledge Discovery and Data Mining
        ◦ Home of data miners, data scientists, and analytics professionals
      • KDD: the premier conference of the field
        ◦ Research Track, Industry/Government Track, Industry Practice Expo, Tutorials, Workshops, Invited Talks, Panels, KDD Cups
  • 13. Expect 2,000 – 2,500 attendees
      KDD Cup competition has been going since 2009
  • 14.
      • General Chairs: Balaji Krishnapuram (IBM), Mohak Shah (Bosch, USA)
      • Program Committee Chairs: Alex Smola (CMU), Charu Aggarwal (IBM)
      • Industry Chairs: Rajeev Rastogi (Amazon), Dou Shen (Baidu)
  • 15.
      ◦ Shipeng Yu – Associate GC
      ◦ David Hazel, Derek Young – Web Chairs
      ◦ Ron Bekkerman – Social Network Chair
      ◦ Romer Rosales – Proceedings Chair
      ◦ Hanghang Tong, Vishy Vishwanathan – Tutorials Chairs
      ◦ Andrei Broder – Panels Chair
      ◦ Quoc Le, Zhi-Hua Zhou – Workshops Chairs
      ◦ Shou-De Lin – KDD Cup co-chair
      ◦ Gabor Melli, Ankur Teredesai – Media & Publicity Chairs
      ◦ Ying Li – Treasurer
      ◦ Joaquin Quinonero Candela, Olivier Chapelle – Local Arrangements Chairs
      ◦ Sofus Macskassy – Student Travel Awards Chair
  • 16. 2505 Augustine Drive, Santa Clara, CA 95054 (near Freeway 101 off Great America Parkway)
      http://www.ucsc-extension.edu/
      ◦ UCSC Extension offers professional technology courses for software, hardware, IT and Web professionals. Over 100 courses are available for enrollment each quarter.
      ◦ The certificate program in “Database and Data Analytics” is the fastest growing certificate in UCSC Extension. Courses cover big data, data science and database applications.
      Annual Sponsor
  • 17.
      ◦ Thank PayPal for use of the location
        – Soren Archibald
      ◦ www.KDnuggets.com: a primary hub for data mining
        – Co-marketing sponsor
        – Gregory Piatetsky-Shapiro
  • 18.
      • STRONG FOUNDATION
        ◦ 169 Million Active Customer Accounts
        ◦ $8 Billion Revenue
        ◦ 4 Billion Payment Transactions
        ◦ $235 Billion Total Payment Volume
      • STRONG MOMENTUM
        ◦ +19 Million Active Customer Accounts Gained in 2014
        ◦ +17% Total Revenue Growth YoY
        ◦ +24% Payment Transactions Growth YoY
        ◦ +25% Total Payment Volume Growth YoY
  • 19. © 2014 PayPal Inc. All rights reserved. Confidential and proprietary.
      KEY ENABLER OF OUR BUSINESS / SUPPORTS THE PAYPAL BRAND PROMISE / MAKES PAYPAL UNIQUE
      ◦ Invest in Growth & Innovation
      ◦ Improve Experience & Increase Revenue Simultaneously
      ◦ Lowest Loss Rates
      ◦ Secure
      ◦ Customer Champion
      ◦ Simple
      ◦ Onboard Underserved Merchants
      ◦ New Markets, Multiple Funding Types
      ◦ Enroll Users Easily
      ◦ Ongoing Innovation
  • 20.
      Strong Foundation / Strong Front Door
      ◦ 11.5 MILLION PAYMENTS processed daily by PayPal
      ◦ Next-level encryption on every PayPal transaction
      ◦ PayPal never shares financial information with merchants
      ◦ PayPal always verifies a person’s identity for payments
      ◦ 24/7 data analytics combined with human oversight to accurately and quickly spot suspicious activity
      ◦ Constant innovation to advance our machine learning/data mining techniques
      ◦ Seller and buyer protection offered for eligible transactions
      Security & Fraud Services
      ◦ Consistently ranked among the top in consumer trust & security
      Consumers Trust PayPal to Help Protect Their Information
      ◦ [Chart: % of consumers who trust these companies to protect their financial data and private information such as passwords or birthday; series: Financial Information, Consumer Privacy]
      ◦ Source: Javelin Strategy & Research: Gang of Five: Apple, Google, Amazon, Facebook, and PayPal-eBay: Threat of the Mobile Wallet Disruptors, 2013.
      Industry Engagement
      ◦ Founding member of the FIDO Alliance
      ◦ PayPal chairs the DMARC initiative to reduce phishing attacks against all Internet users
      ◦ PayPal has been doing tokenization for 15+ years, securely storing customers’ financial information in the cloud.
  • 21.
      • Joseph Bradley is a Spark Committer working on MLlib at Databricks
      • Ph.D. in Machine Learning from Carnegie Mellon University in 2013
      • Spark allows fast, iterative analysis on laptop & cluster
      • Spark DataFrames allow data manipulation through an API inspired by R & Python pandas
      • ML Pipelines facilitate ML workflows and model tuning
      • SparkR provides an API for R users to work with distributed data
      • Initial PMML support to export models to other tools
  • 25. Venue map labels:
      ◦ Town Square A – Main auditorium, largest sessions, summary session
      ◦ Town Square C – Coffee, food, sponsors
      ◦ Bathrooms
      ◦ Entrance – Registration, join ACM
      ◦ Courtyard – Eat lunch
      ◦ Fireside A, Fireside B, Fireside C, Fireside D
      ◦ Powwow
      ◦ Talk Soup
      ◦ Stairs
  • 26. WiFi: conference Password: (none) www.SFbayACM.org
  • 27.
      • Write a topic on a sheet of paper
        ◦ Facilitator’s name
      • 60 seconds per suggestion!
        ◦ Ask for people to show hands for interest, count
        ◦ Ask for a time keeper (50 minutes for a session)
        ◦ Ask for a blogger, note taker or person to report
        ◦ http://www.campsite.org/list/733
      • Based on the level of interest, pick a session location and one of the 4 time frames
      • Pick what to attend per session:
        ◦ 2:00, 3:00, 4:00, 5:00