The document provides an agenda for a data science camp event. It covers registration from 8:15-8:50am, a tutorial on machine learning with R from 8:50-10:50am, the camp kickoff at 11:00am, a keynote on Spark for data science from 11:25am-12:25pm, proposing and scheduling sessions from 12:25-1:15pm, lunch at 1:15pm, and four hourly session slots starting at 2:00pm, followed by a summary session at 6:00pm. The event is sponsored by ACM SIGKDD, PayPal, and UCSC and is held at PayPal's Town Hall.
1. } 8:15 arrive, network, register for tutorial and camp
} 8:50-10:50 Tutorial: Introduction to R for Machine
Learning
} 11:00 Camp Kickoff
} Sponsors: ACM SIGKDD, PayPal, UCSC
} 11:25 Keynote: Spark for Data Science, Big & Small
} 12:25 Propose Sessions
Ask for a “show of hands for interest” → Room Size
} 1:15 Lunch, post Session Matrix
} 2:00 Session 1
: (50 min for session, 10 min break)
} 5:00 Session 4
} 6:00 Session Summary
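The afternoon timing implied by the agenda (Session 1 at 2:00, 50 minutes per session plus a 10 minute break, Session 4 at 5:00) can be checked with a short sketch. The function name and its parameters are illustrative assumptions, not part of the camp materials.

```python
from datetime import datetime, timedelta

def session_slots(first_start="14:00", count=4, session_min=50, break_min=10):
    """Return (label, start, end) tuples for back-to-back session slots."""
    start = datetime.strptime(first_start, "%H:%M")
    slots = []
    for i in range(count):
        end = start + timedelta(minutes=session_min)
        slots.append((f"Session {i + 1}", start.strftime("%H:%M"), end.strftime("%H:%M")))
        # The next session begins once the break is over.
        start = end + timedelta(minutes=break_min)
    return slots

for label, begin, end in session_slots():
    print(label, begin, "-", end)
```

With these defaults the fourth session ends at 17:50, so its break runs up to the 6:00 Session Summary.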
2. ◦ 8:50 – 10:50am by
– Joseph Rickert (Program Manager, Microsoft)
– Robert Horton (Data Scientist, Microsoft)
◦ Rapid introduction to the R language – in
depth enough to build machine learning
models
– randomForest, kernlab, caret
◦ Exploratory analysis, visualize, clustering,
classification
◦ How to find R help and additional resources
◦ Big data capabilities of Microsoft’s RRE
distribution of R
5. An ACM SF Bay Area Professional Chapter Event
Saturday, October 24, 2015
SFbayACM.org/event/silicon-valley-data-science-camp-2015
WiFi: conference Password: (none)
Twitter Tag #DSCAMP
6. Association for Computing Machinery (ACM)
◦ Principal technical, educational, scientific society for
computing professionals world-wide
– Chapter representing SF Bay Area since 1957
◦ Membership/volunteer led, local dues only $20/yr
◦ Members get discounts with publishers, conferences
◦ Produces monthly free meetings
– 3rd Wed on General Computing topics
– 4th Mon on Data Science
◦ Details at www.SFbayACM.org
– Suggest, Volunteer, Donate: humphrey@SFBayACM.org
7. } 10 Year Anniversary of Data Science SIG
} Monday night, November 30 at eBay, San Jose
◦ Online Controlled Experiments: Lessons from Running
A/B/n Tests for 12 Years
◦ Ronny Kohavi, Distinguished Engineer & General
Manager, Analysis & Experimentation, Microsoft
8. } Scala Professional Development Seminar
◦ Date: Sat, Nov 7, 8am-5pm
◦ Location: PayPal Town Hall (here)
◦ Speaker: Cay Horstmann, Computer Science,
San Jose State University
◦ Author of “Scala for the Impatient”
◦ Interactive crash course into this language
◦ Bring your laptop (w/ Scala pre-loaded)
◦ Presentation / lab format
Q) What is Scala?
A) Object Oriented Meets Functional
http://www.scala-lang.org/
9. } How many have been to an un-conference?
} Goals and context of the un-conference
◦ Informal
◦ Share enthusiasm, curiosity, knowledge, questions
◦ Participate, make it happen!
◦ Share responsibility (i.e. leave session room after 50 min)
◦ Encourage session note takers to blog & share at end
◦ http://www.campsite.org/list/733
◦ Respect others – questions & brainstorms are “safe”
◦ Have FUN!
10. ◦ Greg Makowski – DS SIG & Conference Chair
◦ Bill Bruns – SF Bay ACM Chair
◦ Stephen McInerney – DS SIG
◦ Steve Lazarus – web registration
◦ Seeking replacement before retirement
◦ Greg Weinstein – general
◦ Liana Ye – volunteers, food, registration
◦ Liz Fraley – ACM Treasurer
Photo labels: Bill, Liana, Greg W, Liz, Steve, Greg M, Stephen
12. } SIGKDD: ACM SIG on Knowledge Discovery
and Data Mining.
◦ Home of data miners, data scientists, and analytics
professionals
} KDD: the premier conference of the field
◦ Research Track, Industry/Government Track, Industry
Practice Expo, Tutorials, Workshops, Invited Talks,
Panels, KDD Cups
13. Expect 2,000 – 2,500 attendees
KDD Cup competition has been going since 2009
14. } General Chairs
◦ Balaji Krishnapuram (IBM)
◦ Mohak Shah (Bosch, USA)
} Program Committee Chairs
◦ Alex Smola (CMU)
◦ Charu Aggarwal (IBM)
} Industry Chairs
◦ Rajeev Rastogi (Amazon)
◦ Dou Shen (Baidu)
15. ◦ Associate GC: Shipeng Yu
◦ Web Chairs: David Hazel, Derek Young
◦ Social Network Chair: Ron Bekkerman
◦ Proceedings Chair: Romer Rosales
◦ Tutorials Chairs: Hanghang Tong, Vishy Vishwanathan
◦ Panels Chair: Andrei Broder
◦ Workshops Chairs: Quoc Le, Zhi-Hua Zhou
◦ KDD Cup Co-chair: Shou-De Lin
◦ Media & Publicity Chairs: Gabor Melli, Ankur Teredesai
◦ Treasurer: Ying Li
◦ Local Arrangements Chairs: Joaquin Quinonero Candela, Olivier Chapelle
◦ Student Travel Awards Chair: Sofus Macskassy
16. 2505 Augustine Drive, Santa Clara, CA 95054
(near Freeway 101 off Great American Parkway)
http://www.ucsc-extension.edu/
◦ UCSC Extension offers professional technology
courses for software, hardware, IT and Web
professionals. Over 100 courses are available for
enrollment each quarter.
◦ The certificate program in “Database and Data
Analytics” is the fastest growing certificate in UCSC
Extension. Courses cover big data, data science and
database applications.
Annual Sponsor
17. Thank PayPal for use of the location
Soren Archibald
www.KDnuggets.com
A primary hub for data mining
Co-marketing sponsor
Gregory Piatetsky-Shapiro
18. STRONG FOUNDATION, STRONG MOMENTUM
◦ Active Customer Accounts: 169 Million (+19 Million gained in 2014)
◦ Revenue: $8 Billion (+17% total revenue growth YoY)
◦ Payment Transactions: 4 Billion (+24% growth YoY)
◦ Total Payment Volume: $235 Billion (+25% growth YoY)
21. } Joseph Bradley is a Spark Committer
working on MLlib at Databricks
} Ph.D. in Machine Learning from Carnegie
Mellon University in 2013
} Spark allows fast, iterative analysis on laptop & cluster
} Spark DataFrames allow data manipulation through an API
inspired by R & Python pandas
} ML Pipelines facilitate ML workflows and model tuning
} SparkR provides an API for R users to work with
distributed data
} Initial PMML support to export models to other tools
25. Venue map (room labels):
◦ Town Square A – main auditorium; largest sessions, summary session
◦ Town Square C – coffee, food, sponsors
◦ Entrance – registration, join ACM
◦ Courtyard – eat lunch
◦ Fireside A, Fireside B, Fireside C, Fireside D
◦ Powwow, Talk Soup
◦ Also marked: bathrooms, stairs
27. } Write a topic on a sheet of paper
◦ Facilitator's name
} 60 seconds per suggestion!
◦ Ask for people to show hands for interest, count
◦ Ask for a time keeper (50 minutes for a session)
◦ Ask for a blogger, note taker or person to report
◦ http://www.campsite.org/list/733
} Based on the level of interest, pick a session
location and one of the 4 time frames
} Pick what to attend per session:
◦ 2:00 3:00 4:00 5:00
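One way to turn the show-of-hands counts into a Session Matrix is a simple greedy assignment, sketched below. This is only an illustration, not the organizers' actual procedure: the slides give no room capacities, so the largest-first room ordering after Town Square A is an assumption, and the function name and sample topics are hypothetical.

```python
from itertools import product

SLOTS = ["2:00", "3:00", "4:00", "5:00"]  # the four time frames on the slide
# Room names come from the venue map; the ordering after Town Square A
# (the main auditorium) is an assumption, since no capacities are given.
ROOMS = ["Town Square A", "Fireside A", "Fireside B", "Fireside C",
         "Fireside D", "Powwow", "Talk Soup"]

def build_session_matrix(proposals, rooms=ROOMS, slots=SLOTS):
    """Greedy assignment: the most popular topics get the largest rooms.

    proposals: list of (topic, hands_raised) pairs.
    Returns a dict mapping (slot, room) -> topic.
    """
    ranked = sorted(proposals, key=lambda p: p[1], reverse=True)
    # Fill every time slot of the first room before moving to the next room.
    cells = [(slot, room) for room, slot in product(rooms, slots)]
    if len(ranked) > len(cells):
        raise ValueError("more proposals than room/time cells")
    return {cell: topic for cell, (topic, _) in zip(cells, ranked)}

# Hypothetical proposals with their show-of-hands counts.
proposals = [("Deep learning", 40), ("Spark MLlib", 65), ("R tips", 12),
             ("A/B testing", 30), ("Kaggle", 25)]
matrix = build_session_matrix(proposals)
for (slot, room), topic in matrix.items():
    print(slot, room, topic)
```

Filling the main auditorium across all four time frames first also spreads the most popular topics over different hours, so attendees are less likely to have to choose between them.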