SlideShare a Scribd company logo
1 of 67
Download to read offline
Robert	Hryniewicz
Data	Advocate
Twitter:	@RobH8z
Email:				rhryniewicz@hortonworks.com
Apache	Spark	Crash	Course	
Hadoop	Summit	Tokyo	2016
2 ©	Hortonworks	Inc.	2011	– 2016.	All	Rights	Reserved
Agenda
• Background
• Spark	Overview
• Zeppelin	Overview
• Components	of	HDP
• Lab	~	45min
3 ©	Hortonworks	Inc.	2011	– 2016.	All	Rights	Reserved
Data	Sources
à Internet	of	Anything	(IoAT)
– Wind	Turbines,	Oil	Rigs,	Cars
– Weather	Stations,	Smart	Grids
– RFID	Tags,	Beacons,	Wearables
à User	Generated	Content	(Web	&	Mobile)
– Twitter,	Facebook,	Snapchat,	YouTube
– Clickstream,	Ads,	User	Engagement
– Payments:	Paypal,	Venmo
44ZB	in	2020
4 ©	Hortonworks	Inc.	2011	– 2016.	All	Rights	Reserved
The	“Big	Data”	Problem
à A	single	machine	cannot	process	or	even	store	all	the	data!
Problem
Solution
à Distribute	data	over	large	clusters
Difficulty
à How	to	split	work	across	machines?
à Moving	data	over	network	is	expensive
à Must	consider	data	&	network	locality
à How	to	deal	with	failures?
à How	to	deal	with	slow	nodes?
5 ©	Hortonworks	Inc.	2011	– 2016.	All	Rights	Reserved
Spark	Background
6 ©	Hortonworks	Inc.	2011	– 2016.	All	Rights	Reserved
History	of	Hadoop &	Spark
7 ©	Hortonworks	Inc.	2011	– 2016.	All	Rights	Reserved
Access	Rates
At	least	an	order	of	magnitude	difference	between	memory	and	hard	drive	/	network	speed
FAST slower slowest
8 ©	Hortonworks	Inc.	2011	– 2016.	All	Rights	Reserved
What	Is	Apache	Spark?
à Apache	open	source	project	
originally	developed	at	AMPLab
(University	of	California	Berkeley)
à Unified	data	processing	engine	that	
operates	across	varied	data	
workloads	and	platforms
9 ©	Hortonworks	Inc.	2011	– 2016.	All	Rights	Reserved
Why	Apache	Spark?
à Elegant	Developer	APIs
– Single	environment	for	data	munging,	data	wrangling,	and	Machine	Learning	(ML)
à In-memory	computation	model	– Fast!
– Effective	for	iterative	computations	and	ML
à Machine	Learning
– Implementation	of	distributed	ML	algorithms
– Pipeline	API	(Spark	ML)
10 ©	Hortonworks	Inc.	2011	– 2016.	All	Rights	Reserved
Spark	Ecosystem
Spark	Core
Spark	SQL Spark	Streaming Spark	MLlib GraphX
11 ©	Hortonworks	Inc.	2011	– 2016.	All	Rights	Reserved
Apache	Spark	Basics
12 ©	Hortonworks	Inc.	2011	– 2016.	All	Rights	Reserved
Spark	Context
à Main	entry	point	for	Spark	functionality
à Represents	a	connection	to	a	Spark	cluster
à Represented	as		sc in	your	code	(in	Zeppelin)
What	is	it?
13 ©	Hortonworks	Inc.	2011	– 2016.	All	Rights	Reserved
Spark	SQL
14 ©	Hortonworks	Inc.	2011	– 2016.	All	Rights	Reserved
Spark	SQL	Overview
à Spark	module	for	structured	data	processing	(e.g.	DB	tables,	JSON	files,	CSV)
à Three	ways	to	manipulate	data:
– DataFrames API
– SQL	queries
– Datasets	API
15 ©	Hortonworks	Inc.	2011	– 2016.	All	Rights	Reserved
DataFrames
à Distributed collection of	data organized into	named	
columns
à Conceptually	equivalent	to	a	table	in	relational	DB	or	
a	data	frame	in	R/Python
à API	available	in	Scala,	Java,	Python,	and	R
Col1 Col2 … … ColN
DataFrame
Column
Row
Data	is	described	as	a	DataFrame
with	rows,	columns,	and	a	schema
16 ©	Hortonworks	Inc.	2011	– 2016.	All	Rights	Reserved
DataFrames
CSVAvro
HIVE
Spark	SQL
Text
Col1 Col2 … … ColN
DataFrame
Column
Row
Created	from	Various	Sources
à DataFrames from	HIVE:
– Reading	and	writing	HIVE	tables
à DataFrames from	files:
– Built-in:	JSON,	JDBC,	ORC,	Parquet,	HDFS
– External	plug-in:	CSV,	HBASE,	Avro
JSON
17 ©	Hortonworks	Inc.	2011	– 2016.	All	Rights	Reserved
SQL	Context
à Entry	point	into	all	functionality	in	Spark	SQL
à All	you	need	is	SparkContext
val sqlContext = SQLContext(sc)
SQLContext
à Superset	of	functionality	provided	by	basic	SQLContext
– Read	data	from	Hive	tables
– Access	to	Hive	Functions	à UDFs
HiveContext
val hc = HiveContext(sc)
Use	when	your	
data	resides	in	
Hive
18 ©	Hortonworks	Inc.	2011	– 2016.	All	Rights	Reserved
Spark	SQL	Examples
19 ©	Hortonworks	Inc.	2011	– 2016.	All	Rights	Reserved
Setting	up	DataFrame API
val flightsDF = … ç Create from CSV, JSON, Hive etc.
Example:
val path = "examples/flights.json"
val flightsDF = sqlContext.read.json(path)
Create	a	DataFrame
20 ©	Hortonworks	Inc.	2011	– 2016.	All	Rights	Reserved
Setting	up	SQL	API
Register	a	Temporary	Table
flightsDF.registerTempTable("flights")
21 ©	Hortonworks	Inc.	2011	– 2016.	All	Rights	Reserved
Two	API	Examples:	DataFrame and	SQL	APIs
flightsDF.select("Origin", "Dest", "DepDelay”)
.filter($"DepDelay" > 15).show(5)
Results
+------+----+--------+
|Origin|Dest|DepDelay|
+------+----+--------+
| IAD| TPA| 19|
| IND| BWI| 34|
| IND| JAX| 25|
| IND| LAS| 67|
| IND| MCO| 94|
+------+----+--------+
SELECT Origin, Dest, DepDelay
FROM flights
WHERE DepDelay > 15 LIMIT 5
SQL	API
DataFrame API
22 ©	Hortonworks	Inc.	2011	– 2016.	All	Rights	Reserved
Spark	Streaming
23 ©	Hortonworks	Inc.	2011	– 2016.	All	Rights	Reserved
What	is	Stream	Processing?
Batch	Processing
• Ability	to	process	and	analyze	data	at-rest	(stored	data)
• Request-based,	bulk	evaluation	and	short-lived	processing
• Enabler	for	Retrospective,	Reactive	and	On-demand	Analytics
Stream	Processing
• Ability	to	ingest,	process	and	analyze	data	in-motion	in	real- or	near-real-time
• Event	or	micro-batch	driven,	continuous	evaluation	and	long-lived	processing
• Enabler	for	real-time	Prospective,	Proactive	and	Predictive	Analytics	 for	Next	Best	
Action
Stream	Processing	 +		Batch	Processing	 =			All	Data	Analytics
real-time (now) historical (past)
24 ©	Hortonworks	Inc.	2011	– 2016.	All	Rights	Reserved
Next Generation Analytics
Iterative & Exploratory
Data is the structure
Traditional Analytics
Structured & Repeatable
Structure built to store data
24
Modern	Data	Applications	approach	to	Insights
Start with hypothesis
Test against selected data
Data leads the way
Explore all data, identify correlations
Analyze after landing… Analyze in motion…
25 ©	Hortonworks	Inc.	2011	– 2016.	All	Rights	Reserved
Spark	Streaming
à Extension	of	Spark	Core	API
à Stream	processing	of	live	data	streams
– Scalable
– High-throughput
– Fault-tolerant
Overview
ZeroMQ
MQTT
26 ©	Hortonworks	Inc.	2011	– 2016.	All	Rights	Reserved
Spark	Streaming
27 ©	Hortonworks	Inc.	2011	– 2016.	All	Rights	Reserved
Spark	Streaming
Discretized	Streams	(DStreams)
à High-level	abstraction	representing	continuous	stream	of	data
à Internally	represented	as	a	sequence	of	RDDs
à Operation	applied	on	a	DStream translates	to	operations	on	the	underlying	RDDs
28 ©	Hortonworks	Inc.	2011	– 2016.	All	Rights	Reserved
Spark	Streaming
Example:	flatMap operation
29 ©	Hortonworks	Inc.	2011	– 2016.	All	Rights	Reserved
Spark	Streaming
à Apply	transformations	over	a	sliding	window	of	data,	e.g.	rolling	average
Window	Operations
30 ©	Hortonworks	Inc.	2011	– 2016.	All	Rights	Reserved
Spark	MLlib
31 ©	Hortonworks	Inc.	2011	– 2016.	All	Rights	Reserved
Where Can We Use Machine Learning (Data Science)
Healthcare
• Predict	diagnosis
• Prioritize	screenings
• Reduce	re-admittance	rates
Financial	services
• Fraud	Detection/prevention
• Predict	underwriting	risk
• New	account	risk	screens
Public	Sector
• Analyze	public	sentiment
• Optimize	resource	allocation
• Law	enforcement	&	security	
Retail
• Product	recommendation
• Inventory	management
• Price	optimization
Telco/mobile
• Predict	customer	churn
• Predict	equipment	failure
• Customer	behavior	analysis
Oil	&	Gas
• Predictive	maintenance
• Seismic	data	management
• Predict	well	production	levels
32 ©	Hortonworks	Inc.	2011	– 2016.	All	Rights	Reserved
Scatter 2D Data Visualized
scatterData ç DataFrame
+-----+--------+
|label|features|
+-----+--------+
|-12.0| [-4.9]|
| -6.0| [-4.5]|
| -7.2| [-4.1]|
| -5.0| [-3.2]|
| -2.0| [-3.0]|
| -3.1| [-2.1]|
| -4.0| [-1.5]|
| -2.2| [-1.2]|
| -2.0| [-0.7]|
| 1.0| [-0.5]|
| -0.7| [-0.2]|
...
...
...
33 ©	Hortonworks	Inc.	2011	– 2016.	All	Rights	Reserved
Linear Regression Model Training (one feature)
Coefficients:	2.81				Intercept:	3.05
y	=	2.81x	+	3.05
Training
Result
34 ©	Hortonworks	Inc.	2011	– 2016.	All	Rights	Reserved
Linear Regression (two features)
Coefficients: [0.464, 0.464]
Intercept: 0.0563
35 ©	Hortonworks	Inc.	2011	– 2016.	All	Rights	Reserved
Spark API for building ML pipelines
Feature	
transform	
1
Feature	
transform	
2
Combine	
features
Linear
Regression
Input
DataFrame
Input
DataFrame
Output
DataFrame
Pipeline
Pipeline	Model
Train
Predict
Export	Model
36 ©	Hortonworks	Inc.	2011	– 2016.	All	Rights	Reserved
Spark	GraphX
37 ©	Hortonworks	Inc.	2011	– 2016.	All	Rights	Reserved
GraphX
à Page	Rank
à Topic	Modeling	(LDA)
à Community	Detection
Source:	ampcamp.berkeley.edu
38 ©	Hortonworks	Inc.	2011	– 2016.	All	Rights	Reserved
Apache	Zeppelin	&	HDP	Sandbox
39 ©	Hortonworks	Inc.	2011	– 2016.	All	Rights	Reserved
What’s Apache Zeppelin?
Web-based notebook
that enables interactive
data analytics.
You can make beautiful
data-driven, interactive
and collaborative
documents with SQL,
Scala and more
40 ©	Hortonworks	Inc.	2011	– 2016.	All	Rights	Reserved
What is a Note/Notebook?
• A	web	based	GUI	for	small	code	snippets
• Write	code	snippets	in	browser
• Zeppelin	sends	code	to	backend	for	execution
• Zeppelin	gets	data	back	from	backend
• Zeppelin	visualizes	data
• Zeppelin	Note	=	Set	of	(Paragraphs/Cells)
• Other	Features	- Sharing/Collaboration/Reports/Import/Export
41 ©	Hortonworks	Inc.	2011	– 2016.	All	Rights	Reserved
Big	Data	Lifecycle
Collect
ETL	/
Process
Analysis
Report
Data
Product
Business	user
Customer
Data	ScientistData	Engineer
All	in	one	place	in	Zeppelin!
42 ©	Hortonworks	Inc.	2011	– 2016.	All	Rights	Reserved
How	does	Zeppelin	work?
Notebook	
Author
Collaborators/
Report	viewers
Zeppelin
Cluster
Spark	|	Hive	|	HBase
Any	of	30+	back	ends
43 ©	Hortonworks	Inc.	2011	– 2016.	All	Rights	Reserved
HDP	Sandbox
What’s	included	in	the	HDP	Sandbox?
à Zeppelin
à Spark
à YARN	à Resource	Management
à HDFS	à Distributed	Storage	Layer
à And	many	more	components: Hive,	Solr etc. YARN
Scala
Java
Python
R
APIs
Spark Core Engine
Spark
SQL
Spark
Streaming
MLlib GraphX
1 ° ° ° ° ° ° ° ° °
° ° ° ° ° ° ° ° ° °
°
N
HDFS
44 ©	Hortonworks	Inc.	2011	– 2016.	All	Rights	Reserved
Access patterns enabled by YARN
YARN: Data Operating System
1 ° ° ° ° ° ° ° ° °
° ° ° ° ° ° ° ° °
°
°N
HDFS
Hadoop Distributed File System
Interactive Real-TimeBatch
Applications Batch
Needs to happen but, no
timeframe limitations
Interactive
Needs to happen at
Human time
Real-Time
Needs to happen at
Machine Execution time.
45 ©	Hortonworks	Inc.	2011	– 2016.	All	Rights	Reserved
Why	Apache	Spark	on	YARN?
à Resource	management	
– Share	Spark	workloads	with	other	
workloads	(HIVE,	Solr,	etc.)
à Utilizes	existing	HDP	cluster	
infrastructure
à Scheduling	and	queues
Spark	Driver
Client
Spark
Application	Master
YARN	container
Spark	Executor
YARN	container
Task Task
Spark	Executor
YARN	container
Task Task
Spark	Executor
YARN	container
Task Task
46 ©	Hortonworks	Inc.	2011	– 2016.	All	Rights	Reserved
Why HDFS?
Fault Tolerant Distributed Storage
• Divide	files	into	big	blocks	and	distribute	3	copies	randomly across	the	cluster
• Processing	Data	Locality
• Not	Just	storage	but	computation
10110100101
00100111001
11111001010
01110100101
00101100100
10101001100
01010010111
01011101011
11011011010
10110100101
01001010101
01011100100
11010111010
0
Logical File
1
2
3
4
Blocks
1
Cluster
1
1
2
2
2
3
3
34
4
4
47 ©	Hortonworks	Inc.	2011	– 2016.	All	Rights	Reserved
There’s more to HDP
YARN : Data Operating System
DATA ACCESS SECURITY
GOVERNANCE &
INTEGRATION
OPERATIONS
1 ° ° ° ° ° ° ° ° °
° ° ° ° ° ° ° ° ° °
°
N
Data Lifecycle &
Governance
Falcon
Atlas
Administration
Authentication
Authorization
Auditing
Data Protection
Ranger
Knox
Atlas
HDFS	EncryptionData Workflow
Sqoop
Flume
Kafka
NFS
WebHDFS
Provisioning,
Managing, &
Monitoring
Ambari
Cloudbreak
Zookeeper
Scheduling
Oozie
Batch
MapReduce
Script
Pig
Search
Solr
SQL
Hive
NoSQL
HBase
Accumulo
Phoenix
Stream
Storm
In-memory Others
ISV Engines
Tez Tez Slider Slider
DATA MANAGEMENT
Hortonworks	Data	Platform	2.4.x
Deployment	ChoiceLinux	 Windows	 On-Premise	 Cloud
HDFS Hadoop Distributed File System
48 ©	Hortonworks	Inc.	2011	– 2016.	All	Rights	Reserved
Hortonworks	Data	Cloud
49 ©	Hortonworks	Inc.	2011	– 2016.	All	Rights	Reserved
50 ©	Hortonworks	Inc.	2011	– 2016.	All	Rights	Reserved
51 ©	Hortonworks	Inc.	2011	– 2016.	All	Rights	Reserved
Bringing	Multitenancy	to	Apache	Zeppelin
52 ©	Hortonworks	Inc.	2011	– 2016.	All	Rights	Reserved
Introducing	Livy
à Livy	is	the	open	source	REST	interface	for	interacting	with	Apache	Spark	from	anywhere	
à Installed	as	Spark	Ambari Service
Livy Client
HTTP HTTP	(RPC)
Spark	Interactive	Session
SparkContext
Spark	Batch	Session
SparkContext
Livy Server
53 ©	Hortonworks	Inc.	2011	– 2016.	All	Rights	Reserved
Security	Across	Zeppelin-Livy-Spark
Shiro
Ispark	Group	Interpreter
SPNego:	Kerberos Kerberos
Livy	APIs
Spark	on	YARN
Zeppelin
Driver
LDAP
Livy Server
54 ©	Hortonworks	Inc.	2011	– 2016.	All	Rights	Reserved
Reasons	to	Integrate	with	Livy
à Bring	Sessions	to	Apache	Zeppelin
– Isolation
– Session	sharing	
à Enable	efficient	cluster	resource	utilization
– Default	Spark	interpreter	keeps	YARN/Spark	job	running	forever
– Livy	interpreter	recycled	after	60	minutes	of	inactivity	
(controlled	by	livy.server.session.timeout )
à To	Identity	Propagation
– Send	user	identity	from	Zeppelin		>	Livy		>	Spark	on	YARN
55 ©	Hortonworks	Inc.	2011	– 2016.	All	Rights	Reserved
Livy Server
SparkContext	Sharing
Session-2
Session-1
SparkSession-1
SparkContext
SparkSession-2
SparkContext
Client	1
Client	2
Client	3
Session-1
Session-1
Session-2
56 ©	Hortonworks	Inc.	2011	– 2016.	All	Rights	Reserved
Sample	Architecture
57 ©	Hortonworks	Inc.	2011	– 2016.	All	Rights	Reserved
Managed	Dataflow
SOURCES
REGIONAL	
INFRASTRUCTURE
CORE	
INFRASTRUCTURE
58 ©	Hortonworks	Inc.	2011	– 2016.	All	Rights	Reserved
High-Level	Overview
IoT Edge
(single	node)
IoT Edge
(single	node)
IoT Devices
IoT Devices
NiFi Hub Data	Broker
Column	
DB
Data	
Store
Live	Dashboard
Data	Center
(on	prem/cloud)
HDFS/S3 HBase/Cassandra
59 ©	Hortonworks	Inc.	2011	– 2016.	All	Rights	Reserved
What’s	new	in	Spark	2.0
60 ©	Hortonworks	Inc.	2011	– 2016.	All	Rights	Reserved
Spark	2.0
à API	Improvements
– SparkSession (spark)	– new	entry	point	 (Replaces	SQLContext and	HiveContext)
– Unified	DataFrame &	DataSet API	 (DataFrame à alias	for	DataSet[Row])
– Structured	Streaming/Continuous	Application		 (Concept	of	an	infinite	DataFrame)
– Temporary	Table	à Temporary	View
à Performance	Improvements
– Tungsten	Phase	2	- Multi	stage	code	gen
– ORC	&	Parquet	file	improvements
à Machine	Learning	
– ML	pipeline	the	new	API,	MLlib deprecated
– Distributed	R	algorithms	(GLM,	Naïve	Bayes,	K-Means,	Survival	Regression)
à SparkSQL
– More	SQL	support	(new	ANSI	SQL	parser,	subquery	support)
61 ©	Hortonworks	Inc.	2011	– 2016.	All	Rights	Reserved
What’s	the	latest	at	Hortonworks?
à HDP	2.5
– Batch	Processing
à HDF	2.0
– Streaming	Apps
DATA	AT
REST
DATA	IN	
MOTION
ACTIONABLE
INTELLIGENCE
Modern	Data	Applications
62 ©	Hortonworks	Inc.	2011	– 2016.	All	Rights	Reserved
Lab	Preview
63 ©	Hortonworks	Inc.	2011	– 2016.	All	Rights	Reserved
Lab	Setup	Instructions
http://tinyurl.com/hwx-spark-intro
Lab	Options
- Local	Sandbox	(8GB	RAM	memory	required):
- VirtualBox or	Vmware
- Amazon	AWS	Cloud:
- Hortonworks	Data	Cloud	
è Setup	info:	http://hortonworks.github.io/hdp-aws/index.html
http://hortonworks.github.io/hdp-aws/index.html
http://hortonworks.github.io/hdp-aws/index.html
64 ©	Hortonworks	Inc.	2011	– 2016.	All	Rights	Reserved
Hortonworks	Community	Connection
65 ©	Hortonworks	Inc.	2011	– 2016.	All	Rights	Reserved
Community	Engagement
Participate now at: community.hortonworks.com©	Hortonworks	Inc.	2011	– 2015.	All	Rights	Reserved
9,500+
Registered	Users
21,000+
Answers
32,500+
Technical	Assets
One Website!
66 ©	Hortonworks	Inc.	2011	– 2016.	All	Rights	Reserved
Hortonworks	Community	Connection
Read access for everyone, join to participate and be recognized
• Full	Q&A	Platform	(like	StackOverflow)
• Knowledge	Base	Articles
• Code	Samples	and	Repositories
Robert	Hryniewicz
E:	rhryniewicz@hortonworks.com
T:	@RobH8z
Thanks!

More Related Content

What's hot

Log Analytics Optimization
Log Analytics OptimizationLog Analytics Optimization
Log Analytics OptimizationHortonworks
 
Hadoop and Spark – Perfect Together
Hadoop and Spark – Perfect TogetherHadoop and Spark – Perfect Together
Hadoop and Spark – Perfect TogetherHortonworks
 
Network for the Large-scale Hadoop cluster at Yahoo! JAPAN
Network for the Large-scale Hadoop cluster at Yahoo! JAPANNetwork for the Large-scale Hadoop cluster at Yahoo! JAPAN
Network for the Large-scale Hadoop cluster at Yahoo! JAPANDataWorks Summit/Hadoop Summit
 
Hive edw-dataworks summit-eu-april-2017
Hive edw-dataworks summit-eu-april-2017Hive edw-dataworks summit-eu-april-2017
Hive edw-dataworks summit-eu-april-2017alanfgates
 
Automatic Detection, Classification and Authorization of Sensitive Personal D...
Automatic Detection, Classification and Authorization of Sensitive Personal D...Automatic Detection, Classification and Authorization of Sensitive Personal D...
Automatic Detection, Classification and Authorization of Sensitive Personal D...DataWorks Summit/Hadoop Summit
 
Intro to Spark with Zeppelin
Intro to Spark with ZeppelinIntro to Spark with Zeppelin
Intro to Spark with ZeppelinHortonworks
 
Webinar Series Part 5 New Features of HDF 5
Webinar Series Part 5 New Features of HDF 5Webinar Series Part 5 New Features of HDF 5
Webinar Series Part 5 New Features of HDF 5Hortonworks
 
Design Patterns For Real Time Streaming Data Analytics
Design Patterns For Real Time Streaming Data AnalyticsDesign Patterns For Real Time Streaming Data Analytics
Design Patterns For Real Time Streaming Data AnalyticsDataWorks Summit
 
Hadoop & Cloud Storage: Object Store Integration in Production
Hadoop & Cloud Storage: Object Store Integration in ProductionHadoop & Cloud Storage: Object Store Integration in Production
Hadoop & Cloud Storage: Object Store Integration in ProductionDataWorks Summit/Hadoop Summit
 
Intelligently Collecting Data at the Edge - Intro to Apache MiNiFi
Intelligently Collecting Data at the Edge - Intro to Apache MiNiFiIntelligently Collecting Data at the Edge - Intro to Apache MiNiFi
Intelligently Collecting Data at the Edge - Intro to Apache MiNiFiDataWorks Summit
 
Crash Course HS16Melb - Hands on Intro to Spark & Zeppelin
Crash Course HS16Melb - Hands on Intro to Spark & Zeppelin Crash Course HS16Melb - Hands on Intro to Spark & Zeppelin
Crash Course HS16Melb - Hands on Intro to Spark & Zeppelin DataWorks Summit/Hadoop Summit
 
A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks ...
A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks ...A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks ...
A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks ...Hortonworks
 

What's hot (19)

Log Analytics Optimization
Log Analytics OptimizationLog Analytics Optimization
Log Analytics Optimization
 
Scalable OCR with NiFi and Tesseract
Scalable OCR with NiFi and TesseractScalable OCR with NiFi and Tesseract
Scalable OCR with NiFi and Tesseract
 
Row/Column- Level Security in SQL for Apache Spark
Row/Column- Level Security in SQL for Apache SparkRow/Column- Level Security in SQL for Apache Spark
Row/Column- Level Security in SQL for Apache Spark
 
Hadoop and Spark – Perfect Together
Hadoop and Spark – Perfect TogetherHadoop and Spark – Perfect Together
Hadoop and Spark – Perfect Together
 
Network for the Large-scale Hadoop cluster at Yahoo! JAPAN
Network for the Large-scale Hadoop cluster at Yahoo! JAPANNetwork for the Large-scale Hadoop cluster at Yahoo! JAPAN
Network for the Large-scale Hadoop cluster at Yahoo! JAPAN
 
Apache Hadoop Crash Course - HS16SJ
Apache Hadoop Crash Course - HS16SJApache Hadoop Crash Course - HS16SJ
Apache Hadoop Crash Course - HS16SJ
 
Hive edw-dataworks summit-eu-april-2017
Hive edw-dataworks summit-eu-april-2017Hive edw-dataworks summit-eu-april-2017
Hive edw-dataworks summit-eu-april-2017
 
Automatic Detection, Classification and Authorization of Sensitive Personal D...
Automatic Detection, Classification and Authorization of Sensitive Personal D...Automatic Detection, Classification and Authorization of Sensitive Personal D...
Automatic Detection, Classification and Authorization of Sensitive Personal D...
 
Intro to Spark with Zeppelin
Intro to Spark with ZeppelinIntro to Spark with Zeppelin
Intro to Spark with Zeppelin
 
Apache Hadoop Crash Course
Apache Hadoop Crash CourseApache Hadoop Crash Course
Apache Hadoop Crash Course
 
Webinar Series Part 5 New Features of HDF 5
Webinar Series Part 5 New Features of HDF 5Webinar Series Part 5 New Features of HDF 5
Webinar Series Part 5 New Features of HDF 5
 
Intro to Spark & Zeppelin - Crash Course - HS16SJ
Intro to Spark & Zeppelin - Crash Course - HS16SJIntro to Spark & Zeppelin - Crash Course - HS16SJ
Intro to Spark & Zeppelin - Crash Course - HS16SJ
 
Design Patterns For Real Time Streaming Data Analytics
Design Patterns For Real Time Streaming Data AnalyticsDesign Patterns For Real Time Streaming Data Analytics
Design Patterns For Real Time Streaming Data Analytics
 
Apache Hadoop YARN: Past, Present and Future
Apache Hadoop YARN: Past, Present and FutureApache Hadoop YARN: Past, Present and Future
Apache Hadoop YARN: Past, Present and Future
 
Hadoop & Cloud Storage: Object Store Integration in Production
Hadoop & Cloud Storage: Object Store Integration in ProductionHadoop & Cloud Storage: Object Store Integration in Production
Hadoop & Cloud Storage: Object Store Integration in Production
 
Apache deep learning 101
Apache deep learning 101Apache deep learning 101
Apache deep learning 101
 
Intelligently Collecting Data at the Edge - Intro to Apache MiNiFi
Intelligently Collecting Data at the Edge - Intro to Apache MiNiFiIntelligently Collecting Data at the Edge - Intro to Apache MiNiFi
Intelligently Collecting Data at the Edge - Intro to Apache MiNiFi
 
Crash Course HS16Melb - Hands on Intro to Spark & Zeppelin
Crash Course HS16Melb - Hands on Intro to Spark & Zeppelin Crash Course HS16Melb - Hands on Intro to Spark & Zeppelin
Crash Course HS16Melb - Hands on Intro to Spark & Zeppelin
 
A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks ...
A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks ...A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks ...
A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks ...
 

Viewers also liked

Leveraging smart meter data for electric utilities: Comparison of Spark SQL w...
Leveraging smart meter data for electric utilities: Comparison of Spark SQL w...Leveraging smart meter data for electric utilities: Comparison of Spark SQL w...
Leveraging smart meter data for electric utilities: Comparison of Spark SQL w...DataWorks Summit/Hadoop Summit
 
Generating Recommendations at Amazon Scale with Apache Spark and Amazon DSSTNE
Generating Recommendations at Amazon Scale with Apache Spark and Amazon DSSTNEGenerating Recommendations at Amazon Scale with Apache Spark and Amazon DSSTNE
Generating Recommendations at Amazon Scale with Apache Spark and Amazon DSSTNEDataWorks Summit/Hadoop Summit
 
From a single droplet to a full bottle, our journey to Hadoop at Coca-Cola Ea...
From a single droplet to a full bottle, our journey to Hadoop at Coca-Cola Ea...From a single droplet to a full bottle, our journey to Hadoop at Coca-Cola Ea...
From a single droplet to a full bottle, our journey to Hadoop at Coca-Cola Ea...DataWorks Summit/Hadoop Summit
 
Use case and Live demo : Agile data integration from Legacy system to Hadoop ...
Use case and Live demo : Agile data integration from Legacy system to Hadoop ...Use case and Live demo : Agile data integration from Legacy system to Hadoop ...
Use case and Live demo : Agile data integration from Legacy system to Hadoop ...DataWorks Summit/Hadoop Summit
 
Introduction to Hadoop and Spark (before joining the other talk) and An Overv...
Introduction to Hadoop and Spark (before joining the other talk) and An Overv...Introduction to Hadoop and Spark (before joining the other talk) and An Overv...
Introduction to Hadoop and Spark (before joining the other talk) and An Overv...DataWorks Summit/Hadoop Summit
 
A Container-based Sizing Framework for Apache Hadoop/Spark Clusters
A Container-based Sizing Framework for Apache Hadoop/Spark ClustersA Container-based Sizing Framework for Apache Hadoop/Spark Clusters
A Container-based Sizing Framework for Apache Hadoop/Spark ClustersDataWorks Summit/Hadoop Summit
 
Major advancements in Apache Hive towards full support of SQL compliance
Major advancements in Apache Hive towards full support of SQL complianceMajor advancements in Apache Hive towards full support of SQL compliance
Major advancements in Apache Hive towards full support of SQL complianceDataWorks Summit/Hadoop Summit
 
How to overcome mysterious problems caused by large and multi-tenancy Hadoop ...
How to overcome mysterious problems caused by large and multi-tenancy Hadoop ...How to overcome mysterious problems caused by large and multi-tenancy Hadoop ...
How to overcome mysterious problems caused by large and multi-tenancy Hadoop ...DataWorks Summit/Hadoop Summit
 
Near Real-Time Network Anomaly Detection and Traffic Analysis using Spark bas...
Near Real-Time Network Anomaly Detection and Traffic Analysis using Spark bas...Near Real-Time Network Anomaly Detection and Traffic Analysis using Spark bas...
Near Real-Time Network Anomaly Detection and Traffic Analysis using Spark bas...DataWorks Summit/Hadoop Summit
 
Evolving HDFS to a Generalized Distributed Storage Subsystem
Evolving HDFS to a Generalized Distributed Storage SubsystemEvolving HDFS to a Generalized Distributed Storage Subsystem
Evolving HDFS to a Generalized Distributed Storage SubsystemDataWorks Summit/Hadoop Summit
 

Viewers also liked (20)

Leveraging smart meter data for electric utilities: Comparison of Spark SQL w...
Leveraging smart meter data for electric utilities: Comparison of Spark SQL w...Leveraging smart meter data for electric utilities: Comparison of Spark SQL w...
Leveraging smart meter data for electric utilities: Comparison of Spark SQL w...
 
Comparison of Transactional Libraries for HBase
Comparison of Transactional Libraries for HBaseComparison of Transactional Libraries for HBase
Comparison of Transactional Libraries for HBase
 
Generating Recommendations at Amazon Scale with Apache Spark and Amazon DSSTNE
Generating Recommendations at Amazon Scale with Apache Spark and Amazon DSSTNEGenerating Recommendations at Amazon Scale with Apache Spark and Amazon DSSTNE
Generating Recommendations at Amazon Scale with Apache Spark and Amazon DSSTNE
 
From a single droplet to a full bottle, our journey to Hadoop at Coca-Cola Ea...
From a single droplet to a full bottle, our journey to Hadoop at Coca-Cola Ea...From a single droplet to a full bottle, our journey to Hadoop at Coca-Cola Ea...
From a single droplet to a full bottle, our journey to Hadoop at Coca-Cola Ea...
 
Case study of DevOps for Hadoop in Recruit.
Case study of DevOps for Hadoop in Recruit.Case study of DevOps for Hadoop in Recruit.
Case study of DevOps for Hadoop in Recruit.
 
Hadoop Summit Tokyo HDP Sandbox Workshop
Hadoop Summit Tokyo HDP Sandbox Workshop Hadoop Summit Tokyo HDP Sandbox Workshop
Hadoop Summit Tokyo HDP Sandbox Workshop
 
The real world use of Big Data to change business
The real world use of Big Data to change businessThe real world use of Big Data to change business
The real world use of Big Data to change business
 
The truth about SQL and Data Warehousing on Hadoop
The truth about SQL and Data Warehousing on HadoopThe truth about SQL and Data Warehousing on Hadoop
The truth about SQL and Data Warehousing on Hadoop
 
Use case and Live demo : Agile data integration from Legacy system to Hadoop ...
Use case and Live demo : Agile data integration from Legacy system to Hadoop ...Use case and Live demo : Agile data integration from Legacy system to Hadoop ...
Use case and Live demo : Agile data integration from Legacy system to Hadoop ...
 
Rebuilding Web Tracking Infrastructure for Scale
Rebuilding Web Tracking Infrastructure for ScaleRebuilding Web Tracking Infrastructure for Scale
Rebuilding Web Tracking Infrastructure for Scale
 
SEGA : Growth hacking by Spark ML for Mobile games
SEGA : Growth hacking by Spark ML for Mobile gamesSEGA : Growth hacking by Spark ML for Mobile games
SEGA : Growth hacking by Spark ML for Mobile games
 
Introduction to Hadoop and Spark (before joining the other talk) and An Overv...
Introduction to Hadoop and Spark (before joining the other talk) and An Overv...Introduction to Hadoop and Spark (before joining the other talk) and An Overv...
Introduction to Hadoop and Spark (before joining the other talk) and An Overv...
 
A Container-based Sizing Framework for Apache Hadoop/Spark Clusters
A Container-based Sizing Framework for Apache Hadoop/Spark ClustersA Container-based Sizing Framework for Apache Hadoop/Spark Clusters
A Container-based Sizing Framework for Apache Hadoop/Spark Clusters
 
Streamline Hadoop DevOps with Apache Ambari
Streamline Hadoop DevOps with Apache AmbariStreamline Hadoop DevOps with Apache Ambari
Streamline Hadoop DevOps with Apache Ambari
 
Major advancements in Apache Hive towards full support of SQL compliance
Major advancements in Apache Hive towards full support of SQL complianceMajor advancements in Apache Hive towards full support of SQL compliance
Major advancements in Apache Hive towards full support of SQL compliance
 
How to overcome mysterious problems caused by large and multi-tenancy Hadoop ...
How to overcome mysterious problems caused by large and multi-tenancy Hadoop ...How to overcome mysterious problems caused by large and multi-tenancy Hadoop ...
How to overcome mysterious problems caused by large and multi-tenancy Hadoop ...
 
Apache Hadoop 3.0 What's new in YARN and MapReduce
Apache Hadoop 3.0 What's new in YARN and MapReduceApache Hadoop 3.0 What's new in YARN and MapReduce
Apache Hadoop 3.0 What's new in YARN and MapReduce
 
Near Real-Time Network Anomaly Detection and Traffic Analysis using Spark bas...
Near Real-Time Network Anomaly Detection and Traffic Analysis using Spark bas...Near Real-Time Network Anomaly Detection and Traffic Analysis using Spark bas...
Near Real-Time Network Anomaly Detection and Traffic Analysis using Spark bas...
 
Evolving HDFS to a Generalized Distributed Storage Subsystem
Evolving HDFS to a Generalized Distributed Storage SubsystemEvolving HDFS to a Generalized Distributed Storage Subsystem
Evolving HDFS to a Generalized Distributed Storage Subsystem
 
Data science lifecycle with Apache Zeppelin
Data science lifecycle with Apache ZeppelinData science lifecycle with Apache Zeppelin
Data science lifecycle with Apache Zeppelin
 

Similar to #HSTokyo16 Apache Spark Crash Course

Spark-Zeppelin-ML on HWX
Spark-Zeppelin-ML on HWXSpark-Zeppelin-ML on HWX
Spark-Zeppelin-ML on HWXKirk Haslbeck
 
Paris FOD Meetup #5 Hortonworks Presentation
Paris FOD Meetup #5 Hortonworks PresentationParis FOD Meetup #5 Hortonworks Presentation
Paris FOD Meetup #5 Hortonworks PresentationAbdelkrim Hadjidj
 
Log Analytics Optimization
Log Analytics OptimizationLog Analytics Optimization
Log Analytics OptimizationIsheeta Sanghi
 
Enabling the Real Time Analytical Enterprise
Enabling the Real Time Analytical EnterpriseEnabling the Real Time Analytical Enterprise
Enabling the Real Time Analytical EnterpriseHortonworks
 
Splunk-hortonworks-risk-management-oct-2014
Splunk-hortonworks-risk-management-oct-2014Splunk-hortonworks-risk-management-oct-2014
Splunk-hortonworks-risk-management-oct-2014Hortonworks
 
Unlocking insights in streaming data
Unlocking insights in streaming dataUnlocking insights in streaming data
Unlocking insights in streaming dataCarolyn Duby
 
S2DS London 2015 - Hadoop Real World
S2DS London 2015 - Hadoop Real WorldS2DS London 2015 - Hadoop Real World
S2DS London 2015 - Hadoop Real WorldSean Roberts
 
HDF Powered by Apache NiFi Introduction
HDF Powered by Apache NiFi IntroductionHDF Powered by Apache NiFi Introduction
HDF Powered by Apache NiFi IntroductionMilind Pandit
 
Apache Metron in the Real World
Apache Metron in the Real WorldApache Metron in the Real World
Apache Metron in the Real WorldDataWorks Summit
 
Data in Motion - Data at Rest - Hortonworks a Modern Architecture
Data in Motion - Data at Rest - Hortonworks a Modern ArchitectureData in Motion - Data at Rest - Hortonworks a Modern Architecture
Data in Motion - Data at Rest - Hortonworks a Modern ArchitectureMats Johansson
 
Intro to Spark with Zeppelin Crash Course Hadoop Summit SJ
Intro to Spark with Zeppelin Crash Course Hadoop Summit SJIntro to Spark with Zeppelin Crash Course Hadoop Summit SJ
Intro to Spark with Zeppelin Crash Course Hadoop Summit SJDaniel Madrigal
 
Time-series data analysis and persistence with Druid
Time-series data analysis and persistence with DruidTime-series data analysis and persistence with Druid
Time-series data analysis and persistence with DruidRaúl Marín
 
Internet of things Crash Course Workshop
Internet of things Crash Course WorkshopInternet of things Crash Course Workshop
Internet of things Crash Course WorkshopDataWorks Summit
 
Internet of Things Crash Course Workshop at Hadoop Summit
Internet of Things Crash Course Workshop at Hadoop SummitInternet of Things Crash Course Workshop at Hadoop Summit
Internet of Things Crash Course Workshop at Hadoop SummitDataWorks Summit
 
Storm Demo Talk - Colorado Springs May 2015
Storm Demo Talk - Colorado Springs May 2015Storm Demo Talk - Colorado Springs May 2015
Storm Demo Talk - Colorado Springs May 2015Mac Moore
 
The Future of Hadoop by Arun Murthy, PMC Apache Hadoop & Cofounder Hortonworks
The Future of Hadoop by Arun Murthy, PMC Apache Hadoop & Cofounder HortonworksThe Future of Hadoop by Arun Murthy, PMC Apache Hadoop & Cofounder Hortonworks
The Future of Hadoop by Arun Murthy, PMC Apache Hadoop & Cofounder HortonworksData Con LA
 
Hortonworks sqrrl webinar v5.pptx
Hortonworks sqrrl webinar v5.pptxHortonworks sqrrl webinar v5.pptx
Hortonworks sqrrl webinar v5.pptxHortonworks
 

Similar to #HSTokyo16 Apache Spark Crash Course (20)

Apache Spark Crash Course
Apache Spark Crash CourseApache Spark Crash Course
Apache Spark Crash Course
 
Apache Spark Crash Course
Apache Spark Crash CourseApache Spark Crash Course
Apache Spark Crash Course
 
Apache Spark Crash Course
Apache Spark Crash CourseApache Spark Crash Course
Apache Spark Crash Course
 
Spark-Zeppelin-ML on HWX
Spark-Zeppelin-ML on HWXSpark-Zeppelin-ML on HWX
Spark-Zeppelin-ML on HWX
 
Paris FOD Meetup #5 Hortonworks Presentation
Paris FOD Meetup #5 Hortonworks PresentationParis FOD Meetup #5 Hortonworks Presentation
Paris FOD Meetup #5 Hortonworks Presentation
 
Log Analytics Optimization
Log Analytics OptimizationLog Analytics Optimization
Log Analytics Optimization
 
Enabling the Real Time Analytical Enterprise
Enabling the Real Time Analytical EnterpriseEnabling the Real Time Analytical Enterprise
Enabling the Real Time Analytical Enterprise
 
Splunk-hortonworks-risk-management-oct-2014
Splunk-hortonworks-risk-management-oct-2014Splunk-hortonworks-risk-management-oct-2014
Splunk-hortonworks-risk-management-oct-2014
 
Unlocking insights in streaming data
Unlocking insights in streaming dataUnlocking insights in streaming data
Unlocking insights in streaming data
 
S2DS London 2015 - Hadoop Real World
S2DS London 2015 - Hadoop Real WorldS2DS London 2015 - Hadoop Real World
S2DS London 2015 - Hadoop Real World
 
HDF Powered by Apache NiFi Introduction
HDF Powered by Apache NiFi IntroductionHDF Powered by Apache NiFi Introduction
HDF Powered by Apache NiFi Introduction
 
Apache Metron in the Real World
Apache Metron in the Real WorldApache Metron in the Real World
Apache Metron in the Real World
 
Data in Motion - Data at Rest - Hortonworks a Modern Architecture
Data in Motion - Data at Rest - Hortonworks a Modern ArchitectureData in Motion - Data at Rest - Hortonworks a Modern Architecture
Data in Motion - Data at Rest - Hortonworks a Modern Architecture
 
Intro to Spark with Zeppelin Crash Course Hadoop Summit SJ
Intro to Spark with Zeppelin Crash Course Hadoop Summit SJIntro to Spark with Zeppelin Crash Course Hadoop Summit SJ
Intro to Spark with Zeppelin Crash Course Hadoop Summit SJ
 
Time-series data analysis and persistence with Druid
Time-series data analysis and persistence with DruidTime-series data analysis and persistence with Druid
Time-series data analysis and persistence with Druid
 
Internet of things Crash Course Workshop
Internet of things Crash Course WorkshopInternet of things Crash Course Workshop
Internet of things Crash Course Workshop
 
Internet of Things Crash Course Workshop at Hadoop Summit
Internet of Things Crash Course Workshop at Hadoop SummitInternet of Things Crash Course Workshop at Hadoop Summit
Internet of Things Crash Course Workshop at Hadoop Summit
 
Storm Demo Talk - Colorado Springs May 2015
Storm Demo Talk - Colorado Springs May 2015Storm Demo Talk - Colorado Springs May 2015
Storm Demo Talk - Colorado Springs May 2015
 
The Future of Hadoop by Arun Murthy, PMC Apache Hadoop & Cofounder Hortonworks
The Future of Hadoop by Arun Murthy, PMC Apache Hadoop & Cofounder HortonworksThe Future of Hadoop by Arun Murthy, PMC Apache Hadoop & Cofounder Hortonworks
The Future of Hadoop by Arun Murthy, PMC Apache Hadoop & Cofounder Hortonworks
 
Hortonworks sqrrl webinar v5.pptx
Hortonworks sqrrl webinar v5.pptxHortonworks sqrrl webinar v5.pptx
Hortonworks sqrrl webinar v5.pptx
 

More from DataWorks Summit/Hadoop Summit

Unleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache RangerUnleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache RangerDataWorks Summit/Hadoop Summit
 
Enabling Digital Diagnostics with a Data Science Platform
Enabling Digital Diagnostics with a Data Science PlatformEnabling Digital Diagnostics with a Data Science Platform
Enabling Digital Diagnostics with a Data Science PlatformDataWorks Summit/Hadoop Summit
 
Double Your Hadoop Performance with Hortonworks SmartSense
Double Your Hadoop Performance with Hortonworks SmartSenseDouble Your Hadoop Performance with Hortonworks SmartSense
Double Your Hadoop Performance with Hortonworks SmartSenseDataWorks Summit/Hadoop Summit
 
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...DataWorks Summit/Hadoop Summit
 
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...DataWorks Summit/Hadoop Summit
 
Mool - Automated Log Analysis using Data Science and ML
Mool - Automated Log Analysis using Data Science and MLMool - Automated Log Analysis using Data Science and ML
Mool - Automated Log Analysis using Data Science and MLDataWorks Summit/Hadoop Summit
 
The Challenge of Driving Business Value from the Analytics of Things (AOT)
The Challenge of Driving Business Value from the Analytics of Things (AOT)The Challenge of Driving Business Value from the Analytics of Things (AOT)
The Challenge of Driving Business Value from the Analytics of Things (AOT)DataWorks Summit/Hadoop Summit
 
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...DataWorks Summit/Hadoop Summit
 
Scaling HDFS to Manage Billions of Files with Distributed Storage Schemes
Scaling HDFS to Manage Billions of Files with Distributed Storage SchemesScaling HDFS to Manage Billions of Files with Distributed Storage Schemes
Scaling HDFS to Manage Billions of Files with Distributed Storage SchemesDataWorks Summit/Hadoop Summit
 

More from DataWorks Summit/Hadoop Summit (20)

Running Apache Spark & Apache Zeppelin in Production
Running Apache Spark & Apache Zeppelin in ProductionRunning Apache Spark & Apache Zeppelin in Production
Running Apache Spark & Apache Zeppelin in Production
 
State of Security: Apache Spark & Apache Zeppelin
State of Security: Apache Spark & Apache ZeppelinState of Security: Apache Spark & Apache Zeppelin
State of Security: Apache Spark & Apache Zeppelin
 
Unleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache RangerUnleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache Ranger
 
Enabling Digital Diagnostics with a Data Science Platform
Enabling Digital Diagnostics with a Data Science PlatformEnabling Digital Diagnostics with a Data Science Platform
Enabling Digital Diagnostics with a Data Science Platform
 
Revolutionize Text Mining with Spark and Zeppelin
Revolutionize Text Mining with Spark and ZeppelinRevolutionize Text Mining with Spark and Zeppelin
Revolutionize Text Mining with Spark and Zeppelin
 
Double Your Hadoop Performance with Hortonworks SmartSense
Double Your Hadoop Performance with Hortonworks SmartSenseDouble Your Hadoop Performance with Hortonworks SmartSense
Double Your Hadoop Performance with Hortonworks SmartSense
 
Hadoop Crash Course
Hadoop Crash CourseHadoop Crash Course
Hadoop Crash Course
 
Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
 
Dataflow with Apache NiFi
Dataflow with Apache NiFiDataflow with Apache NiFi
Dataflow with Apache NiFi
 
Schema Registry - Set you Data Free
Schema Registry - Set you Data FreeSchema Registry - Set you Data Free
Schema Registry - Set you Data Free
 
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
 
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
 
Mool - Automated Log Analysis using Data Science and ML
Mool - Automated Log Analysis using Data Science and MLMool - Automated Log Analysis using Data Science and ML
Mool - Automated Log Analysis using Data Science and ML
 
How Hadoop Makes the Natixis Pack More Efficient
How Hadoop Makes the Natixis Pack More Efficient How Hadoop Makes the Natixis Pack More Efficient
How Hadoop Makes the Natixis Pack More Efficient
 
HBase in Practice
HBase in Practice HBase in Practice
HBase in Practice
 
The Challenge of Driving Business Value from the Analytics of Things (AOT)
The Challenge of Driving Business Value from the Analytics of Things (AOT)The Challenge of Driving Business Value from the Analytics of Things (AOT)
The Challenge of Driving Business Value from the Analytics of Things (AOT)
 
Breaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
Breaking the 1 Million OPS/SEC Barrier in HOPS HadoopBreaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
Breaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
 
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
 
Backup and Disaster Recovery in Hadoop
Backup and Disaster Recovery in Hadoop Backup and Disaster Recovery in Hadoop
Backup and Disaster Recovery in Hadoop
 
Scaling HDFS to Manage Billions of Files with Distributed Storage Schemes
Scaling HDFS to Manage Billions of Files with Distributed Storage SchemesScaling HDFS to Manage Billions of Files with Distributed Storage Schemes
Scaling HDFS to Manage Billions of Files with Distributed Storage Schemes
 

Recently uploaded

"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DaySri Ambati
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 

Recently uploaded (20)

"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 

#HSTokyo16 Apache Spark Crash Course