SlideShare a Scribd company logo
1 of 72
Download to read offline
Hive	Bucketing	in
Apache	Spark
Tejas	Patil
Facebook
Agenda
• Why	bucketing	?
• Why	is	shuffle	bad	?
• How	to	avoid	shuffle	?
• When	to	use	bucketing	?
• Spark’s	bucketing	support
• Bucketing	semantics	of	Spark	vs	Hive
• Hive	bucketing	support	in	Spark
• SQL	Planner	improvements
Why	bucketing	?
Table	A Table	B
.	.	.	.	.	.	 .	.	.	.	.	.	JOIN
Table	A Table	B
.	.	.	.	.	.	 .	.	.	.	.	.	JOIN
Broadcast	hash	join
- Ship	smaller	table	to	all	nodes
- Stream	the	other	table
Table	A Table	B
.	.	.	.	.	.	 .	.	.	.	.	.	JOIN
Broadcast	hash	join Shuffle	hash	join
- Ship	smaller	table	to	all	nodes
- Stream	the	other	table
- Shuffle	both	tables,
- Hash	smaller	one,	stream	the	
bigger	one
Table	A Table	B
.	.	.	.	.	.	 .	.	.	.	.	.	JOIN
Broadcast	hash	join Shuffle	hash	join
- Ship	smaller	table	to	all	nodes
- Stream	the	other	table
- Shuffle	both	tables,
- Hash	smaller	one,	stream	the	
bigger	one
Sort	merge	join
- Shuffle	both	tables,
- Sort	both	tables,	buffer	one,	
stream	the	bigger	one
Table	A Table	B
.	.	.	.	.	.	 .	.	.	.	.	.	JOIN
Broadcast	hash	join Shuffle	hash	join Sort	merge	join
- Ship	smaller	table	to	all	nodes
- Stream	the	other	table
- Shuffle	both	tables,
- Hash	smaller	one,	stream	the	
bigger	one
- Shuffle	both	tables,
- Sort	both	tables,	buffer	one,	
stream	the	bigger	one
Table	A Table	B
.	.	.	.	.	.	 .	.	.	.	.	.	
Sort	Merge	Join
Table	A Table	B
.	.	.	.	.	.	 .	.	.	.	.	.	
Shuffle ShuffleShuffleShuffle
Sort	Merge	Join
Table	A Table	B
.	.	.	.	.	.	 .	.	.	.	.	.	
Sort Sort Sort Sort
Shuffle ShuffleShuffleShuffle
Sort	Merge	Join
Table	A Table	B
.	.	.	.	.	.	 .	.	.	.	.	.	
Sort
Join
Sort
Join
Sort
Join
Sort
Join
Shuffle ShuffleShuffleShuffle
Sort	Merge	Join
Sort	Merge	Join
Why	is	shuffle	bad	?
Table	A Table	B
.	.	.	.	.	.	 .	.	.	.	.	.	
Disk Disk
Disk	IO	for	shuffle	outputs
Why	is	shuffle	bad	?
Table	A Table	B
.	.	.	.	.	.	 .	.	.	.	.	.	
Shuffle Shuffle Shuffle Shuffle
M	*	R	network	connections
Why	is	shuffle	bad	?
How	to	avoid	shuffle	?
student	s JOIN	attendance	a	
ON	s.id =	a.student_id
student	s	JOIN	results	r	
ON	s.id =	r.student_id
student	s	JOIN	course_registrationc
ON	s.id =	c.student_id
……..
……..
student	s	JOIN	attendance	a	
ON	s.id =	a.student_id
student	s	JOIN	results	r	
ON	s.id =	r.student_id
student	s	JOIN	attendance	a	
ON	s.id =	a.student_id
student	s	JOIN	results	r	
ON	s.id =	r.student_id
Pre-compute	at	
table	creation
Bucketing	=
pre-(shuffle	+	sort)	inputs
on	join	keys
Without	
bucketing
Job(s)	populating	input	table
Without	
bucketing
With
bucketing
Job(s)	populating	input	table
Performance	comparison
0
100
200
300
400
500
600
700
800
900
Before After	Bucketing
Reserved	CPU	time	(days)
0
1
2
3
4
5
6
7
8
Before After	Bucketing
Latency	(hours)
When	to	use	bucketing	?
• Tables	used	frequently	in	JOINs	with	same	key
• Student	->	student_id
• Employee	->	employee_id
• Users	->	user_id
When	to	use	bucketing	?
• Tables	used	frequently	in	JOINs	with	same	key
• Student	->	student_id
• Employee	->	employee_id
• Users	->	user_id
….	=>	Dimension	tables
When	to	use	bucketing	?
• Tables	used	frequently	in	JOINs	with	same	key
• Student	->	student_id
• Employee	->	employee_id
• Users	->	user_id
• Loading	daily	cumulative	tables
• Both	base	and	delta	tables	could	be	bucketed	on	a	common	column
When	to	use	bucketing	?
• Tables	used	frequently	in	JOINs	with	same	key
• Student	->	student_id
• Employee	->	employee_id
• Users	->	user_id
• Loading	daily	cumulative	tables
• Both	base	and	delta	tables	could	be	bucketed	on	a	common	column
• Indexing	capability
Spark’s	bucketing	support
Creation	of	bucketed	tables
df.write
.bucketBy(numBuckets,	 "col1",	...)
.sortBy("col1",	...)
.saveAsTable("bucketed_table")
via	Dataframe API
Creation	of	bucketed	tables
CREATE	TABLE	bucketed_table(
column1	INT,
...
)	USING	parquet
CLUSTERED	BY	(column1,	…)
SORTED	BY	(column1,	…)
INTO	`n`	BUCKETS
via	SQL	statement
Check	bucketing	spec
scala>	sparkContext.sql("DESC	FORMATTED	student").collect.foreach(println)
[#	col_name,data_type,comment]
[student_id,int,null]
[name,int,null]
[#	Detailed	Table	Information,,]
[Database,default,]
[Table,table1,]
[Owner,tejas,]
[Created,Fri May	12	08:06:33	PDT	2017,]
[Type,MANAGED,]
[Num Buckets,64,]
[Bucket	Columns,[`student_id`],]
[Sort	Columns,[`student_id`],]
[Properties,[serialization.format=1],]
[Serde Library,org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe,]
[InputFormat,org.apache.hadoop.mapred.SequenceFileInputFormat,]
Bucketing	config
SET	spark.sql.sources.bucketing.enabled=true
[SPARK-15453]	Extract	bucketing	info	in	
FileSourceScanExec
SELECT	*	FROM	tableA JOIN	tableB ON	tableA.i=	tableB.i AND	tableA.j=	tableB.j
[SPARK-15453]	Extract	bucketing	info	in	
FileSourceScanExec
SELECT	*	FROM	tableA JOIN	tableB ON	tableA.i=	tableB.i AND	tableA.j=	tableB.j
Sort	Merge	Join
Sort
Exchange
TableScan
Sort
Exchange
TableScan Already	bucketed	on	columns	(i,	j)
Shuffle	on	columns	(i,	j)
Sort	on	columns	(i,	j)
Join	on	[i,j],	[i,j]	respectively	of	relations
[SPARK-15453]	Extract	bucketing	info	in	
FileSourceScanExec
SELECT	*	FROM	tableA JOIN	tableB ON	tableA.i=	tableB.i AND	tableA.j=	tableB.j
Sort	Merge	Join
Sort
Exchange
TableScan
Sort
Exchange
TableScan Already	bucketed	on	columns	(i,	j)
Shuffle	on	columns	(i,	j)
Sort	on	columns	(i,	j)
Join	on	[i,j],	[i,j]	respectively	of	relations
[SPARK-15453]	Extract	bucketing	info	in	
FileSourceScanExec
SELECT	*	FROM	tableA JOIN	tableB ON	tableA.i=	tableB.i AND	tableA.j=	tableB.j
Sort	Merge	Join
TableScan TableScan
Sort	Merge	Join
TableScan TableScanSort
Exchange
Sort
Exchange
Bucketing	semantics	of
Spark	vs	Hive
Bucketing	semantics	of	Spark	vs	Hive
bucket	0 bucket	1
bucket
(n-1)
…..
Hive
my_bucketed_table
Bucketing	semantics	of	Spark	vs	Hive
bucket	0 bucket	1
bucket
(n-1)
…..
Hive Spark
bucket	0 bucket	1
bucket
(n-1)
…..
bucket	0
bucket	0
bucket	1
bucket
(n-1)bucket
(n-1)bucket
(n-1)
my_bucketed_tablemy_bucketed_table
Bucketing	semantics	of	Spark	vs	Hive
bucket	0 bucket	1
bucket
(n-1)
…..
Hive
M M M M M
R R R
…..
Bucketing	semantics	of	Spark	vs	Hive
bucket	0 bucket	1
bucket
(n-1)
…..
Hive Spark
M M M M M
R R R
…..
bucket	0 bucket	1
bucket
(n-1)
…..
M M M M M…..
bucket	0
bucket	0
bucket	1
bucket
(n-1)bucket
(n-1)bucket
(n-1)
Bucketing	semantics	of	Spark	vs	Hive
Hive Spark
Model Optimizes	 reads,	writes	are	costly Writes	are	cheaper,	 reads	are	
costlier
Bucketing	semantics	of	Spark	vs	Hive
Hive Spark
Model Optimizes	 reads,	writes	are	costly Writes	are	cheaper,	 reads	are	
costlier
Unit	of	bucketing A	single	file	represents	 a	 single	
bucket
A	collection	 of	files	together	
comprise	 a	single	bucket
Bucketing	semantics	of	Spark	vs	Hive
Hive Spark
Model Optimizes	 reads,	writes	are	costly Writes	are	cheaper,	 reads	are	
costlier
Unit	of	bucketing A	single	file	represents	 a	 single	
bucket
A	collection	 of	files	together	
comprise	 a	single	bucket
Sort	ordering Each	bucket	is	sorted	globally.	 This	
makes	it	easy	to	optimize	 queries	
reading	this	data
Each	file	is	individually	 sorted	 but	
the	“bucket”	as	a	whole	is	not	
globally	 sorted
Bucketing	semantics	of	Spark	vs	Hive
Hive Spark
Model Optimizes	 reads,	writes	are	costly Writes	are	cheaper,	 reads	are	
costlier
Unit	of	bucketing A	single	file	represents	 a	 single	
bucket
A	collection	 of	files	together	
comprise	 a	single	bucket
Sort	ordering Each	bucket	is	sorted	globally.	 This	
makes	it	easy	to	optimize	 queries	
reading	this	data
Each	file	is	individually	 sorted	 but	
the	“bucket”	as	a	whole	is	not	
globally	 sorted
Hashing	function Hive’s	 inbuilt	 hash Murmur3Hash
[SPARK-19256] Hive	bucketing	support
• Introduce	Hive’s	hashing	function	[SPARK-17495]
• Enable	creating	hive	bucketed	tables	[SPARK-17729]
• Support	Hive’s	`Bucketed	file	format`
• Propagate	Hive	bucketing	information	to	planner	[SPARK-17654]
• Expressing	outputPartitioning and	requiredChildDistribution
• Creating	empty	bucket	files	when	no	data	for	a	bucket
• Allow	Sort	Merge	join	over	tables	with	number	of	buckets	
multiple	of	each	other
• Support	N-way	Sort	Merge	Join
[SPARK-19256]	Hive	bucketing	support
• Introduce	Hive’s	hashing	function	[SPARK-17495]
• Enable	creating	hive	bucketed	tables	[SPARK-17729]
• Support	Hive’s	`Bucketed	file	format`
• Propagate	Hive	bucketing	information	to	planner	[SPARK-17654]
• Expressing	outputPartitioning and	requiredChildDistribution
• Creating	empty	bucket	files	when	no	data	for	a	bucket
• Allow	Sort	Merge	join	over	tables	with	number	of	buckets	
multiple	of	each	other
• Support	N-way	Sort	Merge	Join
Merged	in
upstream
[SPARK-19256]	Hive	bucketing	support
• Introduce	Hive’s	hashing	function	[SPARK-17495]
• Enable	creating	hive	bucketed	tables	[SPARK-17729]
• Support	Hive’s	`Bucketed	file	format`
• Propagate	Hive	bucketing	information	to	planner	[SPARK-17654]
• Expressing	outputPartitioning and	requiredChildDistribution
• Creating	empty	bucket	files	when	no	data	for	a	bucket
• Allow	Sort	Merge	join	over	tables	with	number	of	buckets	
multiple	of	each	other
• Support	N-way	Sort	Merge	Join
FB-prod	(6	months)
SQL	Planner	improvements
[SPARK-19122]	Unnecessary	shuffle+sort added	if	
join	predicates	ordering	differ	from	bucketing	and	
sorting	order
SELECT	*	FROM	a	JOIN	b	ON	a.x=b.xAND	a.y=b.y
[SPARK-19122]	Unnecessary	shuffle+sort added	if	
join	predicates	ordering	differ	from	bucketing	and	
sorting	order
Sort	Merge	Join
TableScan TableScan
SELECT	*	FROM	a	JOIN	b	ON	a.x=b.xAND	a.y=b.y
[SPARK-19122]	Unnecessary	shuffle+sort added	if	
join	predicates	ordering	differ	from	bucketing	and	
sorting	order
SELECT	*	FROM	a	JOIN	b	ON	a.x=b.xAND	a.y=b.y
Sort	Merge	Join
TableScan TableScan
SELECT	*	FROM	a	JOIN	b	ON	a.y=b.y AND	a.x=b.x
[SPARK-19122]	Unnecessary	shuffle+sort added	if	
join	predicates	ordering	differ	from	bucketing	and	
sorting	order
Sort	Merge	Join
Exchange
TableScan
Sort
Exchange
TableScan
Sort
Sort	Merge	Join
TableScan TableScan
SELECT	*	FROM	a	JOIN	b	ON	a.x=b.xAND	a.y=b.y SELECT	*	FROM	a	JOIN	b	ON	a.y=b.y AND	a.x=b.x
[SPARK-19122]	Unnecessary	shuffle+sort added	if	
join	predicates	ordering	differ	from	bucketing	and	
sorting	order
Sort	Merge	Join
Exchange
TableScan
Sort
Exchange
TableScan
Sort
Sort	Merge	Join
TableScan TableScan
SELECT	*	FROM	a	JOIN	b	ON	a.x=b.xAND	a.y=b.y SELECT	*	FROM	a	JOIN	b	ON	a.y=b.y AND	a.x=b.x
[SPARK-19122]	Unnecessary	shuffle+sort added	if	
join	predicates	ordering	differ	from	bucketing	and	
sorting	order
Sort	Merge	Join
TableScan TableScan
SELECT	*	FROM	a	JOIN	b	ON	a.x=b.xAND	a.y=b.y SELECT	*	FROM	a	JOIN	b	ON	a.y=b.y AND	a.x=b.x
Sort	Merge	Join
TableScan TableScan
[SPARK-17271]	Planner	adds	un-necessary	Sort	
even	if	child	ordering	is	semantically	same	as	
required	ordering
SELECT	*	FROM	table1	a	JOIN	table2	b	ON	a.col1=b.col1
[SPARK-17271]	Planner	adds	un-necessary	Sort	
even	if	child	ordering	is	semantically	same	as	
required	ordering
SELECT	*	FROM	table1	a	JOIN	table2	b	ON	a.col1=b.col1
Sort	Merge	Join
Sort
TableScan
Sort
TableScan Already	bucketed	on	column	`col1`
Sort	on	columns	(a.col1	and	b.col1)
Join	on	[col1],	[col1]	respectively	of	relations
[SPARK-17271]	Planner	adds	un-necessary	Sort	
even	if	child	ordering	is	semantically	same	as	
required	ordering
SELECT	*	FROM	table1	a	JOIN	table2	b	ON	a.col1=b.col1
Sort	Merge	Join
TableScan TableScan Already	bucketed	on	column	`col1`
Sort	on	columns	(a.col1	and	b.col1)Sort Sort
Join	on	[col1],	[col1]	respectively	of	relations
[SPARK-17271]	Planner	adds	un-necessary	Sort	
even	if	child	ordering	is	semantically	same	as	
required	ordering
SELECT	*	FROM	table1	a	JOIN	table2	b	ON	a.col1=b.col1
Sort	Merge	Join
TableScan TableScan
Sort Sort
Sort	Merge	Join
TableScan TableScan
[SPARK-17698]	Join	predicates	should	not	
contain	filter	clauses
SELECT	a.id,	b.id FROM	table1	a	FULL	OUTER	JOIN	table2	b	ON	a.id=	b.id AND	a.id='1'	AND	b.id='1'
[SPARK-17698]	Join	predicates	should	not	
contain	filter	clauses
SELECT	a.id,	b.id FROM	table1	a	FULL	OUTER	JOIN	table2	b	ON	a.id=	b.id AND	a.id='1'	AND	b.id='1'
Sort	Merge	Join
TableScan TableScan
Sort
Exchange
Sort
Exchange
Already	bucketed	on	column	[id]
Shuffle	on	columns	[id,	1,	id]	and	[id,	id,	1]
Sort	on	columns	[id,	id,	1]	and	[id,	1,	id]	
Join	on	[id,	id,	1],	[id,	1,	id]	respectively	of	relations
[SPARK-17698]	Join	predicates	should	not	
contain	filter	clauses
SELECT	a.id,	b.id FROM	table1	a	FULL	OUTER	JOIN	table2	b	ON	a.id =	b.id AND a.id='1'	AND b.id='1'
Sort	Merge	Join
TableScan TableScan
Sort
Exchange
Sort
Exchange
Already	bucketed	on	column	[id]
Shuffle	on	columns	[id,	1,	id]	and	[id,	id,	1]
Sort	on	columns	[id,	id,	1]	and	[id,	1,	id]	
Join	on	[id,	id,	1],	[id,	1,	id]	respectively	of	relations
[SPARK-17698]	Join	predicates	should	not	
contain	filter	clauses
SELECT	a.id,	b.id FROM	table1	a	FULL	OUTER	JOIN	table2	b	ON	a.id =	b.id AND a.id='1'	AND b.id='1'
Sort	Merge	Join
TableScan TableScan
Sort
Exchange
Sort
Exchange
Already	bucketed	on	column	[id]
Shuffle	on	columns	[id,	1,	id]	and	[id,	id,	1]
Sort	on	columns	[id,	id,	1]	and	[id,	1,	id]	
Join	on	[id,	id,	1],	[id,	1,	id]	respectively	of	relations
[SPARK-17698]	Join	predicates	should	not	
contain	filter	clauses
SELECT	a.id,	b.id FROM	table1	a	FULL	OUTER	JOIN	table2	b	ON	a.id =	b.id AND a.id='1'	AND b.id='1'
Sort	Merge	Join
TableScan TableScan
Sort	Merge	Join
TableScan TableScanSort
Exchange
Sort
Exchange
[SPARK-13450]	Introduce	
ExternalAppendOnlyUnsafeRowArray
[SPARK-13450]	Introduce	
ExternalAppendOnlyUnsafeRowArray
Buffered	
relation
Streamed	
relation
If	there	are	a	lot	of	rows	to	be	
buffered	(eg.	Skew),	
the	query	would	OOM
row	with	
key=`foo`
rows	with	
key=`foo`
[SPARK-13450]	Introduce	
ExternalAppendOnlyUnsafeRowArray
Buffered	
relation
Streamed	
relation
row	with	
key=`foo`
rows	with	
key=`foo`
Disk
buffer	would	spill	to	disk	
after	reaching	a	threshold
spill
Summary
• Shuffle	and	sort	is	costly	due	to	disk	and	network	IO
• Bucketing	will	pre-(shuffle	and	sort)	the	inputs
• Expect	at	least	2-5x	gains	after	bucketing	input	tables	for	joins
• Candidates	for	bucketing:
• Tables	used	frequently	in	JOINs	with	same	key
• Loading	of	data	in	cumulative	tables
• Spark’s	bucketing	is	not	compatible	with	Hive’s	bucketing
Questions	?

More Related Content

What's hot

Spark shuffle introduction
Spark shuffle introductionSpark shuffle introduction
Spark shuffle introductioncolorant
 
The Rise of ZStandard: Apache Spark/Parquet/ORC/Avro
The Rise of ZStandard: Apache Spark/Parquet/ORC/AvroThe Rise of ZStandard: Apache Spark/Parquet/ORC/Avro
The Rise of ZStandard: Apache Spark/Parquet/ORC/AvroDatabricks
 
From Query Plan to Query Performance: Supercharging your Apache Spark Queries...
From Query Plan to Query Performance: Supercharging your Apache Spark Queries...From Query Plan to Query Performance: Supercharging your Apache Spark Queries...
From Query Plan to Query Performance: Supercharging your Apache Spark Queries...Databricks
 
Understanding Query Plans and Spark UIs
Understanding Query Plans and Spark UIsUnderstanding Query Plans and Spark UIs
Understanding Query Plans and Spark UIsDatabricks
 
Radical Speed for SQL Queries on Databricks: Photon Under the Hood
Radical Speed for SQL Queries on Databricks: Photon Under the HoodRadical Speed for SQL Queries on Databricks: Photon Under the Hood
Radical Speed for SQL Queries on Databricks: Photon Under the HoodDatabricks
 
How to understand and analyze Apache Hive query execution plan for performanc...
How to understand and analyze Apache Hive query execution plan for performanc...How to understand and analyze Apache Hive query execution plan for performanc...
How to understand and analyze Apache Hive query execution plan for performanc...DataWorks Summit/Hadoop Summit
 
How We Optimize Spark SQL Jobs With parallel and sync IO
How We Optimize Spark SQL Jobs With parallel and sync IOHow We Optimize Spark SQL Jobs With parallel and sync IO
How We Optimize Spark SQL Jobs With parallel and sync IODatabricks
 
Parquet performance tuning: the missing guide
Parquet performance tuning: the missing guideParquet performance tuning: the missing guide
Parquet performance tuning: the missing guideRyan Blue
 
Autoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive ModeAutoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive ModeFlink Forward
 
Presto on Apache Spark: A Tale of Two Computation Engines
Presto on Apache Spark: A Tale of Two Computation EnginesPresto on Apache Spark: A Tale of Two Computation Engines
Presto on Apache Spark: A Tale of Two Computation EnginesDatabricks
 
Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...
Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...
Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...HostedbyConfluent
 
Hive + Tez: A Performance Deep Dive
Hive + Tez: A Performance Deep DiveHive + Tez: A Performance Deep Dive
Hive + Tez: A Performance Deep DiveDataWorks Summit
 
Optimizing Apache Spark SQL Joins
Optimizing Apache Spark SQL JoinsOptimizing Apache Spark SQL Joins
Optimizing Apache Spark SQL JoinsDatabricks
 
Apache Iceberg: An Architectural Look Under the Covers
Apache Iceberg: An Architectural Look Under the CoversApache Iceberg: An Architectural Look Under the Covers
Apache Iceberg: An Architectural Look Under the CoversScyllaDB
 
Adaptive Query Execution: Speeding Up Spark SQL at Runtime
Adaptive Query Execution: Speeding Up Spark SQL at RuntimeAdaptive Query Execution: Speeding Up Spark SQL at Runtime
Adaptive Query Execution: Speeding Up Spark SQL at RuntimeDatabricks
 
Redis + Structured Streaming—A Perfect Combination to Scale-Out Your Continuo...
Redis + Structured Streaming—A Perfect Combination to Scale-Out Your Continuo...Redis + Structured Streaming—A Perfect Combination to Scale-Out Your Continuo...
Redis + Structured Streaming—A Perfect Combination to Scale-Out Your Continuo...Databricks
 
PLPgSqL- Datatypes, Language structure.pptx
PLPgSqL- Datatypes, Language structure.pptxPLPgSqL- Datatypes, Language structure.pptx
PLPgSqL- Datatypes, Language structure.pptxjohnwick814916
 
Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J...
Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J...Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J...
Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J...Databricks
 
Introduction to Spark Internals
Introduction to Spark InternalsIntroduction to Spark Internals
Introduction to Spark InternalsPietro Michiardi
 

What's hot (20)

Spark shuffle introduction
Spark shuffle introductionSpark shuffle introduction
Spark shuffle introduction
 
The Rise of ZStandard: Apache Spark/Parquet/ORC/Avro
The Rise of ZStandard: Apache Spark/Parquet/ORC/AvroThe Rise of ZStandard: Apache Spark/Parquet/ORC/Avro
The Rise of ZStandard: Apache Spark/Parquet/ORC/Avro
 
From Query Plan to Query Performance: Supercharging your Apache Spark Queries...
From Query Plan to Query Performance: Supercharging your Apache Spark Queries...From Query Plan to Query Performance: Supercharging your Apache Spark Queries...
From Query Plan to Query Performance: Supercharging your Apache Spark Queries...
 
Understanding Query Plans and Spark UIs
Understanding Query Plans and Spark UIsUnderstanding Query Plans and Spark UIs
Understanding Query Plans and Spark UIs
 
Radical Speed for SQL Queries on Databricks: Photon Under the Hood
Radical Speed for SQL Queries on Databricks: Photon Under the HoodRadical Speed for SQL Queries on Databricks: Photon Under the Hood
Radical Speed for SQL Queries on Databricks: Photon Under the Hood
 
How to understand and analyze Apache Hive query execution plan for performanc...
How to understand and analyze Apache Hive query execution plan for performanc...How to understand and analyze Apache Hive query execution plan for performanc...
How to understand and analyze Apache Hive query execution plan for performanc...
 
How We Optimize Spark SQL Jobs With parallel and sync IO
How We Optimize Spark SQL Jobs With parallel and sync IOHow We Optimize Spark SQL Jobs With parallel and sync IO
How We Optimize Spark SQL Jobs With parallel and sync IO
 
Parquet performance tuning: the missing guide
Parquet performance tuning: the missing guideParquet performance tuning: the missing guide
Parquet performance tuning: the missing guide
 
Autoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive ModeAutoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive Mode
 
Presto on Apache Spark: A Tale of Two Computation Engines
Presto on Apache Spark: A Tale of Two Computation EnginesPresto on Apache Spark: A Tale of Two Computation Engines
Presto on Apache Spark: A Tale of Two Computation Engines
 
Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...
Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...
Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...
 
Hive + Tez: A Performance Deep Dive
Hive + Tez: A Performance Deep DiveHive + Tez: A Performance Deep Dive
Hive + Tez: A Performance Deep Dive
 
Flink vs. Spark
Flink vs. SparkFlink vs. Spark
Flink vs. Spark
 
Optimizing Apache Spark SQL Joins
Optimizing Apache Spark SQL JoinsOptimizing Apache Spark SQL Joins
Optimizing Apache Spark SQL Joins
 
Apache Iceberg: An Architectural Look Under the Covers
Apache Iceberg: An Architectural Look Under the CoversApache Iceberg: An Architectural Look Under the Covers
Apache Iceberg: An Architectural Look Under the Covers
 
Adaptive Query Execution: Speeding Up Spark SQL at Runtime
Adaptive Query Execution: Speeding Up Spark SQL at RuntimeAdaptive Query Execution: Speeding Up Spark SQL at Runtime
Adaptive Query Execution: Speeding Up Spark SQL at Runtime
 
Redis + Structured Streaming—A Perfect Combination to Scale-Out Your Continuo...
Redis + Structured Streaming—A Perfect Combination to Scale-Out Your Continuo...Redis + Structured Streaming—A Perfect Combination to Scale-Out Your Continuo...
Redis + Structured Streaming—A Perfect Combination to Scale-Out Your Continuo...
 
PLPgSqL- Datatypes, Language structure.pptx
PLPgSqL- Datatypes, Language structure.pptxPLPgSqL- Datatypes, Language structure.pptx
PLPgSqL- Datatypes, Language structure.pptx
 
Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J...
Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J...Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J...
Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J...
 
Introduction to Spark Internals
Introduction to Spark InternalsIntroduction to Spark Internals
Introduction to Spark Internals
 

More from Databricks

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDatabricks
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Databricks
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Databricks
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Databricks
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Databricks
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of HadoopDatabricks
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDatabricks
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceDatabricks
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringDatabricks
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixDatabricks
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationDatabricks
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchDatabricks
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesDatabricks
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesDatabricks
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsDatabricks
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkDatabricks
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkDatabricks
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesDatabricks
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkDatabricks
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeDatabricks
 

More from Databricks (20)

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized Platform
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data Science
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML Monitoring
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI Integration
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorch
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on Kubernetes
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature Aggregations
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and Spark
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction Queries
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache Spark
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta Lake
 

Recently uploaded

20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdfHuman37
 
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...Amil Baba Dawood bangali
 
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degreeyuu sss
 
Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Seán Kennedy
 
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理e4aez8ss
 
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default  Presentation : Data Analysis Project PPTPredictive Analysis for Loan Default  Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPTBoston Institute of Analytics
 
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)jennyeacort
 
Thiophen Mechanism khhjjjjjjjhhhhhhhhhhh
Thiophen Mechanism khhjjjjjjjhhhhhhhhhhhThiophen Mechanism khhjjjjjjjhhhhhhhhhhh
Thiophen Mechanism khhjjjjjjjhhhhhhhhhhhYasamin16
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort servicejennyeacort
 
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...Biometric Authentication: The Evolution, Applications, Benefits and Challenge...
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...GQ Research
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFAAndrei Kaleshka
 
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Boston Institute of Analytics
 
Advanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsAdvanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsVICTOR MAESTRE RAMIREZ
 
Top 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In QueensTop 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In Queensdataanalyticsqueen03
 
Real-Time AI Streaming - AI Max Princeton
Real-Time AI  Streaming - AI Max PrincetonReal-Time AI  Streaming - AI Max Princeton
Real-Time AI Streaming - AI Max PrincetonTimothy Spann
 
modul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptxmodul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptxaleedritatuxx
 
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...limedy534
 
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...Boston Institute of Analytics
 
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...Boston Institute of Analytics
 
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degreeyuu sss
 

Recently uploaded (20)

20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf
 
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
 
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
 
Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...
 
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
 
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default  Presentation : Data Analysis Project PPTPredictive Analysis for Loan Default  Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
 
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
 
Thiophen Mechanism khhjjjjjjjhhhhhhhhhhh
Thiophen Mechanism khhjjjjjjjhhhhhhhhhhhThiophen Mechanism khhjjjjjjjhhhhhhhhhhh
Thiophen Mechanism khhjjjjjjjhhhhhhhhhhh
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
 
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...Biometric Authentication: The Evolution, Applications, Benefits and Challenge...
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFA
 
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
 
Advanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsAdvanced Machine Learning for Business Professionals
Advanced Machine Learning for Business Professionals
 
Top 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In QueensTop 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In Queens
 
Real-Time AI Streaming - AI Max Princeton
Real-Time AI  Streaming - AI Max PrincetonReal-Time AI  Streaming - AI Max Princeton
Real-Time AI Streaming - AI Max Princeton
 
modul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptxmodul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptx
 
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
 
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
 
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
 
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
 

Hive Bucketing in Apache Spark with Tejas Patil