SlideShare a Scribd company logo
1 of 40
Download to read offline
V5	
  
The	
  Evolu*on	
  of	
  a	
  Rela*onal	
  Database	
  Layer	
  over	
  HBase	
  
@ApachePhoenix	
  
h0p://phoenix.apache.org	
  
2015-­‐09-­‐29	
  
Nick	
  Dimiduk	
  (@xefyr)	
  
h0p://n10k.com	
  
#apachebigdata	
  
Agenda	
  
o  What	
  is	
  Apache	
  Phoenix?	
  
o  State	
  of	
  the	
  Union	
  
o  Latest	
  Releases	
  
o  A	
  brief	
  aside	
  on	
  TransacQons	
  
o  What’s	
  Next?	
  
o  Q	
  &	
  A	
  
WHAT	
  IS	
  APACHE	
  PHOENIX	
  
PuUng	
  the	
  SQL	
  back	
  into	
  NoSQL	
  
What	
  is	
  Apache	
  Phoenix?	
  
o  A	
  relaQonal	
  database	
  layer	
  for	
  Apache	
  HBase	
  
o  Query	
  engine	
  
o  Transforms	
  SQL	
  queries	
  into	
  naQve	
  HBase	
  API	
  calls	
  
o  Pushes	
   as	
   much	
   work	
   as	
   possible	
   onto	
   the	
   cluster	
   for	
   parallel	
  
execuQon	
  
o  Metadata	
  repository	
  
o  Typed	
  access	
  to	
  data	
  stored	
  in	
  HBase	
  tables	
  
o  A	
  JDBC	
  driver	
  
o  A	
  top	
  level	
  Apache	
  So_ware	
  FoundaQon	
  project	
  
o  Originally	
  developed	
  at	
  Salesforce	
  
o  A	
  growing	
  community	
  with	
  momentum	
  
What	
  is	
  Apache	
  Phoenix?	
  
==	
  
==	
  
Where	
  Does	
  Phoenix	
  Fit	
  In?	
  Sqoop	
  	
  
RDB	
  Data	
  Collector	
  
Flume	
  	
  
Log	
  Data	
  Collector	
  
Zookeeper	
  
CoordinaQon	
  
YARN	
  /	
  MRv2	
  
Cluster	
  Resource	
  
Manager	
  /	
  
MapReduce	
  
HDFS	
  
Hadoop	
  Distributed	
  File	
  System	
  
GraphX	
  
Graph	
  analysis	
  
framework	
  
Phoenix	
  
Query	
  execuQon	
  engine	
  
HBase	
  
Distributed	
  Database	
  
The	
  Java	
  Virtual	
  Machine	
  
Hadoop	
  
Common	
  JNI	
  
Spark	
  
IteraQve	
  In-­‐Memory	
  
ComputaQon	
  
MLLib	
  
Data	
  mining	
  
Pig	
  
Data	
  ManipulaQon	
  
Hive	
  
Structured	
  Query	
  
Phoenix	
  
JDBC	
  client	
  
Phoenix	
  Puts	
  the	
  SQL	
  Back	
  in	
  NoSQL	
  
o  Familiar	
  SQL	
  interface	
  for	
  (DDL,	
  DML,	
  etc.)	
  
o Read	
  only	
  views	
  on	
  exisQng	
  HBase	
  data	
  
o Dynamic	
  columns	
  extend	
  schema	
  at	
  runQme	
  
o Flashback	
  queries	
  using	
  versioned	
  schema	
  
o  Data	
  types	
  
o  Secondary	
  indexes	
  
o  Joins	
  
o  Query	
  planning	
  and	
  opQmizaQon	
  
o  MulQ-­‐tenancy	
  
Phoenix	
  Puts	
  the	
  SQL	
  Back	
  in	
  NoSQL	
  
Accessing	
  HBase	
  data	
  with	
  Phoenix	
  can	
  be	
  substanQally	
  easier	
  than	
  direct	
  
HBase	
  API	
  use	
  
	
  
	
  
Versus	
  
	
  
HTable t = new HTable(“foo”);
RegionScanner s = t.getScanner(new Scan(…,
new ValueFilter(CompareOp.GT,
new CustomTypedComparator(30)), …));
while ((Result r = s.next()) != null) {
// blah blah blah Java Java Java
}
s.close();
t.close();
SELECT * FROM foo WHERE bar > 30
HTable t = new HTable(“foo”);
RegionScanner s = t.getScanner(new Scan(…,
new ValueFilter(CompareOp.GT,
new CustomTypedComparator(30)), …));
while ((Result r = s.next()) != null) {
// blah blah blah Java Java Java
}
s.close();
t.close();
Your	
  BI	
  tool	
  
probably	
  can’t	
  do	
  
this	
  
(And	
  we	
  didn’t	
  include	
  error	
  handling…)	
  
STATE	
  OF	
  THE	
  UNION	
  
So	
  how’s	
  things?	
  
And	
  more	
  ….	
  
State	
  of	
  the	
  Union	
  
o  SQL	
  Support:	
  Broad	
  enough	
  to	
  run	
  TPC	
  queries	
  
o  Joins,	
  Sub-­‐queries,	
  Derived	
  tables,	
  etc.	
  
o  Secondary	
  Indices:	
  Three	
  different	
  strategies	
  
o  Immutable	
  for	
  write-­‐once/append	
  only	
  data	
  	
  
o  Global	
  for	
  read-­‐heavy	
  mutable	
  data	
  
o  Local	
  for	
  write-­‐heavy	
  mutable	
  or	
  immutable	
  data	
  
o  StaQsQcs	
  driven	
  parallel	
  execuQon	
  
o  Monitoring	
  &	
  Management	
  
o  Tracing,	
  metrics	
  
Join	
  and	
  Subquery	
  Support
o Grammar	
  
o inner	
  join;	
  le_/right/full	
  outer	
  join;	
  cross	
  join	
  
o AddiQonal	
  
o semi	
  join;	
  anQ	
  join	
  
o Algorithms	
  
o hash-­‐join;	
  sort-­‐merge-­‐join	
  
o OpQmizaQons	
  
o Predicate	
  push-­‐down;	
  FK-­‐to-­‐PK	
  join	
  opQmizaQon;	
  
Global	
  index	
  with	
  missing	
  data	
  columns;	
  Correlated	
  
subquery	
  rewrite	
  
TPC	
  Example	
  1	
  
Small-­‐QuanQty-­‐Order	
  Revenue	
  Query	
  (Q17)
select sum(l_extendedprice) / 7.0 as avg_yearly
from lineitem, part
where p_partkey = l_partkey
and p_brand = '[B]'
and p_container = '[C]'
and l_quantity < (
select 0.2 * avg(l_quantity)
from lineitem
where l_partkey = p_partkey
);
CLIENT 4-WAY FULL SCAN OVER lineitem
PARALLEL INNER JOIN TABLE 0
CLIENT 1-WAY FULL SCAN OVER part
SERVER FILTER BY p_partkey = ‘[B]’ AND p_container = ‘[C]’
PARALLEL INNER JOIN TABLE 1
CLIENT 4-WAY FULL SCAN OVER lineitem
SERVER AGGREGATE INTO DISTINCT ROWS BY l_partkey
AFTER-JOIN SERVER FILTER BY l_quantity < $0
TPC	
  Example	
  2	
  
Order	
  Priority	
  Checking	
  Query	
  (Q4)
select o_orderpriority, count(*) as order_count
from orders
where o_orderdate >= date '[D]'
and o_orderdate < date '[D]' + interval '3' month
and exists (
select * from lineitem
where l_orderkey = o_orderkey and l_commitdate < l_receiptdate
)
group by o_orderpriority
order by o_orderpriority;
CLIENT 4-WAY FULL SCAN OVER orders
SERVER FILTER o_orderdate >= ‘[D]’ AND o_orderdate < ‘[D]’ + 3(d)
SERVER AGGREGATE INTO ORDERED DISTINCT ROWS BY o_orderpriority
CLIENT MERGE SORT
SKIP-SCAN JOIN TABLE 0
CLIENT 4-WAY FULL SCAN OVER lineitem
SERVER FILTER BY l_commitdate < l_receiptdate
DYNAMIC SERVER FILTER BY o_orderkey IN l_orderkey
Join	
  support	
  -­‐	
  what	
  can't	
  we	
  do?
o Nested	
  Loop	
  Join	
  
o StaQsQcs	
  Guided	
  Join	
  Algorithm	
  
o Smartly	
  choose	
  the	
  smaller	
  table	
  for	
  the	
  build	
  side	
  
o Smartly	
  switch	
  between	
  hash-­‐join	
  and	
  sort-­‐merge-­‐join	
  
o Smartly	
  turn	
  on/off	
  FK-­‐to-­‐PK	
  join	
  opQmizaQon
LATEST	
  RELEASES	
  
The	
  new	
  goodness,	
  from	
  our	
  community	
  to	
  your	
  datacenter!	
  
Phoenix	
  4.4	
  
o  HBase	
  1.0	
  Support	
  
o  Func*onal	
  Indexes	
  
o  User	
  Defined	
  Func*ons	
  
o  Query	
  Server	
  with	
  Thin	
  Driver	
  
o  Union	
  All	
  support	
  
o  TesQng	
  at	
  scale	
  with	
  Pherf	
  
o  MR	
  index	
  build	
  
o  Spark	
  integraQon	
  
o  Date	
  built-­‐in	
  funcQons	
  –	
  WEEK,	
  DAYOFMONTH,	
  etc.	
  
Phoenix	
  4.4:	
  FuncQonal	
  Indexes	
  
o CreaQng	
  an	
  index	
  on	
  an	
  expression	
  as	
  
opposed	
  to	
  just	
  a	
  column	
  value.	
  
o Example	
  without	
  index,	
  full	
  table	
  scan:	
  
SELECT	
  AVG(response_*me)	
  FROM	
  SERVER_METRICS	
  
WHERE	
  DAYOFMONTH(create_*me)	
  =	
  1	
  
o Add	
  funcQonal	
  index,	
  range	
  scan:	
  
CREATE	
  INDEX	
  day_of_month_idx	
  
ON	
  SERVER_METRICS	
  (DAYOFMONTH(create_*me))	
  
INCLUDE	
  (response_*me)	
  
Phoenix	
  4.4:	
  User	
  Defined	
  FuncQons	
  
o  Extension	
  points	
  to	
  Phoenix	
  for	
  domain-­‐specific	
  
funcQons.	
  
o  For	
  example,	
  a	
  geo-­‐locaQon	
  applicaQon	
  might	
  load	
  a	
  
set	
  of	
  UDFs	
  like	
  this:	
  
CREATE	
  FUNCTION	
  WOEID_DISTANCE(INTEGER,INTEGER)	
  	
  
RETURNS	
  INTEGER	
  AS	
  ‘org.apache.geo.woeidDistance’	
  
USING	
  JAR	
  ‘/lib/geo/geoloc.jar’	
  
o  Querying,	
  funcQonal	
  indexing,	
  etc.	
  then	
  possible:	
  
SELECT	
  *	
  FROM	
  woeid	
  a	
  JOIN	
  woeid	
  b	
  ON	
  a.country	
  =	
  b.country	
  
WHERE	
  woeid_distance(a.ID,b.ID)	
  <	
  5	
  
Phoenix	
  4.4:	
  Query	
  Server	
  +	
  Thin	
  Driver	
  
o  Offloads	
  query	
  planning	
  and	
  execuQon	
  to	
  different	
  
server(s)	
  
o  Minimizes	
  client	
  dependencies	
  
o  Enabler	
  for	
  ODBC	
  driver	
  (work	
  in	
  progress)	
  
o  Connect	
  like	
  this	
  instead:	
  
Connection conn = DriverManager.getConnection(
"jdbc:phoenix:thin:url=http://localhost:8765");
o  SQll	
  evolving:	
  no	
  backward	
  compa*bility	
  guarantees	
  yet	
  
	
  
h0p://phoenix.apache.org/server.html	
  
Phoenix	
  4.4	
  
o  HBase	
  1.0	
  Support	
  
o  FuncQonal	
  Indexes	
  
o  User	
  Defined	
  FuncQons	
  
o  Query	
  Server	
  with	
  Thin	
  Driver	
  
o  Union	
  All	
  support	
  
o  Tes*ng	
  at	
  scale	
  with	
  Pherf	
  
o  MR	
  index	
  build	
  
o  Spark	
  integra*on	
  
o  Date	
  built-­‐in	
  func*ons	
  –	
  WEEK,	
  DAYOFMONTH,	
  &c.	
  
	
  
Phoenix	
  4.5	
  
o  HBase	
  1.1	
  Support	
  
o  Client-­‐side	
  per-­‐statement	
  metrics	
  
o  SELECT	
  without	
  FROM	
  
o  Now	
  possible	
  to	
  alter	
  tables	
  with	
  views	
  
o  Math	
  built-­‐in	
  funcQons	
  –	
  SIGN,	
  SQRT,	
  CBRT,	
  EXP,	
  &c.	
  
o  Array	
  built-­‐in	
  funcQons	
  –	
  ARRAY_CAT,	
  ARRAY_FILL,	
  
&c.	
  
o  UDF,	
  View,	
  FuncQonal	
  Index	
  enhancements	
  
o  Pherf	
  and	
  Performance	
  enhancements	
  
TRANSACTIONS	
  
I	
  was	
  told	
  there	
  would	
  be…	
  
TransacQons	
  (Work	
  in	
  Progress)	
  
o  Snapshot	
  isolaQon	
  model	
  
o  Using	
  Tephra	
  (h0p://tephra.io/)	
  
o  Supports	
  REPEATABLE_READ	
  isolaQon	
  level	
  
o  Allows	
  reading	
  your	
  own	
  uncommi0ed	
  data	
  
o  OpQonal	
  
o  Enabled	
  on	
  a	
  table	
  by	
  table	
  basis	
  
o  No	
  performance	
  penalty	
  when	
  not	
  used	
  
o  Work	
  in	
  progress,	
  but	
  close	
  to	
  release	
  
o  Try	
  our	
  txn	
  branch	
  
o  Will	
  be	
  available	
  in	
  next	
  release	
  
OpQmisQc	
  Concurrency	
  Control	
  
•  Avoids cost of locking rows and tables
•  No deadlocks or lock escalations
•  Cost of conflict detection and possible rollback is higher
•  Good if conflicts are rare: short transaction, disjoint
partitioning of work
•  Conflict detection not always necessary: write-once/
append-only data
Tephra	
  Architecture	
  
ZooKeeper	

Tx Manager	

(standby)	

HBase	

Master 1	

Master 2 	

	

RS 1	

RS 2	

 RS 4	

RS 3	

Client 1	

Client 2	

Client N	

Tx Manager	

(active)
time out
try abort
failed
roll back
in HBase
write
to
HBase
do work
Client Tx Manager
none
complete V
abortsucceeded
in progress
start tx
start
start tx
commit
try commit check conflicts
invalid X
invalidate
failed
TransacQon	
  Lifecycle	
  
Tephra	
  Architecture	
  
•  TransactionAware client!
•  Coordinates transaction lifecycle with manager
•  Communicates directly with HBase for reads and writes
•  Transaction Manager
•  Assigns transaction IDs
•  Maintains state on in-progress, committed and invalid transactions!
•  Transaction Processor coprocessor
•  Applies server-side filtering for reads
•  Cleans up data from failed transactions, and no longer visible versions
WHAT’S	
  NEXT?	
  
A	
  brave	
  new	
  world	
  
What’s	
  Next?	
  
o Is	
  Phoenix	
  done?	
  
o What	
  about	
  the	
  Big	
  Picture?	
  
o How	
  can	
  Phoenix	
  be	
  leveraged	
  in	
  the	
  larger	
  
ecosystem?	
  
o Hive,	
  Pig,	
  MR	
  integraQon	
  with	
  Phoenix	
  today,	
  but	
  
it’s	
  not	
  a	
  great	
  story	
  
What’s	
  Next?	
  
You	
  are	
  here	
  
Moving	
  to	
  Apache	
  Calcite	
  
o  Query	
  parser,	
  compiler,	
  and	
  planner	
  framework	
  
o SQL-­‐92	
  compliant	
  (ever	
  argue	
  SQL	
  with	
  Julian?	
  :-­‐)	
  )	
  
o Enables	
  Phoenix	
  to	
  get	
  missing	
  SQL	
  support	
  
o  Pluggable	
  cost-­‐based	
  opQmizer	
  framework	
  
o Sane	
  way	
  to	
  model	
  push	
  down	
  through	
  rules	
  
o  Interop	
  with	
  other	
  Calcite	
  adaptors	
  
o Not	
  for	
  free,	
  but	
  it	
  becomes	
  feasible	
  
o Already	
  used	
  by	
  Drill,	
  Hive,	
  Kylin,	
  Samza	
  
o Supports	
  any	
  JDBC	
  source	
  (i.e.	
  RDBMS	
  -­‐	
  remember	
  
them	
  :-­‐)	
  )	
  
o One	
  cost-­‐model	
  to	
  rule	
  them	
  all	
  
How	
  does	
  Phoenix	
  plug	
  in?	
  
Calcite	
  Parser	
  &	
  Validator
Calcite	
  Query	
  OpQmizer
Phoenix	
  Query	
  Plan	
  Generator
Phoenix	
  RunQme
Phoenix	
  Tables	
  over	
  HBase	
  
JDBC	
  Client
SQL	
  +	
  Phoenix	
  
specific	
  
grammar
Built-­‐in	
  rules	
  
+	
  Phoenix	
  
specific	
  rules
OpQmizaQon	
  Rules	
  
o AggregateRemoveRule	
  
o FilterAggregateTransposeRule	
  
o FilterJoinRule	
  
o FilterMergeRule	
  
o JoinCommuteRule	
  
o PhoenixFilterScanMergeRule	
  
o PhoenixJoinSingleAggregateMergeRule	
  
o …	
  
Query	
  Example	
  	
  
(filter	
  push-­‐down	
  and	
  smart	
  join	
  algorithm)	
  
LogicalFilter	
  
filter:	
  $0	
  =	
  ‘x’
LogicalJoin	
  
type:	
  inner	
  
cond:	
  $3	
  =	
  $7
LogicalProject	
  
projects:	
  $0,	
  $5
LogicalTableScan	
  
table:	
  A
LogicalTableScan	
  
table:	
  B
PhoenixTableScan	
  
table:	
  ‘a’	
  
filter:	
  $0	
  =	
  ‘x’	
  
PhoenixServerJoin	
  
type:	
  inner	
  
cond:	
  $3	
  =	
  $1
PhoenixServerProject	
  
projects:	
  $2,	
  $0
OpQmizer	
  
(with	
  
RelOptRules	
  &	
  
ConvertRules)	
  
PhoenixTableScan	
  
table:	
  ‘b’	
  
PhoenixServerProject	
  
projects:	
  $0,	
  $2
PhoenixServerProject	
  
projects:	
  $0,	
  $3
Query	
  Example	
  
(filter	
  push-­‐down	
  and	
  smart	
  join	
  algorithm)
ScanPlan	
  
table:	
  ‘a’	
  
skip-­‐scan:	
  pk0	
  =	
  ‘x’	
  
projects:	
  pk0,	
  c3
HashJoinPlan	
  
types	
  {inner}
join-­‐keys:	
  {$1}
projects:	
  $2,	
  $0
Build	
  
hash-­‐key:	
  $1
Phoenix	
  
Implementor	
  
PhoenixTableScan	
  
table:	
  ‘a’	
  
filter:	
  $0	
  =	
  ‘x’	
  
PhoenixServerJoin	
  
type:	
  inner	
  
cond:	
  $3	
  =	
  $1
PhoenixServerProject	
  
projects:	
  $2,	
  $0
PhoenixTableScan	
  
table:	
  ‘b’	
  
PhoenixServerProject	
  
projects:	
  $0,	
  $2
PhoenixServerProject	
  
projects:	
  $0,	
  $3
ScanPlan	
  
table:	
  ‘b’	
  
projects:	
  col0,	
  col2	
  
Probe
Interoperability	
  Example	
  
o Joining	
  data	
  from	
  Phoenix	
  and	
  mySQL	
  
EnumerableJoin
PhoenixTableScan JdbcTableScan
Phoenix	
  Tables	
  over	
  HBase	
   mySQL	
  Database	
  
PhoenixToEnumerable	
  
Converter
JdbcToEnumerable	
  
Converter
Query	
  Example	
  1
WITH	
  m	
  AS	
  	
  
	
  	
  	
  	
  (SELECT	
  *	
  	
  
	
  	
  	
  	
  	
  FROM	
  dept_manager	
  dm	
  	
  
	
  	
  	
  	
  	
  WHERE	
  from_date	
  =	
  	
  
	
  	
  	
  	
  	
  	
  	
  	
  	
  (SELECT	
  max(from_date)	
  	
  
	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  FROM	
  dept_manager	
  dm2	
  	
  
	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  WHERE	
  dm.dept_no	
  =	
  dm2.dept_no))	
  	
  
SELECT	
  m.dept_no,	
  d.dept_name,	
  e.first_name,	
  e.last_name	
  	
  
FROM	
  employees	
  e	
  	
  
JOIN	
  m	
  ON	
  e.emp_no	
  =	
  m.emp_no	
  
JOIN	
  departments	
  d	
  ON	
  d.dept_no	
  =	
  m.dept_no	
  
ORDER	
  BY	
  d.dept_no;
Query	
  Example	
  2	
  
SELECT	
  dept_no,	
  Qtle,	
  count(*)	
  	
  
FROM	
  Qtles	
  t	
  	
  
JOIN	
  dept_emp	
  de	
  ON	
  t.emp_no	
  =	
  de.emp_no	
  
WHERE	
  dept_no	
  <=	
  'd006'	
  
GROUP	
  BY	
  rollup(dept_no,	
  *tle)	
  	
  
ORDER	
  BY	
  dept_no,	
  Qtle;	
  
V5	
  
Thanks!	
  
@ApachePhoenix	
  
h0p://phoenix.apache.org	
  
2015-­‐09-­‐29	
  
Nick	
  Dimiduk	
  (@xefyr)	
  
h0p://n10k.com	
  
#apachebigdata	
  

More Related Content

What's hot

Apache Phoenix and Apache HBase: An Enterprise Grade Data Warehouse
Apache Phoenix and Apache HBase: An Enterprise Grade Data WarehouseApache Phoenix and Apache HBase: An Enterprise Grade Data Warehouse
Apache Phoenix and Apache HBase: An Enterprise Grade Data WarehouseJosh Elser
 
Large-Scale Stream Processing in the Hadoop Ecosystem
Large-Scale Stream Processing in the Hadoop Ecosystem Large-Scale Stream Processing in the Hadoop Ecosystem
Large-Scale Stream Processing in the Hadoop Ecosystem DataWorks Summit/Hadoop Summit
 
Five major tips to maximize performance on a 200+ SQL HBase/Phoenix cluster
Five major tips to maximize performance on a 200+ SQL HBase/Phoenix clusterFive major tips to maximize performance on a 200+ SQL HBase/Phoenix cluster
Five major tips to maximize performance on a 200+ SQL HBase/Phoenix clustermas4share
 
HBaseCon 2015: Apache Phoenix - The Evolution of a Relational Database Layer ...
HBaseCon 2015: Apache Phoenix - The Evolution of a Relational Database Layer ...HBaseCon 2015: Apache Phoenix - The Evolution of a Relational Database Layer ...
HBaseCon 2015: Apache Phoenix - The Evolution of a Relational Database Layer ...HBaseCon
 
Query Compilation in Impala
Query Compilation in ImpalaQuery Compilation in Impala
Query Compilation in ImpalaCloudera, Inc.
 
Large-Scale Stream Processing in the Hadoop Ecosystem - Hadoop Summit 2016
Large-Scale Stream Processing in the Hadoop Ecosystem - Hadoop Summit 2016Large-Scale Stream Processing in the Hadoop Ecosystem - Hadoop Summit 2016
Large-Scale Stream Processing in the Hadoop Ecosystem - Hadoop Summit 2016Gyula Fóra
 
Phoenix: How (and why) we put the SQL back into the NoSQL
Phoenix: How (and why) we put the SQL back into the NoSQLPhoenix: How (and why) we put the SQL back into the NoSQL
Phoenix: How (and why) we put the SQL back into the NoSQLDataWorks Summit
 
HBaseCon 2012 | Mignify: A Big Data Refinery Built on HBase - Internet Memory...
HBaseCon 2012 | Mignify: A Big Data Refinery Built on HBase - Internet Memory...HBaseCon 2012 | Mignify: A Big Data Refinery Built on HBase - Internet Memory...
HBaseCon 2012 | Mignify: A Big Data Refinery Built on HBase - Internet Memory...Cloudera, Inc.
 
Querying the Internet of Things: Streaming SQL on Kafka/Samza and Storm/Trident
Querying the Internet of Things: Streaming SQL on Kafka/Samza and Storm/TridentQuerying the Internet of Things: Streaming SQL on Kafka/Samza and Storm/Trident
Querying the Internet of Things: Streaming SQL on Kafka/Samza and Storm/TridentDataWorks Summit/Hadoop Summit
 
Local Secondary Indexes in Apache Phoenix
Local Secondary Indexes in Apache PhoenixLocal Secondary Indexes in Apache Phoenix
Local Secondary Indexes in Apache PhoenixRajeshbabu Chintaguntla
 
Major advancements in Apache Hive towards full support of SQL compliance
Major advancements in Apache Hive towards full support of SQL complianceMajor advancements in Apache Hive towards full support of SQL compliance
Major advancements in Apache Hive towards full support of SQL complianceDataWorks Summit/Hadoop Summit
 
Dancing with the elephant h base1_final
Dancing with the elephant   h base1_finalDancing with the elephant   h base1_final
Dancing with the elephant h base1_finalasterix_smartplatf
 
Real-time Stream Processing with Apache Flink @ Hadoop Summit
Real-time Stream Processing with Apache Flink @ Hadoop SummitReal-time Stream Processing with Apache Flink @ Hadoop Summit
Real-time Stream Processing with Apache Flink @ Hadoop SummitGyula Fóra
 
Using Morphlines for On-the-Fly ETL
Using Morphlines for On-the-Fly ETLUsing Morphlines for On-the-Fly ETL
Using Morphlines for On-the-Fly ETLCloudera, Inc.
 
Speed Up Your Queries with Hive LLAP Engine on Hadoop or in the Cloud
Speed Up Your Queries with Hive LLAP Engine on Hadoop or in the CloudSpeed Up Your Queries with Hive LLAP Engine on Hadoop or in the Cloud
Speed Up Your Queries with Hive LLAP Engine on Hadoop or in the Cloudgluent.
 

What's hot (20)

April 2014 HUG : Apache Phoenix
April 2014 HUG : Apache PhoenixApril 2014 HUG : Apache Phoenix
April 2014 HUG : Apache Phoenix
 
Apache phoenix
Apache phoenixApache phoenix
Apache phoenix
 
Apache Phoenix and Apache HBase: An Enterprise Grade Data Warehouse
Apache Phoenix and Apache HBase: An Enterprise Grade Data WarehouseApache Phoenix and Apache HBase: An Enterprise Grade Data Warehouse
Apache Phoenix and Apache HBase: An Enterprise Grade Data Warehouse
 
Large-Scale Stream Processing in the Hadoop Ecosystem
Large-Scale Stream Processing in the Hadoop Ecosystem Large-Scale Stream Processing in the Hadoop Ecosystem
Large-Scale Stream Processing in the Hadoop Ecosystem
 
Five major tips to maximize performance on a 200+ SQL HBase/Phoenix cluster
Five major tips to maximize performance on a 200+ SQL HBase/Phoenix clusterFive major tips to maximize performance on a 200+ SQL HBase/Phoenix cluster
Five major tips to maximize performance on a 200+ SQL HBase/Phoenix cluster
 
HBaseCon 2015: Apache Phoenix - The Evolution of a Relational Database Layer ...
HBaseCon 2015: Apache Phoenix - The Evolution of a Relational Database Layer ...HBaseCon 2015: Apache Phoenix - The Evolution of a Relational Database Layer ...
HBaseCon 2015: Apache Phoenix - The Evolution of a Relational Database Layer ...
 
Query Compilation in Impala
Query Compilation in ImpalaQuery Compilation in Impala
Query Compilation in Impala
 
Apache Phoenix + Apache HBase
Apache Phoenix + Apache HBaseApache Phoenix + Apache HBase
Apache Phoenix + Apache HBase
 
Large-Scale Stream Processing in the Hadoop Ecosystem - Hadoop Summit 2016
Large-Scale Stream Processing in the Hadoop Ecosystem - Hadoop Summit 2016Large-Scale Stream Processing in the Hadoop Ecosystem - Hadoop Summit 2016
Large-Scale Stream Processing in the Hadoop Ecosystem - Hadoop Summit 2016
 
Phoenix: How (and why) we put the SQL back into the NoSQL
Phoenix: How (and why) we put the SQL back into the NoSQLPhoenix: How (and why) we put the SQL back into the NoSQL
Phoenix: How (and why) we put the SQL back into the NoSQL
 
HBaseCon 2012 | Mignify: A Big Data Refinery Built on HBase - Internet Memory...
HBaseCon 2012 | Mignify: A Big Data Refinery Built on HBase - Internet Memory...HBaseCon 2012 | Mignify: A Big Data Refinery Built on HBase - Internet Memory...
HBaseCon 2012 | Mignify: A Big Data Refinery Built on HBase - Internet Memory...
 
Incredible Impala
Incredible Impala Incredible Impala
Incredible Impala
 
Querying the Internet of Things: Streaming SQL on Kafka/Samza and Storm/Trident
Querying the Internet of Things: Streaming SQL on Kafka/Samza and Storm/TridentQuerying the Internet of Things: Streaming SQL on Kafka/Samza and Storm/Trident
Querying the Internet of Things: Streaming SQL on Kafka/Samza and Storm/Trident
 
Apache HBase: State of the Union
Apache HBase: State of the UnionApache HBase: State of the Union
Apache HBase: State of the Union
 
Local Secondary Indexes in Apache Phoenix
Local Secondary Indexes in Apache PhoenixLocal Secondary Indexes in Apache Phoenix
Local Secondary Indexes in Apache Phoenix
 
Major advancements in Apache Hive towards full support of SQL compliance
Major advancements in Apache Hive towards full support of SQL complianceMajor advancements in Apache Hive towards full support of SQL compliance
Major advancements in Apache Hive towards full support of SQL compliance
 
Dancing with the elephant h base1_final
Dancing with the elephant   h base1_finalDancing with the elephant   h base1_final
Dancing with the elephant h base1_final
 
Real-time Stream Processing with Apache Flink @ Hadoop Summit
Real-time Stream Processing with Apache Flink @ Hadoop SummitReal-time Stream Processing with Apache Flink @ Hadoop Summit
Real-time Stream Processing with Apache Flink @ Hadoop Summit
 
Using Morphlines for On-the-Fly ETL
Using Morphlines for On-the-Fly ETLUsing Morphlines for On-the-Fly ETL
Using Morphlines for On-the-Fly ETL
 
Speed Up Your Queries with Hive LLAP Engine on Hadoop or in the Cloud
Speed Up Your Queries with Hive LLAP Engine on Hadoop or in the CloudSpeed Up Your Queries with Hive LLAP Engine on Hadoop or in the Cloud
Speed Up Your Queries with Hive LLAP Engine on Hadoop or in the Cloud
 

Viewers also liked

IBM Message Hub service in Bluemix - Apache Kafka in a public cloud
IBM Message Hub service in Bluemix - Apache Kafka in a public cloudIBM Message Hub service in Bluemix - Apache Kafka in a public cloud
IBM Message Hub service in Bluemix - Apache Kafka in a public cloudAndrew Schofield
 
Being Ready for Apache Kafka - Apache: Big Data Europe 2015
Being Ready for Apache Kafka - Apache: Big Data Europe 2015Being Ready for Apache Kafka - Apache: Big Data Europe 2015
Being Ready for Apache Kafka - Apache: Big Data Europe 2015Michael Noll
 
Introducing IBM Message Hub: Cloud-scale messaging based on Apache Kafka
Introducing IBM Message Hub: Cloud-scale messaging based on Apache KafkaIntroducing IBM Message Hub: Cloud-scale messaging based on Apache Kafka
Introducing IBM Message Hub: Cloud-scale messaging based on Apache KafkaAndrew Schofield
 
Architecture of Flink's Streaming Runtime @ ApacheCon EU 2015
Architecture of Flink's Streaming Runtime @ ApacheCon EU 2015Architecture of Flink's Streaming Runtime @ ApacheCon EU 2015
Architecture of Flink's Streaming Runtime @ ApacheCon EU 2015Robert Metzger
 
IBM Message Hub: Cloud-Native Messaging
IBM Message Hub: Cloud-Native MessagingIBM Message Hub: Cloud-Native Messaging
IBM Message Hub: Cloud-Native MessagingAndrew Schofield
 
Apache Kylin @ Big Data Europe 2015
Apache Kylin @ Big Data Europe 2015Apache Kylin @ Big Data Europe 2015
Apache Kylin @ Big Data Europe 2015Seshu Adunuthula
 
Fraud Detection in Real-time @ Apache Big Data Con
Fraud Detection in Real-time @ Apache Big Data ConFraud Detection in Real-time @ Apache Big Data Con
Fraud Detection in Real-time @ Apache Big Data ConSeshika Fernando
 
Effectively Managing a Hybrid Messaging Environment
Effectively Managing a Hybrid Messaging EnvironmentEffectively Managing a Hybrid Messaging Environment
Effectively Managing a Hybrid Messaging EnvironmentAndrew Schofield
 
Hortonworks Technical Workshop: HBase and Apache Phoenix
Hortonworks Technical Workshop: HBase and Apache Phoenix Hortonworks Technical Workshop: HBase and Apache Phoenix
Hortonworks Technical Workshop: HBase and Apache Phoenix Hortonworks
 
Hybrid Messaging with IBM Bluemix
Hybrid Messaging with IBM BluemixHybrid Messaging with IBM Bluemix
Hybrid Messaging with IBM Bluemixmatthew1001
 
Geospatial querying in Apache Marmotta - ApacheCon Big Data Europe 2015
Geospatial querying in Apache Marmotta - ApacheCon Big Data Europe 2015Geospatial querying in Apache Marmotta - ApacheCon Big Data Europe 2015
Geospatial querying in Apache Marmotta - ApacheCon Big Data Europe 2015Sergio Fernández
 
HBase + Hue - LA HBase User Group
HBase + Hue - LA HBase User GroupHBase + Hue - LA HBase User Group
HBase + Hue - LA HBase User Groupgethue
 
Hue: The Hadoop UI - Hadoop Singapore
Hue: The Hadoop UI - Hadoop SingaporeHue: The Hadoop UI - Hadoop Singapore
Hue: The Hadoop UI - Hadoop Singaporegethue
 
August 2013 HUG: Hue: the UI for Apache Hadoop
August 2013 HUG: Hue: the UI for Apache HadoopAugust 2013 HUG: Hue: the UI for Apache Hadoop
August 2013 HUG: Hue: the UI for Apache HadoopYahoo Developer Network
 
HBase Data Types (WIP)
HBase Data Types (WIP)HBase Data Types (WIP)
HBase Data Types (WIP)Nick Dimiduk
 
HBase Low Latency, StrataNYC 2014
HBase Low Latency, StrataNYC 2014HBase Low Latency, StrataNYC 2014
HBase Low Latency, StrataNYC 2014Nick Dimiduk
 
Bring Cartography to the Cloud
Bring Cartography to the CloudBring Cartography to the Cloud
Bring Cartography to the CloudNick Dimiduk
 
The Many Faces of Apache Kafka: Leveraging real-time data at scale
The Many Faces of Apache Kafka: Leveraging real-time data at scaleThe Many Faces of Apache Kafka: Leveraging real-time data at scale
The Many Faces of Apache Kafka: Leveraging real-time data at scaleNeha Narkhede
 
Connect 2017 DEV-1420 - Blue Mix and Domino – Complementing Smartcloud
Connect 2017 DEV-1420 - Blue Mix and Domino – Complementing SmartcloudConnect 2017 DEV-1420 - Blue Mix and Domino – Complementing Smartcloud
Connect 2017 DEV-1420 - Blue Mix and Domino – Complementing SmartcloudMatteo Bisi
 

Viewers also liked (20)

IBM Message Hub service in Bluemix - Apache Kafka in a public cloud
IBM Message Hub service in Bluemix - Apache Kafka in a public cloudIBM Message Hub service in Bluemix - Apache Kafka in a public cloud
IBM Message Hub service in Bluemix - Apache Kafka in a public cloud
 
Being Ready for Apache Kafka - Apache: Big Data Europe 2015
Being Ready for Apache Kafka - Apache: Big Data Europe 2015Being Ready for Apache Kafka - Apache: Big Data Europe 2015
Being Ready for Apache Kafka - Apache: Big Data Europe 2015
 
Introducing IBM Message Hub: Cloud-scale messaging based on Apache Kafka
Introducing IBM Message Hub: Cloud-scale messaging based on Apache KafkaIntroducing IBM Message Hub: Cloud-scale messaging based on Apache Kafka
Introducing IBM Message Hub: Cloud-scale messaging based on Apache Kafka
 
Architecture of Flink's Streaming Runtime @ ApacheCon EU 2015
Architecture of Flink's Streaming Runtime @ ApacheCon EU 2015Architecture of Flink's Streaming Runtime @ ApacheCon EU 2015
Architecture of Flink's Streaming Runtime @ ApacheCon EU 2015
 
IBM Message Hub: Cloud-Native Messaging
IBM Message Hub: Cloud-Native MessagingIBM Message Hub: Cloud-Native Messaging
IBM Message Hub: Cloud-Native Messaging
 
Apache Kylin @ Big Data Europe 2015
Apache Kylin @ Big Data Europe 2015Apache Kylin @ Big Data Europe 2015
Apache Kylin @ Big Data Europe 2015
 
Fraud Detection in Real-time @ Apache Big Data Con
Fraud Detection in Real-time @ Apache Big Data ConFraud Detection in Real-time @ Apache Big Data Con
Fraud Detection in Real-time @ Apache Big Data Con
 
Effectively Managing a Hybrid Messaging Environment
Effectively Managing a Hybrid Messaging EnvironmentEffectively Managing a Hybrid Messaging Environment
Effectively Managing a Hybrid Messaging Environment
 
Hortonworks Technical Workshop: HBase and Apache Phoenix
Hortonworks Technical Workshop: HBase and Apache Phoenix Hortonworks Technical Workshop: HBase and Apache Phoenix
Hortonworks Technical Workshop: HBase and Apache Phoenix
 
Hybrid Messaging with IBM Bluemix
Hybrid Messaging with IBM BluemixHybrid Messaging with IBM Bluemix
Hybrid Messaging with IBM Bluemix
 
Geospatial querying in Apache Marmotta - ApacheCon Big Data Europe 2015
Geospatial querying in Apache Marmotta - ApacheCon Big Data Europe 2015Geospatial querying in Apache Marmotta - ApacheCon Big Data Europe 2015
Geospatial querying in Apache Marmotta - ApacheCon Big Data Europe 2015
 
HBase + Hue - LA HBase User Group
HBase + Hue - LA HBase User GroupHBase + Hue - LA HBase User Group
HBase + Hue - LA HBase User Group
 
Hue: The Hadoop UI - Hadoop Singapore
Hue: The Hadoop UI - Hadoop SingaporeHue: The Hadoop UI - Hadoop Singapore
Hue: The Hadoop UI - Hadoop Singapore
 
August 2013 HUG: Hue: the UI for Apache Hadoop
August 2013 HUG: Hue: the UI for Apache HadoopAugust 2013 HUG: Hue: the UI for Apache Hadoop
August 2013 HUG: Hue: the UI for Apache Hadoop
 
HBase Data Types (WIP)
HBase Data Types (WIP)HBase Data Types (WIP)
HBase Data Types (WIP)
 
HBase Low Latency, StrataNYC 2014
HBase Low Latency, StrataNYC 2014HBase Low Latency, StrataNYC 2014
HBase Low Latency, StrataNYC 2014
 
Bring Cartography to the Cloud
Bring Cartography to the CloudBring Cartography to the Cloud
Bring Cartography to the Cloud
 
The Many Faces of Apache Kafka: Leveraging real-time data at scale
The Many Faces of Apache Kafka: Leveraging real-time data at scaleThe Many Faces of Apache Kafka: Leveraging real-time data at scale
The Many Faces of Apache Kafka: Leveraging real-time data at scale
 
Connect 2017 DEV-1420 - Blue Mix and Domino – Complementing Smartcloud
Connect 2017 DEV-1420 - Blue Mix and Domino – Complementing SmartcloudConnect 2017 DEV-1420 - Blue Mix and Domino – Complementing Smartcloud
Connect 2017 DEV-1420 - Blue Mix and Domino – Complementing Smartcloud
 
HBase Data Types
HBase Data TypesHBase Data Types
HBase Data Types
 

Similar to Apache Big Data EU 2015 - Phoenix

HBaseConAsia2018 Track2-4: HTAP DB-System: AsparaDB HBase, Phoenix, and Spark
HBaseConAsia2018 Track2-4: HTAP DB-System: AsparaDB HBase, Phoenix, and SparkHBaseConAsia2018 Track2-4: HTAP DB-System: AsparaDB HBase, Phoenix, and Spark
HBaseConAsia2018 Track2-4: HTAP DB-System: AsparaDB HBase, Phoenix, and SparkMichael Stack
 
Building Scalable Data Pipelines - 2016 DataPalooza Seattle
Building Scalable Data Pipelines - 2016 DataPalooza SeattleBuilding Scalable Data Pipelines - 2016 DataPalooza Seattle
Building Scalable Data Pipelines - 2016 DataPalooza SeattleEvan Chan
 
Omid: scalable and highly available transaction processing for Apache Phoenix
Omid: scalable and highly available transaction processing for Apache PhoenixOmid: scalable and highly available transaction processing for Apache Phoenix
Omid: scalable and highly available transaction processing for Apache PhoenixDataWorks Summit
 
HBaseCon2015-final
HBaseCon2015-finalHBaseCon2015-final
HBaseCon2015-finalMaryann Xue
 
Cascading on starfish
Cascading on starfishCascading on starfish
Cascading on starfishFei Dong
 
Programming the Network Data Plane
Programming the Network Data PlaneProgramming the Network Data Plane
Programming the Network Data PlaneC4Media
 
The life of a query (oracle edition)
The life of a query (oracle edition)The life of a query (oracle edition)
The life of a query (oracle edition)maclean liu
 
Hyperspace: An Indexing Subsystem for Apache Spark
Hyperspace: An Indexing Subsystem for Apache SparkHyperspace: An Indexing Subsystem for Apache Spark
Hyperspace: An Indexing Subsystem for Apache SparkDatabricks
 
Spark (Structured) Streaming vs. Kafka Streams
Spark (Structured) Streaming vs. Kafka StreamsSpark (Structured) Streaming vs. Kafka Streams
Spark (Structured) Streaming vs. Kafka StreamsGuido Schmutz
 
Omid: scalable and highly available transaction processing for Apache Phoenix
Omid: scalable and highly available transaction processing for Apache PhoenixOmid: scalable and highly available transaction processing for Apache Phoenix
Omid: scalable and highly available transaction processing for Apache PhoenixDataWorks Summit
 
GraphConnect 2014 SF: From Zero to Graph in 120: Scale
GraphConnect 2014 SF: From Zero to Graph in 120: ScaleGraphConnect 2014 SF: From Zero to Graph in 120: Scale
GraphConnect 2014 SF: From Zero to Graph in 120: ScaleNeo4j
 
Flink Forward SF 2017: Timo Walther - Table & SQL API – unified APIs for bat...
Flink Forward SF 2017: Timo Walther -  Table & SQL API – unified APIs for bat...Flink Forward SF 2017: Timo Walther -  Table & SQL API – unified APIs for bat...
Flink Forward SF 2017: Timo Walther - Table & SQL API – unified APIs for bat...Flink Forward
 
Web Scale Reasoning and the LarKC Project
Web Scale Reasoning and the LarKC ProjectWeb Scale Reasoning and the LarKC Project
Web Scale Reasoning and the LarKC ProjectSaltlux Inc.
 
Distributed & Highly Available server applications in Java and Scala
Distributed & Highly Available server applications in Java and ScalaDistributed & Highly Available server applications in Java and Scala
Distributed & Highly Available server applications in Java and ScalaMax Alexejev
 
Hadoop and HBase experiences in perf log project
Hadoop and HBase experiences in perf log projectHadoop and HBase experiences in perf log project
Hadoop and HBase experiences in perf log projectMao Geng
 
[Webinar Slides] Programming the Network Dataplane in P4
[Webinar Slides] Programming the Network Dataplane in P4[Webinar Slides] Programming the Network Dataplane in P4
[Webinar Slides] Programming the Network Dataplane in P4Open Networking Summits
 
Fast federated SQL with Apache Calcite
Fast federated SQL with Apache CalciteFast federated SQL with Apache Calcite
Fast federated SQL with Apache CalciteChris Baynes
 
1 extreme performance - part i
1   extreme performance - part i1   extreme performance - part i
1 extreme performance - part isqlserver.co.il
 

Similar to Apache Big Data EU 2015 - Phoenix (20)

HBaseConAsia2018 Track2-4: HTAP DB-System: AsparaDB HBase, Phoenix, and Spark
HBaseConAsia2018 Track2-4: HTAP DB-System: AsparaDB HBase, Phoenix, and SparkHBaseConAsia2018 Track2-4: HTAP DB-System: AsparaDB HBase, Phoenix, and Spark
HBaseConAsia2018 Track2-4: HTAP DB-System: AsparaDB HBase, Phoenix, and Spark
 
Building Scalable Data Pipelines - 2016 DataPalooza Seattle
Building Scalable Data Pipelines - 2016 DataPalooza SeattleBuilding Scalable Data Pipelines - 2016 DataPalooza Seattle
Building Scalable Data Pipelines - 2016 DataPalooza Seattle
 
Omid: scalable and highly available transaction processing for Apache Phoenix
Omid: scalable and highly available transaction processing for Apache PhoenixOmid: scalable and highly available transaction processing for Apache Phoenix
Omid: scalable and highly available transaction processing for Apache Phoenix
 
HBaseCon2015-final
HBaseCon2015-finalHBaseCon2015-final
HBaseCon2015-final
 
Cascading on starfish
Cascading on starfishCascading on starfish
Cascading on starfish
 
Programming the Network Data Plane
Programming the Network Data PlaneProgramming the Network Data Plane
Programming the Network Data Plane
 
The life of a query (oracle edition)
The life of a query (oracle edition)The life of a query (oracle edition)
The life of a query (oracle edition)
 
Hyperspace: An Indexing Subsystem for Apache Spark
Hyperspace: An Indexing Subsystem for Apache SparkHyperspace: An Indexing Subsystem for Apache Spark
Hyperspace: An Indexing Subsystem for Apache Spark
 
Java one2013
Java one2013Java one2013
Java one2013
 
Spark (Structured) Streaming vs. Kafka Streams
Spark (Structured) Streaming vs. Kafka StreamsSpark (Structured) Streaming vs. Kafka Streams
Spark (Structured) Streaming vs. Kafka Streams
 
Omid: scalable and highly available transaction processing for Apache Phoenix
Omid: scalable and highly available transaction processing for Apache PhoenixOmid: scalable and highly available transaction processing for Apache Phoenix
Omid: scalable and highly available transaction processing for Apache Phoenix
 
GraphConnect 2014 SF: From Zero to Graph in 120: Scale
GraphConnect 2014 SF: From Zero to Graph in 120: ScaleGraphConnect 2014 SF: From Zero to Graph in 120: Scale
GraphConnect 2014 SF: From Zero to Graph in 120: Scale
 
Flink Forward SF 2017: Timo Walther - Table & SQL API – unified APIs for bat...
Flink Forward SF 2017: Timo Walther -  Table & SQL API – unified APIs for bat...Flink Forward SF 2017: Timo Walther -  Table & SQL API – unified APIs for bat...
Flink Forward SF 2017: Timo Walther - Table & SQL API – unified APIs for bat...
 
Web Scale Reasoning and the LarKC Project
Web Scale Reasoning and the LarKC ProjectWeb Scale Reasoning and the LarKC Project
Web Scale Reasoning and the LarKC Project
 
Distributed & Highly Available server applications in Java and Scala
Distributed & Highly Available server applications in Java and ScalaDistributed & Highly Available server applications in Java and Scala
Distributed & Highly Available server applications in Java and Scala
 
Hadoop and HBase experiences in perf log project
Hadoop and HBase experiences in perf log projectHadoop and HBase experiences in perf log project
Hadoop and HBase experiences in perf log project
 
[Webinar Slides] Programming the Network Dataplane in P4
[Webinar Slides] Programming the Network Dataplane in P4[Webinar Slides] Programming the Network Dataplane in P4
[Webinar Slides] Programming the Network Dataplane in P4
 
Streaming SQL
Streaming SQLStreaming SQL
Streaming SQL
 
Fast federated SQL with Apache Calcite
Fast federated SQL with Apache CalciteFast federated SQL with Apache Calcite
Fast federated SQL with Apache Calcite
 
1 extreme performance - part i
1   extreme performance - part i1   extreme performance - part i
1 extreme performance - part i
 

More from Nick Dimiduk

Apache HBase 1.0 Release
Apache HBase 1.0 ReleaseApache HBase 1.0 Release
Apache HBase 1.0 ReleaseNick Dimiduk
 
HBase Blockcache 101
HBase Blockcache 101HBase Blockcache 101
HBase Blockcache 101Nick Dimiduk
 
Apache HBase Low Latency
Apache HBase Low LatencyApache HBase Low Latency
Apache HBase Low LatencyNick Dimiduk
 
Apache HBase for Architects
Apache HBase for ArchitectsApache HBase for Architects
Apache HBase for ArchitectsNick Dimiduk
 
HBase for Architects
HBase for ArchitectsHBase for Architects
HBase for ArchitectsNick Dimiduk
 
HBase Client APIs (for webapps?)
HBase Client APIs (for webapps?)HBase Client APIs (for webapps?)
HBase Client APIs (for webapps?)Nick Dimiduk
 
Pig, Making Hadoop Easy
Pig, Making Hadoop EasyPig, Making Hadoop Easy
Pig, Making Hadoop EasyNick Dimiduk
 
Introduction to Hadoop, HBase, and NoSQL
Introduction to Hadoop, HBase, and NoSQLIntroduction to Hadoop, HBase, and NoSQL
Introduction to Hadoop, HBase, and NoSQLNick Dimiduk
 

More from Nick Dimiduk (8)

Apache HBase 1.0 Release
Apache HBase 1.0 ReleaseApache HBase 1.0 Release
Apache HBase 1.0 Release
 
HBase Blockcache 101
HBase Blockcache 101HBase Blockcache 101
HBase Blockcache 101
 
Apache HBase Low Latency
Apache HBase Low LatencyApache HBase Low Latency
Apache HBase Low Latency
 
Apache HBase for Architects
Apache HBase for ArchitectsApache HBase for Architects
Apache HBase for Architects
 
HBase for Architects
HBase for ArchitectsHBase for Architects
HBase for Architects
 
HBase Client APIs (for webapps?)
HBase Client APIs (for webapps?)HBase Client APIs (for webapps?)
HBase Client APIs (for webapps?)
 
Pig, Making Hadoop Easy
Pig, Making Hadoop EasyPig, Making Hadoop Easy
Pig, Making Hadoop Easy
 
Introduction to Hadoop, HBase, and NoSQL
Introduction to Hadoop, HBase, and NoSQLIntroduction to Hadoop, HBase, and NoSQL
Introduction to Hadoop, HBase, and NoSQL
 

Recently uploaded

20200723_insight_release_plan_v6.pdf20200723_insight_release_plan_v6.pdf
20200723_insight_release_plan_v6.pdf20200723_insight_release_plan_v6.pdf20200723_insight_release_plan_v6.pdf20200723_insight_release_plan_v6.pdf
20200723_insight_release_plan_v6.pdf20200723_insight_release_plan_v6.pdfJamie (Taka) Wang
 
COMPUTER 10: Lesson 7 - File Storage and Online Collaboration
COMPUTER 10: Lesson 7 - File Storage and Online CollaborationCOMPUTER 10: Lesson 7 - File Storage and Online Collaboration
COMPUTER 10: Lesson 7 - File Storage and Online Collaborationbruanjhuli
 
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdfIaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdfDaniel Santiago Silva Capera
 
PicPay - GenAI Finance Assistant - ChatGPT for Customer Service
PicPay - GenAI Finance Assistant - ChatGPT for Customer ServicePicPay - GenAI Finance Assistant - ChatGPT for Customer Service
PicPay - GenAI Finance Assistant - ChatGPT for Customer ServiceRenan Moreira de Oliveira
 
Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.YounusS2
 
Designing A Time bound resource download URL
Designing A Time bound resource download URLDesigning A Time bound resource download URL
Designing A Time bound resource download URLRuncy Oommen
 
Empowering Africa's Next Generation: The AI Leadership Blueprint
Empowering Africa's Next Generation: The AI Leadership BlueprintEmpowering Africa's Next Generation: The AI Leadership Blueprint
Empowering Africa's Next Generation: The AI Leadership BlueprintMahmoud Rabie
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesThousandEyes
 
Nanopower In Semiconductor Industry.pdf
Nanopower  In Semiconductor Industry.pdfNanopower  In Semiconductor Industry.pdf
Nanopower In Semiconductor Industry.pdfPedro Manuel
 
UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7DianaGray10
 
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdfUiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdfDianaGray10
 
UiPath Studio Web workshop series - Day 8
UiPath Studio Web workshop series - Day 8UiPath Studio Web workshop series - Day 8
UiPath Studio Web workshop series - Day 8DianaGray10
 
Spring24-Release Overview - Wellingtion User Group-1.pdf
Spring24-Release Overview - Wellingtion User Group-1.pdfSpring24-Release Overview - Wellingtion User Group-1.pdf
Spring24-Release Overview - Wellingtion User Group-1.pdfAnna Loughnan Colquhoun
 
UiPath Community: AI for UiPath Automation Developers
UiPath Community: AI for UiPath Automation DevelopersUiPath Community: AI for UiPath Automation Developers
UiPath Community: AI for UiPath Automation DevelopersUiPathCommunity
 
RAG Patterns and Vector Search in Generative AI
RAG Patterns and Vector Search in Generative AIRAG Patterns and Vector Search in Generative AI
RAG Patterns and Vector Search in Generative AIUdaiappa Ramachandran
 
Comparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and IstioComparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and IstioChristian Posta
 
Meet the new FSP 3000 M-Flex800™
Meet the new FSP 3000 M-Flex800™Meet the new FSP 3000 M-Flex800™
Meet the new FSP 3000 M-Flex800™Adtran
 
Introduction to Matsuo Laboratory (ENG).pptx
Introduction to Matsuo Laboratory (ENG).pptxIntroduction to Matsuo Laboratory (ENG).pptx
Introduction to Matsuo Laboratory (ENG).pptxMatsuo Lab
 
Babel Compiler - Transforming JavaScript for All Browsers.pptx
Babel Compiler - Transforming JavaScript for All Browsers.pptxBabel Compiler - Transforming JavaScript for All Browsers.pptx
Babel Compiler - Transforming JavaScript for All Browsers.pptxYounusS2
 
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...DianaGray10
 

Recently uploaded (20)

20200723_insight_release_plan_v6.pdf20200723_insight_release_plan_v6.pdf
20200723_insight_release_plan_v6.pdf20200723_insight_release_plan_v6.pdf20200723_insight_release_plan_v6.pdf20200723_insight_release_plan_v6.pdf
20200723_insight_release_plan_v6.pdf20200723_insight_release_plan_v6.pdf
 
COMPUTER 10: Lesson 7 - File Storage and Online Collaboration
COMPUTER 10: Lesson 7 - File Storage and Online CollaborationCOMPUTER 10: Lesson 7 - File Storage and Online Collaboration
COMPUTER 10: Lesson 7 - File Storage and Online Collaboration
 
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdfIaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
 
PicPay - GenAI Finance Assistant - ChatGPT for Customer Service
PicPay - GenAI Finance Assistant - ChatGPT for Customer ServicePicPay - GenAI Finance Assistant - ChatGPT for Customer Service
PicPay - GenAI Finance Assistant - ChatGPT for Customer Service
 
Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.
 
Designing A Time bound resource download URL
Designing A Time bound resource download URLDesigning A Time bound resource download URL
Designing A Time bound resource download URL
 
Empowering Africa's Next Generation: The AI Leadership Blueprint
Empowering Africa's Next Generation: The AI Leadership BlueprintEmpowering Africa's Next Generation: The AI Leadership Blueprint
Empowering Africa's Next Generation: The AI Leadership Blueprint
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
 
Nanopower In Semiconductor Industry.pdf
Nanopower  In Semiconductor Industry.pdfNanopower  In Semiconductor Industry.pdf
Nanopower In Semiconductor Industry.pdf
 
UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7
 
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdfUiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
 
UiPath Studio Web workshop series - Day 8
UiPath Studio Web workshop series - Day 8UiPath Studio Web workshop series - Day 8
UiPath Studio Web workshop series - Day 8
 
Spring24-Release Overview - Wellingtion User Group-1.pdf
Spring24-Release Overview - Wellingtion User Group-1.pdfSpring24-Release Overview - Wellingtion User Group-1.pdf
Spring24-Release Overview - Wellingtion User Group-1.pdf
 
UiPath Community: AI for UiPath Automation Developers
UiPath Community: AI for UiPath Automation DevelopersUiPath Community: AI for UiPath Automation Developers
UiPath Community: AI for UiPath Automation Developers
 
RAG Patterns and Vector Search in Generative AI
RAG Patterns and Vector Search in Generative AIRAG Patterns and Vector Search in Generative AI
RAG Patterns and Vector Search in Generative AI
 
Comparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and IstioComparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and Istio
 
Meet the new FSP 3000 M-Flex800™
Meet the new FSP 3000 M-Flex800™Meet the new FSP 3000 M-Flex800™
Meet the new FSP 3000 M-Flex800™
 
Introduction to Matsuo Laboratory (ENG).pptx
Introduction to Matsuo Laboratory (ENG).pptxIntroduction to Matsuo Laboratory (ENG).pptx
Introduction to Matsuo Laboratory (ENG).pptx
 
Babel Compiler - Transforming JavaScript for All Browsers.pptx
Babel Compiler - Transforming JavaScript for All Browsers.pptxBabel Compiler - Transforming JavaScript for All Browsers.pptx
Babel Compiler - Transforming JavaScript for All Browsers.pptx
 
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
 

Apache Big Data EU 2015 - Phoenix

  • 1. V5   The  Evolu*on  of  a  Rela*onal  Database  Layer  over  HBase   @ApachePhoenix   h0p://phoenix.apache.org   2015-­‐09-­‐29   Nick  Dimiduk  (@xefyr)   h0p://n10k.com   #apachebigdata  
  • 2. Agenda   o  What  is  Apache  Phoenix?   o  State  of  the  Union   o  Latest  Releases   o  A  brief  aside  on  TransacQons   o  What’s  Next?   o  Q  &  A  
  • 3. WHAT  IS  APACHE  PHOENIX   PuUng  the  SQL  back  into  NoSQL  
  • 4. What  is  Apache  Phoenix?   o  A  relaQonal  database  layer  for  Apache  HBase   o  Query  engine   o  Transforms  SQL  queries  into  naQve  HBase  API  calls   o  Pushes   as   much   work   as   possible   onto   the   cluster   for   parallel   execuQon   o  Metadata  repository   o  Typed  access  to  data  stored  in  HBase  tables   o  A  JDBC  driver   o  A  top  level  Apache  So_ware  FoundaQon  project   o  Originally  developed  at  Salesforce   o  A  growing  community  with  momentum  
  • 5. What  is  Apache  Phoenix?   ==   ==  
  • 6. Where  Does  Phoenix  Fit  In?  Sqoop     RDB  Data  Collector   Flume     Log  Data  Collector   Zookeeper   CoordinaQon   YARN  /  MRv2   Cluster  Resource   Manager  /   MapReduce   HDFS   Hadoop  Distributed  File  System   GraphX   Graph  analysis   framework   Phoenix   Query  execuQon  engine   HBase   Distributed  Database   The  Java  Virtual  Machine   Hadoop   Common  JNI   Spark   IteraQve  In-­‐Memory   ComputaQon   MLLib   Data  mining   Pig   Data  ManipulaQon   Hive   Structured  Query   Phoenix   JDBC  client  
  • 7. Phoenix  Puts  the  SQL  Back  in  NoSQL   o  Familiar  SQL  interface  for  (DDL,  DML,  etc.)   o Read  only  views  on  exisQng  HBase  data   o Dynamic  columns  extend  schema  at  runQme   o Flashback  queries  using  versioned  schema   o  Data  types   o  Secondary  indexes   o  Joins   o  Query  planning  and  opQmizaQon   o  MulQ-­‐tenancy  
  • 8. Phoenix  Puts  the  SQL  Back  in  NoSQL   Accessing  HBase  data  with  Phoenix  can  be  substanQally  easier  than  direct   HBase  API  use       Versus     HTable t = new HTable(“foo”); RegionScanner s = t.getScanner(new Scan(…, new ValueFilter(CompareOp.GT, new CustomTypedComparator(30)), …)); while ((Result r = s.next()) != null) { // blah blah blah Java Java Java } s.close(); t.close(); SELECT * FROM foo WHERE bar > 30 HTable t = new HTable(“foo”); RegionScanner s = t.getScanner(new Scan(…, new ValueFilter(CompareOp.GT, new CustomTypedComparator(30)), …)); while ((Result r = s.next()) != null) { // blah blah blah Java Java Java } s.close(); t.close(); Your  BI  tool   probably  can’t  do   this   (And  we  didn’t  include  error  handling…)  
  • 9. STATE  OF  THE  UNION   So  how’s  things?  
  • 11. State  of  the  Union   o  SQL  Support:  Broad  enough  to  run  TPC  queries   o  Joins,  Sub-­‐queries,  Derived  tables,  etc.   o  Secondary  Indices:  Three  different  strategies   o  Immutable  for  write-­‐once/append  only  data     o  Global  for  read-­‐heavy  mutable  data   o  Local  for  write-­‐heavy  mutable  or  immutable  data   o  StaQsQcs  driven  parallel  execuQon   o  Monitoring  &  Management   o  Tracing,  metrics  
  • 12. Join  and  Subquery  Support o Grammar   o inner  join;  le_/right/full  outer  join;  cross  join   o AddiQonal   o semi  join;  anQ  join   o Algorithms   o hash-­‐join;  sort-­‐merge-­‐join   o OpQmizaQons   o Predicate  push-­‐down;  FK-­‐to-­‐PK  join  opQmizaQon;   Global  index  with  missing  data  columns;  Correlated   subquery  rewrite  
  • 13. TPC  Example  1   Small-­‐QuanQty-­‐Order  Revenue  Query  (Q17) select sum(l_extendedprice) / 7.0 as avg_yearly from lineitem, part where p_partkey = l_partkey and p_brand = '[B]' and p_container = '[C]' and l_quantity < ( select 0.2 * avg(l_quantity) from lineitem where l_partkey = p_partkey ); CLIENT 4-WAY FULL SCAN OVER lineitem PARALLEL INNER JOIN TABLE 0 CLIENT 1-WAY FULL SCAN OVER part SERVER FILTER BY p_partkey = ‘[B]’ AND p_container = ‘[C]’ PARALLEL INNER JOIN TABLE 1 CLIENT 4-WAY FULL SCAN OVER lineitem SERVER AGGREGATE INTO DISTINCT ROWS BY l_partkey AFTER-JOIN SERVER FILTER BY l_quantity < $0
  • 14. TPC  Example  2   Order  Priority  Checking  Query  (Q4) select o_orderpriority, count(*) as order_count from orders where o_orderdate >= date '[D]' and o_orderdate < date '[D]' + interval '3' month and exists ( select * from lineitem where l_orderkey = o_orderkey and l_commitdate < l_receiptdate ) group by o_orderpriority order by o_orderpriority; CLIENT 4-WAY FULL SCAN OVER orders SERVER FILTER o_orderdate >= ‘[D]’ AND o_orderdate < ‘[D]’ + 3(d) SERVER AGGREGATE INTO ORDERED DISTINCT ROWS BY o_orderpriority CLIENT MERGE SORT SKIP-SCAN JOIN TABLE 0 CLIENT 4-WAY FULL SCAN OVER lineitem SERVER FILTER BY l_commitdate < l_receiptdate DYNAMIC SERVER FILTER BY o_orderkey IN l_orderkey
  • 15. Join  support  -­‐  what  can't  we  do? o Nested  Loop  Join   o StaQsQcs  Guided  Join  Algorithm   o Smartly  choose  the  smaller  table  for  the  build  side   o Smartly  switch  between  hash-­‐join  and  sort-­‐merge-­‐join   o Smartly  turn  on/off  FK-­‐to-­‐PK  join  opQmizaQon
  • 16. LATEST  RELEASES   The  new  goodness,  from  our  community  to  your  datacenter!  
  • 17. Phoenix  4.4   o  HBase  1.0  Support   o  Func*onal  Indexes   o  User  Defined  Func*ons   o  Query  Server  with  Thin  Driver   o  Union  All  support   o  TesQng  at  scale  with  Pherf   o  MR  index  build   o  Spark  integraQon   o  Date  built-­‐in  funcQons  –  WEEK,  DAYOFMONTH,  etc.  
  • 18. Phoenix  4.4:  FuncQonal  Indexes   o CreaQng  an  index  on  an  expression  as   opposed  to  just  a  column  value.   o Example  without  index,  full  table  scan:   SELECT  AVG(response_*me)  FROM  SERVER_METRICS   WHERE  DAYOFMONTH(create_*me)  =  1   o Add  funcQonal  index,  range  scan:   CREATE  INDEX  day_of_month_idx   ON  SERVER_METRICS  (DAYOFMONTH(create_*me))   INCLUDE  (response_*me)  
  • 19. Phoenix  4.4:  User  Defined  FuncQons   o  Extension  points  to  Phoenix  for  domain-­‐specific   funcQons.   o  For  example,  a  geo-­‐locaQon  applicaQon  might  load  a   set  of  UDFs  like  this:   CREATE  FUNCTION  WOEID_DISTANCE(INTEGER,INTEGER)     RETURNS  INTEGER  AS  ‘org.apache.geo.woeidDistance’   USING  JAR  ‘/lib/geo/geoloc.jar’   o  Querying,  funcQonal  indexing,  etc.  then  possible:   SELECT  *  FROM  woeid  a  JOIN  woeid  b  ON  a.country  =  b.country   WHERE  woeid_distance(a.ID,b.ID)  <  5  
  • 20. Phoenix  4.4:  Query  Server  +  Thin  Driver   o  Offloads  query  planning  and  execuQon  to  different   server(s)   o  Minimizes  client  dependencies   o  Enabler  for  ODBC  driver  (work  in  progress)   o  Connect  like  this  instead:   Connection conn = DriverManager.getConnection( "jdbc:phoenix:thin:url=http://localhost:8765"); o  SQll  evolving:  no  backward  compa*bility  guarantees  yet     h0p://phoenix.apache.org/server.html  
  • 21. Phoenix  4.4   o  HBase  1.0  Support   o  FuncQonal  Indexes   o  User  Defined  FuncQons   o  Query  Server  with  Thin  Driver   o  Union  All  support   o  Tes*ng  at  scale  with  Pherf   o  MR  index  build   o  Spark  integra*on   o  Date  built-­‐in  func*ons  –  WEEK,  DAYOFMONTH,  &c.    
  • 22. Phoenix  4.5   o  HBase  1.1  Support   o  Client-­‐side  per-­‐statement  metrics   o  SELECT  without  FROM   o  Now  possible  to  alter  tables  with  views   o  Math  built-­‐in  funcQons  –  SIGN,  SQRT,  CBRT,  EXP,  &c.   o  Array  built-­‐in  funcQons  –  ARRAY_CAT,  ARRAY_FILL,   &c.   o  UDF,  View,  FuncQonal  Index  enhancements   o  Pherf  and  Performance  enhancements  
  • 23. TRANSACTIONS   I  was  told  there  would  be…  
  • 24. TransacQons  (Work  in  Progress)   o  Snapshot  isolaQon  model   o  Using  Tephra  (h0p://tephra.io/)   o  Supports  REPEATABLE_READ  isolaQon  level   o  Allows  reading  your  own  uncommi0ed  data   o  OpQonal   o  Enabled  on  a  table  by  table  basis   o  No  performance  penalty  when  not  used   o  Work  in  progress,  but  close  to  release   o  Try  our  txn  branch   o  Will  be  available  in  next  release  
  • 25. OpQmisQc  Concurrency  Control   •  Avoids cost of locking rows and tables •  No deadlocks or lock escalations •  Cost of conflict detection and possible rollback is higher •  Good if conflicts are rare: short transaction, disjoint partitioning of work •  Conflict detection not always necessary: write-once/ append-only data
  • 26. Tephra  Architecture   ZooKeeper Tx Manager (standby) HBase Master 1 Master 2 RS 1 RS 2 RS 4 RS 3 Client 1 Client 2 Client N Tx Manager (active)
  • 27. time out try abort failed roll back in HBase write to HBase do work Client Tx Manager none complete V abortsucceeded in progress start tx start start tx commit try commit check conflicts invalid X invalidate failed TransacQon  Lifecycle  
  • 28. Tephra  Architecture   •  TransactionAware client! •  Coordinates transaction lifecycle with manager •  Communicates directly with HBase for reads and writes •  Transaction Manager •  Assigns transaction IDs •  Maintains state on in-progress, committed and invalid transactions! •  Transaction Processor coprocessor •  Applies server-side filtering for reads •  Cleans up data from failed transactions, and no longer visible versions
  • 29. WHAT’S  NEXT?   A  brave  new  world  
  • 30. What’s  Next?   o Is  Phoenix  done?   o What  about  the  Big  Picture?   o How  can  Phoenix  be  leveraged  in  the  larger   ecosystem?   o Hive,  Pig,  MR  integraQon  with  Phoenix  today,  but   it’s  not  a  great  story  
  • 31. What’s  Next?   You  are  here  
  • 32. Moving  to  Apache  Calcite   o  Query  parser,  compiler,  and  planner  framework   o SQL-­‐92  compliant  (ever  argue  SQL  with  Julian?  :-­‐)  )   o Enables  Phoenix  to  get  missing  SQL  support   o  Pluggable  cost-­‐based  opQmizer  framework   o Sane  way  to  model  push  down  through  rules   o  Interop  with  other  Calcite  adaptors   o Not  for  free,  but  it  becomes  feasible   o Already  used  by  Drill,  Hive,  Kylin,  Samza   o Supports  any  JDBC  source  (i.e.  RDBMS  -­‐  remember   them  :-­‐)  )   o One  cost-­‐model  to  rule  them  all  
  • 33. How  does  Phoenix  plug  in?   Calcite  Parser  &  Validator Calcite  Query  OpQmizer Phoenix  Query  Plan  Generator Phoenix  RunQme Phoenix  Tables  over  HBase   JDBC  Client SQL  +  Phoenix   specific   grammar Built-­‐in  rules   +  Phoenix   specific  rules
  • 34. OpQmizaQon  Rules   o AggregateRemoveRule   o FilterAggregateTransposeRule   o FilterJoinRule   o FilterMergeRule   o JoinCommuteRule   o PhoenixFilterScanMergeRule   o PhoenixJoinSingleAggregateMergeRule   o …  
  • 35. Query  Example     (filter  push-­‐down  and  smart  join  algorithm)   LogicalFilter   filter:  $0  =  ‘x’ LogicalJoin   type:  inner   cond:  $3  =  $7 LogicalProject   projects:  $0,  $5 LogicalTableScan   table:  A LogicalTableScan   table:  B PhoenixTableScan   table:  ‘a’   filter:  $0  =  ‘x’   PhoenixServerJoin   type:  inner   cond:  $3  =  $1 PhoenixServerProject   projects:  $2,  $0 OpQmizer   (with   RelOptRules  &   ConvertRules)   PhoenixTableScan   table:  ‘b’   PhoenixServerProject   projects:  $0,  $2 PhoenixServerProject   projects:  $0,  $3
  • 36. Query  Example   (filter  push-­‐down  and  smart  join  algorithm) ScanPlan   table:  ‘a’   skip-­‐scan:  pk0  =  ‘x’   projects:  pk0,  c3 HashJoinPlan   types  {inner} join-­‐keys:  {$1} projects:  $2,  $0 Build   hash-­‐key:  $1 Phoenix   Implementor   PhoenixTableScan   table:  ‘a’   filter:  $0  =  ‘x’   PhoenixServerJoin   type:  inner   cond:  $3  =  $1 PhoenixServerProject   projects:  $2,  $0 PhoenixTableScan   table:  ‘b’   PhoenixServerProject   projects:  $0,  $2 PhoenixServerProject   projects:  $0,  $3 ScanPlan   table:  ‘b’   projects:  col0,  col2   Probe
  • 37. Interoperability  Example   o Joining  data  from  Phoenix  and  mySQL   EnumerableJoin PhoenixTableScan JdbcTableScan Phoenix  Tables  over  HBase   mySQL  Database   PhoenixToEnumerable   Converter JdbcToEnumerable   Converter
  • 38. Query  Example  1 WITH  m  AS            (SELECT  *              FROM  dept_manager  dm              WHERE  from_date  =                      (SELECT  max(from_date)                        FROM  dept_manager  dm2                        WHERE  dm.dept_no  =  dm2.dept_no))     SELECT  m.dept_no,  d.dept_name,  e.first_name,  e.last_name     FROM  employees  e     JOIN  m  ON  e.emp_no  =  m.emp_no   JOIN  departments  d  ON  d.dept_no  =  m.dept_no   ORDER  BY  d.dept_no;
  • 39. Query  Example  2   SELECT  dept_no,  Qtle,  count(*)     FROM  Qtles  t     JOIN  dept_emp  de  ON  t.emp_no  =  de.emp_no   WHERE  dept_no  <=  'd006'   GROUP  BY  rollup(dept_no,  *tle)     ORDER  BY  dept_no,  Qtle;  
  • 40. V5   Thanks!   @ApachePhoenix   h0p://phoenix.apache.org   2015-­‐09-­‐29   Nick  Dimiduk  (@xefyr)   h0p://n10k.com   #apachebigdata