State of the Database
@HBase
http://hbase.apache.org
2015-09-28
Nick Dimiduk (@xefyr)
http://n10k.com
#apachebigdata
  
Agenda
o State of the Project
o State of the Software
o State of the Ecosystem
o Latest Releases
o Bonus Content!
o Q & A
  
STATE OF THE PROJECT
Who we are, what we do, why we do it
  
Project: Vision
Simple, steady, and powerful: “A first class high performance horizontally scalable data storage engine for Big Data, suitable as the store of record for mission critical data.”
  
Project: Usage
o Data access for medium- and high-scale services
  o Hundreds of enterprises and startups
  o Some of the largest Internet companies in the world
o Running major production workloads since 2011
o Use-cases
  o messaging, security, measurement/“IoT”, collaboration, digital media, digital advertising, telecommunications, computational biology, clinical informatics/healthcare, insurance
  
Project: Goals
o Availability: Always more, always faster
o Stability and operability
o Scaling up, scaling down
o Up-to-date with “commodity” hardware
o Multi-tenancy
o Diversity of ecosystem
  
STATE OF THE SOFTWARE
Regarding the codebase
  
State of the Software
o Mature codebase
  o 100+ contributors (40+ committers)
  o 1.1M lines of code (each active branch)
  o est. 1200+ human-years’ effort
o Cluster sizes from 10 to 1000+ machines
  o that we know of!
o Runs on HDFS, MapR, Gluster, GPFS, Lustre
o HBase as a Service
  o AWS/EMR, HDInsight, Qubole, Google (sort-of)
  
Software: Releases
  
Software: Semantic Versioning
MAJOR.MINOR.PATCH[-identifier]
o Client/Server wire compatibility
o Server/Server feature compatibility
o API compliance guarantees
o ABI compliance guarantees
http://hbase.apache.org/book.html#hbase.versioning
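The version scheme above can be sketched in a few lines of Java. This is an illustration, not HBase code: the class `SemVer` and its methods are hypothetical, and the policy they encode (wire compatibility within a major line, binary compatibility only between patch releases of one minor line) is an assumption to check against the compatibility table at the book link above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: parses MAJOR.MINOR.PATCH[-identifier]. The compatibility
// rules below are assumptions for illustration, not the authoritative HBase table.
final class SemVer {
    private static final Pattern FORMAT =
            Pattern.compile("(\\d+)\\.(\\d+)\\.(\\d+)(?:-([0-9A-Za-z.-]+))?");

    final int major;
    final int minor;
    final int patch;
    final String identifier; // null when there is no -identifier suffix

    private SemVer(int major, int minor, int patch, String identifier) {
        this.major = major;
        this.minor = minor;
        this.patch = patch;
        this.identifier = identifier;
    }

    static SemVer parse(String version) {
        Matcher m = FORMAT.matcher(version);
        if (!m.matches()) {
            throw new IllegalArgumentException("not MAJOR.MINOR.PATCH[-identifier]: " + version);
        }
        return new SemVer(Integer.parseInt(m.group(1)), Integer.parseInt(m.group(2)),
                Integer.parseInt(m.group(3)), m.group(4));
    }

    // Assumed: client/server wire compatibility holds within one major line.
    boolean wireCompatibleWith(SemVer other) {
        return major == other.major;
    }

    // Assumed: binary (ABI) compatibility holds only between patch releases.
    boolean abiCompatibleWith(SemVer other) {
        return major == other.major && minor == other.minor;
    }
}
```

Under these assumptions, 1.1.2 and 1.0.0 could talk over the wire but would not be drop-in binary replacements for each other, while 1.1.2 and 1.1.0 would be.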
  
Software: Active Development
o Smaller regions, more regions
  o Less write amplification
  o 1M+ region clusters
o Stability
  o ProcedureV2
  o Assignment improvements/stability
o Backup, restore tools
  o Built on snapshots, easier operations
  
Software: Active Development
o Adaption: Workloads
  o HBase as Medium Object Store (MOB)
o Tunable Availability
  o Region replicas
  o TIMELINE consistency
o Coprocessor API stability
o Less GC, more RAM (off-heap)
  
Software: Active Development
o Multi-tenancy
  o Table groups
  o Quotas
  o Priorities
o Improved machine utilization
  o More RAM (100’s of GB)
  o IOPS
  o Better concurrency
  
STATE OF THE ECOSYSTEM
The whole enchilada
  
State of the Ecosystem
o OpenTSDB
o Transaction Managers
  o Themis, Tephra, Omid2, LeanXcale
o Graph engines
  o Titan, Giraph, Zen, S2Graph
o Myriad SQLs
o Other Hadoop components
o Google Cloud Bigtable
  
Ecosystem: SQL
  
Ecosystem: Hadoop Components
o YARN-2928 Application Timeline Service
o HIVE-9452 HBase to store Hive metadata
o AMBARI-5707 Ambari Metrics System
  
LATEST RELEASES
Come and get it!
  
Release: 0.94
o Last (final?) release: 0.94.27, 2015-03-26
o “ancient history”
  o No new deployments
  o Existing users highly encouraged to upgrade
  o Requires downtime to upgrade
😫 😡 (╯°□°)╯︵ ┻━┻
  
Release: 0.98
o Last release: 0.98.14, 2015-08-31
o “legacy”
  o Most production deploys (probably)
  o Largest production clusters (probably)
  o New features back-ported when possible
  
Release: 1.x
o Last release: 1.1.2, 2015-09-01
o “stable”
  o Production deploys moving here
  o Active development
  o Rolling upgrade from 0.98.x
😄 😍 ヽ(´ー`)ノ
  
Release: 1.0
o Released 1.0.0, 2015-02-24
o Adopting semantic versioning
  o Patch releases don’t quite follow spec yet
o Client / Server API cleanup
  o Interfaces, builder pattern, @InterfaceAudience
o Region Replicas
  o Trade consistency, resources for availability
github.com/ndimiduk/hbase-1.0-api-examples
  
Region Replicas
o Multiple Region Servers host each region
  o Primary + N read replicas (usually N=2)
  o Primary is authority on reads and writes
  o Replicas tail (replicate) the primary’s edits, offer TIMELINE view
o Client’s choice
  o Read primary only for “classic” strong consistency
  o Fan-out reads for faster, potentially TIMELINE results
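A toy model of the read path described above, assuming a single region held in memory. In the real client the choice is made per request (for example with `Get#setConsistency(Consistency.TIMELINE)`, checking `Result#isStale()` on the answer), and a TIMELINE read waits briefly on the primary before fanning out to replicas. `ReplicatedRegion`, `catchUp`, and `setPrimaryUp` are hypothetical names; the fallback here is simply the first replica.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model of one region: a primary plus N read replicas.
final class ReplicatedRegion {
    enum Consistency { STRONG, TIMELINE }

    // A read result, flagged stale when it was served by a secondary replica.
    static final class ReadResult {
        final String value;
        final boolean stale;

        ReadResult(String value, boolean stale) {
            this.value = value;
            this.stale = stale;
        }
    }

    private final Map<String, String> primary = new HashMap<>();
    private final List<String[]> editLog = new ArrayList<>(); // {key, value} in write order
    private final List<Map<String, String>> replicas = new ArrayList<>();
    private final int[] applied; // how many log entries each replica has applied
    private boolean primaryUp = true;

    ReplicatedRegion(int numReplicas) {
        for (int i = 0; i < numReplicas; i++) {
            replicas.add(new HashMap<>());
        }
        applied = new int[numReplicas];
    }

    // All writes go to the primary, the authority for the region.
    void put(String key, String value) {
        if (!primaryUp) {
            throw new IllegalStateException("primary unavailable; writes must wait");
        }
        primary.put(key, value);
        editLog.add(new String[] {key, value});
    }

    // Replica i tails the primary's edit log and applies every edit it has not seen.
    void catchUp(int i) {
        while (applied[i] < editLog.size()) {
            String[] edit = editLog.get(applied[i]++);
            replicas.get(i).put(edit[0], edit[1]);
        }
    }

    void setPrimaryUp(boolean up) {
        primaryUp = up;
    }

    ReadResult get(String key, Consistency consistency) {
        if (primaryUp) {
            return new ReadResult(primary.get(key), false); // authoritative answer
        }
        if (consistency == Consistency.STRONG || replicas.isEmpty()) {
            throw new IllegalStateException("primary unavailable; strong read cannot be served");
        }
        // TIMELINE: fall back to a secondary (here replica 0); it may lag the primary.
        return new ReadResult(replicas.get(0).get(key), true);
    }
}
```

The trade-off on the slide shows up directly: a STRONG read fails while the primary is down, whereas a TIMELINE read still answers, possibly with an older value.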
  
Release: 1.1
o Released 1.1.0, 2015-05-15
o Async RPC client
o Scanner improvements
  o RPC chunking, heartbeat messages, API
o RPC throttling
  o Quotas per user, table, namespace
o Compaction throttling, monitoring
o ProcedureV2
  o Improved operational reliability
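One common way to enforce request quotas per user, table, or namespace is a token bucket. The sketch below assumes that technique; it is not HBase's quota implementation, and `RequestThrottle` with its methods are hypothetical. Time is passed in explicitly so the behaviour is deterministic.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical token-bucket throttle keyed by quota scope; not HBase's implementation.
final class RequestThrottle {
    enum Scope { USER, TABLE, NAMESPACE }

    private static final class Bucket {
        final double capacity;
        final double refillPerMilli;
        double tokens;
        long lastMillis;

        Bucket(long limitPerSecond, long nowMillis) {
            capacity = limitPerSecond;
            refillPerMilli = limitPerSecond / 1000.0;
            tokens = limitPerSecond; // start full
            lastMillis = nowMillis;
        }
    }

    private final Map<String, Long> limits = new HashMap<>();
    private final Map<String, Bucket> buckets = new HashMap<>();

    private static String key(Scope scope, String name) {
        return scope + ":" + name;
    }

    // Configure a requests-per-second quota for one user, table, or namespace.
    void setLimit(Scope scope, String name, long requestsPerSecond) {
        limits.put(key(scope, name), requestsPerSecond);
        buckets.remove(key(scope, name)); // reset state when the quota changes
    }

    // Returns true if the request may proceed, false if it should be throttled.
    boolean tryAcquire(Scope scope, String name, long nowMillis) {
        String k = key(scope, name);
        Long limit = limits.get(k);
        if (limit == null) {
            return true; // no quota configured for this scope
        }
        Bucket b = buckets.computeIfAbsent(k, x -> new Bucket(limit, nowMillis));
        long elapsed = Math.max(0, nowMillis - b.lastMillis);
        b.tokens = Math.min(b.capacity, b.tokens + elapsed * b.refillPerMilli);
        b.lastMillis = nowMillis;
        if (b.tokens >= 1.0) {
            b.tokens -= 1.0;
            return true;
        }
        return false;
    }
}
```

With a limit of 2 requests per second, a third request in the same instant is rejected, and capacity trickles back as time passes rather than resetting all at once.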
  
ProcedureV2
o Distributed, fault-tolerant operations
  o Multiple steps on multiple machines
  o Roll-back in case of failure
o Coordination of long-running procedures
  o Compactions, splits, &c.
o Progress tracking
  o Notifications across multiple machines
  o Current status inquiries
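The roll-back bullet can be illustrated with a small in-process sketch: run the steps in order, and on the first failure undo the completed steps in reverse. `StepProcedure` and `Step` are hypothetical names, not the HBase Procedure API. Real ProcedureV2 also persists each state transition so a procedure can resume after a Master restart; this sketch keeps only an in-memory journal as a stand-in for progress tracking.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical multi-step procedure with rollback; not the HBase Procedure API.
final class StepProcedure {
    // One step: execute() does the work, rollback() undoes it.
    interface Step {
        String name();
        void execute() throws Exception;
        void rollback();
    }

    private final List<Step> steps = new ArrayList<>();
    private final List<String> journal = new ArrayList<>(); // answers "current status" queries

    StepProcedure addStep(Step step) {
        steps.add(step);
        return this;
    }

    List<String> journal() {
        return journal;
    }

    // Runs steps in order; on failure, rolls back completed steps newest-first.
    boolean run() {
        Deque<Step> completed = new ArrayDeque<>();
        for (Step step : steps) {
            try {
                step.execute();
                completed.push(step);
                journal.add("done:" + step.name());
            } catch (Exception e) {
                journal.add("failed:" + step.name());
                while (!completed.isEmpty()) {
                    Step s = completed.pop();
                    s.rollback();
                    journal.add("rolled-back:" + s.name());
                }
                return false;
            }
        }
        return true;
    }

    // Illustrative helper that builds a step from two lambdas.
    static Step step(String name, Runnable exec, Runnable undo) {
        return new Step() {
            public String name() { return name; }
            public void execute() { exec.run(); }
            public void rollback() { undo.run(); }
        };
    }
}
```

For example, if a hypothetical "create table" procedure creates a directory, writes metadata, and then fails to assign regions, the metadata and directory are removed in that order and the journal records each transition.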
  
Branch-1.2
o Next up in 1.x line
o Java 8 support
o Native checksums
o SyncTable
o Flush-per-store
o ProcV2 all the things!
o (More) Compaction improvements
o Region normalizer
  
Region Normalizer
o Anti-entropy for region size
  o Converge towards uniform size
  o Complements balancer working toward uniform distribution
o Managed by Master, runs in the background (like balancer)
o Pluggable normalization strategies (“simple” default)
o Use-cases
  o Merge away regions from expired timeseries data
  o Smooth uneven bulk loads
  o Correct operator initial split guesses
  o Ease upgrades from ancient versions (0.92/1g vs. today/20g)
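A rough sketch of what a size-based normalization plan might look like for one table, with regions given in key order. The thresholds (split a region larger than twice the table's average size, merge two adjacent regions whose combined size is below the average) are assumptions chosen for illustration; `SimpleNormalizerSketch` is a hypothetical class, not the Master's normalizer.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical size-normalization planner; thresholds are illustrative assumptions.
final class SimpleNormalizerSketch {
    // Takes region sizes (MB) in key order and returns split/merge actions by region index.
    static List<String> plan(long[] regionSizesMb) {
        List<String> actions = new ArrayList<>();
        if (regionSizesMb.length < 2) {
            return actions; // nothing to normalize
        }
        long total = 0;
        for (long s : regionSizesMb) {
            total += s;
        }
        double avg = (double) total / regionSizesMb.length;

        int i = 0;
        while (i < regionSizesMb.length) {
            long size = regionSizesMb[i];
            if (size > 2 * avg) {
                actions.add("split " + i); // far above average: split it
                i++;
            } else if (i + 1 < regionSizesMb.length && size + regionSizesMb[i + 1] < avg) {
                actions.add("merge " + i + "+" + (i + 1)); // two small neighbours: merge them
                i += 2; // both regions are consumed by this merge
            } else {
                i++;
            }
        }
        return actions;
    }
}
```

For sizes {1, 1, 10, 100, 8} the average is 24 MB, so this sketch merges the two 1 MB regions (the "expired timeseries" case) and splits the 100 MB one, converging toward uniform size over repeated runs.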
  
Thanks!	
  
  
BONUS CONTENT!
Ask and you shall receive
  
Agenda
o Replication
o Filters
o Coprocessors
  
Replication
o Keep data synchronized between clusters
  o Supports multiple destinations
  o Cyclical graphs supported
  o Configurable at Column Family granularity
o Uses WAL shipping to propagate data
  o Replication state, status stored in ZooKeeper
o General purpose interface for asynchronously shipping edits from a cluster
  o Other HBase clusters, Region Replicas, SOLR/ElasticSearch
hbase.apache.org/book.html#_cluster_replication
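A toy model of WAL shipping, assuming in-memory clusters. Each WAL entry carries the IDs of the clusters that have already applied it, which is one way to make cyclical topologies safe; only column families marked for replication are shipped, and each peer has its own shipping position. All names here are hypothetical, and the real system ships asynchronously and keeps its positions in ZooKeeper rather than in a map.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model of WAL-shipping replication between clusters.
final class ReplicationSketch {
    // A WAL entry: the edit plus the IDs of clusters that have already applied it.
    static final class WalEntry {
        final String family;
        final String row;
        final String value;
        final Set<String> seenBy;

        WalEntry(String family, String row, String value, Set<String> seenBy) {
            this.family = family;
            this.row = row;
            this.value = value;
            this.seenBy = seenBy;
        }
    }

    static final class Cluster {
        final String id;
        final Set<String> replicatedFamilies = new HashSet<>(); // column-family scope
        final Map<String, String> data = new HashMap<>();
        final List<WalEntry> wal = new ArrayList<>();
        final List<Cluster> peers = new ArrayList<>();
        final Map<String, Integer> shippedUpTo = new HashMap<>(); // per-peer WAL position

        Cluster(String id) {
            this.id = id;
        }

        // A local client write: applied and appended to this cluster's WAL.
        void put(String family, String row, String value) {
            Set<String> seen = new HashSet<>();
            seen.add(id);
            apply(new WalEntry(family, row, value, seen));
        }

        private void apply(WalEntry e) {
            data.put(e.family + ":" + e.row, e.value);
            wal.add(e);
        }

        // Ship unsent WAL entries to every peer, skipping non-replicated families and loops.
        void ship() {
            for (Cluster peer : peers) {
                int pos = shippedUpTo.getOrDefault(peer.id, 0);
                for (; pos < wal.size(); pos++) {
                    WalEntry e = wal.get(pos);
                    if (!replicatedFamilies.contains(e.family)) continue; // not in scope
                    if (e.seenBy.contains(peer.id)) continue; // peer already has it: breaks cycles
                    Set<String> seen = new HashSet<>(e.seenBy);
                    seen.add(peer.id);
                    peer.apply(new WalEntry(e.family, e.row, e.value, seen));
                }
                shippedUpTo.put(peer.id, pos);
            }
        }
    }
}
```

In a master-master pair A and B, an edit written on A reaches B tagged with both IDs, so when B ships its WAL back to A the entry is skipped instead of bouncing forever.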
  
Filters
o Additional predicates applied to reads
  o Use in conjunction with specifying start, end rows, &c.
o Run on the Region Servers
  o Included in GET, SCAN request
o Explicitly exclude data based on criteria
  o e.g., value >= 10
o Implicitly exclude data by hinting seeks
  o INCLUDE_AND_NEXT_COL, NEXT_ROW, SEEK_NEXT_USING_HINT
o Operate on data read from BlockCache
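The difference between explicit exclusion and seek hints can be seen in a simplified scan loop. This is a hypothetical model: `FilterLoopSketch.Code` borrows three names from HBase's `Filter.ReturnCode` (`INCLUDE`, `SKIP`, `NEXT_ROW`), but the loop is only a stand-in for the region server's scanner, working over a sorted in-memory list.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a scan loop reacting to filter return codes.
final class FilterLoopSketch {
    // Subset of codes; names mirror HBase's Filter.ReturnCode.
    enum Code { INCLUDE, SKIP, NEXT_ROW }

    static final class Cell {
        final String row;
        final String qualifier;
        final int value;

        Cell(String row, String qualifier, int value) {
            this.row = row;
            this.qualifier = qualifier;
            this.value = value;
        }

        @Override
        public String toString() {
            return row + "/" + qualifier + "=" + value;
        }
    }

    interface CellFilter {
        Code filterCell(Cell c);
    }

    // Walks cells (already sorted by row, then qualifier) and applies the filter.
    static List<Cell> scan(List<Cell> sorted, CellFilter filter) {
        List<Cell> out = new ArrayList<>();
        int i = 0;
        while (i < sorted.size()) {
            Cell c = sorted.get(i);
            switch (filter.filterCell(c)) {
                case INCLUDE: // keep the cell, move to the next one
                    out.add(c);
                    i++;
                    break;
                case SKIP: // explicit exclusion of just this cell
                    i++;
                    break;
                case NEXT_ROW: // hint: jump past the rest of this row without evaluating it
                    String row = c.row;
                    while (i < sorted.size() && sorted.get(i).row.equals(row)) {
                        i++;
                    }
                    break;
            }
        }
        return out;
    }
}
```

For example, a filter returning `NEXT_ROW` for values of 50 or more and `SKIP` for values of 10 or more drops those cells and, after a `NEXT_ROW`, every later cell in the same row, even ones that would otherwise have been included.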
  
Filters
o 30+ Filters included in distribution
o Mini-language for use in Thrift, REST
  o "(PrefixFilter ('row2') AND (QualifierFilter (>=, 'binary:xyz'))) AND (TimestampsFilter ( 123, 456))"
  o hbase.apache.org/book.html#thrift.filter_language
o Simple interface, implement your own!
  
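import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.filter.FilterBase;

// Returns at most pageSize rows from each region it runs on.
public class PageFilter extends FilterBase {
  private long pageSize = Long.MAX_VALUE;
  private int rowsAccepted = 0;

  public PageFilter(long pageSize) {…}

  // Never reject a row based on its key alone.
  public boolean filterRowKey(Cell c) {
    return false;
  }
  // Include every cell of an accepted row.
  public ReturnCode filterKeyValue(Cell c) {
    return ReturnCode.INCLUDE;
  }
  // Stop scanning this region once the page is full.
  public boolean filterAllRemaining() {
    return this.rowsAccepted >= this.pageSize;
  }
  // Called once per row: count it, and drop it if it overflows the page.
  public boolean filterRow() {
    this.rowsAccepted++;
    return this.rowsAccepted > this.pageSize;
  }
}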
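Because each region evaluates the filter with its own `rowsAccepted` counter, a scan spanning several regions can return more than `pageSize` rows in total, so the client should still enforce the final limit itself. The simulation below models that per-region behaviour in plain Java; `PageFilterSimulation` is a hypothetical name, and regions are just lists of row keys.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation of PageFilter-style counting with one counter per region.
final class PageFilterSimulation {
    // Mirrors the slide's filterAllRemaining()/filterRow() logic for one region.
    static final class PageCounter {
        private final long pageSize;
        private long rowsAccepted = 0;

        PageCounter(long pageSize) {
            this.pageSize = pageSize;
        }

        boolean filterAllRemaining() {
            return rowsAccepted >= pageSize;
        }

        boolean filterRow() {
            rowsAccepted++;
            return rowsAccepted > pageSize; // true means "drop this row"
        }
    }

    // Scans each region with a fresh counter and concatenates the rows returned.
    static List<String> scan(List<List<String>> regions, long pageSize) {
        List<String> returned = new ArrayList<>();
        for (List<String> regionRows : regions) {
            PageCounter counter = new PageCounter(pageSize); // state does not cross regions
            for (String row : regionRows) {
                if (counter.filterAllRemaining()) {
                    break; // this region is done
                }
                if (!counter.filterRow()) {
                    returned.add(row);
                }
            }
        }
        return returned;
    }
}
```

With two regions holding rows a, b, c and d, e, f and a page size of 2, the scan returns a, b, d, e: each region honours the limit locally, but the total is twice the page size.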
Coprocessors
o Extension points for HBase
  o Think Linux Kernel Module, not Stored Procedure
  o e.g., customize compactions, Table constraints
o Observers
  o pre- and post-execution logic
  o e.g., MasterObserver#preTruncateTable, RegionObserver#postScannerNext
o Endpoints
  o Cluster RPC extensions
  o e.g., RowCountEndpoint, BulkDeleteEndpoint
  
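public class RowCountEndpoint extends
    ExampleProtos.RowCountService {
  public void getRowCount(…) {
    Scan scan = new Scan();
    InternalScanner scanner =
        env.getRegion().getScanner(scan);
    …
    List<Cell> results = new ArrayList<>();
    long count = 0;
    boolean hasMore;
    do {
      // next() loads one row's cells into results and
      // returns whether more rows remain in this region
      hasMore = scanner.next(results);
      if (!results.isEmpty()) {
        count++;
      }
      results.clear();
    } while (hasMore);
    scanner.close();
    // return count
  }
}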
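Observer-style hooks can be sketched without any HBase dependency: pre-hooks run before an operation and may veto it (the table-constraints use case), post-hooks run after it. `ObserverSketch`, `TableObserver`, `prePut`, and `postPut` are hypothetical; the hook names echo the pre/post naming of RegionObserver, but these signatures are simplified and are not the coprocessor API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical observer chain; simplified, not the HBase coprocessor API.
final class ObserverSketch {
    interface TableObserver {
        // Runs before the put; return false to veto it.
        default boolean prePut(String row, String value) {
            return true;
        }

        // Runs after a successful put.
        default void postPut(String row, String value) {
        }
    }

    static final class Table {
        final Map<String, String> rows = new HashMap<>();
        final List<TableObserver> observers = new ArrayList<>();

        // Runs every pre-hook, then the put itself, then every post-hook.
        boolean put(String row, String value) {
            for (TableObserver o : observers) {
                if (!o.prePut(row, value)) {
                    return false; // vetoed before any state changed
                }
            }
            rows.put(row, value);
            for (TableObserver o : observers) {
                o.postPut(row, value);
            }
            return true;
        }
    }
}
```

A constraint observer that rejects empty values, combined with an auditing observer that records each written row, shows both halves: the rejected put never reaches the table or the audit log.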
  

SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphNeo4j
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 

Recently uploaded (20)

"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDG
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 

Apache Big Data EU 2015 - HBase

  • 1. State of the Database
    @HBase
    http://hbase.apache.org
    2015-09-28
    Nick Dimiduk (@xefyr)
    http://n10k.com
    #apachebigdata
  • 2. Agenda
    o State of the Project
    o State of the Software
    o State of the Ecosystem
    o Latest Releases
    o Bonus Content!
    o Q & A
  • 3. STATE OF THE PROJECT
    Who we are, what we do, why we do it
  • 4. Project: Vision
    Simple, steady, and powerful: “A first class high performance horizontally scalable data storage engine for Big Data, suitable as the store of record for mission critical data.”
  • 5. Project: Usage
    o Data access for medium- and high-scale services
      o Hundreds of enterprises and startups
      o Some of the largest Internet companies in the world
    o Running major production workloads since 2011
    o Use-cases
      o messaging, security, measurement/“IoT”, collaboration, digital media, digital advertising, telecommunications, computational biology, clinical informatics/healthcare, insurance
  • 7. Project: Goals
    o Availability: Always more, always faster
    o Stability and operability
    o Scaling up, scaling down
    o Up-to-date with “commodity” hardware
    o Multi-tenancy
    o Diversity of ecosystem
  • 9. STATE OF THE SOFTWARE
    Regarding the codebase
  • 10. State of the Software
    o Mature codebase
      o 100+ contributors (40+ committers)
      o 1.1M lines of code (each active branch)
      o est. 1200+ human-years’ effort
    o Cluster sizes from 10 to 1000+ machines
      o that we know of!
    o Runs on HDFS, MapR, Gluster, GPFS, Lustre
    o HBase as a Service
      o AWS/EMR, HDInsight, Qubole, Google (sort-of)
  • 12. Software: Semantic Versioning
    MAJOR.MINOR.PATCH[-identifier]
    o Client/Server wire compatibility
    o Server/Server feature compatibility
    o API compliance guarantees
    o ABI compliance guarantees
    http://hbase.apache.org/book.html#hbase.versioning
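The versioning slide names the scheme but not how a version string is read. The sketch below, under our own assumptions, parses MAJOR.MINOR.PATCH[-identifier] and encodes only the plain semantic-versioning intent that a newer PATCH on the same MAJOR.MINOR line is a drop-in upgrade. `HBaseVersion`, `parse`, and `isPatchUpgrade` are hypothetical names; HBase's actual compatibility matrix is in the reference guide linked on the slide, and per the 1.0 release slide, patch releases did not yet fully follow the spec.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: parses and compares MAJOR.MINOR.PATCH[-identifier] strings. */
final class HBaseVersion implements Comparable<HBaseVersion> {
    private static final Pattern FORMAT =
        Pattern.compile("(\\d+)\\.(\\d+)\\.(\\d+)(?:-([0-9A-Za-z.-]+))?");

    final int major;
    final int minor;
    final int patch;
    final String identifier; // null when absent, e.g. for "1.1.2"

    private HBaseVersion(int major, int minor, int patch, String identifier) {
        this.major = major;
        this.minor = minor;
        this.patch = patch;
        this.identifier = identifier;
    }

    static HBaseVersion parse(String s) {
        Matcher m = FORMAT.matcher(s);
        if (!m.matches()) {
            throw new IllegalArgumentException("not MAJOR.MINOR.PATCH[-identifier]: " + s);
        }
        return new HBaseVersion(Integer.parseInt(m.group(1)), Integer.parseInt(m.group(2)),
            Integer.parseInt(m.group(3)), m.group(4));
    }

    /** Semver intent: a newer PATCH on the same MAJOR.MINOR line is a drop-in upgrade. */
    static boolean isPatchUpgrade(HBaseVersion from, HBaseVersion to) {
        return from.major == to.major && from.minor == to.minor && to.patch > from.patch;
    }

    @Override
    public int compareTo(HBaseVersion o) {
        if (major != o.major) return Integer.compare(major, o.major);
        if (minor != o.minor) return Integer.compare(minor, o.minor);
        return Integer.compare(patch, o.patch); // identifier ignored in this sketch
    }
}
```

For example, 1.1.0 to 1.1.2 counts as a patch upgrade, while 1.1.2 to 1.2.0 is a MINOR bump and falls outside this check.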
  • 13. Software: Active Development
    o Smaller regions, more regions
      o Less write amplification
      o 1M+ region clusters
    o Stability
      o ProcedureV2
      o Assignment improvements/stability
    o Backup, restore tools
      o Built on snapshots, easier operations
  • 14. Software: Active Development
    o Adaptation: Workloads
      o HBase as Medium Object Store (MOB)
    o Tunable Availability
      o Region replicas
      o TIMELINE consistency
    o Coprocessor API stability
    o Less GC, more RAM (off-heap)
  • 15. Software: Active Development
    o Multi-tenancy
      o Table groups
      o Quotas
      o Priorities
    o Improved machine utilization
      o More RAM (100’s of GB)
      o IOPS
      o Better concurrency
  • 16. STATE OF THE ECOSYSTEM
    The whole enchilada
  • 17. State of the Ecosystem
    o OpenTSDB
    o Transaction Managers
      o Themis, Tephra, Omid2, LeanXcale
    o Graph engines
      o Titan, Giraph, Zen, S2Graph
    o Myriad SQL engines
    o Other Hadoop components
    o Google Cloud Bigtable
  • 19. Ecosystem: Hadoop Components
    o YARN-2928 Application Timeline Service
    o HIVE-9452 HBase to store Hive metadata
    o AMBARI-5707 Ambari Metrics System
  • 20. LATEST RELEASES
    Come and get it!
  • 21. Release: 0.94
    o Last (final?) release: 0.94.27, 2015-03-26
    o “ancient history”
      o No new deployments
      o Existing users highly encouraged to upgrade
      o Requires downtime to upgrade
    😫 😡 (╯°□°)╯︵ ┻━┻
  • 22. Release: 0.98
    o Last release: 0.98.14, 2015-08-31
    o “legacy”
      o Most production deploys (probably)
      o Largest production clusters (probably)
      o New features back-ported when possible
  • 23. Release: 1.x
    o Last release: 1.1.2, 2015-09-01
    o “stable”
      o Production deploys moving here
      o Active development
      o Rolling upgrade from 0.98.x
    😄 😍 ヽ(´ー`)ノ
  • 24. Release: 1.0
    o Released 1.0.0, 2015-02-24
    o Adopting semantic versioning
      o Patch releases don’t quite follow spec yet
    o Client / Server API cleanup
      o Interfaces, builder pattern, @InterfaceAudience
    o Region Replicas
      o Trade consistency and resources for availability
    github.com/ndimiduk/hbase-1.0-api-examples
  • 25. Region Replicas
    o Multiple Region Servers host each region
      o Primary + N read replicas (usually N=2)
      o Primary is authority on reads and writes
      o Replicas tail and replicate edits, offer TIMELINE view
    o Client’s choice
      o Read primary only for “classic” strong consistency
      o Fan-out reads for faster, potentially TIMELINE results
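To make the primary/replica split concrete, here is a single-process simulation, not HBase code: writes land only on the primary, a `replicate()` call stands in for edit shipping, and a TIMELINE read may fall back to a secondary whose answer is flagged stale. The class and its members are hypothetical; in the real 1.x client the choice is made per request with `Get#setConsistency(Consistency.TIMELINE)`, and `Result#isStale()` reports whether a secondary answered. The `primaryResponsive` flag is a simplification of the client's timeout-then-fan-out behavior.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical single-process simulation of primary + read-replica reads. */
final class ReplicaReadSim {
    enum Consistency { STRONG, TIMELINE }

    /** A read result; stale is true when a secondary replica answered. */
    static final class ReadResult {
        final String value;
        final boolean stale;

        ReadResult(String value, boolean stale) {
            this.value = value;
            this.stale = stale;
        }
    }

    private final Map<String, String> primary = new HashMap<>();
    private final List<Map<String, String>> replicas = new ArrayList<>();
    boolean primaryResponsive = true; // false simulates a slow or failed primary

    ReplicaReadSim(int replicaCount) {
        for (int i = 0; i < replicaCount; i++) {
            replicas.add(new HashMap<>());
        }
    }

    /** Writes go to the primary only. */
    void put(String row, String value) {
        primary.put(row, value);
    }

    /** Stand-in for edit shipping: every replica catches up to the primary. */
    void replicate() {
        for (Map<String, String> r : replicas) {
            r.putAll(primary);
        }
    }

    ReadResult get(String row, Consistency c) {
        if (primaryResponsive) {
            return new ReadResult(primary.get(row), false);
        }
        if (c == Consistency.STRONG || replicas.isEmpty()) {
            throw new IllegalStateException("primary unavailable and no allowed fallback");
        }
        // TIMELINE: accept a secondary's answer, which may lag the primary.
        return new ReadResult(replicas.get(0).get(row), true);
    }
}
```

A STRONG read of a row written after the last `replicate()` sees the new value. With the primary marked unresponsive, a TIMELINE read returns the older replica value with `stale == true`, and a STRONG read fails instead of degrading.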
  • 26. Release: 1.1
    o Released 1.1.0, 2015-05-15
    o Async RPC client
    o Scanner improvements
      o RPC chunking, heartbeat messages, API
    o RPC throttling
      o quotas per user, table, namespace
    o Compaction throttling, monitoring
    o ProcedureV2
      o Improved operational reliability
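RPC throttling caps request rates per user, table, or namespace. A minimal fixed-window limiter keyed by strings such as `user:alice` or `table:t1` illustrates the idea; `QuotaThrottle` and its key format are hypothetical, and HBase's own rate limiters and quota settings are not reproduced here. Time is passed in explicitly so the behavior is deterministic.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical fixed-window request throttle keyed by user, table, or namespace. */
final class QuotaThrottle {
    private final long limitPerWindow;
    private final long windowMillis;
    // key -> {windowStartMillis, requestsUsedInWindow}
    private final Map<String, long[]> state = new HashMap<>();

    QuotaThrottle(long limitPerWindow, long windowMillis) {
        this.limitPerWindow = limitPerWindow;
        this.windowMillis = windowMillis;
    }

    /** Returns true if the request is admitted, false if it should be throttled. */
    synchronized boolean tryAcquire(String key, long nowMillis) {
        long[] s = state.computeIfAbsent(key, k -> new long[] {nowMillis, 0});
        if (nowMillis - s[0] >= windowMillis) { // window expired: start a fresh one
            s[0] = nowMillis;
            s[1] = 0;
        }
        if (s[1] >= limitPerWindow) {
            return false;
        }
        s[1]++;
        return true;
    }
}
```

A request that falls under several quotas would call `tryAcquire` once per applicable key. A fixed window is the simplest choice; it permits bursts at window boundaries, which a token bucket would smooth out.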
  • 27. ProcedureV2
    o Distributed, fault-tolerant operations
      o Multiple steps on multiple machines
      o Roll-back in case of failure
    o Coordination of long-running procedures
      o Compactions, splits, &c.
    o Progress tracking
      o Notifications across multiple machines
      o Current status inquiries
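The core idea on this slide, multiple steps with roll-back on failure, can be sketched in a single JVM: run steps in order, record progress, and on failure undo the completed steps newest-first. `StepProcedure` and its `Step` interface are hypothetical; the real framework also persists state so a procedure can resume after a Master restart, which this sketch does not attempt (the `log` list only stands in for that store).

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical multi-step procedure with reverse-order rollback. */
final class StepProcedure {
    interface Step {
        String name();
        void execute() throws Exception;
        void rollback();
    }

    private final List<Step> steps = new ArrayList<>();
    final List<String> log = new ArrayList<>(); // stand-in for a persisted progress log

    StepProcedure add(Step s) {
        steps.add(s);
        return this;
    }

    /** Runs steps in order; on failure undoes completed steps newest-first. */
    boolean run() {
        Deque<Step> done = new ArrayDeque<>();
        for (Step s : steps) {
            try {
                s.execute();
                done.push(s);
                log.add("done:" + s.name());
            } catch (Exception e) {
                log.add("failed:" + s.name());
                while (!done.isEmpty()) {
                    Step undo = done.pop();
                    undo.rollback();
                    log.add("rolledback:" + undo.name());
                }
                return false;
            }
        }
        return true;
    }

    /** Convenience factory for steps built from lambdas. */
    static Step step(String name, Runnable exec, Runnable undo) {
        return new Step() {
            public String name() { return name; }
            public void execute() { exec.run(); }
            public void rollback() { undo.run(); }
        };
    }
}
```

With steps createDir, updateMeta, and a failing assign, `run()` returns false and the undo actions execute as updateMeta, then createDir.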
  • 28. Branch-1.2
    o Next up in 1.x line
    o Java 8 support
    o Native checksums
    o SyncTable
    o Flush-per-store
    o ProcV2 all the things!
    o (More) Compaction improvements
    o Region normalizer
  • 29. Region Normalizer
    o Anti-entropy for region size
      o Converge towards uniform size
      o Complements balancer working toward uniform distribution
    o Managed by Master, runs in the background (like balancer)
    o Pluggable normalization strategies (“simple” default)
    o Use-cases
      o Merge away regions from expired timeseries data
      o Smooth uneven bulk loads
      o Correct operator initial split guesses
      o Ease upgrades from ancient versions (0.92/1g vs. today/20g)
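A “simple” normalization strategy can be illustrated as a planner over region sizes listed in key order: split any region much larger than the average, and merge adjacent pairs whose combined size is still below the average. The thresholds (more than 2× the average to split, combined size below the average to merge) are assumptions chosen for illustration, in the spirit of the default strategy rather than a copy of it; `NormalizerSketch.plan` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical size-based normalization planner; array order is region key order. */
final class NormalizerSketch {
    static List<String> plan(long[] regionSizesMb) {
        List<String> plans = new ArrayList<>();
        if (regionSizesMb.length < 2) {
            return plans; // nothing to normalize against
        }
        long total = 0;
        for (long s : regionSizesMb) {
            total += s;
        }
        double avg = (double) total / regionSizesMb.length;
        for (int i = 0; i < regionSizesMb.length; i++) {
            if (regionSizesMb[i] > 2 * avg) {
                plans.add("SPLIT " + i); // assumed threshold: more than twice the average
            } else if (i + 1 < regionSizesMb.length
                    && regionSizesMb[i] + regionSizesMb[i + 1] < avg) {
                plans.add("MERGE " + i + "+" + (i + 1)); // adjacent pair still below average
                i++; // region i+1 is consumed by this merge
            }
        }
        return plans;
    }
}
```

For sizes {1, 1, 10, 40, 8} MB the average is 12 MB, so the sketch plans `MERGE 0+1` (two nearly empty regions, such as those left behind by expired time-series data) and `SPLIT 3`.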
  • 30. Thanks!
    @HBase
    http://hbase.apache.org
    2015-09-28
    Nick Dimiduk (@xefyr)
    http://n10k.com
    #apachebigdata
  • 31. BONUS CONTENT!
    Ask and you shall receive
  • 32. Agenda
    o Replication
    o Filters
    o Coprocessors
  • 33. Replication
    o Keep data synchronized between clusters
      o Supports multiple destinations
      o Cyclical graphs supported
      o Configurable at Column Family granularity
    o Uses WAL shipping to propagate data
    o Replication state, status stored in ZooKeeper
    o General purpose interface for asynchronously shipping edits from a cluster
      o Other HBase clusters, Region Replicas, Solr/Elasticsearch
    hbase.apache.org/book.html#_cluster_replication
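“Cyclical graphs supported” implies that an edit must not circulate forever. A common way to achieve this, and the approach this sketch assumes, is to tag each edit with the IDs of clusters that have already applied it and skip any peer already in that set. The sketch also models per-column-family scope as a set of replicated families. `ReplicationSim` and its members are hypothetical, and shipping is synchronous here, whereas the slide describes asynchronous WAL shipping.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical simulation of edit shipping with loop prevention via cluster IDs. */
final class ReplicationSim {
    /** One WAL edit plus the IDs of clusters that have already applied it. */
    static final class Edit {
        final String row;
        final String family;
        final String value;
        final Set<String> seenBy = new LinkedHashSet<>();

        Edit(String row, String family, String value) {
            this.row = row;
            this.family = family;
            this.value = value;
        }

        /** What a peer receives: the same edit and the same cluster-ID history. */
        Edit copy() {
            Edit e = new Edit(row, family, value);
            e.seenBy.addAll(seenBy);
            return e;
        }
    }

    static final class Cluster {
        final String id;
        final Set<String> replicatedFamilies; // stand-in for per-family replication scope
        final List<Cluster> peers = new ArrayList<>();
        final Map<String, String> data = new HashMap<>();
        int applied = 0;

        Cluster(String id, Set<String> replicatedFamilies) {
            this.id = id;
            this.replicatedFamilies = replicatedFamilies;
        }

        /** A local client write. */
        void write(String row, String family, String value) {
            apply(new Edit(row, family, value));
        }

        /** Apply locally, then ship to every peer that has not yet seen the edit. */
        private void apply(Edit e) {
            data.put(e.row + "/" + e.family, e.value);
            applied++;
            e.seenBy.add(id);
            if (!replicatedFamilies.contains(e.family)) {
                return; // family not scoped for replication
            }
            for (Cluster peer : peers) {
                if (!e.seenBy.contains(peer.id)) {
                    peer.apply(e.copy()); // synchronous here; asynchronous in practice
                }
            }
        }
    }
}
```

In a three-cluster ring A → B → C → A, a write at A is applied exactly once on each cluster, and a write to a family outside `replicatedFamilies` stays local. Because each peer receives its own copy of the edit's history, edits arriving over two different paths in a full mesh are not de-duplicated by this sketch.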
  • 35. Filters
    o Additional criteria applied to reads
      o Use in conjunction with specifying start, end rows, &c.
    o Run on the Region Servers
      o Included in GET, SCAN request
    o Explicitly exclude data based on criteria
      o e.g., value >= 10
    o Implicitly exclude data by hinting seeks
      o INCLUDE_AND_NEXT_COL, NEXT_ROW, SEEK_NEXT_USING_HINT
    o Operate on data read from BlockCache
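The difference between explicit exclusion and seek hints is easiest to see in a toy scan loop. In this sketch a filter returns a code per cell, and the loop either keeps the cell, moves on, or jumps ahead, counting how many cells the filter actually examined. The enum mirrors a subset of the names in HBase's `Filter.ReturnCode`, but the loop, `Cell`, `CellFilter`, and `nextRowHint` are hypothetical; in HBase, skipped cells may never be read from disk at all, which this in-memory list cannot show.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical mini scan loop showing how filter return codes steer a scan. */
final class FilterScanSketch {
    enum ReturnCode { INCLUDE, SKIP, INCLUDE_AND_NEXT_COL, NEXT_ROW, SEEK_NEXT_USING_HINT }

    static final class Cell {
        final String row;
        final String qualifier;
        final long ts;
        final int value;

        Cell(String row, String qualifier, long ts, int value) {
            this.row = row;
            this.qualifier = qualifier;
            this.ts = ts;
            this.value = value;
        }
    }

    interface CellFilter {
        ReturnCode filterCell(Cell c);

        /** Consulted only after SEEK_NEXT_USING_HINT: the row key to jump to. */
        default String nextRowHint(Cell c) { return null; }
    }

    int examined = 0; // cells actually passed to the filter

    /** Cells must be sorted by row, then qualifier, then newest timestamp first. */
    List<Cell> scan(List<Cell> cells, CellFilter f) {
        List<Cell> out = new ArrayList<>();
        int i = 0;
        while (i < cells.size()) {
            Cell c = cells.get(i);
            examined++;
            switch (f.filterCell(c)) {
                case INCLUDE:
                    out.add(c);
                    i++;
                    break;
                case SKIP:
                    i++;
                    break;
                case INCLUDE_AND_NEXT_COL: // keep this version, skip older versions
                    out.add(c);
                    i = skipWhile(cells, i, x -> x.row.equals(c.row) && x.qualifier.equals(c.qualifier));
                    break;
                case NEXT_ROW: // nothing else in this row can match
                    i = skipWhile(cells, i, x -> x.row.equals(c.row));
                    break;
                case SEEK_NEXT_USING_HINT: {
                    String hint = f.nextRowHint(c);
                    // Always make progress, even if the hint is missing or not ahead.
                    i = hint == null ? i + 1
                        : Math.max(i + 1, skipWhile(cells, i, x -> x.row.compareTo(hint) < 0));
                    break;
                }
            }
        }
        return out;
    }

    private static int skipWhile(List<Cell> cells, int i, Predicate<Cell> p) {
        while (i < cells.size() && p.test(cells.get(i))) {
            i++;
        }
        return i;
    }
}
```

With a `value >= 10` filter every cell is examined; a filter that answers SEEK_NEXT_USING_HINT with hint row `d` jumps past rows a through c after examining a single cell.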
  • 36. Filters
    o 30+ Filters included in distribution
    o Mini-language for use in Thrift, REST
      o "(PrefixFilter ('row2') AND (QualifierFilter (>=, 'binary:xyz'))) AND (TimestampsFilter ( 123, 456))"
      o hbase.apache.org/book.html#thrift.filter_language
    o Simple interface, implement your own!
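  • 37.
    import org.apache.hadoop.hbase.Cell;
    import org.apache.hadoop.hbase.filter.FilterBase;

    public class PageFilter extends FilterBase {
      private long pageSize;
      private long rowsAccepted = 0;

      public PageFilter(long pageSize) {…}

      // Never exclude a row based on its key alone.
      public boolean filterRowKey(Cell c) { return false; }
      public ReturnCode filterKeyValue(Cell c) { return ReturnCode.INCLUDE; }
      // Stop scanning this region once the page is full.
      public boolean filterAllRemaining() { return this.rowsAccepted >= this.pageSize; }
      // Count each row; returning true excludes it.
      public boolean filterRow() {
        this.rowsAccepted++;
        return this.rowsAccepted > this.pageSize;
      }
    }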
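Because filters run on the Region Servers, a page-size filter like the one on this slide limits rows per region, not per scan: each region keeps its own counter. The simulation below, using hypothetical names and the slide's counting logic, shows a scan over two regions with a page size of 2 returning four rows unless the client applies its own limit. The HBase Javadoc for PageFilter carries a similar caveat about results gathered from several region servers.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical simulation: a per-region page limit versus a client-side limit. */
final class PagingSketch {
    /** Mirrors the slide's counter: accept rows until pageSize have been accepted. */
    static final class RegionPageFilter {
        private final long pageSize;
        private long rowsAccepted = 0;

        RegionPageFilter(long pageSize) {
            this.pageSize = pageSize;
        }

        boolean filterAllRemaining() { return rowsAccepted >= pageSize; }

        boolean filterRow() { // true means exclude the row
            rowsAccepted++;
            return rowsAccepted > pageSize;
        }
    }

    /** Each region gets its own filter instance, as on separate Region Servers. */
    static List<String> scan(List<List<String>> regions, long pageSize, long clientLimit) {
        List<String> out = new ArrayList<>();
        for (List<String> regionRows : regions) {
            RegionPageFilter f = new RegionPageFilter(pageSize);
            for (String row : regionRows) {
                if (f.filterAllRemaining()) {
                    break; // this region stops early
                }
                if (!f.filterRow()) {
                    out.add(row);
                }
            }
        }
        // Per-region limits add up across regions, so the client must still trim.
        return out.size() > clientLimit ? new ArrayList<>(out.subList(0, (int) clientLimit)) : out;
    }
}
```

Passing `Long.MAX_VALUE` as the client limit exposes the raw per-region behavior (rows r1, r2, r4, r5 for two three-row regions); a client limit of 2 trims the result back to one page.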
  • 38. Coprocessors
    o Extension points for HBase
      o Think Linux Kernel Module, not Stored Procedure
      o e.g., customize compactions, table constraints
    o Observers
      o pre- and post-execution logic
      o e.g., MasterObserver#preTruncateTable, RegionObserver#postScannerNext
    o Endpoints
      o Cluster RPC extensions
      o e.g., RowCountEndpoint, BulkDeleteEndpoint
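Observers wrap an operation with pre- and post-execution logic. The sketch below models that shape for a put: every registered observer's `prePut` runs first and any of them can veto the write (as a table constraint might), then the `postPut` hooks run after it is applied. `ObserverSketch` and `PutObserver` are hypothetical; real RegionObservers receive an `ObserverContext` and can skip the default action via `bypass()`, which the boolean return here only approximates.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of observer-style pre/post hooks around a put. */
final class ObserverSketch {
    interface PutObserver {
        /** Pre-execution hook: return false to veto the put. */
        default boolean prePut(String row, String value) { return true; }

        /** Post-execution hook: runs only after the put has been applied. */
        default void postPut(String row, String value) { }
    }

    final Map<String, String> store = new HashMap<>();
    private final List<PutObserver> observers = new ArrayList<>();

    void register(PutObserver o) {
        observers.add(o);
    }

    /** Returns true if the put was applied. */
    boolean put(String row, String value) {
        for (PutObserver o : observers) {
            if (!o.prePut(row, value)) {
                return false; // any observer may veto; later hooks do not run
            }
        }
        store.put(row, value);
        for (PutObserver o : observers) {
            o.postPut(row, value);
        }
        return true;
    }
}
```

With a non-empty-value constraint and an audit hook registered, a put of `v` to r1 succeeds and is audited, while a put of an empty value to r2 is vetoed and never reaches the store.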
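  • 39.
    public class RowCountEndpoint extends ExampleProtos.RowCountService {
      public void getRowCount(…) {
        Scan scan = new Scan();
        InternalScanner scanner = env.getRegion().getScanner(scan);
        …
        long count = 0;
        List<Cell> results = new ArrayList<>();
        boolean hasMore;
        do {
          hasMore = scanner.next(results); // one row's cells per call
          if (!results.isEmpty()) count++;
          results.clear();
        } while (hasMore);
        // return count
      }
    }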
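An endpoint such as RowCountEndpoint computes a partial result inside each region, and the client combines the partials. This sketch simulates both halves in one process: a per-region count of distinct rows over sorted cell row keys (several cells can share a row), and a client that calls every region in parallel and sums the results. All names are hypothetical; in the HBase 1.x client the per-region fan-out is handled by `Table#coprocessorService(...)`, which returns one result per region.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical endpoint-style aggregate: count per region, sum at the client. */
final class RowCountSketch {
    /** "Server side": count distinct rows among one region's sorted cell row keys. */
    static long countRegionRows(List<String> sortedCellRows) {
        long count = 0;
        String lastRow = null;
        for (String row : sortedCellRows) {
            if (!row.equals(lastRow)) { // a new row starts here
                count++;
                lastRow = row;
            }
        }
        return count;
    }

    /** "Client side": one call per region in parallel, then sum the partial counts. */
    static long countTableRows(List<List<String>> regions) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, regions.size()));
        try {
            List<Future<Long>> partials = new ArrayList<>();
            for (List<String> region : regions) {
                partials.add(pool.submit(() -> countRegionRows(region)));
            }
            long total = 0;
            for (Future<Long> f : partials) {
                total += f.get();
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for regions", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a region call failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

Regions holding cell rows [a, a, b], [c], and [] yield partial counts 2, 1, and 0, for a table total of 3.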