SlideShare a Scribd company logo
1 of 55
High Scale Relational Storage at
Salesforce built with Apache HBase
and Apache Phoenix
​ Andrew Purtell
​ Architect, Cloud Storage
apurtell@salesforce.com
​ @akpurtell
​ 
v3
​ Safe harbor statement under the Private Securities Litigation Reform Act of 1995:
​ This presentation may contain forward-looking statements that involve risks, uncertainties, and assumptions. If any such uncertainties
materialize or if any of the assumptions proves incorrect, the results of salesforce.com, inc. could differ materially from the results expressed
or implied by the forward-looking statements we make. All statements other than statements of historical fact could be deemed forward-
looking, including any projections of product or service availability, subscriber growth, earnings, revenues, or other financial items and any
statements regarding strategies or plans of management for future operations, statements of belief, any statements concerning new,
planned, or upgraded services or technology developments and customer contracts or use of our services.
​ The risks and uncertainties referred to above include – but are not limited to – risks associated with developing and delivering new
functionality for our service, new products and services, our new business model, our past operating losses, possible fluctuations in our
operating results and rate of growth, interruptions or delays in our Web hosting, breach of our security measures, the outcome of any
litigation, risks associated with completed and any possible mergers and acquisitions, the immature market in which we operate, our
relatively limited operating history, our ability to expand, retain, and motivate our employees and manage our growth, new releases of our
service and successful customer deployment, our limited history reselling non-salesforce.com products, and utilization and selling to larger
enterprise customers. Further information on potential factors that could affect the financial results of salesforce.com, inc. is included in our
annual report on Form 10-K for the most recent fiscal year and in our quarterly report on Form 10-Q for the most recent fiscal quarter.
These documents and others containing important disclosures are available on the SEC Filings section of the Investor Information section
of our Web site.
​ Any unreleased services or features referenced in this or other presentations, press releases or public statements are not currently available
and may not be delivered on time or at all. Customers who purchase our services should make the purchase decisions based upon features
that are currently available. Salesforce.com, inc. assumes no obligation and does not intend to update these forward-looking statements.
Safe Harbor
​ Architect, Cloud Storage at Salesforce.com
•  Data Platform Team
​ 
Open Source Contributor, since 2007
•  Committer, PMC, and Project Chair, Apache HBase
•  Committer and PMC, Apache Phoenix
•  Committer, PMC, and Project Chair, Apache Bigtop
•  Member, Apache Software Foundation
Distributed Systems Nerd, since 1997
whoami
​ Motivation
​ Open Source and the Data Platform Team
​ What are Apache HBase and Apache Phoenix?
​ HBase@Salesforce
•  The View from 30Kft
•  Keeping Up Appearances
•  Engineering The Whole Stack Holistically
​ Q&A
Agenda
Motivation
The Data Management Challenge
Scale data requires a single platform for analysis
(“data gravity”)
Salesforce already manages over 100B records of
customer data today
More than 3 billion transactions per day on the
platform
Exponential growth rate
​ Batch and stream compute data locality
requirements
​ Service continuity requirements
​ Systems of record provide reliable, highly
available, and secure data storage
•  SObjects: Traditional Salesforce platform objects
•  Salesforce Files: Blob storage
•  BigObjects: Scale out storage for immutable data
BigObjects
Systems of Record
•  Transactional data. Rows can be added and
updated
•  Example: Accounts, Contacts, Custom Objects
SObjects
BigObjects
Salesforce
Files
•  New Object type for immutable data.
•  Optimized for large volumes of data
•  Example: event data, purchase history, product
usage data
•  Blob storage for semi- or un-structured data
•  Example: CSV extracts from external systems,
weblogs, monitoring logs
Platform Connect
External Object
•  New proxy object connected to an external
oData source
Data Pipelines
Snapshot data from
SObjects / External Objects
Manipulate and analyze data
sets
•  Transformations
•  Joins
•  Calculations and enrichment
Data
Snapshot
External
Objects
Salesforce
Files
BigObjects
Data
Pipeline
Data Pipelines
Snapshot data from
SObjects / External Objects
Manipulate and analyze data
sets
•  Transformations
•  Joins
•  Calculations and enrichment
Apache Pig + Hadoop for
processing framework and
scripting language
Apache HBase for BigObjects
persistence
Apache Phoenix for indexing
and relational access
Data
Snapshot
External
Objects
Salesforce
Files
BigObjects
Data
Pipeline
Data Management Services
Customer 360
Enrich customer
profile w/ data from
external systems
Audit & TrackingData Retention Data Quality Connected
Products
Track what users are
doing on platform
Compliance
Keep data for audit
purposes
Maintain scalability
of operational
systems
Data enrichment
Data integration
Data cleansing
Event stream archive
Offline analysis and
data mining
Salesforce Shield
Persist the complete field
change history for up to ten
years with Field Audit Trail
Persist user event capture for
Event Monitoring and Audit
Support scale out batch and
stream compute for Backup,
DR, Threat Detection, etc.
Infrastructure Services
Network Services
Application Services
Secure
Data
Centers
Backup and
Disaster
Recovery
HTTPS
Encryption
Penetration
Testing
Advanced
Threat Detection
Identity &
Single Sign On
Two Factor
Authentication
User Roles &
Permissions
Field & Row
Level Security
Secure
Firewalls
Real-time
replication
Password
Policies
Third Party
Certifications
IP Login
Restrictions
Customer
Audits
Salesforce Shield
Platform
Encryption
Event
Monitoring
Field Audit Trail
Open Source and the Data Platform Team
The things we build are better
•  Building software with Open Source in mind is inherently beneficial (encourages loose coupling,
cohesiveness, quality)
•  Positive external pressure to make the right software engineering decisions
​ The things we don't build are better
•  The only thing better than writing a great component is not writing it
•  Smart engineers avoid “Not-Invented-Here” syndrome
​ We are happier
•  Open Source extends pride in ownership and visibility beyond the company walls
•  We attract better coders who gravitate towards companies that do open source work
​ We make the world a better place
How Open Source Helps Salesforce
​ No forking
•  Fork defined as: a departure from the open source repository significant enough to prevent us from
contributing patches back or applying patches from the upstream open source repository
​ Internal repositories as change buffer with local release numbering
•  Local repos for HBase, Phoenix, all dependencies (Hadoop, ZooKeeper, etc.)
•  Fast forwarded to new upstream releases after consideration
•  Updates to local repos are all manual by design
•  Hadoop is a special snowflake
​ Change the upstream repositories first
•  Almost all changes begin as patches developed using upstream repositories
•  Only critical bug fixes are an exception, or where we require a change locally meantime while working with
a slow moving upstream community
How We Contribute To And Use Open Source
​ Distributed systems engineers
​ Storage systems architects
​ Open source contributors
•  Project Chairs: Apache HBase, Phoenix, Bigtop
•  Committers and PMC: Apache HBase, Phoenix, Pig, Bigtop, Incubator
•  Mentors: Apache Phoenix, NiFi, Trafodion
•  Hundreds of commits per year
Who We Are
Apache HBase and Apache Phoenix
A new scale out relational storage option
​ A high performance horizontally scalable datastore engine for Big Data suitable as the store of
record for mission critical data
Apache HBase
​ An emerging platform for scale out relational datastores
Apache HBase
Apache Kylin
Apache HBase
​ A founding member of the
Hadoop pantheon
•  Introduced as a Hadoop ‘contrib’
module in 2007
​ Tablespaces
​ Not like a spreadsheet, a “sparse, consistent, distributed, multi-dimensional, sorted map”
HBase Data Model
HBase Scalability
RegionServers
Table A
Table B
Splits
Assignments
Regions
How is HBase Different from a RDBMS?
RDBMS HBase
Data layout Row oriented Column oriented
Transactions Multi-row ACID
Single row or adjacent row
groups only
Query language SQL None (API access)
Joins Yes No
Indexes On arbitrary columns Single row index only
Max data size Terabytes Petabytes*
R/W throughput limits
1000s of operations per
second
Millions of operations per
second*
* - No architectural upper bound on data size or aggregate throughput
​ 1969: CODASYL (network database)
​ 1979: First commercial SQL RDBMs
​ 1990: Transaction processing on SQL now popular
​ 1993: Multidimensional databases
​ 1996: Enterprise Data Warehouses
​ 2006: Hadoop and other “big data” technologies
​ 2008: NoSQL
​ 2011: SQL on Hadoop
​ 2014: Interactive analytics on Hadoop and NoSQL with SQL
​ Why?
SQL: In and Out of Fashion
From “SQL On Everything, In Memory” by Julian Hyde, Strata NYC 2014
​ Implementing structured queries well is hard
•  Systems cannot just “run the query” as written
•  Relational systems require the algebraic operators, a query planner, an optimizer, metadata, statistics, etc.
​ However, the result is very useful to non-technical users
•  Dumb queries (e.g. tool generated) can still get high performance
•  Adding new algorithms or reorganizations of physical data layouts or migrations from one data store to
another are transparent
​ What about blending the scale and performance of non-relational scale out stores with the ease
of use of SQL?
SQL: In and Out of Fashion
​ A relational system built on HBase
•  Reintroduces the familiar declarative SQL interface to data
•  Reintroduces typed data and query optimizations possible with it
•  Secondary indexes, joins, query optimization, statistics, …
•  Integrates with just about everything as a JDBC data source
​ With some new advantages
•  Dynamic columns extend schema at runtime
•  Schema is versioned – for free by HBase – allowing flashback queries using prior versions of metadata
​ And new challenges
•  Distributed transactions are hard; a work in progress building on Tephra*
“Putting the SQL back into NoSQL”
Apache Phoenix
* - Tephra open source transaction engine: http://blog.cask.co/2014/07/meet-tephra-an-open-source-transaction-engine-2/
Apache Phoenix
Supported?
CREATE / DROP / ALTER TABLE Yes
UPSERT / DELETE Yes
SELECT Yes
WHERE / HAVING Yes
GROUP BY / ORDER BY Yes
LIMIT Yes
JOIN Yes, with limitations
Views Yes
Secondary indexes Yes
Statistics and query optimization Yes
Transactions No, work in progress
Phoenix Integration With HBase
RegionServers
Regions
JDBC
driver
Coprocessors
(HBase server
side extensions)
SELECT * FROM users WHERE
id = ‘xxxabc’
​ Phoenix maps the HBase data model to the relational world
Phoenix Data Model
​ Phoenix maps the HBase data model to the relational world
Phoenix Data Model
Phoenix table
​ Phoenix maps the HBase data model to the relational world
Phoenix Data Model
Phoenix table
Primary key constraint
​ Phoenix maps the HBase data model to the relational world
Phoenix Data Model
Phoenix table
Primary key constraint Columns
​ Phoenix maps the HBase data model to the relational world
Phoenix Data Model
Phoenix table
Primary key constraint Columns
Multiple
versions
​ Dynamic columns
•  Specify a subset of columns at CREATE time; the remainder can be optionally specified at query time
•  Surfaces HBase’s schema flexibility
Extras
CREATE TABLE “t” (
K VARCHAR PRIMARY KEY,
“f1”.”col1” VARCHAR);
SELECT * FROM “t” (“f1”.”col2” VARCHAR);
Dynamic column
​ Native multitenancy
•  Multitenant isolation via a combination of multitenant tables and tenant-specific connections
•  Tenant-specific connections only access data that belongs to the tenant
•  Tenants can create their own schema addendums (views, columns, indexes)
Extras
CREATE TABLE event (
tenant_id VARCHAR,
type CHAR(1),
event_id BIGINT,
…
CONSTRAINT pk PRIMARY KEY (tenant_id, type, event_id))
MULTI_TENANT=true;
First PK column identifies tenant ID
Tenant-specific connection
DriverManager.connect(“jdbc:phoenix:localhost;tenantId=me”);
​ Three index types
•  Mutable indexes
•  Global mutable indexes
​ Server side intercepts primary table updates, builds and writes entries to secondary index tables
​ For read heavy, low write uses cases
•  Local mutable indexes
​ Index data and primary data are placed together (index in shadow column family)
​ For write heavy, space constrained use cases
•  Immutable indexes
​ Managed entirely by the client, writes scale best
​ For use cases where rows are immutable after write
Secondary Indexes
​ …
•  Global mutable indexes – covered indexes
•  Data in covered columns will be copied into the index
•  This allows a global index to be used more frequently, as a global index will only be used if all columns
referenced in the query are contained by it
Note: Rebuilds must currently be done using offline tooling (MapReduce)
Secondary Indexes
CREATE TABLE t (k VARCHAR PRIMARY KEY,
v1 VARCHAR, v2 INTEGER);
CREATE INDEX i ON t (v1) INCLUDE (v2);
Covered column
​ Standard join syntax is supported, with some limitations
Joins
•  Only equality (=) is supported in joining
conditions
•  No restriction on other predicates in the ON
clause
•  Some queries may exceed server side
resources and require rewrites by hand
•  Work in progress
​ Client side rewriting
•  Parallel scanning with final client side merge sort
•  RPC batching
•  Use secondary indexes if available
•  Rewrites for multitenant tables
​ Statistics
•  Use guideposts to increase intra-region
parallelism
​ Current Work-in-Progress
•  Integration with Apache Calcite
•  ~120 rewrite rules
Query Optimization
​ Server side push down
•  Filters
•  Skip scans
•  Partial aggregation
•  TopN
•  Hash joins
Query Optimization
Example query plan for a 32 region table
Query Optimization
With a secondary index on RESPONSE_TIME
HBase @ Salesforce.com
​ Durability
​ Consistency
​ Modeling constructs
​ Atomicity
​ Concurrency
​ Queryability
​ Schema mutability
​ Data size
​ Data portability and backup
​ Scalability
​ Reliability
​ Latency
​ Recoverability
​ Data Integrity
​ Required Downtime
​ Manageability
​ Cost
​ Support
​ Comprehensibility
​ License
​ Community
So Why HBase?
The View from 30Kft
​ Durability
​ Consistency
​ Modeling constructs
​ Atomicity
​ Concurrency
​ Queryability
​ Schema mutability
​ Data size
​ Data portability and backup
​ Scalability
​ Reliability
​ Latency
​ Recoverability
​ Data Integrity
​ Required Downtime
​ Manageability
​ Cost
​ Support
​ Comprehensibility
​ License
​ Community
So Why HBase?
The View from 30Kft
​ Scalability
​ Consistency
​ Operations
​ License
​ Community
​ Durability
​ Consistency
​ Modeling constructs
​ Atomicity
​ Concurrency
​ Queryability
​ Schema mutability
​ Data size
​ Data portability and backup
​ Scalability
​ Reliability
​ Latency
​ Recoverability
​ Data Integrity
​ Required Downtime
​ Manageability
​ Cost
​ Support
​ Comprehensibility
​ License
​ Community
So Why HBase?
The View from 30Kft
​ Scalability
​ Consistency
​ Operations
​ License
​ Community
​ HBase
​ Instances are identical collections of hardware and software that support a discrete subset of our
customers
​ An instance group is a collection of instances, in one location, configured as a single resource
pool and failure domain
​ For every instance or instance group, there is an identical mirror available in another datacenter
​ HBase is currently operating both in instance and instance group configurations
•  We started with smaller configurations (~10s of nodes)
•  Recently we have begun migrating to larger configurations (~100s of nodes)
•  Total fleet size is a “few thousand” servers
Instances and instance groups
The View from 30Kft
​ Apache Phoenix was motivated by the impedance mismatch between the HBase API and the
expectations of platform developers
Keeping Up Appearances
SELECT * FROM foo WHERE bar > 30 HTable t = new HTable(“foo”);
RegionScanner s = t.getScanner(new Scan(…,
new ValueFilter(CompareOp.GT,
new CustomTypedComparator(30)), …));
while ((Result r = s.next()) != null) {
// blah blah blah Java Java Java
}
s.close();
t.close();
vs.
Engineering The Whole Stack Holistically
Engineering The Whole Stack Holistically
Kernel tunables, filesystem, krb, nscd
Engineering The Whole Stack Holistically
Kernel tunables, filesystem, krb, nscd
GC tuning, local patches
Engineering The Whole Stack Holistically
GC tuning, local patches
Kernel tunables, filesystem, krb, nscd
Multi-standby (HDFS-6440), fsync,
tuning, custom UGI
Engineering The Whole Stack Holistically
GC tuning, local patches
Kernel tunables, filesystem, krb, nscd
Many bug fixes and enhancements,
tuning
Multi-standby (HDFS-6440), fsync,
tuning, custom UGI
Engineering The Whole Stack Holistically
GC tuning, local patches
Kernel tunables, filesystem, krb, nscd
Many bug fixes and enhancements,
tuning
Relational engine built on HBase
Multi-standby (HDFS-6440), fsync,
tuning, custom UGI
Engineering The Whole Stack Holistically
GC tuning, local patches
Kernel tunables, filesystem, krb, nscd
Many bug fixes and enhancements,
tuning
Relational engine built on HBase
Data access layer for internal users
Multi-standby (HDFS-6440), fsync,
tuning, custom UGI
Engineering The Whole Stack Holistically
GC tuning, local patches
Kernel tunables, filesystem, krb, nscd
Multi-standby (HDFS-6440), fsync,
tuning, custom UGI
Many bug fixes and enhancements,
tuning
Relational engine built on HBase
Data access layer for internal users
Circuit breaker, relogin thread
Thank you
Q&A

More Related Content

What's hot

Capture the Streams of Database Changes
Capture the Streams of Database ChangesCapture the Streams of Database Changes
Capture the Streams of Database Changesconfluent
 
Temporal-Joins in Kafka Streams and ksqlDB | Matthias Sax, Confluent
Temporal-Joins in Kafka Streams and ksqlDB | Matthias Sax, ConfluentTemporal-Joins in Kafka Streams and ksqlDB | Matthias Sax, Confluent
Temporal-Joins in Kafka Streams and ksqlDB | Matthias Sax, ConfluentHostedbyConfluent
 
HBaseCon 2015: HBase Performance Tuning @ Salesforce
HBaseCon 2015: HBase Performance Tuning @ SalesforceHBaseCon 2015: HBase Performance Tuning @ Salesforce
HBaseCon 2015: HBase Performance Tuning @ SalesforceHBaseCon
 
Managing 2000 Node Cluster with Ambari
Managing 2000 Node Cluster with AmbariManaging 2000 Node Cluster with Ambari
Managing 2000 Node Cluster with AmbariDataWorks Summit
 
PayPal merchant ecosystem using Apache Spark, Hive, Druid, and HBase
PayPal merchant ecosystem using Apache Spark, Hive, Druid, and HBase PayPal merchant ecosystem using Apache Spark, Hive, Druid, and HBase
PayPal merchant ecosystem using Apache Spark, Hive, Druid, and HBase DataWorks Summit
 
Kafka Connect - debezium
Kafka Connect - debeziumKafka Connect - debezium
Kafka Connect - debeziumKasun Don
 
[Practical] Functional Programming in Rails
[Practical] Functional Programming in Rails[Practical] Functional Programming in Rails
[Practical] Functional Programming in RailsGilbert B Garza
 
Kafka Streams: What it is, and how to use it?
Kafka Streams: What it is, and how to use it?Kafka Streams: What it is, and how to use it?
Kafka Streams: What it is, and how to use it?confluent
 
Event Streaming with Kafka Streams and Spring Cloud Stream | Soby Chacko, VMware
Event Streaming with Kafka Streams and Spring Cloud Stream | Soby Chacko, VMwareEvent Streaming with Kafka Streams and Spring Cloud Stream | Soby Chacko, VMware
Event Streaming with Kafka Streams and Spring Cloud Stream | Soby Chacko, VMwareHostedbyConfluent
 
Performance Tuning RocksDB for Kafka Streams’ State Stores
Performance Tuning RocksDB for Kafka Streams’ State StoresPerformance Tuning RocksDB for Kafka Streams’ State Stores
Performance Tuning RocksDB for Kafka Streams’ State Storesconfluent
 
Salesforce Multitenant Architecture: How We Do the Magic We Do
Salesforce Multitenant Architecture: How We Do the Magic We DoSalesforce Multitenant Architecture: How We Do the Magic We Do
Salesforce Multitenant Architecture: How We Do the Magic We DoSalesforce Developers
 
微博cache设计谈
微博cache设计谈微博cache设计谈
微博cache设计谈Tim Y
 
Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...Flink Forward
 
Apache Kafka Streams + Machine Learning / Deep Learning
Apache Kafka Streams + Machine Learning / Deep LearningApache Kafka Streams + Machine Learning / Deep Learning
Apache Kafka Streams + Machine Learning / Deep LearningKai Wähner
 
How to understand and analyze Apache Hive query execution plan for performanc...
How to understand and analyze Apache Hive query execution plan for performanc...How to understand and analyze Apache Hive query execution plan for performanc...
How to understand and analyze Apache Hive query execution plan for performanc...DataWorks Summit/Hadoop Summit
 
Best Practices with Apex in 2022.pdf
Best Practices with Apex in 2022.pdfBest Practices with Apex in 2022.pdf
Best Practices with Apex in 2022.pdfMohith Shrivastava
 
OracleStore: A Highly Performant RawStore Implementation for Hive Metastore
OracleStore: A Highly Performant RawStore Implementation for Hive MetastoreOracleStore: A Highly Performant RawStore Implementation for Hive Metastore
OracleStore: A Highly Performant RawStore Implementation for Hive MetastoreDataWorks Summit
 
Performance Tuning RocksDB for Kafka Streams' State Stores (Dhruba Borthakur,...
Performance Tuning RocksDB for Kafka Streams' State Stores (Dhruba Borthakur,...Performance Tuning RocksDB for Kafka Streams' State Stores (Dhruba Borthakur,...
Performance Tuning RocksDB for Kafka Streams' State Stores (Dhruba Borthakur,...confluent
 
Code Once Use Often with Declarative Data Pipelines
Code Once Use Often with Declarative Data PipelinesCode Once Use Often with Declarative Data Pipelines
Code Once Use Often with Declarative Data PipelinesDatabricks
 
From Zero to Hero with Kafka Connect
From Zero to Hero with Kafka ConnectFrom Zero to Hero with Kafka Connect
From Zero to Hero with Kafka Connectconfluent
 

What's hot (20)

Capture the Streams of Database Changes
Capture the Streams of Database ChangesCapture the Streams of Database Changes
Capture the Streams of Database Changes
 
Temporal-Joins in Kafka Streams and ksqlDB | Matthias Sax, Confluent
Temporal-Joins in Kafka Streams and ksqlDB | Matthias Sax, ConfluentTemporal-Joins in Kafka Streams and ksqlDB | Matthias Sax, Confluent
Temporal-Joins in Kafka Streams and ksqlDB | Matthias Sax, Confluent
 
HBaseCon 2015: HBase Performance Tuning @ Salesforce
HBaseCon 2015: HBase Performance Tuning @ SalesforceHBaseCon 2015: HBase Performance Tuning @ Salesforce
HBaseCon 2015: HBase Performance Tuning @ Salesforce
 
Managing 2000 Node Cluster with Ambari
Managing 2000 Node Cluster with AmbariManaging 2000 Node Cluster with Ambari
Managing 2000 Node Cluster with Ambari
 
PayPal merchant ecosystem using Apache Spark, Hive, Druid, and HBase
PayPal merchant ecosystem using Apache Spark, Hive, Druid, and HBase PayPal merchant ecosystem using Apache Spark, Hive, Druid, and HBase
PayPal merchant ecosystem using Apache Spark, Hive, Druid, and HBase
 
Kafka Connect - debezium
Kafka Connect - debeziumKafka Connect - debezium
Kafka Connect - debezium
 
[Practical] Functional Programming in Rails
[Practical] Functional Programming in Rails[Practical] Functional Programming in Rails
[Practical] Functional Programming in Rails
 
Kafka Streams: What it is, and how to use it?
Kafka Streams: What it is, and how to use it?Kafka Streams: What it is, and how to use it?
Kafka Streams: What it is, and how to use it?
 
Event Streaming with Kafka Streams and Spring Cloud Stream | Soby Chacko, VMware
Event Streaming with Kafka Streams and Spring Cloud Stream | Soby Chacko, VMwareEvent Streaming with Kafka Streams and Spring Cloud Stream | Soby Chacko, VMware
Event Streaming with Kafka Streams and Spring Cloud Stream | Soby Chacko, VMware
 
Performance Tuning RocksDB for Kafka Streams’ State Stores
Performance Tuning RocksDB for Kafka Streams’ State StoresPerformance Tuning RocksDB for Kafka Streams’ State Stores
Performance Tuning RocksDB for Kafka Streams’ State Stores
 
Salesforce Multitenant Architecture: How We Do the Magic We Do
Salesforce Multitenant Architecture: How We Do the Magic We DoSalesforce Multitenant Architecture: How We Do the Magic We Do
Salesforce Multitenant Architecture: How We Do the Magic We Do
 
微博cache设计谈
微博cache设计谈微博cache设计谈
微博cache设计谈
 
Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...
 
Apache Kafka Streams + Machine Learning / Deep Learning
Apache Kafka Streams + Machine Learning / Deep LearningApache Kafka Streams + Machine Learning / Deep Learning
Apache Kafka Streams + Machine Learning / Deep Learning
 
How to understand and analyze Apache Hive query execution plan for performanc...
How to understand and analyze Apache Hive query execution plan for performanc...How to understand and analyze Apache Hive query execution plan for performanc...
How to understand and analyze Apache Hive query execution plan for performanc...
 
Best Practices with Apex in 2022.pdf
Best Practices with Apex in 2022.pdfBest Practices with Apex in 2022.pdf
Best Practices with Apex in 2022.pdf
 
OracleStore: A Highly Performant RawStore Implementation for Hive Metastore
OracleStore: A Highly Performant RawStore Implementation for Hive MetastoreOracleStore: A Highly Performant RawStore Implementation for Hive Metastore
OracleStore: A Highly Performant RawStore Implementation for Hive Metastore
 
Performance Tuning RocksDB for Kafka Streams' State Stores (Dhruba Borthakur,...
Performance Tuning RocksDB for Kafka Streams' State Stores (Dhruba Borthakur,...Performance Tuning RocksDB for Kafka Streams' State Stores (Dhruba Borthakur,...
Performance Tuning RocksDB for Kafka Streams' State Stores (Dhruba Borthakur,...
 
Code Once Use Often with Declarative Data Pipelines
Code Once Use Often with Declarative Data PipelinesCode Once Use Often with Declarative Data Pipelines
Code Once Use Often with Declarative Data Pipelines
 
From Zero to Hero with Kafka Connect
From Zero to Hero with Kafka ConnectFrom Zero to Hero with Kafka Connect
From Zero to Hero with Kafka Connect
 

Viewers also liked

Apache Phoenix with Actor Model (Akka.io) for real-time Big Data Programming...
Apache Phoenix with Actor Model (Akka.io)  for real-time Big Data Programming...Apache Phoenix with Actor Model (Akka.io)  for real-time Big Data Programming...
Apache Phoenix with Actor Model (Akka.io) for real-time Big Data Programming...Trieu Nguyen
 
Salesforce External Objects for Big Data
Salesforce External Objects for Big DataSalesforce External Objects for Big Data
Salesforce External Objects for Big DataSumit Sarkar
 
HBase Secondary Indexing
HBase Secondary Indexing HBase Secondary Indexing
HBase Secondary Indexing Gino McCarty
 
How Salesforce.com R&D Delivers the Cloud
How Salesforce.com R&D Delivers the CloudHow Salesforce.com R&D Delivers the Cloud
How Salesforce.com R&D Delivers the CloudSalesforce Developers
 
Salesforce's Trusted Enterprise Platform and Apache Phoenix
Salesforce's Trusted Enterprise Platform and Apache PhoenixSalesforce's Trusted Enterprise Platform and Apache Phoenix
Salesforce's Trusted Enterprise Platform and Apache PhoenixSalesforce Engineering
 
Basics of cloud computing & salesforce.com
Basics of cloud computing & salesforce.comBasics of cloud computing & salesforce.com
Basics of cloud computing & salesforce.comDeepu S Nath
 
Time-Series Apache HBase
Time-Series Apache HBaseTime-Series Apache HBase
Time-Series Apache HBaseHBaseCon
 
Apache phoenix: Past, Present and Future of SQL over HBAse
Apache phoenix: Past, Present and Future of SQL over HBAseApache phoenix: Past, Present and Future of SQL over HBAse
Apache phoenix: Past, Present and Future of SQL over HBAseenissoz
 
Hortonworks Technical Workshop: HBase and Apache Phoenix
Hortonworks Technical Workshop: HBase and Apache Phoenix Hortonworks Technical Workshop: HBase and Apache Phoenix
Hortonworks Technical Workshop: HBase and Apache Phoenix Hortonworks
 
Apache Phoenix: Use Cases and New Features
Apache Phoenix: Use Cases and New FeaturesApache Phoenix: Use Cases and New Features
Apache Phoenix: Use Cases and New FeaturesHBaseCon
 
Salesforce Jumpstart: Getting Started as a Consulting Partner
Salesforce Jumpstart: Getting Started as a Consulting PartnerSalesforce Jumpstart: Getting Started as a Consulting Partner
Salesforce Jumpstart: Getting Started as a Consulting PartnerSalesforce Partners
 
How Salesforce CRM works & who should use it?
How Salesforce CRM works & who should use it?How Salesforce CRM works & who should use it?
How Salesforce CRM works & who should use it?Suyati Technologies
 
Salesforce Health Cloud and Partners: Improving the Care Experience
Salesforce Health Cloud and Partners: Improving the Care ExperienceSalesforce Health Cloud and Partners: Improving the Care Experience
Salesforce Health Cloud and Partners: Improving the Care ExperienceDreamforce
 
Amazon EMR Deep Dive & Best Practices
Amazon EMR Deep Dive & Best PracticesAmazon EMR Deep Dive & Best Practices
Amazon EMR Deep Dive & Best PracticesAmazon Web Services
 
Salesforce.com Overview
Salesforce.com OverviewSalesforce.com Overview
Salesforce.com OverviewEdureka!
 

Viewers also liked (19)

Apache Phoenix with Actor Model (Akka.io) for real-time Big Data Programming...
Apache Phoenix with Actor Model (Akka.io)  for real-time Big Data Programming...Apache Phoenix with Actor Model (Akka.io)  for real-time Big Data Programming...
Apache Phoenix with Actor Model (Akka.io) for real-time Big Data Programming...
 
Salesforce External Objects for Big Data
Salesforce External Objects for Big DataSalesforce External Objects for Big Data
Salesforce External Objects for Big Data
 
HBase Secondary Indexing
HBase Secondary Indexing HBase Secondary Indexing
HBase Secondary Indexing
 
How Salesforce.com R&D Delivers the Cloud
How Salesforce.com R&D Delivers the CloudHow Salesforce.com R&D Delivers the Cloud
How Salesforce.com R&D Delivers the Cloud
 
Salesforce's Trusted Enterprise Platform and Apache Phoenix
Salesforce's Trusted Enterprise Platform and Apache PhoenixSalesforce's Trusted Enterprise Platform and Apache Phoenix
Salesforce's Trusted Enterprise Platform and Apache Phoenix
 
Introduction to Deep Learning
Introduction to Deep Learning Introduction to Deep Learning
Introduction to Deep Learning
 
Basics of cloud computing & salesforce.com
Basics of cloud computing & salesforce.comBasics of cloud computing & salesforce.com
Basics of cloud computing & salesforce.com
 
Time-Series Apache HBase
Time-Series Apache HBaseTime-Series Apache HBase
Time-Series Apache HBase
 
Apache phoenix: Past, Present and Future of SQL over HBAse
Apache phoenix: Past, Present and Future of SQL over HBAseApache phoenix: Past, Present and Future of SQL over HBAse
Apache phoenix: Past, Present and Future of SQL over HBAse
 
Apache Phoenix + Apache HBase
Apache Phoenix + Apache HBaseApache Phoenix + Apache HBase
Apache Phoenix + Apache HBase
 
Hortonworks Technical Workshop: HBase and Apache Phoenix
Hortonworks Technical Workshop: HBase and Apache Phoenix Hortonworks Technical Workshop: HBase and Apache Phoenix
Hortonworks Technical Workshop: HBase and Apache Phoenix
 
Generic Roadmap Slide
Generic Roadmap SlideGeneric Roadmap Slide
Generic Roadmap Slide
 
Apache Phoenix: Use Cases and New Features
Apache Phoenix: Use Cases and New FeaturesApache Phoenix: Use Cases and New Features
Apache Phoenix: Use Cases and New Features
 
Salesforce Jumpstart: Getting Started as a Consulting Partner
Salesforce Jumpstart: Getting Started as a Consulting PartnerSalesforce Jumpstart: Getting Started as a Consulting Partner
Salesforce Jumpstart: Getting Started as a Consulting Partner
 
How Salesforce CRM works & who should use it?
How Salesforce CRM works & who should use it?How Salesforce CRM works & who should use it?
How Salesforce CRM works & who should use it?
 
Salesforce CRM
Salesforce CRMSalesforce CRM
Salesforce CRM
 
Salesforce Health Cloud and Partners: Improving the Care Experience
Salesforce Health Cloud and Partners: Improving the Care ExperienceSalesforce Health Cloud and Partners: Improving the Care Experience
Salesforce Health Cloud and Partners: Improving the Care Experience
 
Amazon EMR Deep Dive & Best Practices
Amazon EMR Deep Dive & Best PracticesAmazon EMR Deep Dive & Best Practices
Amazon EMR Deep Dive & Best Practices
 
Salesforce.com Overview
Salesforce.com OverviewSalesforce.com Overview
Salesforce.com Overview
 

Similar to High Scale Relational Storage at Salesforce Built with Apache HBase and Apache Phoenix

How Open Source Embiggens Salesforce.com
How Open Source Embiggens Salesforce.comHow Open Source Embiggens Salesforce.com
How Open Source Embiggens Salesforce.comSalesforce Engineering
 
Understanding the Salesforce Architecture: How We Do the Magic We Do
Understanding the Salesforce Architecture: How We Do the Magic We DoUnderstanding the Salesforce Architecture: How We Do the Magic We Do
Understanding the Salesforce Architecture: How We Do the Magic We DoSalesforce Developers
 
Enterprise API New Features and Roadmap
Enterprise API New Features and RoadmapEnterprise API New Features and Roadmap
Enterprise API New Features and RoadmapSalesforce Developers
 
Heroku - developer playground
Heroku - developer playground Heroku - developer playground
Heroku - developer playground Troy Sellers
 
Our API Evolution: From Metadata to Tooling API for Building Incredible Apps
Our API Evolution: From Metadata to Tooling API for Building Incredible AppsOur API Evolution: From Metadata to Tooling API for Building Incredible Apps
Our API Evolution: From Metadata to Tooling API for Building Incredible AppsDreamforce
 
Avoid Growing Pains: Scale Your App for the Enterprise (October 14, 2014)
Avoid Growing Pains: Scale Your App for the Enterprise (October 14, 2014)Avoid Growing Pains: Scale Your App for the Enterprise (October 14, 2014)
Avoid Growing Pains: Scale Your App for the Enterprise (October 14, 2014)Salesforce Partners
 
Processing Big Data At-Scale in the App Cloud
Processing Big Data At-Scale in the App CloudProcessing Big Data At-Scale in the App Cloud
Processing Big Data At-Scale in the App CloudSalesforce Developers
 
Don’t Struggle with Complex and Rigid Data Migrations, Leverage API Wizard to...
Don’t Struggle with Complex and Rigid Data Migrations, Leverage API Wizard to...Don’t Struggle with Complex and Rigid Data Migrations, Leverage API Wizard to...
Don’t Struggle with Complex and Rigid Data Migrations, Leverage API Wizard to...Vineeth Mylapur
 
A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks ...
A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks ...A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks ...
A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks ...Hortonworks
 
Developing a Documentation Portal on Heroku
Developing a Documentation Portal on HerokuDeveloping a Documentation Portal on Heroku
Developing a Documentation Portal on HerokuSalesforce Developers
 
Manage Salesforce Like a Pro with Governance
Manage Salesforce Like a Pro with GovernanceManage Salesforce Like a Pro with Governance
Manage Salesforce Like a Pro with GovernanceSalesforce Admins
 
SAP S/4HANA cloud editions or On Prem? Demystifying the options and cost bene...
SAP S/4HANA cloud editions or On Prem? Demystifying the options and cost bene...SAP S/4HANA cloud editions or On Prem? Demystifying the options and cost bene...
SAP S/4HANA cloud editions or On Prem? Demystifying the options and cost bene...IBM
 
Integrating Salesforce with Microsoft Office through Add-ins
Integrating Salesforce with Microsoft Office through Add-insIntegrating Salesforce with Microsoft Office through Add-ins
Integrating Salesforce with Microsoft Office through Add-insSalesforce Developers
 
DRM Webinar Series, PART 3: Will DRM Integrate With Our Applications?
DRM Webinar Series, PART 3: Will DRM Integrate With Our Applications?DRM Webinar Series, PART 3: Will DRM Integrate With Our Applications?
DRM Webinar Series, PART 3: Will DRM Integrate With Our Applications?US-Analytics
 
Bringing the Power of Big Data Computation to Salesforce
Bringing the Power of Big Data Computation to SalesforceBringing the Power of Big Data Computation to Salesforce
Bringing the Power of Big Data Computation to SalesforceSalesforce Developers
 
Level Up – How to Achieve Hadoop Acceleration
Level Up – How to Achieve Hadoop AccelerationLevel Up – How to Achieve Hadoop Acceleration
Level Up – How to Achieve Hadoop AccelerationInside Analysis
 
Trafodion overview
Trafodion overviewTrafodion overview
Trafodion overviewRohit Jain
 
Cloud Integration with Database.com and Heroku
Cloud Integration with Database.com and HerokuCloud Integration with Database.com and Heroku
Cloud Integration with Database.com and HerokuSalesforce Developers
 
Analyze billions of records on Salesforce App Cloud with BigObject
Analyze billions of records on Salesforce App Cloud with BigObjectAnalyze billions of records on Salesforce App Cloud with BigObject
Analyze billions of records on Salesforce App Cloud with BigObjectSalesforce Developers
 

Similar to High Scale Relational Storage at Salesforce Built with Apache HBase and Apache Phoenix (20)

How Open Source Embiggens Salesforce.com
How Open Source Embiggens Salesforce.comHow Open Source Embiggens Salesforce.com
How Open Source Embiggens Salesforce.com
 
Understanding the Salesforce Architecture: How We Do the Magic We Do
Understanding the Salesforce Architecture: How We Do the Magic We DoUnderstanding the Salesforce Architecture: How We Do the Magic We Do
Understanding the Salesforce Architecture: How We Do the Magic We Do
 
Enterprise API New Features and Roadmap
Enterprise API New Features and RoadmapEnterprise API New Features and Roadmap
Enterprise API New Features and Roadmap
 
Heroku - developer playground
Heroku - developer playground Heroku - developer playground
Heroku - developer playground
 
Our API Evolution: From Metadata to Tooling API for Building Incredible Apps
Our API Evolution: From Metadata to Tooling API for Building Incredible AppsOur API Evolution: From Metadata to Tooling API for Building Incredible Apps
Our API Evolution: From Metadata to Tooling API for Building Incredible Apps
 
Avoid Growing Pains: Scale Your App for the Enterprise (October 14, 2014)
Avoid Growing Pains: Scale Your App for the Enterprise (October 14, 2014)Avoid Growing Pains: Scale Your App for the Enterprise (October 14, 2014)
Avoid Growing Pains: Scale Your App for the Enterprise (October 14, 2014)
 
Processing Big Data At-Scale in the App Cloud
Processing Big Data At-Scale in the App CloudProcessing Big Data At-Scale in the App Cloud
Processing Big Data At-Scale in the App Cloud
 
Don’t Struggle with Complex and Rigid Data Migrations, Leverage API Wizard to...
Don’t Struggle with Complex and Rigid Data Migrations, Leverage API Wizard to...Don’t Struggle with Complex and Rigid Data Migrations, Leverage API Wizard to...
Don’t Struggle with Complex and Rigid Data Migrations, Leverage API Wizard to...
 
A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks ...
A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks ...A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks ...
A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks ...
 
Developing a Documentation Portal on Heroku
Developing a Documentation Portal on HerokuDeveloping a Documentation Portal on Heroku
Developing a Documentation Portal on Heroku
 
Manage Salesforce Like a Pro with Governance
Manage Salesforce Like a Pro with GovernanceManage Salesforce Like a Pro with Governance
Manage Salesforce Like a Pro with Governance
 
Open Source at Salesforce.com
Open Source at Salesforce.comOpen Source at Salesforce.com
Open Source at Salesforce.com
 
SAP S/4HANA cloud editions or On Prem? Demystifying the options and cost bene...
SAP S/4HANA cloud editions or On Prem? Demystifying the options and cost bene...SAP S/4HANA cloud editions or On Prem? Demystifying the options and cost bene...
SAP S/4HANA cloud editions or On Prem? Demystifying the options and cost bene...
 
Integrating Salesforce with Microsoft Office through Add-ins
Integrating Salesforce with Microsoft Office through Add-insIntegrating Salesforce with Microsoft Office through Add-ins
Integrating Salesforce with Microsoft Office through Add-ins
 
DRM Webinar Series, PART 3: Will DRM Integrate With Our Applications?
DRM Webinar Series, PART 3: Will DRM Integrate With Our Applications?DRM Webinar Series, PART 3: Will DRM Integrate With Our Applications?
DRM Webinar Series, PART 3: Will DRM Integrate With Our Applications?
 
Bringing the Power of Big Data Computation to Salesforce
Bringing the Power of Big Data Computation to SalesforceBringing the Power of Big Data Computation to Salesforce
Bringing the Power of Big Data Computation to Salesforce
 
Level Up – How to Achieve Hadoop Acceleration
Level Up – How to Achieve Hadoop AccelerationLevel Up – How to Achieve Hadoop Acceleration
Level Up – How to Achieve Hadoop Acceleration
 
Trafodion overview
Trafodion overviewTrafodion overview
Trafodion overview
 
Cloud Integration with Database.com and Heroku
Cloud Integration with Database.com and HerokuCloud Integration with Database.com and Heroku
Cloud Integration with Database.com and Heroku
 
Analyze billions of records on Salesforce App Cloud with BigObject
Analyze billions of records on Salesforce App Cloud with BigObjectAnalyze billions of records on Salesforce App Cloud with BigObject
Analyze billions of records on Salesforce App Cloud with BigObject
 

More from Salesforce Engineering

Locker Service Ready Lightning Components With Webpack
Locker Service Ready Lightning Components With WebpackLocker Service Ready Lightning Components With Webpack
Locker Service Ready Lightning Components With WebpackSalesforce Engineering
 
Techniques to Effectively Monitor the Performance of Customers in the Cloud
Techniques to Effectively Monitor the Performance of Customers in the CloudTechniques to Effectively Monitor the Performance of Customers in the Cloud
Techniques to Effectively Monitor the Performance of Customers in the CloudSalesforce Engineering
 
Predictive System Performance Data Analysis
Predictive System Performance Data AnalysisPredictive System Performance Data Analysis
Predictive System Performance Data AnalysisSalesforce Engineering
 
Aspect Oriented Programming: Hidden Toolkit That You Already Have
Aspect Oriented Programming: Hidden Toolkit That You Already HaveAspect Oriented Programming: Hidden Toolkit That You Already Have
Aspect Oriented Programming: Hidden Toolkit That You Already HaveSalesforce Engineering
 
A Smarter Pig: Building a SQL interface to Pig using Apache Calcite
A Smarter Pig: Building a SQL interface to Pig using Apache CalciteA Smarter Pig: Building a SQL interface to Pig using Apache Calcite
A Smarter Pig: Building a SQL interface to Pig using Apache CalciteSalesforce Engineering
 
Implementing a Content Strategy Is Like Running 100 Miles
Implementing a Content Strategy Is Like Running 100 MilesImplementing a Content Strategy Is Like Running 100 Miles
Implementing a Content Strategy Is Like Running 100 MilesSalesforce Engineering
 
Salesforce Cloud Infrastructure and Challenges - A Brief Overview
Salesforce Cloud Infrastructure and Challenges - A Brief OverviewSalesforce Cloud Infrastructure and Challenges - A Brief Overview
Salesforce Cloud Infrastructure and Challenges - A Brief OverviewSalesforce Engineering
 
Global State Management of Micro Services
Global State Management of Micro ServicesGlobal State Management of Micro Services
Global State Management of Micro ServicesSalesforce Engineering
 

More from Salesforce Engineering (20)

Locker Service Ready Lightning Components With Webpack
Locker Service Ready Lightning Components With WebpackLocker Service Ready Lightning Components With Webpack
Locker Service Ready Lightning Components With Webpack
 
Scaling HBase for Big Data
Scaling HBase for Big DataScaling HBase for Big Data
Scaling HBase for Big Data
 
Techniques to Effectively Monitor the Performance of Customers in the Cloud
Techniques to Effectively Monitor the Performance of Customers in the CloudTechniques to Effectively Monitor the Performance of Customers in the Cloud
Techniques to Effectively Monitor the Performance of Customers in the Cloud
 
Predictive System Performance Data Analysis
Predictive System Performance Data AnalysisPredictive System Performance Data Analysis
Predictive System Performance Data Analysis
 
Apache HBase State of the Project
Apache HBase State of the ProjectApache HBase State of the Project
Apache HBase State of the Project
 
Hit the Trail with Trailhead
Hit the Trail with TrailheadHit the Trail with Trailhead
Hit the Trail with Trailhead
 
HBase/PHOENIX @ Scale
HBase/PHOENIX @ ScaleHBase/PHOENIX @ Scale
HBase/PHOENIX @ Scale
 
Scaling up data science applications
Scaling up data science applicationsScaling up data science applications
Scaling up data science applications
 
Containers and Security for DevOps
Containers and Security for DevOpsContainers and Security for DevOps
Containers and Security for DevOps
 
Aspect Oriented Programming: Hidden Toolkit That You Already Have
Aspect Oriented Programming: Hidden Toolkit That You Already HaveAspect Oriented Programming: Hidden Toolkit That You Already Have
Aspect Oriented Programming: Hidden Toolkit That You Already Have
 
Monitoring @ Scale in Salesforce
Monitoring @ Scale in SalesforceMonitoring @ Scale in Salesforce
Monitoring @ Scale in Salesforce
 
Performance Tuning with XHProf
Performance Tuning with XHProfPerformance Tuning with XHProf
Performance Tuning with XHProf
 
A Smarter Pig: Building a SQL interface to Pig using Apache Calcite
A Smarter Pig: Building a SQL interface to Pig using Apache CalciteA Smarter Pig: Building a SQL interface to Pig using Apache Calcite
A Smarter Pig: Building a SQL interface to Pig using Apache Calcite
 
Implementing a Content Strategy Is Like Running 100 Miles
Implementing a Content Strategy Is Like Running 100 MilesImplementing a Content Strategy Is Like Running 100 Miles
Implementing a Content Strategy Is Like Running 100 Miles
 
Salesforce Cloud Infrastructure and Challenges - A Brief Overview
Salesforce Cloud Infrastructure and Challenges - A Brief OverviewSalesforce Cloud Infrastructure and Challenges - A Brief Overview
Salesforce Cloud Infrastructure and Challenges - A Brief Overview
 
Koober Preduction IO Presentation
Koober Preduction IO PresentationKoober Preduction IO Presentation
Koober Preduction IO Presentation
 
Finding Security Issues Fast!
Finding Security Issues Fast!Finding Security Issues Fast!
Finding Security Issues Fast!
 
Microservices
MicroservicesMicroservices
Microservices
 
Global State Management of Micro Services
Global State Management of Micro ServicesGlobal State Management of Micro Services
Global State Management of Micro Services
 
The Future of Hbase
The Future of HbaseThe Future of Hbase
The Future of Hbase
 

Recently uploaded

Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 

Recently uploaded (20)

Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 

High Scale Relational Storage at Salesforce Built with Apache HBase and Apache Phoenix

  • 1. High Scale Relational Storage at Salesforce built with Apache HBase and Apache Phoenix ​ Andrew Purtell ​ Architect, Cloud Storage apurtell@salesforce.com ​ @akpurtell ​  v3
  • 2. ​ Safe harbor statement under the Private Securities Litigation Reform Act of 1995: ​ This presentation may contain forward-looking statements that involve risks, uncertainties, and assumptions. If any such uncertainties materialize or if any of the assumptions proves incorrect, the results of salesforce.com, inc. could differ materially from the results expressed or implied by the forward-looking statements we make. All statements other than statements of historical fact could be deemed forward- looking, including any projections of product or service availability, subscriber growth, earnings, revenues, or other financial items and any statements regarding strategies or plans of management for future operations, statements of belief, any statements concerning new, planned, or upgraded services or technology developments and customer contracts or use of our services. ​ The risks and uncertainties referred to above include – but are not limited to – risks associated with developing and delivering new functionality for our service, new products and services, our new business model, our past operating losses, possible fluctuations in our operating results and rate of growth, interruptions or delays in our Web hosting, breach of our security measures, the outcome of any litigation, risks associated with completed and any possible mergers and acquisitions, the immature market in which we operate, our relatively limited operating history, our ability to expand, retain, and motivate our employees and manage our growth, new releases of our service and successful customer deployment, our limited history reselling non-salesforce.com products, and utilization and selling to larger enterprise customers. Further information on potential factors that could affect the financial results of salesforce.com, inc. is included in our annual report on Form 10-K for the most recent fiscal year and in our quarterly report on Form 10-Q for the most recent fiscal quarter. These documents and others containing important disclosures are available on the SEC Filings section of the Investor Information section of our Web site. ​ Any unreleased services or features referenced in this or other presentations, press releases or public statements are not currently available and may not be delivered on time or at all. Customers who purchase our services should make the purchase decisions based upon features that are currently available. Salesforce.com, inc. assumes no obligation and does not intend to update these forward-looking statements. Safe Harbor
  • 3. ​ Architect, Cloud Storage at Salesforce.com •  Data Platform Team ​  Open Source Contributor, since 2007 •  Committer, PMC, and Project Chair, Apache HBase •  Committer and PMC, Apache Phoenix •  Committer, PMC, and Project Chair, Apache Bigtop •  Member, Apache Software Foundation Distributed Systems Nerd, since 1997 whoami
  • 4. ​ Motivation ​ Open Source and the Data Platform Team ​ What are Apache HBase and Apache Phoenix? ​ HBase@Salesforce •  The View from 30Kft •  Keeping Up Appearances •  Engineering The Whole Stack Holistically ​ Q&A Agenda
  • 6. The Data Management Challenge Scale data requires a single platform for analysis (“data gravity”) Salesforce already manages over 100B records of customer data today More than 3 billion transactions per day on the platform Exponential growth rate ​ Batch and stream compute data locality requirements ​ Service continuity requirements
  • 7. ​ Systems of record provide reliable, highly available, and secure data storage •  SObjects: Traditional Salesforce platform objects •  Salesforce Files: Blob storage •  BigObjects: Scale out storage for immutable data BigObjects Systems of Record •  Transactional data. Rows can be added and updated •  Example: Accounts, Contacts, Custom Objects SObjects BigObjects Salesforce Files •  New Object type for immutable data. •  Optimized for large volumes of data •  Example: event data, purchase history, product usage data •  Blob storage for semi- or un-structured data •  Example: CSV extracts from external systems, weblogs, monitoring logs Platform Connect External Object •  New proxy object connected to an external oData source
  • 8. Data Pipelines Snapshot data from SObjects / External Objects Manipulate and analyze data sets •  Transformations •  Joins •  Calculations and enrichment Data Snapshot External Objects Salesforce Files BigObjects Data Pipeline
  • 9. Data Pipelines Snapshot data from SObjects / External Objects Manipulate and analyze data sets •  Transformations •  Joins •  Calculations and enrichment Apache Pig + Hadoop for processing framework and scripting language Apache HBase for BigObjects persistence Apache Phoenix for indexing and relational access Data Snapshot External Objects Salesforce Files BigObjects Data Pipeline
  • 10. Data Management Services Customer 360 Enrich customer profile w/ data from external systems Audit & TrackingData Retention Data Quality Connected Products Track what users are doing on platform Compliance Keep data for audit purposes Maintain scalability of operational systems Data enrichment Data integration Data cleansing Event stream archive Offline analysis and data mining
  • 11. Salesforce Shield Persist the complete field change history for up to ten years with Field Audit Trail Persist user event capture for Event Monitoring and Audit Support scale out batch and stream compute for Backup, DR, Threat Detection, etc. Infrastructure Services Network Services Application Services Secure Data Centers Backup and Disaster Recovery HTTPS Encryption Penetration Testing Advanced Threat Detection Identity & Single Sign On Two Factor Authentication User Roles & Permissions Field & Row Level Security Secure Firewalls Real-time replication Password Policies Third Party Certifications IP Login Restrictions Customer Audits Salesforce Shield Platform Encryption Event Monitoring Field Audit Trail
  • 12. Open Source and the Data Platform Team
  • 13. The things we build are better •  Building software with Open Source in mind is inherently beneficial (encourages loose coupling, cohesiveness, quality) •  Positive external pressure to make the right software engineering decisions ​ The things we don't build are better •  The only thing better than writing a great component is not writing it •  Smart engineers avoid “Not-Invented-Here” syndrome ​ We are happier •  Open Source extends pride in ownership and visibility beyond the company walls •  We attract better coders who gravitate towards companies that do open source work ​ We make the world a better place How Open Source Helps Salesforce
  • 14. ​ No forking •  Fork defined as: a departure from the open source repository significant enough to prevent us from contributing patches back or applying patches from the upstream open source repository ​ Internal repositories as change buffer with local release numbering •  Local repos for HBase, Phoenix, all dependencies (Hadoop, ZooKeeper, etc.) •  Fast forwarded to new upstream releases after consideration •  Updates to local repos are all manual by design •  Hadoop is a special snowflake ​ Change the upstream repositories first •  Almost all changes begin as patches developed using upstream repositories •  Only critical bug fixes are an exception, or where we require a change locally meantime while working with a slow moving upstream community How We Contribute To And Use Open Source
  • 15. ​ Distributed systems engineers ​ Storage systems architects ​ Open source contributors •  Project Chairs: Apache HBase, Phoenix, Bigtop •  Committers and PMC: Apache HBase, Phoenix, Pig, Bigtop, Incubator •  Mentors: Apache Phoenix, NiFi, Trafodion •  Hundreds of commits per year Who We Are
  • 16. Apache HBase and Apache Phoenix A new scale out relational storage option
  • 17. ​ A high performance horizontally scalable datastore engine for Big Data suitable as the store of record for mission critical data Apache HBase
  • 18. ​ An emerging platform for scale out relational datastores Apache HBase Apache Kylin
  • 19. Apache HBase ​ A founding member of the Hadoop pantheon •  Introduced as a Hadoop ‘contrib’ module in 2007
  • 20. ​ Tablespaces ​ Not like a spreadsheet, a “sparse, consistent, distributed, multi-dimensional, sorted map” HBase Data Model
  • 21. HBase Scalability RegionServers Table A Table B Splits Assignments Regions
  • 22. How is HBase Different from a RDBMS? RDBMS HBase Data layout Row oriented Column oriented Transactions Multi-row ACID Single row or adjacent row groups only Query language SQL None (API access) Joins Yes No Indexes On arbitrary columns Single row index only Max data size Terabytes Petabytes* R/W throughput limits 1000s of operations per second Millions of operations per second* * - No architectural upper bound on data size or aggregate throughput
  • 23. ​ 1969: CODASYL (network database) ​ 1979: First commercial SQL RDBMs ​ 1990: Transaction processing on SQL now popular ​ 1993: Multidimensional databases ​ 1996: Enterprise Data Warehouses ​ 2006: Hadoop and other “big data” technologies ​ 2008: NoSQL ​ 2011: SQL on Hadoop ​ 2014: Interactive analytics on Hadoop and NoSQL with SQL ​ Why? SQL: In and Out of Fashion From “SQL On Everything, In Memory” by Julian Hyde, Strata NYC 2014
  • 24. ​ Implementing structured queries well is hard •  Systems cannot just “run the query” as written •  Relational systems require the algebraic operators, a query planner, an optimizer, metadata, statistics, etc. ​ However, the result is very useful to non-technical users •  Dumb queries (e.g. tool generated) can still get high performance •  Adding new algorithms or reorganizations of physical data layouts or migrations from one data store to another are transparent ​ What about blending the scale and performance of non-relational scale out stores with the ease of use of SQL? SQL: In and Out of Fashion
  • 25. ​ A relational system built on HBase •  Reintroduces the familiar declarative SQL interface to data •  Reintroduces typed data and query optimizations possible with it •  Secondary indexes, joins, query optimization, statistics, … •  Integrates with just about everything as a JDBC data source ​ With some new advantages •  Dynamic columns extend schema at runtime •  Schema is versioned – for free by HBase – allowing flashback queries using prior versions of metadata ​ And new challenges •  Distributed transactions are hard; a work in progress building on Tephra* “Putting the SQL back into NoSQL” Apache Phoenix * - Tephra open source transaction engine: http://blog.cask.co/2014/07/meet-tephra-an-open-source-transaction-engine-2/
  • 26. Apache Phoenix Supported? CREATE / DROP / ALTER TABLE Yes UPSERT / DELETE Yes SELECT Yes WHERE / HAVING Yes GROUP BY / ORDER BY Yes LIMIT Yes JOIN Yes, with limitations Views Yes Secondary indexes Yes Statistics and query optimization Yes Transactions No, work in progress
  • 27. Phoenix Integration With HBase RegionServers Regions JDBC driver Coprocessors (HBase server side extensions) SELECT * FROM users WHERE id = ‘xxxabc’
  • 28. ​ Phoenix maps the HBase data model to the relational world Phoenix Data Model
  • 29. ​ Phoenix maps the HBase data model to the relational world Phoenix Data Model Phoenix table
  • 30. ​ Phoenix maps the HBase data model to the relational world Phoenix Data Model Phoenix table Primary key constraint
  • 31. ​ Phoenix maps the HBase data model to the relational world Phoenix Data Model Phoenix table Primary key constraint Columns
  • 32. ​ Phoenix maps the HBase data model to the relational world Phoenix Data Model Phoenix table Primary key constraint Columns Multiple versions
  • 33. ​ Dynamic columns •  Specify a subset of columns at CREATE time; the remainder can be optionally specified at query time •  Surfaces HBase’s schema flexibility Extras CREATE TABLE “t” ( K VARCHAR PRIMARY KEY, “f1”.”col1” VARCHAR); SELECT * FROM “t” (“f1”.”col2” VARCHAR); Dynamic column
  • 34. ​ Native multitenancy •  Multitenant isolation via a combination of multitenant tables and tenant-specific connections •  Tenant-specific connections only access data that belongs to the tenant •  Tenants can create their own schema addendums (views, columns, indexes) Extras CREATE TABLE event ( tenant_id VARCHAR, type CHAR(1), event_id BIGINT, … CONSTRAINT pk PRIMARY KEY (tenant_id, type, event_id)) MULTI_TENANT=true; First PK column identifies tenant ID Tenant-specific connection DriverManager.connect(“jdbc:phoenix:localhost;tenantId=me”);
  • 35. ​ Three index types •  Mutable indexes •  Global mutable indexes ​ Server side intercepts primary table updates, builds and writes entries to secondary index tables ​ For read heavy, low write uses cases •  Local mutable indexes ​ Index data and primary data are placed together (index in shadow column family) ​ For write heavy, space constrained use cases •  Immutable indexes ​ Managed entirely by the client, writes scale best ​ For use cases where rows are immutable after write Secondary Indexes
  • 36. ​ … •  Global mutable indexes – covered indexes •  Data in covered columns will be copied into the index •  This allows a global index to be used more frequently, as a global index will only be used if all columns referenced in the query are contained by it Note: Rebuilds must currently be done using offline tooling (MapReduce) Secondary Indexes CREATE TABLE t (k VARCHAR PRIMARY KEY, v1 VARCHAR, v2 INTEGER); CREATE INDEX i ON t (v1) INCLUDE (v2); Covered column
  • 37. ​ Standard join syntax is supported, with some limitations Joins •  Only equality (=) is supported in joining conditions •  No restriction on other predicates in the ON clause •  Some queries may exceed server side resources and require rewrites by hand •  Work in progress
  • 38. ​ Client side rewriting •  Parallel scanning with final client side merge sort •  RPC batching •  Use secondary indexes if available •  Rewrites for multitenant tables ​ Statistics •  Use guideposts to increase intra-region parallelism ​ Current Work-in-Progress •  Integration with Apache Calcite •  ~120 rewrite rules Query Optimization ​ Server side push down •  Filters •  Skip scans •  Partial aggregation •  TopN •  Hash joins
  • 39. Query Optimization Example query plan for a 32 region table
  • 40. Query Optimization With a secondary index on RESPONSE_TIME
  • 42. ​ Durability ​ Consistency ​ Modeling constructs ​ Atomicity ​ Concurrency ​ Queryability ​ Schema mutability ​ Data size ​ Data portability and backup ​ Scalability ​ Reliability ​ Latency ​ Recoverability ​ Data Integrity ​ Required Downtime ​ Manageability ​ Cost ​ Support ​ Comprehensibility ​ License ​ Community So Why HBase? The View from 30Kft
  • 43. ​ Durability ​ Consistency ​ Modeling constructs ​ Atomicity ​ Concurrency ​ Queryability ​ Schema mutability ​ Data size ​ Data portability and backup ​ Scalability ​ Reliability ​ Latency ​ Recoverability ​ Data Integrity ​ Required Downtime ​ Manageability ​ Cost ​ Support ​ Comprehensibility ​ License ​ Community So Why HBase? The View from 30Kft ​ Scalability ​ Consistency ​ Operations ​ License ​ Community
  • 44. ​ Durability ​ Consistency ​ Modeling constructs ​ Atomicity ​ Concurrency ​ Queryability ​ Schema mutability ​ Data size ​ Data portability and backup ​ Scalability ​ Reliability ​ Latency ​ Recoverability ​ Data Integrity ​ Required Downtime ​ Manageability ​ Cost ​ Support ​ Comprehensibility ​ License ​ Community So Why HBase? The View from 30Kft ​ Scalability ​ Consistency ​ Operations ​ License ​ Community ​ HBase
  • 45. ​ Instances are identical collections of hardware and software that support a discrete subset of our customers ​ An instance group is a collection of instances, in one location, configured as a single resource pool and failure domain ​ For every instance or instance group, there is an identical mirror available in another datacenter ​ HBase is currently operating both in instance and instance group configurations •  We started with smaller configurations (~10s of nodes) •  Recently we have begun migrating to larger configurations (~100s of nodes) •  Total fleet size is a “few thousand” servers Instances and instance groups The View from 30Kft
  • 46. ​ Apache Phoenix was motivated by the impedance mismatch between the HBase API and the expectations of platform developers Keeping Up Appearances SELECT * FROM foo WHERE bar > 30 HTable t = new HTable(“foo”); RegionScanner s = t.getScanner(new Scan(…, new ValueFilter(CompareOp.GT, new CustomTypedComparator(30)), …)); while ((Result r = s.next()) != null) { // blah blah blah Java Java Java } s.close(); t.close(); vs.
  • 47. Engineering The Whole Stack Holistically
  • 48. Engineering The Whole Stack Holistically Kernel tunables, filesystem, krb, nscd
  • 49. Engineering The Whole Stack Holistically Kernel tunables, filesystem, krb, nscd GC tuning, local patches
  • 50. Engineering The Whole Stack Holistically GC tuning, local patches Kernel tunables, filesystem, krb, nscd Multi-standby (HDFS-6440), fsync, tuning, custom UGI
  • 51. Engineering The Whole Stack Holistically GC tuning, local patches Kernel tunables, filesystem, krb, nscd Many bug fixes and enhancements, tuning Multi-standby (HDFS-6440), fsync, tuning, custom UGI
  • 52. Engineering The Whole Stack Holistically GC tuning, local patches Kernel tunables, filesystem, krb, nscd Many bug fixes and enhancements, tuning Relational engine built on HBase Multi-standby (HDFS-6440), fsync, tuning, custom UGI
  • 53. Engineering The Whole Stack Holistically GC tuning, local patches Kernel tunables, filesystem, krb, nscd Many bug fixes and enhancements, tuning Relational engine built on HBase Data access layer for internal users Multi-standby (HDFS-6440), fsync, tuning, custom UGI
  • 54. Engineering The Whole Stack Holistically GC tuning, local patches Kernel tunables, filesystem, krb, nscd Multi-standby (HDFS-6440), fsync, tuning, custom UGI Many bug fixes and enhancements, tuning Relational engine built on HBase Data access layer for internal users Circuit breaker, relogin thread