HBase Data Types 
Nick Dimiduk, Hortonworks 
@xefyr n10k.com
Agenda 
• Motivations 
• Progress thus far 
• Future work 
• Examples 
• More Examples 
Licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License. 2014-11-18
Why introduce types? 
• Δ(SQL, byte[]): (╯°□°)╯︵ ┻━┻ 
• Rule of least surprise 
• Interoperability across tools 
• Distill best practices 
Considerations 
• Opt-in for current users 
• Easy transition for existing applications 
• Client-side only, mostly
– Filters, Split policies, Coprocessors, Block encoding 
• Avoid POJO constraints 
– No required base-class/interface 
– No magic (avoid ASM, ORM) 
• Non-Java clients 
• HBASE-8089 
Inspiration 
• Orderly 
• PostgreSQL / PostGIS 
• HBASE-7221 
• HBASE-7692 
Features: Encoding 
• Order preservation 
• Override direction (ASC/DSC) 
• Fixed, variable-width 
• Nullable
• Self-identifying 
• Efficient 
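To make order preservation and the ASC/DSC override concrete, here is a minimal Java sketch for one fixed-width type. It is not the OrderedBytes wire format, and the names OrderPreservingIntSketch, encode, decode and compareUnsigned are hypothetical. It assumes keys are compared as unsigned bytes from left to right, flips the sign bit of a big-endian int so negatives sort first, and inverts every byte to get descending order.

```java
final class OrderPreservingIntSketch {
    // Encodes a signed int so that unsigned, byte-wise comparison of the
    // results matches numeric order (or its reverse when ascending is false).
    static byte[] encode(int value, boolean ascending) {
        int flipped = value ^ 0x80000000; // flip the sign bit so negatives sort first
        byte[] out = {
            (byte) (flipped >>> 24), (byte) (flipped >>> 16),
            (byte) (flipped >>> 8), (byte) flipped
        };
        if (!ascending) {
            for (int i = 0; i < out.length; i++) {
                out[i] = (byte) ~out[i]; // invert every byte to reverse the sort order
            }
        }
        return out;
    }

    // Reverses encode(): undo the optional inversion, then the sign-bit flip.
    static int decode(byte[] encoded, boolean ascending) {
        int bits = 0;
        for (byte b : encoded) {
            bits = (bits << 8) | ((ascending ? b : ~b) & 0xFF);
        }
        return bits ^ 0x80000000;
    }

    // Unsigned lexicographic comparison, assumed here to be how keys are ordered.
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        int[] samples = { -5, 0, 3, Integer.MIN_VALUE, Integer.MAX_VALUE };
        for (int v : samples) {
            System.out.println(v + " round-trips to " + decode(encode(v, true), true));
        }
    }
}
```

A plain two's-complement encoding would sort negative values after positive ones under unsigned comparison, which is why the sign bit is flipped before writing.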
Features: API 
• Complex type encoding 
– Compound rowkey pattern 
– Order preservation 
– Nullable fields 
• Runtime metadata 
• User-extensible 
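The compound rowkey pattern can be sketched in plain Java as follows. This is an illustration under stated assumptions, not the Struct implementation: CompoundKeySketch and encodeKey are hypothetical names, the text field is assumed to contain no NUL character, and keys are assumed to be compared as unsigned bytes. A variable-length field is closed with a 0x00 terminator so that a shorter string sorts before any longer string sharing its prefix, and a fixed-width int follows with its sign bit flipped.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

final class CompoundKeySketch {
    // Builds a two-field key: NUL-terminated UTF-8 text, then a 4-byte int.
    static byte[] encodeKey(String name, int seq) {
        byte[] text = name.getBytes(StandardCharsets.UTF_8);
        for (byte b : text) {
            if (b == 0) {
                throw new IllegalArgumentException("NUL is reserved as the terminator");
            }
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(text, 0, text.length);
        out.write(0x00);                  // terminator keeps "a" ahead of "ab"
        int flipped = seq ^ 0x80000000;   // sign-bit flip keeps negatives first
        out.write(flipped >>> 24);        // write(int) stores only the low 8 bits
        out.write(flipped >>> 16);
        out.write(flipped >>> 8);
        out.write(flipped);
        return out.toByteArray();
    }

    // Unsigned lexicographic comparison of two encoded keys.
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        int cmp = compareUnsigned(encodeKey("a", 5), encodeKey("ab", -1));
        System.out.println("(a, 5) vs (ab, -1): " + cmp);
    }
}
```

The first field dominates the comparison and the second only breaks ties, which is the behaviour a compound rowkey needs for prefix scans.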
Implementation
HBASE-8089
Implementation: Encoding 
o.a.h.h.util.OrderedBytes 
• null 
• numeric, +/-Inf, NaN 
• int8, int16, int32, int64 
• float32, float64 
• variable-length text 
• variable-length blob 
o.a.h.h.util.Bytes 
• numeric 
• boolean 
• int16, int32, int64 
• float32, float64 
• variable-length text 
Implementation: API 
interface DataType<T> 
• decode() 
• encode() 
• encodedClass() 
• encodedLength() 
• getOrder() 
• isNullable() 
• isOrderPreserving() 
• isSkippable() 
• skip() 
implements DataType 
• OrderedXXX 
• RawXXX 
• Struct 
– StructBuilder 
– StructIterator 
– TerminatedWrapper 
– FixedLengthWrapper 
• Union{2,3,4} 
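The shape of such an interface can be sketched with the standard library alone. The following is a trimmed-down, hypothetical analogue, not the HBase DataType<T> definition: SketchType, FixedInt and LengthPrefixedString are invented names, and java.nio.ByteBuffer stands in for the buffer type used by the real API. It shows why skip() is useful: a reader can step over a field without materializing its value.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class DataTypeSketch {
    // Hypothetical, reduced analogue of a typed codec interface.
    interface SketchType<T> {
        int encode(ByteBuffer dst, T val); // returns the number of bytes written
        T decode(ByteBuffer src);
        int skip(ByteBuffer src);          // advances past one value, returns bytes skipped
        int encodedLength(T val);
    }

    // Fixed-width codec: always 4 bytes, big-endian two's complement.
    static final class FixedInt implements SketchType<Integer> {
        public int encode(ByteBuffer dst, Integer val) { dst.putInt(val); return 4; }
        public Integer decode(ByteBuffer src) { return src.getInt(); }
        public int skip(ByteBuffer src) { src.position(src.position() + 4); return 4; }
        public int encodedLength(Integer val) { return 4; }
    }

    // Variable-width codec: 4-byte length prefix followed by UTF-8 bytes.
    static final class LengthPrefixedString implements SketchType<String> {
        public int encode(ByteBuffer dst, String val) {
            byte[] bytes = val.getBytes(StandardCharsets.UTF_8);
            dst.putInt(bytes.length);
            dst.put(bytes);
            return 4 + bytes.length;
        }
        public String decode(ByteBuffer src) {
            byte[] bytes = new byte[src.getInt()];
            src.get(bytes);
            return new String(bytes, StandardCharsets.UTF_8);
        }
        public int skip(ByteBuffer src) {
            int len = src.getInt();
            src.position(src.position() + len);
            return 4 + len;
        }
        public int encodedLength(String val) {
            return 4 + val.getBytes(StandardCharsets.UTF_8).length;
        }
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocate(64);
        new FixedInt().encode(buf, 7);
        new LengthPrefixedString().encode(buf, "hi");
        buf.flip();
        new FixedInt().skip(buf); // step over the int without decoding it
        System.out.println(new LengthPrefixedString().decode(buf));
    }
}
```

Neither codec in this sketch is order-preserving; in the deck's terms they are closer in spirit to the RawXXX family than to OrderedXXX.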
Up Next 
• “Default” types 
• More complex types 
– Arrays/Lists 
– Maps/Dicts 
• Tool integration 
– Apache Phoenix 
– Cloudera Kite 
• Performance audit, HBASE-8694 
• Improved metadata, HBASE-8863
– isCastableTo 
– isCoercableTo 
– isComparableTo 
• TypedTable, HBASE-7941 
• Beyond Java, HBASE-10091 
– REST 
– Thrift 
– Shell 
• ImportTsv, HBASE-8593 
• User documentation 
• Coprocessors? 
• Filters? 
• CAS? 
• DataBlockEncoders? 
Examples
A case for TypedTable 
Put p = new Put(Bytes.toBytes(u.user)); 
p.add(INFO_FAM, USER_COL, Bytes.toBytes(u.user)); 
p.add(INFO_FAM, NAME_COL, Bytes.toBytes(u.name)); 
p.add(INFO_FAM, EMAIL_COL, Bytes.toBytes(u.email)); 
p.add(INFO_FAM, PASS_COL, Bytes.toBytes(u.password)); 
A case for TypedTable
static final RawString ENC_STR = new RawString();
static final RawLong ENC_LONG = new RawLong();
--

SimplePositionedByteRange pbr =
    new SimplePositionedByteRange(100);
ENC_STR.encode(pbr, u.user);
Put p = new Put(Bytes.copy(pbr.getBytes(), pbr.getOffset(),
    pbr.getPosition()));
p.add(INFO_FAM, USER_COL, Bytes.copy(pbr.getBytes(), ...));
pbr.setPosition(0);
ENC_STR.encode(pbr, u.name);
p.add(INFO_FAM, NAME_COL, Bytes.copy(pbr.getBytes(), ...));
...
Structs: writing
Struct struct = new StructBuilder()
    .add(OrderedNumeric.ASCENDING)
    .add(OrderedString.ASCENDING)
    .toStruct();
PositionedByteRange buf1 =
    new SimplePositionedByteRange(7);
struct.encode(buf1,
    new Object[] { BigDecimal.ONE, "foo" });
Structs: reading
buf1.setPosition(0);
StructIterator it = struct.iterator(buf1);
while (it.hasNext()) {
    System.out.print(it.next() + ", ");
}

> BigDecimal.ONE, foo
Structs: schema migration
Struct addedFields = new StructBuilder()
    .add(OrderedNumeric.ASCENDING)
    .add(OrderedString.ASCENDING)
    .add(OrderedString.ASCENDING)
    .add(OrderedNumeric.ASCENDING)
    .toStruct();

buf1.setPosition(0);
StructIterator it = addedFields.iterator(buf1);
while (it.hasNext()) {
    System.out.print(it.next() + ", ");
}
> BigDecimal.ONE, foo, null, null
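The migration behaviour shown in the output above, where fields appended to the schema read back as null from data written with the shorter schema, can be imitated in a few lines of plain Java. This is a sketch of the idea only and makes no claim about how StructIterator is implemented: SchemaMigrationSketch, readAll, readInt and readString are hypothetical names, ByteBuffer stands in for the real buffer type, and the encodings are a simple 4-byte int and a length-prefixed string.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

final class SchemaMigrationSketch {
    // Decodes one value per schema field; once the buffer is exhausted,
    // every remaining field is reported as null instead of failing.
    static List<Object> readAll(ByteBuffer buf, List<Function<ByteBuffer, Object>> fields) {
        List<Object> values = new ArrayList<>();
        for (Function<ByteBuffer, Object> field : fields) {
            values.add(buf.hasRemaining() ? field.apply(buf) : null);
        }
        return values;
    }

    // Field decoder: 4-byte big-endian int.
    static Object readInt(ByteBuffer buf) {
        return buf.getInt();
    }

    // Field decoder: 2-byte length prefix followed by UTF-8 bytes.
    static Object readString(ByteBuffer buf) {
        byte[] bytes = new byte[buf.getShort()];
        buf.get(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Data written with the old two-field schema (int, string).
        ByteBuffer buf = ByteBuffer.allocate(32);
        buf.putInt(1);
        buf.putShort((short) 3);
        buf.put("foo".getBytes(StandardCharsets.UTF_8));
        buf.flip();

        // New schema appends two fields at the end.
        List<Function<ByteBuffer, Object>> longer = new ArrayList<>();
        longer.add(SchemaMigrationSketch::readInt);
        longer.add(SchemaMigrationSketch::readString);
        longer.add(SchemaMigrationSketch::readString);
        longer.add(SchemaMigrationSketch::readInt);

        System.out.println(readAll(buf, longer));
    }
}
```

The scheme only works when new fields are appended at the end and are allowed to be null; inserting a field in the middle would misalign every value after it.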
Protobuf (HBASE-11161)
class PBKeyValue extends PBType<CellProtos.KeyValue> {

    @Override
    public int encode(PositionedByteRange dst, KeyValue val) {
        CodedOutputStream os = outputStreamFromByteRange(dst);
        int before = os.spaceLeft(), after, written;
        val.writeTo(os);
        after = os.spaceLeft();
        written = before - after;
        dst.setPosition(dst.getPosition() + written);
        return written;
    }
More Examples
https://gist.github.com/ndimiduk/bcf33f09cc7e4408f684
Thanks! 
HBase in Action (Manning)
Nick Dimiduk, Amandeep Khurana
Foreword by Michael Stack
hbaseinaction.com
Nick Dimiduk 
github.com/ndimiduk 
@xefyr 
n10k.com 
http://s.apache.org/bGN 
Licensed 
under 
a 
Crea3ve 
Commons 
A8ribu3on-­‐ShareAlike 
3.0 
Unported 
License. 
2014-­‐11-­‐18 
21

More Related Content

Viewers also liked

The inherent complexity of stream processing
The inherent complexity of stream processingThe inherent complexity of stream processing
The inherent complexity of stream processingnathanmarz
 
The Secrets of Building Realtime Big Data Systems
The Secrets of Building Realtime Big Data SystemsThe Secrets of Building Realtime Big Data Systems
The Secrets of Building Realtime Big Data Systemsnathanmarz
 
Data Engineering Quick Guide
Data Engineering Quick GuideData Engineering Quick Guide
Data Engineering Quick GuideAsim Jalis
 
Apache Big Data EU 2015 - HBase
Apache Big Data EU 2015 - HBaseApache Big Data EU 2015 - HBase
Apache Big Data EU 2015 - HBaseNick Dimiduk
 
11 Hard to Ignore Data Analytics Quotes
11 Hard to Ignore Data Analytics Quotes11 Hard to Ignore Data Analytics Quotes
11 Hard to Ignore Data Analytics QuotesCloudlytics
 
Demystifying Data Engineering
Demystifying Data EngineeringDemystifying Data Engineering
Demystifying Data Engineeringnathanmarz
 
Big Data: The 6 Key Skills Every Business Needs
Big Data: The 6 Key Skills Every Business NeedsBig Data: The 6 Key Skills Every Business Needs
Big Data: The 6 Key Skills Every Business NeedsBernard Marr
 
Big Data: The 4 Layers Everyone Must Know
Big Data: The 4 Layers Everyone Must KnowBig Data: The 4 Layers Everyone Must Know
Big Data: The 4 Layers Everyone Must KnowBernard Marr
 
Big Data - 25 Amazing Facts Everyone Should Know
Big Data - 25 Amazing Facts Everyone Should KnowBig Data - 25 Amazing Facts Everyone Should Know
Big Data - 25 Amazing Facts Everyone Should KnowBernard Marr
 

Viewers also liked (12)

Introduction to Data Engineering
Introduction to Data EngineeringIntroduction to Data Engineering
Introduction to Data Engineering
 
The inherent complexity of stream processing
The inherent complexity of stream processingThe inherent complexity of stream processing
The inherent complexity of stream processing
 
Big data road map
Big data road mapBig data road map
Big data road map
 
The Secrets of Building Realtime Big Data Systems
The Secrets of Building Realtime Big Data SystemsThe Secrets of Building Realtime Big Data Systems
The Secrets of Building Realtime Big Data Systems
 
Data Engineering Quick Guide
Data Engineering Quick GuideData Engineering Quick Guide
Data Engineering Quick Guide
 
Apache Big Data EU 2015 - HBase
Apache Big Data EU 2015 - HBaseApache Big Data EU 2015 - HBase
Apache Big Data EU 2015 - HBase
 
11 Hard to Ignore Data Analytics Quotes
11 Hard to Ignore Data Analytics Quotes11 Hard to Ignore Data Analytics Quotes
11 Hard to Ignore Data Analytics Quotes
 
Demystifying Data Engineering
Demystifying Data EngineeringDemystifying Data Engineering
Demystifying Data Engineering
 
Big Data: The 6 Key Skills Every Business Needs
Big Data: The 6 Key Skills Every Business NeedsBig Data: The 6 Key Skills Every Business Needs
Big Data: The 6 Key Skills Every Business Needs
 
Big Data: The 4 Layers Everyone Must Know
Big Data: The 4 Layers Everyone Must KnowBig Data: The 4 Layers Everyone Must Know
Big Data: The 4 Layers Everyone Must Know
 
What is Big Data?
What is Big Data?What is Big Data?
What is Big Data?
 
Big Data - 25 Amazing Facts Everyone Should Know
Big Data - 25 Amazing Facts Everyone Should KnowBig Data - 25 Amazing Facts Everyone Should Know
Big Data - 25 Amazing Facts Everyone Should Know
 

Similar to HBase Data Types

OpenTSDB for monitoring @ Criteo
OpenTSDB for monitoring @ CriteoOpenTSDB for monitoring @ Criteo
OpenTSDB for monitoring @ CriteoNathaniel Braun
 
OpenStack Swift的性能调优
OpenStack Swift的性能调优OpenStack Swift的性能调优
OpenStack Swift的性能调优Hardway Hou
 
Native Cloud-Native: Building Agile Microservices with the Micronaut Framework
Native Cloud-Native: Building Agile Microservices with the Micronaut FrameworkNative Cloud-Native: Building Agile Microservices with the Micronaut Framework
Native Cloud-Native: Building Agile Microservices with the Micronaut FrameworkZachary Klein
 
SophiaConf2010 Présentation des Retours d'expériences de la Conférence du 08 ...
SophiaConf2010 Présentation des Retours d'expériences de la Conférence du 08 ...SophiaConf2010 Présentation des Retours d'expériences de la Conférence du 08 ...
SophiaConf2010 Présentation des Retours d'expériences de la Conférence du 08 ...TelecomValley
 
Postcards from the post xss world- content exfiltration null
Postcards from the post xss world- content exfiltration nullPostcards from the post xss world- content exfiltration null
Postcards from the post xss world- content exfiltration nullPiyush Pattanayak
 
2.28.17 Introducing DSpace 7 Webinar Slides
2.28.17 Introducing DSpace 7 Webinar Slides2.28.17 Introducing DSpace 7 Webinar Slides
2.28.17 Introducing DSpace 7 Webinar SlidesDuraSpace
 
Introduction to InfluxDB 2.0 & Your First Flux Query by Sonia Gupta, Develope...
Introduction to InfluxDB 2.0 & Your First Flux Query by Sonia Gupta, Develope...Introduction to InfluxDB 2.0 & Your First Flux Query by Sonia Gupta, Develope...
Introduction to InfluxDB 2.0 & Your First Flux Query by Sonia Gupta, Develope...InfluxData
 
Deep Dive on Accelerating Content, APIs, and Applications with Amazon CloudFr...
Deep Dive on Accelerating Content, APIs, and Applications with Amazon CloudFr...Deep Dive on Accelerating Content, APIs, and Applications with Amazon CloudFr...
Deep Dive on Accelerating Content, APIs, and Applications with Amazon CloudFr...Amazon Web Services
 
Meetup 12-12-2017 - Application Isolation on Kubernetes
Meetup 12-12-2017 - Application Isolation on KubernetesMeetup 12-12-2017 - Application Isolation on Kubernetes
Meetup 12-12-2017 - Application Isolation on Kubernetesdtoledo67
 
Developing applications with Hyperledger Fabric SDK
Developing applications with Hyperledger Fabric SDKDeveloping applications with Hyperledger Fabric SDK
Developing applications with Hyperledger Fabric SDKHorea Porutiu
 
Arcomem training Specifying Crawls Beginners
Arcomem training Specifying Crawls BeginnersArcomem training Specifying Crawls Beginners
Arcomem training Specifying Crawls Beginnersarcomem
 
Three Years of Lessons Running Potentially Malicious Code Inside Containers
Three Years of Lessons Running Potentially Malicious Code Inside ContainersThree Years of Lessons Running Potentially Malicious Code Inside Containers
Three Years of Lessons Running Potentially Malicious Code Inside ContainersBen Hall
 
FIWARE Primer - Learn FIWARE in 60 Minutes
FIWARE Primer - Learn FIWARE in 60 MinutesFIWARE Primer - Learn FIWARE in 60 Minutes
FIWARE Primer - Learn FIWARE in 60 MinutesFederico Michele Facca
 
Federico Michele Facca - FIWARE Primer - Learn FIWARE in 60 Minutes
Federico Michele Facca - FIWARE Primer - Learn FIWARE in 60 MinutesFederico Michele Facca - FIWARE Primer - Learn FIWARE in 60 Minutes
Federico Michele Facca - FIWARE Primer - Learn FIWARE in 60 MinutesCodemotion
 
CCi Technology Infrastructure 2006
CCi Technology Infrastructure 2006CCi Technology Infrastructure 2006
CCi Technology Infrastructure 2006Mike Linksvayer
 
Building APIs in an easy way using API Platform
Building APIs in an easy way using API PlatformBuilding APIs in an easy way using API Platform
Building APIs in an easy way using API PlatformAntonio Peric-Mazar
 
AWS Public Data Sets: How to Stage Petabytes of Data for Analysis in AWS (WPS...
AWS Public Data Sets: How to Stage Petabytes of Data for Analysis in AWS (WPS...AWS Public Data Sets: How to Stage Petabytes of Data for Analysis in AWS (WPS...
AWS Public Data Sets: How to Stage Petabytes of Data for Analysis in AWS (WPS...Amazon Web Services
 
Building APIs in an easy way using API Platform
Building APIs in an easy way using API PlatformBuilding APIs in an easy way using API Platform
Building APIs in an easy way using API PlatformAntonio Peric-Mazar
 
OpenStack Architecture
OpenStack ArchitectureOpenStack Architecture
OpenStack ArchitectureMirantis
 

Similar to HBase Data Types (20)

OpenTSDB for monitoring @ Criteo
OpenTSDB for monitoring @ CriteoOpenTSDB for monitoring @ Criteo
OpenTSDB for monitoring @ Criteo
 
OpenStack Swift的性能调优
OpenStack Swift的性能调优OpenStack Swift的性能调优
OpenStack Swift的性能调优
 
Native Cloud-Native: Building Agile Microservices with the Micronaut Framework
Native Cloud-Native: Building Agile Microservices with the Micronaut FrameworkNative Cloud-Native: Building Agile Microservices with the Micronaut Framework
Native Cloud-Native: Building Agile Microservices with the Micronaut Framework
 
SophiaConf2010 Présentation des Retours d'expériences de la Conférence du 08 ...
SophiaConf2010 Présentation des Retours d'expériences de la Conférence du 08 ...SophiaConf2010 Présentation des Retours d'expériences de la Conférence du 08 ...
SophiaConf2010 Présentation des Retours d'expériences de la Conférence du 08 ...
 
Postcards from the post xss world- content exfiltration null
Postcards from the post xss world- content exfiltration nullPostcards from the post xss world- content exfiltration null
Postcards from the post xss world- content exfiltration null
 
2.28.17 Introducing DSpace 7 Webinar Slides
2.28.17 Introducing DSpace 7 Webinar Slides2.28.17 Introducing DSpace 7 Webinar Slides
2.28.17 Introducing DSpace 7 Webinar Slides
 
Introduction to InfluxDB 2.0 & Your First Flux Query by Sonia Gupta, Develope...
Introduction to InfluxDB 2.0 & Your First Flux Query by Sonia Gupta, Develope...Introduction to InfluxDB 2.0 & Your First Flux Query by Sonia Gupta, Develope...
Introduction to InfluxDB 2.0 & Your First Flux Query by Sonia Gupta, Develope...
 
Deep Dive on Accelerating Content, APIs, and Applications with Amazon CloudFr...
Deep Dive on Accelerating Content, APIs, and Applications with Amazon CloudFr...Deep Dive on Accelerating Content, APIs, and Applications with Amazon CloudFr...
Deep Dive on Accelerating Content, APIs, and Applications with Amazon CloudFr...
 
Meetup 12-12-2017 - Application Isolation on Kubernetes
Meetup 12-12-2017 - Application Isolation on KubernetesMeetup 12-12-2017 - Application Isolation on Kubernetes
Meetup 12-12-2017 - Application Isolation on Kubernetes
 
Developing applications with Hyperledger Fabric SDK
Developing applications with Hyperledger Fabric SDKDeveloping applications with Hyperledger Fabric SDK
Developing applications with Hyperledger Fabric SDK
 
Building Client-Side Attacks with HTML5 Features
Building Client-Side Attacks with HTML5 FeaturesBuilding Client-Side Attacks with HTML5 Features
Building Client-Side Attacks with HTML5 Features
 
Arcomem training Specifying Crawls Beginners
Arcomem training Specifying Crawls BeginnersArcomem training Specifying Crawls Beginners
Arcomem training Specifying Crawls Beginners
 
Three Years of Lessons Running Potentially Malicious Code Inside Containers
Three Years of Lessons Running Potentially Malicious Code Inside ContainersThree Years of Lessons Running Potentially Malicious Code Inside Containers
Three Years of Lessons Running Potentially Malicious Code Inside Containers
 
FIWARE Primer - Learn FIWARE in 60 Minutes
FIWARE Primer - Learn FIWARE in 60 MinutesFIWARE Primer - Learn FIWARE in 60 Minutes
FIWARE Primer - Learn FIWARE in 60 Minutes
 
Federico Michele Facca - FIWARE Primer - Learn FIWARE in 60 Minutes
Federico Michele Facca - FIWARE Primer - Learn FIWARE in 60 MinutesFederico Michele Facca - FIWARE Primer - Learn FIWARE in 60 Minutes
Federico Michele Facca - FIWARE Primer - Learn FIWARE in 60 Minutes
 
CCi Technology Infrastructure 2006
CCi Technology Infrastructure 2006CCi Technology Infrastructure 2006
CCi Technology Infrastructure 2006
 
Building APIs in an easy way using API Platform
Building APIs in an easy way using API PlatformBuilding APIs in an easy way using API Platform
Building APIs in an easy way using API Platform
 
AWS Public Data Sets: How to Stage Petabytes of Data for Analysis in AWS (WPS...
AWS Public Data Sets: How to Stage Petabytes of Data for Analysis in AWS (WPS...AWS Public Data Sets: How to Stage Petabytes of Data for Analysis in AWS (WPS...
AWS Public Data Sets: How to Stage Petabytes of Data for Analysis in AWS (WPS...
 
Building APIs in an easy way using API Platform
Building APIs in an easy way using API PlatformBuilding APIs in an easy way using API Platform
Building APIs in an easy way using API Platform
 
OpenStack Architecture
OpenStack ArchitectureOpenStack Architecture
OpenStack Architecture
 

More from Nick Dimiduk

Apache Big Data EU 2015 - Phoenix
Apache Big Data EU 2015 - PhoenixApache Big Data EU 2015 - Phoenix
Apache Big Data EU 2015 - PhoenixNick Dimiduk
 
Apache HBase 1.0 Release
Apache HBase 1.0 ReleaseApache HBase 1.0 Release
Apache HBase 1.0 ReleaseNick Dimiduk
 
HBase Low Latency, StrataNYC 2014
HBase Low Latency, StrataNYC 2014HBase Low Latency, StrataNYC 2014
HBase Low Latency, StrataNYC 2014Nick Dimiduk
 
HBase Blockcache 101
HBase Blockcache 101HBase Blockcache 101
HBase Blockcache 101Nick Dimiduk
 
Apache HBase Low Latency
Apache HBase Low LatencyApache HBase Low Latency
Apache HBase Low LatencyNick Dimiduk
 
Apache HBase for Architects
Apache HBase for ArchitectsApache HBase for Architects
Apache HBase for ArchitectsNick Dimiduk
 
HBase Data Types (WIP)
HBase Data Types (WIP)HBase Data Types (WIP)
HBase Data Types (WIP)Nick Dimiduk
 
Bring Cartography to the Cloud
Bring Cartography to the CloudBring Cartography to the Cloud
Bring Cartography to the CloudNick Dimiduk
 
HBase for Architects
HBase for ArchitectsHBase for Architects
HBase for ArchitectsNick Dimiduk
 
HBase Client APIs (for webapps?)
HBase Client APIs (for webapps?)HBase Client APIs (for webapps?)
HBase Client APIs (for webapps?)Nick Dimiduk
 
Pig, Making Hadoop Easy
Pig, Making Hadoop EasyPig, Making Hadoop Easy
Pig, Making Hadoop EasyNick Dimiduk
 
Introduction to Hadoop, HBase, and NoSQL
Introduction to Hadoop, HBase, and NoSQLIntroduction to Hadoop, HBase, and NoSQL
Introduction to Hadoop, HBase, and NoSQLNick Dimiduk
 

More from Nick Dimiduk (12)

Apache Big Data EU 2015 - Phoenix
Apache Big Data EU 2015 - PhoenixApache Big Data EU 2015 - Phoenix
Apache Big Data EU 2015 - Phoenix
 
Apache HBase 1.0 Release
Apache HBase 1.0 ReleaseApache HBase 1.0 Release
Apache HBase 1.0 Release
 
HBase Low Latency, StrataNYC 2014
HBase Low Latency, StrataNYC 2014HBase Low Latency, StrataNYC 2014
HBase Low Latency, StrataNYC 2014
 
HBase Blockcache 101
HBase Blockcache 101HBase Blockcache 101
HBase Blockcache 101
 
Apache HBase Low Latency
Apache HBase Low LatencyApache HBase Low Latency
Apache HBase Low Latency
 
Apache HBase for Architects
Apache HBase for ArchitectsApache HBase for Architects
Apache HBase for Architects
 
HBase Data Types (WIP)
HBase Data Types (WIP)HBase Data Types (WIP)
HBase Data Types (WIP)
 
Bring Cartography to the Cloud
Bring Cartography to the CloudBring Cartography to the Cloud
Bring Cartography to the Cloud
 
HBase for Architects
HBase for ArchitectsHBase for Architects
HBase for Architects
 
HBase Client APIs (for webapps?)
HBase Client APIs (for webapps?)HBase Client APIs (for webapps?)
HBase Client APIs (for webapps?)
 
Pig, Making Hadoop Easy
Pig, Making Hadoop EasyPig, Making Hadoop Easy
Pig, Making Hadoop Easy
 
Introduction to Hadoop, HBase, and NoSQL
Introduction to Hadoop, HBase, and NoSQLIntroduction to Hadoop, HBase, and NoSQL
Introduction to Hadoop, HBase, and NoSQL
 

Recently uploaded

What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 

Recently uploaded (20)

What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 

HBase Data Types

  • 1. HBase Data Types Nick Dimiduk, Hortonworks @xefyr n10k.com
  • 2. Agenda • Motivations • Progress thus far • Future work • Examples • More Examples Licensed under a Crea3ve Commons A8ribu3on-­‐ShareAlike 3.0 Unported License. 2014-­‐11-­‐18 2
  • 3. Why introduce types? • Δ(SQL, byte[]): (╯°□°)╯︵ ┻━┻ • Rule of least surprise • Interoperability across tools • Distill best practices Licensed under a Crea3ve Commons A8ribu3on-­‐ShareAlike 3.0 Unported License. 2014-­‐11-­‐18 3
  • 4. Considerations • Opt-in for current users • Easy transition for existing applications • Client-side only mostly – Filters, Split policies, Coprocessors, Block encoding • Avoid POJO constraints – No required base-class/interface – No magic (avoid ASM, ORM) • Non-Java clients • HBASE-8089 Licensed under a Crea3ve Commons A8ribu3on-­‐ShareAlike 3.0 Unported License. 2014-­‐11-­‐18 4
  • 5. Licensed under a Crea3ve Commons A8ribu3on-­‐ShareAlike 3.0 Unported License. 2014-­‐11-­‐18 5
  • 6. Inspiration • Orderly • PostgreSQL / PostGIS • HBASE-7221 • HBASE-7692 Licensed under a Crea3ve Commons A8ribu3on-­‐ShareAlike 3.0 Unported License. 2014-­‐11-­‐18 6
  • 7. Features: Encoding
      • Order preservation
      • Override direction (ASC/DSC)
      • Fixed, variable-width
      • Null-able
      • Self-identifying
      • Efficient
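To make "order preservation" and "override direction" concrete, here is a minimal sketch using only the JDK. It is not HBase's `OrderedBytes` format; the class `OrderPreservingInt` and its methods are hypothetical. It assumes the common trick of flipping the sign bit and writing big-endian so that unsigned byte-wise comparison matches numeric order, and inverting every byte to get descending order.

```java
import java.util.Arrays;

public class OrderPreservingInt {
    // Hypothetical helper (not HBase's OrderedBytes): encode an int so that
    // unsigned lexicographic byte order matches numeric order.
    public static byte[] encode(int value, boolean ascending) {
        int flipped = value ^ Integer.MIN_VALUE;        // flip sign bit: negatives sort first
        byte[] out = new byte[4];
        for (int i = 0; i < 4; i++) {
            byte b = (byte) (flipped >>> (24 - 8 * i)); // big-endian byte order
            out[i] = ascending ? b : (byte) ~b;         // invert bytes for descending order
        }
        return out;
    }

    // Reverse of encode(); the caller must pass the same direction used to encode.
    public static int decode(byte[] in, boolean ascending) {
        int flipped = 0;
        for (int i = 0; i < 4; i++) {
            int b = (ascending ? in[i] : ~in[i]) & 0xFF;
            flipped = (flipped << 8) | b;
        }
        return flipped ^ Integer.MIN_VALUE;
    }

    public static void main(String[] args) {
        // Ascending: -5 sorts before 3 when compared as unsigned bytes.
        System.out.println(Arrays.compareUnsigned(encode(-5, true), encode(3, true)) < 0);
        // Descending: the same pair sorts the other way round.
        System.out.println(Arrays.compareUnsigned(encode(-5, false), encode(3, false)) > 0);
    }
}
```

Because HBase compares row keys as unsigned bytes, an encoding with this property lets scans return rows in the natural order of the encoded value.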
  • 8. Features: API
      • Complex type encoding
          – Compound rowkey pattern
          – Order preservation
          – Nullable fields
      • Runtime metadata
      • User-extensible
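The compound rowkey pattern can be illustrated with a small JDK-only sketch. The class `CompoundRowKey` and its methods are hypothetical and do not use the HBase API. The sketch assumes a two-field key (an int followed by a string), a sign-flipped big-endian int, and a 0x00 terminator after the string (so the string itself must not contain NUL).

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class CompoundRowKey {
    // Hypothetical helper: (id, name) -> bytes that sort by id first, then by name.
    public static byte[] key(int id, String name) {
        byte[] utf8 = name.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(4 + utf8.length + 1)
                .putInt(id ^ Integer.MIN_VALUE) // sign-flipped, big-endian by default
                .put(utf8)
                .put((byte) 0)                  // terminator; assumes no NUL in name
                .array();
    }

    // Sorts keys the way a byte-ordered store would: unsigned, lexicographic.
    public static byte[][] sorted(byte[][] keys) {
        byte[][] copy = keys.clone();
        Arrays.sort(copy, Arrays::compareUnsigned);
        return copy;
    }

    public static void main(String[] args) {
        byte[][] keys = { key(2, "a"), key(-1, "zed"), key(2, "Z"), key(-1, "b") };
        for (byte[] k : sorted(keys)) {
            System.out.println(Arrays.toString(k));
        }
    }
}
```

Sorting the four keys by unsigned byte order groups them by `id` (negative first) and orders by `name` within each group, which is the property the compound encoding is meant to preserve.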
  • 10. Implementation: Encoding
      o.a.h.h.util.OrderedBytes
          • null
          • numeric, +/-Inf, NaN
          • int8, int16, int32, int64
          • float32, float64
          • variable-length text
          • variable-length blob
      o.a.h.h.util.Bytes
          • numeric
          • boolean
          • int16, int32, int64
          • float32, float64
          • variable-length text
  • 11. Implementation: API
      interface DataType<T>
          • decode()
          • encode()
          • encodedClass()
          • encodedLength()
          • getOrder()
          • isNullable()
          • isOrderPreserving()
          • isSkippable()
          • skip()
      implements DataType
          • OrderedXXX
          • RawXXX
          • Struct
              – StructBuilder
              – StructIterator
              – TerminatedWrapper
              – FixedLengthWrapper
          • Union{2,3,4}
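As a rough illustration of how `encode()`, `decode()`, `skip()` and `encodedLength()` fit together, here is a pared-down sketch. `MiniType`, `RawLongType` and `TerminatedStringType` are hypothetical names, and `java.nio.ByteBuffer` stands in for HBase's `PositionedByteRange`. This is not the HBase `DataType<T>` interface.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class MiniTypes {
    /** Hypothetical, simplified analogue of some of the methods listed above. */
    public interface MiniType<T> {
        int encode(ByteBuffer dst, T val); // writes val, returns bytes written
        T decode(ByteBuffer src);          // reads a value, advancing src
        int skip(ByteBuffer src);          // advances past a value, returns bytes skipped
        int encodedLength(T val);          // bytes encode() would write for val
    }

    /** Fixed-width 8-byte long, big-endian. */
    public static final class RawLongType implements MiniType<Long> {
        public int encode(ByteBuffer dst, Long val) { dst.putLong(val); return 8; }
        public Long decode(ByteBuffer src) { return src.getLong(); }
        public int skip(ByteBuffer src) { src.position(src.position() + 8); return 8; }
        public int encodedLength(Long val) { return 8; }
    }

    /** Variable-width UTF-8 text ended by 0x00 (assumes the text contains no NUL). */
    public static final class TerminatedStringType implements MiniType<String> {
        public int encode(ByteBuffer dst, String val) {
            byte[] utf8 = val.getBytes(StandardCharsets.UTF_8);
            dst.put(utf8).put((byte) 0);
            return utf8.length + 1;
        }
        public String decode(ByteBuffer src) {
            int start = src.position();
            int len = skip(src) - 1;       // length without the terminator
            byte[] utf8 = new byte[len];
            src.position(start);
            src.get(utf8);
            src.get();                     // consume the terminator
            return new String(utf8, StandardCharsets.UTF_8);
        }
        public int skip(ByteBuffer src) {
            int start = src.position();
            while (src.get() != 0) { /* scan to the terminator */ }
            return src.position() - start;
        }
        public int encodedLength(String val) {
            return val.getBytes(StandardCharsets.UTF_8).length + 1;
        }
    }

    public static void main(String[] args) {
        TerminatedStringType str = new TerminatedStringType();
        RawLongType lng = new RawLongType();
        ByteBuffer buf = ByteBuffer.allocate(32);
        str.encode(buf, "foo");
        lng.encode(buf, 42L);
        buf.flip();
        int skipped = str.skip(buf);       // jump over the string without decoding it
        System.out.println(skipped + " " + lng.decode(buf)); // prints "4 42"
    }
}
```

The `skip()` method is what lets a reader pull out one member of a compound value without materializing the fields in front of it.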
  • 12. Up Next
      • “Default” types
      • More complex types
          – Arrays/Lists
          – Maps/Dicts
      • Tool integration
          – Apache Phoenix
          – Cloudera Kite
      • Performance audit, HBASE-8694
      • Improved metadata, HBASE-8863
          – isCastableTo
          – isCoercableTo
          – isComparableTo
      • TypedTable, HBASE-7941
      • Beyond Java, HBASE-10091
          – REST
          – Thrift
          – Shell
      • ImportTsv, HBASE-8593
      • User documentation
      • Coprocessors?
      • Filters?
      • CAS?
      • DataBlockEncoders?
  • 14. A case for TypedTable
        Put p = new Put(Bytes.toBytes(u.user));
        p.add(INFO_FAM, USER_COL, Bytes.toBytes(u.user));
        p.add(INFO_FAM, NAME_COL, Bytes.toBytes(u.name));
        p.add(INFO_FAM, EMAIL_COL, Bytes.toBytes(u.email));
        p.add(INFO_FAM, PASS_COL, Bytes.toBytes(u.password));
  • 15. A case for TypedTable
        static final RawString ENC_STR = new RawString();
        static final RawLong ENC_LONG = new RawLong();
        --
        SimplePositionedByteRange pbr =
            new SimplePositionedByteRange(100);
        ENC_STR.encode(pbr, u.user);
        Put p = new Put(Bytes.copy(pbr.getBytes(), pbr.getOffset(), pbr.getPosition()));
        p.add(INFO_FAM, USER_COL, Bytes.copy(pbr.getBytes(), ...));
        pbr.setPosition(0);
        ENC_STR.encode(pbr, u.name);
        p.add(INFO_FAM, NAME_COL, Bytes.copy(pbr.getBytes(), ...));
        ...
  • 16. Structs: writing
        Struct struct = new StructBuilder()
            .add(OrderedNumeric.ASCENDING)
            .add(OrderedString.ASCENDING)
            .toStruct();
        PositionedByteRange buf1 =
            new SimplePositionedByteRange(7);
        struct.encode(buf1,
            new Object[] { BigDecimal.ONE, "foo" });
  • 17. Structs: reading
        buf1.setPosition(0);
        StructIterator it = struct.iterator(buf1);
        while (it.hasNext()) {
          System.out.print(it.next() + ", ");
        }

        > BigDecimal.ONE, foo
  • 18. Structs: schema migration
        Struct addedFields = new StructBuilder()
            .add(OrderedNumeric.ASCENDING)
            .add(OrderedString.ASCENDING)
            .add(OrderedString.ASCENDING)
            .add(OrderedNumeric.ASCENDING)
            .toStruct();

        buf1.setPosition(0);
        StructIterator it = addedFields.iterator(buf1);
        while (it.hasNext()) {
          System.out.print(it.next() + ", ");
        }

        > BigDecimal.ONE, foo, null, null
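The schema-migration behaviour shown on the slide (bytes written with a two-field struct, read back with a four-field struct, trailing fields come out as null) can be sketched without HBase. The class `TrailingNullFields` and its methods are hypothetical. The sketch assumes every field is a 0x00-terminated UTF-8 string and that a reader treats an exhausted buffer as "field absent".

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class TrailingNullFields {
    // Hypothetical writer: concatenates fields, each followed by a 0x00 terminator.
    public static byte[] write(String... fields) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (String f : fields) {
            byte[] utf8 = f.getBytes(StandardCharsets.UTF_8);
            out.write(utf8, 0, utf8.length);
            out.write(0);
        }
        return out.toByteArray();
    }

    // Hypothetical reader: decodes up to fieldCount fields; once the bytes run
    // out, the remaining (newer) fields are reported as null.
    public static List<String> read(byte[] row, int fieldCount) {
        ByteBuffer src = ByteBuffer.wrap(row);
        List<String> values = new ArrayList<>();
        for (int i = 0; i < fieldCount; i++) {
            if (!src.hasRemaining()) {
                values.add(null);          // field added after this row was written
                continue;
            }
            int start = src.position();
            while (src.get() != 0) { /* scan to the terminator */ }
            int len = src.position() - start - 1;
            values.add(new String(row, start, len, StandardCharsets.UTF_8));
        }
        return values;
    }

    public static void main(String[] args) {
        byte[] oldRow = write("1", "foo");       // written with the 2-field schema
        System.out.println(read(oldRow, 4));     // prints [1, foo, null, null]
    }
}
```

The design point is that new fields can only be appended at the end, and they must be nullable, so that rows written under the old schema remain readable.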
  • 19. Protobuf (HBASE-11161)
        class PBKeyValue extends PBType<CellProtos.KeyValue> {

          @Override
          public int encode(PositionedByteRange dst, CellProtos.KeyValue val) {
            CodedOutputStream os = outputStreamFromByteRange(dst);
            try {
              int before = os.spaceLeft(), after, written;
              val.writeTo(os);  // throws the checked IOException
              after = os.spaceLeft();
              written = before - after;
              // advance the range past the bytes just written
              dst.setPosition(dst.getPosition() + written);
              return written;
            } catch (IOException e) {
              throw new RuntimeException("Error while encoding type.", e);
            }
          }
          // remaining methods are not shown on the slide
  • 21. Thanks!
      Nick Dimiduk
      github.com/ndimiduk
      @xefyr
      n10k.com
      http://s.apache.org/bGN
      Book (Manning): Nick Dimiduk, Amandeep Khurana; foreword by Michael Stack; hbaseinaction.com