ORC DEEP DIVE
Owen O’Malley
omalley@apache.org
January 2020
@owen_omalley
OVERVIEW
© 2019 Cloudera, Inc. All rights reserved.
REQUIREMENTS
• Files had to be completely self describing
• Schema
• File version
• Tight compression ⇒ Run Length Encoding (RLE) & compression
• Column projection ⇒ segregate column data
• Predicate pushdown ⇒ understand & index user’s types
• Files had to be easy & fast to divide
• Compatible with write-once file systems
FILE STRUCTURE
• The file footer contains:
• Metadata – schema, file statistics
• Stripe information – metadata and location of stripes
• Postscript with the compression, buffer size, & file version
• ORC file data is divided into stripes.
• Stripes are self-contained sets of rows organized by columns.
• Stripes are the smallest unit of work for tasks.
• Default is ~64MB, but often configured larger.
STRIPE STRUCTURE
• Within a stripe, the metadata is in the stripe footer.
• List of streams
• Column encoding information (e.g., direct or dictionary)
• Columns are written as a set of streams. There are 3 kinds:
• Index streams
• Data streams
• Dictionary streams
FILE STRUCTURE
[Figure: diagram of the ORC file layout]
READ PATH
• The Reader reads the last 16 KB of the file, plus more if needed
• The RowReader reads
• Stripe footer
• Required streams
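The first step of the read path can be sketched in Python. This is only an illustration, not the ORC library's code: it reads up to the last 16 KB in one request and relies on the ORC specification's rule that the final byte of the file holds the postscript length. The name `read_postscript_bytes` and its parameters are hypothetical, and decoding the protobuf message is left out.

```python
import os

TAIL_GUESS = 16 * 1024  # speculative read size mentioned on the slide


def read_postscript_bytes(path, tail_guess=TAIL_GUESS):
    """Return (postscript_bytes, tail_bytes) for the file at `path`.

    Hypothetical helper: it only locates the serialized postscript and
    does not parse it.
    """
    size = os.path.getsize(path)
    if size == 0:
        raise ValueError("empty file")
    read_len = min(size, tail_guess)
    with open(path, "rb") as f:
        f.seek(size - read_len)
        tail = f.read(read_len)
    ps_len = tail[-1]  # last byte of the file = postscript length
    if ps_len + 1 > len(tail):
        raise ValueError("postscript length exceeds the bytes read")
    postscript = tail[-1 - ps_len:-1]
    return postscript, tail
```

A real reader would parse the postscript next, learn the footer length and compression codec, and issue a second read only if the footer does not fit in the bytes already fetched.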
STREAMS
• Each stream is an independent sequence of bytes
• Serialization into streams depends on column type & encoding
• Optional pipeline stages:
• Run Length Encoding (RLE) – first pass integer compression
• Generic compression – Zlib, Snappy, LZO, Zstd
• Encryption – AES/CTR
DATA ENCODING
COMPOUND TYPES
• Compound types are serialized as trees of columns.
• struct, list, map, uniontype all have child columns
• Types are numbered in a preorder traversal
• The column reading classes are called TreeReader
• Example schema:
  a: int,
  b: map<string,
         struct<c: string,
                d: double>>,
  e: timestamp
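The preorder numbering can be sketched as follows. The nested-tuple schema representation, the function name `assign_column_ids`, and the child names `key` and `value` for the map are assumptions made for this illustration, not ORC's API.

```python
def assign_column_ids(schema, next_id=0, path="root"):
    """Return a list of (column_id, path, type_name) in preorder.

    `schema` is a hypothetical nested representation: ("int",) for a
    primitive, or (kind, [(child_name, child_schema), ...]) for a
    compound type.
    """
    kind = schema[0]
    children = schema[1] if len(schema) > 1 else []
    result = [(next_id, path, kind)]
    for name, child in children:
        # The next free id is one past the last id handed out so far.
        sub = assign_column_ids(child, result[-1][0] + 1, f"{path}.{name}")
        result.extend(sub)
    return result


# The example schema from the slide, wrapped in its root struct.
schema = ("struct", [
    ("a", ("int",)),
    ("b", ("map", [
        ("key", ("string",)),
        ("value", ("struct", [("c", ("string",)), ("d", ("double",))])),
    ])),
    ("e", ("timestamp",)),
])

for col_id, col_path, col_kind in assign_column_ids(schema):
    print(col_id, col_path, col_kind)
```

The root struct gets id 0, so the eight types in this schema are numbered 0 through 7, with `e` numbered last because the whole subtree under `b` is visited first.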
ENCODING COLUMNS
• To interpret a stream, you need three pieces of information:
• Column type
• Column encoding (direct, dictionary)
• Stream kind (present, data, length, etc.)
• All columns, if they have nulls, will have a present stream
• Serialized using a boolean RLE
• Integer columns are serialized with
• A data stream using integer RLE
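As an illustration of the present stream, here is a small decoder for boolean RLE as the ORC specification describes it: bits are packed into bytes, most significant bit first, and the bytes are then byte run length encoded. The function names are illustrative, and the input is assumed to be already decompressed.

```python
def decode_byte_rle(data):
    """Decode ORC byte RLE: a control byte followed by a run or literals."""
    out = bytearray()
    pos = 0
    while pos < len(data):
        control = data[pos]
        pos += 1
        if control < 0x80:
            # Run: (control + 3) copies of the next byte.
            out.extend(data[pos:pos + 1] * (control + 3))
            pos += 1
        else:
            # Literals: (256 - control) bytes copied verbatim.
            count = 256 - control
            out.extend(data[pos:pos + count])
            pos += count
    return bytes(out)


def decode_present(data, row_count):
    """Return one bool per row; True means the value is not null."""
    bits = []
    for byte in decode_byte_rle(data):
        for shift in range(7, -1, -1):  # most significant bit first
            bits.append(bool((byte >> shift) & 1))
    return bits[:row_count]
```

For example, the two bytes `0xff 0x80` decode to one literal byte `0x80`, which is one non-null value followed by seven nulls.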
ENCODING COLUMNS
• Binary columns are serialized with:
• Length stream of integer RLE
• Data stream of raw sequence of bytes
• String columns may be direct or dictionary encoded
• Direct looks like binary column, but dictionary is different
• Dictionary_data is raw sequence of dictionary bytes
• Length is an integer RLE stream of the dictionary lengths
• Data is an integer RLE stream of indexes into dictionary
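A sketch of how the three dictionary streams fit together, assuming the length and data streams have already been RLE decoded into lists of integers. The function name and argument names are illustrative only.

```python
def decode_dictionary_column(dictionary_data, lengths, indexes):
    """Rebuild string values from a dictionary encoded column.

    dictionary_data: raw bytes of all dictionary entries, back to back
    lengths:         byte length of each dictionary entry
    indexes:         one dictionary index per row
    """
    entries = []
    offset = 0
    for length in lengths:
        chunk = dictionary_data[offset:offset + length]
        entries.append(chunk.decode("utf-8"))
        offset += length
    return [entries[i] for i in indexes]


rows = decode_dictionary_column(
    b"applebananacherry", [5, 6, 6], [0, 2, 2, 1, 0]
)
print(rows)
```

Each distinct string is stored once, so a column with few distinct values costs roughly one small integer per row.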
ENCODING COLUMNS
• Lists and maps record the number of child elements
• Length is an integer RLE stream
• Structs only have the present stream
• Timestamps need nanosecond resolution (ouch!)
• Data is an integer RLE of seconds since the ORC epoch (January 1, 2015)
• Secondary is an integer RLE of nanoseconds with trailing zeros suppressed
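The zero suppression on the nanosecond stream can be sketched like this. It follows the ORC specification's description (trailing decimal zeros are removed and the low three bits record how many); the function names are illustrative.

```python
def encode_nanos(nanos):
    """Pack nanoseconds: value shifted left 3, low 3 bits = zeros removed - 1."""
    if nanos == 0:
        return 0
    if nanos % 100 != 0:
        # Fewer than two trailing zeros: store the value unchanged.
        return nanos << 3
    nanos //= 100
    trailing = 1  # a stored 1 means two zeros were removed
    while nanos % 10 == 0 and trailing < 7:
        nanos //= 10
        trailing += 1
    return (nanos << 3) | trailing


def decode_nanos(serialized):
    """Inverse of encode_nanos."""
    zeros = serialized & 7
    result = serialized >> 3
    if zeros:
        result *= 10 ** (zeros + 1)
    return result
```

With this scheme 1000 ns is stored as `0x0a` and 100000 ns as `0x0c`, so timestamps with millisecond or microsecond precision stay small before the integer RLE is applied.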
RUN LENGTH ENCODING
• Goal is to get some cheap quick compression
• Handles repeating/incrementing values
• Handles integer byte packing
• Two versions
• Version 1 – relatively simple repeat/literal encoding
• Version 2 – complex encoding with 4 variants
• Column encoding of *_V2 means use RLE version 2
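Version 1 is simple enough to sketch. The decoder below follows the integer RLE version 1 layout in the ORC specification: a control byte below 0x80 starts a run of (control + 3) values with a one byte signed delta and a varint base, and a control byte of 0x80 or above starts (256 - control) literal varints. The function name is illustrative and the input is assumed to be already decompressed.

```python
def decode_rle_v1(data, signed=False):
    """Decode an ORC RLE version 1 integer stream into a list of ints."""
    out = []
    pos = 0

    def read_varint():
        nonlocal pos
        result = 0
        shift = 0
        while True:
            b = data[pos]
            pos += 1
            result |= (b & 0x7F) << shift
            if b < 0x80:
                break
            shift += 7
        if signed:
            # Undo zigzag encoding used for signed values.
            result = (result >> 1) ^ -(result & 1)
        return result

    while pos < len(data):
        control = data[pos]
        pos += 1
        if control < 0x80:
            run_length = control + 3
            delta = data[pos]
            pos += 1
            if delta >= 0x80:
                delta -= 256  # interpret the delta byte as signed
            base = read_varint()
            out.extend(base + i * delta for i in range(run_length))
        else:
            for _ in range(256 - control):
                out.append(read_varint())
    return out
```

A run of one hundred 7s takes three bytes (`0x61 0x00 0x07`), and the sequence 100 down to 1 also takes three (`0x61 0xff 0x64`), which is why RLE handles both repeating and incrementing values.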
COMPRESSION & INDEXES
ROW PRUNING
• Three levels of indexing/row pruning
• File – uses file statistics in file footer
• Stripe – uses stripe statistics before file footer
• Row group (default of 10k rows) – uses index stream
• The index stream for each column includes, for each row group:
• Column statistics (min, max, count, sum)
• The start positions of each stream
SEARCH ARGUMENTS
• Engines can pass Search Arguments (SArgs) to the RowReader.
• Limited set of operations (=, <=>, <, <=, in, between, is null)
• Compare one column to literal(s)
• Can only eliminate entire row groups, stripes, or files.
• Engine must still filter the individual rows afterwards
• For Hive, ensure hive.optimize.index.filter is true.
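The division of labor between the reader and the engine can be sketched as follows. This is not ORC's SearchArgument API: `RowGroupStats`, `select_row_groups`, and `read_matching` are hypothetical names, and only an equality predicate on one integer column is modeled.

```python
from dataclasses import dataclass


@dataclass
class RowGroupStats:
    minimum: int
    maximum: int


def may_contain_equals(stats, literal):
    """True if min/max statistics cannot rule out `column = literal`."""
    return stats.minimum <= literal <= stats.maximum


def select_row_groups(all_stats, literal):
    """Reader side: keep only the row groups that might match."""
    return [i for i, s in enumerate(all_stats) if may_contain_equals(s, literal)]


def read_matching(rows_by_group, all_stats, literal):
    """Engine side: the surviving row groups still need a row level filter."""
    matches = []
    for i in select_row_groups(all_stats, literal):
        matches.extend(v for v in rows_by_group[i] if v == literal)
    return matches


groups = [[1, 5, 9], [10, 12, 19], [8, 15, 30]]
stats = [RowGroupStats(min(g), max(g)) for g in groups]
print(select_row_groups(stats, 12))      # row groups that must be read
print(read_matching(groups, stats, 12))  # rows left after filtering
```

Row group 2 survives pruning for the literal 12 even though it holds no matching row, which is why the engine must still filter individual rows.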
COMPRESSION
• All of the generic compression is done in chunks
• Codec is reinitialized at start of chunk
• Each chunk is compressed separately
• Each uncompressed chunk is at most the buffer size
• Each chunk has a 3 byte header giving:
• Compressed size of chunk
• Whether it is the original or compressed
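The 3 byte chunk header can be sketched directly from the ORC specification, which stores (compressed length * 2 + is-original flag) as a little-endian value. The function names are illustrative.

```python
def encode_chunk_header(length, is_original):
    """Build the 3 byte header for a chunk of `length` stored bytes."""
    value = length * 2 + (1 if is_original else 0)
    if value >= 1 << 24:
        raise ValueError("chunk too large for a 3 byte header")
    return value.to_bytes(3, "little")


def decode_chunk_header(header):
    """Return (length, is_original) from the first 3 bytes of a chunk."""
    value = int.from_bytes(header[:3], "little")
    return value >> 1, bool(value & 1)
```

A chunk that compressed to 100,000 bytes gets the header `0x40 0x0d 0x03`, and 5 bytes stored uncompressed get `0x0b 0x00 0x00`.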
INDEXES
• Wanted ability to seek to each row group
• Allows fine grain seeking & row pruning
• Could have flushed stream compression pipeline
• Would have dramatically lowered compression
• Instead treat compression & RLE as gray boxes
• Use our knowledge of compression & RLE
• Always start fresh at beginning of chunk or run
INDEX POSITIONS
• Records information to seek to a given row in all of a column’s streams
• Includes:
• C Compressed bytes
• U Uncompressed bytes
• V RLE values
• C, U, & V jump to RG 4
BLOOM FILTERS
• For use cases where you need to find particular values
• Sorting by that column allows min/max filtering
• But you can only sort on one column effectively
• Bloom filters are probabilistic data structures
• Only useful for equality, not less than or greater than
• Need ~10 bits/distinct value ⇒ opt in
• ORC uses a bloom_filter_utf8 stream to record a bloom filter per row group
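A generic Bloom filter sized at about 10 bits per distinct value can be sketched as below. This is not ORC's implementation: the class name, the SHA-256 based double hashing, and the bit layout are assumptions chosen to keep the example self-contained.

```python
import hashlib
import math


class BloomFilter:
    def __init__(self, expected_values, bits_per_value=10):
        self.num_bits = max(8, expected_values * bits_per_value)
        # The optimal hash count is about (bits per value) * ln 2.
        self.num_hashes = max(1, round(bits_per_value * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, value):
        digest = hashlib.sha256(value.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:16], "little") | 1
        # Double hashing: derive every probe position from two hashes.
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, value):
        for p in self._positions(value):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, value):
        return all(
            self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(value)
        )
```

A negative answer is definite, so a row group whose filter rejects the literal can be skipped; a positive answer may be a false positive, so the rows still have to be checked.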
ROW PRUNING EXAMPLE
• TPC-H
• from tpch1000.lineitem where l_orderkey = 1212000001;

Index      Rows Read        Time
Nothing    5,999,989,709    74 sec
Min/Max    540,000          4.5 sec
Bloom      10,000           1.3 sec
VERSIONING
COMPATIBILITY
• Within a file version, old readers must be able to read all files.
• A few exceptions (e.g., new codecs, types)
• Version 0 (from Hive 0.11)
• Only RLE V1 & string dictionary encoding
• Version 1 (from Hive 0.12 forward)
• Version 2 (under development)
• The library includes ability to write any file version.
• Enables smooth upgrades across clusters
WRITER VERSION
• When fixes or feature additions are made to the writer, we bump the writer version.
• Allows reader to work around bugs, especially in index
• Does not affect reader compatibility
• We should require that each minor version adds a new one.
• We also record which writer wrote the file:
• Java, C++, Presto, Go
EXAMPLE WORKAROUND FOR HIVE-8746
• Timestamps suck!
• ORC uses an epoch of 01-01-2015 00:00:00.
• Timestamp columns record the seconds offset from the epoch
• Unfortunately, the original code used the local time zone.
• If reader and writer were in time zones with the same rules, it worked.
• Fix involved writing the writer time zone into file.
• Forwards and backwards compatible
ADDITIONAL FEATURES
SCHEMA EVOLUTION
• User passes desired schema to RecordReader factory.
• SchemaEvolution class maps between file & reader schemas.
• The mapping can be positional or name based.
• Conversions based on legacy Hive behavior…
• The RecordReader uses the mapping to translate
• Choosing streams uses the file schema column ids
• Type translation is done by ConvertTreeReaderFactory.
• Adds an additional TreeReader that does conversion.
STRIPE CONCATENATION & FLUSH
• ORC has a special operator to concatenate files
• Requires consistent options & schema
• Concatenates stripes without reserialization
• ORC can flush the current contents, including a file footer, while still writing to the file.
• Writes a side file with the current offset of the file tail
• When the file closes, the intermediate file footers are ignored
COLUMN ENCRYPTION
• Released in ORC 1.6
• Allows consistent column level access control across engines
• Writes two variants of data
• Encrypted original
• Unencrypted statically masked
• Each variant has its own streams & encodings
• Each column has a unique local key, which is encrypted by the KMS
OTHER DEVELOPER TOOLS
• Benchmarks
• Hive & Spark
• Avro, Json, ORC, and Parquet
• Three data sets (taxi, sales, github)
• Docker
• Allows automated builds on all supported Linux variants
• Site source code is kept with the C++ & Java code
USING ORC
WHICH VERSION IS IT?
Engine         Version        ORC Version
Hive           0.11 to 2.2    Hive ORC 0.11 to 2.2
               2.3            ORC 1.3
               3.0            ORC 1.4
               3.1            ORC 1.5
Spark hive     *              Hive ORC 1.2
Spark native   2.3            ORC 1.4
               2.4 to 3.0     ORC 1.5
FROM SQL
• Hive:
• Add “stored as orc” to table definition
• Table properties override configuration for ORC
• Spark’s “spark.sql.orc.impl” controls the implementation
• native – Use ORC 1.5
• hive – Use ORC from Hive 1.2
FROM JAVA
• Use the ORC project rather than Hive’s ORC.
• Maven group id: org.apache.orc version: 1.6.2
• nohive classifier avoids interfering with Hive’s packages
• Two levels of access
• orc-core – Faster access, but uses Hive’s vectorized API
• orc-mapreduce – Row by row access, simpler OrcStruct API
• MapReduce API implements WritableComparable
• Can be shuffled
• Need to specify type information in configuration for shuffle or output
FROM C++
• Pure C++ client library
• No JNI or JDK so client can estimate and control memory
• Uses pure C++ HDFS client from HDFS-8707
• Reader and writer are stable and in production use.
• Runs on Linux, Mac OS, and Windows.
• Docker scripts for CentOS 6-8, Debian 8-10, Ubuntu 14-18
• CI builds on Mac OS, Ubuntu, and Windows
FROM COMMAND LINE
• Using hive --orcfiledump from Hive
• -j -p – pretty prints the metadata as JSON
• -d – prints data as JSON
• Using java -jar orc-tools-*-uber.jar from ORC
• meta -j -p – print the metadata as JSON
• data – print data as JSON
• convert – convert CSV, JSON, or ORC to ORC
• json-schema – scan a set of JSON documents to find the schema
DEBUGGING
• Things to look for:
• Stripe size
• Rows/Stripe
• File version
• Writer version
• Width of schema
• Sanity of statistics
• Column encoding
• Size of dictionaries
OPTIMIZATION
STRIPE SIZE
• Makes a huge difference in performance
• orc.stripe.size or hive.exec.orc.default.stripe.size
• Controls the amount of buffer in the writer. Default is 64MB
• Trade off
• Large = larger, more efficient reads
• Small = less memory and more granular processing splits
• Multiple files written at the same time will shrink stripes
HDFS BLOCK PADDING
• The stripes don’t align exactly with HDFS blocks
• Unless orc.write.variable.length.blocks
• HDFS scatters blocks around cluster
• Often want to pad to block boundaries
• Costs space, but improves performance
• orc.default.block.padding
• orc.block.padding.tolerance
SPLIT CALCULATION
• BI: small, fast queries; splits are based on HDFS blocks
• ETL: large queries; reads the file footer and applies the SearchArg to stripes; can include the footer in splits (hive.orc.splits.include.file.footer)
• Hybrid: if there are small files or lots of files, use BI; otherwise use ETL
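The hybrid rule can be sketched as a simple heuristic. The function name and both thresholds below are hypothetical placeholders chosen for illustration; the real values and exact conditions come from the engine's configuration.

```python
def choose_split_strategy(file_sizes,
                          small_file_bytes=64 * 1024 * 1024,
                          many_files=1000):
    """Pick "BI" or "ETL" for a list of file sizes in bytes.

    Both thresholds are illustrative assumptions, not engine defaults.
    """
    if not file_sizes:
        return "BI"
    average = sum(file_sizes) / len(file_sizes)
    if average < small_file_bytes or len(file_sizes) > many_files:
        # Reading every footer would cost more than it saves.
        return "BI"
    return "ETL"
```

The intuition is that reading footers pays off only when each file is large enough for stripe-level pruning to skip a meaningful amount of data.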
CONCLUSION
FOR MORE INFORMATION
• The orc_proto.proto defines the ORC metadata
• Read code and especially OrcConf, which has all of the knobs
• Website on https://orc.apache.org/
• /bugs ⇒ jira repository
• /src ⇒ github repository
• /specification ⇒ format specification
• Apache email list dev@orc.apache.org
THANK YOU
Owen O’Malley
omalley@apache.org
@owen_omalley

More Related Content

What's hot

Local Secondary Indexes in Apache Phoenix
Local Secondary Indexes in Apache PhoenixLocal Secondary Indexes in Apache Phoenix
Local Secondary Indexes in Apache PhoenixRajeshbabu Chintaguntla
 
Stephan Ewen - Experiences running Flink at Very Large Scale
Stephan Ewen -  Experiences running Flink at Very Large ScaleStephan Ewen -  Experiences running Flink at Very Large Scale
Stephan Ewen - Experiences running Flink at Very Large ScaleVerverica
 
HBaseConAsia2018 Keynote 2: Recent Development of HBase in Alibaba and Cloud
HBaseConAsia2018 Keynote 2: Recent Development of HBase in Alibaba and CloudHBaseConAsia2018 Keynote 2: Recent Development of HBase in Alibaba and Cloud
HBaseConAsia2018 Keynote 2: Recent Development of HBase in Alibaba and CloudMichael Stack
 
Thoughts on kafka capacity planning
Thoughts on kafka capacity planningThoughts on kafka capacity planning
Thoughts on kafka capacity planningJamieAlquiza
 
Batch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & IcebergBatch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & IcebergFlink Forward
 
Evening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in FlinkEvening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in FlinkFlink Forward
 
InfluxDB IOx Tech Talks: Replication, Durability and Subscriptions in InfluxD...
InfluxDB IOx Tech Talks: Replication, Durability and Subscriptions in InfluxD...InfluxDB IOx Tech Talks: Replication, Durability and Subscriptions in InfluxD...
InfluxDB IOx Tech Talks: Replication, Durability and Subscriptions in InfluxD...InfluxData
 
How to understand and analyze Apache Hive query execution plan for performanc...
How to understand and analyze Apache Hive query execution plan for performanc...How to understand and analyze Apache Hive query execution plan for performanc...
How to understand and analyze Apache Hive query execution plan for performanc...DataWorks Summit/Hadoop Summit
 
Where is my bottleneck? Performance troubleshooting in Flink
Where is my bottleneck? Performance troubleshooting in FlinkWhere is my bottleneck? Performance troubleshooting in Flink
Where is my bottleneck? Performance troubleshooting in FlinkFlink Forward
 
RocksDB Performance and Reliability Practices
RocksDB Performance and Reliability PracticesRocksDB Performance and Reliability Practices
RocksDB Performance and Reliability PracticesYoshinori Matsunobu
 
Building Reliable Lakehouses with Apache Flink and Delta Lake
Building Reliable Lakehouses with Apache Flink and Delta LakeBuilding Reliable Lakehouses with Apache Flink and Delta Lake
Building Reliable Lakehouses with Apache Flink and Delta LakeFlink Forward
 
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...Flink Forward
 
Producer Performance Tuning for Apache Kafka
Producer Performance Tuning for Apache KafkaProducer Performance Tuning for Apache Kafka
Producer Performance Tuning for Apache KafkaJiangjie Qin
 
cLoki: Like Loki but for ClickHouse
cLoki: Like Loki but for ClickHousecLoki: Like Loki but for ClickHouse
cLoki: Like Loki but for ClickHouseAltinity Ltd
 
Enabling Vectorized Engine in Apache Spark
Enabling Vectorized Engine in Apache SparkEnabling Vectorized Engine in Apache Spark
Enabling Vectorized Engine in Apache SparkKazuaki Ishizaki
 
Getting Started with HBase
Getting Started with HBaseGetting Started with HBase
Getting Started with HBaseCarol McDonald
 
HBaseCon 2015: Taming GC Pauses for Large Java Heap in HBase
HBaseCon 2015: Taming GC Pauses for Large Java Heap in HBaseHBaseCon 2015: Taming GC Pauses for Large Java Heap in HBase
HBaseCon 2015: Taming GC Pauses for Large Java Heap in HBaseHBaseCon
 
Performance Optimizations in Apache Impala
Performance Optimizations in Apache ImpalaPerformance Optimizations in Apache Impala
Performance Optimizations in Apache ImpalaCloudera, Inc.
 

What's hot (20)

Local Secondary Indexes in Apache Phoenix
Local Secondary Indexes in Apache PhoenixLocal Secondary Indexes in Apache Phoenix
Local Secondary Indexes in Apache Phoenix
 
Stephan Ewen - Experiences running Flink at Very Large Scale
Stephan Ewen -  Experiences running Flink at Very Large ScaleStephan Ewen -  Experiences running Flink at Very Large Scale
Stephan Ewen - Experiences running Flink at Very Large Scale
 
HBaseConAsia2018 Keynote 2: Recent Development of HBase in Alibaba and Cloud
HBaseConAsia2018 Keynote 2: Recent Development of HBase in Alibaba and CloudHBaseConAsia2018 Keynote 2: Recent Development of HBase in Alibaba and Cloud
HBaseConAsia2018 Keynote 2: Recent Development of HBase in Alibaba and Cloud
 
Node Labels in YARN
Node Labels in YARNNode Labels in YARN
Node Labels in YARN
 
ORC Files
ORC FilesORC Files
ORC Files
 
Thoughts on kafka capacity planning
Thoughts on kafka capacity planningThoughts on kafka capacity planning
Thoughts on kafka capacity planning
 
Batch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & IcebergBatch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & Iceberg
 
Evening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in FlinkEvening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in Flink
 
InfluxDB IOx Tech Talks: Replication, Durability and Subscriptions in InfluxD...
InfluxDB IOx Tech Talks: Replication, Durability and Subscriptions in InfluxD...InfluxDB IOx Tech Talks: Replication, Durability and Subscriptions in InfluxD...
InfluxDB IOx Tech Talks: Replication, Durability and Subscriptions in InfluxD...
 
How to understand and analyze Apache Hive query execution plan for performanc...
How to understand and analyze Apache Hive query execution plan for performanc...How to understand and analyze Apache Hive query execution plan for performanc...
How to understand and analyze Apache Hive query execution plan for performanc...
 
Where is my bottleneck? Performance troubleshooting in Flink
Where is my bottleneck? Performance troubleshooting in FlinkWhere is my bottleneck? Performance troubleshooting in Flink
Where is my bottleneck? Performance troubleshooting in Flink
 
RocksDB Performance and Reliability Practices
RocksDB Performance and Reliability PracticesRocksDB Performance and Reliability Practices
RocksDB Performance and Reliability Practices
 
Building Reliable Lakehouses with Apache Flink and Delta Lake
Building Reliable Lakehouses with Apache Flink and Delta LakeBuilding Reliable Lakehouses with Apache Flink and Delta Lake
Building Reliable Lakehouses with Apache Flink and Delta Lake
 
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
 
Producer Performance Tuning for Apache Kafka
Producer Performance Tuning for Apache KafkaProducer Performance Tuning for Apache Kafka
Producer Performance Tuning for Apache Kafka
 
cLoki: Like Loki but for ClickHouse
cLoki: Like Loki but for ClickHousecLoki: Like Loki but for ClickHouse
cLoki: Like Loki but for ClickHouse
 
Enabling Vectorized Engine in Apache Spark
Enabling Vectorized Engine in Apache SparkEnabling Vectorized Engine in Apache Spark
Enabling Vectorized Engine in Apache Spark
 
Getting Started with HBase
Getting Started with HBaseGetting Started with HBase
Getting Started with HBase
 
HBaseCon 2015: Taming GC Pauses for Large Java Heap in HBase
HBaseCon 2015: Taming GC Pauses for Large Java Heap in HBaseHBaseCon 2015: Taming GC Pauses for Large Java Heap in HBase
HBaseCon 2015: Taming GC Pauses for Large Java Heap in HBase
 
Performance Optimizations in Apache Impala
Performance Optimizations in Apache ImpalaPerformance Optimizations in Apache Impala
Performance Optimizations in Apache Impala
 

Similar to ORC Deep Dive 2020

A Closer Look at Apache Kudu
A Closer Look at Apache KuduA Closer Look at Apache Kudu
A Closer Look at Apache KuduAndriy Zabavskyy
 
A brave new world in mutable big data relational storage (Strata NYC 2017)
A brave new world in mutable big data  relational storage (Strata NYC 2017)A brave new world in mutable big data  relational storage (Strata NYC 2017)
A brave new world in mutable big data relational storage (Strata NYC 2017)Todd Lipcon
 
Intro to Apache Kudu (short) - Big Data Application Meetup
Intro to Apache Kudu (short) - Big Data Application MeetupIntro to Apache Kudu (short) - Big Data Application Meetup
Intro to Apache Kudu (short) - Big Data Application MeetupMike Percy
 
Arm architecture chapter2_steve_furber
Arm architecture chapter2_steve_furberArm architecture chapter2_steve_furber
Arm architecture chapter2_steve_furberasodariyabhavesh
 
chapter8.ppt clean code Boundary ppt Coding guide
chapter8.ppt clean code Boundary ppt Coding guidechapter8.ppt clean code Boundary ppt Coding guide
chapter8.ppt clean code Boundary ppt Coding guideSanjeevSaharan5
 
SYBSC IT SEM IV EMBEDDED SYSTEMS UNIT IV Designing Embedded System with 8051...
SYBSC IT SEM IV EMBEDDED SYSTEMS UNIT IV  Designing Embedded System with 8051...SYBSC IT SEM IV EMBEDDED SYSTEMS UNIT IV  Designing Embedded System with 8051...
SYBSC IT SEM IV EMBEDDED SYSTEMS UNIT IV Designing Embedded System with 8051...Arti Parab Academics
 
Pune-Cocoa: Blocks and GCD
Pune-Cocoa: Blocks and GCDPune-Cocoa: Blocks and GCD
Pune-Cocoa: Blocks and GCDPrashant Rane
 
Cloudera Impala technical deep dive
Cloudera Impala technical deep diveCloudera Impala technical deep dive
Cloudera Impala technical deep divehuguk
 
HBase Data Modeling and Access Patterns with Kite SDK
HBase Data Modeling and Access Patterns with Kite SDKHBase Data Modeling and Access Patterns with Kite SDK
HBase Data Modeling and Access Patterns with Kite SDKHBaseCon
 
Highlights of AWS ReInvent 2023 (Announcements and Best Practices)
Highlights of AWS ReInvent 2023 (Announcements and Best Practices)Highlights of AWS ReInvent 2023 (Announcements and Best Practices)
Highlights of AWS ReInvent 2023 (Announcements and Best Practices)Emprovise
 
Simplifying Hadoop with RecordService, A Secure and Unified Data Access Path ...
Simplifying Hadoop with RecordService, A Secure and Unified Data Access Path ...Simplifying Hadoop with RecordService, A Secure and Unified Data Access Path ...
Simplifying Hadoop with RecordService, A Secure and Unified Data Access Path ...Cloudera, Inc.
 
DataFrames: The Extended Cut
DataFrames: The Extended CutDataFrames: The Extended Cut
DataFrames: The Extended CutWes McKinney
 
Performance Tuning by Dijesh P
Performance Tuning by Dijesh PPerformance Tuning by Dijesh P
Performance Tuning by Dijesh PPlusOrMinusZero
 
Why you should care about data layout in the file system with Cheng Lian and ...
Why you should care about data layout in the file system with Cheng Lian and ...Why you should care about data layout in the file system with Cheng Lian and ...
Why you should care about data layout in the file system with Cheng Lian and ...Databricks
 
COMMitMDE'18: Eclipse Hawk: model repository querying as a service
COMMitMDE'18: Eclipse Hawk: model repository querying as a serviceCOMMitMDE'18: Eclipse Hawk: model repository querying as a service
COMMitMDE'18: Eclipse Hawk: model repository querying as a serviceAntonio García-Domínguez
 

Similar to ORC Deep Dive 2020 (20)

A Closer Look at Apache Kudu
A Closer Look at Apache KuduA Closer Look at Apache Kudu
A Closer Look at Apache Kudu
 
The Impala Cookbook
The Impala CookbookThe Impala Cookbook
The Impala Cookbook
 
Kafka overview v0.1
Kafka overview v0.1Kafka overview v0.1
Kafka overview v0.1
 
A brave new world in mutable big data relational storage (Strata NYC 2017)
A brave new world in mutable big data  relational storage (Strata NYC 2017)A brave new world in mutable big data  relational storage (Strata NYC 2017)
A brave new world in mutable big data relational storage (Strata NYC 2017)
 
Intro to Apache Kudu (short) - Big Data Application Meetup
Intro to Apache Kudu (short) - Big Data Application MeetupIntro to Apache Kudu (short) - Big Data Application Meetup
Intro to Apache Kudu (short) - Big Data Application Meetup
 
Arm architecture chapter2_steve_furber
Arm architecture chapter2_steve_furberArm architecture chapter2_steve_furber
Arm architecture chapter2_steve_furber
 
Assembler
AssemblerAssembler
Assembler
 
chapter8.ppt clean code Boundary ppt Coding guide
chapter8.ppt clean code Boundary ppt Coding guidechapter8.ppt clean code Boundary ppt Coding guide
chapter8.ppt clean code Boundary ppt Coding guide
 
HadoopFileFormats_2016
HadoopFileFormats_2016HadoopFileFormats_2016
HadoopFileFormats_2016
 
SYBSC IT SEM IV EMBEDDED SYSTEMS UNIT IV Designing Embedded System with 8051...
SYBSC IT SEM IV EMBEDDED SYSTEMS UNIT IV  Designing Embedded System with 8051...SYBSC IT SEM IV EMBEDDED SYSTEMS UNIT IV  Designing Embedded System with 8051...
SYBSC IT SEM IV EMBEDDED SYSTEMS UNIT IV Designing Embedded System with 8051...
 
Pune-Cocoa: Blocks and GCD
Pune-Cocoa: Blocks and GCDPune-Cocoa: Blocks and GCD
Pune-Cocoa: Blocks and GCD
 
Cloudera Impala technical deep dive
Cloudera Impala technical deep diveCloudera Impala technical deep dive
Cloudera Impala technical deep dive
 
HBase Data Modeling and Access Patterns with Kite SDK
HBase Data Modeling and Access Patterns with Kite SDKHBase Data Modeling and Access Patterns with Kite SDK
HBase Data Modeling and Access Patterns with Kite SDK
 
Highlights of AWS ReInvent 2023 (Announcements and Best Practices)
Highlights of AWS ReInvent 2023 (Announcements and Best Practices)Highlights of AWS ReInvent 2023 (Announcements and Best Practices)
Highlights of AWS ReInvent 2023 (Announcements and Best Practices)
 
Simplifying Hadoop with RecordService, A Secure and Unified Data Access Path ...
Simplifying Hadoop with RecordService, A Secure and Unified Data Access Path ...Simplifying Hadoop with RecordService, A Secure and Unified Data Access Path ...
Simplifying Hadoop with RecordService, A Secure and Unified Data Access Path ...
 
DataFrames: The Extended Cut
DataFrames: The Extended CutDataFrames: The Extended Cut
DataFrames: The Extended Cut
 
Performance Tuning by Dijesh P
Performance Tuning by Dijesh PPerformance Tuning by Dijesh P
Performance Tuning by Dijesh P
 
Why you should care about data layout in the file system with Cheng Lian and ...
Why you should care about data layout in the file system with Cheng Lian and ...Why you should care about data layout in the file system with Cheng Lian and ...
Why you should care about data layout in the file system with Cheng Lian and ...
 
COMMitMDE'18: Eclipse Hawk: model repository querying as a service
COMMitMDE'18: Eclipse Hawk: model repository querying as a serviceCOMMitMDE'18: Eclipse Hawk: model repository querying as a service
COMMitMDE'18: Eclipse Hawk: model repository querying as a service
 
Kirby, Fabro
Kirby, FabroKirby, Fabro
Kirby, Fabro
 

More from Owen O'Malley

Running An Apache Project: 10 Traps and How to Avoid Them
Running An Apache Project: 10 Traps and How to Avoid ThemRunning An Apache Project: 10 Traps and How to Avoid Them
Running An Apache Project: 10 Traps and How to Avoid ThemOwen O'Malley
 
Big Data's Journey to ACID
Big Data's Journey to ACIDBig Data's Journey to ACID
Big Data's Journey to ACIDOwen O'Malley
 
Protect your private data with ORC column encryption
Protect your private data with ORC column encryptionProtect your private data with ORC column encryption
Protect your private data with ORC column encryptionOwen O'Malley
 
Fine Grain Access Control for Big Data: ORC Column Encryption
Fine Grain Access Control for Big Data: ORC Column EncryptionFine Grain Access Control for Big Data: ORC Column Encryption
Fine Grain Access Control for Big Data: ORC Column EncryptionOwen O'Malley
 
Fast Access to Your Data - Avro, JSON, ORC, and Parquet
Fast Access to Your Data - Avro, JSON, ORC, and ParquetFast Access to Your Data - Avro, JSON, ORC, and Parquet
Fast Access to Your Data - Avro, JSON, ORC, and ParquetOwen O'Malley
 
Strata NYC 2018 Iceberg
Strata NYC 2018  IcebergStrata NYC 2018  Iceberg
Strata NYC 2018 IcebergOwen O'Malley
 
Fast Spark Access To Your Complex Data - Avro, JSON, ORC, and Parquet
Fast Spark Access To Your Complex Data - Avro, JSON, ORC, and ParquetFast Spark Access To Your Complex Data - Avro, JSON, ORC, and Parquet
Fast Spark Access To Your Complex Data - Avro, JSON, ORC, and ParquetOwen O'Malley
 
ORC Column Encryption
ORC Column EncryptionORC Column Encryption
ORC Column EncryptionOwen O'Malley
 
File Format Benchmarks - Avro, JSON, ORC, & Parquet
File Format Benchmarks - Avro, JSON, ORC, & ParquetFile Format Benchmarks - Avro, JSON, ORC, & Parquet
File Format Benchmarks - Avro, JSON, ORC, & ParquetOwen O'Malley
 
Protecting Enterprise Data in Apache Hadoop
Protecting Enterprise Data in Apache HadoopProtecting Enterprise Data in Apache Hadoop
Protecting Enterprise Data in Apache HadoopOwen O'Malley
 
Structor - Automated Building of Virtual Hadoop Clusters
Structor - Automated Building of Virtual Hadoop ClustersStructor - Automated Building of Virtual Hadoop Clusters
Structor - Automated Building of Virtual Hadoop ClustersOwen O'Malley
 
Hadoop Security Architecture
Hadoop Security ArchitectureHadoop Security Architecture
Hadoop Security ArchitectureOwen O'Malley
 
Adding ACID Updates to Hive
Adding ACID Updates to HiveAdding ACID Updates to Hive
Adding ACID Updates to HiveOwen O'Malley
 
ORC File Introduction
ORC File IntroductionORC File Introduction
ORC File IntroductionOwen O'Malley
 
Optimizing Hive Queries
Optimizing Hive QueriesOptimizing Hive Queries
Optimizing Hive QueriesOwen O'Malley
 
Next Generation Hadoop Operations
Next Generation Hadoop OperationsNext Generation Hadoop Operations
Next Generation Hadoop OperationsOwen O'Malley
 
Next Generation MapReduce
Next Generation MapReduceNext Generation MapReduce
Next Generation MapReduceOwen O'Malley
 
Bay Area HUG Feb 2011 Intro
Bay Area HUG Feb 2011 IntroBay Area HUG Feb 2011 Intro
Bay Area HUG Feb 2011 IntroOwen O'Malley
 
Plugging the Holes: Security and Compatability in Hadoop
Plugging the Holes: Security and Compatability in HadoopPlugging the Holes: Security and Compatability in Hadoop
Plugging the Holes: Security and Compatability in HadoopOwen O'Malley
 

ORC Deep Dive 2020

  • 1. ORC DEEP DIVE Owen O’Malley omalley@apache.org January 2020 @owen_omalley
  • 3. REQUIREMENTS
      • Files had to be completely self-describing
      • Schema
      • File version
      • Tight compression ⇒ Run Length Encoding (RLE) & compression
      • Column projection ⇒ segregate column data
      • Predicate pushdown ⇒ understand & index user’s types
      • Files had to be easy & fast to divide
      • Compatible with write-once file systems
  • 4. FILE STRUCTURE
      • The file footer contains:
      • Metadata – schema, file statistics
      • Stripe information – metadata and location of stripes
      • Postscript with the compression, buffer size, & file version
      • ORC file data is divided into stripes.
      • Stripes are self contained sets of rows organized by columns.
      • Stripes are the smallest unit of work for tasks.
      • Default is ~64MB, but often configured larger.
  • 5. STRIPE STRUCTURE
      • Within a stripe, the metadata is in the stripe footer.
      • List of streams
      • Column encoding information (e.g. direct or dictionary)
      • Columns are written as a set of streams. There are 3 kinds:
      • Index streams
      • Data streams
      • Dictionary streams
  • 6. FILE STRUCTURE
  • 7. READ PATH
      • The Reader reads the last 16k of the file, more as needed
      • The RowReader reads
      • Stripe footer
      • Required streams
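The first step of the read path can be sketched as follows. This is a minimal illustration, not the ORC reader itself: it assumes the final byte of the file holds the length of the serialized postscript, and it returns the raw postscript bytes without parsing them (the real reader decodes them as a protobuf message). The function name `read_postscript_bytes` and the constant `TAIL_GUESS` are hypothetical.

```python
import io

TAIL_GUESS = 16 * 1024  # the "last 16k" mentioned on the slide


def read_postscript_bytes(f):
    """Return the raw (unparsed) postscript bytes from a seekable binary file."""
    f.seek(0, io.SEEK_END)
    size = f.tell()
    if size == 0:
        raise ValueError("empty file")
    # Read the tail in one request; footer and postscript usually fit in it.
    read_len = min(size, TAIL_GUESS)
    f.seek(size - read_len)
    tail = f.read(read_len)
    # Assumption: the very last byte is the postscript length.
    ps_len = tail[-1]
    if ps_len + 1 > len(tail):
        raise ValueError("file too short to hold the postscript")
    return tail[-1 - ps_len:-1]


# Usage with a fake in-memory "file" whose postscript is 10 bytes long.
fake = io.BytesIO(b"ORC" + b"x" * 20000 + b"POSTSCRIPT" + bytes([10]))
print(read_postscript_bytes(fake))
```

If the footer turns out to be larger than the tail already read, a reader would issue a second read for the missing bytes, which is the "more as needed" case.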
  • 8. STREAMS
      • Each stream is an independent sequence of bytes
      • Serialization into streams depends on column type & encoding
      • Optional pipeline stages:
      • Run Length Encoding (RLE) – first pass integer compression
      • Generic compression – Zlib, Snappy, LZO, Zstd
      • Encryption – AES/CTR
  • 10. COMPOUND TYPES
      • Compound types are serialized as trees of columns.
      • struct, list, map, uniontype all have child columns
      • Types are numbered in a preorder traversal
      • The column reading classes are called TreeReader
      • Example schema: a: int, b: map<string, struct<c: string, d: double>>, e: timestamp
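The preorder numbering can be illustrated with the example schema from the slide. This is a sketch under assumptions: the tuple-based schema representation and the labels `root`, `key`, and `value` are made up for the illustration and are not ORC's TypeDescription API.

```python
def number_columns(name, node, ids=None):
    """Assign column ids in preorder.

    A node is either a primitive type name (str) or a pair
    (kind, [(child_name, child_node), ...]) for a compound type.
    """
    if ids is None:
        ids = []
    col_id = len(ids)  # the parent gets its id before any of its children
    if isinstance(node, str):
        ids.append((col_id, name, node))
    else:
        kind, children = node
        ids.append((col_id, name, kind))
        for child_name, child in children:
            number_columns(child_name, child, ids)
    return ids


# a: int, b: map<string, struct<c: string, d: double>>, e: timestamp
schema = ("struct", [
    ("a", "int"),
    ("b", ("map", [
        ("key", "string"),
        ("value", ("struct", [("c", "string"), ("d", "double")])),
    ])),
    ("e", "timestamp"),
])

for col_id, name, kind in number_columns("root", schema):
    print(col_id, name, kind)
```

The root struct is column 0, `a` is 1, the map `b` is 2 with its key and value at 3 and 4, `c` and `d` are 5 and 6, and `e` is 7.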
  • 11. ENCODING COLUMNS
      • To interpret a stream, you need three pieces of information:
      • Column type
      • Column encoding (direct, dictionary)
      • Stream kind (present, data, length, etc.)
      • All columns, if they have nulls, will have a present stream
      • Serialized using a boolean RLE
      • Integer columns are serialized with
      • A data stream using integer RLE
  • 12. ENCODING COLUMNS
      • Binary columns are serialized with:
      • Length stream of integer RLE
      • Data stream of raw sequence of bytes
      • String columns may be direct or dictionary encoded
      • Direct looks like a binary column, but dictionary is different
      • Dictionary_data is raw sequence of dictionary bytes
      • Length is an integer RLE stream of the dictionary lengths
      • Data is an integer RLE stream of indexes into dictionary
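The three dictionary-encoded streams can be sketched as below. This is an illustration only: it produces the logical content of each stream before integer RLE and compression are applied, and it assumes the dictionary is kept in sorted order. The function names are hypothetical.

```python
def dictionary_encode(values):
    """Split a list of strings into the logical content of three streams."""
    encoded = [v.encode("utf-8") for v in values]
    dictionary = sorted(set(encoded))           # assumption: sorted dictionary
    index = {entry: i for i, entry in enumerate(dictionary)}
    dictionary_data = b"".join(dictionary)      # raw dictionary bytes
    lengths = [len(entry) for entry in dictionary]   # would be integer RLE
    data = [index[v] for v in encoded]               # would be integer RLE
    return dictionary_data, lengths, data


def dictionary_decode(dictionary_data, lengths, data):
    """Rebuild the original strings from the three streams."""
    entries, offset = [], 0
    for length in lengths:
        entries.append(dictionary_data[offset:offset + length].decode("utf-8"))
        offset += length
    return [entries[i] for i in data]


values = ["nevada", "california", "nevada", "nevada", "utah"]
streams = dictionary_encode(values)
print(streams)
print(dictionary_decode(*streams) == values)
```

Repeated values cost only one small index each in the data stream, which is why dictionary encoding pays off for low-cardinality string columns.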
  • 13. ENCODING COLUMNS
      • Lists and maps record the number of child elements
      • Length is an integer RLE stream
      • Structs only have the present stream
      • Timestamps need nanosecond resolution (ouch!)
      • Data is an integer RLE of seconds from Jan 2015
      • Secondary is an integer RLE of nanoseconds with zero suppression
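The zero suppression used for the nanosecond stream can be sketched as follows. This is a minimal sketch based on my reading of the ORC format specification: trailing decimal zeros are stripped when there are at least two of them, and the low 3 bits record how many were removed (stored as the count minus one). The function names are hypothetical.

```python
def encode_nanos(nanos):
    """Pack nanoseconds (0..999,999,999) with trailing-zero suppression."""
    if nanos == 0:
        return 0
    if nanos % 100 != 0:
        return nanos << 3          # fewer than two trailing zeros: no suppression
    nanos //= 100                  # two zeros removed, recorded as 1
    zeros = 1
    while nanos % 10 == 0 and zeros < 7:
        nanos //= 10
        zeros += 1
    return (nanos << 3) | zeros


def decode_nanos(encoded):
    """Inverse of encode_nanos."""
    zeros = encoded & 7
    value = encoded >> 3
    if zeros:
        value *= 10 ** (zeros + 1)
    return value


print(hex(encode_nanos(1000)))     # 1 with three zeros removed
print(hex(encode_nanos(100000)))   # 1 with five zeros removed
print(decode_nanos(encode_nanos(123456789)))
```

Timestamps with millisecond or second precision end in many zeros, so the suppressed values are small and compress well under integer RLE.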
  • 14. RUN LENGTH ENCODING
      • Goal is to get some cheap quick compression
      • Handles repeating/incrementing values
      • Handles integer byte packing
      • Two versions
      • Version 1 – relatively simple repeat/literal encoding
      • Version 2 – complex encoding with 4 variants
      • Column encoding of *_V2 means use RLE version 2
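The repeat/literal idea behind version 1 can be sketched with a small decoder. This follows my understanding of the integer RLE version 1 layout in the ORC specification and handles unsigned values only: a control byte of 0 to 127 starts a run of (control + 3) values followed by a signed delta byte and a base-128 varint base value, while a control byte of 128 to 255 starts a group of (256 - control) literal varints. The function names are hypothetical.

```python
def read_varint(buf, pos):
    """Read an unsigned base-128 varint; return (value, next_position)."""
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if byte < 0x80:
            return result, pos
        shift += 7


def decode_rle_v1(buf):
    """Decode unsigned integer RLE v1 bytes into a list of ints."""
    out, pos = [], 0
    while pos < len(buf):
        control = buf[pos]
        pos += 1
        if control < 0x80:
            # Run: length, then signed delta byte, then varint base value.
            length = control + 3
            delta = buf[pos] - 256 if buf[pos] >= 0x80 else buf[pos]
            pos += 1
            base, pos = read_varint(buf, pos)
            out.extend(base + i * delta for i in range(length))
        else:
            # Literals: (256 - control) varints follow.
            for _ in range(256 - control):
                value, pos = read_varint(buf, pos)
                out.append(value)
    return out


print(decode_rle_v1(bytes([0x61, 0x00, 0x07]))[:5])   # a run of repeated 7s
print(decode_rle_v1(bytes([0x61, 0xFF, 0x64]))[:5])   # a run counting down from 100
print(decode_rle_v1(bytes([0xFB, 0x02, 0x03, 0x06, 0x07, 0x0B])))
```

A run of 100 identical or steadily incrementing values takes three bytes, which is the "cheap quick compression" the slide refers to.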
  • 16. ROW PRUNING
      • Three levels of indexing/row pruning
      • File – uses file statistics in file footer
      • Stripe – uses stripe statistics before file footer
      • Row group (default of 10k rows) – uses index stream
      • The index stream for each column includes for each row group
      • Column statistics (min, max, count, sum)
      • The start positions of each stream
  • 17. SEARCH ARGUMENTS
      • Engines can pass Search Arguments (SArgs) to the RowReader.
      • Limited set of operations (=, <=>, <, <=, in, between, is null)
      • Compare one column to literal(s)
      • Can only eliminate entire row groups, stripes, or files.
      • Engine must still filter the individual rows afterwards
      • For Hive, ensure hive.optimize.index.filter is true.
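How a search argument eliminates row groups can be sketched with min/max statistics. This is a simplified illustration, not ORC's SearchArgument API: it supports three operators on one integer column, and the names `may_contain` and `select_row_groups` are hypothetical.

```python
def may_contain(stats, op, literal):
    """Return False only when the row group provably holds no matching row."""
    lo, hi = stats
    if op == "=":
        return lo <= literal <= hi
    if op == "<":
        return lo < literal
    if op == "<=":
        return lo <= literal
    raise ValueError("unsupported operator: " + op)


def select_row_groups(row_group_stats, op, literal):
    """Indexes of the row groups that must still be read."""
    return [i for i, stats in enumerate(row_group_stats)
            if may_contain(stats, op, literal)]


# (min, max) of one column for three row groups
stats = [(1, 100), (101, 200), (150, 300)]
print(select_row_groups(stats, "=", 160))   # first row group is skipped
print(select_row_groups(stats, "<", 101))   # only the first row group can match
```

A surviving row group may still contain non-matching rows, which is why the engine has to apply the filter to the individual rows afterwards.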
  • 18. COMPRESSION
      • All of the generic compression is done in chunks
      • Codec is reinitialized at start of chunk
      • Each chunk is compressed separately
      • Each uncompressed chunk is at most the buffer size
      • Each chunk has a 3 byte header giving:
      • Compressed size of chunk
      • Whether it is the original or compressed
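The 3 byte chunk header can be sketched as below. This follows my understanding of the ORC specification, where the header is a little-endian 24-bit value holding the chunk length shifted left by one bit, with the low bit set when the chunk is stored in its original (uncompressed) form. The function names are hypothetical.

```python
def encode_chunk_header(length, is_original):
    """Build the 3-byte header for a chunk of the given stored length."""
    value = (length << 1) | (1 if is_original else 0)
    if value >= 1 << 24:
        raise ValueError("chunk too large for a 3-byte header")
    return value.to_bytes(3, "little")


def decode_chunk_header(header):
    """Return (stored_length, is_original) from the first 3 bytes."""
    value = int.from_bytes(header[:3], "little")
    return value >> 1, bool(value & 1)


print(encode_chunk_header(100000, False).hex())  # compressed chunk of 100,000 bytes
print(encode_chunk_header(5, True).hex())        # 5 bytes stored as-is
print(decode_chunk_header(bytes([0x40, 0x0D, 0x03])))
```

Storing the original bytes when compression does not help keeps a chunk from growing, at the cost of one flag bit.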
  • 19. INDEXES
      • Wanted ability to seek to each row group
      • Allows fine grain seeking & row pruning
      • Could have flushed stream compression pipeline
      • Would have dramatically lowered compression
      • Instead treat compression & RLE as gray boxes
      • Use our knowledge of compression & RLE
      • Always start fresh at beginning of chunk or run
  • 20. INDEX POSITIONS
      • Records information to seek to a given row in all of a column’s streams
      • Includes:
      • C Compressed bytes
      • U Uncompressed bytes
      • V RLE values
      • C, U, & V jump to RG 4
  • 21. BLOOM FILTERS
      • For use cases where you need to find particular values
      • Sorting by that column allows min/max filtering
      • But you can only sort on one column effectively
      • Bloom filters are probabilistic data structures
      • Only useful for equality, not less than or greater than
      • Need ~10 bits/distinct value ⇒ opt in
      • ORC uses a bloom_filter_utf8 stream to record a bloom filter per row group
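Why bloom filters only help equality lookups can be seen in a small sketch. This is a generic bloom filter sized at 10 bits per expected value, as on the slide. It is not ORC's implementation: the SHA-256 based double hashing and the choice of 7 hash functions are assumptions made for the illustration, and the class name is hypothetical.

```python
import hashlib


class BloomFilter:
    def __init__(self, expected_values, bits_per_value=10, num_hashes=7):
        self.num_bits = max(8, expected_values * bits_per_value)
        self.num_hashes = num_hashes
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, value):
        # Derive two hashes from one digest and combine them (double hashing).
        digest = hashlib.sha256(value.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:16], "little") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, value):
        for p in self._positions(value):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, value):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[p // 8] & (1 << (p % 8))
                   for p in self._positions(value))


# One filter per row group: skip the row group when the filter says "absent".
row_group_values = ["order-%d" % i for i in range(1000)]
bloom = BloomFilter(expected_values=len(row_group_values))
for v in row_group_values:
    bloom.add(v)
print(bloom.might_contain("order-42"))
```

The filter answers only membership questions for a single value, so it can prune row groups for `=` and `in` predicates but says nothing about ranges.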
  • 22. ROW PRUNING EXAMPLE
      • TPC-DS  from tpch1000.lineitem where l_orderkey = 1212000001;

      Index     Rows Read       Time
      Nothing   5,999,989,709   74 sec
      Min/Max   540,000         4.5 sec
      Bloom     10,000          1.3 sec
  • 24. COMPATIBILITY
      • Within a file version, old readers must be able to read all files.
      • A few exceptions (e.g. new codecs, types)
      • Version 0 (from Hive 0.11)
      • Only RLE V1 & string dictionary encoding
      • Version 1 (from Hive 0.12 forward)
      • Version 2 (under development)
      • The library includes ability to write any file version.
      • Enables smooth upgrades across clusters
  • 25. WRITER VERSION
      • When fixes or feature additions are made to the writer, we bump the writer version.
      • Allows reader to work around bugs, especially in index
      • Does not affect reader compatibility
      • We should require that each minor version adds a new one.
      • We also record which writer wrote the file:
      • Java, C++, Presto, Go
  • 26. EXAMPLE WORKAROUND FOR HIVE-8746
      • Timestamps suck!
      • ORC uses an epoch of 01-01-2015 00:00:00.
      • Timestamp columns record seconds offset from epoch
      • Unfortunately, the original code used the local time zone.
      • If reader and writer were in time zones with the same rules, it worked.
      • Fix involved writing the writer time zone into file.
      • Forwards and backwards compatible
  • 28. SCHEMA EVOLUTION
      • User passes desired schema to RecordReader factory.
      • SchemaEvolution class maps between file & reader schemas.
      • The mapping can be positional or name based.
      • Conversions based on legacy Hive behavior…
      • The RecordReader uses the mapping to translate
      • Choosing streams uses the file schema column ids
      • Type translation is done by ConvertTreeReaderFactory.
      • Adds an additional TreeReader that does conversion.
  • 29. STRIPE CONCATENATION & FLUSH
      • ORC has a special operator to concatenate files
      • Requires consistent options & schema
      • Concatenates stripes without reserialization
      • ORC can flush the current contents including a file footer while still writing to the file.
      • Writes a side file with the current offset of the file tail
      • When the file closes the intermediate file footers are ignored
  • 30. COLUMN ENCRYPTION
      • Released in ORC 1.6
      • Allows consistent column level access control across engines
      • Writes two variants of data
      • Encrypted original
      • Unencrypted statically masked
      • Each variant has its own streams & encodings
      • Each column has a unique local key, which is encrypted by KMS
  • 31. OTHER DEVELOPER TOOLS
      • Benchmarks
      • Hive & Spark
      • Avro, JSON, ORC, and Parquet
      • Three data sets (taxi, sales, github)
      • Docker
      • Allows automated builds on all supported Linux variants
      • Site source code is with C++ & Java
  • 33. WHICH VERSION IS IT?

      Engine         Version        ORC Version
      Hive           0.11 to 2.2    Hive ORC 0.11 to 2.2
      Hive           2.3            ORC 1.3
      Hive           3.0            ORC 1.4
      Hive           3.1            ORC 1.5
      Spark hive     *              Hive ORC 1.2
      Spark native   2.3            ORC 1.4
      Spark native   2.4 to 3.0     ORC 1.5
  • 34. FROM SQL
      • Hive:
      • Add “stored as orc” to table definition
      • Table properties override configuration for ORC
      • Spark’s “spark.sql.orc.impl” controls implementation
      • native – Use ORC 1.5
      • hive – Use ORC from Hive 1.2
  • 35. FROM JAVA
      • Use the ORC project rather than Hive’s ORC.
      • Maven group id: org.apache.orc version: 1.6.2
      • nohive classifier avoids interfering with Hive’s packages
      • Two levels of access
      • orc-core – Faster access, but uses Hive’s vectorized API
      • orc-mapreduce – Row by row access, simpler OrcStruct API
      • MapReduce API implements WritableComparable
      • Can be shuffled
      • Need to specify type information in configuration for shuffle or output
  • 36. FROM C++
      • Pure C++ client library
      • No JNI or JDK so client can estimate and control memory
      • Uses pure C++ HDFS client from HDFS-8707
      • Reader and writer are stable and in production use.
      • Runs on Linux, Mac OS, and Windows.
      • Docker scripts for CentOS 6-8, Debian 8-10, Ubuntu 14-18
      • CI builds on Mac OS, Ubuntu, and Windows
  • 37. FROM COMMAND LINE
      • Using hive --orcfiledump from Hive
      • -j -p – pretty prints the metadata as JSON
      • -d – prints data as JSON
      • Using java -jar orc-tools-*-uber.jar from ORC
      • meta -j -p – print the metadata as JSON
      • data – print data as JSON
      • convert – convert CSV, JSON, or ORC to ORC
      • json-schema – scan a set of JSON documents to find schema
  • 38. DEBUGGING
      • Things to look for:
      • Stripe size
      • Rows/Stripe
      • File version
      • Writer version
      • Width of schema
      • Sanity of statistics
      • Column encoding
      • Size of dictionaries
  • 40. STRIPE SIZE
      • Makes a huge difference in performance
      • orc.stripe.size or hive.exec.orc.default.stripe.size
      • Controls the amount of buffer in writer. Default is 64MB
      • Trade off
      • Large = larger, more efficient reads
      • Small = less memory and more granular processing splits
      • Multiple files written at the same time will shrink stripes
  • 41. HDFS BLOCK PADDING
      • The stripes don’t align exactly with HDFS blocks
      • Unless orc.write.variable.length.blocks
      • HDFS scatters blocks around cluster
      • Often want to pad to block boundaries
      • Costs space, but improves performance
      • orc.default.block.padding
      • orc.block.padding.tolerance
  • 42. SPLIT CALCULATION
      • BI: small fast queries. Splits based on HDFS blocks.
      • ETL: large queries. Read file footer and apply SearchArg to stripes. Can include footer in splits (hive.orc.splits.include.file.footer).
      • Hybrid: if small files or lots of files, use BI.
  • 44. FOR MORE INFORMATION
      • The orc_proto.proto defines the ORC metadata
      • Read code and especially OrcConf, which has all of the knobs
      • Website on https://orc.apache.org/
      • /bugs ⇒ jira repository
      • /src ⇒ github repository
      • /specification ⇒ format specification
      • Apache email list dev@orc.apache.org