A presentation by Anu Engineer of Cloudera on the state of the Ozone subproject. He gives a brief introduction to what Ozone is and where it is headed.
This is taken from the Apache Hadoop Contributors Meetup on January 30, hosted by LinkedIn in Mountain View.
2. Agenda
● Overview of Ozone
● Scale
● Deployment
● S3 Gateway support
● Ozone I/O path
● Security
● HA
● Current Status and work in progress
● Release Plan
3. Why Ozone?
● HDFS has scaling problems
● Some users have a "make your HDFS healthy" day.
● Regular users hit scaling limits around 200 million files.
● Companies with committers/core devs push to 400-600 million files.
● New Opportunities and Challenges
○ Cloud
○ Streaming
○ Small files are the norm
4. What is Ozone?
● Object Store for Big Data
● Scales both in terms of object count and IOPS.
● The Namenode is no longer a bottleneck.
● A set of microservices, each with a well-defined responsibility.
● Leverages lessons learned from supporting HDFS across a large set of use cases.
● Apache YARN, MapReduce, Spark, and Hive are all tested and certified to work
with Apache Ozone. No application changes are required to work with Ozone
(see the sketch after this list).
● Supports CSI and the ability to run natively on K8s.
● A spiritual successor to HDFS.
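As a rough illustration of the "no application changes" point above, existing Hadoop FileSystem code can target Ozone by changing only the path URI. This is a minimal sketch, assuming the Ozone filesystem jar is on the classpath and the o3fs:// scheme; the volume, bucket, and key names are hypothetical:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class OzoneFsSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Same FileSystem API used for hdfs:// paths; only the URI changes.
    // "bucket1.vol1" (bucket.volume) and the key name are hypothetical.
    Path path = new Path("o3fs://bucket1.vol1/data/events.log");
    FileSystem fs = path.getFileSystem(conf);
    try (FSDataOutputStream out = fs.create(path)) {
      out.writeBytes("unchanged application code\n");
    }
    System.out.println(fs.exists(path)); // true once the stream is closed
  }
}
```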
5. When should we use Ozone?
● If you have a scale issue - Files or Throughput.
● If you need an archival store for HDFS or a large data store.
● If you need S3 or cloud-like presence on-prem.
● If you want to set up dedicated storage clusters.
● If you have lots of small files.
● If you are moving to K8s and need a big-data-capable file system.
6. Current State of Ozone - Road Map
● We are maintaining a two-month release cadence.
● Two Alpha Releases - Done
○ Arches Release 0.2.1 - Basic functionality
○ Acadia Release 0.3.0 - S3 Support
● Third alpha on the way
○ Release 0.4.0 - Badlands - Security support, stability improvements - hopefully in the next
two to three weeks. It is feature complete and entering the testing phase.
● Follow up with three betas
○ Beta - 1 Release 0.5.0 - Crater Lake - High Availability, First class K8s support, Topology
awareness
○ Beta 2 - In-Place Upgrades, Stability improvements
○ Beta 3 - Erasure Coding Support
● GA
7. What are Ozone’s Microservices?
● The Namenode equivalent, the Ozone Manager (OM), which deals with file names.
● The block server, the Storage Container Manager (SCM), which deals with block allocation
and physical servers (a minimal configuration sketch follows this list).
● Fsck Server - Control Plane.
● S3 Gateway
● Datanodes
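The OM and SCM addresses are what a client needs to reach these services. A minimal sketch of the relevant ozone-site.xml entries, assuming the ozone.om.address and ozone.scm.names configuration keys; the hostnames are placeholders:

```xml
<!-- ozone-site.xml: minimal sketch; hostnames are placeholders -->
<configuration>
  <property>
    <name>ozone.om.address</name>   <!-- Ozone Manager: namespace service -->
    <value>om-host:9862</value>
  </property>
  <property>
    <name>ozone.scm.names</name>    <!-- Storage Container Manager: block service -->
    <value>scm-host</value>
  </property>
</configuration>
```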
9. Let us talk about scale
● Ozone is designed for scale. The first release of Ozone will officially support
10 billion keys.
● Ozone achieves this by a combination of factors.
○ Partial namespace in memory - file system metadata is loaded on demand (see the sketch
after this list).
○ Off-heap memory usage - to avoid GC pressure, we rely on off-heap native memory.
○ Multiple Ozone Managers and block services - users can scale the OM or SCM independently.
End users will not even notice, since the Ozone protocol does this scaling automatically.
○ Creating large aggregations of metadata called storage containers.
○ Distributing metadata more evenly across the cluster, including Datanodes.
○ Multiple OMs will also have the ability to read from secondaries. We are looking very
closely at the work done at LinkedIn on Consistent Reads from Standby.
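As a minimal sketch of the partial-namespace and off-heap ideas, here is the RocksDB Java API storing and reading a single metadata entry; the key layout and path below are invented for illustration and are not Ozone's actual schema:

```java
import org.rocksdb.Options;
import org.rocksdb.RocksDB;

public class OnDemandMetadataSketch {
  public static void main(String[] args) throws Exception {
    RocksDB.loadLibrary();
    try (Options opts = new Options().setCreateIfMissing(true);
         RocksDB db = RocksDB.open(opts, "/tmp/om-metadata")) {
      // RocksDB keeps its data in native (off-heap) memory and on disk,
      // so entries never sit in the Java heap waiting for the GC.
      db.put("vol1/bucket1/key1".getBytes(), "blockId=42".getBytes());
      // Metadata is read on demand instead of being preloaded into memory.
      byte[] meta = db.get("vol1/bucket1/key1".getBytes());
      System.out.println(new String(meta));
    }
  }
}
```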
11. Let us talk about correctness and consistency
● Uses verified protocols like Raft for consensus
● RocksDB for metadata storage
● Reliance on off-the-shelf, well-tested components
● Easy to test and build
● We test with internal applications and HDP test suites
○ Blockade tests - we are currently running tests that inject errors and failures into a cluster.
○ TPC-DS tests - working with the Hive/LLAP team to run a 1 TB benchmark with a large number of clients.
○ Starting Alpha deployments with customer proof-of-concept clusters.
○ Porting real-world workloads using Apache Spark to Ozone.
12. S3 is the new NFS
● Data ingestion is the first challenge users face.
● Bringing data into the cluster from various outside sources.
○ The simplest and most straightforward option - NFS.
● S3 is the new kid on the block; there are many tools, SDKs, and existing
applications that write to S3.
● With HDFS, NFS was an afterthought.
● With Ozone, S3 is the first-class interface, and we encourage our users to
use S3 (a client sketch follows).
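Because the gateway speaks the S3 protocol, a stock S3 client can be pointed at it unchanged. A minimal sketch with the AWS SDK for Java, assuming a local gateway on its default port (9878), placeholder credentials, and a hypothetical bucket:

```java
import com.amazonaws.auth.AWSStaticCredentialsProvider;
import com.amazonaws.auth.BasicAWSCredentials;
import com.amazonaws.client.builder.AwsClientBuilder;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;

public class S3GatewaySketch {
  public static void main(String[] args) {
    // Point an ordinary S3 client at the Ozone S3 Gateway instead of AWS.
    AmazonS3 s3 = AmazonS3ClientBuilder.standard()
        .withEndpointConfiguration(new AwsClientBuilder.EndpointConfiguration(
            "http://localhost:9878", "us-east-1"))
        .withCredentials(new AWSStaticCredentialsProvider(
            new BasicAWSCredentials("any", "any"))) // placeholder credentials
        .withPathStyleAccessEnabled(true)
        .build();
    s3.createBucket("bucket1");
    s3.putObject("bucket1", "key1", "hello from an S3 tool");
    System.out.println(s3.getObjectAsString("bucket1", "key1"));
  }
}
```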
13. Ozone - Write Path
Create a file
● Blocks are allocated by OM/SCM.
● Blocks are written directly to data nodes
● Very similar to HDFS
● When a file is closed, it becomes visible to other readers (a client sketch follows).
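A minimal sketch of this write path using the Ozone client API; the volume, bucket, and key names are hypothetical, and the exact createKey signature may vary between Ozone versions:

```java
import org.apache.hadoop.hdds.conf.OzoneConfiguration;
import org.apache.hadoop.ozone.client.ObjectStore;
import org.apache.hadoop.ozone.client.OzoneBucket;
import org.apache.hadoop.ozone.client.OzoneClient;
import org.apache.hadoop.ozone.client.OzoneClientFactory;
import org.apache.hadoop.ozone.client.OzoneVolume;
import org.apache.hadoop.ozone.client.io.OzoneOutputStream;

public class WritePathSketch {
  public static void main(String[] args) throws Exception {
    try (OzoneClient client = OzoneClientFactory.getRpcClient(new OzoneConfiguration())) {
      ObjectStore store = client.getObjectStore();
      store.createVolume("vol1");
      OzoneVolume vol = store.getVolume("vol1");
      vol.createBucket("bucket1");
      OzoneBucket bucket = vol.getBucket("bucket1");
      byte[] data = "hello".getBytes();
      // The OM/SCM allocate blocks; the stream writes directly to Datanodes.
      try (OzoneOutputStream out = bucket.createKey("key1", data.length)) {
        out.write(data);
      } // closing the key makes it visible to other readers
    }
  }
}
```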
15. Ozone - Read Path
● The client gets block locations from the OM.
● The client reads data directly from Datanodes
● AKA, same old HDFS protocol.
● Ozone relies on all things good in HDFS.
● Including the source code (a read sketch follows this list).
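The matching read is a mirror image; a sketch, reusing the hypothetical names from the write sketch above:

```java
import org.apache.hadoop.ozone.client.OzoneBucket;
import org.apache.hadoop.ozone.client.io.OzoneInputStream;

public class ReadPathSketch {
  // Block locations come from the OM; the bytes come straight from Datanodes.
  static String readKey(OzoneBucket bucket, String keyName) throws Exception {
    try (OzoneInputStream in = bucket.readKey(keyName)) {
      byte[] buf = new byte[4096];
      int n = in.read(buf);
      return n > 0 ? new String(buf, 0, n) : "";
    }
  }
}
```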
17. Let us talk about Security
● HDFS security is based on Kerberos.
● Kerberos cannot sustain the scale of applications running in a Hadoop
Cluster.
● So HDFS relies on Delegation tokens and block tokens.
● Ozone uses the same mechanism, so applications need no changes.
● SCM comes with its own Certificate Authority.
● End users do NOT need to know about it.
● Allows us to move away from needing a Kerberos setup for each Datanode;
we need Kerberos only on the OM and SCM (a configuration sketch follows this list).
● Security is on by default, not an afterthought.
● HDDS-4 was just merged into trunk; the next release will have security.
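A minimal sketch of what "Kerberos only on OM and SCM" looks like in ozone-site.xml; the key names follow the HDDS-4 security work as I understand it, and the principals and keytab paths are placeholders:

```xml
<!-- ozone-site.xml: security sketch; principals and keytab paths are placeholders -->
<configuration>
  <property>
    <name>ozone.security.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>ozone.om.kerberos.principal</name>
    <value>om/_HOST@EXAMPLE.COM</value>
  </property>
  <property>
    <name>ozone.om.kerberos.keytab.file</name>
    <value>/etc/security/keytabs/om.keytab</value>
  </property>
  <property>
    <name>hdds.scm.kerberos.principal</name>
    <value>scm/_HOST@EXAMPLE.COM</value>
  </property>
  <property>
    <name>hdds.scm.kerberos.keytab.file</name>
    <value>/etc/security/keytabs/scm.keytab</value>
  </property>
</configuration>
```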
18. Let us talk about HA
● Like HDFS, Ozone will have HA.
● Unlike HDFS, HA is a built-in feature of Ozone.
● Users need to deploy three instances of OM/SCM. That is it.
● HA is automatic even when you run a single node; the OM assumes it is a
single-node HA configuration (a configuration sketch follows this list).
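"Three instances and some configuration" could look like the sketch below; the key names reflect the in-progress OM HA design (HDDS-505) and may change, and the service id, node ids, and hosts are placeholders:

```xml
<!-- ozone-site.xml: OM HA sketch; service id, node ids, and hosts are placeholders -->
<configuration>
  <property>
    <name>ozone.om.service.ids</name>
    <value>omservice</value>
  </property>
  <property>
    <name>ozone.om.nodes.omservice</name>
    <value>om1,om2,om3</value>
  </property>
  <property>
    <name>ozone.om.address.omservice.om1</name>
    <value>om1-host:9862</value>
  </property>
  <property>
    <name>ozone.om.address.omservice.om2</name>
    <value>om2-host:9862</value>
  </property>
  <property>
    <name>ozone.om.address.omservice.om3</name>
    <value>om3-host:9862</value>
  </property>
</configuration>
```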
19. Let us talk about Testing
● Ozone uses K8s based clusters for Testing.
● Both long running and ephemeral clusters are regularly tested.
● Uses a load generator called Freon (earlier called Corona, after the
process that creates ozone).
● Apache Spark, YARN, and Hive are used to run workloads against Ozone.
● S3AFileSystem and other open-source test suites are used to test S3 Gateway
support.
● Blockade-based tests make sure that error handling works and cluster-level
failures are tolerated.
20. A Path to GA - Things in progress
● Stability and Scale Testing -
○ Chaos Monkey Testing
○ TPC-DS
○ Scale Testing with some partners.
● TDE - encryption at rest; you will see a patch soon.
● Network Topology Support
● HA support - HA support patches are landing in trunk. We have made
excellent progress and hope to have HA support in the 0.5.0 release.
● In-place upgrades - the ability to upgrade HDFS clusters to Ozone; in the
design phase.
● Erasure coding support; in the design phase.
22. Ozone HA & Network Topology & TDE
● Ozone will support network topology very similar to HDFS.
● https://issues.apache.org/jira/browse/HDDS-698
● Very good progress made.
● Ozone HA uses Apache Ratis, a Raft implementation.
● Deploying 3 OMs and setting up some configuration is all that is needed.
● Detailed Design Documents posted in https://issues.apache.org/jira/browse/HDDS-505
● TDE - Transparent Data Encryption support is in the works, and you will see a design and a patch soon.
23. In-Place Upgrades
● Compute a mapping from HDFS blocks to SCM Containers.
● Create the OM and SCM metadata from HDFS FSImage.
● SCM communicates this mapping to all Datanodes.
● Datanodes respond by creating hard links to the HDFS blocks (a sketch follows this list).
● Once the upgrade is done, the data is available in both HDFS and Ozone.
● If you delete data in HDFS, it does not affect data in Ozone, or vice versa.
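A minimal sketch of the hard-link step with standard Java NIO; the block and container paths are invented for illustration:

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class HardLinkSketch {
  public static void main(String[] args) throws Exception {
    // Hypothetical paths: an existing HDFS block file and its new name
    // inside an Ozone storage container directory.
    Path hdfsBlock = Paths.get("/data/hdfs/current/blk_1073741825");
    Path ozoneName = Paths.get("/data/ozone/containers/42/blk_1073741825");
    Files.createDirectories(ozoneName.getParent());
    // A hard link makes both names point at the same on-disk data: no copy
    // is made, and deleting one name does not delete the other.
    Files.createLink(ozoneName, hdfsBlock);
  }
}
```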
24. Erasure coding Support
● We have a design in place; we will post it to JIRA soon.
● Very similar to HDFS EC support in the way we do it.
● However, the User Interfaces will be completely different.
● Ozone computes how much to erasure code
○ to avoid the exact repair problem for EC chunks.
○ HDFS pushes that decision to users.
○ Hence, in Ozone, users don't have to explicitly mark data for erasure coding.
○ Ozone can do it automatically, as and when needed and it makes sense.
● Users can also pick a file and ask for explicit Erasure coding support.