Apache Iceberg Presentation for the St. Louis Big Data IDEA

Presentation on Apache Iceberg for the February 2021 St. Louis Big Data IDEA. Apache Iceberg is an open table format for large analytic datasets that works with engines such as Hive, Spark, and Presto.

  1. Apache Iceberg (Scott Shaw)
  2. What is Apache Iceberg?
     © 2021 Cloudera, Inc. All rights reserved.
     • Efficient Table Format
       – Hidden Partitioning
       – Schema Evolution
       – Time Travel
     • Works with Presto, Hive, and Spark
     • Created at Netflix (2017)
     • Used at Adobe, Apple, LinkedIn, and Experian
  3. What are the Challenges?
     • Data Scalability
     • Atomicity
     • Performance Degradation
     • Complexity
     • Object Stores
     • Storage and Compute
     • File System (Listing)
  4. ARCHITECTURE
  5. Architecture
     [Diagram: Spark and Presto on top of Iceberg, over HDFS or an object store]
  6. Architecture
     [Diagram: Snapshot (01) and Snapshot (02) each point to a manifest list; each manifest list points to manifest files, which track the data files]
  7. WORKING WITH ICEBERG
  8. Initial Setup
     • Catalogs
       – Working with SQL
       – System Information
  9. Spark
     Adding a Catalog (spark_catalog wraps Spark's built-in Hive session catalog; local is a path-based Hadoop catalog under ./warehouse):
       spark-sql --packages org.apache.iceberg:iceberg-spark3-runtime:0.11.0 \
         --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \
         --conf spark.sql.catalog.spark_catalog=org.apache.iceberg.spark.SparkSessionCatalog \
         --conf spark.sql.catalog.spark_catalog.type=hive \
         --conf spark.sql.catalog.local=org.apache.iceberg.spark.SparkCatalog \
         --conf spark.sql.catalog.local.type=hadoop \
         --conf spark.sql.catalog.local.warehouse=$PWD/warehouse
     Creating a Table:
       CREATE TABLE local.db.table (id bigint, data string) USING iceberg;
  10. Hive
      Add the jar file:
        add jar /path/to/iceberg-hive-runtime.jar;
      Create an External Table:
        CREATE EXTERNAL TABLE table_a
        STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler'
        LOCATION 'hdfs://some_bucket/some_path/table_a';
  11. REFERENCES
  12. References
      • Apache Iceberg: https://iceberg.apache.org/
      • Project Nessie: https://projectnessie.org/
      • Hive/Iceberg Integration: https://github.com/ExpediaGroup/hiveberg
      • Partitioning: https://developer.ibm.com/technologies/artificial-intelligence/articles/the-why-and-how-of-partitioning-in-apache-iceberg/
      • Iceberg Explained: https://thenewstack.io/apache-iceberg-a-different-table-design-for-big-data/
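Hidden partitioning (slide 2) means Iceberg derives partition values from a column by applying a transform. Queries filter on the column itself and still skip partitions that cannot match, so users never have to reference a separate partition column. The toy model below illustrates the idea with a day-style transform (whole days since 1970-01-01). `day_transform` and `ToyPartitionedTable` are hypothetical names, and this is not Iceberg's implementation. In Iceberg's Spark DDL, a comparable table would be declared with `PARTITIONED BY (days(ts))`.

```python
from collections import defaultdict
from datetime import date, datetime, timezone
from typing import Dict, List, Tuple

EPOCH = date(1970, 1, 1)


def day_transform(ts: datetime) -> int:
    """Toy day transform: whole days since 1970-01-01, computed in UTC."""
    return (ts.astimezone(timezone.utc).date() - EPOCH).days


class ToyPartitionedTable:
    """Rows are grouped by day_transform(ts); callers never see the partition value."""

    def __init__(self) -> None:
        self.partitions: Dict[int, List[Tuple[int, datetime]]] = defaultdict(list)

    def insert(self, row_id: int, ts: datetime) -> None:
        # The partition value is derived from ts at write time, not supplied by the user.
        self.partitions[day_transform(ts)].append((row_id, ts))

    def scan(self, start: datetime, end: datetime) -> Tuple[List[int], int]:
        """Return ids with start <= ts < end, plus the number of partitions read."""
        lo, hi = day_transform(start), day_transform(end)
        matching_ids: List[int] = []
        scanned = 0
        for day, rows in self.partitions.items():
            if not lo <= day <= hi:
                continue  # pruned: the filter on ts rules this whole day out
            scanned += 1
            matching_ids.extend(rid for rid, row_ts in rows if start <= row_ts < end)
        return sorted(matching_ids), scanned


utc = timezone.utc
t = ToyPartitionedTable()
t.insert(1, datetime(2021, 2, 1, 9, 0, tzinfo=utc))
t.insert(2, datetime(2021, 2, 2, 12, 0, tzinfo=utc))
t.insert(3, datetime(2021, 2, 3, 18, 0, tzinfo=utc))

# The query filters on ts only; the day partitions to read are worked out automatically.
ids, scanned = t.scan(datetime(2021, 2, 2, tzinfo=utc), datetime(2021, 2, 2, 23, 0, tzinfo=utc))
print(ids, scanned)  # -> [2] 1
```

The pruning step is deliberately conservative. It may read a partition that ends up contributing no rows, but it never skips a partition that could contain a match.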
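Schema evolution (slide 2) works because Iceberg tracks columns by unique field IDs rather than by name or position. Renaming or adding a column therefore does not require rewriting existing data files. The sketch below is a toy projection, not Iceberg's reader. `Schema` and `read_row` are hypothetical names, and data-file rows are assumed to be dictionaries keyed by field ID.

```python
from typing import Dict, Optional

# Toy schema: column name -> field id. Data files store values keyed by field id.
Schema = Dict[str, int]


def read_row(file_row: Dict[int, object], schema: Schema) -> Dict[str, Optional[object]]:
    """Project a stored row onto the current schema by field id; unknown ids read as None."""
    return {name: file_row.get(field_id) for name, field_id in schema.items()}


# A data file row written when the schema was {"id": 1, "data": 2}.
old_file_row = {1: 42, 2: "hello"}

# Later: "data" is renamed to "payload" (same id 2) and "category" is added with new id 3.
current_schema: Schema = {"id": 1, "payload": 2, "category": 3}
print(read_row(old_file_row, current_schema))
# -> {'id': 42, 'payload': 'hello', 'category': None}

# Dropping "data" and re-adding a column with the same name assigns a fresh id (4),
# so the old values do not reappear under the new column.
readded_schema: Schema = {"id": 1, "data": 4}
print(read_row(old_file_row, readded_schema))
# -> {'id': 42, 'data': None}
```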
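Iceberg addresses the Atomicity challenge on slide 3 by committing through a single atomic pointer swap. A writer prepares new metadata, then asks the catalog to replace the table's current metadata location, but only if it still matches the version the writer started from. If another writer committed first, the swap is rejected and the writer retries against the new base. The toy below models that optimistic-concurrency loop, using integer versions in place of metadata files. `ToyCatalog` and `commit` are hypothetical names, not Iceberg classes.

```python
import threading
from typing import Callable, List


class ToyCatalog:
    """Holds a single pointer to the table's current metadata version."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.current_version = 0
        self.history: List[int] = [0]

    def compare_and_swap(self, expected: int, new: int) -> bool:
        """Atomically move the pointer only if no one committed since `expected` was read."""
        with self._lock:
            if self.current_version != expected:
                return False
            self.current_version = new
            self.history.append(new)
            return True


def commit(catalog: ToyCatalog, make_new_version: Callable[[int], int], max_retries: int = 3) -> int:
    """Optimistic commit: read the base, build new metadata, swap; on conflict, re-read and retry."""
    for _ in range(max_retries + 1):
        base = catalog.current_version
        new = make_new_version(base)  # stands in for writing a new metadata file
        if catalog.compare_and_swap(base, new):
            return new
    raise RuntimeError("commit failed after retries")


catalog = ToyCatalog()
stale_base = catalog.current_version            # writer A reads version 0
catalog.compare_and_swap(0, 1)                  # writer B commits version 1 first
print(catalog.compare_and_swap(stale_base, 1))  # -> False (A's stale swap is rejected)
print(commit(catalog, lambda base: base + 1))   # -> 2 (A re-reads and retries)
print(catalog.history)                          # -> [0, 1, 2]
```

Readers only ever follow the current pointer, so they see either the old table state or the new one, never a half-written commit.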
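The snapshot diagram on slide 6 describes a tree of metadata: each snapshot points to a manifest list, the manifest list points to manifest files, and manifest files track data files. The sketch below is a toy in-memory model of that tree, not Iceberg's API. The class names (`Manifest`, `Snapshot`, `TableMetadata`) and file names are hypothetical. It shows why planning a scan needs no directory listing (the File System (Listing) challenge on slide 3), and how time travel (slide 2) amounts to choosing an older snapshot.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Manifest:
    """Toy manifest file: lists the data files it tracks."""
    path: str
    data_files: List[str]


@dataclass(frozen=True)
class Snapshot:
    """Toy snapshot: an id, a commit timestamp, and its manifest list."""
    snapshot_id: int
    timestamp_ms: int
    manifest_list: List[Manifest]


@dataclass
class TableMetadata:
    """Toy table metadata: every snapshot, oldest first."""
    snapshots: List[Snapshot] = field(default_factory=list)

    def current(self) -> Snapshot:
        return self.snapshots[-1]

    def snapshot_as_of(self, timestamp_ms: int) -> Optional[Snapshot]:
        """Time travel: the latest snapshot committed at or before timestamp_ms."""
        candidates = [s for s in self.snapshots if s.timestamp_ms <= timestamp_ms]
        return candidates[-1] if candidates else None


def plan_files(snapshot: Snapshot) -> List[str]:
    """Walk snapshot -> manifest list -> manifests -> data files, with no directory listing."""
    return [f for manifest in snapshot.manifest_list for f in manifest.data_files]


# Snapshot 01 writes one file; snapshot 02 appends another and reuses manifest m1.
m1 = Manifest("m1.avro", ["data/file-a.parquet"])
m2 = Manifest("m2.avro", ["data/file-b.parquet"])
table = TableMetadata([
    Snapshot(1, 1_000, [m1]),
    Snapshot(2, 2_000, [m1, m2]),
])

print(plan_files(table.current()))              # -> ['data/file-a.parquet', 'data/file-b.parquet']
print(plan_files(table.snapshot_as_of(1_500)))  # -> ['data/file-a.parquet']
```

Because an append only adds a new manifest and a new manifest list, older snapshots stay intact and remain readable until they are expired.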
