Achieving Lakehouse Models with Spark 3.0

•

3 likes•636 views

It’s very easy to be distracted by the latest and greatest approaches with technology, but sometimes there’s a reason old approaches stand the test of time. Star Schemas & Kimball is one of those things that isn’t going anywhere, but as we move towards the “Data Lakehouse” paradigm – how appropriate is this modelling technique, and how can we harness the Delta Engine & Spark 3.0 to maximise it’s performance?

Data & Analytics

Achieving Lakehouse Models
with Spark 3.0
Simon Whiteley
Director of Engineering, Advancing Analytics

Agenda
Why Lakehouse?
Kimball Problems
Delta & Spark 3.0
▪ SCD & SQL Merge
▪ Dynamic Partition Pruning
▪ Adaptive Query Execution
Enabling the Lakehouse

The Lakehouse
RAW BASE ENRICHED
DELTA DELTA

When you think Warehouse…
We automatically think of Star
Schemas and Kimball warehousing
approaches.
A large central fact table with smaller
reference dimensions… some of which
aren’t so small

Literally Everyone
(All The time)
“You can’t use Kimball in a
Data Lake”

Three Historical Challenges
▪ Slowly Changing Dimensions
▪ Filtering Dimensions
▪ General SQL Performance

SCD - Enabling the Familiar
PrimaryKey Address Current EffectiveDate EndDate
11 A new customer address TRUE 03/08/2020 null
58 Yet another address TRUE 03/08/2020 null
41 A different address TRUE 03/08/2020 null
PrimaryKey Address Current EffectiveDate EndDate
11 A new customer address FALSE 03/08/2020 22/10/2020
11 An updated address TRUE 22/10/2020 null
58 Yet another address TRUE 03/08/2020 null
41 A different address TRUE 03/08/2020 null

SCD - Merge Commands
MERGE INTO dataai.addresses as original
USING updates
ON original.primaryKey = updates.primaryKey
WHEN MATCHED THEN
UPDATE SET *
WHEN NOT MATCHED THEN
INSERT *
Available in SQL, Scala and Python APIs - the merge command has made many complex warehousing
jobs accessible to the wider Analytics community
This is enabled by the Delta file format

Spark Partitioning
SELECT * FROM Sales WHERE Month
= 3
SQL Query Action
Filtering performed by
selectively reading files
SALES
Month=1 Month=2
Month=3 Month=4

Cross-Filter Spark 2.4
SELECT * FROM Sales JOIN Date
WHERE DateMonth = 3
SQL Query Action
SALES
Month=1 Month=2
Month=3 Month=4
DimDATE
Partition Keys not hit when
filtering on joined tables

Cross-Filter Spark 3.0
SELECT * FROM Sales JOIN Date
WHERE DateMonth = 3
SQL Query Action
SALES
Month=1 Month=2
Month=3 Month=4
DimDATE
Dynamic Partition Pruning
determines partition filters
during runtime

AQE in Spark 3.0
AQE will speed up common queries in a number of ways:
▪ Coalescing Shuffle Partitions
▪ Switching Join Strategies
▪ Optimizing Skew Joins

Before AQE - Shuffle Coalescing
Read RDDs
Read RDDs
Shuffle RDDs
Shuffle RDDs
Shuffle RDDs
Shuffle RDDs
Shuffle RDDs
Shuffle RDDs
Shuffle RDDs
Shuffle RDDs
Shuffle RDDs
Shuffle RDDs
Shuffle RDDs
Write RDDs
Write RDDs
2 Tasks 200 Tasks 2 Tasks

Using Spark 3.0 AQE - Shuffle Coalescing
Read RDDs
Read RDDs
Write RDDs
Write RDDs
2 Tasks 2 Tasks 2 Tasks
Shuffle RDDs
Shuffle RDDs

The Data Lakehouse
Delta & Spark 3.0 enable the Lakehouse through:
▪ Enabling familiar (SQL) patterns
▪ Removing technical barriers
▪ Targeting performance of common
warehousing activities

Feedback
Your feedback is important to us.
Don’t forget to rate
and review the sessions.

What's hot

Delta Lake OSS: Create reliable and performant Data Lake by Quentin Ambard

Paris Data Engineers !

Data Mesh is an innovative concept addressing many data challenges from an architectural, cultural, and organizational perspective. But is the world ready to implement Data Mesh? In this session, we will review the importance of core Data Mesh principles, what they can offer, and when it is a good idea to try a Data Mesh architecture. We will discuss common challenges with implementation of Data Mesh systems and focus on the role of open-source projects for it. Projects like Apache Spark can play a key part in standardized infrastructure platform implementation of Data Mesh. We will examine the landscape of useful data engineering open-source projects to utilize in several areas of a Data Mesh system in practice, along with an architectural example. We will touch on what work (culture, tools, mindset) needs to be done to ensure Data Mesh is more accessible for engineers in the industry. The audience will leave with a good understanding of the benefits of Data Mesh architecture, common challenges, and the role of Apache Spark and other open-source projects for its implementation in real systems. This session is targeted for architects, decision-makers, data-engineers, and system designers.

Architect’s Open-Source Guide for a Data Mesh Architecture

Databricks

Building Lakehouses on Delta Lake with SQL Analytics Primer

Databricks

Snowflake Architecture.pptx

chennakesava44

Delta Lake, an open-source innovations which brings new capabilities for transactions, version control and indexing your data lakes. We uncover how Delta Lake benefits and why it matters to you. Through this session, we showcase some of its benefits and how they can improve your modern data engineering pipelines. Delta lake provides snapshot isolation which helps concurrent read/write operations and enables efficient insert, update, deletes, and rollback capabilities. It allows background file optimization through compaction and z-order partitioning achieving better performance improvements. In this presentation, we will learn the Delta Lake benefits and how it solves common data lake challenges, and most importantly new Delta Time Travel capability.

Databricks Delta Lake and Its Benefits

Databricks

Many had dubbed 2020 as the decade of data. This is indeed an era of data zeitgeist. From code-centric software development 1.0, we are entering software development 2.0, a data-centric and data-driven approach, where data plays a central theme in our everyday lives. As the volume and variety of data garnered from myriad data sources continue to grow at an astronomical scale and as cloud computing offers cheap computing and data storage resources at scale, the data platforms have to match in their abilities to process, analyze, and visualize at scale and speed and with ease — this involves data paradigm shifts in processing and storing and in providing programming frameworks to developers to access and work with these data platforms. In this talk, we will survey some emerging technologies that address the challenges of data at scale, how these tools help data scientists and machine learning developers with their data tasks, why they scale, and how they facilitate the future data scientists to start quickly. In particular, we will examine in detail two open-source tools MLflow (for machine learning life cycle development) and Delta Lake (for reliable storage for structured and unstructured data). Other emerging tools such as Koalas help data scientists to do exploratory data analysis at scale in a language and framework they are familiar with as well as emerging data + AI trends in 2021. You will understand the challenges of machine learning model development at scale, why you need reliable and scalable storage, and what other open source tools are at your disposal to do data science and machine learning at scale.

The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...

Databricks

Delta Lake with Azure Databricks

Dustin Vannoy

The world of data architecture began with applications. Next came data warehouses. Then text was organized into a data warehouse. Then one day the world discovered a whole new kind of data that was being generated by organizations. The world found that machines generated data that could be transformed into valuable insights. This was the origin of what is today called the data lakehouse. The evolution of data architecture continues today. Come listen to industry experts describe this transformation of ordinary data into a data architecture that is invaluable to business. Simply put, organizations that take data architecture seriously are going to be at the forefront of business tomorrow. This is an educational event. Several of the authors of the book Building the Data Lakehouse will be presenting at this symposium.

Data Lakehouse Symposium | Day 4

Databricks

Data Lakehouse, Data Mesh, and Data Fabric (r1)

James Serra

Organizations with on-premises Hadoop infrastructure are bogged down by system complexity, unscalable infrastructure, and the increasing burden on DevOps to manage legacy architectures. Costs and resource utilization continue to go up while innovation has flatlined. In this session, you will learn why, now more than ever, enterprises are looking for cloud alternatives to Hadoop and are migrating off of the architecture in large numbers. You will also learn how elastic compute models’ benefits help one customer scale their analytics and AI workloads and best practices from their experience on a successful migration of their data and workloads to the cloud.

Modernizing to a Cloud Data Architecture

Databricks

Delta lake and the delta architecture

Adam Doyle

Introducing Databricks Delta

Databricks

DW Migration Webinar-March 2022.pptx

Databricks

Data platform modernization with Databricks.pptx

CalvinSim10

This is Part 4 of the GoldenGate series on Data Mesh - a series of webinars helping customers understand how to move off of old-fashioned monolithic data integration architecture and get ready for more agile, cost-effective, event-driven solutions. The Data Mesh is a kind of Data Fabric that emphasizes business-led data products running on event-driven streaming architectures, serverless, and microservices based platforms. These emerging solutions are essential for enterprises that run data-driven services on multi-cloud, multi-vendor ecosystems. Join this session to get a fresh look at Data Mesh; we'll start with core architecture principles (vendor agnostic) and transition into detailed examples of how Oracle's GoldenGate platform is providing capabilities today. We will discuss essential technical characteristics of a Data Mesh solution, and the benefits that business owners can expect by moving IT in this direction. For more background on Data Mesh, Part 1, 2, and 3 are on the GoldenGate YouTube channel: https://www.youtube.com/playlist?list=PLbqmhpwYrlZJ-583p3KQGDAd6038i1ywe Webinar Speaker: Jeff Pollock, VP Product (https://www.linkedin.com/in/jtpollock/) Mr. Pollock is an expert technology leader for data platforms, big data, data integration and governance. Jeff has been CTO at California startups and a senior exec at Fortune 100 tech vendors. He is currently Oracle VP of Products and Cloud Services for Data Replication, Streaming Data and Database Migrations. While at IBM, he was head of all Information Integration, Replication and Governance products, and previously Jeff was an independent architect for US Defense Department, VP of Technology at Cerebra and CTO of Modulant – he has been engineering artificial intelligence based data platforms since 2001. As a business consultant, Mr. Pollock was a Head Architect at Ernst & Young’s Center for Technology Enablement. Jeff is also the author of “Semantic Web for Dummies” and "Adaptive Information,” a frequent keynote at industry conferences, author for books and industry journals, formerly a contributing member of W3C and OASIS, and an engineering instructor with UC Berkeley’s Extension for object-oriented systems, software development process and enterprise architecture.

Data Mesh Part 4 Monolith to Mesh

Jeffrey T. Pollock

Introducing the Snowflake Computing Cloud Data Warehouse

Snowflake Computing

Today’s data-driven companies have a choice to make – where do we store our data? As the move to the cloud continues to be a driving factor, the choice becomes either the data warehouse (Snowflake et al) or the data lake (AWS S3 et al). There are pro’s and con’s for each approach. While the data warehouse will give you strong data management with analytics, they don’t do well with semi-structured and unstructured data with tightly coupled storage and compute, not to mention expensive vendor lock-in. On the other hand, data lakes allow you to store all kinds of data and are extremely affordable, but they’re only meant for storage and by themselves provide no direct value to an organization. Enter the Open Data Lakehouse, the next evolution of the data stack that gives you the openness and flexibility of the data lake with the key aspects of the data warehouse like management and transaction support. In this webinar, you’ll hear from Ali LeClerc who will discuss the data landscape and why many companies are moving to an open data lakehouse. Ali will share more perspective on how you should think about what fits best based on your use case and workloads, and how some real world customers are using Presto, a SQL query engine, to bring analytics to the data lakehouse.

Data Warehouse or Data Lake, Which Do I Choose?

DATAVERSITY

Time to Talk about Data Mesh

LibbySchulze

Getting Started with Delta Lake on Databricks

Knoldus Inc.

Building an open data platform with apache iceberg

Alluxio, Inc.

What's hot (20)

Delta Lake OSS: Create reliable and performant Data Lake by Quentin Ambard

Architect’s Open-Source Guide for a Data Mesh Architecture

Building Lakehouses on Delta Lake with SQL Analytics Primer

Snowflake Architecture.pptx

Databricks Delta Lake and Its Benefits

The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...

Delta Lake with Azure Databricks

Data Lakehouse Symposium | Day 4

Data Lakehouse, Data Mesh, and Data Fabric (r1)

Modernizing to a Cloud Data Architecture

Delta lake and the delta architecture

Introducing Databricks Delta

DW Migration Webinar-March 2022.pptx

Data platform modernization with Databricks.pptx

Data Mesh Part 4 Monolith to Mesh

Introducing the Snowflake Computing Cloud Data Warehouse

Data Warehouse or Data Lake, Which Do I Choose?

Time to Talk about Data Mesh

Getting Started with Delta Lake on Databricks

Building an open data platform with apache iceberg

Similar to Achieving Lakehouse Models with Spark 3.0

Jump Start on Apache Spark 2.2 with Databricks

Anyscale

Svccg nosql 2011_v4

Sid Anand

Celtra provides a platform for streamlined ad creation and campaign management used by customers including Porsche, Taco Bell, and Fox to create, track, and analyze their digital display advertising. Celtra’s platform processes billions of ad events daily to give analysts fast and easy access to reports and ad hoc analytics. Celtra’s Grega Kešpret leads a technical dive into Celtra’s data-pipeline challenges and explains how it solved them by combining Snowflake’s cloud data warehouse with Spark to get the best of both. Topics include: - Why Celtra changed its pipeline, materializing session representations to eliminate the need to rerun its pipeline - How and why it decided to use Snowflake rather than an alternative data warehouse or a home-grown custom solution - How Snowflake complemented the existing Spark environment with the ability to store and analyze deeply nested data with full consistency - How Snowflake + Spark enables production and ad hoc analytics on a single repository of data

Self-serve analytics journey at Celtra: Snowflake, Spark, and Databricks

Grega Kespret

700 Updatable Queries Per Second: Spark as a Real-Time Web Service

Evan Chan

700 Queries Per Second with Updates: Spark As A Real-Time Web Service

Spark Summit

Apache Spark 2.x has laid the foundation for many new features and functionality. Its main three themes—easier, faster, and smarter—are pervasive in its unified and simplified high-level APIs for Structured data. In this introductory part lecture and part hands-on workshop you’ll learn how to apply some of these new APIs using Databricks Community Edition. In particular, we will cover the following areas: Apache Spark Fundamentals & Concepts What’s new in Spark 2.x SparkSessions vs SparkContexts Datasets/Dataframes and Spark SQL Introduction to Structured Streaming concepts and APIs

Jump Start with Apache Spark 2.0 on Databricks

Anyscale

Comcast is one of the leading providers of communications, entertainment, and cable products and services. At the heart of it is Comcast RDK providing the backbone of telemetry to the industry. RDK (Reference Design Kit) is pre-bundled opensource firmware for a complete home platform covering video, broadband and IoT devices. RDK team at Comcast analyzes petabytes of data, collected every 15 minutes from 70 million devices (video and broadband and IoT devices) installed in customer homes. They run ETL and aggregation pipelines and publish analytical dashboards on a daily basis to reduce customer calls and firmware rollout. The analysis is also used to calculate WIFI happiness index which is a critical KPI for Comcast customer experience. In addition to this, RDK team also does release tracking by analyzing the RDK firmware quality. SQL Analytics allows customers to operate a lakehouse architecture that provides data warehousing performance at data lake economics for up to 4x better price/performance for SQL workloads than traditional cloud data warehouses. We present the results of the “Test and Learn” with SQL Analytics and the delta engine that we worked in partnership with the Databricks team. We present a quick demo introducing the SQL native interface, the challenges we faced with migration, The results of the execution and our journey of productionizing this at scale.

SQL Analytics Powering Telemetry Analysis at Comcast

Databricks

Apache Spark 2.0 and subsequent releases of Spark 2.1 and 2.2 have laid the foundation for many new features and functionality. Its main three themes—easier, faster, and smarter—are pervasive in its unified and simplified high-level APIs for Structured data. In this introductory part lecture and part hands-on workshop, you’ll learn how to apply some of these new APIs using Databricks Community Edition. In particular, we will cover the following areas: Agenda: • Overview of Spark Fundamentals & Architecture • What’s new in Spark 2.x • Unified APIs: SparkSessions, SQL, DataFrames, Datasets • Introduction to DataFrames, Datasets and Spark SQL • Introduction to Structured Streaming Concepts • Four Hands On Labs You will use Databricks Community Edition, which will give you unlimited free access to a ~6 GB Spark 2.x local mode cluster. And in the process, you will learn how to create a cluster, navigate in Databricks, explore a couple of datasets, perform transformations and ETL, save your data as tables and parquet files, read from these sources, and analyze datasets using DataFrames/Datasets API and Spark SQL. Level: Beginner to intermediate, not for advanced Spark users. Prerequisite: You will need a laptop with Chrome or Firefox browser installed with at least 8 GB. Introductory or basic knowledge Scala or Python is required, since the Notebooks will be in Scala; Python is optional. Bio: Jules S. Damji is an Apache Spark Community Evangelist with Databricks. He is a hands-on developer with over 15 years of experience and has worked at leading companies, such as Sun Microsystems, Netscape, LoudCloud/Opsware, VeriSign, Scalix, and ProQuest, building large-scale distributed systems. Before joining Databricks, he was a Developer Advocate at Hortonworks.

Jump Start on Apache® Spark™ 2.x with Databricks

Databricks

In this introductory part lecture and part hands-on workshop, you’ll learn how to apply some of these new APIs using Databricks Community Edition. In particular, we will cover the following areas: Agenda: • Overview of Spark Fundamentals & Architecture • What’s new in Spark 2.x • Unified APIs: SparkSessions, SQL, DataFrames, Datasets • Introduction to DataFrames, Datasets and Spark SQL • Introduction to Structured Streaming Concepts • Four Hands On Labs You will use Databricks Community Edition, which will give you unlimited free access to a ~6 GB Spark 2.x local mode cluster. And in the process, you will learn how to create a cluster, navigate in Databricks, explore a couple of datasets, perform transformations and ETL, save your data as tables and parquet files, read from these sources, and analyze datasets using DataFrames/Datasets API and Spark SQL. Level: Beginner to intermediate, not for advanced Spark users. Prerequisite: You will need a laptop with Chrome or Firefox browser installed with at least 8 GB. Introductory or basic knowledge Scala or Python is required, since the Notebooks will be in Scala; Python is optional. Bio: Jules S. Damji is an Apache Spark Community Evangelist with Databricks. He is a hands-on developer with over 15 years of experience and has worked at leading companies, such as Sun Microsystems, Netscape, LoudCloud/Opsware, VeriSign, Scalix, and ProQuest, building large-scale distributed systems. Before joining Databricks, he was a Developer Advocate at Hortonworks.

Jumpstart on Apache Spark 2.2 on Databricks

Databricks

In this presentation, attendees will learn how to determine the best approach for tuning SQL statements and other issues by identifying specific causes of slow performance. Real-life case studies will be used to demonstrate: 1.How to use Response Time Analysis (RTA) to quickly identify the biggest problems in a database 2.How to utilize the MDA tables to understand what resources are being used or waited on 3.Tips and techniques on examining and further fine-tuning the execution plan

Advanced ASE Performance Tuning Tips

SAP Technology

The Apache Pulsar community is collaborating with the Delta Lake community to add up to both ecosystems: Pulsar - Delta Lake Connector. In this session, we will first provide an overview of Pulsar - Delta Lake connector and Delta Lake Standalone Reader, then introduce the design of Pulsar - Delta Lake CDC source connector including how to capture data change of Delta lake and how to recover from last checkpoint with the help of Pulsar Function state store . We will also discuss the scalability of this Pulsar - Delta lake CDC source connector.

Pulsar in the Lakehouse: Overview of Apache Pulsar and Delta Lake Connector -...

StreamNative

NewSQL - Deliverance from BASE and back to SQL and ACID

Tony Rogerson

Video and slides synchronized, mp3 and slide download available at URL http://bit.ly/2mAKgJi. Ian Nowland and Joel Barciauskas talk about the challenges Datadog faces as the company has grown its real-time metrics systems that collect, process, and visualize data to the point they now handle trillions of points per day. They also talk about how the architecture has evolved, and what they are looking to in the future as they architect for a quadrillion points per day. Filmed at qconnewyork.com. Ian Nowland is the VP Engineering Metrics and Alerting at Datadog. Joel Barciauskas currently leads Datadog's distribution metrics team, providing accurate, low latency percentile measures for customers across their infrastructure.

Datadog: a Real-Time Metrics Database for One Quadrillion Points/Day

C4Media

It has long been said that business intelligence needs a relational warehouse, but that view is changing. With the Lakehouse architecture being shouted from the rooftops, Databricks have released SQL Analytics, an alternative workspace for SQL-savvy users to interact with an analytics-tuned cluster. But how does it work? Where do you start? What does a typical Data Analyst’s user journey look like with the tool? This session will introduce the new workspace and walk through the various key features – how you set up a SQL Endpoint, the query workspace, creating rich dashboards and connecting up BI tools such as Microsoft Power BI. If you’re truly trying to create a Lakehouse experience that satisfies your SQL-loving Data Analysts, this is a tool you’ll need to be familiar with and include in your design patterns, and this session will set you on the right path.

Getting Started with Databricks SQL Analytics

Databricks

Tech-Spark: Scaling Databases

Ralph Attard

In this knolx session, we will come to know about Delta Lake and its features. Delta Lake is one of the greatest innovations by Databricks that makes existing data lakes more scalable and reliable. Delta Lake is an open source storage layer that brings reliability to data lakes. Delta Lake provides ACID transactions, scalable metadata handling, and unifies streaming and batch data processing. Delta Lake runs on top of our existing data lake and is fully compatible with Apache Spark APIs.

Spark with Delta Lake

Knoldus Inc.

Presto for apps deck varada prestoconf

Ori Reshef

"The common use cases of Spark SQL include ad hoc analysis, logical warehouse, query federation, and ETL processing. Spark SQL also powers the other Spark libraries, including structured streaming for stream processing, MLlib for machine learning, and GraphFrame for graph-parallel computation. For boosting the speed of your Spark applications, you can perform the optimization efforts on the queries prior employing to the production systems. Spark query plans and Spark UIs provide you insight on the performance of your queries. This talk discloses how to read and tune the query plans for enhanced performance. It will also cover the major related features in the recent and upcoming releases of Apache Spark. "

Understanding Query Plans and Spark UIs

Databricks

Sql Server 2005 Business Inteligence

abercius24

Modernise Your Data Warehouse with Amazon Redshift and Amazon Redshift Spectrum In this session you will learn how to migrate and modernise your legacy data warehouse, moving from an on-premises server or application, to the cloud. You will learn how to easily migrate your data by leveraging serverless ETL, data cataloging as well as the techniques needed to successfully modernise your data warehouse, reduce costs, and increase performance and scalability. David McAmis, Big Data Consultant, AWS Professional Services, Amazon Web Services

Modernise your Data Warehouse - AWS Summit Sydney 2018

Amazon Web Services

Similar to Achieving Lakehouse Models with Spark 3.0 (20)

Jump Start on Apache Spark 2.2 with Databricks

Svccg nosql 2011_v4

Self-serve analytics journey at Celtra: Snowflake, Spark, and Databricks

700 Updatable Queries Per Second: Spark as a Real-Time Web Service

700 Queries Per Second with Updates: Spark As A Real-Time Web Service

Jump Start with Apache Spark 2.0 on Databricks

SQL Analytics Powering Telemetry Analysis at Comcast

Jump Start on Apache® Spark™ 2.x with Databricks

Jumpstart on Apache Spark 2.2 on Databricks

Advanced ASE Performance Tuning Tips

Pulsar in the Lakehouse: Overview of Apache Pulsar and Delta Lake Connector -...

NewSQL - Deliverance from BASE and back to SQL and ACID

Datadog: a Real-Time Metrics Database for One Quadrillion Points/Day

Getting Started with Databricks SQL Analytics

Tech-Spark: Scaling Databases

Spark with Delta Lake

Presto for apps deck varada prestoconf

Understanding Query Plans and Spark UIs

Sql Server 2005 Business Inteligence

Modernise your Data Warehouse - AWS Summit Sydney 2018

More from Databricks

Data Lakehouse Symposium | Day 1 | Part 1

Databricks

Data Lakehouse Symposium | Day 1 | Part 2

Databricks

Data Lakehouse Symposium | Day 2

Databricks

In this session, learn how to quickly supplement your on-premises Hadoop environment with a simple, open, and collaborative cloud architecture that enables you to generate greater value with scaled application of analytics and AI on all your data. You will also learn five critical steps for a successful migration to the Databricks Lakehouse Platform along with the resources available to help you begin to re-skill your data teams.

5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop

Databricks

Bad data leads to bad decisions and broken customer experiences. Organizations depend on complete and accurate data to power their business, maintain efficiency, and uphold customer trust. With thousands of datasets and pipelines running, how do we ensure that all data meets quality standards, and that expectations are clear between producers and consumers? Investing in shared, flexible components and practices for monitoring data health is crucial for a complex data organization to rapidly and effectively scale. At Zillow, we built a centralized platform to meet our data quality needs across stakeholders. The platform is accessible to engineers, scientists, and analysts, and seamlessly integrates with existing data pipelines and data discovery tools. In this presentation, we will provide an overview of our platform’s capabilities, including: Giving producers and consumers the ability to define and view data quality expectations using a self-service onboarding portal Performing data quality validations using libraries built to work with spark Dynamically generating pipelines that can be abstracted away from users Flagging data that doesn’t meet quality standards at the earliest stage and giving producers the opportunity to resolve issues before use by downstream consumers Exposing data quality metrics alongside each dataset to provide producers and consumers with a comprehensive picture of health over time

Democratizing Data Quality Through a Centralized Platform

Databricks

Data scientists face numerous challenges throughout the data science workflow that hinder productivity. As organizations continue to become more data-driven, a collaborative environment is more critical than ever — one that provides easier access and visibility into the data, reports and dashboards built against the data, reproducibility, and insights uncovered within the data.. Join us to hear how Databricks’ open and collaborative platform simplifies data science by enabling you to run all types of analytics workloads, from data preparation to exploratory analysis and predictive analytics, at scale — all on one unified platform.

Learn to Use Databricks for Data Science

Databricks

Application performance monitoring (APM) has become the cornerstone of software engineering allowing engineering teams to quickly identify and remedy production issues. However, as the world moves to intelligent software applications that are built using machine learning, traditional APM quickly becomes insufficient to identify and remedy production issues encountered in these modern software applications. As a lead software engineer at NewRelic, my team built high-performance monitoring systems including Insights, Mobile, and SixthSense. As I transitioned to building ML Monitoring software, I found the architectural principles and design choices underlying APM to not be a good fit for this brand new world. In fact, blindly following APM designs led us down paths that would have been better left unexplored. In this talk, I draw upon my (and my team’s) experience building an ML Monitoring system from the ground up and deploying it on customer workloads running large-scale ML training with Spark as well as real-time inference systems. I will highlight how the key principles and architectural choices of APM don’t apply to ML monitoring. You’ll learn why, understand what ML Monitoring can successfully borrow from APM, and hear what is required to build a scalable, robust ML Monitoring architecture.

Why APM Is Not the Same As ML Monitoring

Databricks

Autonomy and ownership are core to working at Stitch Fix, particularly on the Algorithms team. We enable data scientists to deploy and operate their models independently, with minimal need for handoffs or gatekeeping. By writing a simple function and calling out to an intuitive API, data scientists can harness a suite of platform-provided tooling meant to make ML operations easy. In this talk, we will dive into the abstractions the Data Platform team has built to enable this. We will go over the interface data scientists use to specify a model and what that hooks into, including online deployment, batch execution on Spark, and metrics tracking and visualization.

The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix

Databricks

In this talk, I will dive into the stage level scheduling feature added to Apache Spark 3.1. Stage level scheduling extends upon Project Hydrogen by improving big data ETL and AI integration and also enables multiple other use cases. It is beneficial any time the user wants to change container resources between stages in a single Apache Spark application, whether those resources are CPU, Memory or GPUs. One of the most popular use cases is enabling end-to-end scalable Deep Learning and AI to efficiently use GPU resources. In this type of use case, users read from a distributed file system, do data manipulation and filtering to get the data into a format that the Deep Learning algorithm needs for training or inference and then sends the data into a Deep Learning algorithm. Using stage level scheduling combined with accelerator aware scheduling enables users to seamlessly go from ETL to Deep Learning running on the GPU by adjusting the container requirements for different stages in Spark within the same application. This makes writing these applications easier and can help with hardware utilization and costs. There are other ETL use cases where users want to change CPU and memory resources between stages, for instance there is data skew or perhaps the data size is much larger in certain stages of the application. In this talk, I will go over the feature details, cluster requirements, the API and use cases. I will demo how the stage level scheduling API can be used by Horovod to seamlessly go from data preparation to training using the Tensorflow Keras API using GPUs. The talk will also touch on other new Apache Spark 3.1 functionality, such as pluggable caching, which can be used to enable faster dataframe access when operating from GPUs.

Stage Level Scheduling Improving Big Data and AI Integration

Databricks

In this talk, I would like to introduce an open-source tool built by our team that simplifies the data conversion from Apache Spark to deep learning frameworks. Imagine you have a large dataset, say 20 GBs, and you want to use it to train a TensorFlow model. Before feeding the data to the model, you need to clean and preprocess your data using Spark. Now you have your dataset in a Spark DataFrame. When it comes to the training part, you may have the problem: How can I convert my Spark DataFrame to some format recognized by my TensorFlow model? The existing data conversion process can be tedious. For example, to convert an Apache Spark DataFrame to a TensorFlow Dataset file format, you need to either save the Apache Spark DataFrame on a distributed filesystem in parquet format and load the converted data with third-party tools such as Petastorm, or save it directly in TFRecord files with spark-tensorflow-connector and load it back using TFRecordDataset. Both approaches take more than 20 lines of code to manage the intermediate data files, rely on different parsing syntax, and require extra attention for handling vector columns in the Spark DataFrames. In short, all these engineering frictions greatly reduced the data scientists’ productivity. The Databricks Machine Learning team contributed a new Spark Dataset Converter API to Petastorm to simplify these tedious data conversion process steps. With the new API, it takes a few lines of code to convert a Spark DataFrame to a TensorFlow Dataset or a PyTorch DataLoader with default parameters. In the talk, I will use an example to show how to use the Spark Dataset Converter to train a Tensorflow model and how simple it is to go from single-node training to distributed training on Databricks.

Simplify Data Conversion from Spark to TensorFlow and PyTorch

Databricks

There is no doubt Kubernetes has emerged as the next generation of cloud native infrastructure to support a wide variety of distributed workloads. Apache Spark has evolved to run both Machine Learning and large scale analytics workloads. There is growing interest in running Apache Spark natively on Kubernetes. By combining the flexibility of Kubernetes and scalable data processing with Apache Spark, you can run any data and machine pipelines on this infrastructure while effectively utilizing resources at disposal. In this talk, Rajesh Thallam and Sougata Biswas will share how to effectively run your Apache Spark applications on Google Kubernetes Engine (GKE) and Google Cloud Dataproc, orchestrate the data and machine learning pipelines with managed Apache Airflow on GKE (Google Cloud Composer). Following topics will be covered: – Understanding key traits of Apache Spark on Kubernetes- Things to know when running Apache Spark on Kubernetes such as autoscaling- Demonstrate running analytics pipelines on Apache Spark orchestrated with Apache Airflow on Kubernetes cluster.

Scaling your Data Pipelines with Apache Spark on Kubernetes

Databricks

Pipelines have become ubiquitous, as the need for stringing multiple functions to compose applications has gained adoption and popularity. Common pipeline abstractions such as “fit” and “transform” are even shared across divergent platforms such as Python Scikit-Learn and Apache Spark. Scaling pipelines at the level of simple functions is desirable for many AI applications, however is not directly supported by Ray’s parallelism primitives. In this talk, Raghu will describe a pipeline abstraction that takes advantage of Ray’s compute model to efficiently scale arbitrarily complex pipeline workflows. He will demonstrate how this abstraction cleanly unifies pipeline workflows across multiple platforms such as Scikit-Learn and Spark, and achieves nearly optimal scale-out parallelism on pipelined computations. Attendees will learn how pipelined workflows can be mapped to Ray’s compute model and how they can both unify and accelerate their pipelines with Ray.

Scaling and Unifying SciKit Learn and Apache Spark Pipelines

Databricks

In this talk about zipline, we will introduce a new type of windowing construct called a sawtooth window. We will describe various properties about sawtooth windows that we utilize to achieve online-offline consistency, while still maintaining high-throughput, low-read latency and tunable write latency for serving machine learning features.We will also talk about a simple deployment strategy for correcting feature drift – due operations that are not “abelian groups”, that operate over change data.

Sawtooth Windows for Feature Aggregations

Databricks

We want to present multiple anti patterns utilizing Redis in unconventional ways to get the maximum out of Apache Spark.All examples presented are tried and tested in production at Scale at Adobe. The most common integration is spark-redis which interfaces with Redis as a Dataframe backing Store or as an upstream for Structured Streaming. We deviate from the common use cases to explore where Redis can plug gaps while scaling out high throughput applications in Spark. Niche 1 : Long Running Spark Batch Job – Dispatch New Jobs by polling a Redis Queue · Why? o Custom queries on top a table; We load the data once and query N times · Why not Structured Streaming · Working Solution using Redis Niche 2 : Distributed Counters · Problems with Spark Accumulators · Utilize Redis Hashes as distributed counters · Precautions for retries and speculative execution · Pipelining to improve performance

Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink

Databricks

In the era of microservices, decentralized ML architectures and complex data pipelines, data quality has become a bigger challenge than ever. When data is involved in complex business processes and decisions, bad data can, and will, affect the bottom line. As a result, ensuring data quality across the entire ML pipeline is both costly, and cumbersome while data monitoring is often fragmented and performed ad hoc. To address these challenges, we built whylogs, an open source standard for data logging. It is a lightweight data profiling library that enables end-to-end data profiling across the entire software stack. The library implements a language and platform agnostic approach to data quality and data monitoring. It can work with different modes of data operations, including streaming, batch and IoT data. In this talk, we will provide an overview of the whylogs architecture, including its lightweight statistical data collection approach and various integrations. We will demonstrate how the whylogs integration with Apache Spark achieves large scale data profiling, and we will show how users can apply this integration into existing data and ML pipelines.

Re-imagine Data Monitoring with whylogs and Spark

Databricks

Machine learning (ML) models are typically part of prediction queries that consist of a data processing part (e.g., for joining, filtering, cleaning, featurization) and an ML part invoking one or more trained models. In this presentation, we identify significant and unexplored opportunities for optimization. To the best of our knowledge, this is the first effort to look at prediction queries holistically, optimizing across both the ML and SQL components. We will present Raven, an end-to-end optimizer for prediction queries. Raven relies on a unified intermediate representation that captures both data processing and ML operators in a single graph structure. This allows us to introduce optimization rules that (i) reduce unnecessary computations by passing information between the data processing and ML operators (ii) leverage operator transformations (e.g., turning a decision tree to a SQL expression or an equivalent neural network) to map operators to the right execution engine, and (iii) integrate compiler techniques to take advantage of the most efficient hardware backend (e.g., CPU, GPU) for each operator. We have implemented Raven as an extension to Spark’s Catalyst optimizer to enable the optimization of SparkSQL prediction queries. Our implementation also allows the optimization of prediction queries in SQL Server. As we will show, Raven is capable of improving prediction query performance on Apache Spark and SQL Server by up to 13.1x and 330x, respectively. For complex models, where GPU acceleration is beneficial, Raven provides up to 8x speedup compared to state-of-the-art systems. As part of the presentation, we will also give a demo showcasing Raven in action.

Raven: End-to-end Optimization of ML Prediction Queries

Databricks

Semantic segmentation is the classification of every pixel in an image/video. The segmentation partitions a digital image into multiple objects to simplify/change the representation of the image into something that is more meaningful and easier to analyze [1][2]. The technique has a wide variety of applications ranging from perception in autonomous driving scenarios to cancer cell segmentation for medical diagnosis. Exponential growth in the datasets that require such segmentation is driven by improvements in the accuracy and quality of the sensors generating the data extending to 3D point cloud data. This growth is further compounded by exponential advances in cloud technologies enabling the storage and compute available for such applications. The need for semantically segmented datasets is a key requirement to improve the accuracy of inference engines that are built upon them. Streamlining the accuracy and efficiency of these systems directly affects the value of the business outcome for organizations that are developing such functionalities as a part of their AI strategy. This presentation details workflows for labeling, preprocessing, modeling, and evaluating performance/accuracy. Scientists and engineers leverage domain-specific features/tools that support the entire workflow from labeling the ground truth, handling data from a wide variety of sources/formats, developing models and finally deploying these models. Users can scale their deployments optimally on GPU-based cloud infrastructure to build accelerated training and inference pipelines while working with big datasets. These environments are optimized for engineers to develop such functionality with ease and then scale against large datasets with Spark-based clusters on the cloud.

Processing Large Datasets for ADAS Applications using Apache Spark

Databricks

At Adobe Experience Platform, we ingest TBs of data every day and manage PBs of data for our customers as part of the Unified Profile Offering. At the heart of this is a bunch of complex ingestion of a mix of normalized and denormalized data with various linkage scenarios power by a central Identity Linking Graph. This helps power various marketing scenarios that are activated in multiple platforms and channels like email, advertisements etc. We will go over how we built a cost effective and scalable data pipeline using Apache Spark and Delta Lake and share our experiences. What are we storing? Multi Source – Multi Channel Problem Data Representation and Nested Schema Evolution Performance Trade Offs with Various formats Go over anti-patterns used (String FTW) Data Manipulation using UDFs Writer Worries and How to Wipe them Away Staging Tables FTW Datalake Replication Lag Tracking Performance Time!

Massive Data Processing in Adobe Using Delta Lake

Databricks

Detecting advanced email attacks at scale is a challenging ML problem, particularly due to the rarity of attacks, adversarial nature of the problem, and scale of data. In order to move quickly and adapt to the newest threat we needed to build a Continuous Integration / Continuous Delivery pipeline for the entire ML detection stack. Our goal is to enable detection engineers and data scientists to make changes to any part of the stack including joined datasets for hydration, feature extraction code, detection logic, and develop/train ML models. In this talk, we discuss why we decided to build this pipeline, how it is used to accelerate development and ensure quality, and dive into the nitty-gritty details of building such a system on top of an Apache Spark + Databricks stack.

Machine Learning CI/CD for Email Attack Detection

Databricks

Sarah: CEO-Finance-Report pipeline seems to be slow today. Why Jeeves: SparkSQL query dbt_fin_model in CEO-Finance-Report is running 53% slower on 2/28/2021. Data skew issue detected. Issue has not been seen in last 90 days. Jeeves: Adding 5 more nodes to cluster recommended for CEO-Finance-Report to finish in its 99th percentile time of 5.2 hours. Who is Jeeves? An experienced Spark developer? A seasoned administrator? No, Jeeves is a chatbot created to simplify data operations management for enterprise Spark clusters. This chatbot is powered by advanced AI algorithms and an intuitive conversational interface that together provide answers to get users in and out of problems quickly. Instead of being stuck to screens displaying logs and metrics, users can now have a more refreshing experience via a two-way conversation with their own personal Spark expert. We presented Jeeves at Spark Summit 2019. In the two years since, Jeeves has grown up a lot. Jeeves can now learn continuously as telemetry information streams in from more and more applications, especially SQL queries. Jeeves now “knows” about data pipelines that have many components. Jeeves can also answer questions about data quality in addition to performance, cost, failures, and SLAs. For example: Tom: I am not seeing any data for today in my Campaign Metrics Dashboard. Jeeves: 3/5 validations failed on the cmp_kpis table on 2/28/2021. Run of pipeline cmp_incremental_daily failed on 2/28/2021. This talk will give an overview of the newer capabilities of the chatbot, and how it now fits in a modern data stack with the emergence of new data roles like analytics engineers and machine learning engineers. You will learn how to build chatbots that tackle your complex data operations challenges.

Jeeves Grows Up: An AI Chatbot for Performance and Quality

Databricks

More from Databricks (20)

Data Lakehouse Symposium | Day 1 | Part 1

Data Lakehouse Symposium | Day 1 | Part 2

Data Lakehouse Symposium | Day 2

5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop

Democratizing Data Quality Through a Centralized Platform

Learn to Use Databricks for Data Science

Why APM Is Not the Same As ML Monitoring

The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix

Stage Level Scheduling Improving Big Data and AI Integration

Simplify Data Conversion from Spark to TensorFlow and PyTorch

Scaling your Data Pipelines with Apache Spark on Kubernetes

Scaling and Unifying SciKit Learn and Apache Spark Pipelines

Sawtooth Windows for Feature Aggregations

Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink

Re-imagine Data Monitoring with whylogs and Spark

Raven: End-to-end Optimization of ML Prediction Queries

Processing Large Datasets for ADAS Applications using Apache Spark

Massive Data Processing in Adobe Using Delta Lake

Machine Learning CI/CD for Email Attack Detection

Jeeves Grows Up: An AI Chatbot for Performance and Quality

Recently uploaded

Capstone Project on IBM Data Analytics Program

MoniSankarHazra

VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K to 25K High Profile Escorts In Pune Booking Now open +91- 8005736733 Why you Choose Us- +91- 8005736733 HOT⇄ 8005736733 Mr ashu ji Call Mr ashu Ji +91- 8005736733 (V030524]N) 𝐇𝐨𝐭𝐞𝐥 𝐑𝐨𝐨𝐦𝐬 𝐈𝐧𝐜𝐥𝐮𝐝𝐢𝐧𝐠 𝐑𝐚𝐭𝐞 𝐒𝐡𝐨𝐭𝐬/𝐇𝐨𝐮𝐫𝐲🆓 .█▬█⓿▀█▀ 𝐈𝐍𝐃𝐄𝐏𝐄𝐍𝐃𝐄𝐍𝐓 𝐆𝐈𝐑𝐋 𝐕𝐈𝐏 𝐄𝐒𝐂𝐎𝐑𝐓 Hello Guys ! High Profiles young Beauties and Good Looking standard Profiles Available , Enquire Now if you are interested in Hifi Service and want to get connect with someone who can understand your needs. Service offers you the most beautiful High Profile sexy independent female Escorts in genuine ✔✔✔ To enjoy with hot and sexy girls ✔✔✔ ★providing:- • Models • vip Models • Russian Models • Foreigner Models • TV Actress and Celebrities • Receptionist • Air Hostess • Call Center Working Girls/Women • Hi-Tech Co. Girls/Women • Housewife

VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...

SUHANI PANDEY

Model Call Girl Services in Delhi reach out to us at 🔝 9953056974 🔝✔️✔️ Our agency presents a selection of young, charming call girls available for bookings at Oyo Hotels. Experience high-class escort services at pocket-friendly rates, with our female escorts exuding both beauty and a delightful personality, ready to meet your desires. Whether it's Housewives, College girls, Russian girls, Muslim girls, or any other preference, we offer a diverse range of options to cater to your tastes. We provide both in-call and out-call services for your convenience. Our in-call location in Delhi ensures cleanliness, hygiene, and 100% safety, while our out-call services offer doorstep delivery for added ease. We value your time and money, hence we kindly request pic collectors, time-passers, and bargain hunters to refrain from contacting us. Our services feature various packages at competitive rates: One shot: ₹2000/in-call, ₹5000/out-call Two shots with one girl: ₹3500/in-call, ₹6000/out-call Body to body massage with sex: ₹3000/in-call Full night for one person: ₹7000/in-call, ₹10000/out-call Full night for more than 1 person: Contact us at 🔝 9953056974 🔝. for details Operating 24/7, we serve various locations in Delhi, including Green Park, Lajpat Nagar, Saket, and Hauz Khas near metro stations. For premium call girl services in Delhi 🔝 9953056974 🔝. Thank you for considering us!

CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE

9953056974 Low Rate Call Girls In Saket, Delhi NCR

Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore Booking Contact Details :- WhatsApp Chat :- +91-7737669865 2-May-2024(SMW) Call Girls In Model Towh Bangalore +91-7737669865 !! Best Woman Seeking Man Call Girls Service, Escorts Service in Home Hotel in Bangalore NCR 24 Hours Available Service Call Girls, Contact Us +91-7737669865 (Any Time. Any Where) Call Girls in Bangalore, Noida, Gurgaon, Ghaziabad,Sexy Indian Female Escorts Service Bangalore NCRWelcome To Bangalore Escorts Service – An All Over New Bangalore Very Sexy Hot Call Girls Agency Service Escorts In South BangaloreNCRBangalore’s No. 1 High Profile Independent Female Escorts Service. We Provide Good Quality Educated Profile At Very Regnebal Price 100% Safe And Original.We Are Provide Escorts Service All OYO Hotels ,3*,4*,5* Star Hotel And Home Flat, Apartment. Guest-House. Services In -Call And Out – Call Both Are Services Available. 24Hrs. Any Time Any Where. In All Over Bangalore Noida Gurgaon Ghaziabad Faridabad.More Information And Contact Profile Real Pic Visit Our Website City Wise Escorts Service Agency.Good Looking Cheap And Best Models Girls U Can Get Best Click On Link……Night Call Girls Now In Hotel Le Meridien Gurgaon Near Female Escort One Shot — 5000/in call (time 1 hour), 6000/out call Two shot with one girl — 8000/in call (time 2 hour), 10000/out call Body to body massage with sex- 8000/in call (time 1 hour) Full night Service for one person– 12000/in call, 13000/out call (shot limit 3-4 shots) Full night Service for more than 1 person — please contact Us —7737669865 We are available 24*7 all days of the year. Call us — 7737669865 Thank you for Visiting.

Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore

amitlee9823

April 2024 - Crypto Market Report's Analysis

manisha194592

(Vivek)Call Us, 8448380779,Call girls in Delhi NCr – We Offer best in class call girls. escort Service At Affordable Price At low Rate with Space Night 8000 We Are One Of The Oldest Escort and Call girls Agencies in Delhi. You Will Find That Our Female Escorts Are Full Of Fun, Sexy And They Would Love Enjoy Your Company. We Have A Fantastic Selection Of Escort Ladies Available For In-Calls As Well As Out-Calls. Our Escorts Are Not Only Beautiful But All Have Great Personalities Making Them The Perfect Companion For Any Occasion. In-Call:- You Can Come At Our Place in Delhi Our place Which Is Very Clean Hygienic 100% safe Accommodation. Out-Call:- You have To Come Pick The Girl From My Place We Are Also Provide Door Step Services (Delhi Ncr, Noida, Gurgaon, Faridabad, Ghaziabad Note:- Pic Collectors Time Passers Bargainers Stay Away As We Respect The Value For Your Money Time And Expect The Same From You Hygienic:- Full Ac room And Clean Rooms Available In Hotel 24 * 7 Hourly In Delhi NCR More Details, With WhatsApp Number, +91-8448380779

BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service

Delhi Call girls

Discover Why Less is More in B2B Research

michael115558

👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amritsar No💰Advance Cash On Delivery Service Escorts Service Available Whatsapp Chaya ☎️ : [+91-6367187148] Escorts Service Amritsar are always ready to make their clients happy. Their exotic looks and sexy personalities are sure to turn heads. You can enjoy with them, including massages and erotic encounters.#P12Our area Escorts are young and sexy, so you can expect to have an exotic time with them. They are trained to satiate your naughty nerves and they can handle anything that you want. They are also intelligent, so they know how to make you feel comfortable and relaxed SERVICE ✅ ❣️ ⭐➡️HOT & SEXY MODELS // COLLEGE GIRLS HOUSE WIFE RUSSIAN , AIR HOSTES ,VIP MODELS . AVAILABLE FOR COMPLETE ENJOYMENT WITH HIGH PROFILE INDIAN MODEL AVAILABLE HOTEL & HOME ★ SAFE AND SECURE HIGH CLASS SERVICE AFFORDABLE RATE ★ SATISFACTION,UNLIMITED ENJOYMENT. ★ All Meetings are confidential and no information is provided to any one at any cost. ★ EXCLUSIVE PROFILes Are Safe and Consensual with Most Limits Respected ★ Service Available In: - HOME & HOTEL Star Hotel Service .In Call & Out call SeRvIcEs : ★ A-Level (star escort) ★ Strip-tease ★ BBBJ (Bareback Blowjob)Receive advanced sexual techniques in different mode make their life more pleasurable. ★ Spending time in hotel rooms ★ BJ (Blowjob Without a Condom) ★ Completion (Oral to completion) ★ Covered (Covered blowjob Without condom ★ANAL SERVICES.

👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...

karishmasinghjnh

Saudi Arabia [ Abortion pills) Jeddah/riaydh/dammam/+966572737505☎️] cytotec tablets uses abortion pills 💊💊 How effective is the abortion pill? 💊💊 +966572737505) "Abortion pills in Jeddah" how to get cytotec tablets in Riyadh " Abortion pills in dammam*💊💊 The abortion pill is very effective. If you’re taking mifepristone and misoprostol, it depends on how far along the pregnancy is, and how many doses of medicine you take:💊💊 +966572737505) how to buy cytotec pills At 8 weeks pregnant or less, it works about 94-98% of the time. +966572737505[ 💊💊💊 At 8-9 weeks pregnant, it works about 94-96% of the time. +966572737505) At 9-10 weeks pregnant, it works about 91-93% of the time. +966572737505)💊💊 If you take an extra dose of misoprostol, it works about 99% of the time. At 10-11 weeks pregnant, it works about 87% of the time. +966572737505) If you take an extra dose of misoprostol, it works about 98% of the time. In general, taking both mifepristone and+966572737505 misoprostol works a bit better than taking misoprostol only. +966572737505 Taking misoprostol alone works to end the+966572737505 pregnancy about 85-95% of the time — depending on how far along the+966572737505 pregnancy is and how you take the medicine. +966572737505 The abortion pill usually works, but if it doesn’t, you can take more medicine or have an in-clinic abortion. +966572737505 When can I take the abortion pill?+966572737505 In general, you can have a medication abortion up to 77 days (11 weeks)+966572737505 after the first day of your last period. If it’s been 78 days or more since the first day of your last+966572737505 period, you can have an in-clinic abortion to end your pregnancy.+966572737505 Why do people choose the abortion pill? Which kind of abortion you choose all depends on your personal+966572737505 preference and situation. With+966572737505 medication+966572737505 abortion, some people like that you don’t need to have a procedure in a doctor’s office. You can have your medication abortion on your own+966572737505 schedule, at home or in another comfortable place that you choose.+966572737505 You get to decide who you want to be with during your abortion, or you can go it alone. Because+966572737505 medication abortion is similar to a miscarriage, many people feel like it’s more “natural” and less invasive. And some+966572737505 people may not have an in-clinic abortion provider close by, so abortion pills are more available to+966572737505 them. +966572737505 Your doctor, nurse, or health center staff can help you decide which kind of abortion is best for you. +966572737505 More questions from patients: Saudi Arabia+966572737505 CYTOTEC Misoprostol Tablets. Misoprostol is a medication that can prevent stomach ulcers if you also take NSAID medications. It reduces the amount of acid in your stomach, which protects your stomach lining. The brand name of this medication is Cytotec®.+966573737505) Unwanted Kit is a combination of two medici

Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec

Abortion pills in Riyadh +966572737505 get cytotec

Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage with sex Booking Contact Details :- WhatsApp Chat :- +91-9920725232 4-May-2024(SMW) Call Girls In Model Towh +91-9920725232 !! Best Woman Seeking Man Call Girls Service, Escorts Service in Home Hotel in NCR 24 Hours Available Service Call Girls, Contact Us +91-9920725232 (Any Time. Any Where) Call Girls in , Noida, Gurgaon, Ghaziabad,Sexy Indian Female Escorts Service NCRWelcome To Escorts Service – An All Over New Very Sexy Hot Call Girls Agency Service Escorts In South NCR’s No. 1 High Profile Independent Female Escorts Service. We Provide Good Quality Educated Profile At Very Regnebal Price 100% Safe And Original.We Are Provide Escorts Service All OYO Hotels ,3*,4*,5* Star Hotel And Home Flat, Apartment. Guest-House. Services In -Call And Out – Call Both Are Services Available. 24Hrs. Any Time Any Where. In All Over Noida Gurgaon Ghaziabad Faridabad.More Information And Contact Profile Real Pic Visit Our Website City Wise Escorts Service Agency.Good Looking Cheap And Best Models Girls U Can Get Best Click On Link……Night Call Girls Now In Hotel Le Meridien Gurgaon Near Female Escort One Shot — 5000/in call (time 1 hour), 6000/out call Two shot with one girl — 8000/in call (time 2 hour), 10000/out call Body to body massage with sex- 8000/in call (time 1 hour) Full night Service for one person– 12000/in call, 13000/out call (shot limit 3-4 shots) Full night Service for more than 1 person — please contact Us —9920725232 We are available 24*7 all days of the year. Call us — 9920725232 Thank you for Visiting.

Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...

amitlee9823

Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand Booking Contact Details :- WhatsApp Chat :- +91-7737669865 Call Girls In Model Towh +91-7737669865 !! Best Woman Seeking Man Call Girls Service, Escorts Service in Home Hotel in NCR 24 Hours Available Service Call Girls, Contact Us +91-7737669865 (Any Time. Any Where) Call Girls in , Noida, Gurgaon, Ghaziabad,Sexy Indian Female Escorts Service NCRWelcome To Escorts Service – An All Over New Very Sexy Hot Call Girls Agency Service Escorts In South NCR’s No. 1 High Profile Independent Female Escorts Service. We Provide Good Quality Educated Profile At #K09 Very Regnebal Price 100% Safe And Original.We Are Provide Escorts Service All OYO Hotels ,3*,4*,5* Star Hotel And Home Flat, Apartment. Guest-House. Services In -Call And Out – Call Both Are Services Available. 24Hrs. Any Time Any Where. In All Over Noida Gurgaon Ghaziabad Faridabad.More Information And Contact Profile Real Pic Visit Our Website City Wise Escorts Service Agency.Good Looking Cheap And Best Models Girls U Can Get Best Click On Link……Night Call Girls Now In Hotel Le Meridien Gurgaon Near Female Escort One Shot — 5000/in call (time 1 hour), 6000/out call Two shot with one girl — 8000/in call (time 2 hour), 10000/out call Body to body massage with sex- 8000/in call (time 1 hour) Full night Service for one person– 12000/in call, 13000/out call (shot limit 3-4 shots) Full night Service for more than 1 person — please contact Us —7737669865 We are available 24*7 all days of the year. Call us — 7737669865 Thank you for Visiting.

Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand

amitlee9823

Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -

Pooja Nehwal

Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore Booking Contact Details :- WhatsApp Chat :- +91-7737669865 2-May-2024(SMW) Call Girls In Model Towh Bangalore +91-7737669865 !! Best Woman Seeking Man Call Girls Service, Escorts Service in Home Hotel in Bangalore NCR 24 Hours Available Service Call Girls, Contact Us +91-7737669865 (Any Time. Any Where) Call Girls in Bangalore, Noida, Gurgaon, Ghaziabad,Sexy Indian Female Escorts Service Bangalore NCRWelcome To Bangalore Escorts Service – An All Over New Bangalore Very Sexy Hot Call Girls Agency Service Escorts In South BangaloreNCRBangalore’s No. 1 High Profile Independent Female Escorts Service. We Provide Good Quality Educated Profile At Very Regnebal Price 100% Safe And Original.We Are Provide Escorts Service All OYO Hotels ,3*,4*,5* Star Hotel And Home Flat, Apartment. Guest-House. Services In -Call And Out – Call Both Are Services Available. 24Hrs. Any Time Any Where. In All Over Bangalore Noida Gurgaon Ghaziabad Faridabad.More Information And Contact Profile Real Pic Visit Our Website City Wise Escorts Service Agency.Good Looking Cheap And Best Models Girls U Can Get Best Click On Link……Night Call Girls Now In Hotel Le Meridien Gurgaon Near Female Escort One Shot — 5000/in call (time 1 hour), 6000/out call Two shot with one girl — 8000/in call (time 2 hour), 10000/out call Body to body massage with sex- 8000/in call (time 1 hour) Full night Service for one person– 12000/in call, 13000/out call (shot limit 3-4 shots) Full night Service for more than 1 person — please contact Us —7737669865 We are available 24*7 all days of the year. Call us — 7737669865 Thank you for Visiting.

Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...

amitlee9823

Probability Grade 10 Third Quarter Lessons

JoseMangaJr1

Klinik_ Apotek Onlin 085657271886 Solusi Menggugurkan Masalah Kehamilan Anda Jual Obat Aborsi Asli KLINIK ABORSI TERPEECAYA _ Jual Obat Aborsi Cytotec Misoprostol Asli 100% Ampuh Hanya 3 Jam Langsung Gugur || OBAT PENGGUGUR KANDUNGAN AMPUH MANJUR OBAT ABORSI OLINE" APOTIK Jual Obat Cytotec, Gastrul, Gynecoside Asli Ampuh. JUAL ” Obat Aborsi Tuntas | Obat Aborsi Manjur | Obat Aborsi Ampuh | Obat Penggugur Janin | Obat Pencegah Kehamilan | Obat Pelancar Haid | Obat terlambat Bulan | Ciri Obat Aborsi Asli | Obat Telat Bulan | Pil Aborsi Asli | Cara Menggugurkan Konten | Cara Aborsi Tuntas | Harga Obat Aborsi Asli | Pil Aborsi | Jual Obat Aborsi Cytotec | Cara Aborsi Sendiri | Cara Aborsi Usia 1 Bulan | Cara Aborsi Usia 2 Tahun | Cara Aborsi Usia 3 Bulan | Obat Aborsi Usia 4 Bulan | Cara Abrasi Usia 5 Bulan | Cara Menggugurkan Konten | Kandungan Obat Penggugur | Cara Menghitung Usia Konten | Cara Mengatasi Terlambat Bulan | Penjual Obat Aborsi Asli | Obat Aborsi Garansi | Kandungan Obat Peluntur | Obat Telat Datang Bulan | Obat Telat Haid | Obat Aborsi Paling Murah | Klinik Jual Obat Aborsi | Jual Pil Cytotec | Apotik Jual Obat Aborsi | Kandungan Dokter Abrasi | Cara Aborsi Cepat | Jual Obat Aborsi Bergaransi | Jual Obat Cytotec Asli | Obat Aborsi Aman Manjur | Obat Misoprostol Cytotec Asli. "APA ITU ABORSI" “Aborsi Adalah dengan membendung hormon yang di perlukan untuk mempertahankan kehamilan yaitu hormon progesteron, karena hormon ini dibendung, maka jalur kehamilan mulai membuka dan leher rahim menjadi melunak,sehingga mengeluarkan darah yang merupakan tanda bahwa obat telah bekerja || maksimal 1 jam obat diminum || PENJELASAN OBAT ABORSI USIA 1 _7 BULAN Pada usia kandungan ini, pasien akan merasakan sakit yang sedikit tidak berlebihan || sekitar 1 jam ||. namun hanya akan terjadi pada saatdarah keluar merupakan pertanda menstruasi. Hal ini dikarenakan pada usiakandungan 3 bulan,janin sudah terbentuk sebesar kepalan tangan orang dewasa. Cara kerja obat aborsi : JUAL OBAT ABORSI AMPUH dosis 3 bulan secara umum sama dengan cara kerja || DOSIS OBAT ABORSI 2 bulan”, hanya berbedanya selain mengisolasijanin juga menghancurkan janin dengan formula methotrexate dikandungdidalamnya. Formula methotrexate ini sangat ampuh untuk menghancurkan janinmenjadi serpihan-serpihan kecil akan sangat berguna pada saat dikeluarkan nanti. APA ALASAN WANITA MELAKUKAN ABORSI? Aborsi di lakukan wanita hamil baik yang sudah menikah maupun belum menikah dengan berbagai alasan , akan tetapi alasan yang utama adalah alasan-alasan non medis (termasuk aborsi sendiri / di sengaja/ buatan] MELAYANI PEMESANAN OBAT ABORSI SETIAP HARI, SIAP KIRIM KESELURUH KOTA BESAR DI INDONESIA DAN LUAR NEGERI. HUBUNGI PEMESANAN LEBIH NYAMAN VIA WA/: 085657271886

Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...

ZurliaSoop

Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore Escorts Service Booking Contact Details :- WhatsApp Chat :- +91-7737669865 4-May-2024(SMW) Call Girls In Model Towh +91-7737669865 !! Best Woman Seeking Man Call Girls Service, Escorts Service in Home Hotel in NCR 24 Hours Available Service Call Girls, Contact Us +91-7737669865 (Any Time. Any Where) Call Girls in , Noida, Gurgaon, Ghaziabad,Sexy Indian Female Escorts Service NCRWelcome To Escorts Service – An All Over New Very Sexy Hot Call Girls Agency Service Escorts In South NCR’s No. 1 High Profile Independent Female Escorts Service. We Provide Good Quality Educated Profile At Very Regnebal Price 100% Safe And Original.We Are Provide Escorts Service All OYO Hotels ,3*,4*,5* Star Hotel And Home Flat, Apartment. Guest-House. Services In -Call And Out – Call Both Are Services Available. 24Hrs. Any Time Any Where. In All Over Noida Gurgaon Ghaziabad Faridabad.More Information And Contact Profile Real Pic Visit Our Website City Wise Escorts Service Agency.Good Looking Cheap And Best Models Girls U Can Get Best Click On Link……Night Call Girls Now In Hotel Le Meridien Gurgaon Near Female Escort One Shot — 5000/in call (time 1 hour), 6000/out call Two shot with one girl — 8000/in call (time 2 hour), 10000/out call Body to body massage with sex- 8000/in call (time 1 hour) Full night Service for one person– 12000/in call, 13000/out call (shot limit 3-4 shots) Full night Service for more than 1 person — please contact Us —7737669865 We are available 24*7 all days of the year. Call us — 7737669865 Thank you for Visiting.

Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...

amitlee9823

(NEHA) Call Girls Katra Call Now: 8617697112 Katra Escorts Booking Contact Details WhatsApp Chat: +91-8617697112 Katra Escort Service includes providing maximum physical satisfaction to their clients as well as engaging conversation that keeps your time enjoyable and entertaining. Plus, they look fabulously elegant, making an impression. Independent Escorts Katra understands the value of confidentiality and discretion; they will go the extra mile to meet your needs. Simply contact them via text messaging or through their online profiles; they'd be more than delighted to accommodate any request or arrange a romantic date or fun-filled night together. We provide:

(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7

Call Girls in Nagpur High Profile Call Girls

➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Service Booking Contact Details :- WhatsApp Chat :- +91-7737669865 Call Girls In Model Towh +91-7737669865 !! Best Woman Seeking Man Call Girls Service, Escorts Service in Home Hotel in NCR 24 Hours Available Service Call Girls, Contact Us +91-7737669865 (Any Time. Any Where) Call Girls in , India,Sexy Indian Female Escorts Service NCRWelcome To Escorts Service – An All Over New Very Sexy Hot Call Girls Agency Service Escorts In South NCR’s No. 1 High Profile Independent Female Escorts Service. We Provide Good Quality Educated Profile At #K09 Very Regnebal Price 100% Safe And Original.We Are Provide Escorts Service All OYO Hotels ,3*,4*,5* Star Hotel And Home Flat, Apartment. Guest-House. Services In -Call And Out – Call Both Are Services Available. 24Hrs. Any Time Any Where. In All Over Noida Gurgaon Ghaziabad Faridabad.More Information And Contact Profile Real Pic Visit Our Website City Wise Escorts Service Agency.Good Looking Cheap And Best Models Girls U Can Get Best Click On Link……Night Call Girls Now In Hotel Le Meridien Gurgaon Near Female Escort One Shot — 5000/in call (time 1 hour), 6000/out call Two shot with one girl — 8000/in call (time 2 hour), 10000/out call Body to body massage with sex- 8000/in call (time 1 hour) Full night Service for one person– 12000/in call, 13000/out call (shot limit 3-4 shots) Full night Service for more than 1 person — please contact Us —7737669865 We are available 24*7 all days of the year. Call us — 7737669865 Thank you for Visiting.

➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...

amitlee9823

Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore Booking Contact Details :- WhatsApp Chat :- +91-7737669865 2-May-2024(SMW) Call Girls In Model Towh Bangalore +91-7737669865 !! Best Woman Seeking Man Call Girls Service, Escorts Service in Home Hotel in Bangalore NCR 24 Hours Available Service Call Girls, Contact Us +91-7737669865 (Any Time. Any Where) Call Girls in Bangalore, Noida, Gurgaon, Ghaziabad,Sexy Indian Female Escorts Service Bangalore NCRWelcome To Bangalore Escorts Service – An All Over New Bangalore Very Sexy Hot Call Girls Agency Service Escorts In South BangaloreNCRBangalore’s No. 1 High Profile Independent Female Escorts Service. We Provide Good Quality Educated Profile At Very Regnebal Price 100% Safe And Original.We Are Provide Escorts Service All OYO Hotels ,3*,4*,5* Star Hotel And Home Flat, Apartment. Guest-House. Services In -Call And Out – Call Both Are Services Available. 24Hrs. Any Time Any Where. In All Over Bangalore Noida Gurgaon Ghaziabad Faridabad.More Information And Contact Profile Real Pic Visit Our Website City Wise Escorts Service Agency.Good Looking Cheap And Best Models Girls U Can Get Best Click On Link……Night Call Girls Now In Hotel Le Meridien Gurgaon Near Female Escort One Shot — 5000/in call (time 1 hour), 6000/out call Two shot with one girl — 8000/in call (time 2 hour), 10000/out call Body to body massage with sex- 8000/in call (time 1 hour) Full night Service for one person– 12000/in call, 13000/out call (shot limit 3-4 shots) Full night Service for more than 1 person — please contact Us —7737669865 We are available 24*7 all days of the year. Call us — 7737669865 Thank you for Visiting.

Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...

amitlee9823

Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage with sex Booking Contact Details :- WhatsApp Chat :- +91-9920725232 4-May-2024(SMW) Call Girls In Model Towh +91-9920725232 !! Best Woman Seeking Man Call Girls Service, Escorts Service in Home Hotel in NCR 24 Hours Available Service Call Girls, Contact Us +91-9920725232 (Any Time. Any Where) Call Girls in , Noida, Gurgaon, Ghaziabad,Sexy Indian Female Escorts Service NCRWelcome To Escorts Service – An All Over New Very Sexy Hot Call Girls Agency Service Escorts In South NCR’s No. 1 High Profile Independent Female Escorts Service. We Provide Good Quality Educated Profile At Very Regnebal Price 100% Safe And Original.We Are Provide Escorts Service All OYO Hotels ,3*,4*,5* Star Hotel And Home Flat, Apartment. Guest-House. Services In -Call And Out – Call Both Are Services Available. 24Hrs. Any Time Any Where. In All Over Noida Gurgaon Ghaziabad Faridabad.More Information And Contact Profile Real Pic Visit Our Website City Wise Escorts Service Agency.Good Looking Cheap And Best Models Girls U Can Get Best Click On Link……Night Call Girls Now In Hotel Le Meridien Gurgaon Near Female Escort One Shot — 5000/in call (time 1 hour), 6000/out call Two shot with one girl — 8000/in call (time 2 hour), 10000/out call Body to body massage with sex- 8000/in call (time 1 hour) Full night Service for one person– 12000/in call, 13000/out call (shot limit 3-4 shots) Full night Service for more than 1 person — please contact Us —9920725232 We are available 24*7 all days of the year. Call us — 9920725232 Thank you for Visiting.

Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...

amitlee9823

Recently uploaded (20)

Capstone Project on IBM Data Analytics Program

VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...

CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE

Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore

April 2024 - Crypto Market Report's Analysis

BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service

Discover Why Less is More in B2B Research

👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...

Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec

Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...

Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand

Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -

Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...

Probability Grade 10 Third Quarter Lessons

Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...

Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...

(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7

➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...

Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...

Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...

Achieving Lakehouse Models with Spark 3.0

1. Achieving Lakehouse Models with Spark 3.0 Simon Whiteley Director of Engineering, Advancing Analytics

2. Agenda Why Lakehouse? Kimball Problems Delta & Spark 3.0 ▪ SCD & SQL Merge ▪ Dynamic Partition Pruning ▪ Adaptive Query Execution Enabling the Lakehouse

3. The Data Lakehouse

4. Analytics Evolution PARQUET Delta Lake

6. The Modern Warehouse RAW BASE PARQUET

7. The Lakehouse RAW BASE ENRICHED DELTA DELTA

8. Lakehouse Barriers

9. When you think Warehouse… We automatically think of Star Schemas and Kimball warehousing approaches. A large central fact table with smaller reference dimensions… some of which aren’t so small

10. Literally Everyone (All The time) “You can’t use Kimball in a Data Lake”

11. Three Historical Challenges ▪ Slowly Changing Dimensions ▪ Filtering Dimensions ▪ General SQL Performance

12. Slowly Changing Dimensions

13. SCD - Enabling the Familiar PrimaryKey Address Current EffectiveDate EndDate 11 A new customer address TRUE 03/08/2020 null 58 Yet another address TRUE 03/08/2020 null 41 A different address TRUE 03/08/2020 null PrimaryKey Address Current EffectiveDate EndDate 11 A new customer address FALSE 03/08/2020 22/10/2020 11 An updated address TRUE 22/10/2020 null 58 Yet another address TRUE 03/08/2020 null 41 A different address TRUE 03/08/2020 null

14. SCD - Merge Commands MERGE INTO dataai.addresses as original USING updates ON original.primaryKey = updates.primaryKey WHEN MATCHED THEN UPDATE SET * WHEN NOT MATCHED THEN INSERT * Available in SQL, Scala and Python APIs - the merge command has made many complex warehousing jobs accessible to the wider Analytics community This is enabled by the Delta file format

15. Dynamic Partition Pruning

16. Spark Partitioning SELECT * FROM Sales WHERE Month = 3 SQL Query Action Filtering performed by selectively reading files SALES Month=1 Month=2 Month=3 Month=4

17. Cross-Filter Spark 2.4 SELECT * FROM Sales JOIN Date WHERE DateMonth = 3 SQL Query Action SALES Month=1 Month=2 Month=3 Month=4 DimDATE Partition Keys not hit when filtering on joined tables

18. Cross-Filter Spark 3.0 SELECT * FROM Sales JOIN Date WHERE DateMonth = 3 SQL Query Action SALES Month=1 Month=2 Month=3 Month=4 DimDATE Dynamic Partition Pruning determines partition filters during runtime

19. Adaptive Query Execution

20. AQE in Spark 3.0 AQE will speed up common queries in a number of ways: ▪ Coalescing Shuffle Partitions ▪ Switching Join Strategies ▪ Optimizing Skew Joins

21. Before AQE - Shuffle Coalescing Read RDDs Read RDDs Shuffle RDDs Shuffle RDDs Shuffle RDDs Shuffle RDDs Shuffle RDDs Shuffle RDDs Shuffle RDDs Shuffle RDDs Shuffle RDDs Shuffle RDDs Shuffle RDDs Write RDDs Write RDDs 2 Tasks 200 Tasks 2 Tasks

22. Using Spark 3.0 AQE - Shuffle Coalescing Read RDDs Read RDDs Write RDDs Write RDDs 2 Tasks 2 Tasks 2 Tasks Shuffle RDDs Shuffle RDDs

23. DEMO: Let’s see it in action

24. The Data Lakehouse Delta & Spark 3.0 enable the Lakehouse through: ▪ Enabling familiar (SQL) patterns ▪ Removing technical barriers ▪ Targeting performance of common warehousing activities

25. Feedback Your feedback is important to us. Don’t forget to rate and review the sessions.

Achieving Lakehouse Models with Spark 3.0

Recommended

Recommended

More Related Content

What's hot

What's hot (20)

Similar to Achieving Lakehouse Models with Spark 3.0

Similar to Achieving Lakehouse Models with Spark 3.0 (20)

More from Databricks

More from Databricks (20)

Recently uploaded

Recently uploaded (20)

Achieving Lakehouse Models with Spark 3.0