Data Lakes on Public Cloud: Breaking Data Management Monoliths

Sharon Dashet (Sr. Data Analytics Solution Lead) @ Google Cloud: The worlds of traditional RDBMS and Hadoop-based Data Lake systems are converging and moving to public cloud and SaaS offerings. In this session, Sharon will share her personal journey as a data professional since the 90s, woven into the history of data management systems. The session will also cover the differences between on-premises and cloud Data Lakes.

Data Management timeline
● 80s: Relational OLTP
● ~1995: Traditional EDW players
● ~2005: Big data vendors
● ~2010: Cloud platform vendors
● ~2012: Specialized cloud vendors

Roles shown along the timeline: Database Developer, Backend Developer, Application DBA, Production DBA, Data Scientist, Data Analysts, BI/OLAP Expert, SQL Expert, Governance, MDM, Big Data Developer, Big Data Architect, ML Engineer, Hadoop Admin, Hadoop Expert, AI Scientist, CDO, Cloud Data Engineer, Cloud Data Architect.

The Hadoop ecosystem is very popular for Big Data workloads:
● HDFS (Hadoop Distributed File System)
● YARN (cluster resource management)
● MapReduce (cluster data processing)
● Hive (SQL DW)
● HCatalog (metadata)
● HBase (NoSQL datastore)
● Pig (scripting)
● Flink (streams)
● Presto (distributed SQL query)
● Mahout & Spark ML (machine learning)
● Flume (log aggregation and transport)
● Sqoop (import and export of relational data)
● Oozie (workflow automation)
● ZooKeeper (coordination)
● Ambari (management and monitoring)

Typical on-premises deployment: a multi-user, shared Hadoop cluster
● Data (HDFS) and temp data (HDFS)
● Metadata (Hive metastore, RDBMS)
● AuthZ policies, audit, governance (Ranger, Atlas)
● AuthN (Kerberos, LDAP)
● Compute: YARN, running Hive, Spark, MR, and R
● Surrounding systems: Kafka, Storm, Flume, Cassandra, HBase, ELK, etc.

On-prem Data Lakes are struggling to deliver value
● TCO challenges: resource utilization and the overall TCO of on-prem data lakes become unmanageable.
● Governance challenges: data governance and security issues raise compliance concerns.
● Scaling challenges: resource-intensive data and analytics processing can lead to missed SLAs.
● Agility challenges: analytics experimentation is slow due to resource provisioning time.

Key market players are struggling to convert customers.

The need is still there:
● AI is now capable of extracting value from unstructured data: "80 percent of worldwide data will be unstructured by 2025."
● Cloud is faster, simpler to operate, and less expensive.

Data Lakes are shifting to the cloud:
"By connecting data points, we can offer advice like hygiene laws for certain foods, or information on provenance. We can even integrate their local weather forecast so a store doesn't run out of ice cream on a sunny day."
Sven Lipowski, Unit Owner Customer Solutions, adMETERONOMIDC

"The ability to spin up purpose-driven Hadoop clusters against our shared datasets and scale them up/down with demand is a game changer for us…"
Brett Uyeshiro, VP Platform Services, Pandora

02. Patterns for Data Lakes in Public Cloud

Beyond HDFS: Storage and Compute Separation
Keep your storage on Google Cloud Storage (GCS) instead of HDFS. Benefits:
● Separation of compute and storage
● Fully HDFS-compatible GCS connector
● Facilitates job-scoped, cost-effective workloads (+ephemeral clusters)
● No need to provision 3x storage for replication
● No unused bytes on disks
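
To make the "3x storage" and "unused bytes" points concrete, here is a minimal sketch of the arithmetic. It assumes the HDFS default replication factor of 3 (`dfs.replication`) and a hypothetical 25% free-space headroom kept on HDFS data disks; the function names are illustrative only and not part of any library.

```python
# Sketch: raw disk to provision for HDFS vs. bytes billed on object storage.
# Assumptions: HDFS replication factor 3 (the default) and a hypothetical
# 25% free-space headroom on HDFS data disks.

def hdfs_raw_capacity_tb(logical_tb: float, replication: int = 3,
                         headroom: float = 0.25) -> float:
    """Raw disk to provision on an HDFS cluster for `logical_tb` of data.

    Every block is stored `replication` times, and a fraction `headroom`
    of the disks is kept free, so provisioned capacity exceeds the data size.
    """
    if not 0.0 <= headroom < 1.0:
        raise ValueError("headroom must be in [0, 1)")
    return logical_tb * replication / (1.0 - headroom)


def object_store_billed_tb(logical_tb: float) -> float:
    """Object storage such as GCS bills for the bytes actually stored;
    redundancy is handled by the service, not by extra cluster disks."""
    return logical_tb


if __name__ == "__main__":
    data_tb = 100.0
    print(f"HDFS raw disk to provision: {hdfs_raw_capacity_tb(data_tb):.0f} TB")
    print(f"Object storage billed:      {object_store_billed_tb(data_tb):.0f} TB")
```

Under these assumptions, 100 TB of data needs 400 TB of provisioned HDFS disk, while object storage bills roughly the 100 TB actually stored. Real numbers depend on the storage class, the replication settings, and how much headroom an operations team keeps.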

[Diagram: workloads (Hive analytics, business reporting, MapReduce ETL, machine learning) on Cloud Dataproc clusters, with a Hive Metastore and Cloud Storage as the storage layer]

Job-Scoped Clusters: Beyond Complicated YARN Queues
● Step away from complicated YARN queues and multi-tenancy
● Control cost and performance per workload:
  ○ Ephemeral clusters
  ○ Mix regular and preemptible VMs in the worker pool
  ○ Different VM types
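
The cost levers above can be illustrated with a small, hypothetical cost model. It compares an ephemeral, job-scoped cluster (billed only for spin-up plus job runtime) with a long-running shared cluster that uses the same worker pool. All prices are placeholders rather than real GCP pricing. The model also ignores Dataproc-specific fees, storage, networking, and the risk that preemptible VMs are reclaimed mid-job.

```python
from dataclasses import dataclass


@dataclass
class WorkerPool:
    """Worker pool of a cluster; all prices are hypothetical placeholders."""
    regular_workers: int
    preemptible_workers: int
    regular_usd_per_hour: float
    preemptible_usd_per_hour: float

    def hourly_cost(self) -> float:
        # Mixing cheaper preemptible VMs into the pool lowers the hourly rate.
        return (self.regular_workers * self.regular_usd_per_hour
                + self.preemptible_workers * self.preemptible_usd_per_hour)


def ephemeral_cluster_cost(pool: WorkerPool, job_hours: float,
                           spin_up_hours: float = 0.1) -> float:
    """Job-scoped cluster: exists only for spin-up plus the job, then is deleted."""
    return pool.hourly_cost() * (spin_up_hours + job_hours)


def long_running_cluster_cost(pool: WorkerPool, hours: float = 24.0) -> float:
    """Shared cluster: billed for every hour it stays up, busy or idle."""
    return pool.hourly_cost() * hours


if __name__ == "__main__":
    pool = WorkerPool(regular_workers=2, preemptible_workers=8,
                      regular_usd_per_hour=0.50, preemptible_usd_per_hour=0.10)
    print(f"Ephemeral, 2-hour job: ${ephemeral_cluster_cost(pool, job_hours=2.0):.2f}")
    print(f"Always-on, 24 hours:   ${long_running_cluster_cost(pool):.2f}")
```

Changing the VM mix or per-VM prices per workload is how the "different VM types" lever would show up in this model.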

Beyond YARN and into a Modern Service Mesh

Converged Smart Analytics
[Diagram labels: AI Platform, AI Platform Notebooks]
1. Data sources
2. Data Lake storage
3. Data pipelines
4. Data Warehouse/Lake
5. ML and analytics workloads

What's hot

platform for Machine Learning

SivapriyaS12

ML-Ops: From Proof-of-Concept to Production Application

Hunter Carlisle

What’s New with Databricks Machine Learning

Databricks

Clinical genomic analytics pipelines using Databricks and the Delta Lake for the benefit of loading individual reads from raw sequencing or base-call files have significant advantages over more traditional methods. Analysis pipelines that perform genomic mapping to purpose-built reference data artifacts persisted to tables allows for enhanced performance that is magnitudes greater than previous mapping methods. These scalable, reproducible, and potentially open sourced methods have the ability to transform bioinformatics and R&D data management / governance.

Managing R&D Data on Parallel Compute Infrastructure

Databricks

In the last few years, deep learning has achieved significant success in a wide range of domains, including computer vision, artificial intelligence, speech, NLP, and reinforcement learning. However, deep learning in recommender systems has, until recently, received relatively little attention. This talks explores recent advances in this area in both research and practice. I will explain how deep learning can be applied to recommendation settings, architectures for handling contextual data, side information, and time-based models, and compare deep learning approaches to other cutting-edge contextual recommendation models, and finally explore scalability issues and model serving challenges.

Deep Learning for Recommender Systems with Nick pentreath

Databricks

Purpose of this presentation is to highlight how end to end machine learning looks like in real world enterprise. This is to provide insight to aspiring data scientist who have been through courses or education in ML that mostly focus on ML algorithms and not end to end pipeline. Architecture and components mentioned in Slide 11 will be discussed in detailed in series of post on LinkedIn over the course of next few month To get updates on this follow me on LinkedIn or search/follow hashtag #end2endDS. Post will be active in August 2019 and will be posted till September 2019

Real World End to End machine Learning Pipeline

Srivatsan Srinivasan

Accelerating Innovation with Unified Analytics with Ali Ghodsi

Databricks

Sudhir Rawat, Sr Techonology Evangelist at Microsoft SQL Business Intelligenc...

Dataconomy Media

AI meets Big Data

Jan Wiegelmann

This presentation covers how to build and drive insights from data by building machine learning models. The session covers how to develop and train models in Python/R using Azure Machine Learning. The session covers how to explore key concepts in data acquisition, preparation, exploration, and visualization, and take a look at how to build a predictive solution using Azure Machine Learning, R, and Python. The session covers tips and tricks on selecting the right algorithm for your data science problem and how to utilize Machine Learning to solve it.

Building predictive models in Azure Machine Learning

Mostafa

Summary introduction to data engineering

Novita Sari

Big Data has emerged as a powerful new technology paradigm. To manage the massive data generated by social media, online transactions, Weblogs, or sensors, Big Data incorporates innovative technologies in data management (unstructured, semi-structured and structured), processing, real-time analytics, and visualization. It is also useful for reporting in circumstances where a relational database approach is not effective or too costly. This Big Data project is to be primarily exposed to utilize the tools. Tool usage, programming, algorithms and application development are covered in relevant courses.

Big Data- Automotive Industry Use Case

Sophie (C.F.) Tsai

Data Mesh in Practice: How Europe’s Leading Online Platform for Fashion Goes ...

Databricks

In this webinar, we talk about Hadoop, big data and SnapReduce 2.0 with SnapLogic Chief Scientist Greg Benson, Professor of Computer Science at the University of San Francisco. This webinar features a dive into SnapReduce, and a discussion about how SnapLogic delivers big data acquisition, better big data preparation and universal big data delivery. To learn more, visit: http://www.snaplogic.com/snapreduce

Hadoop for Humans: Introducing SnapReduce 2.0

SnapLogic

Data & AI Platform Concepts

Ankit Rathi

Democratization - New Wave of Data Science (홍운표 상무, DataRobot) :: AWS Techfor...

Amazon Web Services Korea

Snaplogic Live: Big Data in Motion

SnapLogic

How Cloud is Affecting Data Scientists

CCG

Platform for Data Scientists

datamantra

From Machine Learning to Learning Machines: Creating an End-to-End Cognitive ...

Spark Summit

What's hot (20)

platform for Machine Learning

ML-Ops: From Proof-of-Concept to Production Application

What’s New with Databricks Machine Learning

Managing R&D Data on Parallel Compute Infrastructure

Deep Learning for Recommender Systems with Nick pentreath

Real World End to End machine Learning Pipeline

Accelerating Innovation with Unified Analytics with Ali Ghodsi

Sudhir Rawat, Sr Techonology Evangelist at Microsoft SQL Business Intelligenc...

AI meets Big Data

Building predictive models in Azure Machine Learning

Summary introduction to data engineering

Big Data- Automotive Industry Use Case

Data Mesh in Practice: How Europe’s Leading Online Platform for Fashion Goes ...

Hadoop for Humans: Introducing SnapReduce 2.0

Data & AI Platform Concepts

Democratization - New Wave of Data Science (홍운표 상무, DataRobot) :: AWS Techfor...

Snaplogic Live: Big Data in Motion

How Cloud is Affecting Data Scientists

Platform for Data Scientists

From Machine Learning to Learning Machines: Creating an End-to-End Cognitive ...

Similar to Data Lakes on Public Cloud: Breaking Data Management Monoliths

Embarking on building a modern data warehouse in the cloud can be an overwhelming experience due to the sheer number of products that can be used, especially when the use cases for many products overlap others. In this talk I will cover the use cases of many of the Microsoft products that you can use when building a modern data warehouse, broken down into four areas: ingest, store, prep, and model & serve. It’s a complicated story that I will try to simplify, giving blunt opinions of when to use what products and the pros/cons of each.

Building a modern data warehouse

James Serra

What is hadoop

Asis Mohanty

Big Data Analytics with Hadoop, MongoDB and SQL Server

Mark Kromer

In today’s world of exponentially growing big data, enterprises are becoming increasingly more aware of the business utility and necessity of harnessing, storing and analyzing this information. Apache Hadoop has rapidly evolved to become a leading platform for managing and processing big data, with the vital management, monitoring, metadata and integration services required by organizations to glean maximum business value and intelligence from their burgeoning amounts of information on customers, web trends, products and competitive markets. In this session, Hortonworks' Himanshu Bari will discuss the opportunities for deriving business value from big data by looking at how organizations utilize Hadoop to store, transform and refine large volumes of this multi-structured information. Connolly will also discuss the evolution of Apache Hadoop and where it is headed, the component requirements of a Hadoop-powered platform, as well as solution architectures that allow for Hadoop integration with existing data discovery and data warehouse platforms. In addition, he will look at real-world use cases where Hadoop has helped to produce more business value, augment productivity or identify new and potentially lucrative opportunities.

Apache Hadoop and its role in Big Data architecture - Himanshu Bari

jaxconf

Logical Data Warehouse: How to Build a Virtualized Data Services Layer

DataWorks Summit

Big Data tools are becoming a critical part of enterprise architectures and as such securing the data, at rest, and in motion is a necessity. More so, when you’re implementing these solutions in the cloud and the data doesn't reside within the confines of your trusted data center. Also, there is a fine balance between implementing enterprise-grade security and negotiating utmost performance given the overheads of encryption and/or identity management. This session is designed to tackle these challenges head on and explain the various options available in the cloud. The focal points are the implementation of tools like Ranger and Knox for cloud deployments, but we also pay attention to the security features offered in the cloud that complement this process and secure the data in unprecedented ways. Cloud Security + OSS Security tools are a deadly combination, when it comes to securing your Data Lake.

Securing your Big Data Environments in the Cloud

DataWorks Summit

عصر کلان داده، چرا و چگونه؟

datastack

Hadoop workshop

Fang Mac

EMC Isilon Database Converged deck

KeithETD_CTO

More and more organizations are moving their ETL workloads to a Hadoop based ELT grid architecture. Hadoop`s inherit capabilities, especially it`s ability to do late binding addresses some of the key challenges with traditional ETL platforms. In this presentation, attendees will learn the key factors, considerations and lessons around ETL for Hadoop. Areas such as pros and cons for different extract and load strategies, best ways to batch data, buffering and compression considerations, leveraging HCatalog, data transformation, integration with existing data transformations, advantages of different ways of exchanging data and leveraging Hadoop as a data integration layer. This is an extremely popular presentation around ETL and Hadoop.

A Reference Architecture for ETL 2.0

DataWorks Summit

Legacy ERP architecture offers an incredibly, efficient means of operational resource management, but a real challenge comes from extracting business insights from them. Over the past 30 years, ERP data system, such as SAP, can be hard to interact with especially at the source database level. Whether initial translation of business logic and hierarchies create significant customizations, as well as, merging those changes into analytical applications. Overall, the entire process of designing self-service reporting with business level context can be quite cumbersome, looking at an example platform like, SAP, which contain pre-packaged modules (MM, SD, PP, etc), integrating these systems into a series of pre-built analytics. The orchestration and integration over a wide range of open source technology solutions with some commercial CDC and reporting solutions into a reference solution that mimics several real customer scenarios today, living on relational platforms. Key considerations of extracting from the operational system of record, especially the merging of multiple systems in different time zones, will be addressed. Furthermore, the integration concerns of an analytics Hadoop platform, using HIVE Acid and Merge, as well as, flattening techniques for dimensional models. Many times a customer is temporarily limited in the range of data their ERP can contain, and older data is often offloaded to secondary systems or cold archiving entities. That goes away, but the opportunities expanded with real-time reporting across all of history, and the expanded use cases with advanced machine learnings methods. Speakers Jordan Martz, Director of Tech Solutions, Attunity David Freriks, Technology Evangelist, Qlik

Migrating legacy ERP data into Hadoop

DataWorks Summit

Thu-310pm-Impetus-SachinAndAjay

Ajay Shriwastava

Integrating Hadoop Into the Enterprise

DataWorks Summit

The power of Hadoop lies in its ability to help users cost effectively analyze all kinds of data. We are now seeing the emergence of a new class of analytic applications that can only be enabled by a comprehensive big data platform. Such a platform extends the Hadoop framework with built-in analytics, robust developer tools, and the integration, reliability, and security capabilities that enterprises demand for complex, large scale analytics. In this session, we will share innovative analytics use cases from actual customer implementations using an enterprise-class big data analytics platform.

Hadoop Summit 2012 | Integrating Hadoop Into the Enterprise

Cloudera, Inc.

Microsoft's Hadoop Story

Michael Rys

Big Data in the Microsoft Platform

Jesus Rodriguez

Modernize Your Existing EDW with IBM Big SQL & Hortonworks Data Platform

Hortonworks

Big_SQL_3.0_Whitepaper

Scott Gray

Watch full webinar here: https://bit.ly/3aePFcF Historically data lakes have been created as a centralized physical data storage platform for data scientists to analyze data. But lately the explosion of big data, data privacy rules, departmental restrictions among many other things have made the centralized data repository approach less feasible. In this webinar, we will discuss why decentralized multipurpose data lakes are the future of data analysis for a broad range of business users. Attend this session to learn: - The restrictions of physical single purpose data lakes - How to build a logical multi purpose data lake for business users - The newer use cases that makes multi purpose data lakes a necessity

Logical Data Lakes: From Single Purpose to Multipurpose Data Lakes (APAC)

Denodo

Modern data warehouse

Stephen Alex

Similar to Data Lakes on Public Cloud: Breaking Data Management Monoliths (20)

Building a modern data warehouse

What is hadoop

Big Data Analytics with Hadoop, MongoDB and SQL Server

Apache Hadoop and its role in Big Data architecture - Himanshu Bari

Logical Data Warehouse: How to Build a Virtualized Data Services Layer

Securing your Big Data Environments in the Cloud

عصر کلان داده، چرا و چگونه؟

Hadoop workshop

EMC Isilon Database Converged deck

A Reference Architecture for ETL 2.0

Migrating legacy ERP data into Hadoop

Thu-310pm-Impetus-SachinAndAjay

Integrating Hadoop Into the Enterprise

Hadoop Summit 2012 | Integrating Hadoop Into the Enterprise

Microsoft's Hadoop Story

Big Data in the Microsoft Platform

Modernize Your Existing EDW with IBM Big SQL & Hortonworks Data Platform

Big_SQL_3.0_Whitepaper

Logical Data Lakes: From Single Purpose to Multipurpose Data Lakes (APAC)

Modern data warehouse

More from Itai Yaffe

Yulia Antonovsky (Senior Software Engineer II) @ Akamai: Our cloud-based ingest pipeline processes over 10 Gb of security events data per second, which demands high-performance processing and analysis. To achieve this, we've implemented efficient partitioning using Java and Spark applications running on AKS and leveraging Kafka. This allows us to provide real-time analytics within two minutes and heavy batch processing for deeper analysis hourly. During this talk, we will cover how we use Kafka to scale our Spark application on K8s, partitioning strategies for high-volume data processing, and how partitioning helps avoid storage throttling issues.

Mastering Partitioning for High-Volume Data Processing

Itai Yaffe

Maayan Gad (Senior Big Data Engineer) @ Wix: Who likes to maintain long SQL files? No one. Column renames, additions, deletions, changes in the KPIs your tables are based on, or the date your table should start from - all are annoying, time consuming tasks your data engineers waste their time on. In this talk we’ll uncover how we solved this problem at Wix, allowing our data engineers to easily create Data Warehouse grade tables using configuration files, and even transfer some of the business-related-only parts to the data analysts.

Solving Data Engineers Velocity - Wix's Data Warehouse Automation

Itai Yaffe

Ada Sharoni (Software Engineering Architect) @ Hunters: Imagine you had to manage thousands of Spark applications that are automatically spinning up on-demand upon every customer interaction. Our unique constraints in Hunters have led us to adopt an architecture and concepts that we believe many other companies will find useful. In this lecture we will share our solutions and insights in running many lightweight, cheap Spark applications on Kubernetes, that can easily survive frequent restarts and smartly share resources on Spot EC2 instances.

Lessons Learnt from Running Thousands of On-demand Spark Applications

Itai Yaffe

Eynav Mass (VP R&D) @ Oribi: When it comes to data solutions, one-size doesn't fit all. Choosing the right best-matching database, or data tools, can be a game-changer for your system. How can you take such a decision effectively? The system, the company, the product, and probably your team - all are evolving, and the best solution for today may not fit tomorrow's needs. In order to pick a data solution for longer term, you should evaluate the optional data tools according to several factors. These factors will reflect the requirements looking forward. At the session, we will discuss these factors, along with sharing some real-life stories and lessons learned, to help you properly plan & prepare your data solutions.

Planning a data solution - "By Failing to prepare, you are preparing to fail"

Itai Yaffe

Evaluating Big Data & ML Solutions - Opening Notes

Itai Yaffe

Jon Bratseth (VP Architect) @ Verizon Media: The big data world has mature technologies for offline analysis and learning from data, but have lacked options for making data-driven decisions in real time. When it is sufficient to consider a single data point model servers such as TensorFlow serving can be used but in many cases you want to consider many data points to make decisions. This is a difficult engineering problem combining state, distributed algorithms and low latency, but solving it often makes it possible to create far superior solutions when applying machine learning. This talk will explain why this is a hard problem, show the advantages of solving it, and introduce the open source Vespa.ai platform which is used to implement such solutions in some of the largest scale problems in the world including the world's third largest ad serving system.

Big data serving: Processing and inference at scale in real time

Itai Yaffe

Orit Alul (Sr. Solutions Architect) @ AWS: As data is growing at an exponential rate, we are interested not only in being able to analyze the past or present but also in predicting the future! In this session, Orit will talk about the power of data combined with machine learning. Building a highly scalable and flexible data architecture in the cloud to collect, process, and analyze data, in order to get timely insights and react quickly to new information. In addition, Orit will present best practices, performance and optimization tips for building a Data Lake in the cloud.

Unleashing the Power of your Data

Itai Yaffe

Data Lake on Public Cloud - Opening Notes

Itai Yaffe

Roi Teveth (Data Engineer) and Itai Yaffe (Tech Lead, Big Data group) @ Nielsen: At Nielsen Identity Engine, we use Spark to process 10’s of TBs of data. Our ETLs, orchestrated by Airflow, spin-up AWS EMR clusters with thousands of nodes per day. In this talk, we’ll guide you through migrating Spark workloads to Kubernetes with minimal changes to Airflow DAGs, using the open-sourced GCP Spark-on-K8s operator and the native integration we recently contributed to the Airflow project.

Airflow Summit 2020 - Migrating airflow based spark jobs to kubernetes - the ...

Itai Yaffe

Itai Yaffe (Tech Lead, Big Data group) @ Nielsen: Every day, millions of advertising campaigns are happening around the world. As campaign owners, measuring the ongoing campaign effectiveness (e.g "how many distinct users saw my online ad VS how many distinct users saw my online ad, clicked it and purchased my product?") is super important. However, this task (often referred to as "funnel analysis") is not an easy task, especially if the chronological order of events matters. So, while the combination of Druid and ThetaSketch aggregators can answer some of these questions, it still can’t answer the question "how many distinct users viewed the brand’s homepage FIRST and THEN viewed product X page?" In this talk, we will discuss how we combine Spark, Druid and ThetaSketch aggregators to answer such questions at scale.

DevTalks Reimagined 2020 - Funnel Analysis with Spark and Druid

Itai Yaffe

Virtual Apache Druid Meetup: AIADA (Ask Itai and David Anything)

Itai Yaffe

Introducing Kafka Connect and Implementing Custom Connectors

Itai Yaffe

Benjamin Hopp (Solutions Architect) @ Imply: Druid is an emerging standard in the data infrastructure world, designed for high-performance slice-and-dice analytics (“OLAP”-style) on large data sets. This talk is for you if you’re interested in learning more about pushing Druid’s analytical performance to the limit. Perhaps you’re already running Druid and are looking to speed up your deployment, or perhaps you aren’t familiar with Druid and are interested in learning the basics. Some of the tips in this talk are Druid-specific, but many of them will apply to any operational analytics technology stack. The most important contributor to a fast analytical setup is getting the data model right. The talk will center around various choices you can make to prepare your data to get best possible query performance. We’ll look at some general best practices to model your data before ingestion such as OLAP dimensional modeling (called “roll-up” in Druid), data partitioning, and tips for choosing column types and indexes. We’ll also look at how more can be less: often, storing copies of your data partitioned, sorted, or aggregated in different ways can speed up queries by reducing the amount of computation needed. We’ll also look at Druid-specific optimizations that take advantage of approximations; where you can trade accuracy for performance and reduced storage. You’ll get introduced to Druid’s features for approximate counting, set operations, ranking, quantiles, and more. And we will finish with the latest and greatest Druid news, including details about the latest roadmap and releases.

A Day in the Life of a Druid Implementor and Druid's Roadmap

Itai Yaffe

Dr. Edward (Eddie) Bortnikov (Senior Director of Research) @ Verizon Media: Ingestion and queries of real-time data in Druid are performed by a core software component named Incremental Index (I^2). I^2’s scalability is paramount to the speed of the ingested data becoming queryable as well as to the operational efficiency of the Druid cluster. The current I^2 Implementation is based on the traditional ordered JDK key-value (KV-)map. We present an experimental I^2 implementation that is based on a novel data structure named OakMap - a scalable thread-safe off-heap KV-map for Big Data applications in Java. With OakMap, I^2 can ingest data at almost 2x speed while using 30% less RAM. The project is expected to become GA in 2020.

Scalable Incremental Index for Druid

Itai Yaffe

Itai Yaffe (Tech Lead, Big Data group) @ Nielsen: Every day, millions of advertising campaigns are happening around the world. As campaign owners, measuring the ongoing campaign effectiveness (e.g “how many distinct users saw my online ad VS how many distinct users saw my online ad, clicked it and purchased my product?”) is super important. However, this task (often referred to as “funnel analysis”) is not an easy task, especially if the chronological order of events matters. So, while the combination of Druid and ThetaSketch aggregators can answer some of these questions, it still can’t answer the question "how many distinct users viewed the brand’s homepage FIRST and THEN viewed product X page?" In this talk, we will discuss how we combine Spark, Druid and ThetaSketch aggregators to answer such questions at scale.

Funnel Analysis with Spark and Druid

Itai Yaffe

Shir Bromberg (Big Data team leader) @ Yotpo: Nowadays, many of an organization’s main applications rely on Spark pipelines. As these applications become more significant to businesses, so does the need to quickly deploy, test and monitor them. The standard way of running spark jobs is to deploy it on a dedicated managed cluster. However, this solution is relatively expensive with potentially high setup time. Therefore, we developed a way to run Spark on any container orchestration platform. This allows us to run Spark in a simple, custom and testable way. In this talk, we will present our open-source dockers for running Spark on Nomad servers. We will cover: * The issues we had running spark on managed clusters and the solution we developed. * How to build a spark docker. * And finally, what you may achieve by using Spark on Nomad.

The benefits of running Spark on your own Docker

Itai Yaffe

Etti Gur (Senior Big Data developer) and Itai Yaffe (Tech Lead, Big Data group) @ Nielsen: At Nielsen Marketing Cloud, we provide our customers (marketers and publishers) real-time analytics tools to measure their ongoing campaigns' efficiency. To achieve that, we need to ingest billions of events per day into our big data stores and we need to do it in a scalable yet cost-efficient manner. In this talk, we will discuss how we significantly optimized our Spark-based in-flight analytics daily pipeline, reducing its total execution time from over 20 hours down to 2 hours, resulting in a huge cost reduction. Topics include: * Ways to identify optimization opportunities * Optimizing Spark resource allocation * Parallelizing Spark output phase with dynamic partition inserts * Running multiple Spark "jobs" in parallel within a single Spark application

Optimizing Spark-based data pipelines - are you up for it?

Itai Yaffe


Data Lakes on Public Cloud: Breaking Data Management Monoliths

1. Data Lakes in the Public Cloud: Breaking Data Management Monoliths
Sharon Dashet, Sr. Data Analytics Solution Lead, GCP, https://il.linkedin.com/in/sharon-dashet

2. 01 Intro to Data Lakes

3. It all started with RDBMS…

4. Data Management timeline: relational OLTP (80s), traditional EDW players (~1995), big data vendors (~2005), cloud platform vendors (~2010), specialized cloud vendors (~2012).
Roles that emerged along the way: Database Developer, Backend Developer, Application DBA, Production DBA, Data Scientist, Data Analyst, BI/OLAP Expert, SQL Expert, Governance, MDM, Big Data Developer, Big Data Architect, ML Engineer, Hadoop Admin, Hadoop Expert, AI Scientist, CDO, Cloud Data Engineer, Cloud Data Architect.

5. The Hadoop ecosystem is very popular for Big Data workloads
● HDFS (Hadoop Distributed File System)
● YARN (cluster resource management)
● MapReduce (cluster data processing)
● HCatalog (metadata)
● Hive (SQL DW)
● Pig (scripting)
● HBase (NoSQL datastore)
● Presto (distributed SQL query)
● Flink (streams)
● Mahout & Spark ML (machine learning)
● Flume (log aggregation and transport)
● Sqoop (import and export of relational data)
● Oozie (workflow automation)
● Zookeeper (coordination)
● Ambari (management and monitoring)

6. Typical on-premises deployment: a multi-user, shared Hadoop cluster
● Data (HDFS) and temp data (HDFS)
● Metadata (Hive metastore, RDBMS)
● Compute: YARN, running Hive, Spark, MR and R
● AuthN: Kerberos, LDAP
● AuthZ policies, audit, governance (Ranger, Atlas)
● Surrounding systems: Kafka, Storm, Flume, Cassandra, HBase, ELK, etc.

7. The Apache data-processing ecosystem

8. On-prem Data Lakes are struggling to deliver value
● TCO challenges: resource utilization and overall TCO of on-prem data lakes become unmanageable
● Governance challenges: data governance and security issues open up compliance concerns
● Scaling challenges: resource-intensive data and analytics processing can lead to missed SLAs
● Agility challenges: analytics experimentation is slow due to resource provisioning time

9. Key market players are struggling to convert customers.

10. The need is still there
● AI is now capable of extracting value from unstructured data: “80 percent of worldwide data will be unstructured by 2025”
● Cloud is faster, simpler to operate, and less expensive
● Data lakes are shifting to the cloud
“By connecting data points, we can offer advice like hygiene laws for certain foods, or information on provenance. We can even integrate their local weather forecast so a store doesn't run out of ice cream on a sunny day.” Sven Lipowski, Unit Owner Customer Solutions, adMETERONOMIDC (source)
“The ability to spin up purpose-driven Hadoop clusters against our shared datasets and scale them up/down with demand is a game changer for us…” Brett Uyeshiro, VP Platform Services, Pandora

11. 02 Patterns for Data Lakes in Public Cloud

12. Beyond HDFS: storage and compute separation
Keep your storage on GCS instead of HDFS. Benefits:
● Separation of compute and storage
● Fully HDFS-compatible GCS connector
● Facilitates cost-effective, job-scoped workloads (plus ephemeral clusters)
● No need to provision 3x storage for replication
● No unused bytes on disks
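The "no 3x storage" point can be made concrete with a little arithmetic. The sketch below assumes HDFS's default replication factor of 3 (`dfs.replication`) and a hypothetical 25% free-space headroom kept on DataNode disks; the headroom value is an illustrative assumption, not a figure from the talk. It treats object storage as billed on logical bytes, since Cloud Storage handles redundancy inside the service rather than asking you to provision replicas.

```python
from dataclasses import dataclass

# HDFS ships with dfs.replication = 3 by default.
HDFS_DEFAULT_REPLICATION = 3


@dataclass(frozen=True)
class StorageEstimate:
    logical_tb: float           # data the business actually stores
    hdfs_provisioned_tb: float  # raw disk to buy for an on-prem HDFS cluster
    object_store_tb: float      # bytes billed on an object store such as GCS


def estimate_storage(
    logical_tb: float,
    replication: int = HDFS_DEFAULT_REPLICATION,
    headroom: float = 0.25,  # hypothetical fraction of DataNode disk kept free
) -> StorageEstimate:
    """Compare provisioned HDFS capacity with object-store bytes (sketch)."""
    if logical_tb < 0:
        raise ValueError("logical_tb must be non-negative")
    if replication < 1:
        raise ValueError("replication must be at least 1")
    if not 0 <= headroom < 1:
        raise ValueError("headroom must be in [0, 1)")
    # Every block is stored `replication` times on DataNode disks...
    replicated_tb = logical_tb * replication
    # ...and the disks are sized so that `headroom` of them stays empty.
    provisioned_tb = replicated_tb / (1 - headroom)
    # The object store replicates internally; you pay for the logical bytes.
    return StorageEstimate(logical_tb, provisioned_tb, logical_tb)


if __name__ == "__main__":
    est = estimate_storage(100.0)
    print(f"{est.logical_tb:.0f} TB of data -> {est.hdfs_provisioned_tb:.0f} TB of HDFS disk "
          f"vs {est.object_store_tb:.0f} TB on object storage")
```

Under these assumptions, 100 TB of data needs 400 TB of raw HDFS disk, while the object store bills for the 100 TB itself; the exact ratio depends on the replication and headroom you plug in.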

13. Job-scoped clusters: beyond complicated YARN queues
Diagram: Hive analytics, business reporting, MapReduce ETL, and machine learning workloads each run on their own Cloud Dataproc cluster, sharing Cloud Storage and the Hive Metastore.
● Step away from complicated YARN queues and multi-tenancy
● Control cost and performance per workload:
○ Ephemeral clusters
○ Mix regular and preemptible VMs in the worker pool
○ Different VM types
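One way to express "control cost and performance per workload" is to generate a separate cluster definition for each job instead of sizing one shared YARN cluster. The sketch below builds a request body modeled on the Dataproc v1 REST `ClusterConfig` (`workerConfig`, `secondaryWorkerConfig` with `preemptibility`, `lifecycleConfig.idleDeleteTtl`); treat these field names as something to verify against the current Dataproc API reference. The function name, workload names, machine types, and sizes are hypothetical, and no API call is made.

```python
from typing import Any, Dict


def job_scoped_cluster_body(
    cluster_name: str,
    primary_workers: int,
    preemptible_workers: int = 0,
    machine_type: str = "n1-standard-4",
    idle_delete_seconds: int = 1800,
) -> Dict[str, Any]:
    """Build a Dataproc-style cluster request body for a single workload (sketch)."""
    # Assumption: a standard (non single-node) cluster needs at least 2 primary workers.
    if primary_workers < 2:
        raise ValueError("primary_workers must be at least 2")
    if preemptible_workers < 0:
        raise ValueError("preemptible_workers must be non-negative")
    config: Dict[str, Any] = {
        # Regular VMs form the stable core of the job's cluster.
        "workerConfig": {
            "numInstances": primary_workers,
            "machineTypeUri": machine_type,
        },
        # Ephemeral cluster: delete it once it has been idle this long.
        "lifecycleConfig": {"idleDeleteTtl": f"{idle_delete_seconds}s"},
    }
    if preemptible_workers:
        # Cheaper, reclaimable VMs add burst compute for this job only.
        config["secondaryWorkerConfig"] = {
            "numInstances": preemptible_workers,
            "preemptibility": "PREEMPTIBLE",
        }
    return {"clusterName": cluster_name, "config": config}


# Hypothetical per-workload sizing: one cluster per job instead of one shared cluster.
WORKLOADS = {
    "mapreduce-etl": job_scoped_cluster_body("mapreduce-etl", 2, preemptible_workers=8),
    "business-reporting": job_scoped_cluster_body(
        "business-reporting", 2, machine_type="n1-highmem-4", idle_delete_seconds=600
    ),
}

if __name__ == "__main__":
    for name, body in WORKLOADS.items():
        print(name, body["config"])
```

Because each workload gets its own definition, the ETL job can lean on preemptible capacity while the reporting cluster stays small, uses a different VM type, and is torn down sooner; both would read from the same Cloud Storage data and Hive Metastore.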

14. Beyond YARN and into a modern service mesh

15. Converged Smart Analytics
1. Data sources
2. Data lake storage
3. Data pipelines
4. Data warehouse/lake
5. ML and analytics workloads (AI Platform, AI Platform Notebooks)

16. Thank you
