Hadoop in adtech
Yuta Imai
A summary of how Hadoop is used in the adtech industry. The first half covers what Hadoop is, and the second half covers common usage patterns.
1.
Hadoop in adtech world
Yuta Imai, Solutions Engineer, Hortonworks
© Hortonworks Inc. 2011 – 2015. All Rights Reserved
2.
What is Apache Hadoop?
3.
Hadoop Ecosystem
Diagram labels: runs on; ETL; RDBMS Import/Export; Distributed Storage & Processing Framework; Secure NoSQL DB; SQL on HBase; NoSQL DB; Workflow Management; SQL; Streaming Data Ingestion; Cluster System Operations; Secure Gateway; Distributed Registry; ETL; Search & Indexing; Even Faster Data Processing; Data Management; Machine Learning
© Hortonworks Inc. 2011 – 2016. All Rights Reserved
4.
Hortonworks Data Platform (HDP)
5.
1st Gen Hadoop: Cost-Effective Batch at Scale
HADOOP 1.0: Built for Web-Scale Batch Apps
Silos created for distinct use cases
Diagram: separate Single App stacks (BATCH on HDFS, INTERACTIVE, ONLINE)
6.
Hadoop Beyond Batch with YARN
A shift from the old to the new…
• Single Use System: Batch Apps
• Multi Use Data Platform: Batch, Interactive, Online, Streaming, …
HADOOP 1 (MapReduce as the Base):
• Data Flow (Pig), SQL (Hive), Others (API, Engine, and System)
• MapReduce (cluster resource management & data processing)
• HDFS (redundant, reliable storage), nodes 1…N
HADOOP 2 (Apache YARN as a Base):
• Data Flow (Pig), SQL (Hive), Other ISV (System Engine APIs)
• Batch MapReduce, Tez
• YARN (Data Operating System: resource management, etc.)
• HDFS (redundant, reliable storage), nodes 1…N
7.
Architecture Enabled by YARN
A single set of data across the entire cluster (nodes 1…n) with multiple access methods using "zones" for processing:
• SQL (Hive): Interactive SQL query for analytics
• Pig: Script-based ETL algorithm executed in batch to rework data used by Hive and HBase consumers
• Stream Processing (Storm): Identify & act on real-time events
• NoSQL (HBase, Accumulo): Low-latency access serving up a web front end
…all enabled by YARN:
• Maximize compute resources to lower TCO
• No standalone, siloed clusters
• Simple management & operations
8.
Hadoop Workload Evolution
A shift from the old to the new…
• HADOOP 1: Single Use System (Batch Apps). MapReduce on HDFS, nodes 1…N.
• HADOOP 2: Multi Use Data Platform (Batch, Interactive, Online, Streaming, …). DATA ACCESS on YARN on HDFS (redundant, reliable storage): MR2, Others (ISV Engines), Multiple (Script, SQL, NoSQL, …).
• HADOOP.Next: Multi Use Platform (Data & Beyond). DATA ACCESS and APPS on YARN on HDFS (redundant, reliable storage): MR2, Others (ISV Engines), Multiple (Script, SQL, NoSQL, …), Docker MySQL, Docker Tomcat, Docker Other.
9.
Hadoop Operations & Tools
10.
How Do You Operate a Hadoop Cluster?
Apache™ Ambari is a platform to provision, manage and monitor Hadoop clusters.
11.
Ambari Core Features and Extensibility
Ambari provides core services for operations and development, and extension points for both.
How?
• Install & Configure: core feature is Install Wizard & Web; extensibility through Stacks, Blueprints & REST APIs
• Operate, Manage & Administer: core features are Web, Operator Views, Metrics & Alerts; extensibility through Views Framework & REST APIs
• Develop: core feature is User Views; extensibility through Views Framework
• Optimize & Tune: core feature is User Views; extensibility through Views Framework
Roles: Cluster Admin, Developer, Data Architect
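The REST APIs listed among the extensibility features can be driven from any HTTP client. The following is a minimal sketch, assuming a hypothetical Ambari server at `ambari.example.com:8080` and placeholder credentials. The `/api/v1/clusters` path and the `X-Requested-By` header follow Ambari's REST conventions; the helper name is made up for illustration, and the request is only built here, not sent.

```python
import base64
import urllib.request

# Hypothetical server address; replace with a real Ambari host.
AMBARI_URL = "http://ambari.example.com:8080"


def build_list_clusters_request(base_url, user, password):
    """Build (but do not send) a GET request that lists clusters."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req = urllib.request.Request(f"{base_url}/api/v1/clusters", method="GET")
    req.add_header("Authorization", f"Basic {token}")
    # Ambari expects this header on modifying requests; it is harmless on GET.
    req.add_header("X-Requested-By", "ambari")
    return req


# Placeholder credentials, for illustration only.
request = build_list_clusters_request(AMBARI_URL, "admin", "admin")
print(request.get_method(), request.full_url)
```

Sending the request would be a matter of passing it to `urllib.request.urlopen` against a reachable server.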
12.
New user interface enables fast & easy SQL definition and execution.
13.
New User Views for DevOps
• Capacity Scheduler View: Browse and manage YARN queues
• Tez View: View information related to Tez jobs that are executing on the cluster
14.
New User Views for Development
• Pig View: Author and execute Pig scripts.
• Hive View: Author, execute and debug Hive queries.
• Files View: Browse HDFS file system.
15.
Apache Zeppelin
• Web-based notebook for data engineers, data analysts and data scientists
• Brings interactive data ingestion, data exploration, visualization, sharing and collaboration features to Hadoop and Spark
• Modern data science studio
• Scala with Spark
• Python with Spark
• SparkSQL
• Apache Hive, and more.
16.
Hadoop use cases in adtech world
17.
Most Hadoop use cases are Hive
• It was often used, for example, to build access reports for web services, and the architecture below was very common.
• Queries often took a fair amount of time, so they were usually run as scheduled jobs.
Diagram: Web ×3, log ×3, Hadoop
18.
Most Hadoop use cases are Hive
• It was often used, for example, to build access reports for web services, and the architecture below was very common.
• Queries often took a fair amount of time, so they were usually run as scheduled jobs.
Hadoop and MapReduce were the tools used to run heavy processing over large volumes of data.
Diagram: Web ×3, log ×3, Hadoop, MySQL, Report UI
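The batch reporting flow on this slide (web logs collected into Hadoop, aggregated by a scheduled job, results loaded into MySQL for the report UI) can be sketched in plain Python. This only illustrates the map and reduce steps of such a job on a hypothetical log format (`<date> <path> <status>`); it is not Hive or MapReduce itself, and the sample lines are invented.

```python
from collections import Counter

# Hypothetical access-log lines: "<date> <path> <status>"
LOG_LINES = [
    "2016-06-01 /ads/banner 200",
    "2016-06-01 /ads/banner 200",
    "2016-06-01 /ads/video 200",
    "2016-06-02 /ads/banner 500",
    "2016-06-02 /ads/banner 200",
]


def map_line(line):
    """Map step: emit ((date, path), 1) for each successful request."""
    date, path, status = line.split()
    if status == "200":
        yield (date, path), 1


def reduce_counts(pairs):
    """Reduce step: sum the emitted counts per (date, path) key."""
    totals = Counter()
    for key, value in pairs:
        totals[key] += value
    return totals


def daily_report(lines):
    """Run map then reduce and return rows sorted by date and path."""
    pairs = (pair for line in lines for pair in map_line(line))
    return sorted(reduce_counts(pairs).items())


report = daily_report(LOG_LINES)
for (date, path), hits in report:
    print(date, path, hits)
```

In the architecture on the slide, the rows in `report` correspond to what the scheduled job would write into MySQL for the report UI to read.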
19.
Efforts to speed up SQL on big data
Hive (on MapReduce) was not fast enough for interactive queries.
• Presto
• Impala
• Drill
• Shark (now Spark SQL)
20.
Most Hadoop use cases are Hive
• Configurations that combine Hadoop with Presto, MySQL (as a data mart), and similar tools are becoming common.
Diagram: Web ×3, log ×3, Hadoop, Report UI
21.
SQL on big data: the arrival of cloud services
• Amazon Redshift
• Google BigQuery
22.
Of course, Hive itself has been getting faster too
• Stinger Initiative: made Hive more than 100x faster. Already available on HDP!
• Sub-second: aiming for responses under one second on short queries
• Up to Hive 1.2.1:
  – Tez
  – Cost Based Optimizer (CBO)
  – ORC file format
  – Vectorization
• Hive 2.0:
  – LLAP
23.
Speeding up Hive
• Hive can now handle interactive queries directly.
Diagram: Web ×3, log ×3, Hadoop, Report UI
24.
Today the Hadoop ecosystem is used in many places
• Reporting
• Long-term storage of all logs
• ETL and miscellaneous batch processing
Diagram: Web ×3, log ×3, Hadoop, HDFS, Report UI
25.
Today the Hadoop ecosystem is used in many places
• Generating models for bidding and optimization
Diagram: Web ×3, log ×3, Hadoop, HDFS, Report UI, Ads server, ad delivery DB
26.
Today the Hadoop ecosystem is used in many places
• Real-time log collection
• Real-time tracking
Diagram: Web ×3, log ×3, Hadoop, HDFS, Report UI, Ads server
27.
Today the Hadoop ecosystem is used in many places
• Reporting
• Generating models for bidding and optimization
• Real-time tracking
• Long-term storage of all logs
• Real-time log collection
• ETL and miscellaneous batch processing
Diagram: Web ×3, log ×3, Hadoop, HDFS, Report UI, Ads server, ad delivery DB
28.
Today the Hadoop ecosystem is used in many places
• Reporting
• Generating models for bidding and optimization
• Real-time tracking
• Long-term storage of all logs
• Real-time log collection
• ETL and miscellaneous batch processing
Diagram: Web ×3, log ×3, Hadoop, HDFS, Report UI, Ads server, ad delivery DB
HDP stack shown over the diagram:
• GOVERNANCE: Load data and manage according to policy
• OPERATIONS: Deploy and effectively manage the platform. Provision, Manage & Monitor: Ambari, Zookeeper. Scheduling: Oozie
• SECURITY: Provide layered approach to security through Authentication, Authorization, Accounting, and Data Protection
• BATCH, INTERACTIVE & REAL-TIME DATA ACCESS: Script (Pig), SQL (Hive), Java/Scala (Cascading), Stream (Storm), Search (Solr), NoSQL (HBase, Accumulo), In-Memory (Spark), Others (ISV Engines), with Tez and Slider underneath
• YARN: Data Operating System (Cluster Resource Management)
• HDFS (Hadoop Distributed File System), nodes 1…N
29.
Key highlights in recent Hadoop evolution
30.
Recent evolution of Hadoop
• LLAP
• HCatalog Stream Mutation API
• Cloudbreak
31.
Recent evolution of Hadoop
• Hive
  – LLAP
  – ACID, HCatalog Stream Mutation API
• Cloudbreak
32.
Apache Hive: Fast Facts
• Most Queries Per Hour: 100,000 queries per hour
• Analytics Performance: 100 million rows/s per node (with Hive LLAP)
• Largest Hive Warehouse: 300+ PB raw storage (Facebook)
• Largest Cluster: 4,500+ nodes (Yahoo)
33.
SQL evolution on Hadoop
[Chart: axes Speed and Feature. Capabilities: Batch SQL, Interactive SQL, Sub-Second SQL, OLAP / Cube, ACID / MERGE. Labels: Hive 0.x (MapReduce), Hive 1.2- (Tez, Vectorize, ORC, CBO), Hive 2.0 (LLAP), Presto, Impala, Drill, Spark SQL, HAWQ, MPP, Kylin, Druid. Commercial: Kyvos Insights, AtScale.]
34.
Hive 2 with LLAP: Architecture Overview
• SQL queries arrive over ODBC / JDBC at HiveServer2 (Query Endpoint)
• Query Coordinators (×3)
• YARN Cluster: LLAP Daemons (×4), each with Query Executors
• In-Memory Cache (shared across all users)
• Deep Storage: HDFS, S3 + other HDFS-compatible filesystems
35.
Hive 2 with LLAP: Architecture Overview
• SQL queries arrive over ODBC / JDBC at HiveServer2 (Query Endpoint)
• Query Coordinators (×3)
• YARN Cluster: LLAP Daemons (×4), each with Query Executors
• In-Memory Cache (shared across all users)
• Deep Storage: HDFS, S3 + other HDFS-compatible filesystems
While the architecture is close to an MPP design…
• it has a cache layer,
• it uses YARN's scaling capabilities, and
• queries that do not need low latency can be processed in ordinary Tez containers.
A design that takes the best parts of several approaches.
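The two ideas on this slide, a shared in-memory cache in front of deep storage and routing queries that do not need low latency around it, can be sketched as follows. This is a toy model under stated assumptions: the class names, the LRU eviction policy, and the `low_latency` flag are all hypothetical and are not LLAP's actual implementation.

```python
from collections import OrderedDict


class DeepStorage:
    """Stand-in for HDFS/S3: every read is counted as a slow fetch."""

    def __init__(self, blocks):
        self.blocks = blocks
        self.reads = 0

    def read(self, block_id):
        self.reads += 1
        return self.blocks[block_id]


class SharedCache:
    """LRU cache shared by all queries (hypothetical eviction policy)."""

    def __init__(self, storage, capacity):
        self.storage = storage
        self.capacity = capacity
        self.entries = OrderedDict()

    def get(self, block_id):
        if block_id in self.entries:
            self.entries.move_to_end(block_id)  # mark as recently used
            return self.entries[block_id]
        data = self.storage.read(block_id)      # cache miss: go to deep storage
        self.entries[block_id] = data
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)    # evict least recently used
        return data


def run_query(block_ids, cache, storage, low_latency):
    """Sum all values; low-latency queries read through the shared cache."""
    source = cache.get if low_latency else storage.read
    return sum(sum(source(block_id)) for block_id in block_ids)


storage = DeepStorage({"b1": [1, 2], "b2": [3, 4], "b3": [5]})
cache = SharedCache(storage, capacity=2)

first = run_query(["b1", "b2"], cache, storage, low_latency=True)
second = run_query(["b1", "b2"], cache, storage, low_latency=True)
reads_after_cached = storage.reads  # the second run added no storage reads
batch = run_query(["b3"], cache, storage, low_latency=False)
```

The second interactive query is served entirely from the cache, while the batch-style query bypasses the cache and reads deep storage directly.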
36.
Hive 2 with LLAP: 25+x Performance Boost
Hive 2 with LLAP averages 26x faster than Hive 1
[Chart: Query Time (s), lower is better, and Speedup (x Factor). Series: Hive 1 / Tez Time (s), Hive 2 / LLAP Time (s), Speedup (x Factor).]
37.
Hive ACID Production-Ready with HDP 2.5
Notable Improvements
• Tested at multi-TB scale using TPC-H benchmark.
  – Reliably ingest 400GB+ per day within a partition.
  – 10TB+ raw data in a single partition.
  – Simultaneous ingest, delete and query.
• 70+ stabilization improvements.
• Supported:
  – SQL INSERT, UPDATE, DELETE.
  – Streaming API.
• Future: SQL MERGE under development (HIVE-10924).
[Chart: Query Time versus Data Size. Series: Runtime for All Queries (s), Total Compressed Data. Dates 16/05/24 to 16/06/01.]
[Chart: Times for Inserts and Deletes. Series: time_insert_lineitem, time_insert_orders, time_delete_lineitem, time_delete_orders. Dates 16/05/23 to 16/06/01.]
38.
Hive ACID Production-Ready with HDP 2.5
Notable Improvements
• Tested at multi-TB scale using TPC-H benchmark.
  – Reliably ingest 400GB+ per day within a partition.
  – 10TB+ raw data in a single partition.
  – Simultaneous ingest, delete and query.
• 70+ stabilization improvements.
• Supported:
  – SQL INSERT, UPDATE, DELETE.
  – Streaming API.
• Future: SQL MERGE under development (HIVE-10924).
[Chart: Query Time versus Data Size. Series: Runtime for All Queries (s), Total Compressed Data. Dates 16/05/24 to 16/06/01.]
[Chart: Times for Inserts and Deletes. Series: time_insert_lineitem, time_insert_orders, time_delete_lineitem, time_delete_orders. Dates 16/05/23 to 16/06/01.]
One pain point of databases for analytics and aggregation was that data had to be loaded in batches. Being able to do streaming inserts is a big advantage.
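Hive ACID tables keep changes as delta files next to base files, merge them at read time, and fold them together later through compaction. The sketch below models that idea in a few lines of Python. It is a simplified illustration only: the record format, the function names, and the sample rows are hypothetical and do not reflect Hive's on-disk layout.

```python
def merge_on_read(base, deltas):
    """Apply deltas in commit order on top of base rows keyed by row id."""
    rows = dict(base)
    for delta in deltas:
        for op, row_id, value in delta:
            if op == "delete":
                rows.pop(row_id, None)
            else:  # "insert" or "update"
                rows[row_id] = value
    return rows


def compact(base, deltas):
    """Compaction: fold every delta into a new base and drop the deltas."""
    return merge_on_read(base, deltas), []


# Hypothetical adtech events keyed by row id.
base = {1: "impression", 2: "click"}
deltas = [
    [("insert", 3, "conversion")],                       # streamed insert
    [("update", 2, "click:valid"), ("delete", 1, None)],  # later transaction
]

visible = merge_on_read(base, deltas)       # what a reader sees right now
new_base, new_deltas = compact(base, deltas)
print(visible)
```

Because readers merge deltas on the fly, newly streamed rows become queryable without waiting for a batch load, which is the advantage the slide points out.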
39.
HCatalog Stream Mutation API
Diagram: HDFS, Table, Bucket ×3, ORC files
40.
Recent evolution of Hadoop
• Hive
  – LLAP
  – ACID, HCatalog Stream Mutation API
• Cloudbreak
41.
Cloudbreak
Makes it easy to deploy HDP to the cloud.
1. Pick a Blueprint
2. Choose a Cloud
3. Launch HDP!
Example Ambari Blueprints: IoT Apps, BI / Analytics, Data Science, Dev / Test
• BI / Analytics (Hive)
• IoT Apps (Storm, HBase, Hive)
• Data Science (Spark)
• Dev / Test (all HDP services)
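To make "Pick a Blueprint" concrete, here is a minimal sketch of what an Ambari blueprint for a BI / Analytics cluster might look like, built as a Python dictionary and serialized to JSON. The top-level keys (`Blueprints`, `host_groups`, `components`, `cardinality`) follow the Ambari blueprint format; the blueprint name, the host group layout, and the component selection are assumptions for illustration, not a tested cluster definition.

```python
import json


def make_blueprint(name, stack_version, host_groups):
    """Assemble an Ambari-style blueprint from {group: (cardinality, components)}."""
    return {
        "Blueprints": {
            "blueprint_name": name,
            "stack_name": "HDP",
            "stack_version": stack_version,
        },
        "host_groups": [
            {
                "name": group,
                "cardinality": cardinality,
                "components": [{"name": component} for component in components],
            }
            for group, (cardinality, components) in host_groups.items()
        ],
    }


# Hypothetical layout: one master node and three workers.
blueprint = make_blueprint(
    "bi-analytics-sketch",
    "2.5",
    {
        "master": ("1", ["NAMENODE", "RESOURCEMANAGER", "HIVE_SERVER",
                         "HIVE_METASTORE", "ZOOKEEPER_SERVER"]),
        "worker": ("3", ["DATANODE", "NODEMANAGER"]),
    },
)
blueprint_json = json.dumps(blueprint, indent=2)
print(blueprint_json)
```

A real blueprint would typically also carry a `configurations` section and more components; those are left out here to keep the sketch short.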
42.
Recent evolution of Hadoop: in summary…
• Hive
  – LLAP
  – ACID, HCatalog Stream Mutation API
• Cloudbreak
43.
Recent evolution of Hadoop: toward coexisting well with the cloud
• Real-time data collection
• On-demand cluster deployment inside and outside the cloud
• Low-latency query processing while making use of cloud storage
Diagram: Cache ×3