Submit Search
Upload
Leveraging smart meter data for electric utilities: Comparison of Spark SQL with Hive
•
Download as PPTX, PDF
•
4 likes
•
2,377 views
DataWorks Summit/Hadoop Summit
Follow
Leveraging smart meter data for electric utilities: Comparison of Spark SQL with Hive
Read less
Read more
Technology
Slideshow view
Report
Share
Slideshow view
Report
Share
1 of 32
Download now
Recommended
From a single droplet to a full bottle, our journey to Hadoop at Coca-Cola Ea...
From a single droplet to a full bottle, our journey to Hadoop at Coca-Cola Ea...
DataWorks Summit/Hadoop Summit
What's new in Hadoop Common and HDFS
What's new in Hadoop Common and HDFS
DataWorks Summit/Hadoop Summit
Large-scaled telematics analytics
Large-scaled telematics analytics
DataWorks Summit
Hive edw-dataworks summit-eu-april-2017
Hive edw-dataworks summit-eu-april-2017
alanfgates
Practice of large Hadoop cluster in China Mobile
Practice of large Hadoop cluster in China Mobile
DataWorks Summit
Debunking Common Myths in Stream Processing
Debunking Common Myths in Stream Processing
DataWorks Summit/Hadoop Summit
Accelerating Apache Hadoop through High-Performance Networking and I/O Techno...
Accelerating Apache Hadoop through High-Performance Networking and I/O Techno...
DataWorks Summit/Hadoop Summit
Network for the Large-scale Hadoop cluster at Yahoo! JAPAN
Network for the Large-scale Hadoop cluster at Yahoo! JAPAN
DataWorks Summit/Hadoop Summit
Recommended
From a single droplet to a full bottle, our journey to Hadoop at Coca-Cola Ea...
From a single droplet to a full bottle, our journey to Hadoop at Coca-Cola Ea...
DataWorks Summit/Hadoop Summit
What's new in Hadoop Common and HDFS
What's new in Hadoop Common and HDFS
DataWorks Summit/Hadoop Summit
Large-scaled telematics analytics
Large-scaled telematics analytics
DataWorks Summit
Hive edw-dataworks summit-eu-april-2017
Hive edw-dataworks summit-eu-april-2017
alanfgates
Practice of large Hadoop cluster in China Mobile
Practice of large Hadoop cluster in China Mobile
DataWorks Summit
Debunking Common Myths in Stream Processing
Debunking Common Myths in Stream Processing
DataWorks Summit/Hadoop Summit
Accelerating Apache Hadoop through High-Performance Networking and I/O Techno...
Accelerating Apache Hadoop through High-Performance Networking and I/O Techno...
DataWorks Summit/Hadoop Summit
Network for the Large-scale Hadoop cluster at Yahoo! JAPAN
Network for the Large-scale Hadoop cluster at Yahoo! JAPAN
DataWorks Summit/Hadoop Summit
How do you decide where your customer was?
How do you decide where your customer was?
DataWorks Summit/Hadoop Summit
Fast SQL on Hadoop, Really?
Fast SQL on Hadoop, Really?
DataWorks Summit
Real-time Hadoop: The Ideal Messaging System for Hadoop
Real-time Hadoop: The Ideal Messaging System for Hadoop
DataWorks Summit/Hadoop Summit
From Zero to Data Flow in Hours with Apache NiFi
From Zero to Data Flow in Hours with Apache NiFi
DataWorks Summit/Hadoop Summit
Stinger Initiative - Deep Dive
Stinger Initiative - Deep Dive
Hortonworks
Benefits of an Agile Data Fabric for Business Intelligence
Benefits of an Agile Data Fabric for Business Intelligence
DataWorks Summit/Hadoop Summit
Data Regions: Modernizing your company's data ecosystem
Data Regions: Modernizing your company's data ecosystem
DataWorks Summit/Hadoop Summit
The Next Generation of Data Processing and Open Source
The Next Generation of Data Processing and Open Source
DataWorks Summit/Hadoop Summit
A Data Lake and a Data Lab to Optimize Operations and Safety within a nuclear...
A Data Lake and a Data Lab to Optimize Operations and Safety within a nuclear...
DataWorks Summit/Hadoop Summit
Hadoop 3 in a Nutshell
Hadoop 3 in a Nutshell
DataWorks Summit/Hadoop Summit
From Device to Data Center to Insights
From Device to Data Center to Insights
DataWorks Summit/Hadoop Summit
Achieving 100k Queries per Hour on Hive on Tez
Achieving 100k Queries per Hour on Hive on Tez
DataWorks Summit/Hadoop Summit
Zero ETL analytics with LLAP in Azure HDInsight
Zero ETL analytics with LLAP in Azure HDInsight
DataWorks Summit
Apache Spark Crash Course
Apache Spark Crash Course
DataWorks Summit
LLAP: Sub-Second Analytical Queries in Hive
LLAP: Sub-Second Analytical Queries in Hive
DataWorks Summit/Hadoop Summit
Crash Course HS16Melb - Hands on Intro to Spark & Zeppelin
Crash Course HS16Melb - Hands on Intro to Spark & Zeppelin
DataWorks Summit/Hadoop Summit
Design Patterns For Real Time Streaming Data Analytics
Design Patterns For Real Time Streaming Data Analytics
DataWorks Summit
Analysis of Major Trends in Big Data Analytics
Analysis of Major Trends in Big Data Analytics
DataWorks Summit/Hadoop Summit
Achieving a 360-degree view of manufacturing via open source industrial data ...
Achieving a 360-degree view of manufacturing via open source industrial data ...
DataWorks Summit
Large-Scale Stream Processing in the Hadoop Ecosystem
Large-Scale Stream Processing in the Hadoop Ecosystem
DataWorks Summit/Hadoop Summit
Industrial Automation System introduction
Industrial Automation System introduction
Md. Mashiur Rahman
Analyzing 2TB of Raw Trace Data from a Manufacturing Process: A First Use Cas...
Analyzing 2TB of Raw Trace Data from a Manufacturing Process: A First Use Cas...
Databricks
More Related Content
What's hot
How do you decide where your customer was?
How do you decide where your customer was?
DataWorks Summit/Hadoop Summit
Fast SQL on Hadoop, Really?
Fast SQL on Hadoop, Really?
DataWorks Summit
Real-time Hadoop: The Ideal Messaging System for Hadoop
Real-time Hadoop: The Ideal Messaging System for Hadoop
DataWorks Summit/Hadoop Summit
From Zero to Data Flow in Hours with Apache NiFi
From Zero to Data Flow in Hours with Apache NiFi
DataWorks Summit/Hadoop Summit
Stinger Initiative - Deep Dive
Stinger Initiative - Deep Dive
Hortonworks
Benefits of an Agile Data Fabric for Business Intelligence
Benefits of an Agile Data Fabric for Business Intelligence
DataWorks Summit/Hadoop Summit
Data Regions: Modernizing your company's data ecosystem
Data Regions: Modernizing your company's data ecosystem
DataWorks Summit/Hadoop Summit
The Next Generation of Data Processing and Open Source
The Next Generation of Data Processing and Open Source
DataWorks Summit/Hadoop Summit
A Data Lake and a Data Lab to Optimize Operations and Safety within a nuclear...
A Data Lake and a Data Lab to Optimize Operations and Safety within a nuclear...
DataWorks Summit/Hadoop Summit
Hadoop 3 in a Nutshell
Hadoop 3 in a Nutshell
DataWorks Summit/Hadoop Summit
From Device to Data Center to Insights
From Device to Data Center to Insights
DataWorks Summit/Hadoop Summit
Achieving 100k Queries per Hour on Hive on Tez
Achieving 100k Queries per Hour on Hive on Tez
DataWorks Summit/Hadoop Summit
Zero ETL analytics with LLAP in Azure HDInsight
Zero ETL analytics with LLAP in Azure HDInsight
DataWorks Summit
Apache Spark Crash Course
Apache Spark Crash Course
DataWorks Summit
LLAP: Sub-Second Analytical Queries in Hive
LLAP: Sub-Second Analytical Queries in Hive
DataWorks Summit/Hadoop Summit
Crash Course HS16Melb - Hands on Intro to Spark & Zeppelin
Crash Course HS16Melb - Hands on Intro to Spark & Zeppelin
DataWorks Summit/Hadoop Summit
Design Patterns For Real Time Streaming Data Analytics
Design Patterns For Real Time Streaming Data Analytics
DataWorks Summit
Analysis of Major Trends in Big Data Analytics
Analysis of Major Trends in Big Data Analytics
DataWorks Summit/Hadoop Summit
Achieving a 360-degree view of manufacturing via open source industrial data ...
Achieving a 360-degree view of manufacturing via open source industrial data ...
DataWorks Summit
Large-Scale Stream Processing in the Hadoop Ecosystem
Large-Scale Stream Processing in the Hadoop Ecosystem
DataWorks Summit/Hadoop Summit
What's hot
(20)
How do you decide where your customer was?
How do you decide where your customer was?
Fast SQL on Hadoop, Really?
Fast SQL on Hadoop, Really?
Real-time Hadoop: The Ideal Messaging System for Hadoop
Real-time Hadoop: The Ideal Messaging System for Hadoop
From Zero to Data Flow in Hours with Apache NiFi
From Zero to Data Flow in Hours with Apache NiFi
Stinger Initiative - Deep Dive
Stinger Initiative - Deep Dive
Benefits of an Agile Data Fabric for Business Intelligence
Benefits of an Agile Data Fabric for Business Intelligence
Data Regions: Modernizing your company's data ecosystem
Data Regions: Modernizing your company's data ecosystem
The Next Generation of Data Processing and Open Source
The Next Generation of Data Processing and Open Source
A Data Lake and a Data Lab to Optimize Operations and Safety within a nuclear...
A Data Lake and a Data Lab to Optimize Operations and Safety within a nuclear...
Hadoop 3 in a Nutshell
Hadoop 3 in a Nutshell
From Device to Data Center to Insights
From Device to Data Center to Insights
Achieving 100k Queries per Hour on Hive on Tez
Achieving 100k Queries per Hour on Hive on Tez
Zero ETL analytics with LLAP in Azure HDInsight
Zero ETL analytics with LLAP in Azure HDInsight
Apache Spark Crash Course
Apache Spark Crash Course
LLAP: Sub-Second Analytical Queries in Hive
LLAP: Sub-Second Analytical Queries in Hive
Crash Course HS16Melb - Hands on Intro to Spark & Zeppelin
Crash Course HS16Melb - Hands on Intro to Spark & Zeppelin
Design Patterns For Real Time Streaming Data Analytics
Design Patterns For Real Time Streaming Data Analytics
Analysis of Major Trends in Big Data Analytics
Analysis of Major Trends in Big Data Analytics
Achieving a 360-degree view of manufacturing via open source industrial data ...
Achieving a 360-degree view of manufacturing via open source industrial data ...
Large-Scale Stream Processing in the Hadoop Ecosystem
Large-Scale Stream Processing in the Hadoop Ecosystem
Similar to Leveraging smart meter data for electric utilities: Comparison of Spark SQL with Hive
Industrial Automation System introduction
Industrial Automation System introduction
Md. Mashiur Rahman
Analyzing 2TB of Raw Trace Data from a Manufacturing Process: A First Use Cas...
Analyzing 2TB of Raw Trace Data from a Manufacturing Process: A First Use Cas...
Databricks
Application of postgre sql to large social infrastructure
Application of postgre sql to large social infrastructure
NTT DATA OSS Professional Services
Next Gen Big Data Analytics with Apache Apex
Next Gen Big Data Analytics with Apache Apex
DataWorks Summit/Hadoop Summit
Hadoop Summit SJ 2016: Next Gen Big Data Analytics with Apache Apex
Hadoop Summit SJ 2016: Next Gen Big Data Analytics with Apache Apex
Apache Apex
Open Source Innovations in the MapR Ecosystem Pack 2.0
Open Source Innovations in the MapR Ecosystem Pack 2.0
MapR Technologies
IoT Ingestion & Analytics using Apache Apex - A Native Hadoop Platform
IoT Ingestion & Analytics using Apache Apex - A Native Hadoop Platform
Apache Apex
IoT Analytics
IoT Analytics
Anjana Fernando
The Lyft data platform: Now and in the future
The Lyft data platform: Now and in the future
markgrover
Lyft data Platform - 2019 slides
Lyft data Platform - 2019 slides
Karthik Murugesan
Lessons Learned from Leveraging Real-Time Power Consumption Data with Apache ...
Lessons Learned from Leveraging Real-Time Power Consumption Data with Apache ...
Hitachi, Ltd. OSS Solution Center.
Apache Big Data 2016: Next Gen Big Data Analytics with Apache Apex
Apache Big Data 2016: Next Gen Big Data Analytics with Apache Apex
Apache Apex
Tsinghua University: Two Exemplary Applications in China
Tsinghua University: Two Exemplary Applications in China
DataStax Academy
Uber Business Metrics Generation and Management Through Apache Flink
Uber Business Metrics Generation and Management Through Apache Flink
Wenrui Meng
Oracle Database Performance Tuning Basics
Oracle Database Performance Tuning Basics
nitin anjankar
Use Machine Learning to Get the Most out of Your Big Data Clusters
Use Machine Learning to Get the Most out of Your Big Data Clusters
Databricks
Big Data Day LA 2016/ Big Data Track - Portable Stream and Batch Processing w...
Big Data Day LA 2016/ Big Data Track - Portable Stream and Batch Processing w...
Data Con LA
CEP - simplified streaming architecture - Strata Singapore 2016
CEP - simplified streaming architecture - Strata Singapore 2016
Mathieu Dumoulin
Intro to Apache Apex - Next Gen Platform for Ingest and Transform
Intro to Apache Apex - Next Gen Platform for Ingest and Transform
Apache Apex
TIBCO presentation at the Chief Analytics Officer Forum East Coast 2016 (#CAO...
TIBCO presentation at the Chief Analytics Officer Forum East Coast 2016 (#CAO...
Chief Analytics Officer Forum
Similar to Leveraging smart meter data for electric utilities: Comparison of Spark SQL with Hive
(20)
Industrial Automation System introduction
Industrial Automation System introduction
Analyzing 2TB of Raw Trace Data from a Manufacturing Process: A First Use Cas...
Analyzing 2TB of Raw Trace Data from a Manufacturing Process: A First Use Cas...
Application of postgre sql to large social infrastructure
Application of postgre sql to large social infrastructure
Next Gen Big Data Analytics with Apache Apex
Next Gen Big Data Analytics with Apache Apex
Hadoop Summit SJ 2016: Next Gen Big Data Analytics with Apache Apex
Hadoop Summit SJ 2016: Next Gen Big Data Analytics with Apache Apex
Open Source Innovations in the MapR Ecosystem Pack 2.0
Open Source Innovations in the MapR Ecosystem Pack 2.0
IoT Ingestion & Analytics using Apache Apex - A Native Hadoop Platform
IoT Ingestion & Analytics using Apache Apex - A Native Hadoop Platform
IoT Analytics
IoT Analytics
The Lyft data platform: Now and in the future
The Lyft data platform: Now and in the future
Lyft data Platform - 2019 slides
Lyft data Platform - 2019 slides
Lessons Learned from Leveraging Real-Time Power Consumption Data with Apache ...
Lessons Learned from Leveraging Real-Time Power Consumption Data with Apache ...
Apache Big Data 2016: Next Gen Big Data Analytics with Apache Apex
Apache Big Data 2016: Next Gen Big Data Analytics with Apache Apex
Tsinghua University: Two Exemplary Applications in China
Tsinghua University: Two Exemplary Applications in China
Uber Business Metrics Generation and Management Through Apache Flink
Uber Business Metrics Generation and Management Through Apache Flink
Oracle Database Performance Tuning Basics
Oracle Database Performance Tuning Basics
Use Machine Learning to Get the Most out of Your Big Data Clusters
Use Machine Learning to Get the Most out of Your Big Data Clusters
Big Data Day LA 2016/ Big Data Track - Portable Stream and Batch Processing w...
Big Data Day LA 2016/ Big Data Track - Portable Stream and Batch Processing w...
CEP - simplified streaming architecture - Strata Singapore 2016
CEP - simplified streaming architecture - Strata Singapore 2016
Intro to Apache Apex - Next Gen Platform for Ingest and Transform
Intro to Apache Apex - Next Gen Platform for Ingest and Transform
TIBCO presentation at the Chief Analytics Officer Forum East Coast 2016 (#CAO...
TIBCO presentation at the Chief Analytics Officer Forum East Coast 2016 (#CAO...
More from DataWorks Summit/Hadoop Summit
Running Apache Spark & Apache Zeppelin in Production
Running Apache Spark & Apache Zeppelin in Production
DataWorks Summit/Hadoop Summit
State of Security: Apache Spark & Apache Zeppelin
State of Security: Apache Spark & Apache Zeppelin
DataWorks Summit/Hadoop Summit
Unleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache Ranger
DataWorks Summit/Hadoop Summit
Enabling Digital Diagnostics with a Data Science Platform
Enabling Digital Diagnostics with a Data Science Platform
DataWorks Summit/Hadoop Summit
Revolutionize Text Mining with Spark and Zeppelin
Revolutionize Text Mining with Spark and Zeppelin
DataWorks Summit/Hadoop Summit
Double Your Hadoop Performance with Hortonworks SmartSense
Double Your Hadoop Performance with Hortonworks SmartSense
DataWorks Summit/Hadoop Summit
Hadoop Crash Course
Hadoop Crash Course
DataWorks Summit/Hadoop Summit
Data Science Crash Course
Data Science Crash Course
DataWorks Summit/Hadoop Summit
Apache Spark Crash Course
Apache Spark Crash Course
DataWorks Summit/Hadoop Summit
Dataflow with Apache NiFi
Dataflow with Apache NiFi
DataWorks Summit/Hadoop Summit
Schema Registry - Set you Data Free
Schema Registry - Set you Data Free
DataWorks Summit/Hadoop Summit
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
DataWorks Summit/Hadoop Summit
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
DataWorks Summit/Hadoop Summit
Mool - Automated Log Analysis using Data Science and ML
Mool - Automated Log Analysis using Data Science and ML
DataWorks Summit/Hadoop Summit
How Hadoop Makes the Natixis Pack More Efficient
How Hadoop Makes the Natixis Pack More Efficient
DataWorks Summit/Hadoop Summit
HBase in Practice
HBase in Practice
DataWorks Summit/Hadoop Summit
The Challenge of Driving Business Value from the Analytics of Things (AOT)
The Challenge of Driving Business Value from the Analytics of Things (AOT)
DataWorks Summit/Hadoop Summit
Breaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
Breaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
DataWorks Summit/Hadoop Summit
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
DataWorks Summit/Hadoop Summit
Backup and Disaster Recovery in Hadoop
Backup and Disaster Recovery in Hadoop
DataWorks Summit/Hadoop Summit
More from DataWorks Summit/Hadoop Summit
(20)
Running Apache Spark & Apache Zeppelin in Production
Running Apache Spark & Apache Zeppelin in Production
State of Security: Apache Spark & Apache Zeppelin
State of Security: Apache Spark & Apache Zeppelin
Unleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache Ranger
Enabling Digital Diagnostics with a Data Science Platform
Enabling Digital Diagnostics with a Data Science Platform
Revolutionize Text Mining with Spark and Zeppelin
Revolutionize Text Mining with Spark and Zeppelin
Double Your Hadoop Performance with Hortonworks SmartSense
Double Your Hadoop Performance with Hortonworks SmartSense
Hadoop Crash Course
Hadoop Crash Course
Data Science Crash Course
Data Science Crash Course
Apache Spark Crash Course
Apache Spark Crash Course
Dataflow with Apache NiFi
Dataflow with Apache NiFi
Schema Registry - Set you Data Free
Schema Registry - Set you Data Free
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Mool - Automated Log Analysis using Data Science and ML
Mool - Automated Log Analysis using Data Science and ML
How Hadoop Makes the Natixis Pack More Efficient
How Hadoop Makes the Natixis Pack More Efficient
HBase in Practice
HBase in Practice
The Challenge of Driving Business Value from the Analytics of Things (AOT)
The Challenge of Driving Business Value from the Analytics of Things (AOT)
Breaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
Breaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
Backup and Disaster Recovery in Hadoop
Backup and Disaster Recovery in Hadoop
Recently uploaded
🐬 The future of MySQL is Postgres 🐘
🐬 The future of MySQL is Postgres 🐘
RTylerCroy
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
The Digital Insurer
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
DianaGray10
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
The Digital Insurer
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
vu2urc
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
HampshireHUG
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
sudhanshuwaghmare1
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
Rafal Los
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
Martijn de Jong
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Drew Madelung
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
hans926745
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
Igalia
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
Remote DBA Services
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
apidays
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
Principled Technologies
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
Product Anonymous
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
Maria Levchenko
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
apidays
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
Enterprise Knowledge
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
Anna Loughnan Colquhoun
Recently uploaded
(20)
🐬 The future of MySQL is Postgres 🐘
🐬 The future of MySQL is Postgres 🐘
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
Leveraging smart meter data for electric utilities: Comparison of Spark SQL with Hive
1.
© Hitachi, Ltd.
2016. All rights reserved. Hitachi, Ltd. OSS Solution Center 2016/10/27 Yusuke Furuyama Yang Xie Leveraging smart meter data for electric utilities: Comparison of Spark SQL with Hive
2.
© Hitachi, Ltd.
2016. All rights reserved. 1. Leveraging smart meter data[Sample use case for electric utilities] 2. Performance evaluation of MapReduce and Spark 1.6 (using Hive and Spark SQL) 3. Additional evaluation with Spark 2.0 Contents 1 4. Summary
3.
2© Hitachi, Ltd.
2016. All rights reserved. 1. Leveraging smart meter data [Sample use case for electric utilities]
4.
3© Hitachi, Ltd.
2016. All rights reserved. 1-1 Hitachi’s Social innovation business approach
5.
4© Hitachi, Ltd.
2016. All rights reserved. 1-2 Situation of electric utilities and their needs • Electric utilities have to adapt to competitive free market • Request for price cut of power transmission fee from government Liberalization of the retail Electric Power Market Needs to cut the cost for transmission and distribution equipment • Transmission and distribution equipment was replaced periodically • Decide the timing of replace by the condition of equipment Maintenance team needs: Obtain the load status of each equipment Electricity rate Power plant Transmission Lines Substation Distribution Lines Home Transmission and Distribution Companies Power transmission feeElectric Power Generation Companies Electricity retailers
6.
5© Hitachi, Ltd.
2016. All rights reserved. 1-3 Situation of electric utilities and their needs (future) • Decreasing nuclear plant as a stable power supplier • Increasing renewable energy supply Unstable power supply Needs for high level Demand Response • Rates by time zone (current demand response) • Many and small renewable energy suppliers • Near real-time demand response for each distribution system Planning team needs: Obtain near real-time load status of each equipment Electricity rate Power plant Transmission Lines Substation Distribution Lines Home Transmission and Distribution Companies Power transmission feeElectric Power Generation Companies Electricity retailers
7.
6© Hitachi, Ltd.
2016. All rights reserved. 1-4 Leveraging big data for electric utilities Power Grid Transformer (500/Distribution Line) Switch (20/Distribution Line) Meter Data Management System Data from smart meters(every 30min.) Analyze the data from smart meters to grasp the load status of equipment Meet the needs of electric utilities Substation (1,000) Distribution Lines (5/Substation) Smart Meter (4/Trans) ・・・0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 Data Analysis System Total 10,000,000 Concern about performance needed for near real-time processing
8.
7© Hitachi, Ltd.
2016. All rights reserved. Data processing platform (Hadoop, Spark) Data Analysis System Target of this session Planner (Equipment/Demand response) Meter Data Management System 1-5 System Component Raw Data Processed Data Data visualization tool (Pentaho)
9.
8© Hitachi, Ltd.
2016. All rights reserved. 2. Performance evaluation of MapReduce and Spark 1.6 (using Hive and Spark SQL)
10.
9© Hitachi, Ltd.
2016. All rights reserved. 2-1 Use Case for electric utilities Points of analysis Needs Point of analysis Find the equipment that needs to be replaced Find the equipment that has heavy workload Estimate the timing for replacement Check the trend of load status Select the proper capacity of new equipment Extract the peak of the load Needs of electric utilities (recap) Needs ・Analyze the data from smart meters to grasp the load status of equipment ・Near real-time
11.
10© Hitachi, Ltd.
2016. All rights reserved. 2-2 Contents of performance evaluation Point of evaluation for near real-time processing Check if MapReduce and Spark can process the data in 30min. Point of analysis Aggregate per Find the equipment has heavy workload Equipment Check the trend of load status Term Extract the peak of the load Time zone Items for evaluation ・Distribution system ・Substation ・Switch ・Transformer ・Meter ・1day ・1month(30days) ・1year(365days) ・Specific 30min of each day ・24h • Concern about performance for processing 10,000,000 meters • Data comes from smart meter every 30min (48/Day, spec of smart meter)
12.
11© Hitachi, Ltd.
2016. All rights reserved. 2-3 Target of performance evaluation Target of evaluation Data Processing Platform (MapReduce, Spark 1.6) Data Analysis System End Data visualization tool(Pentaho) Meter Data Management System Target Aggregation batch (Hive / Spark SQL) Start Time from start of aggregation batch for meter data to end of the batch
13.
12© Hitachi, Ltd.
2016. All rights reserved. 2-4 Target of performance evaluation System Configuration Spec Master Node CPU Core 2 Memory 8 GB Capacity of disk 80 GB # of disk 1 Per slave node total CPU Core 16 64 Memory 128 GB 512 GB Capacity of disk 900 GB - # of disk 6 24 Total capacity of disks 5.4 TB (5,400 GB) 21.6 TB (21,600 GB) 10Gbps LAN 10Gbps SW 1Gbps LAN disk disk ・・・ disk 4 Slave Nodes (Physical Machines) 1 Master Node (Virtual Machine)
14.
13© Hitachi, Ltd.
2016. All rights reserved. 2-5 Dataset Smart meter data Term # of records Size (CSV) Size (ORC File) 365days (1year) 3,650 million 2.475 TB 1.325 TB 30days (1month) 300 million 0.205 TB 0.158 TB 1day 10 million 0.007 TB 0.005 TB 0:00-0:30 power usage 0:30-1:00 power usage ・・・ 23:30-0:00 power usage Meter mgmt. infometer1 0:00-0:30 power usage 0:30-1:00 power usage ・・・ 23:30-0:00 power usage Meter mgmt. infometer2 ・ ・ ・ 0:00-0:30 power usage 0:30-1:00 power usage ・・・ 23:30-0:00 power usage Meter mgmt. info meter 10,000,000 Smart meter data /day Data size 48 columns (every 30min) 10,000,000 records/day
15.
14© Hitachi, Ltd.
2016. All rights reserved. 2-6 Contents of performance evaluation (recap) Items for evaluation Target of evaluation Point of analysis Aggregate per Find the equipment has heavy workload Equipment Check the trend of load status Term Extract the peak of the load Time zone ・Distribution system ・Substation ・Switch ・Transformer ・Meter ・1day ・1month(30days) ・1year(365days) ・Specific 30min of each day ・24h Point of evaluation Time from start of aggregation batch for meter data to end of the batch Check if MapReduce and Spark can process the data in 30min.
16.
15© Hitachi, Ltd.
2016. All rights reserved. Time from start of aggregation batch for meter data to end of the batch 2-6 Contents of performance evaluation (recap) Point of evaluation Check if MapReduce and Spark can process the data in 30min. Items for evaluation Point of analysis Aggregate per Find the equipment has heavy workload Equipment Check the trend of load status Term Extract the peak of the load Time zone ・Distribution system ・Substation ・Switch ・Transformer ・Meter ・1day ・1month(30days) ・1year(365days) ・Specific 30min of each day ・24h + File type ・Text (CSV) ・ORCFile (Column-based) Target of evaluation
17.
16© Hitachi, Ltd.
2016. All rights reserved. 2-7 Comparison of txt with ORCFile (MapReduce) • Couldn’t finish processing in 30min • Performance improvement by ORCFile 62 sec 224 sec 3286 sec 99 sec 338 sec 6962 sec 0 sec 2000 sec 4000 sec 6000 sec 1day 30days 365days Hive + TXT Hive + ORCFile 31 sec 69 sec 492 sec 32 sec 138 sec 3015 sec 0 sec 2000 sec 4000 sec 1day 30days 365days Hive + TXT Hive + ORCFile 30min 30min Result (24h) Result (0.5h)
18.
17© Hitachi, Ltd.
2016. All rights reserved. 2-8 Comparison of txt with ORCFile (Spark 1.6) 28 sec 50 sec 993 sec 47 sec 154 sec 1408 sec 0 sec 400 sec 800 sec 1200 sec 1600 sec 1day 30days 365days Spark SQL + TXT Spark SQL + ORCFile • Could finish processing in 30min (1800s) • Performance improvement by ORCFile 26 sec 32 sec 169 sec 30 sec 111 sec 1263 sec 0 sec 400 sec 800 sec 1200 sec 1600 sec 1day 30days 365days Spark SQL + TXT Spark SQL + ORCFile Result (24h) Result (0.5h)
19.
18© Hitachi, Ltd.
2016. All rights reserved. 2-9 Review of the results Why the processing was fast with ORCFile Processing 0.5h data ・ ・ ・ ・・・ Smart meter data ・ ・ ・ 0:00 23:30 0:00 0:30 0:00 0:00 0:00 0:00 0:00 0:00 0:00 Use 1 column only Processing 24h data ・ ・ ・ ・ ・ ・ + ・・・ ・ ・ ・ ・ ・ ・ 0:30 23:30 SUM/day + Use all 48 columns + • Processing big data with ORCFile was more effective than processing small data Results • Processing 0.5h data with ORCFile was more effective than processing 24h data Compression Reading specific column Features of ORC File + + + + + + + + + SUM/day SUM/day SUM/day SUM/day Smart meter data 0:00 0:30 0:30 0:30 0:30 23:30 23:30 23:30 23:30 0:00 0:00 0:00 0:00 0:00 0:30 0:30 0:30 0:30 23:30 23:30 23:30 23:30
20.
19© Hitachi, Ltd.
2016. All rights reserved. 32 sec 104 sec 69 sec 556 sec 0 sec 500 sec 1000 sec Distribution system Per equipment Hive Spark SQL 2-10 Comparison of MapReduce with Spark 1.6 50 sec 145 sec 224 sec 718 sec 0 sec 500 sec 1000 sec Distribution system Per equipment Hive Spark SQL 993 sec 1359 sec 3286 sec 4131 sec 0 sec 1000 sec 2000 sec 3000 sec 4000 sec Distribution system Per equipment Hive Spark SQL ・Could finish processing the data from 10,000,000 meter in 30min using Spark 1.6! ・Spark’s good performance with “per equipment” processing Result (30days / 24h) Result (30days / 0.5h) Result (365days / 24h) 30min 80% down 78% down 81% down 54% down
21.
20© Hitachi, Ltd.
2016. All rights reserved. Trans 2,500,000 Switch 100,000 2-11 Review of the results Why the processing per equipment was more effective than the processing for entire distribution system when using spark? Per equipment ・ ・ ・ ・・・ ・ ・ ・ ・ ・ ・ ・ ・ ・ ・ ・ ・ ・ ・ ・ ・ ・ ・ ・・・ ・・・ 0:00 0:00 0:00 0:00 0:00 0:30 0:30 0:30 0:30 0:30 23:30 23:30 23:30 23:30 23:30 Trans1 Trans2 Trans3 Switch1 Switch2 Line1 Substa.1 10005000 ・ ・ ・ ・ ・ ・ ・・・ ・・・ ・・・ 0:00 0:00 0:00 0:30 23:30 23:30 23:30 ・・・ 0:00 0:00 0:30 0:30 23:30 23:30 For entire distribution system JOIN ・Less disk I/O than MapReduce ・Smaller data (including re-distributing data) than total memory of cluster Smart meter data Transformer Switch Distribution Line Substation Distribution System Distribution System JOIN JOIN JOIN JOIN Smart meter data 0:30 0:30
22.
21© Hitachi, Ltd.
2016. All rights reserved. 3. Additional evaluation with Spark 2.0
23.
22© Hitachi, Ltd.
2016. All rights reserved. 3-1 Evaluation environment System Configuration Spec Master Node CPU Core 16 Memory 12 GB Capacity of disk 900 GB # of disk 8 Per slave node total CPU Core 20 120 Memory 384 GB 2304 GB Capacity of disk 1200 GB - # of disk 10 60 Total capacity of disks 12.0 TB (12,000 GB) 72.0 TB (72,000 GB) 10Gbps LAN 10Gbps SW 6 Slave Nodes (Physical) 1 Master Node (Physical) disk disk ・・・ disk
24.
23© Hitachi, Ltd.
2016. All rights reserved. 3-2 Comparison of Spark 2.0 with Spark 1.6 316 sec 322 sec 0 sec 100 sec 200 sec 300 sec 400 sec Per Equipment Spark 1.6 Spark 2.0 87 sec 102 sec 0 sec 30 sec 60 sec 90 sec 120 sec Per Equipment Spark 1.6 Spark 2.0 71 sec 89 sec 0 sec 20 sec 40 sec 60 sec 80 sec 100 sec Per Equipment Spark 1.6 Spark 2.0 ・Performance improvement roughly 20% (including disk I/O) ・More effective with small data Result (365days / 24h) Result (30days / 24h) Result (1day / 24h) Time from start of aggregation batch for meter data to end of the batch
25.
24© Hitachi, Ltd.
2016. All rights reserved. Demo
26.
25© Hitachi, Ltd.
2016. All rights reserved. Data visualization tool (Pentaho) Aggregated Data (29 days) Demo: Data aggregation and visualization ② Execute aggregation batch (Spark SQL) ① Show 29 days data Aggregated Data (30 days) ③ Show 30 days data Raw Data (1day) ④ Outlier detected!!
27.
26© Hitachi, Ltd.
2016. All rights reserved. 4. Summary
28.
27© Hitachi, Ltd.
2016. All rights reserved. 4 Summary Leveraging data from 10,000,000 smart meters for electric utilities - Built data analysis system - Concern about performance Evaluate the performance of batch processing - Spark could process the data from 10,000,000 meters in 30min (4node) Evaluate the performance of Spark 2.0 - Performance improvement roughly 20% (compared to 1.6) Data Processing Platform (MapReduce, Spark) Data Analysis System Data visualization tool(Pentaho) Aggregation batch (Hive / Spark SQL) Meter Data Management System
29.
© Hitachi, Ltd.
2016. All rights reserved. Hitachi, Ltd. OSS Solution Center Comparison of Spark SQL with Hive Leveraging smart meter data for electric utilities: 2016/10/27 Yusuke Furuyama Yang Xie END 28
30.
29© Hitachi, Ltd.
2016. All rights reserved. 他社商品名、商標等の引用に関する表示 Hadoop、HiveおよびSparkは、Apache Software Foundationの米国およびその他の国における登録商標または商 標です。 その他記載の会社名、製品名などは、それぞれの会社の商標もしくは登録商標です。
31.
32.
31© Hitachi, Ltd.
2016. All rights reserved. Appendix. Difficulty with large data to be shuffled Attempted to aggregate raw (48 columns) meter data per equipment - Extremely slow (Spark 2.0) or Job failed (Spark 1.6) - Processing: Iteration of JOIN and GROUP BY+SUM - Huge data to be shuffled (spilled out from page cache) 76 sec 223 sec 13281 sec 85 sec 236 sec 3650 sec 0 sec 2000 sec 4000 sec 6000 sec 8000 sec 10000 sec 12000 sec 14000 sec 1day 30days 365days Processingtime Target data Before adding After adding Add HDFS disks as disks for shuffle - Performance Improved (365days) - Performance degraded (1day/30days) ・Data for Spark (including temporary data) should be smaller than memory. ・Had better to process as a trial to estimate Adding disks for shuffle (Spark 2.0.0) Heavy load on a local disk (OS disk) by shuffle
Download now