8. Open Energi in the coming year:
• 25-40k messages processed per second
• Total size of data 500TB-800TB
Perspective: here’s what “big data” means to Boeing [1]:
• ~64k messages per second from each aircraft
• Total size of data over 100 petabytes
[1]: http://bit.ly/18kQlMn
11. …but after domestic demand-side response (or something else on that scale)
[Bar chart: size of data (PB), Open Energi vs Boeing, on a scale of 0-120 PB]
12. Why Hortonworks Data Platform
• Can scale quickly to respond to market demands
• Interoperability with existing code
• Fantastic data integration
• Knowledgeable technical support
• Security and data governance
13. Batch | Our HDP setup
[Diagram: asset data, national electricity data, market data and other “live” timeseries data are ingested via Flume, in batch and streaming modes, into Hive, which feeds other applications]
14. Real-time | (Work ongoing)
[Diagram: asset data feeds ML models backed by HDFS, a cache and Elasticsearch, with processing stages to enrich data, correlate events and update the ML models]
15. Apache Hive | Example
Index semi-structured data (Elasticsearch):

CREATE EXTERNAL TABLE semi_structured_stuff (...)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES('es.resource' = 'semi/structured',
              'es.index.auto.create' = 'false');

Use Hive to integrate this with timeseries data and other metadata:

SELECT something FROM semi_structured_stuff
JOIN metadata m ON …
LEFT JOIN timeseries t ON …

Farm out complex analytics to Python:

SELECT TRANSFORM(something)
USING 'insane_maths.py'
AS (result)
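Hive's TRANSFORM streams each row to the script as a tab-separated line on stdin and reads tab-separated result rows back from stdout. A minimal sketch of what an insane_maths.py-style script could look like (the maths itself is a placeholder, not Open Energi's actual analytics):

```python
#!/usr/bin/env python
# Sketch of a Hive TRANSFORM script: Hive pipes rows in as
# tab-separated lines on stdin and expects tab-separated lines on stdout.
import math
import sys

def transform(value: float) -> float:
    """Placeholder for the 'insane maths'; here just a log transform."""
    return math.log1p(value)

def run(lines):
    """Yield one output line per tab-separated input row."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        yield f"{transform(float(fields[0]))}\n"

if __name__ == "__main__":
    sys.stdout.writelines(run(sys.stdin))
```

Any columns selected in the TRANSFORM clause arrive in order as fields; multi-column results are emitted tab-separated the same way.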
16. Benefits
• Reduced storage cost compared to SAN + SQL Server
• Better utilisation of infrastructure thanks to YARN
• Pain-free integration of multiple data sources with external tables in Hive
• Scale up/down on demand
• Re-use existing Python code = low development overhead
There is a powerful economic case for distributing demand more efficiently using DSR technology, regardless of the future generation mix.
The capital cost of building a new peaking power station can be up to £5 million per megawatt of power.
The current cost of aggregating a megawatt via Dynamic Demand is around £200,000.
It provides a no-build approach to capacity challenges which is cleaner, cheaper, more secure and faster than the alternatives.
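Taken at face value, the two figures above imply a large per-megawatt saving; a rough, illustrative calculation (figures from the notes, not a formal cost model):

```python
# Back-of-the-envelope comparison of the per-MW figures quoted above.
new_peaker_capex_per_mw = 5_000_000   # up to £5m per MW of new peaking plant
dynamic_demand_cost_per_mw = 200_000  # ~£200k per MW aggregated via Dynamic Demand

ratio = new_peaker_capex_per_mw / dynamic_demand_cost_per_mw
print(f"Aggregating a MW is ~{ratio:.0f}x cheaper than building one")  # → ~25x
```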
- Open Energi is turning the energy system on its head, so that instead of supply adjusting to meet demand, demand adjusts to meet supply
By harnessing small amounts of flexible energy demand from energy-intensive equipment we can create a virtual power station and displace fossil-fuelled peaking power stations
This is enabling a user-led transformation in how our energy system works, so that businesses and consumers are not only making it happen, but also seeing the benefits
It’s a vital part of our transition to a zero carbon economy because we cannot maximise our use of renewables unless our demand for energy becomes more responsive
Dynamic Demand can deliver approx. £85,000 per MW per year
FCDM / static FFR: £22,000-£26,000 per MW per year
STOR: £10,000-£15,000 per MW per year
We capture data at the finest-grained level, stored as change-of-value (COV) records.
The challenge is then aggregating multiple timeseries without downsampling; we also need to downsample all of these series to multiple resolutions.
They are all irregularly sampled, hence the challenge, which prevents us from using off-the-shelf timeseries databases.
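Assuming COV here means change-of-value records (my reading, not stated in the deck), the natural way to put such an irregular series onto a regular grid is to carry the last observed value forward. A minimal stdlib-only sketch with made-up data:

```python
from bisect import bisect_right

def resample_cov(samples, start, stop, step):
    """Downsample an irregular change-of-value (COV) series to a fixed
    resolution by carrying the last observed value forward.

    samples: list of (timestamp, value) pairs sorted by timestamp.
    Returns (timestamp, value) pairs at regular `step` intervals;
    value is None before the first observation.
    """
    times = [t for t, _ in samples]
    out = []
    t = start
    while t < stop:
        i = bisect_right(times, t)  # count of samples at or before t
        out.append((t, samples[i - 1][1] if i else None))
        t += step
    return out

# Irregularly spaced COV records: the value only changes at t=0, 7 and 21.
cov = [(0, 10.0), (7, 12.5), (21, 11.0)]
print(resample_cov(cov, start=0, stop=30, step=10))
# → [(0, 10.0), (10, 12.5), (20, 12.5)]
```

Aggregating several series without downsampling amounts to merging their raw (timestamp, value) streams instead of snapping them to a grid first.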
Confidence that our data platform can scale quickly if needed
The markets we operate in are unpredictable
When the domestic market takes off, our data volume could increase by two orders of magnitude!
Fantastic data integration support
Can easily wrap our existing codebase
Reduce our £/GB by 80% for archival data while retaining ability to query
Extensibility
New tools being added to the ecosystem on a regular basis
More and more developers trained in Hadoop ecosystem means easier on-boarding
Knowledgeable support from Hortonworks
Security and governance built into platform
This is ongoing work and in particular we haven’t quite figured out the “asset data” -> Storm bit.
Not limited by storage cost – able to enrich data to reduce cost of processing
Better utilisation of infrastructure compared to VMs dedicated to a single service – here YARN means we can really get the most out of everything
Ability to mix Python with SQL means easier/maintainable aggregation/downsampling
Interactive querying of multiple data sources with Spark in Jupyter
Easy ingestion process using multiple Flume agents
Can still use Elasticsearch for small timeseries
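On the ingestion point: a Flume agent is configured as named sources, channels and sinks wired together in a properties file. A hypothetical agent receiving asset data over Avro and landing it in HDFS might look roughly like this (agent name, port and path are illustrative, not Open Energi's actual setup):

```properties
# One of several Flume agents; others would ingest the other feeds.
agent1.sources = assetSource
agent1.channels = memChannel
agent1.sinks = hdfsSink

# Receive events over Avro RPC.
agent1.sources.assetSource.type = avro
agent1.sources.assetSource.bind = 0.0.0.0
agent1.sources.assetSource.port = 41414
agent1.sources.assetSource.channels = memChannel

# Buffer in memory between source and sink.
agent1.channels.memChannel.type = memory
agent1.channels.memChannel.capacity = 10000

# Land the raw events in HDFS for Hive to pick up.
agent1.sinks.hdfsSink.type = hdfs
agent1.sinks.hdfsSink.hdfs.path = /data/asset/raw
agent1.sinks.hdfsSink.channel = memChannel
```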
Now let’s have a look at where HDP fits into our big “wheel of data”.