Enterprise Data and Analytics Architecture Overview for Electric Utility
Dr. Prajesh Bhattacharya
enSustain
Copyright enSustain
Summary
1
The Overall Architecture
Data Warehouse (ONLY data required for high-performing production reporting)
• Enterprise Nomenclature
Data Lake (ALL data)
• Enterprise Nomenclature
Data Translation Layer (MDMS/EI)
Processes
• Usual EDW process and structure
• Discovery and indexing/Tagging
Data consumers
• Production Reports
• Projects
• Data Explorers (Engineers, Data scientists)
Source systems
• DMS & OMS
• Customer Data
• Smart Meter (metadata, readings)
• Asset Data (Location, config)
• Financial Data
• Data Historian
• SCADA
• DG metadata
• DG generation data
• HR Data
• Weather Data
• Misc. Sensor head ends
• Security Data
• Transmission Planning Data
• Maintenance Data
• Demand Response Data
• Transmission OE and Dispatch Data
• EMS
• Transmission Market Data
• IT Asset Ops Data
• IT Support Data
• Project Documents
• Marketing & Sales Data
• Catch-All Other Applications
• Email & chat logs
• Facility Data
• Fleet Data
Possible Point-To-Point Exceptions
• Purpose-oriented connection. Examples:
o Historian facilitates connection with SCADA
o EMS-SCADA connection is latency-sensitive
• Application requiring access to only one system
o DMS applications running off of DMS data
o Historian applications running off of Historian data
2
Implementation
The Approach for Implementation
Main Challenges
• Siloed data
o Solution Part 1: Standard Data Model
o Solution Part 2: User view of unified data
• Lack of analytics ideas
o Solution: Close partnership between IT and business
• Lack of budget
o Solution Part 1: Tax each new project
o Solution Part 2: Take baby steps
Necessary Condition for Success
Do NOT touch the existing, working systems first
• At the beginning, implement the new mechanism ONLY to serve the new requirements.
• Keep the existing connections working and unaffected.
• Eventually, the business will deem some of the existing connections unnecessary.
• The rest of the existing connections can be converted as part of application maintenance/overhaul/upgrade, but not in the beginning phase of the initiative.
Scope the smallest possible piece and do it well
• Do not try to implement all the necessary new components at once.
• Good quality on a small scope is better than mediocre quality on a large scope.
• It might require more overhead, but it is often worth it.
Possible Steps for Implementation
• A new data connectivity requirement comes in
• Identify the source system
• Define the enterprise nomenclature for the source system to align with industry standard
• Load MDMS/EI with the dictionary
• Configure EI to act as the data virtualization layer for the source system
• Release for production use with appropriate support mechanism
Milestone: One project is now using this new mechanism for one source system
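The "dictionary" step above can be pictured with a minimal sketch, assuming the dictionary is a simple field-name mapping from one source system to the enterprise nomenclature. The field names (`mtr_id`, `MeterId`, and so on) and the `translate_record` helper are hypothetical illustrations, not part of any MDMS/EI product.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class FieldMapping:
    source_field: str     # name used by the source system (hypothetical)
    enterprise_name: str  # name in the enterprise nomenclature (hypothetical)


# Hypothetical dictionary for one source system, as it might be loaded into MDMS/EI.
SMART_METER_DICTIONARY: List[FieldMapping] = [
    FieldMapping("mtr_id", "MeterId"),
    FieldMapping("rd_ts", "ReadingTimestamp"),
    FieldMapping("kwh", "EnergyDeliveredKWh"),
]


def translate_record(record: Dict[str, object],
                     dictionary: List[FieldMapping]) -> Dict[str, object]:
    """Rename source fields to enterprise names; reject fields the dictionary does not cover."""
    lookup = {m.source_field: m.enterprise_name for m in dictionary}
    unknown = sorted(set(record) - set(lookup))
    if unknown:
        raise KeyError(f"fields missing from the dictionary: {unknown}")
    return {lookup[key]: value for key, value in record.items()}


raw = {"mtr_id": "M-1001", "rd_ts": "2017-03-01T00:15:00", "kwh": 0.42}
translated = translate_record(raw, SMART_METER_DICTIONARY)
print(translated)
```

Rejecting unmapped fields, rather than passing them through, is one possible design choice: it surfaces nomenclature gaps when a source system is onboarded instead of later in reports.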
• Repeat the steps above for every new data connectivity request
• As more source systems are brought into scope, resolve discrepancies, if any arise
• The virtualization layer might experience performance issues as data load increases
• Research and plan the Data Lake
• For every new data source implemented for virtualization, implement the corresponding ETL for the Data Lake
• Open the Data Lake to users that prefer getting their data from the Data Lake (delayed but faster) over virtualization
• Implement Data Lake analytics (say, ML based on Spark) for a single use case
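The trade-off between the Data Lake (delayed but faster) and virtualization could be expressed as a small routing rule. This is only a sketch under stated assumptions: the `SourceSystem` record, the delay figures, and the `choose_access_path` function are hypothetical and do not correspond to any specific EI or Hadoop API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SourceSystem:
    name: str
    virtualized: bool                         # exposed through the virtualization layer
    lake_delay_minutes: Optional[int] = None  # None until an ETL into the Data Lake exists


def choose_access_path(source: SourceSystem, tolerated_delay_minutes: int) -> str:
    """Prefer the Data Lake when its refresh delay is acceptable to the user."""
    lake_ready = source.lake_delay_minutes is not None
    if lake_ready and source.lake_delay_minutes <= tolerated_delay_minutes:
        return "data_lake"
    if source.virtualized:
        return "virtualization"
    raise LookupError(f"{source.name} is not onboarded yet")


smart_meter = SourceSystem("Smart Meter", virtualized=True, lake_delay_minutes=60)
historian = SourceSystem("Data Historian", virtualized=True)  # no ETL yet

print(choose_access_path(smart_meter, tolerated_delay_minutes=1440))
print(choose_access_path(smart_meter, tolerated_delay_minutes=5))
print(choose_access_path(historian, tolerated_delay_minutes=1440))
```

A source without a Data Lake ETL keeps working through virtualization, which matches the incremental approach: nothing existing is switched off when the lake is introduced.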
3
MDMS, EI, Data Virtualization, Data Warehouse
Skip This Section
Most utilities already use these systems and are familiar with them. Hence, we will not discuss them.
For specific questions, please contact prajesh@ensustain.com
4
Data Lake
Why the Data Lake?
Data Lake – not the immediate need, but the eventual destination
If the MDMS/EI layer virtualizes the data, then access to standardized data across the enterprise is already established. What additional value does the Data Lake bring?
• Some of the SOR (system of record) systems might not be capable of handling that many data requests
• Access to some of the SOR systems might not be practical
• Implementing data quality checks on virtualized data is hard (at the least, it would slow down queries)
• Data travel over the network is larger in a virtualized environment than in a Data Lake designed and used in a specific way
• Bottom line: go for a Data Lake only if it is foreseen to be needed
Data Lake: Market Offering Landscape
Data Lake: Getting Data Into the Lake
HDFS
Enterprise Data
• Shared across the company based on security policy
• Fully managed and maintained
• Tight SLA
• 100% Enterprise taxonomy based tagging
User data
• Results of ad-hoc analyses
• Some maintenance/control/SLA
• Folksonomy based tagging
Project/Group Data
• Enterprise standards might be too restrictive to fulfill the requirements of the project
• Shared among a handful of users
• Medium maintenance/control/SLA
• Folksonomy + some governance
Diagram labels: Data Governance, Tagging Tool, MDMS, Data Loader, Streaming Data, Manual Data, Data Sources
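The tagging rules for the three HDFS zones can be sketched as a simple validation check. The taxonomy terms below are made up, and reading "some governance" as "at least one enterprise taxonomy tag" is an assumption for illustration only; a real tagging tool would define its own policy.

```python
from typing import Set

# Hypothetical enterprise taxonomy terms.
ENTERPRISE_TAXONOMY: Set[str] = {"meter-reading", "asset-location", "outage-event"}


def tags_allowed(zone: str, tags: Set[str]) -> bool:
    """Check a dataset's tags against the tagging rule of its zone."""
    if not tags:
        return False  # assumption: every dataset carries at least one tag
    if zone == "enterprise":
        # 100% enterprise taxonomy based tagging
        return tags <= ENTERPRISE_TAXONOMY
    if zone == "project":
        # folksonomy + some governance (assumed: at least one taxonomy tag)
        return bool(tags & ENTERPRISE_TAXONOMY)
    if zone == "user":
        # folksonomy based tagging: any free-form tag is accepted
        return True
    raise ValueError(f"unknown zone: {zone}")


print(tags_allowed("enterprise", {"meter-reading"}))
print(tags_allowed("enterprise", {"meter-reading", "my-experiment"}))
print(tags_allowed("project", {"outage-event", "storm-study"}))
print(tags_allowed("user", {"scratch"}))
```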
Hadoop Ecosystem Relevant To Utility
Components: HDFS, YARN, Map-Reduce Application in Java, Hive, Spark Streaming, Spark SQL, Spark ML, Sqoop, Oozie, Falcon, Hadoop native client, Storm, QueryIO, Waterline Data, Attivio, Apache Atlas
Color legend (functionality): Data Loading, Job Management, Data Governance, Data Reading, Map-Reduce, Data Storage, Data Lake Vendor Solutions
5
Analytics
The Analytics Tool Landscape
Three taxonomies of analytics tools:
• Production, Project (semi-production), Ad-hoc; each is either Data Write-back or Read-only
• Managed (Server based), Unmanaged (Desktop based)
• Coding heavy, Configuration heavy
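One way to use the three taxonomies together is to profile each tool along all of them. The sketch below is illustrative only: the class names and the example "analyst notebook" tool are hypothetical, and only the category labels come from the slide.

```python
from dataclasses import dataclass
from enum import Enum
from itertools import product


class Usage(Enum):
    PRODUCTION = "Production"
    PROJECT = "Project (semi-production)"
    AD_HOC = "Ad-hoc"


class DataAccess(Enum):
    WRITE_BACK = "Data Write-back"
    READ_ONLY = "Read-only"


class Hosting(Enum):
    MANAGED = "Managed (Server based)"
    UNMANAGED = "Unmanaged (Desktop based)"


class Style(Enum):
    CODING_HEAVY = "Coding heavy"
    CONFIGURATION_HEAVY = "Configuration heavy"


@dataclass(frozen=True)
class ToolProfile:
    name: str
    usage: Usage
    access: DataAccess
    hosting: Hosting
    style: Style

    def describe(self) -> str:
        parts = [self.name, self.usage.value, self.access.value,
                 self.hosting.value, self.style.value]
        return " | ".join(parts)


# Hypothetical example tool, classified along every taxonomy.
notebook = ToolProfile("analyst notebook", Usage.AD_HOC, DataAccess.READ_ONLY,
                       Hosting.UNMANAGED, Style.CODING_HEAVY)
print(notebook.describe())

# Every possible profile, useful for spotting categories with no tool yet.
all_combinations = list(product(Usage, DataAccess, Hosting, Style))
print(len(all_combinations))
```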
Sample Analytics Opportunities …
Thank you!
Questions?
prajesh@ensustain.com
Copyright enSustain

More Related Content

What's hot

Kudu Forrester Webinar
Kudu Forrester WebinarKudu Forrester Webinar
Kudu Forrester WebinarCloudera, Inc.
 
Apache Kudu: Technical Deep Dive


Apache Kudu: Technical Deep Dive

Apache Kudu: Technical Deep Dive


Apache Kudu: Technical Deep Dive

Cloudera, Inc.
 
Analyzing Hadoop Data Using Sparklyr

Analyzing Hadoop Data Using Sparklyr
Analyzing Hadoop Data Using Sparklyr

Analyzing Hadoop Data Using Sparklyr
Cloudera, Inc.
 
Full stack monitoring across apps & infrastructure with Azure Monitor
Full stack monitoring across apps & infrastructure with Azure MonitorFull stack monitoring across apps & infrastructure with Azure Monitor
Full stack monitoring across apps & infrastructure with Azure MonitorSquared Up
 
Breaking the Silos: Storage for Analytics & AI
Breaking the Silos: Storage for Analytics & AIBreaking the Silos: Storage for Analytics & AI
Breaking the Silos: Storage for Analytics & AIDataWorks Summit
 
How Data Drives Business at Choice Hotels
How Data Drives Business at Choice HotelsHow Data Drives Business at Choice Hotels
How Data Drives Business at Choice HotelsCloudera, Inc.
 
Supercharge Splunk with Cloudera

Supercharge Splunk with Cloudera
Supercharge Splunk with Cloudera

Supercharge Splunk with Cloudera
Cloudera, Inc.
 
Big Data Day LA 2016/ Use Case Driven track - From Clusters to Clouds, Hardwa...
Big Data Day LA 2016/ Use Case Driven track - From Clusters to Clouds, Hardwa...Big Data Day LA 2016/ Use Case Driven track - From Clusters to Clouds, Hardwa...
Big Data Day LA 2016/ Use Case Driven track - From Clusters to Clouds, Hardwa...Data Con LA
 
Consolidate your data marts for fast, flexible analytics 5.24.18
Consolidate your data marts for fast, flexible analytics 5.24.18Consolidate your data marts for fast, flexible analytics 5.24.18
Consolidate your data marts for fast, flexible analytics 5.24.18Cloudera, Inc.
 
Transforming Insurance Analytics with Big Data and Automated Machine Learning

Transforming Insurance Analytics with Big Data and Automated Machine Learning
Transforming Insurance Analytics with Big Data and Automated Machine Learning

Transforming Insurance Analytics with Big Data and Automated Machine Learning
Cloudera, Inc.
 
Part 1: Cloudera’s Analytic Database: BI & SQL Analytics in a Hybrid Cloud World
Part 1: Cloudera’s Analytic Database: BI & SQL Analytics in a Hybrid Cloud WorldPart 1: Cloudera’s Analytic Database: BI & SQL Analytics in a Hybrid Cloud World
Part 1: Cloudera’s Analytic Database: BI & SQL Analytics in a Hybrid Cloud WorldCloudera, Inc.
 
Lecture4 big data technology foundations
Lecture4 big data technology foundationsLecture4 big data technology foundations
Lecture4 big data technology foundationshktripathy
 
Piranha vs. mammoth predator appliances that chew up big data
Piranha vs. mammoth   predator appliances that chew up big dataPiranha vs. mammoth   predator appliances that chew up big data
Piranha vs. mammoth predator appliances that chew up big dataJack (Yaakov) Bezalel
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Databricks
 
Advanced Analytics for Investment Firms and Machine Learning
Advanced Analytics for Investment Firms and Machine LearningAdvanced Analytics for Investment Firms and Machine Learning
Advanced Analytics for Investment Firms and Machine LearningCloudera, Inc.
 
Developing a Strategy for Data Lake Governance
Developing a Strategy for Data Lake GovernanceDeveloping a Strategy for Data Lake Governance
Developing a Strategy for Data Lake GovernanceTony Baer
 
Cloud Storage Spring Cleaning: A Treasure Hunt
Cloud Storage Spring Cleaning: A Treasure HuntCloud Storage Spring Cleaning: A Treasure Hunt
Cloud Storage Spring Cleaning: A Treasure HuntSteven Moy
 
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Cloudera, Inc.
 
BigDataBx #1 - Atelier 1 Cloudera Datawarehouse Optimisation
BigDataBx #1 - Atelier 1 Cloudera Datawarehouse OptimisationBigDataBx #1 - Atelier 1 Cloudera Datawarehouse Optimisation
BigDataBx #1 - Atelier 1 Cloudera Datawarehouse OptimisationExcelerate Systems
 

What's hot (20)

Kudu Forrester Webinar
Kudu Forrester WebinarKudu Forrester Webinar
Kudu Forrester Webinar
 
Apache Kudu: Technical Deep Dive


Apache Kudu: Technical Deep Dive

Apache Kudu: Technical Deep Dive


Apache Kudu: Technical Deep Dive


 
Analyzing Hadoop Data Using Sparklyr

Analyzing Hadoop Data Using Sparklyr
Analyzing Hadoop Data Using Sparklyr

Analyzing Hadoop Data Using Sparklyr

 
Full stack monitoring across apps & infrastructure with Azure Monitor
Full stack monitoring across apps & infrastructure with Azure MonitorFull stack monitoring across apps & infrastructure with Azure Monitor
Full stack monitoring across apps & infrastructure with Azure Monitor
 
Breaking the Silos: Storage for Analytics & AI
Breaking the Silos: Storage for Analytics & AIBreaking the Silos: Storage for Analytics & AI
Breaking the Silos: Storage for Analytics & AI
 
How Data Drives Business at Choice Hotels
How Data Drives Business at Choice HotelsHow Data Drives Business at Choice Hotels
How Data Drives Business at Choice Hotels
 
Supercharge Splunk with Cloudera

Supercharge Splunk with Cloudera
Supercharge Splunk with Cloudera

Supercharge Splunk with Cloudera

 
Big Data Day LA 2016/ Use Case Driven track - From Clusters to Clouds, Hardwa...
Big Data Day LA 2016/ Use Case Driven track - From Clusters to Clouds, Hardwa...Big Data Day LA 2016/ Use Case Driven track - From Clusters to Clouds, Hardwa...
Big Data Day LA 2016/ Use Case Driven track - From Clusters to Clouds, Hardwa...
 
Consolidate your data marts for fast, flexible analytics 5.24.18
Consolidate your data marts for fast, flexible analytics 5.24.18Consolidate your data marts for fast, flexible analytics 5.24.18
Consolidate your data marts for fast, flexible analytics 5.24.18
 
Transforming Insurance Analytics with Big Data and Automated Machine Learning

Transforming Insurance Analytics with Big Data and Automated Machine Learning
Transforming Insurance Analytics with Big Data and Automated Machine Learning

Transforming Insurance Analytics with Big Data and Automated Machine Learning

 
Part 1: Cloudera’s Analytic Database: BI & SQL Analytics in a Hybrid Cloud World
Part 1: Cloudera’s Analytic Database: BI & SQL Analytics in a Hybrid Cloud WorldPart 1: Cloudera’s Analytic Database: BI & SQL Analytics in a Hybrid Cloud World
Part 1: Cloudera’s Analytic Database: BI & SQL Analytics in a Hybrid Cloud World
 
Real timedata
Real timedataReal timedata
Real timedata
 
Lecture4 big data technology foundations
Lecture4 big data technology foundationsLecture4 big data technology foundations
Lecture4 big data technology foundations
 
Piranha vs. mammoth predator appliances that chew up big data
Piranha vs. mammoth   predator appliances that chew up big dataPiranha vs. mammoth   predator appliances that chew up big data
Piranha vs. mammoth predator appliances that chew up big data
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
 
Advanced Analytics for Investment Firms and Machine Learning
Advanced Analytics for Investment Firms and Machine LearningAdvanced Analytics for Investment Firms and Machine Learning
Advanced Analytics for Investment Firms and Machine Learning
 
Developing a Strategy for Data Lake Governance
Developing a Strategy for Data Lake GovernanceDeveloping a Strategy for Data Lake Governance
Developing a Strategy for Data Lake Governance
 
Cloud Storage Spring Cleaning: A Treasure Hunt
Cloud Storage Spring Cleaning: A Treasure HuntCloud Storage Spring Cleaning: A Treasure Hunt
Cloud Storage Spring Cleaning: A Treasure Hunt
 
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
 
BigDataBx #1 - Atelier 1 Cloudera Datawarehouse Optimisation
BigDataBx #1 - Atelier 1 Cloudera Datawarehouse OptimisationBigDataBx #1 - Atelier 1 Cloudera Datawarehouse Optimisation
BigDataBx #1 - Atelier 1 Cloudera Datawarehouse Optimisation
 

Similar to Enterprise Data and Analytics Architecture Overview for Electric Utility

DRAFT - Enterprise Data and Analytics Architecture Overview for Electric Utility
DRAFT - Enterprise Data and Analytics Architecture Overview for Electric UtilityDRAFT - Enterprise Data and Analytics Architecture Overview for Electric Utility
DRAFT - Enterprise Data and Analytics Architecture Overview for Electric UtilityPrajesh Bhattacharya
 
Data Driven Advanced Analytics using Denodo Platform on AWS
Data Driven Advanced Analytics using Denodo Platform on AWSData Driven Advanced Analytics using Denodo Platform on AWS
Data Driven Advanced Analytics using Denodo Platform on AWSDenodo
 
DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization
DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization
DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization Denodo
 
Data Virtualization: An Essential Component of a Cloud Data Lake
Data Virtualization: An Essential Component of a Cloud Data LakeData Virtualization: An Essential Component of a Cloud Data Lake
Data Virtualization: An Essential Component of a Cloud Data LakeDenodo
 
Simplifying Your Cloud Architecture with a Logical Data Fabric (APAC)
Simplifying Your Cloud Architecture with a Logical Data Fabric (APAC)Simplifying Your Cloud Architecture with a Logical Data Fabric (APAC)
Simplifying Your Cloud Architecture with a Logical Data Fabric (APAC)Denodo
 
5 Steps for Architecting a Data Lake
5 Steps for Architecting a Data Lake5 Steps for Architecting a Data Lake
5 Steps for Architecting a Data LakeMetroStar
 
Big Data Session 1.pptx
Big Data Session 1.pptxBig Data Session 1.pptx
Big Data Session 1.pptxElsonPaul2
 
Vikram Andem Big Data Strategy @ IATA Technology Roadmap
Vikram Andem Big Data Strategy @ IATA Technology Roadmap Vikram Andem Big Data Strategy @ IATA Technology Roadmap
Vikram Andem Big Data Strategy @ IATA Technology Roadmap IT Strategy Group
 
Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)James Serra
 
ADV Slides: The Evolution of the Data Platform and What It Means to Enterpris...
ADV Slides: The Evolution of the Data Platform and What It Means to Enterpris...ADV Slides: The Evolution of the Data Platform and What It Means to Enterpris...
ADV Slides: The Evolution of the Data Platform and What It Means to Enterpris...DATAVERSITY
 
Cloud Computing & Big Data
Cloud Computing & Big DataCloud Computing & Big Data
Cloud Computing & Big DataMrinal Kumar
 
Streaming Real-time Data to Azure Data Lake Storage Gen 2
Streaming Real-time Data to Azure Data Lake Storage Gen 2Streaming Real-time Data to Azure Data Lake Storage Gen 2
Streaming Real-time Data to Azure Data Lake Storage Gen 2Carole Gunst
 
Enabling Next Gen Analytics with Azure Data Lake and StreamSets
Enabling Next Gen Analytics with Azure Data Lake and StreamSetsEnabling Next Gen Analytics with Azure Data Lake and StreamSets
Enabling Next Gen Analytics with Azure Data Lake and StreamSetsStreamsets Inc.
 
Modern Data Management for Federal Modernization
Modern Data Management for Federal ModernizationModern Data Management for Federal Modernization
Modern Data Management for Federal ModernizationDenodo
 
Introduction to DDS: Context, Information Model, Security, and Applications.
Introduction to DDS: Context, Information Model, Security, and Applications.Introduction to DDS: Context, Information Model, Security, and Applications.
Introduction to DDS: Context, Information Model, Security, and Applications.Gerardo Pardo-Castellote
 
Managing The Data Deluge By Optimizing Storage
Managing The Data Deluge By Optimizing StorageManaging The Data Deluge By Optimizing Storage
Managing The Data Deluge By Optimizing StorageDell World
 
Intro to big data and applications -day 3
Intro to big data and applications -day 3Intro to big data and applications -day 3
Intro to big data and applications -day 3Parviz Vakili
 

Similar to Enterprise Data and Analytics Architecture Overview for Electric Utility (20)

DRAFT - Enterprise Data and Analytics Architecture Overview for Electric Utility
DRAFT - Enterprise Data and Analytics Architecture Overview for Electric UtilityDRAFT - Enterprise Data and Analytics Architecture Overview for Electric Utility
DRAFT - Enterprise Data and Analytics Architecture Overview for Electric Utility
 
Data Driven Advanced Analytics using Denodo Platform on AWS
Data Driven Advanced Analytics using Denodo Platform on AWSData Driven Advanced Analytics using Denodo Platform on AWS
Data Driven Advanced Analytics using Denodo Platform on AWS
 
DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization
DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization
DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization
 
Big data analysis concepts and references
Big data analysis concepts and referencesBig data analysis concepts and references
Big data analysis concepts and references
 
Data Virtualization: An Essential Component of a Cloud Data Lake
Data Virtualization: An Essential Component of a Cloud Data LakeData Virtualization: An Essential Component of a Cloud Data Lake
Data Virtualization: An Essential Component of a Cloud Data Lake
 
Simplifying Your Cloud Architecture with a Logical Data Fabric (APAC)
Simplifying Your Cloud Architecture with a Logical Data Fabric (APAC)Simplifying Your Cloud Architecture with a Logical Data Fabric (APAC)
Simplifying Your Cloud Architecture with a Logical Data Fabric (APAC)
 
5 Steps for Architecting a Data Lake
5 Steps for Architecting a Data Lake5 Steps for Architecting a Data Lake
5 Steps for Architecting a Data Lake
 
Big Data Session 1.pptx
Big Data Session 1.pptxBig Data Session 1.pptx
Big Data Session 1.pptx
 
Benefits of a data lake
Benefits of a data lake Benefits of a data lake
Benefits of a data lake
 
Vikram Andem Big Data Strategy @ IATA Technology Roadmap
Vikram Andem Big Data Strategy @ IATA Technology Roadmap Vikram Andem Big Data Strategy @ IATA Technology Roadmap
Vikram Andem Big Data Strategy @ IATA Technology Roadmap
 
Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)
 
ADV Slides: The Evolution of the Data Platform and What It Means to Enterpris...
ADV Slides: The Evolution of the Data Platform and What It Means to Enterpris...ADV Slides: The Evolution of the Data Platform and What It Means to Enterpris...
ADV Slides: The Evolution of the Data Platform and What It Means to Enterpris...
 
Data Mesh
Data MeshData Mesh
Data Mesh
 
Cloud Computing & Big Data
Cloud Computing & Big DataCloud Computing & Big Data
Cloud Computing & Big Data
 
Streaming Real-time Data to Azure Data Lake Storage Gen 2
Streaming Real-time Data to Azure Data Lake Storage Gen 2Streaming Real-time Data to Azure Data Lake Storage Gen 2
Streaming Real-time Data to Azure Data Lake Storage Gen 2
 
Enabling Next Gen Analytics with Azure Data Lake and StreamSets
Enabling Next Gen Analytics with Azure Data Lake and StreamSetsEnabling Next Gen Analytics with Azure Data Lake and StreamSets
Enabling Next Gen Analytics with Azure Data Lake and StreamSets
 
Modern Data Management for Federal Modernization
Modern Data Management for Federal ModernizationModern Data Management for Federal Modernization
Modern Data Management for Federal Modernization
 
Introduction to DDS: Context, Information Model, Security, and Applications.
Introduction to DDS: Context, Information Model, Security, and Applications.Introduction to DDS: Context, Information Model, Security, and Applications.
Introduction to DDS: Context, Information Model, Security, and Applications.
 
Managing The Data Deluge By Optimizing Storage
Managing The Data Deluge By Optimizing StorageManaging The Data Deluge By Optimizing Storage
Managing The Data Deluge By Optimizing Storage
 
Intro to big data and applications -day 3
Intro to big data and applications -day 3Intro to big data and applications -day 3
Intro to big data and applications -day 3
 

Recently uploaded

Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesZilliz
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfSeasiaInfotech2
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embeddingZilliz
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 

Recently uploaded (20)

Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 

Enterprise Data and Analytics Architecture Overview for Electric Utility

  • 1. Enterprise Data and Analytics Architecture Overview for Electric Utility Dr. Prajesh Bhattacharya enSustain Copyright enSustain
  • 4. Architecture diagram; labels as they appear on the slide:
      o Data Warehouse (ONLY data required for high-performing production reporting), using the Enterprise Nomenclature
      o Data Lake (ALL data), using the Enterprise Nomenclature
      o Data Translation Layer (MDMS/EI), shown twice
      o Processes: usual EDW process and structure; discovery and indexing/tagging
      o Consumers: Production Reports; Projects; Data Explorers (engineers, data scientists)
      o Data sources: DMS & OMS; Customer Data; Smart Meter (metadata, readings); Asset Data (location, config); Financial Data; Data Historian; SCADA; DG metadata; DG generation data; HR Data; Weather Data; Misc. sensor head ends; Security Data; Transmission Planning Data; Maintenance Data; Demand Response Data; Transmission OE and Dispatch Data; EMS; Transmission Market Data; IT Asset Ops Data; IT Support Data; Project Documents; Marketing & Sales Data; Catch-All Other Applications; Email & chat logs; Facility Data; Fleet Data
  • 5. Possible Point-To-Point Exceptions
      o Purpose-oriented connection. Examples:
          - Historian facilitates connection with SCADA
          - EMS-SCADA connection is latency-sensitive
      o Application requiring access to only one system
          - DMS applications running off of DMS data
          - Historian applications running off of Historian data
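The exception criteria on slide 5 can be read as a simple decision rule. The sketch below is only an illustration of that reading; the function name, its parameters, and the system names in the usage lines are assumptions, not part of the deck.

```python
def allows_point_to_point(systems_accessed, purpose_built=False,
                          latency_sensitive=False):
    """Return True if a direct connection may bypass the translation layer.

    Hypothetical rule derived from slide 5: a connection qualifies when it
    is purpose-oriented (purpose-built or latency-sensitive), or when the
    application reads from exactly one system.
    """
    single_system = len(set(systems_accessed)) == 1
    return purpose_built or latency_sensitive or single_system


if __name__ == "__main__":
    print(allows_point_to_point(["DMS"]))
    print(allows_point_to_point(["EMS", "SCADA"], latency_sensitive=True))
    print(allows_point_to_point(["Customer Data", "Smart Meter"]))
```

Anything that does not meet one of these conditions would go through the Data Translation Layer like every other new connection.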
  • 7. The Approach for Implementation: Main Challenges
      o Siloed data
          - Solution Part 1: Standard data model
          - Solution Part 2: User view of unified data
      o Lack of analytics ideas
          - Solution: Close partnership between IT and business
      o Lack of budget
          - Solution Part 1: Tax each new project
          - Solution Part 2: Take baby steps
  • 8. Necessary Condition for Success
      o Do NOT touch the existing and working systems first
          - At the beginning, implement the new mechanism ONLY to serve the new requirements
          - Keep the existing connections working and unaffected
          - Eventually, the business will deem some of the existing connections not required
          - The rest of the existing connections can be converted as part of application maintenance/overhaul/upgrade, but not in the beginning phase of the initiative
      o Scope the smallest possible piece and do it well
          - Do not try to implement all the necessary new components at once
          - Good quality on a small scope is better than mediocre quality on a large scope
          - It might require more overhead, but it is often worth it
  • 9. Possible Steps for Implementation
      1. A new data connectivity requirement comes in
          - Identify the source system
          - Define the enterprise nomenclature for the source system so that it aligns with the industry standard
          - Load MDMS/EI with the dictionary
          - Configure EI to act as the data virtualization layer for the source system
          - Release for production use with an appropriate support mechanism
          - Milestone: one project is now using this new mechanism for one source system
      2. Repeat step 1 for every new data connectivity request
          - As more source systems are brought into scope, resolve discrepancies, if any arise
          - The virtualization layer might experience performance issues as the data load increases
      3. Research and plan the Data Lake
      4. For every new data source implemented for the virtualization, implement the corresponding ETL for the Data Lake
      5. Open the Data Lake to users who prefer getting their data from the Data Lake (delayed but faster) over virtualization
      6. Implement Data Lake analytics (say, ML based on Spark) for a single use case
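The dictionary-driven translation described on slide 9 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the source-system names, native field names, enterprise names, and the `translate_record` helper are all hypothetical and do not represent MDMS/EI or any vendor product.

```python
# Hypothetical dictionary: (source system, native field) -> enterprise name.
# In the deck's architecture this dictionary would be loaded into MDMS/EI.
ENTERPRISE_DICTIONARY = {
    ("historian", "TAG_ID"): "measurement_point_id",
    ("historian", "VAL"): "measured_value",
    ("historian", "TS"): "reading_timestamp",
    ("meter_head_end", "MTR_NO"): "meter_id",
    ("meter_head_end", "KWH"): "energy_kwh",
}


def translate_record(source_system, record):
    """Rename the native fields of one record to enterprise names.

    Fields with no dictionary entry are returned separately so that the
    discrepancy can be resolved instead of being silently dropped.
    """
    translated, unmapped = {}, {}
    for field_name, value in record.items():
        enterprise_name = ENTERPRISE_DICTIONARY.get((source_system, field_name))
        if enterprise_name is None:
            unmapped[field_name] = value
        else:
            translated[enterprise_name] = value
    return translated, unmapped


if __name__ == "__main__":
    row = {"TAG_ID": "FDR12.KW", "VAL": 431.7, "QUALITY": "good"}
    print(translate_record("historian", row))
```

Returning the unmapped fields alongside the translated ones is a design choice made for this sketch: it surfaces nomenclature gaps early, which matches the deck's advice to resolve discrepancies as more source systems come into scope.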
  • 10. Section 3: MDMS, EI, Data Virtualization, Data Warehouse
  • 11. Skip This Section
      o Most utilities already use these systems and are familiar with them
      o Hence, we will not discuss them
      o For specific questions, please contact prajesh@ensustain.com
  • 13. Why the Data Lake?
      o Data Lake: not the immediate need, but the eventual destination
      o If the MDMS/EI layer virtualizes the data, then access to standardized data across the enterprise is already established. What additional value does the Data Lake bring?
          - Some of the system-of-record (SOR) systems might not be capable of handling that many data requests
          - Access to some of the SOR systems might not be practical
          - Implementing data quality checks on virtualized data is hard (at the least, it would slow down queries)
          - More data travels over the network in a virtualized environment than in a Data Lake designed and used in a specific way
          - Bottom line: go for the Data Lake only if it is foreseen to be needed
  • 14. Data Lake: Market Offering Landscape
  • 15. Data Lake: Getting Data Into the Lake
      o Diagram labels: Data Sources; MDMS; Data Loader; Streaming Data; Manual Data; Data Governance; Tagging Tool; HDFS
      o Enterprise Data
          - Shared across the company based on security policy
          - Fully managed and maintained
          - Tight SLA
          - 100% enterprise-taxonomy-based tagging
      o Project/Group Data
          - Enterprise standards might be too restrictive to fulfill the requirements of the project
          - Shared among a handful of users
          - Medium maintenance/control/SLA
          - Folksonomy + some governance
      o User Data
          - Results of ad-hoc analyses
          - Some maintenance/control/SLA
          - Folksonomy-based tagging
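The per-zone tagging rules on slide 15 could be enforced by a small check at load time. The Python sketch below is one possible reading, not the deck's tagging tool: the taxonomy terms, the `Dataset` class, and the `tagging_violations` function are hypothetical, and two rules are assumptions added for illustration (every dataset carries at least one tag, and "some governance" for project data means at least one taxonomy tag).

```python
from dataclasses import dataclass, field

# Hypothetical enterprise taxonomy terms, for illustration only.
ENTERPRISE_TAXONOMY = {"asset", "customer", "meter_reading", "outage", "weather"}
VALID_ZONES = ("enterprise", "project", "user")


@dataclass
class Dataset:
    name: str
    zone: str  # one of VALID_ZONES
    tags: set = field(default_factory=set)


def tagging_violations(dataset):
    """Return a list of tagging problems for one dataset (empty if none)."""
    if dataset.zone not in VALID_ZONES:
        return ["unknown zone: " + dataset.zone]
    problems = []
    if not dataset.tags:
        # Assumption: every dataset in the lake carries at least one tag.
        problems.append("dataset has no tags")
    off_taxonomy = dataset.tags - ENTERPRISE_TAXONOMY
    if dataset.zone == "enterprise" and off_taxonomy:
        # Slide 15: enterprise data is tagged 100% from the taxonomy.
        problems.append("enterprise data must use taxonomy tags only: "
                        + ", ".join(sorted(off_taxonomy)))
    if (dataset.zone == "project" and dataset.tags
            and not dataset.tags & ENTERPRISE_TAXONOMY):
        # Assumption: "some governance" means at least one taxonomy tag.
        problems.append("project data needs at least one taxonomy tag")
    return problems


if __name__ == "__main__":
    print(tagging_violations(Dataset("feeder_load", "enterprise", {"asset", "my_tag"})))
    print(tagging_violations(Dataset("scratch", "user", {"my_tag"})))
```

User data passes with free-form (folksonomy) tags, while the enterprise zone rejects anything outside the taxonomy.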
  • 16. Hadoop Ecosystem Relevant To Utility
      o Components: HDFS; YARN; Map-Reduce Application in Java; Hive; Spark Streaming; Spark SQL; Spark ML; Sqoop; Oozie; Falcon; Hadoop native client; Storm; QueryIO; Waterline Data; Attivio; Apache Atlas
      o Functionality color legend: Data Loading; Job Management; Data Governance; Data Reading; Map-Reduce; Data Storage; Data Lake Vendor Solutions
  • 18. The Analytics Tool Landscape: three taxonomies of analytics tools
      o Production Data, Project (semi-production) Data, or Ad-hoc Data; each either Write-back or Read-only
      o Managed (server based) or Unmanaged (desktop based)
      o Coding heavy or Configuration heavy
  • 19. Sample Analytics Opportunities …
  • 21. References
      o http://ceur-ws.org/Vol-1497/PoEM2015_ShortPaper4.pdf
      o http://smartgrid.epri.com/doc/Utility%20Enterprise%20Architecture%20Best%20Practices%20-%20webcast.pdf
      o http://www.navigantresearch.com/wordpress/wp-content/uploads/2011/10/SGEA-11-Brochure.pdf
      o http://www.gridwiseac.org/pdfs/forum_papers/114_127_paper_final.pdf
      o http://www.iec.ch/smartgrid/standards/
      o https://www.boozallen.com/content/dam/boozallen/documents/Data_Lake.pdf
      o Data Warehousing in the Age of Big Data, Krish Krishnan
      o https://es.slideshare.net/hortonworks/hortonworks-and-waterline-data-webinar
      o http://www.ibmbigdatahub.com/blog/charting-data-lake-rethinking-data-models-data-lakes
      o https://www.slideshare.net/fabien_gandon/ontologies-in-computer-science-and-on-the-web
      o https://upside.tdwi.org/articles/2016/03/23/data-lake-become-swamp-1.aspx
      o Many other sources
      o Indigenous experiments
      o Real-world experience