How would you go about creating an enterprise data and analytics architecture for an electric utility that 1) will be relevant in the long run, 2) will be easy to implement and 3) will start bringing value to the organization fairly quickly? What will be the components? Who will be the users? The operation of an electric utility will change significantly by 2025. How will you future-proof the architecture?
4. Data Warehouse and Data Lake (Architecture Overview)

Source systems feeding the architecture:
• DMS & OMS
• Customer data
• Smart meter data (metadata, readings)
• Asset data (location, configuration)
• Financial data
• Data historian
• SCADA
• DG metadata and DG generation data
• HR data
• Weather data
• Misc. sensor head ends
• Security data
• Transmission planning data
• Maintenance data
• Demand response data
• Transmission OE and dispatch data
• EMS
• Transmission market data
• IT asset ops data
• IT support data
• Project documents
• Marketing & sales data
• Email & chat logs
• Facility data
• Fleet data
• Catch-all: other applications

Data Translation Layer (MDMS/EI): sits between the source systems and both destinations, applying the enterprise nomenclature.

Destinations:
• Data Warehouse (ONLY the data required for high-performing production reporting): enterprise nomenclature; usual EDW process and structure; serves production reports.
• Data Lake (ALL data): enterprise nomenclature; discovery and indexing/tagging; serves projects and data explorers (engineers, data scientists).

Copyright enSustain
5. Possible Point-To-Point Exceptions

Purpose-oriented connections. Examples:
o Historian facilitates connection with SCADA
o The EMS-SCADA connection is latency-sensitive

Applications requiring access to only one system:
o DMS applications running off of DMS data
o Historian applications running off of Historian data
7. The Approach for Implementation

Main challenges and their solutions:
• Siloed data
o Solution, part 1: a standard data model
o Solution, part 2: a user view of unified data
• Lack of analytics ideas
o Solution: close partnership between IT and the business
• Lack of budget
o Solution, part 1: tax each new project
o Solution, part 2: take baby steps
8. Necessary Condition for Success

Do NOT touch the existing, working systems first:
• At the beginning, implement the new mechanism ONLY to serve the new requirements.
• Keep the existing connections working and unaffected.
• Eventually, the business will deem some of the existing connections not required.
• The rest of the existing connections can be converted as part of application maintenance/overhaul/upgrade, but not in the beginning phase of the initiative.

Scope the smallest possible piece and do it well:
• Do not try to implement all the necessary new components at once.
• Good quality on a small scope is better than mediocre quality on a large scope.
• It might require more overhead, but it is often worth it.
9. Possible Steps for Implementation
A new data
connectivity
requirement
comes in
Identify the
source system
Define the
enterprise
nomenclature
for the source
system to align
with industry
standard
Load
MDMS/EI with
the dictionary
Configure EI to
act as the data
virtualization
layer for the
source system
Release for
production
use with
appropriate
support
mechanism
Milestone: One project is now using this new mechanism for one source system
Repeat 1 for every new
data connectivity
request
As more source systems
are brought into the
scope, resolve
discrepancies, if any
arises
The virtualization layer
might experience
performance issue as
data load increases
Research and Plan the
Data Lake
For every new data source
implementation for the
virtualization, implement
the corresponding ETL for
Data Lake
Open the Data lake to
users that prefer getting
their data from the Data
Lake (delayed but faster)
over virtualization
Implement Data Lake
Analytics (say ML based
on Spark) for a single use
case
Copyright enSustain
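The "one ETL per virtualized source" step above can be sketched as a small job that pulls records from a source, applies the same enterprise-nomenclature mapping the virtualization layer uses, and lands the result in a dated lake path. This is an illustrative sketch: the field names, path layout, and the `extract` stand-in are all hypothetical, not part of any real MDMS/EI tool.

```python
# Sketch of the per-source ETL into the Data Lake. Each run extracts
# from the source system, renames fields to the enterprise nomenclature
# (the same mapping the virtualization layer uses), and writes a dated
# partition. Field names and paths are hypothetical.
import json
import os
import tempfile

MAPPING = {"mtr_id": "meter_id", "rdg_kwh": "energy_kwh"}  # from MDMS/EI

def extract(source: str) -> list[dict]:
    # Stand-in for a real source-system pull (JDBC, API, file drop, ...).
    return [{"mtr_id": "M1", "rdg_kwh": 4.2}, {"mtr_id": "M2", "rdg_kwh": 3.7}]

def etl_to_lake(source: str, lake_root: str, run_date: str) -> str:
    """Extract, standardize, and land one dated partition; return its path."""
    records = [{MAPPING[k]: v for k, v in r.items() if k in MAPPING}
               for r in extract(source)]
    out_dir = os.path.join(lake_root, "enterprise", source, run_date)
    os.makedirs(out_dir, exist_ok=True)
    path = os.path.join(out_dir, "part-0000.json")
    with open(path, "w") as f:
        for r in records:
            f.write(json.dumps(r) + "\n")
    return path

lake_root = tempfile.mkdtemp()  # stand-in for the real lake storage root
p = etl_to_lake("smart_meter", lake_root, "2015-06-01")
```

Because the ETL reuses the virtualization layer's dictionary, data reached through either path carries identical field names, which is what makes "delayed but faster" lake access interchangeable with live virtualized access.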
10. MDMS, EI, Data Virtualization, Data Warehouse
11. Skip This Section

Most utilities already use these systems and are familiar with them; hence, we will not discuss them here. For specific questions, please contact prajesh@ensustain.com.
13. Why the Data Lake?

If the MDMS/EI layer virtualizes the data, then access to standardized data across the enterprise is already established. What additional value does the Data Lake bring?

• Some of the SOR (system-of-record) systems might not be capable of handling that many data requests.
• Access to some of the SOR systems might not be practical.
• Implementing data quality checks on virtualized data is hard (at the least, it would slow down queries).
• Data travels over the network: transfers are larger in a virtualized environment than in a Data Lake designed and used in a specific way.
• Bottom line: go for the Data Lake only if it is foreseen to be needed.

Data Lake: not the immediate need, but the eventual destination.
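The data-quality bullet above can be made concrete with a toy sketch: in a virtualized setup, a quality check has to run on every query, while in a lake the check runs once at load time and queries read pre-validated data. The check rule and records below are invented for illustration only.

```python
# Toy illustration of load-time vs. query-time quality checks.
# Virtualization re-pays the check cost on every query; a lake pays it
# once at load. The rule and records are invented examples.
def quality_check(record: dict) -> bool:
    return record["energy_kwh"] >= 0  # toy rule: no negative readings

SOURCE = [{"meter_id": i, "energy_kwh": v} for i, v in enumerate([4.2, -1.0, 3.7])]

# Virtualized path: every query must re-filter the raw source.
def query_virtualized() -> list[dict]:
    return [r for r in SOURCE if quality_check(r)]

# Lake path: checked once at load; queries just read the clean copy.
LAKE = [r for r in SOURCE if quality_check(r)]  # load-time check

def query_lake() -> list[dict]:
    return LAKE

# Same answer either way, but the virtualized path repeats the per-record
# check on every query, which is exactly why it slows queries down.
assert query_virtualized() == query_lake()
```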
15. Data Lake: Getting Data Into the Lake

Ingestion paths from the data sources: data loader, streaming data, and manual data. Ingestion is governed by the data governance tagging tool and MDMS.

HDFS zones:
• Enterprise data
o Shared across the company based on security policy
o Fully managed and maintained
o Tight SLA
o 100% enterprise-taxonomy-based tagging
• User data
o Results of ad-hoc analyses
o Some maintenance/control/SLA
o Folksonomy-based tagging
• Project/group data
o Enterprise standards might be too restrictive to fulfill the requirements of the project
o Shared among a handful of users
o Medium maintenance/control/SLA
o Folksonomy + some governance
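The per-zone tagging policies above can be sketched as a small routing function: the enterprise zone accepts controlled taxonomy terms only, the project zone mixes folksonomy with a minimal governance rule, and the user zone is pure folksonomy. The taxonomy vocabulary, path layout, and governance rule here are hypothetical illustrations of the slide's policy, not a real tool's behavior.

```python
# Sketch of zone placement and tagging policy for incoming datasets.
# Zone names match the slide; the taxonomy terms and the "at least one
# taxonomy term" project rule are hypothetical.
ENTERPRISE_TAXONOMY = {"meter", "asset", "outage", "customer"}  # controlled terms

def land(dataset_name: str, zone: str, tags: set[str]) -> tuple[str, set[str]]:
    """Return the lake path and applied tags, enforcing the zone's policy."""
    if zone == "enterprise":
        # 100% enterprise-taxonomy-based tagging: reject free-form tags.
        if not tags <= ENTERPRISE_TAXONOMY:
            raise ValueError("enterprise zone allows taxonomy terms only")
    elif zone == "project":
        # Folksonomy + some governance: free tags allowed, but require at
        # least one taxonomy term so the dataset stays discoverable.
        if not tags & ENTERPRISE_TAXONOMY:
            raise ValueError("project zone needs at least one taxonomy term")
    # user zone: pure folksonomy, anything goes
    return f"/lake/{zone}/{dataset_name}", set(tags)

path, applied = land("meter_reads_2015", "enterprise", {"meter", "customer"})
```

A governed loader like this is where the "data governance tagging tool" plugs into the ingestion paths: every loader (batch, streaming, or manual) calls the same policy before writing.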
16. Hadoop Ecosystem Relevant To Utility

Components, grouped by functionality (the original color legend: data loading, job management, data governance, data reading, Map-Reduce, data storage, Data Lake, vendor solutions):
• Data storage: HDFS
• Job management: YARN, Oozie, Falcon
• Map-Reduce: Map-Reduce applications in Java
• Data loading: Sqoop, Spark Streaming, Storm
• Data reading: Hive, Spark SQL, Spark ML, Hadoop native client
• Data governance: Apache Atlas
• Vendor solutions: QueryIO, Waterline Data, Attivio
18. The Analytics Tool Landscape

Taxonomy 1: by usage mode
• Production: data write-back vs. read-only
• Project (semi-production): data write-back vs. read-only
• Ad-hoc: data write-back vs. read-only

Taxonomy 2: by deployment
• Managed (server-based)
• Unmanaged (desktop-based)

Taxonomy 3: by user skill set
• Coding-heavy
• Configuration-heavy
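Since the three taxonomies are orthogonal, each tool can be described by one flat record with one attribute per taxonomy, which makes governance questions simple queries over the catalog. The tool entries below are illustrative placements invented for the sketch, not claims about specific vendors.

```python
# The three taxonomies are orthogonal attributes of each tool, so a flat
# record per tool captures the whole landscape. Tool entries here are
# illustrative placements, not authoritative vendor claims.
from dataclasses import dataclass

@dataclass(frozen=True)
class AnalyticsTool:
    name: str
    usage: str          # taxonomy 1: "production" | "project" | "ad-hoc"
    write_back: bool    # taxonomy 1 sub-split: data write-back vs. read-only
    managed: bool       # taxonomy 2: server-based (True) vs. desktop-based
    coding_heavy: bool  # taxonomy 3: coding-heavy vs. configuration-heavy

catalog = [
    AnalyticsTool("server BI platform", "production", False, True, False),
    AnalyticsTool("notebook environment", "ad-hoc", True, False, True),
    AnalyticsTool("team dashboard", "project", False, True, False),
]

# Governance query: which tools may write data back in production?
risky = [t.name for t in catalog if t.usage == "production" and t.write_back]
print(risky)  # → []
```

Keeping the landscape as data rather than prose lets the IT/business partnership answer questions like "which unmanaged, write-back tools touch production?" mechanically.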