IBM Z for the Digital Enterprise - IBM Z Open Data Analytics
1. David Rice
IzODA Chief Iteration Manager & Technical Lead of Scale Adoption
drice@us.ibm.com
October 2018
IBM Open Data Analytics for z/OS: z Conference
2. © 2017 IBM Corporation
Trends in the industry: Increasing focus on Real Time
• Pervasiveness of Analytics
• Business growth
• Risk Mitigation
• Need for Real-Time
• Insight at point of impact
Source & Full Forrester paper: https://www-03.ibm.com/systems/z/solutions/real-time-analytics/data-analysis.html
3. Today’s Typical Current State: migrate all endpoint data to a data ‘lake’, then analyze
• Using an ETL-only approach results in costly side effects: risk, reduced efficiency and missed opportunity
Diagram (endpoint data sources):
z/OS
• DB2, IMS, VSAM
• Transactional Data from Operational Systems
• History Data
• Warehouses
Distributed
• Warehouses
• ODS
• Client Facing Apps
• Departmental Datamarts
Also shown: Mobile, Chat, Call Center, Social / Public; Data Scientist
Challenges
• Data / Analytic Currency
• Increased security, governance, privacy risk
• Longer ROI for analytic insights
• Added development costs
• Data coherency of the lake
• Ability to quickly adapt to suit analytical needs (new data sources, schemas, freshness, etc.)
4. Where do enterprise transactions & data originate?
Data Gravity: Co-locate analytics with data based on value, volume, rate of change, security…
• 92 of world’s top 100 banks
• 10 out of the top 10 insurance organizations
• 87% of all credit card transactions and nearly $8 trillion payments a year
• More than 30 billion transactions a day, more than number of Google searches
• 64% of Fortune 500
• 80% of world’s corporate data
5. Use Cases Well-Aligned with Analytics on IBM Z
Data Gravity
• Predominance of data originates on IBM Z, z/OS (transactions, member info, …)
• Data volume is large; distilling data provides operational efficiencies
• Real-time / near real-time insights are valuable
• Performance matters for variety of data on and off IBM Z
• Core transactional systems of record are on IBM Z
• Security / data privacy needs to be preserved
Podcast: http://www.ibmbigdatahub.com/podcast/making-data-simple-what-data-gravity
6. Cross Industry Use Case: Modernization, Data Exploration, Hybrid Integration
• Easily blend data from Z and non-Z
• Limit data movement
• Enrich reporting and ad-hoc queries
• Leverage modern, open technologies and skills
Diagram:
Data sources: DB2 z/OS, VSAM, IMS, Hadoop, Warehouses
IBM Open Data Analytics for z/OS
Optimized Data Layer
z/OS
Result Store:
• Frequent Refresh
• Ease of Integration
• TCO advantage
Standard Interfaces
Existing Data Lakes, Business Interfaces, Cloud Platforms
Dashboards, Spreadsheets (Examples: Cognos, Tableau)
Outcomes:
• More current data leveraged across entire infrastructure
• Reduced raw data movement costs
• Security & data privacy advantages
7. Insurance: Real-Time State of the Business Views
Real-Time Insights
Value: Real-time visualization of the state of the business across clients, industries, geographies, products, etc. to determine profitability, risk assessment, etc. Potential to have a current view along with 15-30-60-90 day views for trend analysis
How: Leverage analytics of data in place across various systems, using both internal & external sources
Client 1:
• Life insurance coverage
• Accident coverage
Client 2:
• Vision coverage
• Accident risk
Client 3:
• Dental coverage
• Home coverage
Client 4:
• Disability coverage
• Life insurance coverage
Profitability View
Activity View
By Industry, product
External sources shown: weather, geopolitical
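The "current view along with 15-30-60-90 day views" idea can be pictured as trailing-window totals over dated records. The following is a minimal sketch only: the record layout, the amounts, the dates, and the names `RECORDS` and `trailing_totals` are hypothetical and do not come from the presentation or from any IBM product API.

```python
from datetime import date, timedelta

# Hypothetical records: (client, product, amount, booking date).
# Values are illustrative only; they are not taken from the slides.
RECORDS = [
    ("Client 1", "Life", 1200.0, date(2018, 9, 28)),
    ("Client 2", "Vision", 300.0, date(2018, 9, 10)),
    ("Client 3", "Dental", 450.0, date(2018, 8, 15)),
    ("Client 4", "Disability", 900.0, date(2018, 7, 20)),
]

WINDOWS_DAYS = (15, 30, 60, 90)


def trailing_totals(records, as_of, windows=WINDOWS_DAYS):
    """Return {window_days: total amount booked within that trailing window}."""
    totals = {}
    for days in windows:
        start = as_of - timedelta(days=days)
        # Keep records booked after the window start and up to the as-of date.
        totals[days] = sum(
            amount
            for _, _, amount, booked in records
            if start < booked <= as_of
        )
    return totals


views = trailing_totals(RECORDS, as_of=date(2018, 10, 1))
for days, total in views.items():
    print(f"last {days:>2} days: {total:,.2f}")
```

The same grouping could be extended by client, industry, or product; the window lengths are the only values taken from the slide.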
8. Use Case - Banking: Enhanced Card Fraud Detection
Today: Models refreshed periodically, deployment path requires custom coding
Challenge: Emerging fraud pattern detection delayed, model deployment & refresh not agile
Benefit: Current data for modeling, intra-day model refresh, flexibility to add new data via configuration
Diagram (IBM z/OS):
Point of sale systems
Core Card Process
• Verify, augment data
• Manage workload
• Ensure scale
• Likely: CICS, IMS
Existing Rules Engine
• Apply in-house rules for detection
• Invoke 3rd party scores (FICO)
• Apply custom scoring
• Determine Disposition
Data: VSAM, DB2, IMS
ETL, Warehouse
Coding, Deploy
Real Time Analytics: leverage in-place, current access to a variety of data sources (DB2, IMS, VSAM)
• Create Models
• Apply Data Science
• Refresh Models
• Schedule Deployment
9. Example: Real-Time ACH Analytics for Banking Clients
Today: Largely post-processed, multi-day verification of ACH rejects, fraud / risk assessment, delay in insights
Challenge: Same-day payments create a requirement to address rejects and fraud immediately, in real-time scope
Benefit: In-place, real-time analytics of ACH data for compliance / fraud risk to address same-day payments, accessing source data as well as off-platform data via federation
ACH Processing (IBM z/OS):
• ACH Payment origination & receipt
• Interaction with Automated Clearing House verification
• Implementation of NACHA rules
• Defined data formats for exchange of info
Diagram: ACH format inputs; “All Items”: ACH, POS, WEB, etc.; Batch Posting Process; Future: Real Time Process; Warehouse
Real-Time Insights
Real-Time Analytics
• Real-time payment and ACH analytics on RT payments
• Increased granularity of compliance / risk / fraud analytics
• Integration across ACH and core banking systems
10. Example: Federated Analytics, Access to Wide Variety of Data: Modernization, Exploration, Integration
• Leverage most current data, in place
• Flexible structure, rich analytics runtime co-located with data
• TCO advantages
• Leverage leading open source technologies & skills
• Enable advanced solutions from IBM and partners
• Integrate and differentiate
Diagram:
Enterprise Data Environments: DB2 z/OS, IMS, VSAM (z/OS); Warehouses, Hadoop (Distributed)
Optimized Analytics Runtime (z/OS): Apache Spark for z/OS; Python / Anaconda; Open Source stack; Optimized Data Layer
Solutions: IBM Machine Learning for z/OS; Solutions from SIs & Business Partners; Other IBM based solutions & Client Solutions
Optimized Data Layer: Integrated Access to DB2, IMS, IMS raw read, VSAM, PS, PDSE, ADABAS, IDMS, CICS Queues, Virtual Tape, SMF, Syslog, Oracle Enterprise, Teradata, HDFS… etc
11. Abstracted access to z/OS Data
• from VSAM
• from DB2
Modern Analytic Frameworks & Tools
12. Value: Reduce Risk → Simplified Data Privacy via Configuration
Sensitive Data
– View presented to data science teams can be different from the original
– Via UI configuration, obfuscate or remove select columns
– Configure for varying levels of access based on PII designations
– Flexibility for data protection
Original Table
Cust_ID     Avg Daily TX  Education  Education Group  Social Security Number  Investment  Avg TX AMT  Churn Label  Age
1009530860  3.9145        2          BS               123-84-9015             114368      2090.32     N            84
1009544000  4.28          2          BS               122-49-3821             90298       2095.04     N            44
1009534260  1.23          2          BS               931-29-0612             94881       1723.59     Y            23
1009574010  0.95          2          BS               491-19-2102             112099      1297.41     Y            24
1009578620  2.73          5          DR               813-90-4183             84638       1333.18     N            67
Column annotations on the original table: Features; Not Feature; Not Feature, PII
View of Table Visible to Data Scientists
Cust_ID     Avg Daily TX  Education  Education Group  Investment  Avg TX AMT  Churn Label  Age
1009530860  3.9145        2          BS               114368      2090.32     N            84
1009544000  4.28          2          BS               90298       2095.04     N            44
1009534260  1.23          2          BS               94881       1723.59     Y            23
1009574010  0.95          2          BS               112099      1297.41     Y            24
1009578620  2.73          5          DR               84638       1333.18     N            67
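The column-level privacy idea on this slide (remove or obfuscate selected columns, with different access levels) can be sketched in a few lines. This is an illustration under stated assumptions, not the product's configuration mechanism, which the slide describes as UI-driven: the function names `build_view` and `obfuscate`, the policy vocabulary ("remove", "obfuscate"), and the access-level names are all hypothetical. The sample rows reuse a subset of the columns and values shown on the slide.

```python
import hashlib


def obfuscate(value):
    """Replace a value with a short one-way SHA-256 digest (illustrative only)."""
    return hashlib.sha256(str(value).encode("utf-8")).hexdigest()[:12]


def build_view(rows, policy):
    """Apply a per-column policy; columns without a policy entry pass through."""
    view = []
    for row in rows:
        out = {}
        for column, value in row.items():
            action = policy.get(column, "keep")
            if action == "remove":
                continue  # drop the column entirely
            out[column] = obfuscate(value) if action == "obfuscate" else value
        view.append(out)
    return view


# Subset of the slide's sample table.
ORIGINAL = [
    {"Cust_ID": "1009530860", "Avg Daily TX": 3.9145,
     "Social Security Number": "123-84-9015", "Churn Label": "N", "Age": 84},
    {"Cust_ID": "1009534260", "Avg Daily TX": 1.23,
     "Social Security Number": "931-29-0612", "Churn Label": "Y", "Age": 23},
]

# Hypothetical access levels; the names and rules are assumptions.
POLICIES = {
    "data_scientist": {"Social Security Number": "remove"},
    "restricted": {"Social Security Number": "remove", "Cust_ID": "obfuscate"},
}

scientist_view = build_view(ORIGINAL, POLICIES["data_scientist"])
restricted_view = build_view(ORIGINAL, POLICIES["restricted"])
print(scientist_view[0])
print(restricted_view[0])
```

The source rows are never modified; each access level gets its own derived view. A truncated hash is used here only to show the shape of obfuscation; it is not a substitute for a vetted masking or tokenization scheme.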
13. Apache Spark z/OS: Cost Efficiency & Powerful Data-in-Place Analytics
• Spark on z/OS joins multiple data types for fast, complete analytics, without moving the data
• Test of >350M rows read, parsed, analyzed, and summarized (approx. 60 GB)
• Average Spark processing time of approx. 3 minutes on a single z13 LPAR with 1 GP, 13 zIIPs and 512 GB memory:
– DB2: 2.35 minutes (4.1 minutes maximum)
– Flat file: 2.95 minutes (3.2 minutes maximum)
– VSAM: 2.80 minutes (3.3 minutes maximum)
Diagram: DB2 z/OS, flat file, and VSAM z/OS, each accessed via JDBC; zIIP offload labels: 88%, 97%, 97%
Use Case: Large Data Pull: bring back all 350 million rows from each data source, touch each data element and run Spark aggregation across all data
Source: IBM Competitive Project Office
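As a quick check on the quoted timings, the arithmetic below averages the three per-source times and derives an approximate row throughput. It uses only the figures on the slide; treating the row count as exactly 350 million is an assumption (the slide says ">350M"), so the throughput values are rough lower bounds rather than measured results.

```python
# Figures quoted on the slide.
ROWS = 350_000_000  # rows pulled from each data source (slide says ">350M")
AVG_MINUTES = {"DB2": 2.35, "Flat file": 2.95, "VSAM": 2.80}

# Mean of the three per-source averages.
mean_minutes = sum(AVG_MINUTES.values()) / len(AVG_MINUTES)
print(f"mean across sources: {mean_minutes:.2f} minutes")

# Approximate rows processed per second for each source.
rows_per_second = {
    source: ROWS / (minutes * 60) for source, minutes in AVG_MINUTES.items()
}
for source, rate in rows_per_second.items():
    print(f"{source}: about {rate / 1e6:.2f} million rows per second")
```

The mean of the three averages is close to the "approx. 3 minutes" stated on the slide.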
14. Apache Spark z/OS: Cost Efficiency & Powerful Data-in-Place Analytics
Brokerage aggregation query workload across Trades tables from 3 exchanges (over 5 Billion trades, 500GB)
67% lower TCA* for systems compared
• $2,105,990 (3 yr. TCA)
• $697,106 (3 yr. TCA)
Diagram labels:
• z13-606 + 11 zIIPs
• z13-605
• Competitor x86 System, Intel E5-2697 v2 2.7GHz 12co
• Linux, Apache Spark, Parquet
• z/OS, CICS, DB2
• z/OS, CICS, DB2, Apache Spark
• ETL
• Trade 166GB
* 3-Year TCA includes 3-year US prices for Hardware, Software, Maintenance and Support as of 05/16/2016. Price and performance for x86 environment includes cost of ETL and elapsed time to transfer the data. This is based on an IBM internal study designed to replicate a typical IBM customer workload usage in the marketplace.
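The "67% lower TCA" figure follows from the two 3-year TCA amounts on the slide. The short calculation below reproduces it; the variable names are illustrative, and nothing beyond the two quoted dollar amounts is assumed.

```python
# 3-year TCA figures quoted on the slide (US dollars).
tca_higher = 2_105_990
tca_lower = 697_106

# Absolute difference and the lower figure's reduction relative to the higher one.
savings = tca_higher - tca_lower
pct_lower = savings / tca_higher * 100

print(f"difference: ${savings:,}")
print(f"lower by: {pct_lower:.1f}%")
```

Rounded to the nearest whole percent, the result matches the 67% stated on the slide.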
15. Minimizing Impact to Production
• Current Challenges:
– Status quo ETL processes consume GP MIPS and often run during batch window cycles, which can cause issues for client batch workloads
– Off-platform analytics that access z/OS data often go through standard subsystem interfaces for DB2 & IMS, interfering with bufferpools and resulting in lower zIIP eligibility
• Analytics on z/OS has unique features to minimize impact to production workloads:
1. Limit analytic workloads’ access to resources by capping zIIPs & memory; leverage WLM classifications
2. Leverage unique “Raw-Read” features: avoid impact to IMS & DB2 subsystems, high zIIP eligibility
3. Leverage unique DataFrame Store: separate well-formed analytics, persist results, enable off-platform ad-hoc analytics against the DataFrame store
4. Analytic workloads are all read-only (no locks held)
17. Useful Links
• Machine Learning and z Systems: https://www.youtube.com/watch?v=T2HtyNX7aHc
• Machine Learning Launch Event interview: https://www.youtube.com/watch?v=WHenFAa6iPw&feature=youtu.be&list=PLenh213llmca-QogcjfSW9RHPtNye9N_p
• Gaining Agility with Spark Analytics on z Systems: https://www.youtube.com/watch?v=Y7HQbKBR_l4
• YouTube video of IBM Edge Analytics segment featuring State of California and Jack Henry Associates: https://www.youtube.com/watch?v=ws9rLnXyb3g&feature=youtu.be (Analytics segment starts 26:25 into the video)
• IBM z/OS Platform for Apache Spark: https://www-03.ibm.com/systems/z/os/zos/apache-spark.html
• IBM Knowledge Center: z/OS Platform for Apache Spark: https://www.ibm.com/support/knowledgecenter/SSLTBW_2.2.0/com.ibm.zos.v2r2.azk/azk.htm
• IBM Knowledge Center: IBM Machine Learning for z/OS: https://www.ibm.com/support/knowledgecenter/SS9PF4_1.1.0/src/tpc/mlz_home.html
• Redbook: Apache Spark Implementation on IBM z/OS: http://www.redbooks.ibm.com/redbooks/pdfs/sg248325.pdf
• IBM Machine Learning for z/OS Marketplace: https://www.ibm.com/us-en/marketplace/machine-learning-for-zos
18. Comments & Questions?