Oil and gas big data edition
1. Big Data and the Informatica Platform
9/8/2015
David Ramirez
Senior Solution Architect
Oil and Gas Accounts
2. About Informatica
• Founded: 1993 (Nasdaq: INFA)
• 2014 Revenue: $1.2b
• Partners: 450+
• Major SI, ISV, OEM and On-Demand Leaders
• Customers: 5,000+
• > 70% of the Global 500
• Customers in 82 Countries
• Direct Presence in 26 Countries
• #1 in Customer Loyalty Rankings (7 Years in a Row)
3. Proven Technology Leadership
B2B Data Exchange
Informatica supports the requirements of cross-organizational data exchange, so users apply familiar and trusted data integration tools and techniques to the growing practice of B2B data integration.
Complex Event Processing
Informatica received high praise for its services from customers. For deployments involving systems monitoring use cases, Informatica offers a five-day stand-up of RulePoint.
Ultra Messaging
In spite of the new entrants, Informatica remains the market leader in this highly demanding part of the messaging market.
Other product areas shown: Cloud Data Integration, Enterprise Data Integration, Data Quality, Master Data Management, Application ILM
4. Intelligent Data Lake: Manage Supply & Demand of Data
Problem:
• Analytics teams spend most of their time looking for and preparing data, not analyzing it
• Impact: project delays, cost overruns, missed opportunities
Data Lake Solution
• A single place to manage the supply and demand of data
• Converts raw big data into fit-for-purpose, trusted, and secure information
5. 80% of the work in big data projects is data intelligence
“I spend more than half my time integrating, cleansing, and transforming data without doing any actual analysis.”
“80% of the work in any data project is in cleaning the data”
“70% of my value is an ability to pull the data, 20% of my value is using data-science…”
Sources: (1) DJ Patil, Data Jujitsu; (2-3) Kandel, et al. Enterprise Data Analysis and Visualization: An Interview Study. IEEE Visual Analytics Science and Technology (VAST), 2012
6. First Pilot(s)
Diagram: Intelligent Data Lake as the platform for big data projects. Use cases range from "Driven by IT" (Lower Infrastructure Cost) to "Driven by Business" (Added Business Value).
• Use cases shown: Data Warehouse Optimization; Data Discovery; Real-Time Operational Intelligence; Lower operational IT costs; Big Data Analytics; Operationalize Big Data Insights; Predictive Maintenance; Lower Total Cost of Care; Customer X/Up-Sell; Public Safety; Fraud Detection
• Data sources shown: Machine Device, Cloud; Documents and Emails; Relational, Mainframe; Social Media, Web Logs
• Callout: What’s Hadoop?
7. Informatica knows the Data Lifecycle Related Challenges
Source: Gartner
Informatica Platform stages: Data Ingestion, Refinement, Mastery/Delivery, Data Security, Data Retirement
• Data Quality
• Exception Management
• Any Platform, Application
• Structured, Unstructured
• Any latency
• Master Data Management
• Data Integration Hub
• Data Archive
• Records Retention/Discovery
• Data Masking
8. Informatica Platform Overview
Diagram labels:
• Sources: Relational DB; .pdf, email
• Environments: Dev, Test, Prod, Archive
• Numbered steps: 1. Profile; 2. Define Targets; 3. Analyze; 4. Build Rules; 5. Monitor
• Vertical labels: Data Quality; Security; ETL; MDM
• Domains: Materials, Wellhead, Customer
• Other labels: Databases; Unstructured Data; Big Data; Cloud; Visualizations
9. The Informatica DI Platform
Comprehensive, Unified, Open and Economical platform
• Data sources: Application; Database; Partner Data (SWIFT, NACHA, HIPAA, …); Cloud Computing; Unstructured
• Project types: Data Warehouse; Data Migration; Test Data Management & Archiving; Master Data Management; Data Synchronization; B2B Data Exchange; Data Consolidation
10. Data Sources Applications
Diagram: data moves from Data Sources through Data Ingestion, Data Management, and Data Delivery to Applications.
• Data Sources: Machine Device, Cloud; Documents and Emails; Relational, Mainframe; Social Media, Web Logs
• Data Ingestion: Replication; Data Streaming; Change Data Capture; Batch Load
• Data Management: Data Integration & Data Quality; Data Virtualization; Event-Based Processing; Data Integration Hub; Data Governance; Data Security; Archiving
• Analytics: Agile Analytics; Advanced Analytics; Machine Learning
• Data Delivery: Batch Load; Pub / Sub; Data Service
• Applications: Data Warehouse; MDM / PIM; Mobile Apps; Visualization & Analytics; Real-Time Alerts
• Other labels: Virtual Data Machine; Integrate & Prepare; Loose Coupling & Abstraction
12. Jumpstart/Accelerate Projects
1. Instant Business-IT Collaboration with Analyst Tool
2. Profile to Discover Data Patterns and Issues
3. Prototype and Validate Results
4. Fine-tune and Deploy Desired Solution in Days
Diagram labels: Logical Data Objects (PRODUCT, …, CUSTOMER, ORDER); Data Sources; Business; IT; Common Repository
Entire Life Cycle Supported by PowerCenter Standard Edition 9.
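The profiling step on this slide (discovering data patterns and issues) can be illustrated with a minimal Python sketch. It is not the Analyst Tool or any Informatica API: the function names `pattern_of` and `profile_column` and the sample wellhead identifiers are hypothetical, chosen only to show the kind of statistics a column profile might report.

```python
import re
from collections import Counter


def pattern_of(value: str) -> str:
    """Reduce a value to its shape: every digit becomes 9, every letter becomes A."""
    return re.sub(r"[A-Za-z]", "A", re.sub(r"\d", "9", value))


def profile_column(values):
    """Return basic profile statistics for one column of raw values."""
    # Treat None and empty strings as missing values.
    non_null = [v for v in values if v not in (None, "")]
    patterns = Counter(pattern_of(str(v)) for v in non_null)
    return {
        "rows": len(values),
        "nulls": len(values) - len(non_null),
        "distinct": len(set(non_null)),
        "top_patterns": patterns.most_common(3),
    }


# Hypothetical wellhead identifiers with a few typical issues:
# a missing value, an empty string, a duplicate, and an off-format entry.
well_ids = ["WH-1001", "WH-1002", "wh1003", None, "WH-1002", ""]
profile = profile_column(well_ids)
print(profile)
```

A profile like this surfaces the off-format value (`wh1003` has a different shape from the dominant `AA-9999` pattern) before any mapping is built, which is the point of profiling early in the project.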
14. Scale-up As Your Needs Grow
Diagram (each item marked with the IT role):
• High Availability
• Pushdown Optimization
• Enterprise Grid
• Concurrent Users
• Partitioned Data
Included in PowerCenter Advanced Edition 9.6
15. Manage Metadata for Better Data Insights
• Capabilities: Data Lineage; Consolidated Metadata Catalog; Federated Business Glossary
• Metadata sources: Mainframe; Flat Files; Database; Data Modeling; BI Tools; ERP
• Other diagram labels: Metadata Repository; Custom Metadata; Reports; 3rd party BI; Metadata Bookmarks
16. Common Biz Language Via Business Glossary
• Provide a common vocabulary of business terms
• Easily search for glossary assets with workflow
• Manage relationships with other assets
• Manage business policies governing the assets
Diagram label: Analyst
18. Improve Operational Confidence With Automated Testing and Monitoring
Diagram: End-to-End Agility
• Stages: Requirements Gathering; Prototype & Validate; Develop; Test; Deploy; Monitor
• Roles shown: Business, IT
• Other labels: Satisfied; Business-IT Collaboration; Self Service
19. Automate Data Validation Testing
Data Validation Testing Capability
Diagram labels:
• DVO Clients: Define Tests
• PowerCenter: Execute Tests, Write Results
• Enterprise Data
• DVO Repository & Warehouse
• Reports
• Database Views: V_Summary, V_Tests, V_Results (each drawn with the columns Id: name; name: string; Price: integer; Date in: date; Date out: date; Salary: float)
Data Accessed
• Relational databases
• Flat files
• Mainframe data
• DW Appliances
• Cloud-based data
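The define tests, execute tests, write results flow on this slide can be sketched generically in Python. This is an illustration only and does not use the DVO or PowerCenter APIs: it assumes an in-memory SQLite database, and the tables `src_orders` and `tgt_orders`, the test names, and the `run_tests` helper are all hypothetical.

```python
import sqlite3


def run_tests(conn, tests):
    """Run each (name, source_sql, target_sql) test and record PASS or FAIL."""
    results = []
    for name, source_sql, target_sql in tests:
        source_value = conn.execute(source_sql).fetchone()[0]
        target_value = conn.execute(target_sql).fetchone()[0]
        status = "PASS" if source_value == target_value else "FAIL"
        results.append((name, source_value, target_value, status))
    return results


# Hypothetical source and target tables; the target is missing one row.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE src_orders (id INTEGER, price INTEGER);
    CREATE TABLE tgt_orders (id INTEGER, price INTEGER);
    INSERT INTO src_orders VALUES (1, 100), (2, 250), (3, 75);
    INSERT INTO tgt_orders VALUES (1, 100), (2, 250);
""")

# "Define tests": each test compares one aggregate between source and target.
tests = [
    ("row_count", "SELECT COUNT(*) FROM src_orders", "SELECT COUNT(*) FROM tgt_orders"),
    ("price_sum", "SELECT SUM(price) FROM src_orders", "SELECT SUM(price) FROM tgt_orders"),
    ("min_id", "SELECT MIN(id) FROM src_orders", "SELECT MIN(id) FROM tgt_orders"),
]

# "Execute tests" and "write results": here the results are simply printed.
results = run_tests(conn, tests)
for row in results:
    print(row)
```

Comparing aggregates rather than full row sets keeps each test cheap; a real validation suite would typically add key-level comparisons to find which rows differ.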
20. Proactively Monitor with PowerCenter 9.6
Automated Monitoring and Detection (Source Feeds, Rules/Templates, Watchlists, Alerts)
• Configure / Build Rules
• Get PowerCenter Statistics
• Get Operating System, Database Statistics
• Monitor PowerCenter Operations
• Send Alerts to Stakeholders
Other diagram labels: PowerCenter WS Hub; PowerCenter Repository; Environment Information; Analyst; IT; IT Operations
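The loop this slide describes (build rules, gather statistics, send alerts) can be sketched as simple threshold rules in Python. This is a generic illustration under stated assumptions, not the PowerCenter or RulePoint API: the `Rule` class, the metric names, the thresholds, and the recipient addresses are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Rule:
    """A hypothetical threshold rule: alert when a metric exceeds a limit."""
    name: str
    metric: str
    threshold: float
    recipients: List[str]


def evaluate(rules: List[Rule], stats: Dict[str, float]) -> List[str]:
    """Return one alert message per rule whose metric exceeds its threshold."""
    alerts = []
    for rule in rules:
        value = stats.get(rule.metric)
        if value is not None and value > rule.threshold:
            alerts.append(
                f"{rule.name}: {rule.metric}={value} exceeds {rule.threshold}; "
                f"notify {', '.join(rule.recipients)}"
            )
    return alerts


# Hypothetical statistics, standing in for PowerCenter, OS, and database figures.
stats = {"session_runtime_min": 95, "cpu_percent": 40, "failed_sessions": 0}

rules = [
    Rule("Long-running session", "session_runtime_min", 60, ["it-ops@example.com"]),
    Rule("High CPU", "cpu_percent", 85, ["it-ops@example.com"]),
    Rule("Failed sessions", "failed_sessions", 0, ["analyst@example.com"]),
]

alerts = evaluate(rules, stats)
for alert in alerts:
    print(alert)
```

Keeping rules as data rather than code is what lets an analyst configure them without a deployment, which matches the Analyst and IT roles shown on the slide.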
21. Informatica on Hadoop
Informatica Execution on Hadoop Architecture
1. The entire Informatica mapping is translated to the optimal open source project.
2. Currently, MapReduce jobs are submitted to the Hadoop cluster.
3. Advanced mapping transformations are executed on Hadoop through User Defined Functions (UDFs) using Vibe.
Diagram labels: MapReduce, UDF, Flink
22. INFA’s Unified Platform = Strong Time-to-Value
“Informatica and Microsoft are so much more consistent than their competitors [because] the platforms provided by these companies support transferable skills across projects more flexibly than do their rivals.”
23. TCO – Informatica vs. Hand Coding
Average Costs (3-year TCO) per project per end point:
• Informatica: $8,500
• Hand Coding: $11,500
24. Informatica is Far More Productive than Hand Coding
Average Time to Develop by Project Type (Weeks)
Chart series: Hand coding, Informatica
Project types: Master Data Management, Data Warehousing, Data Migration, Application Integration
Value pairs in chart order (hand coding, Informatica): 2.4, 1; 2.4, 0.7; 5.3, 1.2; 2.7, 0.8
Depending on the project, hand coding can take more than 4 weeks longer to develop!
Source: “Comparative Costs and Uses for Data Integration Platforms”, Bloor Research, March 2014
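The arithmetic behind the cost and productivity comparison on slides 23 and 24 can be checked with a short Python sketch. It rests on two assumptions about how the chart values were extracted: that $8,500 belongs to Informatica and $11,500 to hand coding, and that the development times pair up as (hand coding, Informatica) in the order listed. The pairing of those times with specific project types is not assumed.

```python
# 3-year TCO per project per end point (assumed assignment of values to labels).
tco = {"Informatica": 8500, "Hand Coding": 11500}
savings = tco["Hand Coding"] - tco["Informatica"]
savings_pct = savings / tco["Hand Coding"] * 100

# Average weeks to develop, as (hand coding, Informatica) pairs in chart order.
weeks = [(2.4, 1.0), (2.4, 0.7), (5.3, 1.2), (2.7, 0.8)]
extra_weeks = [round(hand - infa, 1) for hand, infa in weeks]

print(f"TCO difference: ${savings:,} ({savings_pct:.1f}% of the hand-coding cost)")
print(f"Extra weeks for hand coding, per project type: {extra_weeks}")
print(f"Largest gap: {max(extra_weeks)} weeks")
```

Under these assumptions the largest gap is 5.3 minus 1.2, which is consistent with the slide's claim that hand coding can take more than 4 weeks longer.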
25. Big Data – Data Profiling on Hadoop
• Demo – Data Profiling on Hadoop
https://www.youtube.com/watch?v=Nd6UfuteiTY