SlideShare a Scribd company logo
1 of 16
Sponsored by:
BDW Meetup:
Big MDM Entity Resolution,
Matching and Linking in Hadoop
6:30 Networking
Grab some food and drink... Make some friends.
6:45 Joe Caserta
President
Caserta Concepts
Welcome + Intro to Big MDM
About the Meetup
7:15 George Corugedo,
CTO
RedPoint Global
Introduction and Overview of RedPoint
Demo of Customer Matching on Hadoop
8:00 Q&A Ask Questions, Share your experience
8:15 More Networking
Don’t leave until you make at least one new Data Nerd friend!
Agenda
• Big Data is a complex, rapidly changing
landscape
• We want to share our stories and hear
about yours
• Great networking opportunity for like
minded data nerds
• Founded by Caserta Concepts
• November 10, 2012
• Next BDW Meetup:
• March 3rd
• Topic: Graph Databases for MDM
• Location: NWC
About the BDW Meetup #BDWmeetup
#maximizeDataValue
@CasertaConcepts
@RedPointGlobal
Top 20 Big Data
Consulting - CIO Review
Launched Big Data practice
Co-author, with Ralph Kimball, The
Data Warehouse ETL Toolkit (Wiley)
Dedicated to Data Warehousing,
Business Intelligence since 1996
Began consulting database
programing and data modeling 25+ years hands-on experience
building database solutions
Founded Caserta Concepts in NYC
Web log analytics solution published
in Intelligent Enterprise
Formalized Alliances / Partnerships –
System Integrators
Partnered with Big Data vendors
Cloudera, Hortonworks, IBM, Cisco,
Datameer, Basho more…
Launched Training practice, teaching
data concepts world-wide
Laser focus on extending Data
Warehouses with Big Data solutions
1986
2004
1996
2009
2001
2010
2013
Launched Big Data Warehousing
(BDW) Meetup-NYC ~1500 Members
2012
2014
Established best practices for big
data ecosystem implementation –
Healthcare, Finance, Insurance
Top 20 Most Powerful
Big Data consulting firms
Dedicated to Data Governance
Techniques on Big Data (Innovation)
Caserta Timeline
About Caserta Concepts
• Award-winning technology innovation consulting with
expertise in:
• Big Data Solutions
• Data Warehousing
• Business Intelligence
• Core focus in the following industries:
• eCommerce / Retail / Marketing
• Financial Services / Insurance
• Healthcare / Ad Tech / Higher Ed
• Established in 2001:
• Increased growth year-over-year
• Industry recognized work force
• Strategy, Implementation
• Writing, Education, Mentoring
• Data Science & Analytics
• Cloud Computing
• Data Interaction & Visualization
Does this word cloud excite you?
Speak with us about our open positions: leslie@casertaconcepts.com
Help Wanted
Spark
Big Data Architect NoSQL
EC2,EMR,Redshift
Data
User
Interface
Services
WorkflowRules
Security
Members Providers
Agents Plans
Policies
Consistent Policy
Enforcement and Security
Integration with exiting
ecosystem
Data Governance through
Workflow Management
Data Quality enforcement
through metadata-driven
rules
Time-Variant Hierarchies
and attributes
High Performance,
Flexible, Scalable
Database – Think Graph!
Master Data Management Components
How MDM Works
Standardization
Matching
Survivorship
Validation
Publication
Staging
Library
Consolidated
Library
Standardization Matching
Integrated
Library
Survivorship
Source ID Name Home Address Birth Date SSN
SYS A 123 Jim Stagnitto 123 Main St 8/20/1959 123-45-6789
SYS B ABC J. Stagnitto 132 Main Street 8/20/1959 123-45-6789
SYS C XYZ James Stag NULL 8/20/1959 NULL
Source ID Name Home Address Birth Date SSN Std Name Std Addr MDM ID
SYS A 123 Jim Stagnitto 123 Main St 8/20/1959 123-45-6789 James Stagnitto 123 Main Street 1
SYS B ABC J. Stagnitto 132 Main Street 8/20/1959 123-45-6789 James Stagnitto 132 Main Street 1
SYS C XYZ James Stag NULL 8/20/1959 NULL James Stag NULL 1
MDM ID Name Home Address Birth Date SSN
1 James Stagnitto 123 Main Street 8/20/1959 123-45-6789
Mastering Data
Validation
Informational
Master Data
MDM Information Ecosystem
10
Operational
Master Data
Holistic
Master Data
Service
Leads
Policies
Claims
Enrolls
Sales
Finance
DW
Dimensions &
Cross-References
Marketing
Insights
11
Traditional Approaches to MDM
• Registry
• Transactional
• Co-Existence
App A App B
App A App BSingle Version of
the Truth
App A App BNon-Sensitive
Shared
Sensitive
App
New Master Data Management
 Traditional MDM will do depending on your data size and
requirements:
 Relational is awkward, extreme normalization, poor usability and
performance
NoSQL stores like HBase has benefits
 If you need super high performance low millisecond response
times to incorporate into your Big Data ETL
 Flexible Schema
 Graph database is near perfect fit. Relationships and graph
analysis bring master data to life!
Data quality and matching processes are required
Little to no community or vendor support
Achievable in Hadoop with YARN
MDM is best represented in a Graph database
Online
Phone
In Store
Levenshtein, others
Hadoop
Mastered Customer Design Approach
Technical Approach to Big MDM
Application
Integration
Data
Integration
Messaging Interface
(Kafka, Storm, Spark Streaming, etc.)
Bulk Interface
(Sqoop, Spark,
Map Reduce, etc.)
Customer Integration Service
(Hadoop, NoSQL)
Decision Support
(Hive, Impala, SQL
MPP, others)
Thank You
Joe Caserta
President, Caserta Concepts
joe@casertaconcepts.com
(914) 261-3648
@joe_Caserta
Free Stuff!!!
RAFFLE!!!

More Related Content

Viewers also liked

Architecting Security and Governance Across Multi Accounts
Architecting Security and Governance Across Multi AccountsArchitecting Security and Governance Across Multi Accounts
Architecting Security and Governance Across Multi AccountsAmazon Web Services
 
BVBA SOSIS van Jeroen Meus kent rustige start
BVBA SOSIS van Jeroen Meus kent rustige startBVBA SOSIS van Jeroen Meus kent rustige start
BVBA SOSIS van Jeroen Meus kent rustige startThierry Debels
 
Boston Devops Meetup June 22nd
Boston Devops Meetup June 22ndBoston Devops Meetup June 22nd
Boston Devops Meetup June 22ndmdilawari
 
Why choose VMware vCloud Suite Standard over vSOM
Why choose VMware vCloud Suite Standard over vSOMWhy choose VMware vCloud Suite Standard over vSOM
Why choose VMware vCloud Suite Standard over vSOMAnil Gupta (AJ) - vExpert
 
Deploy, Monitor and Manage in Style with WebSphere Liberty Admin Center
Deploy, Monitor and Manage in Style with WebSphere Liberty Admin CenterDeploy, Monitor and Manage in Style with WebSphere Liberty Admin Center
Deploy, Monitor and Manage in Style with WebSphere Liberty Admin CenterWASdev Community
 
Pre-Con Ed: Learn What's New in CA Spectrum®
Pre-Con Ed: Learn What's New in CA Spectrum®Pre-Con Ed: Learn What's New in CA Spectrum®
Pre-Con Ed: Learn What's New in CA Spectrum®CA Technologies
 
Cigniti joint webinar with Soasta - Agile DevOps: Test-driven IT Environment ...
Cigniti joint webinar with Soasta - Agile DevOps: Test-driven IT Environment ...Cigniti joint webinar with Soasta - Agile DevOps: Test-driven IT Environment ...
Cigniti joint webinar with Soasta - Agile DevOps: Test-driven IT Environment ...Cigniti Technologies Ltd
 
Dataiku pig - hive - cascading
Dataiku   pig - hive - cascadingDataiku   pig - hive - cascading
Dataiku pig - hive - cascadingDataiku
 
Modernizing the Legacy - How Dish is Adapting its SOA Services for a Cloud Fi...
Modernizing the Legacy - How Dish is Adapting its SOA Services for a Cloud Fi...Modernizing the Legacy - How Dish is Adapting its SOA Services for a Cloud Fi...
Modernizing the Legacy - How Dish is Adapting its SOA Services for a Cloud Fi...VMware Tanzu
 
Introducing the Big Data Ecosystem with Caserta Concepts & Talend
Introducing the Big Data Ecosystem with Caserta Concepts & TalendIntroducing the Big Data Ecosystem with Caserta Concepts & Talend
Introducing the Big Data Ecosystem with Caserta Concepts & TalendCaserta
 
Nano Server First Step
Nano Server First StepNano Server First Step
Nano Server First StepKazuki Takai
 
Security at Scale with AWS - AWS Summit Cape Town 2017
Security at Scale with AWS - AWS Summit Cape Town 2017 Security at Scale with AWS - AWS Summit Cape Town 2017
Security at Scale with AWS - AWS Summit Cape Town 2017 Amazon Web Services
 
Fracture du pied chez l'enfant
Fracture du pied chez l'enfantFracture du pied chez l'enfant
Fracture du pied chez l'enfantAyoub EL KADDOURI
 
Building an ai with raspberry pi
Building an ai with raspberry piBuilding an ai with raspberry pi
Building an ai with raspberry piHaesung Lee
 

Viewers also liked (15)

Architecting Security and Governance Across Multi Accounts
Architecting Security and Governance Across Multi AccountsArchitecting Security and Governance Across Multi Accounts
Architecting Security and Governance Across Multi Accounts
 
BVBA SOSIS van Jeroen Meus kent rustige start
BVBA SOSIS van Jeroen Meus kent rustige startBVBA SOSIS van Jeroen Meus kent rustige start
BVBA SOSIS van Jeroen Meus kent rustige start
 
Boston Devops Meetup June 22nd
Boston Devops Meetup June 22ndBoston Devops Meetup June 22nd
Boston Devops Meetup June 22nd
 
Why choose VMware vCloud Suite Standard over vSOM
Why choose VMware vCloud Suite Standard over vSOMWhy choose VMware vCloud Suite Standard over vSOM
Why choose VMware vCloud Suite Standard over vSOM
 
Deploy, Monitor and Manage in Style with WebSphere Liberty Admin Center
Deploy, Monitor and Manage in Style with WebSphere Liberty Admin CenterDeploy, Monitor and Manage in Style with WebSphere Liberty Admin Center
Deploy, Monitor and Manage in Style with WebSphere Liberty Admin Center
 
Pre-Con Ed: Learn What's New in CA Spectrum®
Pre-Con Ed: Learn What's New in CA Spectrum®Pre-Con Ed: Learn What's New in CA Spectrum®
Pre-Con Ed: Learn What's New in CA Spectrum®
 
Cigniti joint webinar with Soasta - Agile DevOps: Test-driven IT Environment ...
Cigniti joint webinar with Soasta - Agile DevOps: Test-driven IT Environment ...Cigniti joint webinar with Soasta - Agile DevOps: Test-driven IT Environment ...
Cigniti joint webinar with Soasta - Agile DevOps: Test-driven IT Environment ...
 
Dataiku pig - hive - cascading
Dataiku   pig - hive - cascadingDataiku   pig - hive - cascading
Dataiku pig - hive - cascading
 
Modernizing the Legacy - How Dish is Adapting its SOA Services for a Cloud Fi...
Modernizing the Legacy - How Dish is Adapting its SOA Services for a Cloud Fi...Modernizing the Legacy - How Dish is Adapting its SOA Services for a Cloud Fi...
Modernizing the Legacy - How Dish is Adapting its SOA Services for a Cloud Fi...
 
Introducing the Big Data Ecosystem with Caserta Concepts & Talend
Introducing the Big Data Ecosystem with Caserta Concepts & TalendIntroducing the Big Data Ecosystem with Caserta Concepts & Talend
Introducing the Big Data Ecosystem with Caserta Concepts & Talend
 
Nano Server First Step
Nano Server First StepNano Server First Step
Nano Server First Step
 
Cloud developer evolution
Cloud developer evolutionCloud developer evolution
Cloud developer evolution
 
Security at Scale with AWS - AWS Summit Cape Town 2017
Security at Scale with AWS - AWS Summit Cape Town 2017 Security at Scale with AWS - AWS Summit Cape Town 2017
Security at Scale with AWS - AWS Summit Cape Town 2017
 
Fracture du pied chez l'enfant
Fracture du pied chez l'enfantFracture du pied chez l'enfant
Fracture du pied chez l'enfant
 
Building an ai with raspberry pi
Building an ai with raspberry piBuilding an ai with raspberry pi
Building an ai with raspberry pi
 

More from Caserta

Using Machine Learning & Spark to Power Data-Driven Marketing
Using Machine Learning & Spark to Power Data-Driven MarketingUsing Machine Learning & Spark to Power Data-Driven Marketing
Using Machine Learning & Spark to Power Data-Driven MarketingCaserta
 
Data Intelligence: How the Amalgamation of Data, Science, and Technology is C...
Data Intelligence: How the Amalgamation of Data, Science, and Technology is C...Data Intelligence: How the Amalgamation of Data, Science, and Technology is C...
Data Intelligence: How the Amalgamation of Data, Science, and Technology is C...Caserta
 
Creating a DevOps Practice for Analytics -- Strata Data, September 28, 2017
Creating a DevOps Practice for Analytics -- Strata Data, September 28, 2017Creating a DevOps Practice for Analytics -- Strata Data, September 28, 2017
Creating a DevOps Practice for Analytics -- Strata Data, September 28, 2017Caserta
 
General Data Protection Regulation - BDW Meetup, October 11th, 2017
General Data Protection Regulation - BDW Meetup, October 11th, 2017General Data Protection Regulation - BDW Meetup, October 11th, 2017
General Data Protection Regulation - BDW Meetup, October 11th, 2017Caserta
 
Integrating the CDO Role Into Your Organization; Managing the Disruption (MIT...
Integrating the CDO Role Into Your Organization; Managing the Disruption (MIT...Integrating the CDO Role Into Your Organization; Managing the Disruption (MIT...
Integrating the CDO Role Into Your Organization; Managing the Disruption (MIT...Caserta
 
Architecting Data For The Modern Enterprise - Data Summit 2017, Closing Keynote
Architecting Data For The Modern Enterprise - Data Summit 2017, Closing KeynoteArchitecting Data For The Modern Enterprise - Data Summit 2017, Closing Keynote
Architecting Data For The Modern Enterprise - Data Summit 2017, Closing KeynoteCaserta
 
Introduction to Data Science (Data Summit, 2017)
Introduction to Data Science (Data Summit, 2017)Introduction to Data Science (Data Summit, 2017)
Introduction to Data Science (Data Summit, 2017)Caserta
 
Looker Data Modeling in the Age of Cloud - BDW Meetup May 2, 2017
Looker Data Modeling in the Age of Cloud - BDW Meetup May 2, 2017Looker Data Modeling in the Age of Cloud - BDW Meetup May 2, 2017
Looker Data Modeling in the Age of Cloud - BDW Meetup May 2, 2017Caserta
 
The Rise of the CDO in Today's Enterprise
The Rise of the CDO in Today's EnterpriseThe Rise of the CDO in Today's Enterprise
The Rise of the CDO in Today's EnterpriseCaserta
 
Building a New Platform for Customer Analytics
Building a New Platform for Customer Analytics Building a New Platform for Customer Analytics
Building a New Platform for Customer Analytics Caserta
 
Building New Data Ecosystem for Customer Analytics, Strata + Hadoop World, 2016
Building New Data Ecosystem for Customer Analytics, Strata + Hadoop World, 2016Building New Data Ecosystem for Customer Analytics, Strata + Hadoop World, 2016
Building New Data Ecosystem for Customer Analytics, Strata + Hadoop World, 2016Caserta
 
You're the New CDO, Now What?
You're the New CDO, Now What?You're the New CDO, Now What?
You're the New CDO, Now What?Caserta
 
The Data Lake - Balancing Data Governance and Innovation
The Data Lake - Balancing Data Governance and Innovation The Data Lake - Balancing Data Governance and Innovation
The Data Lake - Balancing Data Governance and Innovation Caserta
 
Making Big Data Easy for Everyone
Making Big Data Easy for EveryoneMaking Big Data Easy for Everyone
Making Big Data Easy for EveryoneCaserta
 
Benefits of the Azure Cloud
Benefits of the Azure CloudBenefits of the Azure Cloud
Benefits of the Azure CloudCaserta
 
Big Data Analytics on the Cloud
Big Data Analytics on the CloudBig Data Analytics on the Cloud
Big Data Analytics on the CloudCaserta
 
Intro to Data Science on Hadoop
Intro to Data Science on HadoopIntro to Data Science on Hadoop
Intro to Data Science on HadoopCaserta
 
The Emerging Role of the Data Lake
The Emerging Role of the Data LakeThe Emerging Role of the Data Lake
The Emerging Role of the Data LakeCaserta
 
Not Your Father's Database by Databricks
Not Your Father's Database by DatabricksNot Your Father's Database by Databricks
Not Your Father's Database by DatabricksCaserta
 
Mastering Customer Data on Apache Spark
Mastering Customer Data on Apache SparkMastering Customer Data on Apache Spark
Mastering Customer Data on Apache SparkCaserta
 

More from Caserta (20)

Using Machine Learning & Spark to Power Data-Driven Marketing
Using Machine Learning & Spark to Power Data-Driven MarketingUsing Machine Learning & Spark to Power Data-Driven Marketing
Using Machine Learning & Spark to Power Data-Driven Marketing
 
Data Intelligence: How the Amalgamation of Data, Science, and Technology is C...
Data Intelligence: How the Amalgamation of Data, Science, and Technology is C...Data Intelligence: How the Amalgamation of Data, Science, and Technology is C...
Data Intelligence: How the Amalgamation of Data, Science, and Technology is C...
 
Creating a DevOps Practice for Analytics -- Strata Data, September 28, 2017
Creating a DevOps Practice for Analytics -- Strata Data, September 28, 2017Creating a DevOps Practice for Analytics -- Strata Data, September 28, 2017
Creating a DevOps Practice for Analytics -- Strata Data, September 28, 2017
 
General Data Protection Regulation - BDW Meetup, October 11th, 2017
General Data Protection Regulation - BDW Meetup, October 11th, 2017General Data Protection Regulation - BDW Meetup, October 11th, 2017
General Data Protection Regulation - BDW Meetup, October 11th, 2017
 
Integrating the CDO Role Into Your Organization; Managing the Disruption (MIT...
Integrating the CDO Role Into Your Organization; Managing the Disruption (MIT...Integrating the CDO Role Into Your Organization; Managing the Disruption (MIT...
Integrating the CDO Role Into Your Organization; Managing the Disruption (MIT...
 
Architecting Data For The Modern Enterprise - Data Summit 2017, Closing Keynote
Architecting Data For The Modern Enterprise - Data Summit 2017, Closing KeynoteArchitecting Data For The Modern Enterprise - Data Summit 2017, Closing Keynote
Architecting Data For The Modern Enterprise - Data Summit 2017, Closing Keynote
 
Introduction to Data Science (Data Summit, 2017)
Introduction to Data Science (Data Summit, 2017)Introduction to Data Science (Data Summit, 2017)
Introduction to Data Science (Data Summit, 2017)
 
Looker Data Modeling in the Age of Cloud - BDW Meetup May 2, 2017
Looker Data Modeling in the Age of Cloud - BDW Meetup May 2, 2017Looker Data Modeling in the Age of Cloud - BDW Meetup May 2, 2017
Looker Data Modeling in the Age of Cloud - BDW Meetup May 2, 2017
 
The Rise of the CDO in Today's Enterprise
The Rise of the CDO in Today's EnterpriseThe Rise of the CDO in Today's Enterprise
The Rise of the CDO in Today's Enterprise
 
Building a New Platform for Customer Analytics
Building a New Platform for Customer Analytics Building a New Platform for Customer Analytics
Building a New Platform for Customer Analytics
 
Building New Data Ecosystem for Customer Analytics, Strata + Hadoop World, 2016
Building New Data Ecosystem for Customer Analytics, Strata + Hadoop World, 2016Building New Data Ecosystem for Customer Analytics, Strata + Hadoop World, 2016
Building New Data Ecosystem for Customer Analytics, Strata + Hadoop World, 2016
 
You're the New CDO, Now What?
You're the New CDO, Now What?You're the New CDO, Now What?
You're the New CDO, Now What?
 
The Data Lake - Balancing Data Governance and Innovation
The Data Lake - Balancing Data Governance and Innovation The Data Lake - Balancing Data Governance and Innovation
The Data Lake - Balancing Data Governance and Innovation
 
Making Big Data Easy for Everyone
Making Big Data Easy for EveryoneMaking Big Data Easy for Everyone
Making Big Data Easy for Everyone
 
Benefits of the Azure Cloud
Benefits of the Azure CloudBenefits of the Azure Cloud
Benefits of the Azure Cloud
 
Big Data Analytics on the Cloud
Big Data Analytics on the CloudBig Data Analytics on the Cloud
Big Data Analytics on the Cloud
 
Intro to Data Science on Hadoop
Intro to Data Science on HadoopIntro to Data Science on Hadoop
Intro to Data Science on Hadoop
 
The Emerging Role of the Data Lake
The Emerging Role of the Data LakeThe Emerging Role of the Data Lake
The Emerging Role of the Data Lake
 
Not Your Father's Database by Databricks
Not Your Father's Database by DatabricksNot Your Father's Database by Databricks
Not Your Father's Database by Databricks
 
Mastering Customer Data on Apache Spark
Mastering Customer Data on Apache SparkMastering Customer Data on Apache Spark
Mastering Customer Data on Apache Spark
 

Recently uploaded

MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsNanddeep Nachan
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdfSandro Moreira
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024The Digital Insurer
 
WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...apidays
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesrafiqahmad00786416
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWERMadyBayot
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamUiPathCommunity
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfOrbitshub
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...Zilliz
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontologyjohnbeverley2021
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Orbitshub
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 

Recently uploaded (20)

MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering Developers
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontology
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 

Big MDM: Entity Resolution, Matching and Linking in Hadoop

  • 1. Sponsored by: BDW Meetup: Big MDM Entity Resolution, Matching and Linking in Hadoop
  • 2. 6:30 Networking Grab some food and drink... Make some friends. 6:45 Joe Caserta President Caserta Concepts Welcome + Intro to Big MDM About the Meetup 7:15 George Corugedo, CTO RedPoint Global Introduction and Overview of RedPoint Demo of Customer Matching on Hadoop 8:00 Q&A Ask Questions, Share your experience 8:15 More Networking Don’t leave until you make at least one new Data Nerd friend! Agenda
  • 3. • Big Data is a complex, rapidly changing landscape • We want to share our stories and hear about yours • Great networking opportunity for like minded data nerds • Founded by Caserta Concepts • November 10, 2012 • Next BDW Meetup: • March 3rd • Topic: Graph Databases for MDM • Location: NWC About the BDW Meetup #BDWmeetup #maximizeDataValue @CasertaConcepts @RedPointGlobal
  • 4. Top 20 Big Data Consulting - CIO Review Launched Big Data practice Co-author, with Ralph Kimball, The Data Warehouse ETL Toolkit (Wiley) Dedicated to Data Warehousing, Business Intelligence since 1996 Began consulting database programing and data modeling 25+ years hands-on experience building database solutions Founded Caserta Concepts in NYC Web log analytics solution published in Intelligent Enterprise Formalized Alliances / Partnerships – System Integrators Partnered with Big Data vendors Cloudera, Hortonworks, IBM, Cisco, Datameer, Basho more… Launched Training practice, teaching data concepts world-wide Laser focus on extending Data Warehouses with Big Data solutions 1986 2004 1996 2009 2001 2010 2013 Launched Big Data Warehousing (BDW) Meetup-NYC ~1500 Members 2012 2014 Established best practices for big data ecosystem implementation – Healthcare, Finance, Insurance Top 20 Most Powerful Big Data consulting firms Dedicated to Data Governance Techniques on Big Data (Innovation) Caserta Timeline
  • 5. About Caserta Concepts • Award-winning technology innovation consulting with expertise in: • Big Data Solutions • Data Warehousing • Business Intelligence • Core focus in the following industries: • eCommerce / Retail / Marketing • Financial Services / Insurance • Healthcare / Ad Tech / Higher Ed • Established in 2001: • Increased growth year-over-year • Industry recognized work force • Strategy, Implementation • Writing, Education, Mentoring • Data Science & Analytics • Cloud Computing • Data Interaction & Visualization
  • 6. Does this word cloud excite you? Speak with us about our open positions: leslie@casertaconcepts.com Help Wanted Spark Big Data Architect NoSQL EC2,EMR,Redshift
  • 7. Data User Interface Services WorkflowRules Security Members Providers Agents Plans Policies Consistent Policy Enforcement and Security Integration with exiting ecosystem Data Governance through Workflow Management Data Quality enforcement through metadata-driven rules Time-Variant Hierarchies and attributes High Performance, Flexible, Scalable Database – Think Graph! Master Data Management Components
  • 9. Staging Library Consolidated Library Standardization Matching Integrated Library Survivorship Source ID Name Home Address Birth Date SSN SYS A 123 Jim Stagnitto 123 Main St 8/20/1959 123-45-6789 SYS B ABC J. Stagnitto 132 Main Street 8/20/1959 123-45-6789 SYS C XYZ James Stag NULL 8/20/1959 NULL Source ID Name Home Address Birth Date SSN Std Name Std Addr MDM ID SYS A 123 Jim Stagnitto 123 Main St 8/20/1959 123-45-6789 James Stagnitto 123 Main Street 1 SYS B ABC J. Stagnitto 132 Main Street 8/20/1959 123-45-6789 James Stagnitto 132 Main Street 1 SYS C XYZ James Stag NULL 8/20/1959 NULL James Stag NULL 1 MDM ID Name Home Address Birth Date SSN 1 James Stagnitto 123 Main Street 8/20/1959 123-45-6789 Mastering Data Validation
  • 10. Informational Master Data MDM Information Ecosystem 10 Operational Master Data Holistic Master Data Service Leads Policies Claims Enrolls Sales Finance DW Dimensions & Cross-References Marketing Insights
  • 11. 11 Traditional Approaches to MDM • Registry • Transactional • Co-Existence App A App B App A App BSingle Version of the Truth App A App BNon-Sensitive Shared Sensitive App
  • 12. New Master Data Management  Traditional MDM will do depending on your data size and requirements:  Relational is awkward, extreme normalization, poor usability and performance NoSQL stores like HBase has benefits  If you need super high performance low millisecond response times to incorporate into your Big Data ETL  Flexible Schema  Graph database is near perfect fit. Relationships and graph analysis bring master data to life! Data quality and matching processes are required Little to no community or vendor support Achievable in Hadoop with YARN MDM is best represented in a Graph database
  • 14. Technical Approach to Big MDM Application Integration Data Integration Messaging Interface (Kafka, Storm, Spark Streaming, etc.) Bulk Interface (Sqoop, Spark, Map Reduce, etc.) Customer Integration Service (Hadoop, NoSQL) Decision Support (Hive, Impala, SQL MPP, others)
  • 15. Thank You Joe Caserta President, Caserta Concepts joe@casertaconcepts.com (914) 261-3648 @joe_Caserta

Editor's Notes

  1. Workflow: OpenSymphony Rules: Drools Database: Neo4j Interface: Cytoscape
  2. Describe the workings of a full-feature MDM – near-real-time operational features, batch features Extensibility into other forms of master data: i.e.: Product, etc.