Three Big Data Case Studies
Great use cases of Big Data
Big Data Exploration
Find, visualize, and understand all big data to improve decision making
Enhanced 360° View of the Customer
Extend existing customer views (CRM, etc.) by incorporating additional internal and external information sources
Security/Intelligence Extension
Lower risk, detect fraud, and monitor cyber security in real time
Data Warehouse Augmentation
Integrate big data and data warehouse capabilities to increase operational efficiency
Operations Analysis
Analyze a variety of machine data for improved business results
• Greater efficiencies in business processes
• New insights from combining and analyzing data types in new ways
• Develop new business models with resulting increased market presence and revenue
Why Big Data
File Systems
Relational Data
Content Mgmt
Email
CRM
Supply Chain
ERP
RSS Feeds
Cloud
Custom Sources
Data
Views
Applications/Users
Atidan Approach
• Implement a Hadoop-centric reference architecture
• Move enterprise batch processing to Hadoop
• Make Hadoop the single point of truth
• Massively reduce ETL by transforming within Hadoop
• Move results and aggregates back to legacy systems for consumption
• Retain, within Hadoop, source files at the finest granularity for re-use
Top Criteria
• Allow users to use familiar consumption interfaces (web, mobile)
• Enable businesses to unlock previously unusable data
Unlock Big Data
Simplify Your Warehouse
Preprocess Raw Data
Ingest
Big Data Architecture: High Level
Atidan Case Study
Usage Analysis using Hadoop
• Business Need
• A large conglomerate had to analyze the last 10 years of usage of its web applications using the IIS logs
• The logs received from IIS were stored in multiple files (e.g., daily logs)
• The data was unstructured free text and also contained irrelevant data
• The exact analysis criteria, parameters, and desired outcomes were not known in advance
• Solution
• A traditional RDBMS could not handle the problem because of the type and volume of the data and the uncertainty around the ultimate analysis criteria
• Atidan delivered a Hadoop-based solution that transformed the raw data into reports
• The solution was tolerant of data inconsistencies
• Hadoop provided elasticity for incremental data addition
• Scalability in the petabyte range
• Depending on data size and complexity, processing can be scaled from one node to 100 nodes
• The schema-less architecture made it possible to change the data model and analytics dynamically, even at a late stage in the project
• The organization gained completely new and unexpected insights into employee, customer, and vendor/partner behavior
• Correlations between employees' usage patterns and attrition, as well as productivity, were established
Atidan Case Study
Usage Analysis using Hadoop
[Chart: Request Types. Request counts (axis 0 to 14000) by HTTP response: Accepted…, BadRequest…, Created(201), Forbidden…, Not…, NotFound…, OK(200), Unauthorise…]
[Chart: Monthly Requests. Request counts (axis 0 to 1200) by month for 2001 and 2002]
[Chart: Users. Request counts (axis 0 to 600000) per user: Amare, Amit, Bhagat, Mukesh, Praneel, Sanjog, Vimal]
• The volume of data collected and analyzed in industry for business intelligence (BI) is growing rapidly, making traditional warehousing solutions prohibitively expensive
• MapReduce is low level and complex to write
• Hive provides a high-level, SQL-like query language
• This allows for ad hoc analysis
• The business need not know in advance which patterns to look for
Big Query - Hive
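To make the idea of ad hoc, SQL-like analysis concrete, here is a minimal sketch that runs a declarative query over a small request table. It uses Python's built-in `sqlite3` module purely as a stand-in, since Hive itself needs a cluster; the table name, columns, and rows are hypothetical. A similar SELECT with GROUP BY should be expressible in HiveQL, but the exact dialect details are not covered by the deck.

```python
import sqlite3

# In-memory SQLite database used only as a stand-in for a Hive table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE requests (username TEXT, month TEXT, status INTEGER)")
conn.executemany(
    "INSERT INTO requests VALUES (?, ?, ?)",
    [
        ("amit", "2001-01", 200),
        ("amit", "2001-01", 404),
        ("vimal", "2001-02", 201),
        ("vimal", "2001-02", 200),
        ("amit", "2001-02", 200),
    ],
)

# An ad hoc question decided after the data was loaded:
# which users had the most successful (200) requests?
AD_HOC_QUERY = """
    SELECT username, COUNT(*) AS hits
    FROM requests
    WHERE status = 200
    GROUP BY username
    ORDER BY hits DESC, username
"""

rows = conn.execute(AD_HOC_QUERY).fetchall()
for username, hits in rows:
    print(username, hits)
```

The point of the sketch is that the question is a few declarative lines rather than a hand-written map and reduce program, so a new question only needs a new query.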
Atidan Case Study
Customer data collection (KYC) using Hadoop
• Business Need
• A financial institution had to collect customer data periodically
• Customers are very reluctant to provide updated data
• This customer data has to be cross-checked against the billions of transactions the institution receives per day
• The institution wants to collate data available in the public domain from known social media sites
• The data was unstructured free text and also contained irrelevant data
• Solution
• A graph database is constructed over the extracted social data to analyze transactions
• Atidan delivered a Hadoop-based solution that transformed the raw data into a graph database
• Aggregates customer information from existing sources, social media, and government sources
• Analyzes transactions to find hidden patterns
• Enables link analysis and risk monitoring
• Facilitates decision making (new products) and customer discovery
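The link analysis mentioned above can be pictured as a bounded graph traversal. The sketch below is a minimal, assumption-laden illustration in plain Python: the entity identifiers, edge labels, and the `linked_entities` helper are all hypothetical, and an in-memory adjacency map stands in for whatever graph database the actual solution used.

```python
from collections import defaultdict, deque

# Hypothetical edges: (source, target, relationship). In the case study these
# would be derived from customer records, transactions, and public data.
EDGES = [
    ("cust:A", "acct:1", "owns"),
    ("cust:B", "acct:2", "owns"),
    ("acct:1", "acct:2", "transfer"),
    ("cust:B", "org:X", "director_of"),
    ("cust:C", "acct:3", "owns"),
]


def build_graph(edges):
    """Build an undirected adjacency map; edge labels are ignored here."""
    graph = defaultdict(set)
    for src, dst, _label in edges:
        graph[src].add(dst)
        graph[dst].add(src)
    return graph


def linked_entities(graph, start, max_hops):
    """Breadth-first search: map each entity within max_hops to its hop count."""
    hops = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if hops[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour in graph.get(node, ()):
            if neighbour not in hops:
                hops[neighbour] = hops[node] + 1
                queue.append(neighbour)
    del hops[start]
    return hops


links = linked_entities(build_graph(EDGES), "cust:A", max_hops=3)
for entity in sorted(links, key=links.get):
    print(entity, links[entity])
```

Here customer A reaches customer B in three hops through a transfer between their accounts, while customer C stays unlinked; relationships of this kind are what risk monitoring and peer group analysis would build on.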
Atidan Case Study
Customer data collection (KYC) using Hadoop
Big Data Processing
Graph Database
Customer Clustering
Income/Expense changes
Corporate structure changes
AML
Peer group analysis
Pattern Analysis
Customer Information
Web
Social
Channel Partners
Utility Providers
Aadhar UIDAI
• Lowers the cost of follow-up with users
• Reduces losses by highlighting risky users early
• Graph-database-based AML
• Insights into
• New products
• New customers
• New loans to existing customers
• New investment opportunities for customers
• Reduces operational errors
• Traceability of data source
Advantages of Hadoop (KYC) Solution to Banks
AML
Graph Queries
Due Diligence
Risk
Credit Scoring
Mitigation
Analysis
Peer groups
New Prospects
Insights
New Products
New Customers
Atidan Case Study
Email scanning and categorization using MongoDB
Business Need
Retrieve potentially millions of daily emails from a common webmail account, categorize them, and post them to each user's page for frontend access
The existing process had significant performance, reliability, and scalability issues. Users also received a lot of spam
Solution
Atidan proposed a MongoDB-Drupal based solution with the following approach:
• A scheduler was created to pull only the headers from the common webmail account shared by all users
• The headers were stored in an intermediate catalog in MongoDB
• The data was transformed based on the recipient address and user preferences, and spam was removed. Email bodies were fetched for the filtered records and saved into the final catalog in MongoDB
• Emails from the final catalog were pushed to the frontend platform (Drupal)
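The staged approach above (headers first, filter, then fetch bodies only for what survives) can be sketched as follows. This is an illustration under stated assumptions, not the delivered system: the deck names Node.js and MongoDB, whereas this sketch uses plain Python lists and dicts as stand-ins for the two catalogs, and the mailbox contents, preference structure, and function names are hypothetical.

```python
# Hypothetical shared mailbox: message id -> (headers, body).
MAILBOX = {
    1: ({"to": "amit@example.com", "from": "billing@vendor.example",
         "subject": "Invoice 42"}, "Invoice attached."),
    2: ({"to": "amit@example.com", "from": "promo@spam.example",
         "subject": "WIN BIG"}, "Click here."),
    3: ({"to": "vimal@example.com", "from": "hr@corp.example",
         "subject": "Leave policy"}, "Updated policy."),
}

# Hypothetical per-user preferences used for filtering.
PREFERENCES = {
    "amit@example.com": {"blocked_senders": {"promo@spam.example"}},
    "vimal@example.com": {"blocked_senders": set()},
}


def pull_headers(mailbox):
    """Stage 1: copy only the headers into the intermediate catalog."""
    return [{"_id": msg_id, **headers}
            for msg_id, (headers, _body) in mailbox.items()]


def is_wanted(doc, preferences):
    """Keep mail for known recipients whose sender is not blocked."""
    prefs = preferences.get(doc["to"])
    return prefs is not None and doc["from"] not in prefs["blocked_senders"]


def build_final_catalog(intermediate, mailbox, preferences):
    """Stage 2: filter on headers, then fetch bodies for the survivors."""
    final = []
    for doc in intermediate:
        if is_wanted(doc, preferences):
            final.append({**doc, "body": mailbox[doc["_id"]][1]})
    return final


def group_by_recipient(final):
    """Stage 3: group subjects per user, ready to push to the frontend."""
    pages = {}
    for doc in final:
        pages.setdefault(doc["to"], []).append(doc["subject"])
    return pages


intermediate = pull_headers(MAILBOX)
final = build_final_catalog(intermediate, MAILBOX, PREFERENCES)
pages = group_by_recipient(final)
print(pages)
```

Fetching bodies only after filtering keeps the expensive step proportional to the mail users actually want, which is one plausible reading of the performance gain the case study describes.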
Key Takeaways
• MongoDB was used to process the "Big Data" of millions of daily emails. It is much faster, easy to scale, and very flexible
• The task was split into multiple sub-tasks, and a better algorithm was used for performance and efficiency
• Node.js (data transformation)
• MongoDB (database)
• Schema-less
• RESTful service to access data from the browser
• Drupal (frontend)
• The basic unit of data storage and transfer was the JSON object
• Storage and querying
• NoSQL/simple/schema-less database
• Advantages
• Highly scalable, very flexible, simple
• Connectivity
• Node.js
• Server-side JavaScript
Technologies used
Thank you!
www.atidan.com
social@atidan.com
