Is the Elephant in the room?

                                         Regunath B

                               regunathb@gmail.com
                                Twitter : @RegunathB
Quick read 1.8 million words?




The story is about a battle between great kings and sons, with the principal characters being
Arjuna, Pandu, Bhishma, Bharata, Karna, Duryodhana, Yudhishthira etc.
                                                            Source : The Gramener blog for visualizations –
                                                Analysis of the entire text contained in the Mahabharatha
                                                       (http://blog.gramener.com/category/visualisations)
Insights from Social Media




                         Source : ttwick Billionaires page (Bill Gates' Twitter Social Media profile)
                                         (http://ttwick.com/blog/bill-gates-twitter-social-media/)
Insights from Social Media




                                         Source : Impact page of Satyamevjayate
                             (http://www.satyamevjayate.in/impact/impact.php/)
What is Big Data?

●   Big Data challenges and opportunities arise when information in an enterprise
    exhibits the following characteristics:

     –   Volume
          ●   Transaction data from enterprise systems
                   –   For example : Financial transactions, Orders
     –   Variety
          ●   Structured and Unstructured data
                   –   For example : Customer contact, Social Media, Biometrics
     –   Velocity
          ●   High information arrival rates
                   –   For example : Application events, Tagging, Rating of content



●   Big Data opportunities arise when the enterprise is able to derive Value from the
    data characteristics defined above
Food for thought... on theorems and laws
●   Do hardware and technology trends affect your technology selection?
     –   CPU, RAM and disk size double every 18-24 months [Moore’s law]
     –   Disk seek time remains nearly constant, improving by only about 5% per year


●   Data Seek vs. Data transfer
     –   Software that leverages one of the above (or a combination):
         B+ tree index, LSM tree index, “Fractal tree”


●   CAP theorem effect – ability to achieve only 2 of 3 properties of shared-
    data systems : data Consistency, system Availability and tolerance to
    network Partitions


●   Bandwidth is the scarcest commodity in a Data Center
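
The seek-versus-transfer trade-off can be made concrete with a back-of-the-envelope calculation. The sketch below is illustrative only: the 10 ms seek time, 100 MB/s transfer rate, 1 TB data set and 100-byte record size are assumed figures, not numbers from the slides.

```python
# Back-of-the-envelope comparison of seek-bound vs. transfer-bound access.
# All hardware figures below are illustrative assumptions, not measurements.

SEEK_TIME_S = 0.010          # assumed average seek time: 10 ms
TRANSFER_RATE_BPS = 100e6    # assumed sustained transfer rate: 100 MB/s
DATASET_BYTES = 1e12         # assumed data set size: 1 TB
RECORD_BYTES = 100           # assumed record size: 100 bytes


def random_update_seconds(fraction: float) -> float:
    """Time to touch `fraction` of the records, paying one seek per record."""
    records = DATASET_BYTES / RECORD_BYTES
    return records * fraction * SEEK_TIME_S


def full_rewrite_seconds() -> float:
    """Time to stream the whole data set sequentially (read once, write once)."""
    return 2 * DATASET_BYTES / TRANSFER_RATE_BPS


if __name__ == "__main__":
    seek_bound = random_update_seconds(0.01)   # update 1% of records in place
    scan_bound = full_rewrite_seconds()
    print(f"1% random updates: {seek_bound / 3600:.1f} hours")
    print(f"full rewrite:      {scan_bound / 3600:.1f} hours")
```

Under these assumptions, updating even 1% of the records one seek at a time takes far longer than rewriting the entire data set sequentially. That gap is one reason write-heavy stores tend to favor structures that batch writes into sequential runs (LSM trees) over structures that update pages in place (B+ trees).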
Aadhaar Patterns & Technologies
•   Principles
     •   POJO based application implementation
     •   Light-weight, custom application container
     •   HTTP gateway for APIs

•   Compute Patterns
     •   Data Locality
     •   Distribute compute (within an OS process and across)

•   Compute Architectures
     •   SEDA – Staged Event Driven Architecture
     •   Master-Worker(s) Compute Grid

•   Data Access types
     •   High throughput streaming : bio-dedupe, analytics
     •   High volume, moderate latency : workflow, UID records
     •   High volume, low latency : auth, demo-dedupe, search – eAadhaar, KYC
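
As a rough illustration of the SEDA pattern named above, the following sketch wires three stages together with bounded queues and small thread pools inside one process. It is not Aadhaar's implementation: the `Stage` class, the stage names (validate, enrich, store) and the record fields are hypothetical, chosen only to show the shape of the pattern.

```python
# Minimal SEDA-style pipeline: each stage owns an input queue and a small
# pool of worker threads; stages communicate only through queues.
# Stage names, handlers and record fields are hypothetical.
import queue
import threading
from typing import Callable, Optional

_STOP = object()  # sentinel used to shut a stage down


class Stage:
    def __init__(self, name: str, handler: Callable, workers: int = 2,
                 next_stage: Optional["Stage"] = None):
        self.name = name
        self.handler = handler
        self.next_stage = next_stage
        # Bounded queue: a slow stage applies back-pressure to its producer.
        self.inbox: queue.Queue = queue.Queue(maxsize=100)
        self.threads = [threading.Thread(target=self._run, daemon=True)
                        for _ in range(workers)]
        for t in self.threads:
            t.start()

    def _run(self) -> None:
        while True:
            event = self.inbox.get()
            if event is _STOP:
                break
            result = self.handler(event)
            if self.next_stage is not None:
                self.next_stage.inbox.put(result)

    def submit(self, event) -> None:
        self.inbox.put(event)

    def shutdown(self) -> None:
        """Stop the workers once everything queued before this call is done."""
        for _ in self.threads:
            self.inbox.put(_STOP)
        for t in self.threads:
            t.join()


def run_pipeline(records):
    results: queue.Queue = queue.Queue()
    store = Stage("store", results.put, workers=1)
    enrich = Stage("enrich",
                   lambda r: {**r, "name": r["name"].strip().title()},
                   workers=2, next_stage=store)
    validate = Stage("validate",
                     lambda r: {**r, "valid": bool(r["name"].strip())},
                     workers=2, next_stage=enrich)
    for rec in records:
        validate.submit(rec)
    # Shut down in pipeline order so each stage drains before the next stops.
    for stage in (validate, enrich, store):
        stage.shutdown()
    out = []
    while not results.empty():
        out.append(results.get())
    return out
```

Each stage can be given its own worker count, which is what allows scaling within a process. Scaling across processes would mean replacing the in-memory queues with a messaging layer, which this sketch does not attempt. Output order is not guaranteed when a stage has more than one worker.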
Aadhaar Architecture
•   Real-time monitoring using Events
•   Work distribution using SEDA & Messaging
•   Ability to scale within JVM and across
•   Recovery through check-pointing
•   Sync HTTP based Auth gateway
•   Protocol Buffers & XML payloads
•   Sharded clusters
•   Near Real-time data delivery to warehouse
•   Nightly data-sets used to build dashboards, data marts and reports
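
Recovery through check-pointing can be sketched as follows. This is a minimal single-worker illustration, assuming progress is tracked as an index into an ordered batch and stored in a local JSON file; the file layout and the function names are hypothetical and not taken from the Aadhaar system.

```python
# Checkpointed batch processing: progress is persisted after each item so a
# restarted worker resumes where the previous run stopped.
# File layout and function names are hypothetical.
import json
import os
import tempfile


def load_checkpoint(path: str) -> int:
    """Return the index of the next unprocessed item (0 if no checkpoint)."""
    try:
        with open(path) as f:
            return json.load(f)["next_index"]
    except FileNotFoundError:
        return 0


def save_checkpoint(path: str, next_index: int) -> None:
    """Write the checkpoint atomically: temp file first, then rename."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump({"next_index": next_index}, f)
    os.replace(tmp, path)


def process_all(items, handler, checkpoint_path: str) -> int:
    """Process items from the last checkpoint onward; return count handled now."""
    start = load_checkpoint(checkpoint_path)
    handled = 0
    for i in range(start, len(items)):
        handler(items[i])
        save_checkpoint(checkpoint_path, i + 1)
        handled += 1
    return handled
```

If the worker crashes after `handler` returns but before the checkpoint is saved, that item is processed again on restart. The sketch therefore gives at-least-once behavior, and the handler should be safe to repeat.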
Putting data to work at Aadhaar
Deployment Monitoring
Big Data at Flipkart
 ●   Website traffic
      –   Millions of page hits per day – product catalogs, item availability, promotions,
          search
      –   Millions of active sessions and shopping carts
      –   Latencies measured in low single-digit milliseconds
 ●   Growing list of categories (Books, Mobiles, Toys, Personal, Home, Baby, Digital music...)
      –   Electronic inventory – MP3, eBooks, movies
 ●   New business models, newer channels
 ●   Understanding users, user profiles, social media, experience
      –   Terabytes of logs containing browsing behavior, data from multiple
          engagement channels
      –   Recommendations based on millions of possible item matches and relevance
          algorithms
Is the Elephant in the room?




From Wikipedia:

"Elephant in the room" is an English metaphorical idiom for an obvious truth that is being ignored
or goes unaddressed.




Big Data opportunities and challenges are real and present -
It is the Elephant in the room.
Some takeaways from experience


●   Make everything API based
●   Everything fails (hardware, software, network, storage)
     –   System must recover, retry transactions, and sort of self-heal
●   Security and privacy should not be an afterthought
●   Scalability does not come from one product
     –   Watch out for solution and technology stereotyping
●   Open scale out is the only way to go
     –   Heterogeneous, multi-vendor, commodity compute, growing in a linear fashion.
         Nothing else can adapt!
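
The "everything fails, so recover and retry" takeaway is often implemented as retry with exponential backoff. The sketch below is one possible shape, not a prescription: `TransientError`, the `retry` helper and its default parameters are assumptions made for illustration.

```python
# Retry with exponential backoff and jitter for transient failures.
# TransientError and the parameter defaults are assumptions for illustration.
import random
import time


class TransientError(Exception):
    """Raised by an operation that may succeed if tried again."""


def retry(operation, max_attempts: int = 5, base_delay: float = 0.1,
          max_delay: float = 5.0, sleep=time.sleep):
    """Call `operation` until it succeeds or `max_attempts` is exhausted."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts:
                raise  # give up: let the caller decide what to do next
            # Exponential backoff capped at max_delay, with random jitter
            # so that many clients do not retry in lock-step.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, delay))
```

Retrying is only safe when the operation is idempotent or the receiver can detect duplicates; otherwise a retried transaction may be applied twice.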
