Emerging Technologies
      DIY Analytics
IBM Software for a Smarter Planet


       Emerging Technology - What Do We Do?


       Innovation and collaboration in technologies
       that we hope will garner broad industry
       adoption within 12-18 months

       Our technology initiatives are refined based
       on the marketplace & evolution of web
       technologies

       Voice of the Customer – early & direct
       customer engagements (POCs) to iterate
       on both the technology and the business
       value




IBM Confidential                                Chart   2   © 2009 IBM Corporation


       Evolving Emerging Technology Focus Areas


         Big Data Analytics for Business
         Professionals - DIY Analytic Tool &
         middleware - enabling massive amounts
         of data to be analyzed for actionable
         insights

         Web Browser Application Platform -
         pushing the envelope of next
         generation RIA applications & tooling
         delivered with web browser reach &
         economics

         Mobile - next generation Enterprise-
         Consumer applications & architecture






       New Intelligence




                      DIY Analytics
     Making Hadoop accessible
    to the business professionals






       New Intelligence - New Class of Application On Horizon

        Hearing business users ask for the
        ability to directly manipulate, analyze &
        remix massive data sources & services
        • LOB: “… Google whetted my appetite... I
             want more customizable analytics with
             me in the driver's seat…”

        Leveraging easy-to-use, rich data
        manipulation metaphors like
        spreadsheets, etc.

        Rich visualizations to quickly identify
        insights

        Rich spectrum of DIY analytic applications emerging






       IBM Emerging Technology Project: BigSheets

        What is it?
        An insight engine for enabling ad-hoc business insights for
        business users - at web scale


        How does it work?
        Discovery Process
        1. Point BigSheets to data sources of interest
           • unstructured web data, feeds, XML, etc.
        2. Transform data into a form that can be analyzed
           • Unstructured data becomes semi-structured data
           • Example: name: Rod Smith, employer: IBM, state: GA
           • Apply analytics - enriching the data
        3. “What if” tooling - browser-based visual front end using a spreadsheet
           metaphor to create worksheets for exploring/visualizing the big data
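
        Step 2 of the discovery process can be sketched roughly as follows. This is a minimal illustration, not BigSheets code: the sentence pattern, the function names (extract_records, enrich) and the sample text are all assumptions made for the example.

```python
import re

# Hypothetical pattern for sentences like "<Name> works at <Employer> in <State>".
# A real extractor would handle far more varied text; this is only a sketch.
PATTERN = re.compile(
    r"(?P<name>[A-Z][a-z]+ [A-Z][a-z]+) works at "
    r"(?P<employer>[A-Z][A-Za-z]*) in (?P<state>[A-Z]{2})"
)

def extract_records(text):
    """Turn free text into semi-structured records (one dict per match)."""
    return [m.groupdict() for m in PATTERN.finditer(text)]

def enrich(record):
    """Toy 'apply analytics' step: add a derived field to the record."""
    enriched = dict(record)
    enriched["employer_is_ibm"] = record["employer"] == "IBM"
    return enriched

# Made-up sample input for illustration only.
sample = "Rod Smith works at IBM in GA. Someone said Jane Doe works at Acme in NY."
records = [enrich(r) for r in extract_records(sample)]
for r in records:
    print(r)
```

        Each resulting record has the name/employer/state shape shown in the example above and could then be loaded into a worksheet as one row.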



        What’s different?
        • Unlocking insights embedded in unstructured data
        • Analyzing data previously unavailable to analyze




       BigSheets: Framework on Hadoop


      Expanding upon the Hadoop stack
      • Visual tooling builds extensively on Pig

      BigSheets Architecture Characteristics:
      • Extensible via UDFs
      • REST API for customer choice of analytic service/engine
      • REST API for choice of visualization packages
      • Export content as feeds, XML, etc.
      • ...more to come






        BigSheets in action

        Crowd sourcing - Nikon: what are folks on Twitter saying about our cameras, by model?

        Input
        • Gather daily tweets for May
        • 64 million tweets per day
        • ~210 terabytes a month

        Map
        • Split data across cluster
        • Emit tweets mentioning Nikon cameras (key=Nikon D90, …)

        Reduce
        • Aggregate tweets for each Nikon model
        • D90: 300 tweets
        • D3000: 68 tweets

        Output
        • Perform sentiment analysis
        • “..Wow, Great, Incredible…”
        • “..Lousy, sucks, ...”
        • “..no RAW support...”
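
        The Input / Map / Reduce / Output flow above can be sketched in plain Python. This is a single-machine illustration, not Hadoop or BigSheets code: the model list, the word lists, the function names and the sample tweets are assumptions made up for the example.

```python
from collections import defaultdict

# Hypothetical list of camera models to look for.
MODELS = ["D3000", "D90"]

# Toy sentiment lexicon built from the words quoted on the slide.
POSITIVE = {"wow", "great", "incredible"}
NEGATIVE = {"lousy", "sucks"}

def map_tweet(tweet):
    """Map step: emit (model key, tweet) for each Nikon model mentioned."""
    text = tweet.upper()
    if "NIKON" not in text:
        return
    for model in MODELS:
        if model in text:
            yield ("Nikon " + model, tweet)

def reduce_by_model(pairs):
    """Reduce step: group emitted tweets by key and count them per model."""
    grouped = defaultdict(list)
    for key, tweet in pairs:
        grouped[key].append(tweet)
    return {key: len(tweets) for key, tweets in grouped.items()}

def sentiment(tweet):
    """Output step: positive word count minus negative word count."""
    words = {w.strip(".,!?").lower() for w in tweet.split()}
    return len(words & POSITIVE) - len(words & NEGATIVE)

# Made-up sample tweets for illustration only.
tweets = [
    "Wow, my Nikon D90 is incredible",
    "The Nikon D3000 has no RAW support?",
    "Nikon D90 low-light shots are great",
    "Lunch was great today",
]
pairs = [pair for t in tweets for pair in map_tweet(t)]
counts = reduce_by_model(pairs)
print(counts)
print({tweet: sentiment(tweet) for _key, tweet in pairs})
```

        On a real cluster the map step runs in parallel over splits of the input and the framework groups the emitted keys before the reduce step; here both happen in one process.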






       A Demonstration of BigSheets in action

        Crowd sourcing - What do people want to buy?

        • Gather
        • Created an analysis model, using IBM Content Analytics, looking for “buy signals”:
          • Verb phrase indicating the desire to get something
            • “I would really love a...”
          • Buy target (“I would really love to get myself a cool new phone”)
          • Brand, company, and opinion statements in the context of this buy statement
        • Deployed the analysis model into BigSheets, where it is deployed across the Hadoop cloud
          ★ In BigSheets each analysis model is considered a macro
        • Visualize the results
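
        A very rough sketch of a “buy signal” detector is shown below. It is not the IBM Content Analytics model described above; it substitutes a simple regular expression, and the phrase list and the name find_buy_signals are assumptions for illustration.

```python
import re

# Hypothetical desire phrases followed by a buy target. A real analysis
# model would use linguistic analysis rather than one fixed pattern.
BUY_SIGNAL = re.compile(
    r"\bI (?:would really love|want|need)"
    r"(?: to (?:get|buy)(?: myself)?)?"
    r" (?:a|an) (?P<target>[^.!?]+)",
    re.IGNORECASE,
)

def find_buy_signals(text):
    """Return the buy target of every desire phrase found in the text."""
    return [m.group("target").strip() for m in BUY_SIGNAL.finditer(text)]

print(find_buy_signals("I would really love to get myself a cool new phone"))
print(find_buy_signals("The weather is nice."))
```

        Brand, company and opinion statements would need additional patterns or a proper annotator applied to the text around each match.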



       Marketplace Application Example - British Library

       Web Archive Opportunity
       Libraries & archives are interested in collecting & preserving web data
       • British Library has opened the UK Web Archive portal for researchers & historians to explore
         preserved web content
       • Parliament nearing vote to give the British Library the nod to archive all .uk domain data,
         spanning 4 million sites & ~128 TB today
         • Today, web page classification for the 5000 British Library web sites is performed by 30 folks

       The Goal
       Can an ET technology project & IBM’s Classification Module (ICM) electronically classify & tag
       web content & enable/create visualizations?

       Web Content To Gather:
       • British Library gathered 1.48 TB of data - 4 web archive files comprising ~400,000 web
         pages from 300 archived websites
       • 4 machines (dual core), 1 TB HD, 8 GB RAM




       Marketplace Application Example: AmEx or IBM
       Project: Improve IP Portfolio Analysis for Mergers & Acquisitions
       “...please collect all US Patent filings… then let’s do…”

       Business Questions
       • Ongoing tracking of acquisitions and associated IP
       • Visualizations, e.g. corporate genealogy

       Knowledge of Interest:
       • Corporate genealogies
       • IP ownership roll-up
       • Patents ranked by citation
       • Augment analysis with items affecting IP value, inventor affiliation, citation rank by time

       Web Content To Gather:
       • SEC filings, e.g. annual and quarterly reports
       • USPTO patents, assignments and trademarks
       • Company press releases
       • Other M&A, inventor information from feeds, webpages
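
       “Patents ranked by citation” could be computed along these lines. This is a sketch under stated assumptions: citations are taken to be (citing, cited) pairs, and the patent identifiers and the name rank_by_citation are placeholders invented for the example.

```python
from collections import Counter

def rank_by_citation(citations):
    """Count how often each patent is cited and return (patent, count)
    pairs sorted from most cited to least cited."""
    counts = Counter(cited for _citing, cited in citations)
    return counts.most_common()

# Made-up (citing, cited) pairs with placeholder identifiers.
citations = [
    ("P1", "P9"), ("P2", "P9"), ("P3", "P9"),
    ("P1", "P8"), ("P2", "P8"),
    ("P3", "P7"),
]
ranking = rank_by_citation(citations)
print(ranking)
```

       Rolling the counts up by owner would take one more grouping step keyed on the assignee of each cited patent.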




       Let’s Talk Customers: AmEx or IBM
       American Express: Evaluating IP with large amounts of public and private data

       Gathered 1,400,000 U.S. patents on record from 2002 - 2009
       • The 1,400,000 cited/referenced another 6,100,000 U.S. & international patents
       ★ Odd fact: a few patents cited/referenced as many as 13,870 other patents
       • ~216 are AMEX patents
       ★ Of the AMEX patents, 90 were cited/referenced: 24 cited 1 time, through one cited 67 times

       • 3600 cases from Court of Appeals, Federal Circuit, 1993 - 2007 (Georgetown Law)
       ★ 43 mentions of U.S. patents issued between 2002 - 2009; relies on exact “Patent No. 9,999,999” match

       • Productivity improvement from weeks to hours
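
       The exact “Patent No. 9,999,999” match mentioned above might look like the following sketch. The regular expression, the name patent_mentions and the sample sentence (including its patent numbers) are assumptions for illustration, not the project's actual matcher.

```python
import re

# Matches only the literal form "Patent No. d,ddd,ddd" (seven digits).
PATENT_REF = re.compile(r"Patent No\. (\d,\d{3},\d{3})")

def patent_mentions(text):
    """Return the patent numbers written in the exact 'Patent No.' form."""
    return [int(num.replace(",", "")) for num in PATENT_REF.findall(text)]

# Made-up sample text; the numbers are placeholders.
opinion = (
    "infringement of U.S. Patent No. 6,123,456 and Patent No. 7,000,001; "
    "see also patent number 5,555,555."
)
mentions = patent_mentions(opinion)
print(mentions)
```

       The third reference in the sample is missed because it is not written in the exact form, which is the limitation the slide notes.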






       Conclusion


                        In God we trust
                   ...all others, bring data




Jun 2017 HUG: Large-Scale Machine Learning: Use Cases and Technologies
 
February 2017 HUG: Slow, Stuck, or Runaway Apps? Learn How to Quickly Fix Pro...
February 2017 HUG: Slow, Stuck, or Runaway Apps? Learn How to Quickly Fix Pro...February 2017 HUG: Slow, Stuck, or Runaway Apps? Learn How to Quickly Fix Pro...
February 2017 HUG: Slow, Stuck, or Runaway Apps? Learn How to Quickly Fix Pro...
 
February 2017 HUG: Exactly-once end-to-end processing with Apache Apex
February 2017 HUG: Exactly-once end-to-end processing with Apache ApexFebruary 2017 HUG: Exactly-once end-to-end processing with Apache Apex
February 2017 HUG: Exactly-once end-to-end processing with Apache Apex
 
February 2017 HUG: Data Sketches: A required toolkit for Big Data Analytics
February 2017 HUG: Data Sketches: A required toolkit for Big Data AnalyticsFebruary 2017 HUG: Data Sketches: A required toolkit for Big Data Analytics
February 2017 HUG: Data Sketches: A required toolkit for Big Data Analytics
 

Disruptive Applications with Hadoop__HadoopSummit2010

  • 1. Emerging Technologies DIY Analytics
  • 2. IBM Software for a Smarter Planet. Emerging Technology - What Do We Do?
      • Innovation/collaborations in technologies that we hope will garner broad industry adoption in a timeframe of 12-18 months
      • Our technology initiatives are refined based on the marketplace & the evolution of web technologies
      • Voice of the Customer: early & direct customer engagements (POCs) to iterate on both the technology and the business value
    IBM Confidential Chart 2 © 2009 IBM Corporation
  • 3-5. Evolving Emerging Technology Focus Areas
      • Big Data Analytics for Business Professionals: DIY analytic tool & middleware, enabling massive amounts of data to be analyzed for actionable insights
      • Web Browser Application Platform: pushing the envelope of next-generation RIA applications & tooling delivered with web browser reach & economics
      • Mobile: next-generation Enterprise-Consumer applications & architecture
  • 6. New Intelligence: DIY Analytics. Making Hadoop accessible to business professionals
  • 7. New Intelligence - New Class of Application on the Horizon
      We hear business users asking for the ability to directly manipulate, analyze & remix massive data sources & services
        • LOB: “…Google whetted my appetite... I want more customizable analytics with me in the driver’s seat…”
      Rich spectrum of DIY analytic applications emerging
        • Leveraging easy-to-use, rich data manipulation metaphors like spreadsheets, etc.
        • Rich visualizations to quickly identify insights
  • 8. IBM Emerging Technology Project: BigSheets
      What is it? An insight engine for enabling ad-hoc business insights for business users, at web scale
      How does it work? Discovery process:
        1. Point BigSheets to data sources of interest (unstructured web data, feeds, XML, etc.)
        2. Transform data into a form that can be analyzed
            • Unstructured data becomes semi-structured data
            • Example: name: Rod Smith, employer: IBM, state: GA
            • Apply analytics, enriching the data
        3. “What if” tooling: a browser-based visual front end with a spreadsheet metaphor to create worksheets for exploring/visualizing the big data
      What’s different?
        • Unlocking insights embedded in unstructured data
        • Analyzing data previously unavailable to analyze
  • 9. BigSheets: Framework on Hadoop
      Expanding upon the Hadoop stack
        • Visual tooling builds extensively on Pig
      BigSheets Architecture
      Characteristics:
        • Extensible via UDFs
        • REST API for customer choice of analytic service/engine
        • REST API for choice of visualization packages
        • Export content as feeds, XML, etc.
        • ...more to come
  • 10. BigSheets in action: crowd sourcing. Nikon: what are folks on Twitter saying about our cameras, by model?
      Input: gather daily tweets for May
        • 64 million tweets per day
        • ~210 terabytes a month
      Map
        • Split data across cluster
        • Emit tweets mentioning Nikon cameras (key=Nikon D90, …)
      Reduce
        • Aggregate tweets for each Nikon model
        • D90: 300 tweets
        • D3000: 68 tweets
      Output
        • Perform sentiment analysis
        • “..Wow, Great, Incredible…” “..Lousy, sucks, ...” “..no RAW support...”
  • 11. A demonstration of BigSheets in action: crowd sourcing. What do people want to buy?
      • Gather
      • Created an analysis model, using IBM Content Analytics, looking for ‘buy signals’:
          • Verb phrase indicating the desire to get something: “I would really love a...”
          • Buy target (“I would really love to get myself a cool new phone”)
          • Brand, company, and opinion statements in the context of this buy statement
      • Deployed the analysis model into BigSheets, where it gets deployed across the Hadoop cloud
          ★ In BigSheets each analysis model is considered a macro
      • Visualize the results
  • 12. Marketplace Application Example - British Library
      Web Archive Opportunity: libraries & archives are interested in collecting & preserving the web
        • British Library has opened the UK Web Archive portal for researchers & historians to explore preserved web content
        • Parliament nearing vote to give the British Library the nod to archive all .uk domain data, spanning 4 million sites & ~128 TB today
        • Today, web page classification for the 5000 British Library web sites is performed by 30 folks
      The Goal: can an ET technology project & IBM’s Classification Module (ICM) electronically classify & tag web content & enable/create data visualizations?
      Web Content To Gather:
        • British Library gathered 1.48 TB of data: 4 web archive files comprising ~400,000 web pages from 300 archived websites
        • 4 machines (dual core), 1 TB HD, 8 GB RAM
  • 13. Marketplace Application Example: AmEx or IBM
      Project: Improve IP Portfolio Analysis for Mergers & Acquisitions
        “...please collect all US Patent filings… then let’s do…”
      Business Questions
        • Ongoing tracking of acquisitions and associated IP
        • Visualizations, e.g. corporate genealogy
      Knowledge of Interest:
        • Corporate genealogies
        • IP ownership roll-up
        • Patents ranked by citation
        • Augment analysis with items affecting IP value, inventor affiliation, citation rank by time
      Web Content To Gather:
        • SEC filings, e.g. annual and quarterly reports
        • USPTO patents, assignments and trademarks
        • Company press releases
        • Other M&A, inventor information from feeds, webpages
  • 14. Let’s Talk Customers: AmEx or IBM
      American Express: evaluating IP with large amounts of public and private data
        • Gathered 1,400,000 U.S. patents on record from 2002-2009
        • The 1,400,000 cited/referenced another 6,100,000 U.S. & international patents
            ★ Odd fact: a few patents cited/referenced as many as 13,870 other patents
        • ~216 are AMEX patents
            ★ 90 were cited/referenced; of AMEX cited patents, 24 were cited 1 time, through one cited 67 times
        • 3600 cases from Court of Appeals, Federal Circuit, 1993-2007 (Georgetown Law)
            ★ 43 mentions of U.S. patents issued between 2002-2009; relies on exact “Patent No. 9,999,999” match
        • Productivity improvement from weeks to hours
  • 15. Conclusion: In God we trust ...all others, bring data
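
As an illustration of slide 10, the Input, Map, Reduce, Output flow can be sketched in plain Python. This is a minimal single-process sketch under stated assumptions, not BigSheets, Pig, or Hadoop code: the sample tweets, the model-name pattern, the helper names, and the word lists are all hypothetical (the word lists borrow the words quoted on the slide), and a real job would distribute the map and reduce steps across a cluster.

```python
import re
from collections import defaultdict

# Hypothetical sample input; the slide describes ~64 million tweets per day.
TWEETS = [
    "Wow, my Nikon D90 is incredible",
    "Nikon D3000 has no RAW support? lousy",
    "Great shots with the Nikon D90 today",
    "Thinking about lunch",
]

# Assumed pattern for camera model names such as "Nikon D90".
MODEL_RE = re.compile(r"\bNikon\s+(D\d+)\b", re.IGNORECASE)

# Toy sentiment vocabularies, seeded with words quoted on the slide.
POSITIVE = {"wow", "great", "incredible"}
NEGATIVE = {"lousy", "sucks"}


def map_tweet(tweet):
    """Map: emit (model, tweet) for every Nikon model the tweet mentions."""
    for match in MODEL_RE.finditer(tweet):
        yield "Nikon " + match.group(1).upper(), tweet


def shuffle(pairs):
    """Group mapped values by key, as a framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_model(model, tweets):
    """Reduce: aggregate the tweets for one model into a count."""
    return model, len(tweets)


def sentiment(tweets):
    """Output: tally toy positive/negative words across a model's tweets."""
    positive = negative = 0
    for tweet in tweets:
        words = set(re.findall(r"[a-z]+", tweet.lower()))
        positive += len(words & POSITIVE)
        negative += len(words & NEGATIVE)
    return {"positive": positive, "negative": negative}


def run(tweets):
    """Run map, shuffle, reduce, and the output step over a list of tweets."""
    pairs = (pair for tweet in tweets for pair in map_tweet(tweet))
    report = {}
    for model, grouped in shuffle(pairs).items():
        _, count = reduce_model(model, grouped)
        report[model] = {"tweets": count, **sentiment(grouped)}
    return report


if __name__ == "__main__":
    for model, stats in sorted(run(TWEETS).items()):
        print(model, stats)
```

Keying the map output by model name is what lets the reduce step count per model independently, which is the property that allows the work to be split across machines.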