Introduction to Big Data and
Real Time Analytics Workshop




Telco Big Data & Real Time Analytics Summit 2012
3-5 December 2012, London
www.alanquayle.com/blog
                             © 2012 Alan Quayle Business and Service Development   1
"There are three kinds of lies:
     lies, damned lies, and statistics."


    British Prime Minister Benjamin Disraeli (1804–1881), or perhaps
Samuel Langhorne Clemens (1835 – 1910) better known as Mark Twain




Never Forget This!



      People
                                            Most projects fail here


       Process

      Technology


The Data
                                                      Tsunami!
Why are we measuring so many things?
•   Atoms vibrate at about 10^13 Hz. If we measure only the atom (not its subatomic
    constituents) at a resolution of just 1 byte per vibration, that’s 10 TB per second per atom
•   There are roughly 7*10^27 atoms in the human body
•   So just monitoring one human body’s atoms would generate 7*10^40 bytes per second
•   That’s about 2*10^48 bytes in a year, or 2 yotta-yotta bytes (a yottabyte is 10^24 bytes)


•   By 2020, the quantity of electronically stored data will reach 35 trillion gigabytes;
    that’s only 35*10^21 bytes
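
The back-of-envelope figures above can be checked in a few lines. This is only a sketch of the slide's own arithmetic: the 1 byte per vibration and the 365-day year are assumptions, and the variable names are illustrative.

```python
# Re-run the slide's arithmetic with exact integers.
VIBRATION_HZ = 10**13            # vibrations per second per atom (slide's figure)
BYTES_PER_SAMPLE = 1             # assumed resolution: 1 byte per vibration
ATOMS_IN_BODY = 7 * 10**27       # rough atom count for one human body
SECONDS_PER_YEAR = 365 * 24 * 60 * 60

bytes_per_atom_per_second = VIBRATION_HZ * BYTES_PER_SAMPLE      # 10^13 B/s = 10 TB/s
bytes_per_body_per_second = bytes_per_atom_per_second * ATOMS_IN_BODY
bytes_per_body_per_year = bytes_per_body_per_second * SECONDS_PER_YEAR

# 35 trillion gigabytes, expressed in bytes
stored_by_2020 = 35 * 10**12 * 10**9

print(f"one body, per second: {bytes_per_body_per_second:.1e} bytes")
print(f"one body, per year:   {bytes_per_body_per_year:.1e} bytes")
print(f"world storage, 2020:  {stored_by_2020:.1e} bytes")
```

The yearly figure comes out a little above 2*10^48 bytes, which is why monitoring one body "fully" dwarfs the 2020 storage forecast by some 26 orders of magnitude.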


•   It’s easy (and fun) to play with numbers! Lies, damned lies and statistics!


•   We do not need to measure each revolution of an airplane’s turbine; it only matters when
    an event (an out-of-tolerance reading) occurs.
    o   Collect events and what matters, NOT everything all the time!
    o   How do we know what matters? Common sense, knowing your business and experimentation!
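
A minimal sketch of "collect events, not everything": keep a reading only when it falls outside a tolerance band. The `Reading` type, the rpm values and the 9,000 to 11,000 band are all hypothetical, chosen purely for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass(frozen=True)
class Reading:
    timestamp: int   # sample index; a real feed would carry a clock time
    value: float     # measured quantity, e.g. turbine speed in rpm


def out_of_tolerance(readings: Iterable[Reading], low: float, high: float) -> Iterator[Reading]:
    """Yield only the readings that fall outside the [low, high] band."""
    for reading in readings:
        if reading.value < low or reading.value > high:
            yield reading


# Hypothetical turbine samples; only two of the five are worth storing.
samples = [Reading(t, v) for t, v in enumerate([10_000, 10_050, 11_400, 9_980, 8_700])]
events = list(out_of_tolerance(samples, low=9_000, high=11_000))
print(events)
```

Choosing the band is the hard part, and that is where business knowledge and experimentation come in.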



Beware the
“Bait and Switch”




Data You Need Lots of It!!
But There’s a Shortage
of Data Scientists to Do
Anything With It




So Give Me
 All Your
  Money




Introduction
•   The purpose of this one day workshop is to provide both an introduction and pragmatic insight
    into Big Data, Data Science and Real-Time Analytics.
•   This course will provide a frank and objective review of the state of the art and the market,
    examining what is working in practice and what is not through an extensive series of case studies.
•   Big data usually includes data sets with sizes beyond the ability of commonly used software tools
    to capture, manage, and process the data.
    o   Big data sizes are a constantly moving target, as of 2012 ranging from a few dozen terabytes to many
        petabytes.
    o   A new platform of "big data" tools has arisen to handle sense-making over large quantities of data, for
        example the Apache Hadoop Big Data Platform.
•   Analyzing large data sets in near real-time is not new; business intelligence is as old as business
    itself (that is, as old as human society).
    o   IT automated it, and enabled an organization to own it rather than leaving it in the wet-ware of a
        few human brains (generally those of the owners of the business).
    o   Some real-time analysis results in automated triggers (so-called machine learning), but most analysis
        still requires human interpretation, which is not straightforward.
    o   Analysis of such large and mixed data sources has its own problems, as we’ll discuss in the course.
    o   Privacy and regulation cannot be ignored; for some industries this will limit the application of Big
        Data.


Structure Part 1 of 5
•    09:00 Registration


•    09:30 History and Overview: Understanding Big Data and Real-Time Analytics in
     Context
•    What do we mean by Big Data?
•    Why does Big Data matter?
•    Big Data Maturity
•    The 3Vs: Volume, Variety and Velocity
•    What are the Domains of Big Data?
•    Big Data Technologies
•    What Enterprises Think of Big Data
•    How Enterprise Verticals are Impacted by Big Data
•    Why Now?
•    Key Trends driving towards Big Data
•    History of Big Data
•    Taxonomy of Big Data Companies
•    Big Data Landscape
•    List of Companies in Big Data (and their Big Data revenues)
•    Big Data Market Sizing
•    Telecoms and Real-Time
•    O2 More: Proof we can do it!


•    10:45 Coffee Break



Structure Part 2 of 5
•   11:00 Quick Technology Review: Diving into a little detail on a few of the key technologies
    (only as deep as the architecture) to understand their history and capabilities /
    limitations
•   Hadoop
    o   What is Hadoop?
    o   Ecosystem
    o   History
    o   Design Axioms
    o   Hadoop Distributed File System
    o   MapReduce: Distributed Processing
    o   Architecture
    o   Data Schemas
    o   Query Language Flexibility
    o   Economics
    o   Case Studies
•   Hadoop and HBase in the Cloud (Amazon)
•   NoSQL and Cassandra + some use cases
•   HBase versus Cassandra
•   Graph Database introduction



Structure Part 3 of 5
•   12:00 & 14:00 Application of Big Data
•   Hardware and Software Trends
    o   Execution and Results Characteristics
    o   Framework: Ecosystem, Application Services, Data Management
•   Real-Time Analytics
    o   Use Cases
    o   Extended RDBMS versus MapReduce / Hadoop
    o   Requirements, Trends, People and Organization Issues, Outlook
•   Big Data and the Cloud
    o   Why the Cloud and Big Data?
    o   Cloud benefits
    o   Use Cases: Bankinter, Etsy, Razorfish
•   The Social Enterprise
    o   Business Benefits
    o   ALU example
    o   Drivers
    o   Social + Data Analysis = Business intelligence
    o   AT&T Case Study
    o   Lessons Learned
•   Telcos and Big Data
    o   TMF Survey
    o   Big Data Framework
    o   Predictive / Adaptive Analytics
    o   Decision Engineering
    o   The Problem with Telecom
•   Telco Analytics
    o   Customer Profiling
    o   Next Product Tools
    o   Marketing Mix Modeling
    o   Cost of Acquisition Tools
    o   Case Study
•   13:00ish Lunch
Structure Part 4 of 5
•   15:00 Ecosystem, Taxonomies and Suppliers: Understanding the many
    suppliers, technology camps, and approaches
•   Taxonomy of Big Data Companies
•   Big Data Landscape
•   Cloudera
•   Autonomy
•   Vertica
•   InfoChimps
•   Guavus
•   Matrixx
•   Case Studies
•   Real-Time Analytics for Big Data: Lessons from Facebook
    o   Quick technology review
    o   Facebook Real-time Analytics System
    o   Goal
    o   Actual Analytics
    o   Solution
    o   Memory, Collocate, Economics
•   Real-Time Analytics for Big Data: Lessons from Twitter
    o   Requirements
    o   Actual Analytics
    o   Challenges
    o   Performance
    o   One data any API
    o   Solution
    o   Memory, Collocate, Economics
•   Other Case Studies
•   Orbitz, Hertz, Yelp

Structure Part 5 of 5
 •   16:00 Global Enterprise and Telecom Survey on Big Data and Real-Time
     Analytics
 •   Background
 •   The Questions
 •   The Importance of Analytics
 •   Impact of Big Data on Analytics
 •   Size of Data Sets, Number of Data Sources
 •   Update Frequency
 •   Integration of Data Sources
 •   Data Set Responsibility
 •   Types of Data, Types of Processing and Analytics
 •   Challenges
 •   Big Data Analytics Platforms
 •   Benefits and Plans
 •   Data Analytics Storage and IT Infrastructure Requirements
 •   Increasing Interest in Hadoop MapReduce Framework Technology
 •   Conclusions

 •   Recommendations and Wrap Up

Alan Quayle

•   22 years of experience in the telecommunication industry, focused on developing
    profitable new businesses in service providers, suppliers and start-ups.
•   Customers include
    o   Operators such as AT&T, BT, Charter, Etisalat, M1, O2, Rogers, Swisscom, T-Mobile,
        Telstra, Time Warner Cable, Verizon and Vodafone;
    o   Suppliers such as Adobe, Alcatel-Lucent, Ericsson, Huawei, Nokia Siemens Networks,
        and Oracle; and
    o   Innovative start-ups such as Apigee, AppTrigger (sold to Metaswitch), Camiant (sold to
        Tekelec), OpenCloud, and Voxeo.
•   Works with the developer community and sits on the boards of developers such as
    GotoCamera and hSenid Mobile, as well as suppliers such as Sigma Systems.
•   Weblog www.alanquayle.com/blog
•   LinkedIn http://www.linkedin.com/in/alanquayle



A Thank You to Those Who Helped Me Put This Course
Together
•   In putting this workshop together I’d like to thank the following
    suppliers for their time, openness, and willingness to review and provide
    material to ensure this workshop is up-to-the-minute.
    o   And especially for not requiring any editorial control over the content or my
        views expressed in this material (in reverse alphabetical order).
•   Guavus
•   HP (don’t mention the Autonomy deal)
•   Versant, NoSQL database vendor
•   Ty Wang, social media entrepreneur using FB Social Graph
•   Lorien Pratt, Data / Decision Scientist with Telco focus
•   Amazon Web Services
•   Matrixx


Introductions

•    Spend 2 minutes to introduce yourself


     o   Name, current employer and job


     o   Let us know your favorite hobby
           • For me it’s hiking with my family


     o   What you want to get out of this course
           • What topics are most important to you?

History and Overview
           Understanding Big Data and Real-Time
           Analytics in Context




Structure

•    What do we mean by Big Data?
•    Why does Big Data matter?
•    Big Data Maturity
•    The 3Vs: Volume, Variety and Velocity
•    What are the Domains of Big Data?
•    Big Data Technologies
•    What Enterprises Think of Big Data
•    How Enterprise Verticals are Impacted by Big Data
•    Why Now?
•    Key Trends driving towards Big Data
•    History of Big Data
•    Taxonomy of Big Data Companies
•    Big Data Landscape
•    List of Companies in Big Data (and their Big Data revenues)
•    Big Data Market Sizing
•    Telecoms and Real-Time
•    O2 More: Proof we can do it!




What Do We Mean by Big Data?




IDC’s Definition of Big Data




What is Big Data




Why does Big Data Matter?




Another Version of the 3 Vs

•   Volume: Data sets are expanding constantly. A strategic approach to
    big data takes into account ways to store and manage the huge
    volumes of data that are being generated.
•   Variety: Big data comes in many forms. Analyzing multi-structured
    data can yield important insights that can help direct a business
    strategy.
•   Velocity: The speed at which data is analyzed is everything,
    especially when working in a time sensitive business environment.




What are the Domains of Big Data?




Big Data Technology Stack




Big Data Technologies




The Technology has Become Quite Fashionable




Big Data Use Cases




Companies in Big Data
•   Storage: HP, EMC, IBM, Dell, NetApp, Hitachi Ltd., Fujitsu, Oracle, NEC


•   Servers: IBM, HP, Dell, Oracle, Fujitsu, Acer, Cray, Groupe Bull, Hitachi, NEC, SGI, Stratus
    Technologies, Unisys, Cisco, Lenovo


•   Networking: Cisco, Brocade, HP, Dell, IBM, Alcatel-Lucent, F5 Networks, Citrix


•   Relational database software: Oracle Exadata, IBM Netezza, IBM Smart Analytics System,
    Teradata, HP Vertica and Autonomy, SAP Sybase IQ, EMC Greenplum DB and HD, Microsoft SQL
    Server Parallel Edition, IBM Netezza High Capacity Appliance, Teradata Extreme Performance
    Appliance


•   Hadoop-based data management and analysis software: Cloudera, MapR, EMC Greenplum HD,
    Oracle Big Data Appliance, IBM BigInsights, HStreaming, Platfora, Zettaset, DataStax,
    Karmasphere, Datameer, Hadapt, and so forth


•   XML databases: MarkLogic, Oracle XML DB, IBM pureXML, Software AG webMethods, Tamino
    XML Server, TigerLogic, Xyleme, and so forth


Companies in Big Data
•   Object-oriented databases: Jade Software, Objectivity, Progress Software, Versant

•   Graph databases: Neo Technology, Objectivity, Franz Inc., Sones, Ravel

•   Ultra-high-speed streaming data technologies: IBM InfoSphere Streams, Informatica
    Ultra Messaging Streaming Edition, TIBCO FTL and BusinessEvents, Progress Software
    Apama CEP

•   Analytics and discovery software: SAS, IBM, Attivio, HP Autonomy, Skytree,
    Oracle Advanced Analytics, IBM SPSS, Microsoft, Vivisimo, ZyLAB, Sinequa, Revolution
    Analytics, KXEN, BA Insight, Palantir, Perfect Search, Wolfram Alpha

•   Decision support and automation software including applications: Webtrends, Adobe-
    Omniture, IBM Coremetrics, FICO

•   Services: Accenture, Deloitte, TCS, HP, Teradata, Mu Sigma, Think Big Analytics,
    Hortonworks, Hashrocket, KloudData, Trendwise Analytics

Big Data Is a Big Market & Big Business - $50 Billion
Market by 2017 (according to Wikibon)
•   Open source analyst firm Wikibon pegs the current Big Data market at just over $5
    billion (IDC and others agree)


•   Wikibon forecasts that the Big Data market will grow at a CAGR of 58% between now and
    2017, hitting $50 billion within five years.


•   Vendors from whales like IBM and HP to pure-plays like Vertica and Cloudera are
    bringing in significant revenue today helping enterprises, governments and
    healthcare organizations process and make sense of the torrents of unstructured data
    flowing from mobile devices, sensors, social media and other sources.


•   Today Big Data technologies like Hadoop are mostly in production at Web and online
    gaming companies, large financial services firms and banks, and online retailers.
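
As a sanity check on the Wikibon forecast above, compounding is a one-liner. This sketch assumes a starting point of exactly $5 billion (the slide says "just over") and five full years of growth; `project` is an illustrative helper, not anything Wikibon publishes.

```python
def project(value: float, cagr: float, years: int) -> float:
    """Compound `value` forward at an annual growth rate `cagr` for `years` years."""
    return value * (1 + cagr) ** years


market_2012 = 5.0  # USD billions, assumed round starting figure
forecast_2017 = project(market_2012, cagr=0.58, years=5)
print(f"2017 forecast: ${forecast_2017:.1f} billion")
```

At 58% a year the market grows almost tenfold in five years, landing just under $50 billion, so the headline number and the growth rate are at least consistent with each other.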


Big Data Is a Big Market & Big Business - $50 Billion
Market by 2017
•   Another important point is that, while Hadoop may be the poster child of Big Data,
    there are other important technologies at play.
    o   Besides Hadoop (an open source framework for distributing data processing across multiple
        nodes), these include massively parallel data warehouses “that deliver fast data loading
        and real-time analytic capabilities”;
    o   Analytic platforms and applications that allow Data Scientists and Business Analysts to
        manipulate Big Data; and
    o   Data Visualization tools that bring insights from Big Data analysis alive for end users.


•   Of the current market, Big Data pure-play vendors account for $300 million in Big
    Data-related revenue.
    o   Despite their relatively small percentage of current overall revenue (approximately 5%), Big
        Data pure-play vendors such as Vertica, Splunk and Cloudera are responsible for the vast
        majority of new innovations and modern approaches to data management and analytics that
        have emerged over the last several years and made Big Data the hottest sector in IT.




Wikibon Forecast




IDC’s Forecast




Technology Review
           Diving into a little detail on a few of the key
           technologies (only as deep as the
           architecture) to understand their history
           and capabilities / limitations




Structure Part 2 of 5
•   Hadoop
    o   What is Hadoop?
    o   Ecosystem
    o   History
    o   Design Axioms
    o   Hadoop Distributed File System
    o   MapReduce: Distributed Processing
    o   Architecture
    o   Data Schemas
    o   Query Language Flexibility
    o   Economics
    o   Case Studies
•   Hadoop and HBase in the Cloud (Amazon)
•   NoSQL and Cassandra + some use cases
•   HBase versus Cassandra
•   Graph Database introduction



HBase Versus Cassandra: History

•   HBase and its required supporting systems are derived from what is
    known of the original Google BigTable and Google File System
    designs (as described in the Google File System paper published in
    2003 and the BigTable paper published in 2006).


•   Cassandra, on the other hand, is a recent open source fork of a
    standalone database system initially coded by Facebook. While
    implementing the BigTable data model, it uses a system inspired
    by Amazon’s Dynamo for storing data (in fact much of the initial
    development work on Cassandra was performed by two Dynamo
    engineers recruited to Facebook from Amazon).

HBase Versus Cassandra

•   These differing histories have resulted in HBase being more suitable for data
    warehousing and large-scale data processing and analysis (for example,
    indexing the Web).
•   Cassandra, by contrast, is more suitable for real-time transaction processing and the
    serving of interactive data.
•   For lightweight validation you’ll find the current makeup of the key committers
    interesting:
    o   The primary committers to HBase work for Bing (Microsoft bought their search company last
        year, and gave them permission to continue submitting open source code after a couple
        of months).
    o   By contrast, the primary committers on Cassandra work for Rackspace, which supports
        the idea of an advanced general-purpose NoSQL solution being freely available to
        counter the threat of companies becoming locked in to the proprietary NoSQL solutions
        offered by the likes of Google, Yahoo and Amazon EC2.



•   The CAP Theorem was developed by Professor Eric Brewer, co-founder and Chief Scientist of
    Inktomi.
•   The theorem states that a distributed (or “shared data”) system design can offer at most two out of three
    desirable properties: Consistency, Availability and tolerance to network Partitions.
    o   Consistency means that if someone writes a value to a database, other users will thereafter
        immediately be able to read the same value back.
    o   Availability means that if some number of nodes fail in your cluster, the distributed
        system can remain operational.
    o   Tolerance to Partitions means that if a network failure divides the nodes in your cluster
        into two groups that can no longer communicate, the system again remains operational.
•   If you search online posts comparing HBase and Cassandra, you will regularly find the HBase
    community explaining that they have chosen CP, while Cassandra has chosen AP.
•   BUT the CAP theorem only applies to a single distributed algorithm, and there is no reason why you
    cannot design a single system where, for any given operation, the underlying algorithm (and thus the
    trade-off achieved) is selectable.
•   Thus, while it is true that a system may only offer two of these properties per operation, what has been
    widely missed is that a system can be designed that allows a caller to choose which properties they want
    when any given operation is performed.
•   Not only that, reality is not nearly so black and white: it is possible to offer differing degrees of
    balance between consistency, availability and tolerance to partition. This is Cassandra.
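
The per-operation trade-off described above can be sketched with a toy replicated register. This is not Cassandra's API or its actual replication logic: `ToyCluster`, `w` and `r` are hypothetical names, and the model deliberately picks the worst case, where a read contacts the replicas a write was least likely to reach. It illustrates only the quorum rule that a read is guaranteed to see the latest write when R + W > N.

```python
from typing import List, Optional, Tuple


class ToyCluster:
    """N replicas of a single key; each replica holds a (version, value) pair."""

    def __init__(self, n: int) -> None:
        self.n = n
        self.version = 0
        self.replicas: List[Tuple[int, Optional[str]]] = [(0, None)] * n

    def write(self, value: str, w: int) -> None:
        """Acknowledge once w replicas accept; the remaining replicas stay stale."""
        self.version += 1
        for i in range(w):
            self.replicas[i] = (self.version, value)

    def read(self, r: int) -> Optional[str]:
        """Contact the last r replicas (worst case) and return the newest value seen."""
        contacted = self.replicas[self.n - r:]
        return max(contacted, key=lambda pair: pair[0])[1]


cluster = ToyCluster(n=3)
cluster.write("v1", w=2)      # quorum write: 2 of 3 replicas
print(cluster.read(r=2))      # R + W = 4 > N: the read overlaps the write
cluster.write("v2", w=1)      # fast, highly available write to a single replica
print(cluster.read(r=2))      # R + W = 3 = N: the read can miss the new value
print(cluster.read(r=3))      # reading every replica always finds the newest value
```

The caller picks `w` and `r` per call, trading latency and availability against consistency one operation at a time. Cassandra exposes the same idea through per-request consistency levels such as ONE, QUORUM and ALL.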
Application of Big Data




Structure
•   Hardware and Software Trends
    o   Execution and Results Characteristics
    o   Framework: Ecosystem, Application Services, Data Management
•   Real-Time Analytics
    o   Use Cases
    o   Extended RDBMS versus MapReduce / Hadoop
    o   Requirements, Trends, People and Organization Issues, Outlook
•   Big Data and the Cloud
    o   Why the Cloud and Big Data?
    o   Cloud benefits
    o   Use Cases: Bankinter, Etsy, Razorfish
•   The Social Enterprise
    o   Business Benefits
    o   ALU example
    o   Drivers
    o   Social + Data Analysis = Business intelligence
    o   AT&T Case Study
    o   Lessons Learned
•   Telcos and Big Data
    o   TMF Survey
    o   Big Data Framework
    o   Predictive / Adaptive Analytics
    o   Decision Engineering
    o   The Problem with Telecom
•   Telco Analytics
    o   Customer Profiling
    o   Next Product Tools
    o   Marketing Mix Modeling
    o   Cost of Acquisition Tools
    o   Case Study


Use Cases for Big Data Analytics
•    Search ranking.
     o   All search engines attempt to rank the relevance of a webpage to a search request against all
         other possible webpages
     o   Google’s PageRank algorithm is, of course, the poster child for this use case
•    Ad tracking.
     o   E-commerce sites typically record an enormous river of data including every page event in
         every user session
     o   This allows for very short turnaround of experiments in ad placement, color, size, wording,
         and other features
     o   When an experiment shows that such a feature change in an ad results in improved
         click-through behavior, the change can be implemented virtually in real time
•    Location and proximity tracking.
     o   Many use cases add precise GPS location tracking, together with frequent updates, in
         operational applications, security analysis, navigation, and social media
     o   Precise location tracking opens the door for an enormous ocean of data about other locations
         near the GPS measurement
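
To make the search-ranking use case above concrete, here is a textbook power-iteration sketch of the PageRank idea. It is a simplified illustration, not Google's production ranking: the four-page link graph, the 0.85 damping factor and the fixed 50 iterations are all assumptions, and every link target must also appear as a key in `links`.

```python
from typing import Dict, List


def pagerank(links: Dict[str, List[str]], damping: float = 0.85, iterations: int = 50) -> Dict[str, float]:
    """Rank pages by repeatedly passing each page's score along its outgoing links."""
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with the "random jump" share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outlinks in links.items():
            # A page with no outgoing links spreads its score over all pages.
            targets = outlinks if outlinks else pages
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical mini web: page "c" is linked to by every other page.
web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(web)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

On a real web graph the same loop runs over billions of pages, which is exactly the kind of batch job MapReduce was designed to distribute.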
Use Cases for Big Data Analytics
•    Causal factor discovery.
     o   Point-of-sale data has long been able to show us when the sales of a product go sharply up
         or down. But searching for the causal factors that explain these deviations has been, at best, a
         guessing game or an art form.
     o   The answers may be found in competitive pricing data, competitive promotional data
         including print and television media, weather, holidays, national events including disasters,
         and virally spread opinions found in social media.


•    Social CRM.
     o   This use case is one of the hottest new areas for marketing analysis. The Altimeter Group has
         described a very useful set of key performance indicators for social CRM that include share of
         voice, audience engagement, conversation reach, active advocates, advocate influence,
         advocacy impact, resolution rate, resolution time, satisfaction score, topic trends, sentiment
         ratio, and idea impact.
     o   The calculation of these KPIs involves in-depth trawling of a huge array of data sources,
         especially unstructured social media.

Use Cases for Big Data Analytics
•    Document similarity testing.
     o   Two documents can be compared to derive a metric of similarity. There is a large body of academic
         research and tested algorithms, for example latent semantic analysis, that is only now starting to
         drive monetized insights of interest to big data practitioners.
     o   For example, a single source document can be used as a kind of multifaceted template to compare against a
         large set of target documents. This could be used for threat discovery, sentiment analysis, and opinion
         polls. For example: "find all the documents that agree with my source document on global warming."


•    Genomics analysis: e.g., commercial seed gene sequencing.
     o   A few months ago the cotton research community was thrilled by a genome sequencing announcement that
         stated in part "The sequence will serve a critical role as the reference for future assembly of the larger
         cotton crop genome.
     o   Cotton is the most important fiber crop worldwide and this sequence information will open the way for
         more rapid breeding for higher yield, better fiber quality and adaptation to environmental stresses and for
         insect and disease resistance." Scientist Ryan Rapp stressed the importance of involving the cotton
         research community in analyzing the sequence, identifying genes and gene families and determining the
         future directions of research.
     o   This use case is just one example of a whole industry that is being formed to address genomics analysis
         broadly, beyond this example of seed gene sequencing.
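As a rough illustration of the document similarity use case above, the sketch below ranks target documents against a source document using cosine similarity over word counts. This is a deliberately simpler stand-in for latent semantic analysis, not LSA itself, and the function names are invented for illustration. Note that word overlap measures shared topic, not agreement, so a real "agrees with my document" search would need more than this.

```python
import math
import re
from collections import Counter


def word_counts(text):
    """Lower-case the text and count its words (a simple bag-of-words model)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two word-count vectors, from 0.0 to 1.0."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(source, targets):
    """Return (score, document) pairs, most similar to the source first."""
    src = word_counts(source)
    scored = [(cosine_similarity(src, word_counts(t)), t) for t in targets]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)
```

Calling `rank_by_similarity(source_text, list_of_target_texts)` returns the targets ordered by score, so the single source document acts as the template the text describes.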



Use Cases for Big Data Analytics
•    Discovery of customer cohort groups.
     o   Customer cohort groups are used by many enterprises to identify common demographic trends and
         behavior histories. We are all familiar with Amazon's cohort groups when they say other customers who
         bought the same book as you have also bought the following books. Of course, if you can sell your product
         or service to one member of a cohort group, then all the rest may be reasonable prospects. Cohort groups
         are represented logically and graphically as links, and much of the analysis of cohort groups involves
         specialized link analysis algorithms.
•    In-flight aircraft status.
     o   This use case, as well as the following two use cases, is made possible by the introduction of sensor
         technology everywhere. In the case of aircraft systems, the in-flight status of hundreds of variables on engines,
         fuel systems, hydraulics, and electrical systems is measured and transmitted every few milliseconds. The
         value of this use case lies not just in the engineering telemetry data that could be analyzed at some future point
         in time, but also in driving real-time adaptive control, fuel usage, part failure prediction, and pilot notification.
•    Smart utility meters.
     o   It didn't take long for utility companies to figure out that a smart meter can be used for more than just the
         monthly readout that produces the customer’s utility bill. By drastically cranking up the frequency of the
         readouts to as much as one readout per second per meter across the entire customer landscape, many
         useful analyses can be performed including dynamic load-balancing, failure response, adaptive pricing,
         and longer-term strategies for incenting customers to utilize the utility more effectively (either from the
         customers’ point of view or the utility's point of view!)

Use Cases for Big Data Analytics
•   Building sensors.
    o   Modern industrial buildings and high-rises are being fitted with thousands of small
        sensors to detect temperature, humidity, vibration, and noise.
    o   Like the smart utility meters, collecting this data every few seconds 24 hours per day
        allows many forms of analysis including energy usage, unusual problems including
        security violations, component failure in air-conditioning and heating systems and
        plumbing systems, and the development of construction practices and pricing strategies.


•   Satellite image comparison.
    o   Images of the regions of the earth from satellites are captured by every pass of certain
        satellites on intervals typically separated by a small number of days.
    o   Overlaying these images and computing the differences allows the creation of hot spot
        maps showing what has changed. This analysis can identify construction, destruction,
        changes due to disasters like hurricanes and earthquakes and fires, and the spread of
        human encroachment.



Use Cases for Big Data Analytics
•   CAT scan comparisons.
    o   CAT scans are stacks of images taken as "slices" of the human body. Large
        libraries of CAT scans can be analyzed to facilitate the automatic diagnosis of
        medical issues and their prevalence.


•   Financial account fraud detection and intervention.
    o   Account fraud, of course, has immediate and obvious financial impact. In
        many cases fraud can be detected by patterns of account behavior, in some
        cases crossing multiple financial systems. For example, "check kiting" requires
        the rapid transfer of money back and forth between two separate accounts.
    o   Certain forms of broker fraud involve two conspiring brokers selling a security
        back-and-forth at ever increasing prices, until an unsuspecting third party
        enters the action by buying the security, allowing the fraudulent brokers to
        quickly exit. Again, this behavior may take place across two separate
        exchanges in a short period of time.

Use Cases for Big Data Analytics
•   Computer system hacking detection and intervention.
    o   System hacking in many cases involves an unusual entry mode or some other kind of behavior
        that in retrospect is a smoking gun but may be hard to detect in real-time.
•   Online game gesture tracking.
    o   Online game companies typically record every click and maneuver by every player at the most
        fine-grained level. This avalanche of "telemetry data" allows fraud detection, intervention for a
        player who is getting consistently defeated (and therefore discouraged), offers of additional
        features or game goals for players who are about to finish a game and depart, ideas for new
        game features, and experiments for new features in the games.
    o   This can be generalized to television viewing. Your DVR box can capture remote control
        keystrokes, recording events, playback events, picture-in-picture viewing, and the context of
        the guide. All of this can be sent back to your provider.
•   Big science including atom smashers, weather analysis, space probe telemetry feeds.
    o   Major scientific projects have always collected a lot of data, but now the techniques of big data
        analytics are allowing broader access and much more timely access to the data. Big science
        data, of course, is a mixture of all forms of data: scalar, vector, complex structures, analog
        waveforms, and images.



Use Cases for Big Data Analytics
•   "Data bag" exploration.
    o   There are many situations in commercial environments and in the research
        communities where large volumes of raw data are collected. One example might be data
        collected about structure fires. Beyond the predictable dimensions of time, place,
        primary cause of fire, and responding firefighters, there may be a wealth of
        unpredictable anecdotal data that at best can be modeled as a disorderly collection of
        name-value pairs, such as "contributing weather = lightning." Another example would be
        the listing of all relevant financial assets for a defendant in a lawsuit.
    o   Again, such a list is likely to be a disorderly collection of name-value pairs, such as
        "shared real estate ownership = condominium." The list of examples like this is endless.
        What they have in common is the need to encapsulate the disorderly collection of
        name-value pairs, which is generally known as a "data bag." Complex data bags may contain
        both name-value pairs and embedded sub data bags. The challenge in this use case
        is to find a common way to approach the analysis of data bags when the content of the
        data may need to be discovered after the data is loaded.
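One way to picture the data bag described above is as nested name-value pairs whose names are only discovered after loading. The sketch below is a minimal illustration under that assumption: it models a data bag as a nested dictionary and walks it to list every name path it finds. The record contents are made up, apart from the "contributing weather" pair quoted in the text.

```python
def flatten_bag(bag, prefix=""):
    """Yield (path, value) pairs for every leaf in a possibly nested data bag."""
    for name, value in bag.items():
        path = f"{prefix}.{name}" if prefix else name
        if isinstance(value, dict):
            # An embedded sub data bag: recurse and extend the path.
            yield from flatten_bag(value, path)
        else:
            yield path, value


def discover_names(bags):
    """Count how often each name path occurs across a collection of data bags."""
    counts = {}
    for bag in bags:
        for path, _ in flatten_bag(bag):
            counts[path] = counts.get(path, 0) + 1
    return counts


# Hypothetical structure-fire records; only "contributing weather" is from the text.
fire_reports = [
    {"contributing weather": "lightning", "structure": {"type": "barn", "floors": 1}},
    {"structure": {"type": "warehouse"}, "witness note": "smoke seen at dawn"},
]
```

Running `discover_names(fire_reports)` gives a frequency table of the names actually present, which is a reasonable first step when the schema is not known in advance.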
Use Cases for Big Data Analytics
•   The final two use cases are old and even predate data warehousing itself. But
    new life has been breathed into these use cases because of the exciting potential
    of ultra-atomic customer behavior data.
    o   Loan risk analysis and insurance policy underwriting. In order to evaluate the risk of a
        prospective loan or a prospective insurance policy, many data sources can be brought
        into play ranging from payment histories, detailed credit behavior, employment data,
        and financial asset disclosures. In some cases the collateral for a loan or the insured
        item may be accompanied by image data.
    o   Customer churn analysis. Enterprises concerned with churn want to understand the
        predictive factors leading up to the loss of a customer, including that customer’s detailed
        behavior as well as many external factors including the economy, life stage and other
        demographics of the customer, and finally real time competitive issues.




•   Characteristics of Big Data
•   How the Cloud Is Big Data’s Best Friend
•   Big Data on the Cloud In the Real World
Characteristics of Big Data
Features driven by MapReduce
Big Data is Getting Bigger
•   2.7 Zettabytes in 2012
•   Over 90% will be unstructured
•   Data spread across a wide array of silos
Why is Big Data Hard (and Getting Harder)?
•   Changing Data Requirements
•   Faster response time of fresher data
•   Sampling is not good enough & history is important
•   Increasing complexity of analytics
•   Users demand inexpensive experimentation
Where is it Coming From?

•   Computer Generated
    o   Application server logs (web sites, games)
    o   Sensor data (weather, water, smart grids)
    o   Images/videos (traffic, security cameras)
•   Human Generated
    o   Twitter “Fire Hose”: 50m tweets/day, 1,400% growth per year
    o   Blogs/Reviews/Emails/Pictures
    o   Social Graphs: Facebook, LinkedIn, Contacts
Big Data Verticals



•   Media/Advertising: Targeted Advertising; Image and Video Processing
•   Oil & Gas: Seismic Analysis
•   Retail: Recommend; Transactions Analysis
•   Life Sciences: Genome Analysis
•   Financial Services: Monte Carlo Simulations; Risk Analysis
•   Security: Anti-virus; Fraud Detection; Image Recognition
•   Social Network/Gaming: User Demographics; Usage analysis; In-game metrics
Bank – Monte Carlo Simulations

23 Hours to 20 Minutes

“The AWS platform was a good fit for its unlimited and flexible computational
power to our risk-simulation process requirements.

With AWS, we now have the power to decide how fast we want to obtain
simulation results, and, more importantly, we have the ability to run simulations not
possible before due to the large amount of infrastructure required.” – Castillo,
Director, Bankinter
Recommendations




The Taste Test http://www.etsy.com/tastetest
Recommendations

Gift Ideas for Facebook Friends




etsy.com/gifts
Recommendations
Click Stream Analysis

•   User recently purchased a sports movie and is searching for video games
•   Targeted Ad (1.7 Million per day)
The Social Enterprise

•   Implementations are getting bigger and growing faster than ever
•   Virtually all data continue to show sustained real-world benefits (McKinsey,
    IBM, Frost and Sullivan, AIIM)
•   Everything is becoming social: Social features are appearing in virtually all types
    of applications
•   There continues to be considerable confusion about who “owns” social in the
    organization
•   The predicted social data explosion: It happened
•   Mining insight from social data has now become a major industry (#bigdata,
    #analytics)
•   The blur between internal and external social business has not progressed as far
    as many thought
•   The first serious talk about open social business standards has begun

Analytics stack (top to bottom):
•   Decision Engineering
•   Adaptive Analytics
•   Predictive Analytics
•   Reporting
•   Data Management (including data migration, data quality, data modeling)
Predictive/Adaptive Analytics on 1 slide

Will this customer churn?
•   Yes/No data: If customer has an open trouble ticket: Yes, otherwise: No
•   Real-Valued: If customer age < 30: Yes, otherwise: No [Callout: Pattern]
•   Combination: If customer age < 30 AND has an open trouble ticket: Yes, otherwise: No
•   Linear Combination: If 2.3 x Age + 4.4 x Income > 40: Yes, otherwise: No
•   Predictive Analytics: Obtain these numbers by analyzing historical data
•   Adaptive Analytics: Update your historical data, and re-derive the numbers
    periodically to take changing situations into account.
•   Nonlinear Analytics: [Figure: two plots of Income against age]
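The linear-combination rule and the predictive and adaptive steps from this slide can be sketched in a few lines. The weights 2.3 and 4.4 and the threshold 40 come from the slide; everything else is an assumption made for illustration. The slide gives no units, so Age and Income are treated as already scaled. The history records are made up, and fitting only the threshold by counting correct classifications is a simple stand-in for whatever modeling technique would really be used.

```python
def score(age, income, w_age=2.3, w_income=4.4):
    """Linear combination from the slide: 2.3 x Age + 4.4 x Income."""
    return w_age * age + w_income * income


def will_churn(age, income, threshold=40.0):
    """Predict churn (Yes) when the score exceeds the threshold."""
    return score(age, income) > threshold


def fit_threshold(history):
    """Predictive step: pick the threshold that gets the most history right.

    history is a list of (age, income, churned) tuples. Candidates are the
    observed scores plus one value below all of them.
    """
    scores = sorted(score(age, income) for age, income, _ in history)
    candidates = [scores[0] - 1.0] + scores

    def correct(threshold):
        return sum(
            (score(age, income) > threshold) == churned
            for age, income, churned in history
        )

    return max(candidates, key=correct)


# Made-up history in arbitrary scaled units: (age, income, churned).
history = [(5, 2, False), (10, 3, False), (12, 5, True), (20, 4, True)]
threshold = fit_threshold(history)

# Adaptive step: add newer observations, then re-derive the number.
history.append((8, 3, True))
updated_threshold = fit_threshold(history)
```

The point of the adaptive step is visible here: one new observation that contradicts the old cut-off is enough to move the derived threshold.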
Decision Model (part of Decision Engineering)




                             From: Agile Decision Making: Improving business results with analytics
                             TM Forum Quick Insight report, 2011. Source: Lorien Pratt

 …Decision engineering places analytics in the larger business
 context. Each “f” here is an analytic, or based on human
 expertise
[Figure: five numbered steps. Data used to construct the analytic yields the rule
"If 2.3 x Age + 4.4 x Income > 40: Yes, otherwise: No"; operational data for Sally is
fed into the rule; the result is "Sally is likely enough to churn that we should call her".]
Key Distinctions

•   Automated versus human-in-the-loop while building
    analytics
•   Automated versus human-in-the-loop while using
    analytics
•   Strategic versus tactical goals
•   One-size-fits-all versus demographic versus personalized
•   Within-silo versus between-silo
•   Cleansing for operational versus analytic purposes
Moving Analytics to the Center: Retailers face new competition that is driving
an advanced view of customers and interactions to the center of the business.




[Figure: Advanced Customer Intelligence at the center, surrounded by Multi-Channel Operations,
Merchandising, Marketing & Sales, Supply Chain, Operations, and Supplier/Partner Collaboration]

•   How to dynamically manage margin and brand perception with the right mix of regular,
    promotional and markdown products across categories, channels, and formats?
•   Are inventory and demand data leveraged to optimize the customer experience and
    effectively respond to changing marketing conditions?
•   How do I leverage and operationalize customer insights and experience data to drive
    personal, timely, and relevant interactions across all channels?
•   How do I create a responsive analytics capability, and governance relative to the
    right-time application of analytic decision making?
Semantic Framework: Applied Customer Analytics Capability
The New Analytical Competency

Focus of Efforts in the Past           New Competency Requirements

Large-scale Integration of All Data    Connected Information & Analytics
Sources                                Governance for the Enterprise

Central Control of Meta Data and       Provisioning Information & Insights to
Information Usage                      Point of Leverage

Developing the Most Technically        Agile Analytical Modeling Processes &
Correct Analytical Point Solution      Rapid Evaluation of Business Lift
Possible

 Example:

 FROM: How can we use all possible customer dimensions to predict
 customer churn?

 TO: What is the optimum behavior modeling framework to rapidly build
 and deploy models applicable to multiple business objectives that change
 over time?
Predictive Analytics



•   Historical Approaches Rely on Static Data
    o   Propensity to Churn
    o   Propensity to Buy
    o   Propensity to Pay
    o   Customer Lifetime Value
•   Future Needs Require a More Dynamic Approach
    o   Ability to intervene in customer interactions to create desired outcomes
Problem Statements


Telcos are not traditionally nimble



     Telcos look at customers in groups, not individually.



          Telcos have very little idea what drives customer
          behavior



                Telcos have no idea how to influence customer behavior


                     Even if they knew how to influence customer behavior,
                     Telcos do not have the nimble decisioning tools required
                     to impact customer behavior in real time.
Ecosystem, Taxonomies
          and Supplier Review
           Understanding the many suppliers,
           technology camps, and approaches




Structure Part 4 of 5
•   15:00 Ecosystem, Taxonomies and Suppliers: Understanding the many suppliers,
    technology camps, and approaches
•   Taxonomy of Big Data Companies
•   Big Data Landscape
•   Cloudera
•   Autonomy
•   Vertica
•   InfoChimps
•   Guavus
•   Matrixx
•   Case Studies
•   Real Time Analytics for Big Data Lessons from Facebook
    o   Quick technology review
    o   Facebook Real-time Analytics System
    o   Goal
    o   Actual Analytics
    o   Solution
    o   Memory, Collocate, Economics
•   Real Time Analytics for Big Data Lessons from Twitter
    o   Requirements
    o   Actual Analytics
    o   Challenges
    o   Performance
    o   One data any API
    o   Solution
    o   Memory, Collocate, Economics
•   Other Case Studies
•   Orbitz, Hertz, Yelp


Guavus provides integrated solutions to enable rapid decisions on big data for CSPs
•   Guavus delivers big data solutions, not just technology components
•   Unique ability to rapidly fuse huge quantities of data from diverse sources
•   Patent pending streaming analytics technology proven over 10+ years
•   Current customers include leading wireless, IP, and video service providers
Guavus at a Glance


•   Silicon Valley Venture Backed Company
    o   US HQ in San Mateo, CA, R&D Offices in India
    o   Raised $48 Million, 350 employees worldwide
•   Tier-1 CSP Customers & Partnerships
    o   3 of the top 5 NA mobile operators, 3 of the top 5 IP / MPLS backbone carriers, & CDN Networks
    o   4 of the top 6 largest global communications infrastructure equipment vendors
•   Industry Proven & Recognized
    o   Mature (10+ years) patent-pending technology
Guavus Empowers LOB to Make Decisions
•   Information Systems (Data at Rest, Views): Enterprise Apps, Databases, Data Warehouses
•   Devices & Networks (Data in Motion, Flows): Networks
•   Finance & Regulatory
    o   Profitability Analysis
    o   Tiered Pricing Optimization
    o   Contract/SLA Enforcement
•   Network & Operations
    o   Traffic Engineering
    o   Capacity Planning
    o   Peering Optimization
•   Marketing
    o   Customer Segmentation
    o   Campaign management
•   Executives
    o   Continuous Business Optimization
    o   Predictive Planning
•   Customer Care & Sales
    o   Churn Prediction
    o   Focused Prospecting
    o   Targeted Up-Sell & Cross-Sell

Data Collection, Fusion and Mining Across Disparate Data Sources
Operator Challenges in a Big Data World




•   EXPONENTIAL [ STREAMING ] DATA GROWTH
•   DATA SITTING IN SILOS
•   TIMELY INSIGHTS
•   DISTRIBUTED NETWORK GENERATION
Key Data Sources & Insights

•   Data sources: CONTENT PROVIDERS, INTERNET, CDN, EDGE NETWORK, ACCESS NETWORK, CPE OR END DEVICE
•   Streaming Analytics Insights
    o   Content trending & consumption
    o   Fused network events
    o   Subscriber dynamic usage profiles
    o   Network usage patterns
    o   Policy control functions
Transforming the Big Data Analytics Economic Model

•   Traditional: Centralized, Store-First Architecture
    o   [Figure: TRANSPORT, STORAGE, and COMPUTE [ Insights ] plotted against RESOURCES & TIME]
    o   Consolidate data in a repository
    o   Transport and store data: transport and storage costs alone may put it over budget
    o   Project may not even get started
•   Streaming Centric: Distributed, Compute-First Architecture
    o   [Figure: TRANSPORT, STORAGE, and COMPUTE [ Insights ] plotted against RESOURCES & TIME]
    o   Move processing to data edge
    o   Focus spend on analytics first
    o   Continuous processing yields timely and actionable insights
    o   Reduce overall spend per new analytics question
    o   Leverage off-the-shelf, low-cost processing and storage
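The compute-first idea above can be illustrated with a small sketch: each distributed site reduces its raw records to a summary, and only the summaries travel to the center to be merged. This is a toy model under stated assumptions, not a description of any vendor's implementation. The function names and the (subscriber, bytes) record layout are hypothetical.

```python
from collections import Counter


def aggregate_at_edge(raw_records):
    """Reduce one site's raw (subscriber, bytes) records to per-subscriber totals."""
    totals = Counter()
    for subscriber, nbytes in raw_records:
        totals[subscriber] += nbytes
    return totals


def merge_at_center(site_summaries):
    """Combine the per-site summaries; only these summaries cross the network."""
    combined = Counter()
    for summary in site_summaries:
        combined.update(summary)
    return combined


# Hypothetical raw flow records captured at two distributed sites.
site_1 = [("alice", 500), ("bob", 200), ("alice", 300)]
site_2 = [("bob", 100), ("carol", 700), ("bob", 50)]

summaries = [aggregate_at_edge(site_1), aggregate_at_edge(site_2)]
usage = merge_at_center(summaries)
records_transported = sum(len(s) for s in summaries)
```

In this toy case six raw records shrink to four summary entries before transport. With real traffic volumes the reduction is what shifts spend from transport and storage toward analytics.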
Big Data Streaming Analytics Architecture
•   Analytics Applications Examples: Mobility, Digital Media, Broadband
•   3rd Party Feeds & Customer Tools: Data Warehouses, Market Research, Ad Targeting, Capacity Planning
•   Centralized Compute & Analyze: Master Fusion, Master Aggregation, Business Logic, Machine Learning,
    Clustering & Classifying
•   Distributed Sites 1, 2 and 3, each with: Aggregation, Data Fusion, Streaming / Batch Ingest, Local Data Store
•   Data Sources: DPI Data, PDN Flows, Flow & Routing, AAA Data, Web Activity, Web Taxonomy,
    Media Type Meta Data, Service Consumption Traffic, Advertising Traffic
Guavus Analytics Platform Details
•   Guavus Applications: Mobility Reflex, IP Reflex, CDN Reflex, Ad Reflex
•   Customer UI Portals: Consumer Reporting, Enterprise Reporting
•   Insight Discovery: Guavus External API & POC Sandbox
•   3rd Party System Support: Network Management, Field Inventory, etc.
•   API: Cube API (SQL), HBASE API (SQL/Hive)
•   Data Stores (IT, DWH, Cloud): Ingest, Export; Traditional ETL Layers; Data Store; XDR
•   Processing Pipeline (Guavus Stream)
    o   Caching Compute Nodes ( Bus Cubes, Machine Learning Caching )
    o   Analysis Store
    o   Central Compute ( Fusion, Aggregation & Compute )
    o   Distributed Data Collectors (three shown)
•   Streaming Data Feeds: DPI, PCMD, IPDR, NetFlow, RADIUS, DNS, …
•   Other feeds shown: PM / FM, CRM, Inventory
Matrixx. Parallel-MATRIXX™

•   Parallel-MATRIXX™ technology has completely re-invented
    transactional real-time and eliminated limitations with
    contemporary technologies described earlier.
•   The next slide shows the Parallel-MATRIXX™ functional
    architecture, which is based on multiple patented technologies and
    offers a performance improvement of at least two orders of magnitude
    relative to legacy approaches.




Matrixx. Algebraic-Decision Engine

•   OCS raters can be broadly classified as rule-driven or data-driven.
    o   Rule-driven raters offer great flexibility to configure rating scenarios of arbitrary
        sophistication, but can become challenging to maintain beyond a certain complexity.
    o   Data-driven systems typically offer a rich catalog of off-the-shelf templates that are easily
        configured to create real offers.
•   These templates are “baked” into code so performance can be highly optimized.
    The challenge with this approach arises when no suitable template is available,
    often requiring complex and costly customization.
•   With respect to real-time performance, both approaches share a common
    weakness. Every transaction results in execution of conditional logic reflecting
    the rating discriminators (if weekend, and if URL is On-net, and if…).
•   As rating, or indeed policy, rules become more sophisticated, execution code
    paths extend and performance degrades – often unpredictably.
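
To make the conditional-logic point concrete, here is a minimal sketch of a rule-driven rater. The `Usage` fields, the `rate_with_rules` function, and all prices are hypothetical and only illustrate how each extra discriminator adds another branch that every transaction must walk.

```python
from dataclasses import dataclass


@dataclass
class Usage:
    """One rated data session (hypothetical fields)."""
    megabytes: float
    is_weekend: bool
    is_on_net: bool


def rate_with_rules(u: Usage) -> float:
    """Rate a session by walking nested conditional rules (invented tariff)."""
    # Each rating discriminator adds a level of branching to the code path.
    if u.is_weekend:
        if u.is_on_net:
            price_per_mb = 0.00  # on-net weekend traffic is free
        else:
            price_per_mb = 0.01
    else:
        if u.is_on_net:
            price_per_mb = 0.01
        else:
            price_per_mb = 0.02
    return u.megabytes * price_per_mb
```

With two discriminators there are already four branches; adding a third (say, device type) would double that again, which is the growth in code paths the slide describes.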


Matrixx. Algebraic-Decision Engine
•   The Parallel-MATRIXX™ Algebraic-Decision engine eliminates this degradation by
    building on the simple principle that any pricing concept can be represented as a set
    of mathematical equations.
•   Modern CPUs capable of 200 million multiplications per second are exceptionally
    efficient at solving such equations.
•   Pricing plans, offers, and policies are configured via a GUI and transparently
    compiled into an n-dimensional matrix where each dimension corresponds to a
    rating normalizer (such as time, location, service, etc.).
•   Stored at each matrix “intersection” is a linear equation representing the rating
    formula to be applied. As each transaction is mapped to the relevant intersection,
    solution of the associated linear equation is extremely fast.
•   As offers are extended with additional normalizers (for example, adding a device
    dependency to offer lower rates for a promoted device), the matrix dimensionality is
    extended accordingly. This simply results in a few additional CPU cycles to solve the
    rate equation with no significant impact on latency.
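
The matrix idea above can be sketched as follows. This is an illustration of the general principle only, not the vendor's implementation: the normalizers (`DAY_TYPE`, `NETWORK`), the `RATE_MATRIX` contents, and the `rate` function are all assumptions, and a linear equation is stored simply as a (slope, intercept) pair.

```python
# Hypothetical normalizers; each one is a dimension of the rating matrix.
DAY_TYPE = {"weekday": 0, "weekend": 1}
NETWORK = {"off_net": 0, "on_net": 1}

# Each intersection holds a linear equation as (slope, intercept):
#   charge = slope * megabytes + intercept
# All prices are invented for illustration.
RATE_MATRIX = [
    # off_net       on_net
    [(0.02, 0.10), (0.01, 0.00)],  # weekday
    [(0.01, 0.10), (0.00, 0.00)],  # weekend
]


def rate(day_type: str, network: str, megabytes: float) -> float:
    """Map the transaction to its intersection, then solve the equation."""
    slope, intercept = RATE_MATRIX[DAY_TYPE[day_type]][NETWORK[network]]
    return slope * megabytes + intercept
```

Rating is an index lookup followed by one multiply and one add, so adding a normalizer adds an index rather than another layer of branching.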
Contention-Free In-Memory Database and Parallel-MATRIXX™ Processing
•   Maintaining data and transaction integrity is a mission-critical requirement for any
    database containing CSP customer or financial data. For example, an attempt to
    transfer funds between two customers must complete successfully or be cleanly
    aborted.
•   A situation where the donor’s account is debited but some technical failure results in
    the recipient not receiving the funds would leave the database in an invalid state.
•   As described earlier, current real-time systems rely heavily on OLTP and locking
    techniques to assure data integrity, but these can lead to rapidly degrading and
    unpredictable performance.
•   Parallel-MATRIXX™ technology is based on an in-memory database that does not
    utilize locking while still supporting full ACID-compliant transactions.
•   No transaction is ever blocked from accessing or updating data while newly developed
    algorithms detect and resolve transaction conflicts.
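
The slides do not say which conflict-detection algorithms are used. As one well-known way to get all-or-nothing updates without holding locks, the sketch below uses generic optimistic concurrency control with per-row version numbers. The `Store`, `Txn`, `Conflict`, and `transfer` names are hypothetical, and the sketch is single-threaded; a real engine would need to make the validate-and-apply step atomic.

```python
class Conflict(Exception):
    """Raised when a row read by this transaction has since been changed."""


class Store:
    def __init__(self, balances):
        # account -> (version, balance)
        self.rows = {key: (0, value) for key, value in balances.items()}

    def begin(self):
        return Txn(self)


class Txn:
    def __init__(self, store):
        self.store = store
        self.read_versions = {}  # versions observed at first read
        self.writes = {}         # buffered updates, invisible until commit

    def read(self, key):
        if key in self.writes:
            return self.writes[key]
        version, value = self.store.rows[key]
        self.read_versions.setdefault(key, version)
        return value

    def write(self, key, value):
        self.writes[key] = value

    def commit(self):
        # Validate: abort cleanly if anything we read was updated meanwhile.
        for key, seen in self.read_versions.items():
            if self.store.rows[key][0] != seen:
                raise Conflict(key)
        # Apply: all buffered writes become visible together.
        for key, value in self.writes.items():
            version = self.store.rows[key][0]
            self.store.rows[key] = (version + 1, value)


def transfer(store, donor, recipient, amount, retries=3):
    """Move funds between accounts; returns True on success."""
    for _ in range(retries):
        txn = store.begin()
        try:
            donor_balance = txn.read(donor)
            if donor_balance < amount:
                return False  # nothing was written, so nothing to undo
            txn.write(donor, donor_balance - amount)
            txn.write(recipient, txn.read(recipient) + amount)
            txn.commit()
            return True
        except Conflict:
            continue  # retry against the latest data
    return False
```

Because writes are buffered until validation succeeds, the failure case from the slide (donor debited, recipient never credited) cannot be left behind by an aborted transaction.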

Case Studies
           Understanding where big data is used in
           practice




Structure Part 4 of 5
•   15:00 Ecosystem, Taxonomies and Suppliers: Understanding the many
    suppliers, technology camps, and approaches
•   Taxonomy of Big Data Companies
•   Big Data Landscape
•   Cloudera
•   Autonomy
•   Vertica
•   InfoChimps
•   Guavus
•   Matrixx
•   Case Studies
•   Real Time Analytics for Big Data: Lessons from Facebook
    o   Quick technology review
    o   Facebook Real-time Analytics System
    o   Goal
    o   Actual Analytics
    o   Solution
    o   Memory, Collocate, Economics
•   Real Time Analytics for Big Data: Lessons from Twitter
    o   Requirements
    o   Actual Analytics
    o   Challenges
    o   Performance
    o   One data any API
    o   Solution
    o   Memory, Collocate, Economics
•   Other Case Studies
•   Orbitz, Hertz, Yelp


Global Enterprise and Telecom Survey on Big Data and Real-Time Analytics




Structure
 •   Background
 •   The Questions
 •   The Importance of Analytics
 •   Impact of Big Data on Analytics
 •   Size of Data Sets, Number of Data Sources
 •   Update Frequency
 •   Integration of Data Sources
 •   Data Set Responsibility
 •   Types of Data, Types of Processing and Analytics
 •   Challenges
 •   Big Data Analytics Platforms
 •   Benefits and Plans
 •   Data Analytics Storage and IT Infrastructure Requirements
 •   Increasing Interest in Hadoop MapReduce Framework Technology
 •   Conclusions
Background

•   Global Survey
•   Across 200 business and IT executives, questioned in August and September
    2012
•   105 enterprise (non Telco), 55 Telco – all large enterprises (no mid-market
    analysis)
•   Non-Telco included web service providers, financial services, healthcare,
    manufacturing, retail, education, government, military, entertainment verticals
•   Generally VP level with a few CxO level, all decision makers with budget
    responsibilities
•   Generally known to me, or through my contacts as I was trying to gather frank
    reviews
•   Surprisingly similar across Telco and non-Telco



Importance of Enhancing Data Processing and Analytics versus all Business Priorities
•   Most important: 31%
•   Top 5: 39%
•   Top 10: 20%
•   Top 20: 9%
•   Not important: 1%


Impact of Big Data on Analytics

•   There is much market hype surrounding the term big data. When asked what the
    term means to them, a majority of respondents indicated that it simply refers to very
    large data sets, see next slide.
•   The big data movement born from the Hadoop open source initiative has not reached
    most IT departments or even analytics professionals, as evidenced by the fact that
    only 11% of survey respondents associate Hadoop MapReduce with the concept of big
    data.
•   Most organizations’ analytics efforts to date have dealt with structured data, sourced
    through relational databases and data warehouses, and for the vast majority of
    analytical undertakings this makes sense.
•   But even organizations that have not been captured by the Hadoop movement are
    still increasingly under the gun to deal with larger data volumes, and the incursion of
    unstructured data. This, plus the many public examples of big data that have caught
    the imagination of business executives, have reinvigorated interest in data analytics.



What does the term Big Data mean to you?
[Bar chart of responses on a 0% to 80% scale. Categories, top to bottom: Hadoop / MapReduce; Web and search engine data; Problems in storing / processing data; Data Analytics; Data Warehouses; Very large databases; Very large data sets.]




Size of Data Sets

•   The majority (66%) of respondents revealed that the size of the
    largest data set on which their organization conducts analytics is no
    more than 5 terabytes (TB).
•   Overall, the largest data analytics set is approximately 10 TB.
•   While these numbers might not reflect the expectations that often
    accompany the concept of big data, the reality is that processing
    even gigabytes of data at a time during traditional analytics
    exercises is significant.
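
As a quick check of the 66% figure, the short sketch below accumulates the bucket shares reported on the "What is the Largest Data Set?" chart. The variable names are illustrative only.

```python
from itertools import accumulate

# Share of respondents (percent) per bucket, as reported on the chart.
buckets = ["<250GB", "<500GB", "<1TB", "<5TB", "<10TB", "<25TB", "<50TB", ">50TB"]
shares = [5, 9, 20, 32, 19, 11, 3, 1]

# Running total: share of respondents whose largest set falls in or below each bucket.
cumulative = dict(zip(buckets, accumulate(shares)))

print(cumulative["<5TB"])  # 66
```

The same running total shows that 85% of respondents report a largest data set under 10 TB.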




What is the Largest Data Set?
•   <250GB: 5%
•   <500GB: 9%
•   <1TB: 20%
•   <5TB: 32%
•   <10TB: 19%
•   <25TB: 11%
•   <50TB: 3%
•   >50TB: 1%




Number of Data Sources

•   A significant part of data analytics exercises is the amalgamation of
    data from multiple disparate sources.
•   The next slide shows that 67% of these organizations are pulling from
    at least three unique data sources, and one-quarter (25%) are
    integrating data from five or more sources.




Number of Data Sources
•   Single source: 12%
•   2: 21%
•   3: 25%
•   4: 17%
•   5: 16%
•   >5: 9%



Update Frequency

•   Many organizations identified improving business intelligence and/or delivery of
    real-time business information as a key business initiative that will have an
    impact on IT spending decisions.
•   Considering the volumes of data organizations intend to analyze in shorter
    timeframes, organizations will need to evaluate whether their current
    approaches are adaptable to these demanding and constantly changing
    requirements. As part of the same spending survey, organizations also identified
    major application deployments or upgrades as a top IT priority, which is
    significant since every newly deployed or upgraded application will have a
    corresponding impact on existing data integration processes.
•   When asked about the rate at which their largest data set is updated,
    nearly two thirds (65%) of organizations revealed that the changes take place at
    either a real-time or near real-time pace.

Frequency of Update
•   Realtime (streams): 28%
•   Near realtime: 37%
•   Batch: 35%




Integration of Data Sources
•   When asked about the primary method to integrate data sources
    comprising their organization’s largest data sets, nearly two fifths (39%) of
    respondents identified purpose-built applications such as
    Informatica, Oracle, and Teradata.
•   An additional 30% use custom extract, transform, load (ETL) scripts
    or custom extract, load, transform (ELT) scripts for data source
    integration purposes.
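
For readers unfamiliar with the term, a custom ETL script can be as small as the sketch below. It assumes a hypothetical CSV feed with `customer_id` and `amount` columns and loads it into a SQLite table named `payments`; none of these names come from the survey.

```python
import csv
import io
import sqlite3


def run_etl(csv_text: str, conn: sqlite3.Connection) -> int:
    """Extract rows from CSV text, clean them, and load them into SQLite."""
    # Extract: parse the raw feed into dictionaries keyed by column name.
    rows = csv.DictReader(io.StringIO(csv_text))

    # Transform: trim identifiers, normalize amounts, drop rows with no amount.
    cleaned = [
        (row["customer_id"].strip(), round(float(row["amount"]), 2))
        for row in rows
        if row["amount"].strip()
    ]

    # Load: write the cleaned rows into the target table.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS payments (customer_id TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO payments VALUES (?, ?)", cleaned)
    conn.commit()
    return len(cleaned)
```

An ELT variant would load the raw rows first and perform the cleaning inside the database, typically with SQL.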




Main Method of Integrating Data Sources
•   Purpose built: 39%
•   Custom ETL: 30%
•   EAI: 12%
•   Open Source: 10%
•   Other: 9%




Data Set Responsibility
•   In terms of the sources responsible for populating organizations’ largest data
    sets, just over half (51%) of respondents identified back office applications, such as
    resource planning, human capital management, and accounting systems.
    o   For example, many years of order or payment information can yield useful insight into
        customer patterns.
•   Another common source involves the information gleaned from corporate data
    centers and computer networks in the form of network traffic and system log
    files. This information is important to not only those organizations looking to
    maximize network and system performance and utilization metrics, but also to
    those that rely on security analytics to help shape information privacy and
    information protection strategies.
•   Enterprise organizations were significantly more likely to identify internal back
    and front office applications, internal data center or computer networks,
    e-commerce applications (e.g., point-of-sale, supply chain), and scientific
    research as data sources that comprise their largest data sets.

Responsible for Populating Data Set
•   Scientific research: 7%
•   Third party: 10%
•   External public data: 11%
•   Telemetry: 10%
•   Social media: 12%
•   Web applications: 34%
•   Front office: 35%
•   Internal data center: 45%
•   Internal back-office: 51%




Types of Data

•   What data types end up in organizations’ largest data sets from the
    aforementioned sources? More than half (52%) of respondents indicated that
    their largest data set is comprised of database data.
•   Nearly half (48%) of organizations have some measure of transactional data—
    such as point-of-sale (POS) or inventory—residing in their largest data set.
•   What is interesting is the number of organizations that report that unstructured
    data—especially machine-generated content such as log files and sensor data—
    populates their largest data sets. These data types precipitated the concept of big
    data and there are emerging signs that these will consume a vast amount of
    bandwidth, compute, and storage resources. Probably the most significant
    takeaway is that big data becomes really big when an organization starts to see
    unstructured / machine-generated data grow to the size of—or even surpass—
    relational information, which will serve to further exacerbate the integration
    challenges mentioned above.

Source of Data
•   Sensor data: 9%
•   Audio / video: 11%
•   Web log files: 16%
•   Location data: 18%
•   Text / messages: 19%
•   Log files: 22%
•   Office documents: 30%
•   Transaction database: 48%
•   Relational database: 52%




Challenges

•   When asked to identify the data processing and/or analytics challenges
    associated with their organization’s largest data set, nearly half cited security /
    regulation / compliance.
•   Personally identifiable information (PII) and other sensitive information is what
    drives this.
•   About one third of respondents identified data quality (35%) and data cleansing
    tasks (33%), which is consistent with data cleansing and preparation being
    categorized as the most time-consuming data processing and analytics activity.
•   Lack of skills, by contrast, is only a middle-of-the-pack challenge according to respondents.
•   Clearly, responses involving process-related considerations (e.g., data security,
    integration, cleansing) gravitated to the top of the challenges list.




Data Processing Challenges
•   Lack of skills: 17%
•   Costs: 18%
•   Data synchronization: 19%
•   Business expectations: 25%
•   Data integration: 29%
•   Cleansing: 32%
•   Data quality: 35%
•   Security / Regulation / Compliance: 48%




Benefits

•   Cost containment is still an important business initiative to many
    organizations, especially when it comes to IT investments.
•   More than half (55%) of respondents identified reduced costs as a
    key benefit associated with their data analytics platform.
•   Other top benefits centered on simplicity and efficiency, including
    easier management and process improvements, as well as improved
    business agility, which is particularly significant since business
    requirements are constantly changing when it comes to data
    analytics.



Benefits from Data Analytics Platform
•   Fraud detection: 21%
•   Event monitoring: 25%
•   Better accuracy: 32%
•   Business agility: 33%
•   Process improvement: 37%
•   Cost reduction: 55%




Conclusions and Recommendations




Recommendations to the Big Data Buyer
•   Recognize the value of unified information access and analysis in supporting
    fact-based decisions by individuals, groups, and systems.
•   Recognize the shortcomings of operating without having the right information at
    the right time. Use this awareness to help build the business case for addressing
    those shortcomings – find an anchor tenant for the project. NO ENTERPRISE
    WIDE PLATFORM PROJECTS YET, LOOK TO THE CLOUD.
•   Formulate a Big Data strategy that includes evaluation of decision makers'
    requirements, decision processes, existing and new technology, and availability
    and quality of data. NOT TECHNOLOGY LED.
•   The application of Big Data technology will fall into two primary categories:
    o   Doing more efficiently (including at lower cost) tasks that have been done for years.
    o   Doing completely new things that were never before possible, driving up
        long-term strategic organizational value.
•   Identify opportunities to apply Big Data to both.

Recommendations to the Big Data Buyer
•   Beware of the confusion and hyperbolic marketing in the Big Data
    market today. WE ARE AT PEAK BS.
•   IT organizations will need to consider a coordinated approach to
    planning implementations - when more than one project exists.
•   It is important to develop an IT infrastructure strategy that optimizes
    the server, storage, and network resources. Well-developed plans for
    networking support of Big Data projects should address optimizing the
    network both within a Big Data domain and in the connection to
    traditional enterprise infrastructure. LEGACY MATTERS.
•   Consider the breadth of Big Data technologies and the functionality each
    technology brings to the overall portfolio of tools for collecting,
    accessing, analyzing, monitoring, and managing data.
Recommendations to the Big Data Vendor
•   Revenue opportunities exist at all levels of the Big Data technology stack as well
    as in services. Service is where the bulk of the growth exists.
•   Articulate your value proposition by connecting technology capabilities to
    business problems or opportunities. NOT TECHNOLOGY LED.
•   Big Data technology is not an end in itself. NOT TECHNOLOGY LED.
•   Recognize the value of Big Data to drive employee and customer decisions and
    actions.
•   Decide if you want to be a niche player or enter the mainstream.
    o   If the former, then build a network of consultants and partners to support your
        technology.
    o   If the latter, then build a business case that assumes eventual acquisition.
•   The growth in appliances, cloud, and outsourcing deals for Big Data technology
    will likely mean that end users will choose new applications and services, based
    less on the technology itself and more on the business value they deliver.
Recommendations to the Big Data Vendor
•   Whether the application is based on a database or is search based, and whether
    the database is row based or column based, is in-memory or disk based, or uses
    SQL or NoSQL technologies will become less relevant over time. Thus
    technology will provide only a short-lived competitive advantage to any vendor.
•   System performance, availability, security, and manageability will all matter
    greatly. However, how they are achieved will be less of a point for
    differentiation.
•   HPC vendors have an edge in Big Data because leading-edge data-intensive
    computing has been an integral part of HPC for decades.
•   Most HPC Big Data work involves established methods of analyzing increasingly
    large data volumes related to numerical modeling and simulation.



Recommendations to the Big Data Vendor
•   Vendors should tout, not hide, their HPC histories. A number of vendors
    with HPC origins and strong HPC reputations have not capitalized on these
    assets when attempting to address Big Data markets outside of HPC.
•   It is better to position your high-end HPC experience as a strength for
    meeting the presumably less-difficult, data-intensive challenges in the
    mainstream market.
•   Useful tools are largely lacking for very large data sets. Tools such as
    Hadoop and MapReduce can effectively expedite searches through the large,
    irregular data sets that characterize some of the newer Big Data problems.
•   These tools can be great for retrieving and moving through complex data,
    but they do not allow researchers to take the next step and pose intelligent
    questions. In addition, the going gets tough when data sets cross the 100TB
    threshold.
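
For orientation, the MapReduce pattern mentioned above can be sketched in a few lines of plain Python. This is a single-process illustration of the map, shuffle, and reduce steps, not Hadoop code; the log format ("METHOD PATH STATUS") and the function names are assumptions.

```python
from collections import defaultdict


def map_phase(record):
    """Emit (status_code, 1) for each well-formed log line."""
    parts = record.split()
    if len(parts) == 3:  # irregular lines are simply skipped
        yield parts[2], 1


def reduce_phase(key, values):
    """Combine all values emitted for one key."""
    return key, sum(values)


def map_reduce(records):
    # Shuffle: group mapped values by key before reducing.
    groups = defaultdict(list)
    for record in records:
        for key, value in map_phase(record):
            groups[key].append(value)
    return dict(reduce_phase(key, values) for key, values in groups.items())
```

The pattern counts and retrieves well, which is the strength noted above; posing richer analytical questions needs tooling layered on top of it.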
Recommendations to the Big Data Vendor
•   Sophisticated tools for data integration and analysis on this scale are largely
    lacking today. There are opportunities to create tools and applications for Big
    Data. Vendors that create tools and applications for use at this scale can use
    them as a lever to seize market leadership positions in the Big Data market.
•   Not all Big Data use cases involve analytics. Analytics may be at the heart of
    most Big Data opportunities in the enterprise market, but there are also
    opportunities to support operational workloads and information access
    applications.
•   Some of the emerging technologies and the vendors behind them will likely end
    up as components or features of broader information management, access, and
    analysis platforms of larger vendors. Specialized application and service
    providers with localized and industry expertise will be critical to expanding the
    market.
Walk or Run to Big Data?
It depends on your situation. For most telcos the move
 to Big Data will be incremental and complementary to
           existing platforms and investments.
 Focus on the solution: the application of the Analytics
  to the Business – people and process not technology.
                   © 2012 Alan Quayle Business and Service Development   167

More Related Content

What's hot

Big Data & the Cloud
Big Data & the CloudBig Data & the Cloud
Big Data & the CloudDATAVERSITY
 
How to Crunch Petabytes with Hadoop and Big Data Using InfoSphere BigInsights...
How to Crunch Petabytes with Hadoop and Big Data Using InfoSphere BigInsights...How to Crunch Petabytes with Hadoop and Big Data Using InfoSphere BigInsights...
How to Crunch Petabytes with Hadoop and Big Data Using InfoSphere BigInsights...DATAVERSITY
 
Big Data Applications | Big Data Analytics Use-Cases | Big Data Tutorial for ...
Big Data Applications | Big Data Analytics Use-Cases | Big Data Tutorial for ...Big Data Applications | Big Data Analytics Use-Cases | Big Data Tutorial for ...
Big Data Applications | Big Data Analytics Use-Cases | Big Data Tutorial for ...Edureka!
 
Big data ibm keynote d advani presentation
Big data ibm keynote d advani presentationBig data ibm keynote d advani presentation
Big data ibm keynote d advani presentationMassTLC
 
Big Data Analytics Strategy and Roadmap
Big Data Analytics Strategy and RoadmapBig Data Analytics Strategy and Roadmap
Big Data Analytics Strategy and RoadmapSrinath Perera
 
Overview of analytics and big data in practice
Overview of analytics and big data in practiceOverview of analytics and big data in practice
Overview of analytics and big data in practiceVivek Murugesan
 
Strategyzing big data in telco industry
Strategyzing big data in telco industryStrategyzing big data in telco industry
Strategyzing big data in telco industryParviz Iskhakov
 
IBM-Why Big Data?
IBM-Why Big Data?IBM-Why Big Data?
IBM-Why Big Data?Kun Le
 
Big Data Platform Landscape by 2017
Big Data Platform Landscape by 2017Big Data Platform Landscape by 2017
Big Data Platform Landscape by 2017Donghui Zhang
 
Big data - Key Enablers, Drivers & Challenges
Big data - Key Enablers, Drivers & ChallengesBig data - Key Enablers, Drivers & Challenges
Big data - Key Enablers, Drivers & ChallengesShilpi Sharma
 
Analytics: The Real-world Use of Big Data
Analytics: The Real-world Use of Big DataAnalytics: The Real-world Use of Big Data
Analytics: The Real-world Use of Big DataDavid Pittman
 
Big data analytics, research report
Big data analytics, research reportBig data analytics, research report
Big data analytics, research reportJULIO GONZALEZ SANZ
 
Monitizing Big Data at Telecom Service Providers
Monitizing Big Data at Telecom Service ProvidersMonitizing Big Data at Telecom Service Providers
Monitizing Big Data at Telecom Service ProvidersDataWorks Summit
 
200 million qps on commodity hardware : Getting started with MySQL Cluster 7.4
200 million qps on commodity hardware : Getting started with MySQL Cluster 7.4200 million qps on commodity hardware : Getting started with MySQL Cluster 7.4
200 million qps on commodity hardware : Getting started with MySQL Cluster 7.4Frazer Clement
 
Simplifying Big Data Analytics for the Business
Simplifying Big Data Analytics for the BusinessSimplifying Big Data Analytics for the Business
Simplifying Big Data Analytics for the BusinessTeradata Aster
 
Accelerate Digital Transformation with Data Virtualization in Banking, Financ...
Accelerate Digital Transformation with Data Virtualization in Banking, Financ...Accelerate Digital Transformation with Data Virtualization in Banking, Financ...
Accelerate Digital Transformation with Data Virtualization in Banking, Financ...Denodo
 

Recently uploaded (20)

DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 

Telco Big Data Workshop Sample

  • 1. Introduction to Big Data and Real Time Analytics Workshop Telco Big Data & Real Time Analytics Summit 2012 3-5 December 2012, London www.alanquayle.com/blog © 2012 Alan Quayle Business and Service Development 1
  • 2. "There are three kinds of lies: lies, damned lies, and statistics." British Prime Minister Benjamin Disraeli (1804–1881), or perhaps Samuel Langhorne Clemens (1835–1910), better known as Mark Twain
  • 3. Never Forget This! People (most projects fail here), Process, Technology
  • 4. The Data Tsunami!
  • 5. Why are we measuring so many things?
      • Atoms vibrate at about 10^13 Hz. Assuming we measure only the atom (not its subatomic constituents) at a resolution of just 1 byte per vibration, that’s 10 TB per second per atom.
      • There are roughly 7*10^27 atoms in the human body.
      • So just monitoring one human body’s atoms would generate 7*10^40 bytes per second.
      • That’s about 2*10^48 bytes in a year: 2 yotta-yottabytes.
      • By 2020, the quantity of electronically stored data will reach 35 trillion gigabytes; that’s only 35*10^21 bytes.
      • It’s easy (fun) to play with numbers! Lies, damned lies and statistics!
      • We do not need to measure each revolution of an airplane’s turbine; it matters only when an event (out of tolerance) occurs.
          o Collect events and what matters, NOT everything all the time!
          o How do we know what matters? Common sense, knowing your business, and experimentation!
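The arithmetic on this slide can be checked in a few lines. The sketch below uses only the slide's own figures; the constant names are illustrative, and a 365-day year is assumed.

```python
# Back-of-envelope check of the slide's numbers.
# All inputs are the slide's own figures; names are illustrative.
VIBRATION_HZ = 1e13        # vibrations per second per atom
BYTES_PER_SAMPLE = 1       # "resolution of only 1 byte"
ATOMS_IN_BODY = 7e27       # atoms in a human body
SECONDS_PER_YEAR = 365 * 24 * 3600  # assumption: 365-day year

bytes_per_atom_per_s = VIBRATION_HZ * BYTES_PER_SAMPLE      # 1e13 B/s = 10 TB/s
bytes_per_body_per_s = bytes_per_atom_per_s * ATOMS_IN_BODY  # 7e40 B/s
bytes_per_body_per_year = bytes_per_body_per_s * SECONDS_PER_YEAR

# "35 trillion gigabytes" of stored data forecast for 2020
stored_2020_bytes = 35e12 * 1e9

print(f"per atom:  {bytes_per_atom_per_s:.1e} bytes/s")
print(f"per body:  {bytes_per_body_per_s:.1e} bytes/s")
print(f"per year:  {bytes_per_body_per_year:.1e} bytes")
print(f"2020 data: {stored_2020_bytes:.1e} bytes")
```

One monitored body for one year exceeds the 2020 forecast for all stored data by more than 25 orders of magnitude, which is the slide's point: collect events, not everything.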
  • 6. Beware the “Bait and Switch”
  • 7. Data You Need Lots of It!!
  • 8. But There’s a Shortage of Data Scientists to Do Anything With It
  • 9. So Give Me All Your Money
  • 10. Introduction
      • The purpose of this one-day workshop is to provide both an introduction to, and pragmatic insight into, Big Data, Data Science and Real-Time Analytics.
      • This course provides a frank and objective review of the state of the art and the market, examining what is working in practice and what is not through an extensive series of case studies.
      • Big data usually means data sets too large for commonly used software tools to capture, manage, and process.
          o Big data sizes are a constantly moving target, as of 2012 ranging from a few dozen terabytes to many petabytes.
          o A new platform of "big data" tools has arisen to handle sense-making over large quantities of data, for example the Apache Hadoop Big Data Platform.
      • Analyzing large data sets in near real-time is not new; business intelligence is as old as business itself (that is, as old as human society).
          o IT automated it, and enabled an organization to own it rather than leaving it in the wet-ware of a few human brains (generally the owners of the business).
          o Some real-time analysis results in automated triggers (so-called machine learning), but most analysis still requires human interpretation, which is not straightforward.
          o Analysis of such large and mixed data sources has its own problems, as we’ll discuss in the course.
          o Privacy and regulation cannot be ignored; for some industries this will limit the application of Big Data.
  • 11. Structure Part 1 of 5
      • 09:00 Registration
      • 09:30 History and Overview: Understanding Big Data and Real-Time Analytics in Context
          • What do we mean by Big Data?
          • Why does Big Data matter?
          • Big Data Maturity
          • The 3Vs: Volume, Variety and Velocity
          • What are the Domains of Big Data?
          • Big Data Technologies
          • What Enterprises Think of Big Data
          • How Enterprise Verticals are Impacted by Big Data
          • Why Now?
          • Key Trends driving towards Big Data
          • History of Big Data
          • Taxonomy of Big Data Companies
          • Big Data Landscape
          • List of Companies in Big Data (and their Big Data revenues)
          • Big Data Market Sizing
          • Telecoms and Real-Time
          • O2 More: Proof we can do it!
      • 10:45 Coffee Break
  • 12. Structure Part 2 of 5
      • 11:00 Quick Technology Review: Diving into a little detail on a few of the key technologies (only as deep as the architecture) to understand their history and capabilities / limitations
      • Hadoop
          o What is Hadoop?
          o Ecosystem
          o History
          o Design Axioms
          o Hadoop Distributed File System
          o MapReduce: Distributed Processing
          o Architecture
          o Data Schemas
          o Query Language Flexibility
          o Economics
          o Case Studies
      • Hadoop and HBase in the Cloud (Amazon)
      • NoSQL and Cassandra + some use cases
      • HBase versus Cassandra
      • Graph Database introduction
  • 13. Structure Part 3 of 5
      • 12:00 & 14:00 Application of Big Data
      • Hardware and Software Trends
          o Drivers
          o Execution and Results Characteristics
          o Framework: Ecosystem, Application Services, Data Management
      • Real-Time Analytics
          o Use Cases
          o Extended RDBMS versus MapReduce / Hadoop
          o Requirements, Trends, People and Organization Issues, Outlook
      • Big Data and the Cloud
          o Why the Cloud and Big Data?
          o Cloud benefits
          o Use Cases: Bankinter, Etsy, Razorfish
      • The Social Enterprise
          o Business Benefits
          o ALU example
          o Social + Data Analysis = Business intelligence
          o AT&T Case Study
          o Lessons Learned
      • Telcos and Big Data
          o TMF Survey
          o Big Data Framework
          o Predictive / Adaptive Analytics
          o Decision Engineering
          o The Problem with Telecom
      • Telco Analytics
          o Customer Profiling
          o Next Product Tools
          o Marketing Mix Modeling
          o Cost of Acquisition Tools
          o Case Study
      • 13:00ish Lunch
  • 14. Structure Part 4 of 5
      • 15:00 Ecosystem, Taxonomies and Suppliers: Understanding the many suppliers, technology camps, and approaches
      • Taxonomy of Big Data Companies
      • Big Data Landscape
      • Cloudera
      • Autonomy
      • Vertica
      • InfoChimps
      • Guavas
      • Matrixx
          o Performance
          o One data any API
      • Case Studies
      • Real Time Analytics for Big Data: Lessons from Facebook
          o Quick technology review
          o Facebook Real-time Analytics System
          o Goal
          o Actual Analytics
          o Solution
          o Memory, Collocate, Economics
      • Real Time Analytics for Big Data: Lessons from Twitter
          o Requirements
          o Actual Analytics
          o Challenges
          o Solution
          o Memory, Collocate, Economics
      • Other Case Studies
      • Orbitz, Hertz, Yelp
  • 15. Structure Part 5 of 5
      • 16:00 Global Enterprise and Telecom Survey on Big Data and Real-Time Analytics
      • Background
      • The Questions
      • The Importance of Analytics
      • Impact of Big Data on Analytics
      • Size of Data Sets, Number of Data Sources
      • Update Frequency
      • Integration of Data Sources
      • Data Set Responsibility
      • Types of Data, Types of Processing and Analytics
      • Challenges
      • Big Data Analytics Platforms
      • Benefits and Plans
      • Data Analytics Storage and IT Infrastructure Requirements
      • Increasing Interest in Hadoop MapReduce Framework Technology
      • Conclusions
      • Recommendations and Wrap Up
  • 16. Alan Quayle
      • 22 years of experience in the telecommunication industry, focused on developing profitable new businesses in service providers, suppliers and start-ups.
      • Customers include
          o Operators such as AT&T, BT, Charter, Etisalat, M1, O2, Rogers, Swisscom, T-Mobile, Telstra, Time Warner Cable, Verizon and Vodafone;
          o Suppliers such as Adobe, Alcatel-Lucent, Ericsson, Huawei, Nokia Siemens Networks, and Oracle; and
          o Innovative start-ups such as Apigee, AppTrigger (sold to Metaswitch), Camiant (sold to Tekelec), OpenCloud, and Voxeo.
      • Works with the developer community and sits on the board of developers such as GotoCamera and hSenid Mobile, as well as suppliers such as Sigma Systems.
      • Weblog: www.alanquayle.com/blog
      • LinkedIn: http://www.linkedin.com/in/alanquayle
  • 17. A Thank You to Those Helping Me Put This Course Together
      • In putting this workshop together I’d like to thank the following suppliers for their time, openness, and willingness to review and provide material to ensure this workshop is up-to-the-minute.
          o And especially for not requiring any editorial control over the content or my views expressed in this material (in reverse alphabetical order).
      • Guavas
      • HP (don’t mention the Autonomy deal)
      • Versant, NoSQL database vendor
      • Ty Wang, social media entrepreneur using FB Social Graph
      • Lorien Pratt, Data / Decision Scientist with Telco focus
      • Amazon Web Services
      • Matrixx
  • 18. Introductions
      • Spend 2 minutes to introduce yourself
          o Name, current employer and job
          o Let us know your favorite hobby
              • For me it’s hiking with my family
          o What you want to get out of this course
              • What topics are most important to you?
  • 19. History and Overview: Understanding Big Data and Real-Time Analytics in Context
  • 20. Structure
      • What do we mean by Big Data?
      • Why does Big Data matter?
      • Big Data Maturity
      • The 3Vs: Volume, Variety and Velocity
      • What are the Domains of Big Data?
      • Big Data Technologies
      • What Enterprises Think of Big Data
      • How Enterprise Verticals are Impacted by Big Data
      • Why Now?
      • Key Trends driving towards Big Data
      • History of Big Data
      • Taxonomy of Big Data Companies
      • Big Data Landscape
      • List of Companies in Big Data (and their Big Data revenues)
      • Big Data Market Sizing
      • Telecoms and Real-Time
      • O2 More: Proof we can do it!
  • 21. What Do We Mean by Big Data?
  • 22. IDC’s Definition of Big Data
  • 23. What is Big Data
  • 24. Why does Big Data Matter?
  • 28. Another Version of the 3 Vs
      • Volume: Data sets are expanding constantly. A strategic approach to big data takes into account ways to store and manage the huge volumes of data that are being generated.
      • Variety: Big data comes in many forms. Analyzing multi-structured data can yield important insights that can help direct a business strategy.
      • Velocity: The speed at which data is analyzed is everything, especially when working in a time-sensitive business environment.
  • 34. What are the Domains of Big Data?
  • 35. Big Data Technology Stack
  • 36. Big Data Technologies
  • 37. The Technology has Become Quite Fashionable
  • 43. Big Data Use Cases
  • 45. Companies in Big Data
      • Storage: HP, EMC, IBM, Dell, NetApp, Hitachi Ltd., Fujitsu, Oracle, NEC
      • Servers: IBM, HP, Dell, Oracle, Fujitsu, Acer, Cray, Groupe Bull, Hitachi, NEC, SGI, Stratus Technologies, Unisys, Cisco, Lenovo
      • Networking: Cisco, Brocade, HP, Dell, IBM, Alcatel-Lucent, F5 Networks, Citrix
      • Relational database software: Oracle Exadata, IBM Netezza, IBM Smart Analytics System, Teradata, HP Vertica and Autonomy, SAP Sybase IQ, EMC Greenplum DB and HD, Microsoft SQL Server Parallel Edition, IBM Netezza High Capacity Appliance, Teradata Extreme Performance Appliance
      • Hadoop-based data management and analysis software: Cloudera, MapR, EMC Greenplum HD, Oracle Big Data Appliance, IBM BigInsights, HStreaming, Platfora, Zettaset, DataStax, Karmasphere, Datameer, Hadapt, and so forth
      • XML databases: MarkLogic, Oracle XML DB, IBM pureXML, Software AG webMethods, Tamino XML Server, TigerLogic, Xyleme, and so forth
  • 46. Companies in Big Data
      • Object-oriented databases: Jade Software, Objectivity, Progress Software, Versant
      • Graph databases: Neo Technology, Objectivity, Franz Inc., Sones, Ravel
      • Ultra-high-speed streaming data technologies: IBM InfoSphere Streams, Informatica Ultra Messaging Streaming Edition, TIBCO FTL and BusinessEvents, Progress Software Apama CEP
      • Analytics and discovery software: SAS, IBM, Attivio, HP Autonomy, Skytree, Oracle Advanced Analytics, IBM SPSS, Microsoft, Vivisimo, ZyLAB, Sinequa, Revolution Analytics, KXEN, BA Insight, Palantir, Perfect Search, Wolfram Alpha
      • Decision support and automation software including applications: Webtrends, Adobe-Omniture, IBM Coremetrics, FICO
      • Services: Accenture, Deloitte, TCS, HP, Teradata, Mu Sigma, Think Big Analytics, Hortonworks, Hashrocket, KloudData, Trendwise Analytics
  • 47. Big Data Is a Big Market & Big Business - $50 Billion Market by 2017 (according to Wikibon)
      • Open source analyst firm Wikibon pegs the current Big Data market at just over $5 billion (IDC and others agree).
      • Wikibon forecasts the Big Data market will grow at a CAGR of 58% between now and 2017, hitting $50 billion within five years.
      • Vendors from whales like IBM and HP to pure-plays like Vertica and Cloudera are bringing in significant revenue today helping enterprises, governments and healthcare organizations process and make sense of the torrents of unstructured data flowing from mobile devices, sensors, social media and other sources.
      • Today Big Data technologies like Hadoop are mostly in production at Web and online gaming companies, large financial services firms and banks, and online retailers.
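As a sanity check on the forecast, compound growth can be computed directly. This is a minimal sketch: the starting value of exactly $5.0 billion is an assumption (the slide says "just over $5 billion"), and the function names are illustrative.

```python
def project(value: float, cagr: float, years: int) -> float:
    """Value after compounding `cagr` annually for `years` years."""
    return value * (1 + cagr) ** years


def implied_cagr(start: float, end: float, years: int) -> float:
    """Annual growth rate needed to go from `start` to `end` in `years` years."""
    return (end / start) ** (1 / years) - 1


# Assumption: market is exactly $5.0bn today; the slide says "just over $5 billion".
market_2017 = project(5.0, 0.58, 5)
needed_rate = implied_cagr(5.0, 50.0, 5)

print(f"$5bn at 58% CAGR for 5 years: ${market_2017:.1f}bn")
print(f"CAGR needed for 10x in 5 years: {needed_rate:.1%}")
```

A 58% CAGR corresponds to roughly tenfold growth over five years, so the $5 billion and $50 billion figures are mutually consistent.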
  • 48. Big Data Is a Big Market & Big Business - $50 Billion Market by 2017
      • Another important point is that, while Hadoop may be the poster child of Big Data, there are other important technologies at play. Besides Hadoop (an open source framework for distributing data processing across multiple nodes), these include:
          o Massively parallel data warehouses “that deliver fast data loading and real-time analytic capabilities”;
          o Analytic platforms and applications that allow Data Scientists and Business Analysts to manipulate Big Data; and
          o Data Visualization tools that bring insights from Big Data analysis alive for end users.
      • Of the current market, Big Data pure-play vendors account for $300 million in Big Data-related revenue.
          o Despite their relatively small share of current overall revenue (approximately 5%), Big Data pure-play vendors such as Vertica, Splunk and Cloudera are responsible for the vast majority of the new innovations and modern approaches to data management and analytics that have emerged over the last several years and made Big Data the hottest sector in IT.
  • 49. Wikibon Forecast
  • 50. IDC’s Forecast
  • 73. Technology Review: Diving into a little detail on a few of the key technologies (only as deep as the architecture) to understand their history and capabilities / limitations
  • 74. Structure Part 2 of 5
      • Hadoop
          o What is Hadoop?
          o Ecosystem
          o History
          o Design Axioms
          o Hadoop Distributed File System
          o MapReduce: Distributed Processing
          o Architecture
          o Data Schemas
          o Query Language Flexibility
          o Economics
          o Case Studies
      • Hadoop and HBase in the Cloud (Amazon)
      • NoSQL and Cassandra + some use cases
      • HBase versus Cassandra
      • Graph Database introduction
  • 75. HBase Versus Cassandra: History
      • HBase and its required supporting systems are derived from what is known of the original Google BigTable and Google File System designs (as described in the Google File System paper Google published in 2003 and the BigTable paper published in 2006).
      • Cassandra, on the other hand, is a recent open source fork of a standalone database system initially coded by Facebook. While it implements the BigTable data model, it stores data using a system inspired by Amazon’s Dynamo (in fact, much of the initial development work on Cassandra was performed by two Dynamo engineers recruited to Facebook from Amazon).
  • 76. HBase Versus Cassandra
      • These differing histories have resulted in HBase being more suitable for data warehousing and for large scale data processing and analysis (for example, the kind involved in indexing the Web).
      • Cassandra is more suitable for real-time transaction processing and the serving of interactive data.
      • For lightweight validation you’ll find the current makeup of the key committers interesting:
          o The primary committers to HBase work for Bing (Microsoft bought their search company last year, and gave them permission to continue submitting open source code after a couple of months).
          o By contrast, the primary committers on Cassandra work for Rackspace, which supports the idea of an advanced general purpose NoSQL solution being freely available to counter the threat of companies becoming locked in to the proprietary NoSQL solutions offered by the likes of Google, Yahoo and Amazon EC2.
  • 77. The CAP Theorem
      • The CAP theorem was developed by Professor Eric Brewer, co-founder and Chief Scientist of Inktomi.
      • The theorem states that a distributed (or “shared data”) system design can offer at most two out of three desirable properties: Consistency, Availability and tolerance to network Partitions.
          o Consistency means that once someone writes a value to a database, other users will immediately be able to read the same value back.
          o Availability means that if some number of nodes in your cluster fail, the distributed system remains operational.
          o Tolerance to Partitions means that if a network failure divides the nodes in your cluster into two groups that can no longer communicate, the system again remains operational.
      • If you search online posts comparing HBase and Cassandra, you will regularly find the HBase community explaining that they have chosen CP, while Cassandra has chosen AP.
      • BUT the CAP theorem only applies to a single distributed algorithm. There is no reason why you cannot design a single system where, for any given operation, the underlying algorithm (and thus the trade-off achieved) is selectable.
      • Thus, while it is true that a system may only offer two of these properties per operation, what has been widely missed is that a system can be designed to let a caller choose which properties they want when any given operation is performed.
      • Not only that, reality is not nearly so black and white: it is possible to offer differing degrees of balance between consistency, availability and tolerance to partition. This is Cassandra.
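The idea of choosing the trade-off per operation can be illustrated with quorum arithmetic. The following is a minimal sketch, not Cassandra's API: the level names ONE, QUORUM and ALL mirror Cassandra's consistency levels, but the functions, the replica count of 3, and the simplified availability rule are assumptions made for illustration. A read is guaranteed to see the latest write when the replicas read (R) and the replicas written (W) must overlap, that is when R + W > N.

```python
N_REPLICAS = 3  # assumption: each value is stored on 3 replicas


def required_acks(level: str, n: int = N_REPLICAS) -> int:
    """Number of replicas that must respond for a per-operation level."""
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return n // 2 + 1
    if level == "ALL":
        return n
    raise ValueError(f"unknown level: {level}")


def read_sees_latest_write(write_level: str, read_level: str,
                           n: int = N_REPLICAS) -> bool:
    """True when the read and write replica sets must overlap (R + W > N)."""
    return required_acks(read_level, n) + required_acks(write_level, n) > n


def can_proceed(level: str, live_replicas: int, n: int = N_REPLICAS) -> bool:
    """An operation stays available only if enough replicas are reachable."""
    return live_replicas >= required_acks(level, n)


# The caller picks the trade-off per operation:
for w, r in [("ONE", "ONE"), ("QUORUM", "QUORUM"), ("ALL", "ONE")]:
    print(f"write={w:6} read={r:6} "
          f"consistent={read_sees_latest_write(w, r)} "
          f"write survives 1 node down={can_proceed(w, N_REPLICAS - 1)}")
```

Writing and reading at ONE favors availability; QUORUM on both sides gives consistent reads while tolerating one failed replica out of three; ALL on writes gives consistency with cheap reads but blocks writes as soon as any replica is unreachable.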
  • 78. Application of Big Data
  • 79. Structure
      • Hardware and Software Trends
          o Drivers
          o Execution and Results Characteristics
          o Framework: Ecosystem, Application Services, Data Management
      • Real-Time Analytics
          o Use Cases
          o Extended RDBMS versus MapReduce / Hadoop
          o Requirements, Trends, People and Organization Issues, Outlook
      • Big Data and the Cloud
          o Why the Cloud and Big Data?
          o Cloud benefits
          o Use Cases: Bankinter, Etsy, Razorfish
      • The Social Enterprise
          o Business Benefits
          o ALU example
          o Social + Data Analysis = Business intelligence
          o AT&T Case Study
          o Lessons Learned
      • Telcos and Big Data
          o TMF Survey
          o Big Data Framework
          o Predictive / Adaptive Analytics
          o Decision Engineering
          o The Problem with Telecom
      • Telco Analytics
          o Customer Profiling
          o Next Product Tools
          o Marketing Mix Modeling
          o Cost of Acquisition Tools
          o Case Study
  • 80. Use Cases for Big Data Analytics
      • Search ranking.
          o All search engines attempt to rank the relevance of a webpage to a search request against all other possible webpages.
          o Google’s PageRank algorithm is, of course, the poster child for this use case.
      • Ad tracking.
          o E-commerce sites typically record an enormous river of data, including every page event in every user session.
          o This allows for very short turnaround of experiments in ad placement, color, size, wording, and other features.
          o When an experiment shows that such a feature change in an ad results in improved click-through behavior, the change can be implemented virtually in real time.
      • Location and proximity tracking.
          o Many use cases add precise GPS location tracking, together with frequent updates, in operational applications, security analysis, navigation, and social media.
          o Precise location tracking opens the door for an enormous ocean of data about other locations near the GPS measurement.
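To make the search ranking use case concrete, here is a minimal sketch of the textbook power-iteration form of PageRank on a toy link graph. It is not Google's production algorithm; the damping factor of 0.85, the iteration count and the four-page graph are assumptions chosen for illustration.

```python
def pagerank(links: dict, damping: float = 0.85, iterations: int = 50) -> dict:
    """Textbook PageRank by power iteration.

    `links` maps each page to the list of pages it links to.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page gets the "random jump" share first.
        new_rank = {p: (1 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's rank evenly across its outgoing links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical toy web: three pages link to "c", and "c" links back to "a".
toy_web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
for page, score in sorted(pagerank(toy_web).items(), key=lambda kv: -kv[1]):
    print(f"{page}: {score:.3f}")
```

The page with the most incoming links from well-ranked pages ends up on top, and the ranks always sum to 1, so they can be read as the share of time a random surfer spends on each page.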
  • 81. Use Cases for Big Data Analytics
      • Causal factor discovery.
          o Point-of-sale data has long been able to show us when the sales of a product go sharply up or down. But searching for the causal factors that explain these deviations has been, at best, a guessing game or an art form.
          o The answers may be found in competitive pricing data, competitive promotional data including print and television media, weather, holidays, national events including disasters, and virally spread opinions found in social media.
      • Social CRM.
          o This use case is one of the hottest new areas for marketing analysis. The Altimeter Group has described a very useful set of key performance indicators for social CRM that include share of voice, audience engagement, conversation reach, active advocates, advocate influence, advocacy impact, resolution rate, resolution time, satisfaction score, topic trends, sentiment ratio, and idea impact.
          o The calculation of these KPIs involves in-depth trawling of a huge array of data sources, especially unstructured social media.
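A first pass at causal factor discovery is often just ranking candidate factors by how strongly they move with sales. The sketch below uses the Pearson correlation for that ranking; the factor names and figures are made up for illustration, and correlation only shortlists candidates, it does not establish cause.

```python
import math


def pearson(xs: list, ys: list) -> float:
    """Pearson correlation coefficient; 0.0 if either series is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        return 0.0
    return cov / (spread_x * spread_y)


def rank_factors(sales: list, factors: dict) -> list:
    """Candidate factors sorted by absolute correlation with sales."""
    scores = {name: pearson(series, sales) for name, series in factors.items()}
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)


# Hypothetical weekly data, invented for this example.
weekly_sales = [100, 120, 90, 150, 80]
candidates = {
    "competitor_price": [10, 12, 9, 15, 8],
    "rainfall_mm": [5, 5, 5, 5, 5],
    "holiday_week": [0, 1, 0, 0, 1],
}
for name, r in rank_factors(weekly_sales, candidates):
    print(f"{name}: r = {r:+.2f}")
```

The shortlisted factors would then be tested with controlled experiments or domain knowledge, in line with the workshop's emphasis on knowing your business and experimentation.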
  • 82. Use Cases for Big Data Analytics
      • Document similarity testing.
          o Two documents can be compared to derive a metric of similarity. There is a large body of academic research and tested algorithms, for example latent semantic analysis, that is just now finding its way to driving monetized insights of interest to big data practitioners.
          o For example, a single source document can be used as a kind of multifaceted template to compare against a large set of target documents. This could be used for threat discovery, sentiment analysis, and opinion polls. For example: “find all the documents that agree with my source document on global warming.”
      • Genomics analysis: e.g., commercial seed gene sequencing.
          o A few months ago the cotton research community was thrilled by a genome sequencing announcement that stated in part: “The sequence will serve a critical role as the reference for future assembly of the larger cotton crop genome. Cotton is the most important fiber crop worldwide and this sequence information will open the way for more rapid breeding for higher yield, better fiber quality and adaptation to environmental stresses and for insect and disease resistance.”
          o Scientist Ryan Rapp stressed the importance of involving the cotton research community in analyzing the sequence, identifying genes and gene families and determining the future directions of research.
          o This use case is just one example of a whole industry that is being formed to address genomics analysis broadly, beyond this example of seed gene sequencing.
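A similarity metric between two documents can be sketched with cosine similarity over word counts. Note that this is a simpler stand-in for the latent semantic analysis the slide mentions, not an implementation of it, and the sample sentences are invented for illustration.

```python
import math
import re
from collections import Counter


def term_counts(text: str) -> Counter:
    """Lower-cased bag of words."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine_similarity(doc_a: str, doc_b: str) -> float:
    """Cosine of the angle between the two word-count vectors (0.0 to 1.0)."""
    a, b = term_counts(doc_a), term_counts(doc_b)
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_targets(source: str, targets: list) -> list:
    """Target documents ordered from most to least similar to the source."""
    return sorted(targets, key=lambda t: cosine_similarity(source, t), reverse=True)


source = "global warming is caused by human activity"
targets = [
    "stock prices fell sharply today",
    "human activity is the cause of global warming",
]
for doc in rank_targets(source, targets):
    print(f"{cosine_similarity(source, doc):.2f}  {doc}")
```

Word overlap captures shared topic rather than agreement: a document arguing the opposite position on global warming would also score highly, which is why the slide's "documents that agree with my source" example needs richer techniques on top of a similarity score.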
  • 83. Use Cases for Big Data Analytics
      • Discovery of customer cohort groups.
          o Customer cohort groups are used by many enterprises to identify common demographic trends and behavior histories. We are all familiar with Amazon’s cohort groups, when they say that other customers who bought the same book as you have also bought the following books.
          o Of course, if you can sell your product or service to one member of a cohort group, then all the rest may be reasonable prospects.
          o Cohort groups are represented logically and graphically as links, and much of the analysis of cohort groups involves specialized link analysis algorithms.
      • In-flight aircraft status.
          o This use case and the following two are made possible by the introduction of sensor technology everywhere.
          o In aircraft systems, the in-flight status of hundreds of variables on engines, fuel systems, hydraulics, and electrical systems is measured and transmitted every few milliseconds.
          o The value of this use case is not just the engineering telemetry data that could be analyzed at some future point in time; it also drives real-time adaptive control, fuel usage, part failure prediction, and pilot notification.
      • Smart utility meters.
          o It didn’t take long for utility companies to figure out that a smart meter can be used for more than just the monthly readout that produces the customer’s utility bill.
          o By drastically cranking up the frequency of the readouts to as much as one readout per second per meter across the entire customer landscape, many useful analyses can be performed, including dynamic load-balancing, failure response, adaptive pricing, and longer-term strategies for incentivizing customers to use the utility more effectively (either from the customers’ point of view or the utility’s point of view!).
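The "customers who bought this also bought" cohort idea reduces to counting co-purchases. The sketch below is a minimal illustration with made-up customers and titles; it is not Amazon's recommendation method, and real link analysis would add normalization and scale.

```python
from collections import Counter


def also_bought(purchases: dict, item: str, top_n: int = 3) -> list:
    """Items most often bought by customers who also bought `item`.

    `purchases` maps a customer id to the set of items they bought.
    """
    counts = Counter()
    for basket in purchases.values():
        if item in basket:
            # Count every other item in this customer's basket once.
            counts.update(set(basket) - {item})
    return [name for name, _ in counts.most_common(top_n)]


# Hypothetical purchase histories.
history = {
    "ann": {"hadoop book", "cassandra book"},
    "bob": {"hadoop book", "cassandra book", "graph book"},
    "cy": {"hadoop book", "cassandra book"},
    "di": {"graph book"},
}
print(also_bought(history, "hadoop book"))
```

Each customer-item pair is a link in a bipartite graph, which is why the slide describes cohort analysis as link analysis.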
  • 84. Use Cases for Big Data Analytics • Building sensors. o Modern industrial buildings and high-rises are being fitted with thousands of small sensors to detect temperature, humidity, vibration, and noise. o Like the smart utility meters, collecting this data every few seconds 24 hours per day allows many forms of analysis including energy usage, unusual problems including security violations, component failure in air-conditioning and heating systems and plumbing systems, and the development of construction practices and pricing strategies. • Satellite image comparison. o Images of the regions of the earth from satellites are captured by every pass of certain satellites on intervals typically separated by a small number of days. o Overlaying these images and computing the differences allows the creation of hot spot maps showing what has changed. This analysis can identify construction, destruction, changes due to disasters like hurricanes and earthquakes and fires, and the spread of human encroachment. © 2012 Alan Quayle Business and Service Development 84
  • 85. Use Cases for Big Data Analytics • CAT scan comparisons. o CAT scans are stacks of images taken as "slices" of the human body. Large libraries of CAT scans can be analyzed to facilitate the automatic diagnosis of medical issues and their prevalence. • Financial account fraud detection and intervention. o Account fraud, of course, has immediate and obvious financial impact. In many cases fraud can be detected by patterns of account behavior, in some cases crossing multiple financial systems. For example, "check kiting" requires the rapid transfer of money back and forth between two separate accounts. o Certain forms of broker fraud involve two conspiring brokers selling a security back-and-forth at ever increasing prices, until an unsuspecting third party enters the action by buying the security, allowing the fraudulent brokers to quickly exit. Again, this behavior may take place across two separate exchanges in a short period of time. © 2012 Alan Quayle Business and Service Development 85
  • 86. Use cases for big data analytics • Computer system hacking detection and intervention. o System hacking in many cases involves an unusual entry mode or some other kind of behavior that in retrospect is a smoking gun but may be hard to detect in real-time. • Online game gesture tracking. o Online game companies typically record every click and maneuver by every player at the most fine grained level. This avalanche of "telemetry data" allows fraud detection, intervention for a player who is getting consistently defeated (and therefore discouraged), offers of additional features or game goals for players who are about to finish a game and depart, ideas for new game features, and experiments for new features in the games. o This can be generalized to television viewing. Your DVR box can capture remote control keystrokes, recording events, playback events, picture-in-picture viewing, and the context of the guide. All of this can be sent back to your provider. • Big science including atom smashers, weather analysis, space probe telemetry feeds. o Major scientific projects have always collected a lot of data, but now the techniques of big data analytics are allowing broader access and much more timely access to the data. Big science data, of course, is a mixture of all forms of data, scalar, vector, complex structures, analog wave forms, and images. © 2012 Alan Quayle Business and Service Development 86
  • 87. Use Cases for Big Data Analytics • "Data bag" exploration. o There are many situations in commercial environments and in the research communities where large volumes of raw data are collected. One example might be data collected about structure fires. Beyond the predictable dimensions of time, place, primary cause of fire, and responding firefighters, there may be a wealth of unpredictable anecdotal data that at best can be modeled as a disorderly collection of name value pairs, such as "contributing weather= lightning.” Another example would be the listing of all relevant financial assets for a defendant in a lawsuit. o Again such a list is likely to be a disorderly collection of name value pairs, such as "shared real estate ownership =condominium.” The list of examples like this is endless. What they have in common is the need to encapsulate the disorderly collection of name value pairs which is generally known as a "data bag.” Complex data bags may contain both name value pairs as well as embedded sub data bags. The challenge in this use case is to find a common way to approach the analysis of data bags when the content of the data may need to be discovered after the data is loaded. © 2012 Alan Quayle Business and Service Development 87
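A data bag as described on this slide maps naturally onto nested dictionaries. The sketch below assumes that representation (a dict whose values are either plain values or embedded sub-bags) and shows one way to discover, after loading, which names actually occur. The function names and the dotted path convention are illustrative choices, not part of the deck.

```python
from collections import Counter


def walk_bag(bag, prefix=""):
    """Yield (path, value) for every name-value pair, descending into sub-bags."""
    for name, value in bag.items():
        path = f"{prefix}.{name}" if prefix else name
        if isinstance(value, dict):
            # An embedded sub data bag: recurse and extend the path.
            yield from walk_bag(value, path)
        else:
            yield path, value


def discover_names(bags):
    """Count how many bags contain each name path."""
    seen = Counter()
    for bag in bags:
        seen.update({path for path, _ in walk_bag(bag)})
    return seen
```

Running `discover_names` over a loaded collection gives a frequency table of names, a reasonable first step when the content has to be discovered after the data is loaded.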
  • 88. Use Cases for Big Data Analytics • The final two use cases are old and even predate data warehousing itself. But new life has been breathed into these use cases because of the exciting potential of ultra-atomic customer behavior data. o Loan risk analysis and insurance policy underwriting. In order to evaluate the risk of a prospective loan or a prospective insurance policy, many data sources can be brought into play ranging from payment histories, detailed credit behavior, employment data, and financial asset disclosures. In some cases the collateral for a loan or the insured item may be accompanied by image data. o Customer churn analysis. Enterprises concerned with churn want to understand the predictive factors leading up to the loss of a customer, including that customer’s detailed behavior as well as many external factors including the economy, life stage and other demographics of the customer, and finally real time competitive issues. © 2012 Alan Quayle Business and Service Development 88
  • 89. Characteristics of Big Data • How the Cloud Is Big Data’s Best Friend • Big Data on the Cloud In the Real World
  • 90. Characteristics of Big Data
  • 91. Features driven by MapReduce
  • 92. Big Data is Getting Bigger • 2.7 Zettabytes in 2012 • Over 90% will be unstructured • Data spread across a wide array of silos
  • 93. Why is Big Data Hard (and Getting Harder)? • Changing Data Requirements • Faster response time of fresher data • Sampling is not good enough & history is important • Increasing complexity of analytics • Users demand inexpensive experimentation
  • 94. Where is it Coming From? • Computer Generated o Application server logs (web sites, games) o Sensor data (weather, water, smart grids) o Images/videos (traffic, security cameras) • Human Generated o Twitter “Fire Hose”: 50m tweets/day, 1,400% growth per year o Blogs/Reviews/Emails/Pictures o Social Graphs: Facebook, Linked-in, Contacts
  • 95. Big Data Verticals • Media/Advertising: Targeted Advertising; Image and Video Processing • Life Sciences: Genome Analysis • Financial Services: Monte Carlo Simulations; Risk Analysis • Oil & Gas: Seismic Analysis • Retail: Recommend; Transactions Analysis • Security: Anti-virus; Fraud Detection; Image Recognition • Social Network/Gaming: User Demographics; Usage analysis; In-game metrics
  • 96. Bank – Monte Carlo Simulations • 23 Hours to 20 Minutes • “The AWS platform was a good fit for its unlimited and flexible computational power to our risk-simulation process requirements. With AWS, we now have the power to decide how fast we want to obtain simulation results, and, more importantly, we have the ability to run simulations not possible before due to the large amount of infrastructure required.” – Castillo, Director, Bankinter
  • 97. Recommendations The Taste Test http://www.etsy.com/tastetest
  • 98. Recommendations Gift Ideas for Facebook Friends etsy.com/gifts
  • 100. Click Stream Analysis • User recently purchased a sports movie and is searching for video games • Targeted Ad (1.7 Million per day)
  • 101. The Social Enterprise • Implementations are getting bigger and growing faster than ever • Virtually all data continue to show sustained real-world benefits (McKinsey, IBM, Frost and Sullivan, AIIM) • Everything is becoming social: Social features are appearing in virtually all types of applications • There continues to be considerable confusion about who “owns” social in the organization • The predicted social data explosion: It happened • Mining insight from social data has now become a major industry (#bigdata, #analytics) • The blur between internal and external social business has not progressed as far as many thought • The first serious talk about open social business standards has begun © 2012 Alan Quayle Business and Service Development 101
  • 103. Decision Engineering Adaptive Analytics Predictive Analytics Reporting Data Management (including data migration, data quality, data modeling)
  • 104. Decision Engineering Adaptive Analytics Predictive Analytics Reporting Data Management (including data migration, data quality, data modeling)
  • 105. Predictive/Adaptive Analytics on 1 slide • Will this customer churn? o Yes/No data: If customer has an open trouble ticket: Yes, otherwise: No o Real-Valued: If customer age < 30: Yes, otherwise: No o Pattern Combination: If customer age < 30 AND has an open trouble ticket: Yes, otherwise: No o Linear Combination: If 2.3 x Age + 4.4 x Income > 40: Yes, otherwise: No • Predictive Analytics: Obtain these numbers by analyzing historical data • Adaptive Analytics: Update your historical data, and re-derive the numbers periodically to take changing situations into account. • Nonlinear Analytics: [two plots of Income vs. age]
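The four rule shapes on this slide, and the step from fixed numbers to numbers derived from history, can be sketched as follows. The rule constants (30, 2.3, 4.4, 40) come from the slide; everything else is an assumption made for illustration: the customer dictionary keys, the function names, and the deliberately simple threshold search standing in for a real model-fitting procedure.

```python
def yes_no_rule(customer):
    """Yes/No data: churn if the customer has an open trouble ticket."""
    return customer["open_ticket"]


def real_valued_rule(customer, age_threshold=30):
    """Real-valued: churn if the customer is younger than the threshold."""
    return customer["age"] < age_threshold


def pattern_rule(customer, age_threshold=30):
    """Pattern combination: young AND has an open trouble ticket."""
    return customer["age"] < age_threshold and customer["open_ticket"]


def linear_rule(customer, w_age=2.3, w_income=4.4, cutoff=40):
    """Linear combination of age and income compared with a cutoff."""
    return w_age * customer["age"] + w_income * customer["income"] > cutoff


def fit_age_threshold(history):
    """Predictive step: pick the age threshold with the fewest errors.

    `history` is a non-empty list of (age, churned) pairs.
    """
    if not history:
        raise ValueError("history must not be empty")
    candidates = sorted({age for age, _ in history})
    candidates.append(candidates[-1] + 1)  # allows the rule "everyone churns"

    def errors(threshold):
        return sum((age < threshold) != churned for age, churned in history)

    return min(candidates, key=errors)
```

The adaptive step is then just calling `fit_age_threshold` again on the updated history at whatever interval suits the business, so the threshold follows changing behavior instead of staying fixed at 30.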
  • 106. Decision Engineering Adaptive Analytics Predictive Analytics Reporting Data Management (including data migration, data quality, data modeling)
  • 107. Decision Model (part of Decision Engineering) From: Agile Decision Making: Improving business results with analytics TM Forum Quick Insight report, 2011. Source: Lorien Pratt …Decision engineering places analytics in the larger business context. Each “f” here is an analytic, or based on human expertise
  • 108. [Diagram with numbered steps 1 to 5] • Data used to construct the analytic • Analytic: If 2.3 x Age + 4.4 x Income > 40: Yes, otherwise: No • Operational data: Sally • Outcome: Sally is likely enough to churn that we should call her
  • 109. Key Distinctions • Automated versus human-in-the-loop while building analytics • Automated versus human-in-the-loop while using analytics • Strategic versus tactical goals • One-size fits all versus demographic versus personalized • Within-silo versus between-silo • Cleansing for operational versus analytic purposes
  • 110. Moving Analytics to the Center: Retailers face new competition that is driving an advanced view of customers and interactions to the center of the business. • Center: Advanced Customer Intelligence • Surrounding functions: Multi-Channel Operations, Merchandising, Marketing & Sales, Supply Chain Operations, Supplier/Partner Collaboration • How to dynamically manage margin and brand perception with the right mix of regular, promotional and markdown products across categories, channels, and formats? • How do I leverage and operationalize customer insights and experience data to drive personal, timely, and relevant interactions across all channels? • How do I create a responsive analytics capability, and governance relative to the right-time application of analytic decision making? • Are inventory and demand data leveraged to optimize the customer experience and effectively respond to changing marketing conditions?
  • 111. Semantic Framework: Applied Customer Analytics Capability
  • 112. The New Analytical Competency • Focus of Efforts in the Past → New Competency Requirements o Large-scale Integration of All Data Sources → Connected Information & Analytics Governance for the Enterprise o Central Control of Meta Data and Information Usage → Provisioning Information & Insights to Point of Leverage o Developing the Most Technically Correct Analytical Point Solution Possible → Agile Analytical Modeling Processes & Rapid Evaluation of Business Lift • Example o FROM: How can we use all possible customer dimensions to predict customer churn? o TO: What is the optimum behavior modeling framework to rapidly build and deploy models applicable to multiple business objectives that change over time?
  • 113. Predictive Analytics • Historical Approaches Rely on Static Data o Propensity to Churn o Propensity to Buy o Propensity to Pay o Customer Lifetime Value • Future Needs Require a More Dynamic Approach o Ability to intervene in customer interactions to create desired outcomes
  • 114. Problem Statements • Telcos are not traditionally nimble. • Telcos look at customers in groups, not individually. • Telcos have very little idea what drives customer behavior. • Telcos have no idea how to influence customer behavior. • Even if they knew how to influence customer behavior, Telcos do not have the nimble decisioning tools required to impact customer behavior in real time.
  • 115. Ecosystem, Taxonomies and Supplier Review Understanding the many suppliers, technology camps, and approaches © 2012 Alan Quayle Business and Service Development 115
  • 116. Structure Part 4 of 5 • 15:00 Ecosystem, Taxonomies and Suppliers: Understanding the many suppliers, technology camps, and approaches o Taxonomy of Big Data Companies o Big Data Landscape o Cloudera o Autonomy o Vertica o InfoChimps o Guavus o Matrixx • Case Studies • Real Time Analytics for Big Data Lessons from Facebook o Quick technology review o Facebook Real-time Analytics System o Goal o Actual Analytics o Solution o Memory, Collocate, Economics • Real Time Analytics for Big Data Lessons from Twitter o Requirements o Actual Analytics o Challenges o Performance o One data any API o Solution o Memory, Collocate, Economics • Other Case Studies o Orbitz, Hertz, Yelp
  • 117. Guavus provides integrated solutions to enable rapid decisions on big data for CSPs • Guavus delivers big data solutions, not just technology components • Unique ability to rapidly fuse huge quantities of data from diverse sources • Patent pending streaming analytics technology proven over 10+ years • Current customers include leading wireless, IP, and video service providers
  • 118. Guavus at a Glance • Silicon Valley Venture Backed Company o US HQ in San Mateo, CA, R&D Offices in India o Raised $48 Million, 350 employees worldwide • Tier-1 CSP Customers & Partnerships o 3 of the top 5 NA mobile operators, 3 of the top 5 IP / MPLS backbone carriers, & CDN Networks o 4 of the top 6 largest global communications infrastructure equipment vendors • Industry Proven & Recognized o Mature (10+ years) patent-pending technology
  • 119. Guavus Empowers LOB to Make Decisions • Information Systems: Enterprise Apps, Databases, Data Warehouses (Data at Rest, Views) • Devices & Networks: Networks (Data in Motion, Flows) • Audiences: Finance & Regulatory; Network & Operations; Customer Care; Marketing & Sales; Executives • Use cases: Profitability Analysis; Tiered Pricing Optimization; Contract/SLA Enforcement; Traffic Engineering; Capacity Planning; Peering Optimization; Customer Segmentation; Campaign management; Continuous Business Optimization; Predictive Planning; Churn Prediction; Focused Prospecting; Targeted Up-Sell & Cross-Sell • Data Collection, Fusion and Mining Across Disparate Data Sources
  • 120. Operator Challenges in a Big Data World • EXPONENTIAL [ STREAMING ] DATA GROWTH • DATA SITTING IN SILOS • TIMELY INSIGHTS GENERATION • DISTRIBUTED NETWORK
  • 121. Key Data Sources & Insights • Data sources: CONTENT PROVIDERS, INTERNET, CDN, EDGE NETWORK, ACCESS NETWORK, CPE OR END DEVICE • Streaming Analytics Insights o Content trending & consumption o Fused network events o Subscriber dynamic usage profiles o Network usage patterns o Policy control functions
  • 122. Transforming the Big Data Analytics Economic Model • Traditional: Centralized, Store-First Architecture (diagram: TRANSPORT, STORAGE and COMPUTE plotted against RESOURCES & TIME, leading to [ Insights ]) o Consolidate data in a repository o Transport and store data: transport and storage costs alone may put it over budget o Project may not even get started • Streaming Centric: Distributed, Compute-First Architecture (diagram: TRANSPORT, STORAGE and COMPUTE plotted against RESOURCES & TIME, leading to [ Insights ]) o Move processing to data edge o Focus spend on analytics first o Continuous processing yields timely and actionable insights o Reduce overall spend per new analytics questions o Leverage off the shelf low cost processing and storage
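The compute-first idea above, processing at the data edge and transporting only the results, can be illustrated with a toy aggregation. This is a sketch under stated assumptions, not the vendor's implementation: events are assumed to be `(subscriber, bytes)` pairs, and both function names are hypothetical.

```python
from collections import Counter


def summarize_at_edge(events):
    """Run at each distributed site: reduce raw events to per-subscriber byte totals."""
    totals = Counter()
    for subscriber, nbytes in events:
        totals[subscriber] += nbytes
    return totals


def fuse_centrally(site_summaries):
    """Run centrally: merge the small per-site summaries into one view."""
    fused = Counter()
    for summary in site_summaries:
        fused.update(summary)  # Counter.update adds counts rather than replacing them
    return fused
```

Only one record per subscriber per site crosses the network, however many raw events were seen locally. In a store-first design every raw event would be transported and stored before any analysis could start.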
  • 123. Big Data Streaming Analytics Architecture • Analytics Applications Examples: Market Research, Capacity Planning, Broadband, Ad Targeting, Mobility, Digital Media • 3rd Party Feeds & Customer Tools: Data Warehouses • Centralized Compute & Analyze: Clustering & Classifying, Master Fusion, Master Aggregation, Business Logic, Machine Learning • Distributed Sites 1, 2 and 3, each with: Aggregation, Data Fusion, Streaming / Batch Ingest, Local Data Store • Data Sources: Media Type Meta, Service Consumption Traffic, DPI Data, PDN Flows, Flow & Routing Data, AAA Data, Web Activity, Web Taxonomy, Advertising Traffic
  • 124. Guavus Analytics Platform Details [architecture diagram; labels as recoverable] • Guavus Applications: Mobility Reflex, IP Reflex, CDN Reflex, Ad Reflex • Customer UI Portals: Consumer, Enterprise • Insight Discovery: Reporting & POC Sandbox • 3rd Party System Support: Network Management, Field Inventory, etc. • Interfaces: Guavus External API, Reporting API, Cube API, HBASE API, SQL, SQL/Hive, Ingest, Export • Data Stores (IT, DWH, Cloud); Traditional ETL Layers • Processing Pipeline: Caching Compute Nodes, XDR, Guavus Stream Analysis (Bus Cubes, Machine Learning, Caching), Store • Central Compute (Fusion, Aggregation & Compute), Data Store • Distributed Data Collectors • Streaming Data Feeds: DPI, PCMD, IPDR, NetFlow, RADIUS, DNS, PM / FM, CRM, Inventory
  • 125. Matrixx. Parallel-MATRIXX™ • Parallel-MATRIXX™ technology has completely re-invented transactional real-time processing and eliminated the limitations of the contemporary technologies described earlier. • The next slide identifies the Parallel-MATRIXX™ functional architecture, which is based on multiple patented technologies and offers a performance improvement of at least two orders of magnitude relative to legacy approaches.
  • 126. [Figure: Parallel-MATRIXX™ functional architecture]
  • 127. Matrixx. Algebraic-Decision Engine • OCS raters can be broadly classified as rule- or data driven. o The former offer great flexibility to configure rating scenarios of arbitrary sophistication but which can become challenging to maintain beyond a certain complexity. o Data driven systems typically offer a rich catalog of off-the-shelf templates that are easily configured to create real offers. • These templates are “baked” into code so performance can be highly optimized. The challenge with this approach arises when no suitable template is available, often requiring complex and costly customization. • With respect to real-time performance, both approaches share a common weakness. Every transaction results in execution of conditional logic reflecting the rating discriminators (if weekend, and if URL is On-net, and if…). • As rating, or indeed policy, rules become more sophisticated, execution code paths extend and performance degrades – often unpredictably. © 2012 Alan Quayle Business and Service Development 127
  • 128. Matrixx. Algebraic-Decision Engine • The Parallel-MATRIXX™ Algebraic-Decision engine eliminates this degradation by building on the simple principle that any pricing concept can be represented as a set of mathematical equations. • Modern CPUs capable of 200 million multiplications per second are exceptionally efficient at solving such equations. • Pricing plans, offers, and policies are configured via a GUI and transparently compiled into an n-dimensional matrix where each dimension corresponds to a rating normalizer (such as time, location, service, etc.). • Stored at each matrix “intersection” is a linear equation representing the rating formula to be applied. As each transaction is mapped to the relevant intersection, solution of the associated linear equation is extremely fast. • As offers are extended with additional normalizers (for example, adding a device dependency to offer lower rates for a promoted device), the matrix dimensionality is extended accordingly. This simply results in a few additional CPU cycles to solve the rate equation with no significant impact on latency. © 2012 Alan Quayle Business and Service Development 128
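The matrix-of-equations idea described above can be sketched with a dictionary keyed by normalizer values, where each intersection stores the coefficients of a linear rating equation. This is only an illustration of the general principle and not the Parallel-MATRIXX™ implementation. The normalizers, coefficient values, transaction fields and names (`RATE_MATRIX`, `normalize`, `rate`) are all invented for the example.

```python
# Each intersection holds (rate_per_unit, fixed_fee) for: charge = rate * units + fee.
# Dimensions here are (time band, network zone); the values are made up.
RATE_MATRIX = {
    ("weekday", "on_net"): (0.25, 0.0),
    ("weekday", "off_net"): (0.5, 1.0),
    ("weekend", "on_net"): (0.125, 0.0),
    ("weekend", "off_net"): (0.25, 1.0),
}


def normalize(transaction):
    """Map a transaction onto matrix coordinates (the rating normalizers)."""
    time_band = "weekend" if transaction["day"] in ("Sat", "Sun") else "weekday"
    zone = "on_net" if transaction["on_net"] else "off_net"
    return (time_band, zone)


def rate(transaction, matrix=RATE_MATRIX):
    """Look up the intersection and solve its linear equation."""
    rate_per_unit, fixed_fee = matrix[normalize(transaction)]
    return rate_per_unit * transaction["units"] + fixed_fee
```

Adding a normalizer such as device type would lengthen the key tuple and add entries to the table, while `rate` would still perform one lookup and one multiply-add. The normalization step still inspects the raw fields, so the gain in this sketch is that rating cost no longer grows with the number of pricing rules.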
  • 129. Contention-Free In-Memory Database and Parallel- MATRIXX™ Processing • Maintaining data and transaction integrity is a mission-critical requirement for any database containing CSP customer or financial data. For example, an attempt to transfer funds between two customers must complete successfully or be cleanly aborted. • A situation where the donor’s account is debited but some technical failure results in the recipient not receiving the funds would leave the database in an invalid state. • As described earlier, current real-time systems rely heavily on OLTP and locking techniques to assure data integrity but which can lead to rapidly degrading and unpredictable performance. • Parallel-MATRIXX™ technology is based on an in-memory database that does not utilize locking while still supporting full ACID-compliant transactions. • No transaction is ever blocked from accessing or updating data while newly developed algorithms detect and resolve transaction conflicts. © 2012 Alan Quayle Business and Service Development 129
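One well-known way to avoid locks while still detecting conflicting transactions is optimistic concurrency control with version checks. The sketch below shows that general technique applied to the funds-transfer example. It is not a description of the Parallel-MATRIXX™ algorithms, which the slide does not disclose, and the `Store`, `Conflict` and `transfer` names are hypothetical. The sketch is single-threaded; a real system would have to make the validate-and-write step in `commit` atomic.

```python
class Conflict(Exception):
    """Raised when another transaction changed an account we read."""


class Store:
    def __init__(self, balances):
        # account name -> (version, balance)
        self._rows = {name: (0, amount) for name, amount in balances.items()}

    def read(self, name):
        return self._rows[name]

    def commit(self, reads, writes):
        """Apply all writes only if every version we read is still current."""
        for name, version in reads.items():
            if self._rows[name][0] != version:
                raise Conflict(name)  # nothing has been written yet
        for name, balance in writes.items():
            self._rows[name] = (self._rows[name][0] + 1, balance)


def transfer(store, donor, recipient, amount, max_retries=3):
    """Move funds between two accounts; return True on success, False if aborted."""
    if donor == recipient:
        raise ValueError("donor and recipient must differ")
    for _ in range(max_retries):
        d_ver, d_bal = store.read(donor)
        r_ver, r_bal = store.read(recipient)
        if d_bal < amount:
            return False  # cleanly aborted, nothing written
        try:
            store.commit(
                {donor: d_ver, recipient: r_ver},
                {donor: d_bal - amount, recipient: r_bal + amount},
            )
            return True
        except Conflict:
            continue  # someone else committed first: re-read and retry
    return False
```

Because validation happens before any write, a transfer either updates both accounts or neither, which is the all-or-nothing property the slide describes for the donor and recipient balances.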
  • 131. Case Studies Understanding where big data is used in practice © 2012 Alan Quayle Business and Service Development 131
  • 138. Global Enterprise and Telecom Survey on Big Data and Real-Time Analytics © 2012 Alan Quayle Business and Service Development 138
  • 139. Structure • Background • The Questions • The Importance of Analytics • Impact of Big Data on Analytics • Size of Data Sets, Number of Data Sources • Update Frequency • Integration of Data Sources • Data Set Responsibility • Types of Data, Types of Processing and Analytics • Challenges • Big Data Analytics Platforms • Benefits and Plans • Data Analytics Storage and IT Infrastructure Requirements • Increasing Interest in Hadoop MapReduce Framework Technology • Conclusions © 2012 Alan Quayle Business and Service Development 139
  • 140. Background • Global Survey • Across 200 business and IT executives, questioned in August and September 2012 • 105 enterprise (non Telco), 55 Telco – all large enterprises (no mid-market analysis) • Non-Telco included web service providers, financial services, healthcare, manufacturing, retail, education, government, military, entertainment verticals • Generally VP level with a few CxO level, all decision makers with budget responsibilities • Generally known to me, or through my contacts as I was trying to gather frank reviews • Surprisingly similar across Telco and non-Telco © 2012 Alan Quayle Business and Service Development 140
  • 141. [Chart: Importance of Enhancing Data Processing and Analytics versus all Business Priorities. Categories: Most Important, Top 5, Top 10, Top 20, Not Important. Values shown: 39%, 31%, 20%, 9%, 1%]
  • 142. Impact of Big Data on Analytics • There is much market hype surrounding the term big data. When asked what the term means to them, a majority of respondents indicated that it simply refers to very large data sets, see next slide. • The big data movement born from the Hadoop open source initiative has not reached most IT departments or even analytics professionals, as evidenced by the fact that only 11% of survey respondents associate Hadoop MapReduce with the concept of big data. • Most organizations’ analytics efforts to date have dealt with structured data, sourced through relational databases and data warehouses, and for the vast majority of analytical undertakings this makes sense. • But even organizations that have not been captured by the Hadoop movement are still increasingly under the gun to deal with larger data volumes, and the incursion of unstructured data. This, plus the many public examples of big data that have caught the imagination of business executives, have reinvigorated interest in data analytics. © 2012 Alan Quayle Business and Service Development 142
  • 143. [Chart: What does the term Big Data mean to you? Categories: Hadoop / MapReduce, Web and search engine data, Problems in storing / processing data, Data Analytics, Data Warehouses, Very large databases, Very large data sets. Axis: 0% to 80%]
  • 144. Size of Data Sets • The majority (66%) of respondents revealed that the size of the largest data set on which their organization conducts analytics is no more than 5 terabytes (TB). • Overall, the largest data analytics set is approximately 10 TB. • While these numbers might not reflect the expectations that often accompany the concept of big data, the reality is that processing even gigabytes of data at a time during traditional analytics exercises is significant. © 2012 Alan Quayle Business and Service Development 144
  • 145. [Chart: What is the Largest Data Set? Size bands: <250GB, <500GB, <1TB, <5TB, <10TB, <25TB, <50TB, >50TB. Values shown: 32%, 20%, 19%, 11%, 9%, 5%, 3%, 1%]
  • 146. Number of Data Sources • A significant part of data analytics exercises is the amalgamation of data from multiple disparate sources. • The next slide shows that 57% of these organizations are pulling from at least three unique data sources, and one-quarter (25%) are integrating data from five or more sources.
  • 147. [Chart: Number of Data Sources. Categories: Single Source, 2, 3, 4, 5, >5. Values shown: 25%, 21%, 17%, 16%, 12%, 9%]
  • 148. Update Frequency • Many organizations identified improving business intelligence and/or delivery of real-time business information as a key business initiative that will have an impact on IT spending decisions. • Considering the volumes of data organizations intend to analyze in shorter timeframes, organizations will need to evaluate whether their current approaches are adaptable to these demanding and constantly changing requirements. As part of the same spending survey, organizations also identified major application deployments or upgrades as a top IT priority, which is significant since every newly deployed or upgraded application will have a corresponding impact on existing data integration processes. • When asked about the rate with which their largest data set data is updated, nearly two thirds (65%) of organizations revealed that the changes take place at an either real-time or near real-time pace. © 2012 Alan Quayle Business and Service Development 148
  • 149. [Chart: Frequency of Update. Categories: Realtime (streams), Near realtime, Batch. Values shown: 37%, 35%, 28%]
  • 150. Integration of Data Sources • When asked about the primary method to integrate data sources comprising their organization’s largest data sets, nearly two fifths (39%) of respondents identified purpose-built applications such as Informatica, Oracle, and Teradata. • An additional 30% use custom extract, transform, load (ETL) scripts or custom extract, load, transform (ELT) scripts for data source integration purposes.
  • 151. [Chart: Main Method of Integrating Data Sources. Categories: Purpose built, Custom ETL, EAI, Open Source, Other. Values shown: 39%, 30%, 12%, 10%, 9%]
  • 152. Data Set Responsibility • In terms of the sources responsible for populating organizations’ largest data sets, about half (51%) of respondents identified back office applications, such as resource planning, human capital management, and accounting systems. o For example, many years of order or payment information can yield useful insight into customer patterns. • Another common source is the information gleaned from corporate data centers and computer networks in the form of network traffic and system log files. This information is important not only to organizations looking to maximize network and system performance and utilization metrics, but also to those that rely on security analytics to help shape information privacy and information protection strategies. • Enterprise organizations were significantly more likely to identify internal back and front office applications, internal data center or computer networks, e-commerce applications (e.g., point-of-sale, supply chain), and scientific research as data sources that comprise their largest data sets.
  • 153. Responsible for Populating Data Set Scientific research 7% Third Party 10% External public data 11% Telemetry 10% Social media 12% Web Applications 34% Front office 35% Internal data center 45% Internal back-office 51% © 2012 Alan Quayle Business and Service Development 153
  • 154. Types of Data • What data types end up in organizations’ largest data sets from the aforementioned sources? More than half (52%) of respondents indicated that their largest data set is comprised of database data. • Nearly half (48%) of organizations have some measure of transactional data— such as point-of-sale (POS) or inventory—residing in their largest data set. • What is interesting is the number of organizations that report that unstructured data—especially machine-generated content such as log files and sensor data— populates their largest data sets. These data types precipitated the concept of big data and there are emerging signs that these will consume a vast amount of bandwidth, compute, and storage resources. Probably the most significant takeaway is that big data becomes really big when an organization starts to see unstructured / machine-generated data grow to the size of—or even surpass— relational information, which will serve to further exacerbate the integration challenges mentioned above. © 2012 Alan Quayle Business and Service Development 154
  • 155. Source of Data Sensor data 9% Audio / video 11% Web log files 16% Location data 18% Text / messages 19% Log files 22% Office documents 30% Transaction database 48% Relational database 52% © 2012 Alan Quayle Business and Service Development 155
  • 156. Challenges • When asked to identify the data processing and/or analytics challenges associated with their organization’s largest data set, nearly half cited security / regulation / compliance. • Personally identifiable information (PII) and other sensitive information is what drives this. • About one third of respondents identified data quality (35%) and data cleansing tasks (33%); data cleansing and preparation was categorized as the most time-consuming data processing and analytics activity. • Lack of skills is a middle-of-the-pack challenge according to respondents. • Clearly, responses involving process-related considerations (e.g., data security, integration, cleansing) gravitated to the top of the challenges list.
  • 157. Data Processing Challenges • Lack of skills: 17% • Costs: 18% • Data synchronization: 19% • Business expectations: 25% • Data integration: 29% • Cleansing: 32% • Data quality: 35% • Security / Regulation / Compliance: 48%
  • 158. Benefits • Cost containment is still an important business initiative for many organizations, especially when it comes to IT investments. • More than half (55%) of respondents identified reduced costs as a key benefit associated with their data analytics platform. • Other top benefits centered on simplicity and efficiency, including easier management and process improvements, as well as improved business agility, which is particularly significant since business requirements are constantly changing when it comes to data analytics.
  • 159. Benefits from Data Analytics Platform • Fraud detection: 21% • Event monitoring: 25% • Better accuracy: 32% • Business agility: 33% • Process improvement: 37% • Cost reduction: 55%
  • 160. Conclusions and Recommendations
  • 161. Recommendations to the Big Data Buyer • Recognize the value of unified information access and analysis in supporting fact-based decisions by individuals, groups, and systems. • Recognize the shortcomings of operating without having the right information at the right time. Use this awareness to help build the business case for addressing those shortcomings. Find an anchor tenant for the project. NO ENTERPRISE WIDE PLATFORM PROJECTS YET, LOOK TO THE CLOUD. • Formulate a Big Data strategy that includes evaluation of decision makers’ requirements, decision processes, existing and new technology, and availability and quality of data. NOT TECHNOLOGY LED. • The application of Big Data technology will fall into two primary categories: o Doing more efficiently (including at lower cost) tasks that have been done for years. o Doing completely new things that were never before possible, driving up long-term strategic organizational value. • Identify opportunities to apply Big Data to both.
  • 162. Recommendations to the Big Data Buyer • Beware of the confusion and hyperbolic marketing in the Big Data market today. WE ARE AT PEAK BS. • IT organizations will need to consider a coordinated approach to planning implementations when more than one project exists. • It is important to develop an IT infrastructure strategy that optimizes the server, storage, and network resources. Well-developed plans for networking support of Big Data projects should address optimizing the network both within a Big Data domain and in the connection to traditional enterprise infrastructure. LEGACY MATTERS. • Consider the breadth of Big Data technologies and the functionality each technology brings to the overall portfolio of tools for collecting, accessing, analyzing, monitoring, and managing data.
  • 163. Recommendations to the Big Data Vendor • Revenue opportunities exist at all levels of the Big Data technology stack as well as in services. Service is where the bulk of the growth exists. • Articulate your value proposition by connecting technology capabilities to business problems or opportunities. NOT TECHNOLOGY LED. • Big Data technology is not an end in itself. NOT TECHNOLOGY LED. • Recognize the value of Big Data to drive employee and customer decisions and actions. • Decide if you want to be a niche player or enter the mainstream. o If the former, then build a network of consultants and partners to support your technology. o If the latter, then build a business case that assumes eventual acquisition. • The growth in appliances, cloud, and outsourcing deals for Big Data technology will likely mean that end users will choose new applications and services based less on the technology itself and more on the business value they deliver.
  • 164. Recommendations to the Big Data Vendor • Whether the application is based on a database or is search based, and whether the database is row based or column based, is in-memory or disk based, or uses SQL or NoSQL technologies will become less relevant over time. Thus, technology will provide only a short-lived competitive advantage to any vendor. • System performance, availability, security, and manageability will all matter greatly. However, how they are achieved will be less of a point for differentiation. • HPC vendors have an edge in Big Data because leading-edge data-intensive computing has been an integral part of HPC for decades. • Most HPC Big Data work involves established methods of analyzing increasingly large data volumes related to numerical modeling and simulation.
  • 165. Recommendations to the Big Data Vendor • Vendors should tout, not hide, their HPC histories. A number of vendors with HPC origins and strong HPC reputations have not capitalized on these assets when attempting to address Big Data markets outside of HPC. • It is better to position your high-end HPC experience as a strength for meeting the presumably less-difficult, data-intensive challenges in the mainstream market. • Useful tools are largely lacking for very large data sets. Tools such as Hadoop and MapReduce can effectively expedite searches through the large, irregular data sets that characterize some of the newer Big Data problems. • These tools can be great for retrieving and moving through complex data, but they do not allow researchers to take the next step and pose intelligent questions. In addition, the going gets tough when data sets cross the 100TB threshold.
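For readers unfamiliar with the MapReduce pattern named on slide 165, the following is a minimal, single-process sketch of the idea in plain Python. It is not Hadoop code and does not use any Hadoop API; the log format, the `SAMPLE_LOGS` records, and all function names are assumptions made up for illustration. It shows a keyword search over irregular log lines expressed as a map step (emit a key and a count for each match), a shuffle step (group by key), and a reduce step (sum per key).

```python
from collections import defaultdict


def map_phase(record, keyword):
    """Map step: emit (source, 1) for each record whose message has the keyword."""
    # Assumed record layout: "<source-id> <free-text message>".
    source, _, message = record.partition(" ")
    if keyword in message:
        yield source, 1


def shuffle(pairs):
    """Shuffle step: group emitted values by key, as a framework would do."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce step: collapse each key's values into a single count."""
    return {key: sum(values) for key, values in grouped.items()}


def search_counts(records, keyword):
    """Run map, shuffle, and reduce in sequence on an in-memory data set."""
    pairs = (pair for record in records for pair in map_phase(record, keyword))
    return reduce_phase(shuffle(pairs))


# Hypothetical sample data, invented for this sketch.
SAMPLE_LOGS = [
    "cell-17 dropped call after handover timeout",
    "cell-17 call setup ok",
    "cell-42 dropped call, radio link failure",
    "cell-17 dropped call after handover timeout",
]

if __name__ == "__main__":
    print(search_counts(SAMPLE_LOGS, "dropped"))
```

In a real cluster the map and reduce steps run in parallel on many machines and the framework handles the shuffle; the sketch only illustrates why such searches parallelize well, and why, as the slide notes, it retrieves matches without helping anyone pose the next, more intelligent question.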
  • 166. Recommendations to the Big Data Vendor • Sophisticated tools for data integration and analysis on this scale are largely lacking today. There are opportunities to create tools and applications for Big Data. Vendors that create tools and applications for use at this scale can use them as a lever to seize market leadership positions in the Big Data market. • Not all Big Data use cases involve analytics. Analytics may be at the heart of most Big Data opportunities in the enterprise market, but there are also opportunities to support operational workloads and information access applications. • Some of the emerging technologies and the vendors behind them will likely end up as components or features of broader information management, access, and analysis platforms of larger vendors. Specialized application and service providers with localized and industry expertise will be critical to expanding the market.
  • 167. Walk or Run to Big Data? It depends on your situation. For most telcos the move to Big Data will be incremental and complementary to existing platforms and investments. Focus on the solution: the application of analytics to the business. People and process, not technology.