What is Hadoop, and When Should I Consider Using It?

Houston HUG
June 6th, 2011
Vikram Oberoi, Cloudera

Copyright 2011 Cloudera Inc. All rights reserved
  
About me

•  Data engineer at Cloudera, present
    •  Using data and Hadoop to enable more responsive support
•  Data engineer at Meebo, Aug ’09 – Nov ’10
    •  Data infrastructure, analytics
•  CS at Stanford, ’09
    •  Senior project: ext3 and XFS under Hadoop MapReduce workloads
•  Data engineer at Meebo, ’08
    •  Built an A/B testing system
•  SDE Intern at Amazon, ’07
    •  R&D on item-to-item similarities
  

What will I talk about?

•  What is Hadoop?
•  Typical Hadoop-able problems and use cases
•  Cloudera overview
  




What is Hadoop?
  




Big Data Problem: Exploding Data Volumes

•  Online
    •  Web-ready devices
    •  Social media
    •  Digital content
•  Enterprise
    •  Transactions
    •  R&D data
    •  Operational (control) data
•  Open data initiatives

[Chart labels: Relational; Complex, Unstructured]

•  2,500 exabytes of new information in 2012 with Internet as primary driver
•  Digital universe grew by 62% last year to 800K petabytes and will grow to 1.2 “zettabytes” this year

Source: An IDC White Paper - sponsored by EMC. As the Economy Contracts, the Digital Universe Expands. May 2009.
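The two IDC figures quoted above are easier to compare once they are in the same unit. A minimal sketch of that arithmetic, assuming decimal (SI) prefixes, so 1 zettabyte = 1,000,000 petabytes; the variable names are illustrative, and the derived values are only as precise as the rounded numbers on the slide:

```python
# Unit arithmetic for the IDC figures quoted on the slide.
# Assumption: decimal (SI) prefixes, i.e. 1 ZB = 1,000 EB = 1,000,000 PB.
PB_PER_ZB = 1_000_000

last_year_pb = 800_000      # "800K petabytes"
this_year_zb = 1.2          # "1.2 zettabytes"
last_year_growth = 0.62     # "grew by 62% last year"

# Express last year's size in zettabytes so both figures share a unit.
last_year_zb = last_year_pb / PB_PER_ZB

# Growth implied by going from last year's size to this year's projection.
projected_growth = this_year_zb / last_year_zb - 1

# Size implied for the year before the 62% growth.
year_before_pb = last_year_pb / (1 + last_year_growth)

print(f"last year: {last_year_zb:.1f} ZB")
print(f"implied growth this year: {projected_growth:.0%}")
print(f"implied size before the 62% growth: {year_before_pb:,.0f} PB")
```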
Big Data Problem: Data Economics

•  Return on Byte (ROB) = value to be extracted from that byte / cost of storing that byte
•  If ROB is < 1, the data is buried in a tape wasteland, so we need cheaper active storage.

[Chart labels: High ROB; Low ROB]
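The Return on Byte rule can be written out as a small calculation. This is a minimal sketch with hypothetical dollar figures (the slide gives none); `return_on_byte` and `storage_tier` are illustrative names, not part of any Hadoop API:

```python
def return_on_byte(value_extracted: float, storage_cost: float) -> float:
    """ROB = value to be extracted from the data / cost of storing it."""
    if storage_cost <= 0:
        raise ValueError("storage_cost must be positive")
    return value_extracted / storage_cost


def storage_tier(rob: float) -> str:
    """Apply the slide's rule: data with ROB < 1 ends up archived to tape."""
    return "active storage" if rob >= 1 else "tape archive"


# Hypothetical numbers: the same dataset on expensive vs. cheaper storage.
value = 500.0  # assumed value extractable from the dataset, in dollars
expensive = return_on_byte(value, storage_cost=2000.0)
cheaper = return_on_byte(value, storage_cost=250.0)

print(expensive, storage_tier(expensive))
print(cheaper, storage_tier(cheaper))
```

With the value held fixed, lowering the storage cost is what moves the same data from below 1 to above 1, which is the slide's argument for cheaper active storage.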
  



Hadoop: A Data Platform with Unique Benefits

[Diagram labels: MapReduce; Hadoop Distributed File System (HDFS)]

•  Consolidates Everything
    •  Move complex and relational data into a single repository
•  Stores Inexpensively
    •  Keep raw data always available
    •  Use commodity hardware
•  Processes at the Source
    •  Eliminate ETL bottlenecks
    •  Mine data first, govern later
  	
  




Hadoop Distributed File System (HDFS)
“How is data stored?”

•  Based on the design of Google’s GFS
•  Data stored in large files
    •  Files can contain any data
•  Files separated into blocks
    •  64MB up to 256MB per block (tunable)
    •  Each block replicated across a cluster (tunable, usually 3 replicas across the cluster)
    •  This buys you: fault tolerance, parallelizable disk reads
•  Store whatever you want in it
    •  This buys you: flexibility
	
  
MapReduce
“How is data processed?”

•  Framework designed for parallel processing of large disk-bound batch jobs
•  Data processed at the source
    •  File ‘foo’ has 5 blocks, processing happens on 5 nodes
    •  Parallelized disk reads -> remove disk bottleneck
•  Way to express algorithms such that they are parallelizable
•  Two functions at the core of every job:
    •  Map function (group by)
    •  Reduce function (perform action on group)
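The map and reduce roles described above can be illustrated with a word count. This is a single-process, in-memory sketch in plain Python, not the Hadoop API; `map_fn`, `reduce_fn`, and `run_job` are illustrative names, and on a real cluster the framework would run the map tasks near the data blocks and do the grouping itself:

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_fn(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (key, value) pair per word; the key decides the group."""
    for word in line.lower().split():
        yield word, 1


def reduce_fn(key: str, values: Iterable[int]) -> Tuple[str, int]:
    """Reduce: perform an action on one group, here summing its values."""
    return key, sum(values)


def run_job(lines: Iterable[str]) -> Dict[str, int]:
    # Grouping step: collect all values emitted under the same key.
    groups: Dict[str, List[int]] = defaultdict(list)
    for line in lines:
        for key, value in map_fn(line):
            groups[key].append(value)
    # Each group is reduced independently of the others,
    # which is what makes the job parallelizable.
    return dict(reduce_fn(key, values) for key, values in groups.items())


counts = run_job(["hadoop stores data", "hadoop processes data at the source"])
print(counts)
```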
  

What is Hadoop?

•  A scalable fault-tolerant distributed system for data storage and processing (open source under the Apache license)

•  Scalable data processing engine
    •  Hadoop Distributed File System (HDFS): self-healing high-bandwidth clustered storage
    •  MapReduce: fault-tolerant distributed processing

•  Key value
    •  Flexible -> store data without a schema and add it later as needed
    •  Affordable -> cost / TB at a fraction of traditional options
    •  Broadly adopted -> a large and active ecosystem
    •  Proven at scale -> dozens of petabyte + implementations in production today
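The "store data without a schema and add it later" point is often called schema-on-read. A minimal sketch of the idea in plain Python, assuming hypothetical JSON log lines; `read_with_schema` and the field names are made up for illustration and are not a Hadoop API:

```python
import json
from typing import Dict, Iterator, List

# Raw records kept exactly as they arrived; no schema was declared up front.
# The field names and values are hypothetical.
RAW_LINES: List[str] = [
    '{"user": "alice", "action": "login", "ms": 120}',
    '{"user": "bob", "action": "search", "query": "hadoop"}',
    'not json at all',
]


def read_with_schema(lines: List[str], fields: List[str]) -> Iterator[Dict[str, object]]:
    """Apply a schema at read time: keep only `fields`, filling gaps with None."""
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # unparseable lines stay in storage; this reader skips them
        if isinstance(record, dict):
            yield {name: record.get(name) for name in fields}


# Two different schemas applied to the same stored lines.
logins = list(read_with_schema(RAW_LINES, ["user", "action"]))
timings = list(read_with_schema(RAW_LINES, ["user", "ms"]))
print(logins)
print(timings)
```

The stored data never changes; each reader decides which fields matter when it runs, so a new question does not require reloading the data.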
  
Cloudera’s Distribution Including Apache Hadoop
The Industry’s Leading Hadoop Distribution

[Diagram labels: Hue, Hue SDK, Oozie, Hive, Pig/Hive, Flume, Sqoop, HBase, Zookeeper]

•  Open source – 100% Apache licensed and free for download
•  Simplified – Component versions & dependencies managed for you
•  Integrated – All components & functions interoperate through standard APIs
•  Reliable – Patched with fixes from future releases to improve stability
•  Supported – Employs project founders and committers for >90% of components
  
Typical Hadoop-able problems
  




What is common across Hadoop-able problems?

Nature of the data
•  Complex data
•  Multiple data sources
•  Lots of it

Nature of the analysis
•  Batch processing
•  Parallelizable
  



What kinds of analyses are possible with Hadoop?

•  Text mining
•  Index building
•  Graph creation and analysis
•  Pattern recognition
•  Collaborative filtering
•  Prediction models
•  Sentiment analysis
•  Risk assessment
  
                                                                       	
  




Top 10 Hadoop-able Problems
See archived webinar on cloudera.com

1.  Modeling True Risk
2.  Customer Churn Analysis
3.  Recommendation Engines
4.  Ad Targeting
5.  Point of Sale Transaction Analysis
6.  Analyzing Network Data to Predict Failure
7.  Threat Analysis/Fraud Detection
8.  Trade Surveillance
9.  Search Quality
10.  Data “Sandbox”
  


Example: Modeling True Risk

Solution with Hadoop
•  Source, parse and aggregate disparate data sources to build comprehensive data picture
    •  e.g. credit card records, call recordings, chat sessions, emails, banking activity
•  Structure and analyze
    •  Sentiment analysis, graph creation, pattern recognition

Typical Industry
•  Financial Services (Banks, Insurance)
  	
  
Example: Threat Analysis

Solution with Hadoop
•  Parallel processing over huge datasets
•  Pattern recognition to identify anomalies, i.e. threats

Typical Industry
•  Security
•  Financial Services
•  General: spam fighting, click fraud
  	
  
Example: Recommendation Engine

Solution with Hadoop
•  Batch processing framework
    •  Allow execution in parallel over large datasets
•  Collaborative filtering
    •  Collecting ‘taste’ information from many users
    •  Utilizing information to predict what similar users like

Typical Industry
•  Ecommerce, Manufacturing, Retail
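One simple form of the collaborative filtering described above is item co-occurrence counting: items liked by the same users are treated as related. The sketch below runs in memory on hypothetical 'taste' data; `co_occurrence` and `recommend` are illustrative names, and this is a deliberately simplified stand-in, not the method of any particular product. In a MapReduce setting, emitting the item pairs per user would be the map step and summing the counts the reduce step.

```python
from collections import Counter, defaultdict
from itertools import permutations
from typing import Dict, List, Set

# Hypothetical 'taste' data: which items each user liked.
LIKES: Dict[str, Set[str]] = {
    "ann": {"book", "lamp", "desk"},
    "bob": {"book", "lamp"},
    "cat": {"book", "desk", "chair"},
    "dan": {"lamp", "chair"},
}


def co_occurrence(likes: Dict[str, Set[str]]) -> Dict[str, Counter]:
    """Count, for each item, how often every other item is liked by the same user."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for items in likes.values():
        for a, b in permutations(sorted(items), 2):
            counts[a][b] += 1
    return counts


def recommend(user: str, likes: Dict[str, Set[str]], top_n: int = 2) -> List[str]:
    """Score items the user has not liked by co-occurrence with the ones they have."""
    counts = co_occurrence(likes)
    scores: Counter = Counter()
    for item in likes[user]:
        for other, n in counts[item].items():
            if other not in likes[user]:
                scores[other] += n
    # Sort by score, then by name, so ties are broken deterministically.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_n]]


print(recommend("bob", LIKES))
```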
  	
  
Example: Analyzing Network Data to Predict Failure

Solution with Hadoop
•  Take the computation to the data
    •  Expand the range of indexing techniques from simple scans to more complex data mining
•  Better understand how the network reacts to fluctuations
    •  How previously thought discrete anomalies may, in fact, be interconnected
•  Identify leading indicators of component failure

Typical Industry
•  Utilities, Telecommunications, Data Centers
  	
  
Example: Supporting Hadoop at Cloudera

•  Collect data from customer clusters
    •  OS configs, Hadoop configs, command outputs, logs
    •  Data served by HBase, used by supporters
•  Consolidate data about Hadoop in HDFS
    •  Mailing lists, issue trackers, wiki pages, IRC, books
    •  Customer cluster data
•  Analyze many data sources to understand Hadoop issues and deployments
    •  Build tools to enable easier diagnosis or proactive support
  


Cloudera overview
  




Cloudera Offerings
Enabling the Enterprise Adoption of Apache Hadoop

•  PLATFORM
•  SUPPORT & APPLICATIONS
•  PROFESSIONAL SERVICES
•  TRAINING
  




Contact/Resources/Questions

•  vikram@cloudera.com
•  irc.freenode.net #cloudera #hadoop
•  @cloudera

•  Cloudera Groups: http://groups.cloudera.org
•  Hadoop: The Definitive Guide
•  10 Hadoop-able problems on Slideshare

•  Questions? (P.S. We’re hiring SAs in Houston!)
  


More Related Content

What's hot

Hadoop - Where did it come from and what's next? (Pasadena Sept 2014)
Hadoop - Where did it come from and what's next? (Pasadena Sept 2014)Hadoop - Where did it come from and what's next? (Pasadena Sept 2014)
Hadoop - Where did it come from and what's next? (Pasadena Sept 2014)Eric Baldeschwieler
 
Petabyte scale on commodity infrastructure
Petabyte scale on commodity infrastructurePetabyte scale on commodity infrastructure
Petabyte scale on commodity infrastructureelliando dias
 
Introduction to Hadoop - ACCU2010
Introduction to Hadoop - ACCU2010Introduction to Hadoop - ACCU2010
Introduction to Hadoop - ACCU2010Gavin Heavyside
 
Apachecon Euro 2012: Elastic, Multi-tenant Hadoop on Demand
Apachecon Euro 2012: Elastic, Multi-tenant Hadoop on DemandApachecon Euro 2012: Elastic, Multi-tenant Hadoop on Demand
Apachecon Euro 2012: Elastic, Multi-tenant Hadoop on DemandRichard McDougall
 
Cloud Austin Meetup - Hadoop like a champion
Cloud Austin Meetup - Hadoop like a championCloud Austin Meetup - Hadoop like a champion
Cloud Austin Meetup - Hadoop like a championAmeet Paranjape
 
Apache Hadoop Now Next and Beyond
Apache Hadoop Now Next and BeyondApache Hadoop Now Next and Beyond
Apache Hadoop Now Next and BeyondDataWorks Summit
 
Stinger.Next by Alan Gates of Hortonworks
Stinger.Next by Alan Gates of HortonworksStinger.Next by Alan Gates of Hortonworks
Stinger.Next by Alan Gates of HortonworksData Con LA
 
Searching conversations with hadoop
Searching conversations with hadoopSearching conversations with hadoop
Searching conversations with hadoopDataWorks Summit
 
The Search Is Over: Integrating Solr and Hadoop in the Same Cluster to Simpli...
The Search Is Over: Integrating Solr and Hadoop in the Same Cluster to Simpli...The Search Is Over: Integrating Solr and Hadoop in the Same Cluster to Simpli...
The Search Is Over: Integrating Solr and Hadoop in the Same Cluster to Simpli...lucenerevolution
 
Big data on virtualized infrastucture
Big data on virtualized infrastuctureBig data on virtualized infrastucture
Big data on virtualized infrastuctureDataWorks Summit
 
Deploying Grid Services Using Hadoop
Deploying Grid Services Using HadoopDeploying Grid Services Using Hadoop
Deploying Grid Services Using HadoopGeorge Ang
 
Integrating Hadoop Into the Enterprise – Hadoop Summit 2012
Integrating Hadoop Into the Enterprise – Hadoop Summit 2012Integrating Hadoop Into the Enterprise – Hadoop Summit 2012
Integrating Hadoop Into the Enterprise – Hadoop Summit 2012Jonathan Seidman
 
Hadoop World 2011: Building Scalable Data Platforms ; Hadoop & Netezza Deploy...
Hadoop World 2011: Building Scalable Data Platforms ; Hadoop & Netezza Deploy...Hadoop World 2011: Building Scalable Data Platforms ; Hadoop & Netezza Deploy...
Hadoop World 2011: Building Scalable Data Platforms ; Hadoop & Netezza Deploy...Krishnan Parasuraman
 
Hortonworks: Agile Analytics Applications
Hortonworks: Agile Analytics ApplicationsHortonworks: Agile Analytics Applications
Hortonworks: Agile Analytics Applicationsrussell_jurney
 

What's hot (20)

Hadoop - Where did it come from and what's next? (Pasadena Sept 2014)
Hadoop - Where did it come from and what's next? (Pasadena Sept 2014)Hadoop - Where did it come from and what's next? (Pasadena Sept 2014)
Hadoop - Where did it come from and what's next? (Pasadena Sept 2014)
 
Petabyte scale on commodity infrastructure
Petabyte scale on commodity infrastructurePetabyte scale on commodity infrastructure
Petabyte scale on commodity infrastructure
 
Cloud computing era
Cloud computing eraCloud computing era
Cloud computing era
 
Hadoop on VMware
Hadoop on VMwareHadoop on VMware
Hadoop on VMware
 
Introduction to Hadoop - ACCU2010
Introduction to Hadoop - ACCU2010Introduction to Hadoop - ACCU2010
Introduction to Hadoop - ACCU2010
 
Introduction to h base
Introduction to h baseIntroduction to h base
Introduction to h base
 
Apachecon Euro 2012: Elastic, Multi-tenant Hadoop on Demand
Apachecon Euro 2012: Elastic, Multi-tenant Hadoop on DemandApachecon Euro 2012: Elastic, Multi-tenant Hadoop on Demand
Apachecon Euro 2012: Elastic, Multi-tenant Hadoop on Demand
 
Cloud Austin Meetup - Hadoop like a champion
Cloud Austin Meetup - Hadoop like a championCloud Austin Meetup - Hadoop like a champion
Cloud Austin Meetup - Hadoop like a champion
 
Apache Hadoop Now Next and Beyond
Apache Hadoop Now Next and BeyondApache Hadoop Now Next and Beyond
Apache Hadoop Now Next and Beyond
 
Drill njhug -19 feb2013
Drill njhug -19 feb2013Drill njhug -19 feb2013
Drill njhug -19 feb2013
 
Stinger.Next by Alan Gates of Hortonworks
Stinger.Next by Alan Gates of HortonworksStinger.Next by Alan Gates of Hortonworks
Stinger.Next by Alan Gates of Hortonworks
 
Searching conversations with hadoop
Searching conversations with hadoopSearching conversations with hadoop
Searching conversations with hadoop
 
The Search Is Over: Integrating Solr and Hadoop in the Same Cluster to Simpli...
The Search Is Over: Integrating Solr and Hadoop in the Same Cluster to Simpli...The Search Is Over: Integrating Solr and Hadoop in the Same Cluster to Simpli...
The Search Is Over: Integrating Solr and Hadoop in the Same Cluster to Simpli...
 
Big data on virtualized infrastucture
Big data on virtualized infrastuctureBig data on virtualized infrastucture
Big data on virtualized infrastucture
 
201305 hadoop jpl-v3
201305 hadoop jpl-v3201305 hadoop jpl-v3
201305 hadoop jpl-v3
 
Deploying Grid Services Using Hadoop
Deploying Grid Services Using HadoopDeploying Grid Services Using Hadoop
Deploying Grid Services Using Hadoop
 
Integrating Hadoop Into the Enterprise – Hadoop Summit 2012
Integrating Hadoop Into the Enterprise – Hadoop Summit 2012Integrating Hadoop Into the Enterprise – Hadoop Summit 2012
Integrating Hadoop Into the Enterprise – Hadoop Summit 2012
 
Security data deluge
Security data delugeSecurity data deluge
Security data deluge
 
Hadoop World 2011: Building Scalable Data Platforms ; Hadoop & Netezza Deploy...
Hadoop World 2011: Building Scalable Data Platforms ; Hadoop & Netezza Deploy...Hadoop World 2011: Building Scalable Data Platforms ; Hadoop & Netezza Deploy...
Hadoop World 2011: Building Scalable Data Platforms ; Hadoop & Netezza Deploy...
 
Hortonworks: Agile Analytics Applications
Hortonworks: Agile Analytics ApplicationsHortonworks: Agile Analytics Applications
Hortonworks: Agile Analytics Applications
 

Viewers also liked

Introduction to Social Media
Introduction to Social MediaIntroduction to Social Media
Introduction to Social MediaGerald Hensel
 
Big Data from Social Media and Crowdsourcing in Emergencies
Big Data from Social Media and Crowdsourcing in EmergenciesBig Data from Social Media and Crowdsourcing in Emergencies
Big Data from Social Media and Crowdsourcing in EmergenciesThomas Dybro Lundorf
 
Big Data and Social Media
Big Data and Social MediaBig Data and Social Media
Big Data and Social MediaAmy Shuen
 
#PolíticosViolentos, un análisis de la agresión en el discurso de Cristina Ki...
#PolíticosViolentos, un análisis de la agresión en el discurso de Cristina Ki...#PolíticosViolentos, un análisis de la agresión en el discurso de Cristina Ki...
#PolíticosViolentos, un análisis de la agresión en el discurso de Cristina Ki...Santiago Castelo
 
Klarity - Asia digital analytic summit
Klarity -  Asia digital analytic summitKlarity -  Asia digital analytic summit
Klarity - Asia digital analytic summitNDN Group
 
Product Placement: The Present & The Future
Product Placement: The Present & The FutureProduct Placement: The Present & The Future
Product Placement: The Present & The Futureitandlaw
 
Big Data Social Media & Smart Apps
Big Data Social Media & Smart AppsBig Data Social Media & Smart Apps
Big Data Social Media & Smart AppsGiacomo Nasilli
 
Big Data und Social Media
Big Data und Social MediaBig Data und Social Media
Big Data und Social MediaLukas Ott
 
Social Media, Big Data, and the Public Sphere
Social Media, Big Data, and the Public SphereSocial Media, Big Data, and the Public Sphere
Social Media, Big Data, and the Public SphereAxel Bruns
 
Making Sense of Twitter: New Research Methods in the Digital Humanities
Making Sense of Twitter: New Research Methods in the Digital HumanitiesMaking Sense of Twitter: New Research Methods in the Digital Humanities
Making Sense of Twitter: New Research Methods in the Digital HumanitiesAxel Bruns
 
Social Media, Big Data and Libraries. The Next Step
Social Media, Big Data and Libraries. The Next StepSocial Media, Big Data and Libraries. The Next Step
Social Media, Big Data and Libraries. The Next StepLorena Fernández
 
Big Data & Social Media / ChangeGroup
Big Data & Social Media / ChangeGroupBig Data & Social Media / ChangeGroup
Big Data & Social Media / ChangeGroupChangeGroup
 
Social Media, Big Data
Social Media, Big Data Social Media, Big Data
Social Media, Big Data robin fay
 
ESWC SS 2013 - Wednesday Tutorial Marko Grobelnik: Introduction to Big Data A...
ESWC SS 2013 - Wednesday Tutorial Marko Grobelnik: Introduction to Big Data A...ESWC SS 2013 - Wednesday Tutorial Marko Grobelnik: Introduction to Big Data A...
ESWC SS 2013 - Wednesday Tutorial Marko Grobelnik: Introduction to Big Data A...eswcsummerschool
 
Presentation big data and social media final_video
Presentation big data and social media final_videoPresentation big data and social media final_video
Presentation big data and social media final_videoramikaurraminder
 
Big Data, Social Media & Wine
Big Data, Social Media & WineBig Data, Social Media & Wine
Big Data, Social Media & WineMick Yates
 
Big data and social media, BAE Systems Detica
Big data and social media, BAE Systems DeticaBig data and social media, BAE Systems Detica
Big data and social media, BAE Systems DeticaInternet World
 
Insighty z social media - jak je wyciągnąć i dlaczego nie zawsze ma to sens?
Insighty z social media - jak je wyciągnąć i dlaczego nie zawsze ma to sens?Insighty z social media - jak je wyciągnąć i dlaczego nie zawsze ma to sens?
Insighty z social media - jak je wyciągnąć i dlaczego nie zawsze ma to sens?Sotrender
 
Informatica big data and social media
Informatica big data and social mediaInformatica big data and social media
Informatica big data and social mediaRamy Mahrous
 

Viewers also liked (20)

Social media & big data
Social media & big dataSocial media & big data
Social media & big data
 
Introduction to Social Media
Introduction to Social MediaIntroduction to Social Media
Introduction to Social Media
 
Big Data from Social Media and Crowdsourcing in Emergencies
Big Data from Social Media and Crowdsourcing in EmergenciesBig Data from Social Media and Crowdsourcing in Emergencies
Big Data from Social Media and Crowdsourcing in Emergencies
 
Big Data and Social Media
Big Data and Social MediaBig Data and Social Media
Big Data and Social Media
 
#PolíticosViolentos, un análisis de la agresión en el discurso de Cristina Ki...
#PolíticosViolentos, un análisis de la agresión en el discurso de Cristina Ki...#PolíticosViolentos, un análisis de la agresión en el discurso de Cristina Ki...
#PolíticosViolentos, un análisis de la agresión en el discurso de Cristina Ki...
 
Klarity - Asia digital analytic summit
Klarity -  Asia digital analytic summitKlarity -  Asia digital analytic summit
Klarity - Asia digital analytic summit
 
Product Placement: The Present & The Future
Product Placement: The Present & The FutureProduct Placement: The Present & The Future
Product Placement: The Present & The Future
 
Big Data Social Media & Smart Apps
Big Data Social Media & Smart AppsBig Data Social Media & Smart Apps
Big Data Social Media & Smart Apps
 
Big Data und Social Media
Big Data und Social MediaBig Data und Social Media
Big Data und Social Media
 
Social Media, Big Data, and the Public Sphere
Social Media, Big Data, and the Public SphereSocial Media, Big Data, and the Public Sphere
Social Media, Big Data, and the Public Sphere
 
Making Sense of Twitter: New Research Methods in the Digital Humanities
Making Sense of Twitter: New Research Methods in the Digital HumanitiesMaking Sense of Twitter: New Research Methods in the Digital Humanities
Making Sense of Twitter: New Research Methods in the Digital Humanities
 
Social Media, Big Data and Libraries. The Next Step
Social Media, Big Data and Libraries. The Next StepSocial Media, Big Data and Libraries. The Next Step
Social Media, Big Data and Libraries. The Next Step
 
Big Data & Social Media / ChangeGroup
Big Data & Social Media / ChangeGroupBig Data & Social Media / ChangeGroup
Big Data & Social Media / ChangeGroup
 
Social Media, Big Data
Social Media, Big Data Social Media, Big Data
Social Media, Big Data
 
ESWC SS 2013 - Wednesday Tutorial Marko Grobelnik: Introduction to Big Data A...
ESWC SS 2013 - Wednesday Tutorial Marko Grobelnik: Introduction to Big Data A...ESWC SS 2013 - Wednesday Tutorial Marko Grobelnik: Introduction to Big Data A...
ESWC SS 2013 - Wednesday Tutorial Marko Grobelnik: Introduction to Big Data A...
 
Presentation big data and social media final_video
Presentation big data and social media final_videoPresentation big data and social media final_video
Presentation big data and social media final_video
 
Big Data, Social Media & Wine
Big Data, Social Media & WineBig Data, Social Media & Wine
Big Data, Social Media & Wine
 
Big data and social media, BAE Systems Detica
Big data and social media, BAE Systems DeticaBig data and social media, BAE Systems Detica
Big data and social media, BAE Systems Detica
 
Insighty z social media - jak je wyciągnąć i dlaczego nie zawsze ma to sens?
Insighty z social media - jak je wyciągnąć i dlaczego nie zawsze ma to sens?Insighty z social media - jak je wyciągnąć i dlaczego nie zawsze ma to sens?
Insighty z social media - jak je wyciągnąć i dlaczego nie zawsze ma to sens?
 
Informatica big data and social media
Informatica big data and social mediaInformatica big data and social media
Informatica big data and social media
 

Similar to Houston Hadoop Meetup Presentation by Vikram Oberoi of Cloudera

Hadoop in the Enterprise - Dr. Amr Awadallah @ Microstrategy World 2011
Hadoop in the Enterprise - Dr. Amr Awadallah @ Microstrategy World 2011Hadoop in the Enterprise - Dr. Amr Awadallah @ Microstrategy World 2011
Hadoop in the Enterprise - Dr. Amr Awadallah @ Microstrategy World 2011Cloudera, Inc.
 
Commonanduniqueusecases 110831113310-phpapp01
Commonanduniqueusecases 110831113310-phpapp01Commonanduniqueusecases 110831113310-phpapp01
Commonanduniqueusecases 110831113310-phpapp01eimhee
 
Webinar: Productionizing Hadoop: Lessons Learned - 20101208
Houston Hadoop Meetup Presentation by Vikram Oberoi of Cloudera

  • 1. What is Hadoop, and When Should I Consider Using It?
      Houston HUG, June 6th, 2011
      Vikram Oberoi, Cloudera
      Copyright 2011 Cloudera Inc. All rights reserved
  • 2. About me
      • Data engineer at Cloudera, present
          • Using data and Hadoop to enable more responsive support
      • Data engineer at Meebo, Aug ’09 – Nov ’10
          • Data infrastructure, analytics
      • CS at Stanford, ’09
          • Senior project: ext3 and XFS under Hadoop MapReduce workloads
      • Data engineer at Meebo, ’08
          • Built an A/B testing system
      • SDE Intern at Amazon, ’07
          • R&D on item-to-item similarities
  • 3. What will I talk about?
      • What is Hadoop?
      • Typical Hadoop-able problems and use cases
      • Cloudera overview
  • 4. What is Hadoop?
  • 5. Big Data Problem: Exploding Data Volumes
      • Online
          • Web-ready devices
          • Social media
          • Digital content
      • Enterprise
          • Transactions
          • R&D data
          • Operational (control) data
      • Open data initiatives
      [Chart labels: Complex, Unstructured; Relational]
      • 2,500 exabytes of new information in 2012 with Internet as primary driver
      • Digital universe grew by 62% last year to 800K petabytes and will grow to 1.2 “zettabytes” this year
      Source: An IDC White Paper, sponsored by EMC. As the Economy Contracts, the Digital Universe Expands. May 2009.
  • 6. Big Data Problem: Data Economics
      • Return on Byte (ROB) = value to be extracted from that byte / cost of storing that byte
      • If ROB is < 1, the data will be buried in the tape wasteland; thus we need cheaper active storage.
      [Chart labels: High ROB; Low ROB]
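The Return on Byte ratio on this slide can be sketched in a few lines of Python. This is only an illustration of the slide's formula: the function names and the dollar figures are hypothetical and do not come from the talk.

```python
def return_on_byte(value_extracted: float, storage_cost: float) -> float:
    """ROB = value extracted from the data / cost of storing it."""
    if storage_cost <= 0:
        raise ValueError("storage_cost must be positive")
    return value_extracted / storage_cost


def worth_keeping_active(value_extracted: float, storage_cost: float) -> bool:
    """Data with ROB below 1 costs more to keep online than it returns."""
    return return_on_byte(value_extracted, storage_cost) >= 1.0


# Hypothetical figures: the same data set on expensive versus cheap storage.
rob_expensive = return_on_byte(value_extracted=500.0, storage_cost=2000.0)
rob_cheap = return_on_byte(value_extracted=500.0, storage_cost=250.0)
print(rob_expensive, rob_cheap)
```

With these made-up numbers the value of the data is unchanged; only the cheaper storage moves the ratio above 1, which is the slide's argument for cheaper active storage.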
  • 7. Hadoop: A Data Platform with Unique Benefits
      • Consolidates Everything
          • Move complex and relational data into a single repository
      • Stores Inexpensively
          • Keep raw data always available
          • Use commodity hardware
      • Processes at the Source
          • Eliminate ETL bottlenecks
          • Mine data first, govern later
      [Diagram labels: MapReduce; Hadoop Distributed File System (HDFS)]
  • 8. Hadoop Distributed File System (HDFS)
      “How is data stored?”
      • Based on design of Google’s GFS
      • Data stored in large files
          • Files can contain any data
      • Files separated into blocks
          • 64MB up to 256MB per block (tunable)
      • Each block replicated across a cluster (tunable, usually 3 replicas across the cluster)
          • This buys you: fault tolerance, parallelizable disk reads
      • Store whatever you want in it
          • This buys you: flexibility
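The block and replica figures above translate into simple arithmetic. The sketch below is not an HDFS API; `hdfs_footprint` is a hypothetical helper that assumes the slide's 64MB block size and 3 replicas as defaults, and assumes the last block of a file only occupies the bytes it actually holds.

```python
MB = 1024 * 1024
GB = 1024 * MB


def hdfs_footprint(file_size_bytes: int,
                   block_size_bytes: int = 64 * MB,
                   replication: int = 3) -> tuple:
    """Return (blocks, block_replicas, raw_bytes) for a single file."""
    if file_size_bytes < 0 or block_size_bytes <= 0 or replication < 1:
        raise ValueError("invalid size, block size, or replication factor")
    # Integer ceiling division: a partial final block still counts as a block.
    blocks = -(-file_size_bytes // block_size_bytes)
    block_replicas = blocks * replication
    # Assumption: a partial final block is not padded to the full block size.
    raw_bytes = file_size_bytes * replication
    return blocks, block_replicas, raw_bytes


blocks, replicas, raw = hdfs_footprint(1 * GB)
print(blocks, replicas, raw // GB)
```

A 1GB file therefore splits into 16 blocks, stored as 48 block replicas, which is what allows reads to be spread across many disks and a lost node to be tolerated.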
  • 9. MapReduce
      “How is data processed?”
      • Framework designed for parallel processing of large, disk-bound batch jobs
      • Data processed at the source
          • File ‘foo’ has 5 blocks, processing happens on 5 nodes
          • Parallelized disk reads → remove disk bottleneck
      • Way to express algorithms such that they are parallelizable
      • Two functions at the core of every job:
          • Map function (group by)
          • Reduce function (perform action on group)
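The two functions at the core of a job can be illustrated with a word count. The sketch below is a single-process simulation in plain Python, not the Hadoop API: `map_fn`, `reduce_fn`, and `run_job` are hypothetical names, and the grouping step that a real cluster performs across nodes is done here with a dictionary.

```python
from collections import defaultdict


def map_fn(line):
    """Map: emit a (key, value) pair per word; the key decides the group."""
    for word in line.lower().split():
        yield word, 1


def reduce_fn(key, values):
    """Reduce: perform an action on one group, here a sum."""
    return key, sum(values)


def run_job(records, map_fn, reduce_fn):
    """Run map over every record, group by key, then reduce each group."""
    groups = defaultdict(list)
    for record in records:
        for key, value in map_fn(record):
            groups[key].append(value)
    return dict(reduce_fn(key, values) for key, values in sorted(groups.items()))


lines = ["hadoop stores data", "hadoop processes data"]
print(run_job(lines, map_fn, reduce_fn))
# prints {'data': 2, 'hadoop': 2, 'processes': 1, 'stores': 1}
```

Because each call to `map_fn` looks at one record and each call to `reduce_fn` looks at one group, both steps can in principle run on many nodes at once, which is the sense in which the algorithm is expressed in a parallelizable way.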
  • 10. What is Hadoop?
      • A scalable fault-tolerant distributed system for data storage and processing (open source under the Apache license)
      • Scalable data processing engine
          • Hadoop Distributed File System (HDFS): self-healing high-bandwidth clustered storage
          • MapReduce: fault-tolerant distributed processing
      • Key value
          • Flexible -> store data without a schema and add it later as needed
          • Affordable -> cost / TB at a fraction of traditional options
          • Broadly adopted -> a large and active ecosystem
          • Proven at scale -> dozens of petabyte+ implementations in production today
  • 11. Cloudera’s Distribution Including Apache Hadoop
      The Industry’s Leading Hadoop Distribution
      [Diagram components: Hue, Hue SDK, Oozie, Hive, Pig, Flume, Sqoop, HBase, ZooKeeper]
      • Open source – 100% Apache licensed and free for download
      • Simplified – Component versions & dependencies managed for you
      • Integrated – All components & functions interoperate through standard APIs
      • Reliable – Patched with fixes from future releases to improve stability
      • Supported – Employs project founders and committers for >90% of components
  • 12. Typical Hadoop-able problems
  • 13. What is common across Hadoop-able problems?
      Nature of the data
      • Complex data
      • Multiple data sources
      • Lots of it
      Nature of the analysis
      • Batch processing
      • Parallelizable
      Copyright 2010 Cloudera Inc. All rights reserved
  • 14. What kinds of analyses are possible with Hadoop?
      • Text mining
      • Index building
      • Graph creation and analysis
      • Pattern recognition
      • Collaborative filtering
      • Prediction models
      • Sentiment analysis
      • Risk assessment
  • 15. Top 10 Hadoop-able Problems
      See archived webinar on cloudera.com
      1. Modeling True Risk
      2. Customer Churn Analysis
      3. Recommendation Engines
      4. Ad Targeting
      5. Point of Sale Transaction Analysis
      6. Analyzing Network Data to Predict Failure
      7. Threat Analysis/Fraud Detection
      8. Trade Surveillance
      9. Search Quality
      10. Data “Sandbox”
  • 16. Example: Modeling True Risk
  • 17. Example: Modeling True Risk
      Solution with Hadoop
      • Source, parse and aggregate disparate data sources to build comprehensive data picture
          • e.g. credit card records, call recordings, chat sessions, emails, banking activity
      • Structure and analyze
          • Sentiment analysis, graph creation, pattern recognition
      Typical Industry
      • Financial Services (Banks, Insurance)
  • 18. Example: Threat Analysis
  • 19. Example: Threat Analysis
      Solution with Hadoop
      • Parallel processing over huge datasets
      • Pattern recognition to identify anomalies, i.e. threats
      Typical Industry
      • Security
      • Financial Services
      • General: spam fighting, click fraud
  • 20. Example: Recommendation Engine
  • 21. Example: Recommendation Engine
      Solution with Hadoop
      • Batch processing framework
          • Allows execution in parallel over large datasets
      • Collaborative filtering
          • Collecting ‘taste’ information from many users
          • Utilizing information to predict what similar users like
      Typical Industry
      • Ecommerce, Manufacturing, Retail
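Collaborative filtering as described on this slide can be sketched on a toy data set. The example below is one simple user-based variant chosen for illustration (Jaccard similarity over sets of liked items), not necessarily the method used in any deployment the talk refers to; the users, items, and function names are all hypothetical, and a real job would run the scoring in parallel over far larger data.

```python
def jaccard(a, b):
    """Similarity of two users: shared likes divided by all likes."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(target, likes, top_n=3):
    """Score items the target has not seen by the similarity of users who like them."""
    mine = likes[target]
    scores = {}
    for user, items in likes.items():
        if user == target:
            continue
        similarity = jaccard(mine, items)
        if similarity == 0:
            continue  # users with nothing in common contribute nothing
        for item in items - mine:
            scores[item] = scores.get(item, 0.0) + similarity
    # Highest score first; ties broken alphabetically for a stable result.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_n]]


# 'Taste' information collected from several hypothetical users.
likes = {
    "ann": {"book", "lamp", "desk"},
    "bob": {"book", "lamp", "chair"},
    "cat": {"book", "rug"},
    "dan": {"sofa"},
}
print(recommend("ann", likes))
# prints ['chair', 'rug']
```

Here "bob" shares two of four distinct items with "ann" and "cat" shares one of four, so the item liked by the more similar user ranks first, and "dan", who shares nothing, is ignored.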
  • 22. Example: Analyzing Network Data to Predict Failure
  • 23. Example: Analyzing Network Data to Predict Failure
      Solution with Hadoop
      • Take the computation to the data
      • Expand the range of indexing techniques from simple scans to more complex data mining
      • Better understand how the network reacts to fluctuations
          • How previously thought discrete anomalies may, in fact, be interconnected
      • Identify leading indicators of component failure
      Typical Industry
      • Utilities, Telecommunications, Data Centers
  • 24. Example: Supporting Hadoop at Cloudera
      • Collect data from customer clusters
          • OS configs, Hadoop configs, command outputs, logs
          • Data served by HBase, used by supporters
      • Consolidate data about Hadoop in HDFS
          • Mailing lists, issue trackers, wiki pages, IRC, books
          • Customer cluster data
      • Analyze many data sources to understand Hadoop issues and deployments
      • Build tools to enable easier diagnosis or proactive support
  • 25. Cloudera overview
  • 26. Cloudera Offerings
      Enabling the Enterprise Adoption of Apache Hadoop
      • PLATFORM
      • SUPPORT & APPLICATIONS
      • PROFESSIONAL SERVICES
      • TRAINING
  • 27. Contact/Resources/Questions
      • vikram@cloudera.com
      • irc.freenode.net #cloudera #hadoop
      • @cloudera
      • Cloudera Groups: http://groups.cloudera.org
      • Hadoop: The Definitive Guide
      • 10 Hadoop-able problems on Slideshare
      • Questions? (P.S. We’re hiring SAs in Houston!)