SlideShare a Scribd company logo
1 of 45
A	
  New	
  GeneraAon	
  of	
  Data	
  Transfer	
  
     	
  
          Tools	
  for	
  Hadoop:	
  Sqoop	
  2	
  
   Bilung	
  Lee	
  (blee	
  at	
  cloudera	
  dot	
  com)	
  
   Kathleen	
  Ting	
  (kathleen	
  at	
  cloudera	
  dot	
  com)	
  
   	
  
                       Hadoop	
  Summit	
  2012.	
  6/13/12	
  Apache	
  Sqoop	
  
                      Copyright	
  2012	
  The	
  Apache	
  So=ware	
  FoundaAon	
  
Who	
  Are	
  We?	
  
•  Bilung	
  Lee	
  
    –  Apache	
  Sqoop	
  CommiQer	
  
    –  So=ware	
  Engineer,	
  Cloudera	
  
    	
  
•  Kathleen	
  Ting	
  
    –  Apache	
  Sqoop	
  CommiQer	
  
    –  Support	
  Manager,	
  Cloudera	
  


                           Hadoop	
  Summit	
  2012.	
  6/13/12	
  Apache	
  Sqoop	
  
                                                                                           2	
  
                          Copyright	
  2012	
  The	
  Apache	
  So=ware	
  FoundaAon	
  
What	
  is	
  Sqoop?	
  
•  Bulk	
  data	
  transfer	
  tool	
  
    –  Import/Export	
  from/to	
  relaAonal	
  databases,	
  
       enterprise	
  data	
  warehouses,	
  and	
  NoSQL	
  systems	
  
    –  Populate	
  tables	
  in	
  HDFS,	
  Hive,	
  and	
  HBase	
  
    –  Integrate	
  with	
  Oozie	
  as	
  an	
  acAon	
  
    –  Support	
  plugins	
  via	
  connector	
  based	
  architecture	
  
    May ‘09                      March ‘10                                      August ‘11      April ‘12


  First version                  Moved to                                      Moved to          Apache
(HADOOP-5815)                     GitHub                                       Apache        Top Level Project

                          Hadoop	
  Summit	
  2012.	
  6/13/12	
  Apache	
  Sqoop	
  
                                                                                                             3	
  
                         Copyright	
  2012	
  The	
  Apache	
  So=ware	
  FoundaAon	
  
Sqoop	
  1	
  Architecture	
  
                                                                Document
                                   Enterprise                     Based
                                     Data                        Systems
                                   Warehouse




                                                                           Relational
                                                                           Database



command
                        Hadoop



                                               Map Task



 Sqoop




                                                            HDFS/HBase/
                                                               Hive




           Hadoop	
  Summit	
  2012.	
  6/13/12	
  Apache	
  Sqoop	
  
                                                                                        4	
  
          Copyright	
  2012	
  The	
  Apache	
  So=ware	
  FoundaAon	
  
Sqoop	
  1	
  Challenges	
  
•  CrypAc,	
  contextual	
  command	
  line	
  arguments	
  
•  Tight	
  coupling	
  between	
  data	
  transfer	
  and	
  
   output	
  format	
  	
  
•  Security	
  concerns	
  with	
  openly	
  shared	
  
   credenAals	
  
•  Not	
  easy	
  to	
  manage	
  installaAon/configuraAon	
  
•  Connectors	
  are	
  forced	
  to	
  follow	
  JDBC	
  model	
  

                       Hadoop	
  Summit	
  2012.	
  6/13/12	
  Apache	
  Sqoop	
  
                                                                                       5	
  
                      Copyright	
  2012	
  The	
  Apache	
  So=ware	
  FoundaAon	
  
Sqoop	
  2	
  Architecture	
  




       Hadoop	
  Summit	
  2012.	
  6/13/12	
  Apache	
  Sqoop	
  
                                                                       6	
  
      Copyright	
  2012	
  The	
  Apache	
  So=ware	
  FoundaAon	
  
Sqoop	
  2	
  Themes	
  
•  Ease	
  of	
  Use	
  

•  Ease	
  of	
  Extension	
  

•  Security	
  




                               Hadoop	
  Summit	
  2012.	
  6/13/12	
  Apache	
  Sqoop	
  
                                                                                               7	
  
                              Copyright	
  2012	
  The	
  Apache	
  So=ware	
  FoundaAon	
  
Sqoop	
  2	
  Themes	
  
•  Ease	
  of	
  Use	
  

•  Ease	
  of	
  Extension	
  

•  Security	
  




                               Hadoop	
  Summit	
  2012.	
  6/13/12	
  Apache	
  Sqoop	
  
                                                                                               8	
  
                              Copyright	
  2012	
  The	
  Apache	
  So=ware	
  FoundaAon	
  
Ease	
  of	
  Use	
  
Sqoop	
  1	
                                                                    Sqoop	
  2	
  
Client-­‐only	
  Architecture	
                                                 Client/Server	
  Architecture	
  
CLI	
  based	
                                                                  CLI	
  +	
  Web	
  based	
  
Client	
  access	
  to	
  Hive,	
  HBase	
                                      Server	
  access	
  to	
  Hive,	
  HBase	
  
Oozie	
  and	
  Sqoop	
  Aghtly	
  coupled	
                                    Oozie	
  finds	
  REST	
  API	
  




                                                Hadoop	
  Summit	
  2012.	
  6/13/12	
  Apache	
  Sqoop	
  
                                                                                                                               9	
  
                                               Copyright	
  2012	
  The	
  Apache	
  So=ware	
  FoundaAon	
  
Sqoop	
  1:	
  Client-­‐side	
  Tool	
  
•  Client-­‐side	
  installaAon	
  +	
  configuraAon	
  
   –  Connectors	
  are	
  installed/configured	
  locally	
  
   –  Local	
  requires	
  root	
  privileges	
  
   –  JDBC	
  drivers	
  are	
  needed	
  locally	
  	
  
   –  Database	
  connecAvity	
  is	
  needed	
  locally	
  




                       Hadoop	
  Summit	
  2012.	
  6/13/12	
  Apache	
  Sqoop	
  
                                                                                       10	
  
                      Copyright	
  2012	
  The	
  Apache	
  So=ware	
  FoundaAon	
  
Sqoop	
  2:	
  Sqoop	
  as	
  a	
  Service	
  
•  Server-­‐side	
  installaAon	
  +	
  configuraAon	
  
   –  Connectors	
  are	
  installed/configured	
  in	
  one	
  place	
  
   –  Managed	
  by	
  administrator	
  and	
  run	
  by	
  operator	
  
   –  JDBC	
  drivers	
  are	
  needed	
  in	
  one	
  place	
  
   –  Database	
  connecAvity	
  is	
  needed	
  on	
  the	
  server	
  




                         Hadoop	
  Summit	
  2012.	
  6/13/12	
  Apache	
  Sqoop	
  
                                                                                         11	
  
                        Copyright	
  2012	
  The	
  Apache	
  So=ware	
  FoundaAon	
  
Client	
  Interface	
  
•  Sqoop	
  1	
  client	
  interface:	
  
    –  Command	
  line	
  interface	
  (CLI)	
  based	
  
    –  Can	
  be	
  automated	
  via	
  scripAng	
  


•  Sqoop	
  2	
  client	
  interface:	
  
    –  CLI	
  based	
  (in	
  either	
  interacAve	
  or	
  script	
  mode)	
  
    –  Web	
  based	
  (remotely	
  accessible)	
  
    –  REST	
  API	
  is	
  exposed	
  for	
  external	
  tool	
  integraAon	
  

                            Hadoop	
  Summit	
  2012.	
  6/13/12	
  Apache	
  Sqoop	
  
                                                                                            12	
  
                           Copyright	
  2012	
  The	
  Apache	
  So=ware	
  FoundaAon	
  
Sqoop	
  1:	
  Service	
  Level	
  IntegraAon	
  
•  Hive,	
  HBase	
  
    –  Require	
  local	
  installaAon	
  
•  Oozie	
  
    –  von	
  Neumann(esque)	
  integraAon:	
  	
  
        •  Package	
  Sqoop	
  as	
  an	
  acAon	
  
        •  Then	
  run	
  Sqoop	
  from	
  node	
  machines,	
  causing	
  one	
  MR	
  
           job	
  to	
  be	
  dependent	
  on	
  another	
  MR	
  job	
  
        •  Error-­‐prone,	
  difficult	
  to	
  debug	
  


                             Hadoop	
  Summit	
  2012.	
  6/13/12	
  Apache	
  Sqoop	
  
                                                                                             13	
  
                            Copyright	
  2012	
  The	
  Apache	
  So=ware	
  FoundaAon	
  
Sqoop	
  2:	
  Service	
  Level	
  IntegraAon	
  
•  Hive,	
  HBase	
  
    –  Server-­‐side	
  integraAon	
  
•  Oozie	
  
    –  REST	
  API	
  integraAon	
  




                          Hadoop	
  Summit	
  2012.	
  6/13/12	
  Apache	
  Sqoop	
  
                                                                                          14	
  
                         Copyright	
  2012	
  The	
  Apache	
  So=ware	
  FoundaAon	
  
Ease	
  of	
  Use	
  
Sqoop	
  1	
                                                                    Sqoop	
  2	
  
Client-­‐only	
  Architecture	
                                                 Client/Server	
  Architecture	
  
CLI	
  based	
                                                                  CLI	
  +	
  Web	
  based	
  
Client	
  access	
  to	
  Hive,	
  HBase	
                                      Server	
  access	
  to	
  Hive,	
  HBase	
  
Oozie	
  and	
  Sqoop	
  Aghtly	
  coupled	
                                    Oozie	
  finds	
  REST	
  API	
  




                                                Hadoop	
  Summit	
  2012.	
  6/13/12	
  Apache	
  Sqoop	
  
                                                                                                                               15	
  
                                               Copyright	
  2012	
  The	
  Apache	
  So=ware	
  FoundaAon	
  
Sqoop	
  2	
  Themes	
  
•  Ease	
  of	
  Use	
  

•  Ease	
  of	
  Extension	
  

•  Security	
  




                               Hadoop	
  Summit	
  2012.	
  6/13/12	
  Apache	
  Sqoop	
  
                                                                                               16	
  
                              Copyright	
  2012	
  The	
  Apache	
  So=ware	
  FoundaAon	
  
Ease	
  of	
  Extension	
  
Sqoop	
  1	
                                                              Sqoop	
  2	
  
Connector	
  forced	
  to	
  follow	
  JDBC	
  model	
                    Connector	
  given	
  free	
  rein	
  
Connectors	
  must	
  implement	
  funcAonality	
   Connectors	
  benefit	
  from	
  common	
  
                                                    framework	
  of	
  funcAonality	
  
Connector	
  selecAon	
  is	
  implicit	
                                 Connector	
  selecAon	
  is	
  explicit	
  	
  




                                          Hadoop	
  Summit	
  2012.	
  6/13/12	
  Apache	
  Sqoop	
  
                                                                                                                            17	
  
                                         Copyright	
  2012	
  The	
  Apache	
  So=ware	
  FoundaAon	
  
Sqoop	
  1:	
  ImplemenAng	
  Connectors	
  
•  Connectors	
  are	
  forced	
  to	
  follow	
  JDBC	
  model	
  
   –  Connectors	
  are	
  limited/required	
  to	
  use	
  common	
  
      JDBC	
  vocabulary	
  (URL,	
  database,	
  table,	
  etc)	
  
•  Connectors	
  must	
  implement	
  all	
  Sqoop	
  
   funcAonality	
  they	
  want	
  to	
  support	
  
   –  New	
  funcAonality	
  may	
  not	
  be	
  available	
  for	
  
      previously	
  implemented	
  connectors	
  



                          Hadoop	
  Summit	
  2012.	
  6/13/12	
  Apache	
  Sqoop	
  
                                                                                          18	
  
                         Copyright	
  2012	
  The	
  Apache	
  So=ware	
  FoundaAon	
  
Sqoop	
  2:	
  ImplemenAng	
  Connectors	
  
•  Connectors	
  are	
  not	
  restricted	
  to	
  JDBC	
  model	
  
   –  Connectors	
  can	
  define	
  own	
  domain	
  
•  Common	
  funcAonality	
  are	
  abstracted	
  out	
  of	
  
   connectors	
  
   –  Connectors	
  are	
  only	
  responsible	
  for	
  data	
  transfer	
  
   –  Common	
  Reduce	
  phase	
  implements	
  data	
  
      transformaAon	
  and	
  system	
  integraAon	
  
   –  Connectors	
  can	
  benefit	
  from	
  future	
  development	
  
      of	
  common	
  funcAonality	
  
                          Hadoop	
  Summit	
  2012.	
  6/13/12	
  Apache	
  Sqoop	
  
                                                                                          19	
  
                         Copyright	
  2012	
  The	
  Apache	
  So=ware	
  FoundaAon	
  
Different	
  OpAons,	
  Different	
  Results	
  
Which	
  is	
  running	
  MySQL?	
  
$ sqoop import --connect jdbc:mysql://localhost/db 
--username foo --table TEST

$ sqoop import --connect jdbc:mysql://localhost/db 
--driver com.mysql.jdbc.Driver --username foo --table TEST
	
  


•  Different	
  opAons	
  may	
  lead	
  to	
  unpredictable	
  
   results	
  
       –  Sqoop	
  2	
  requires	
  explicit	
  selecAon	
  of	
  a	
  connector,	
  
          thus	
  disambiguaAng	
  the	
  process	
  
                               Hadoop	
  Summit	
  2012.	
  6/13/12	
  Apache	
  Sqoop	
  
                                                                                               20	
  
                              Copyright	
  2012	
  The	
  Apache	
  So=ware	
  FoundaAon	
  
Sqoop	
  1:	
  Using	
  Connectors	
  
•  Choice	
  of	
  connector	
  is	
  implicit	
  
    –  In	
  a	
  simple	
  case,	
  based	
  on	
  the	
  URL	
  in	
  -­‐-­‐connect	
  
       string	
  to	
  access	
  the	
  database	
  
    –  SpecificaAon	
  of	
  different	
  opAons	
  can	
  lead	
  to	
  
       different	
  connector	
  selecAon	
  
    –  Error-­‐prone	
  but	
  good	
  for	
  power	
  users	
  




                               Hadoop	
  Summit	
  2012.	
  6/13/12	
  Apache	
  Sqoop	
  
                                                                                               21	
  
                              Copyright	
  2012	
  The	
  Apache	
  So=ware	
  FoundaAon	
  
Sqoop	
  1:	
  Using	
  Connectors	
  
•  Require	
  knowledge	
  of	
  database	
  idiosyncrasies	
  	
  
    –  e.g.	
  Couchbase	
  does	
  not	
  need	
  to	
  specify	
  a	
  table	
  
       name,	
  which	
  is	
  required,	
  causing	
  -­‐-­‐table	
  to	
  get	
  
       overloaded	
  as	
  backfill	
  or	
  dump	
  operaAon	
  
    –  e.g.	
  -­‐-­‐null-­‐string	
  representaAon	
  is	
  not	
  supported	
  
       by	
  all	
  connectors	
  

•  FuncAonality	
  is	
  limited	
  to	
  what	
  the	
  implicitly	
  
   chosen	
  connector	
  supports	
  


                             Hadoop	
  Summit	
  2012.	
  6/13/12	
  Apache	
  Sqoop	
  
                                                                                             22	
  
                            Copyright	
  2012	
  The	
  Apache	
  So=ware	
  FoundaAon	
  
Sqoop	
  2:	
  Using	
  Connectors	
  
•  Users	
  make	
  explicit	
  connector	
  choice	
  
    –  Less	
  error-­‐prone,	
  more	
  predictable	
  
•  Users	
  need	
  not	
  be	
  aware	
  of	
  the	
  funcAonality	
  
   of	
  all	
  connectors	
  
    –  Couchbase	
  users	
  need	
  not	
  care	
  that	
  other	
  
       connectors	
  use	
  tables	
  




                           Hadoop	
  Summit	
  2012.	
  6/13/12	
  Apache	
  Sqoop	
  
                                                                                           23	
  
                          Copyright	
  2012	
  The	
  Apache	
  So=ware	
  FoundaAon	
  
Sqoop	
  2:	
  Using	
  Connectors	
  
•  Common	
  funcAonality	
  is	
  available	
  to	
  all	
  
   connectors	
  
    –  Connectors	
  need	
  not	
  worry	
  about	
  common	
  
       downstream	
  funcAonality,	
  such	
  as	
  transformaAon	
  
       into	
  various	
  formats	
  and	
  integraAon	
  with	
  other	
  
       systems	
  




                          Hadoop	
  Summit	
  2012.	
  6/13/12	
  Apache	
  Sqoop	
  
                                                                                          24	
  
                         Copyright	
  2012	
  The	
  Apache	
  So=ware	
  FoundaAon	
  
Ease	
  of	
  Extension	
  
Sqoop	
  1	
                                                              Sqoop	
  2	
  
Connector	
  forced	
  to	
  follow	
  JDBC	
  model	
                    Connector	
  given	
  free	
  rein	
  
Connectors	
  must	
  implement	
  funcAonality	
   Connectors	
  benefit	
  from	
  common	
  
                                                    framework	
  of	
  funcAonality	
  
Connector	
  selecAon	
  is	
  implicit	
                                 Connector	
  selecAon	
  is	
  explicit	
  	
  




                                          Hadoop	
  Summit	
  2012.	
  6/13/12	
  Apache	
  Sqoop	
  
                                                                                                                            25	
  
                                         Copyright	
  2012	
  The	
  Apache	
  So=ware	
  FoundaAon	
  
Sqoop	
  2	
  Themes	
  
•  Ease	
  of	
  Use	
  

•  Ease	
  of	
  Extension	
  

•  Security	
  




                               Hadoop	
  Summit	
  2012.	
  6/13/12	
  Apache	
  Sqoop	
  
                                                                                               26	
  
                              Copyright	
  2012	
  The	
  Apache	
  So=ware	
  FoundaAon	
  
Security	
  
Sqoop	
  1	
                                                                   Sqoop	
  2	
  
Support	
  only	
  for	
  Hadoop	
  security	
                                 Support	
  for	
  Hadoop	
  security	
  and	
  role-­‐
                                                                               based	
  access	
  control	
  to	
  external	
  systems	
  
High	
  risk	
  of	
  abusing	
  access	
  to	
  external	
                    Reduced	
  risk	
  of	
  abusing	
  access	
  to	
  external	
  
systems	
                                                                      systems	
  
No	
  resource	
  management	
  policy	
                                       Resource	
  management	
  policy	
  




                                               Hadoop	
  Summit	
  2012.	
  6/13/12	
  Apache	
  Sqoop	
  
                                                                                                                                            27	
  
                                              Copyright	
  2012	
  The	
  Apache	
  So=ware	
  FoundaAon	
  
Sqoop	
  1:	
  Security	
  
•  Inherit/Propagate	
  Kerberos	
  principal	
  for	
  the	
  
   jobs	
  it	
  launches	
  
•  Access	
  to	
  files	
  on	
  HDFS	
  can	
  be	
  controlled	
  via	
  
   HDFS	
  security	
  
•  Limited	
  support	
  (user/password)	
  for	
  secure	
  
   access	
  to	
  external	
  systems	
  




                           Hadoop	
  Summit	
  2012.	
  6/13/12	
  Apache	
  Sqoop	
  
                                                                                           28	
  
                          Copyright	
  2012	
  The	
  Apache	
  So=ware	
  FoundaAon	
  
Sqoop	
  2:	
  Security	
  
•  Inherit/Propagate	
  Kerberos	
  principal	
  for	
  the	
  
   jobs	
  it	
  launches	
  
•  Access	
  to	
  files	
  on	
  HDFS	
  can	
  be	
  controlled	
  via	
  
   HDFS	
  security	
  
•  Support	
  for	
  secure	
  access	
  to	
  external	
  systems	
  
   via	
  role-­‐based	
  access	
  to	
  connecAon	
  objects	
  
    –  Administrators	
  create/edit/delete	
  connecAons	
  
    –  Operators	
  use	
  connecAons	
  



                          Hadoop	
  Summit	
  2012.	
  6/13/12	
  Apache	
  Sqoop	
  
                                                                                          29	
  
                         Copyright	
  2012	
  The	
  Apache	
  So=ware	
  FoundaAon	
  
Sqoop	
  1:	
  External	
  System	
  Access	
  
•  Every	
  invocaAon	
  requires	
  necessary	
  
   credenAals	
  to	
  access	
  external	
  systems	
  (e.g.	
  
   relaAonal	
  database)	
  
   –  Workaround:	
  create	
  a	
  user	
  with	
  limited	
  access	
  in	
  
      lieu	
  of	
  giving	
  out	
  password	
  
       •  Does	
  not	
  scale	
  
       •  Permission	
  granularity	
  is	
  hard	
  to	
  obtain	
  
•  Hard	
  to	
  prevent	
  misuse	
  once	
  credenAals	
  are	
  
   given	
  
                             Hadoop	
  Summit	
  2012.	
  6/13/12	
  Apache	
  Sqoop	
  
                                                                                             30	
  
                            Copyright	
  2012	
  The	
  Apache	
  So=ware	
  FoundaAon	
  
Sqoop	
  2:	
  External	
  System	
  Access	
  
•  ConnecAons	
  are	
  enabled	
  as	
  first-­‐class	
  objects	
  
   –  ConnecAons	
  encompass	
  credenAals	
  
   –  ConnecAons	
  are	
  created	
  once	
  and	
  then	
  used	
  
      many	
  Ames	
  for	
  various	
  import/export	
  jobs	
  
   –  ConnecAons	
  are	
  created	
  by	
  administrator	
  and	
  
      used	
  by	
  operator	
  
       •  Safeguard	
  credenAal	
  access	
  from	
  end	
  users	
  
•  ConnecAons	
  can	
  be	
  restricted	
  in	
  scope	
  based	
  
   on	
  operaAon	
  (import/export)	
  
   –  Operators	
  cannot	
  abuse	
  credenAals	
  
                            Hadoop	
  Summit	
  2012.	
  6/13/12	
  Apache	
  Sqoop	
  
                                                                                            31	
  
                           Copyright	
  2012	
  The	
  Apache	
  So=ware	
  FoundaAon	
  
Sqoop	
  1:	
  Resource	
  Management	
  
•  No	
  explicit	
  resource	
  management	
  policy	
  
   –  Users	
  specify	
  the	
  number	
  of	
  map	
  jobs	
  to	
  run	
  
   –  Cannot	
  throQle	
  load	
  on	
  external	
  systems	
  




                            Hadoop	
  Summit	
  2012.	
  6/13/12	
  Apache	
  Sqoop	
  
                                                                                            32	
  
                           Copyright	
  2012	
  The	
  Apache	
  So=ware	
  FoundaAon	
  
Sqoop	
  2:	
  Resource	
  Management	
  
•  ConnecAons	
  allow	
  specificaAon	
  of	
  resource	
  
   management	
  policy	
  	
  
   –  Administrators	
  can	
  limit	
  the	
  total	
  number	
  of	
  
      physical	
  connecAons	
  open	
  at	
  one	
  Ame	
  
   –  ConnecAons	
  can	
  also	
  be	
  disabled	
  




                           Hadoop	
  Summit	
  2012.	
  6/13/12	
  Apache	
  Sqoop	
  
                                                                                           33	
  
                          Copyright	
  2012	
  The	
  Apache	
  So=ware	
  FoundaAon	
  
Security	
  
Sqoop	
  1	
                                                                   Sqoop	
  2	
  
Support	
  only	
  for	
  Hadoop	
  security	
                                 Support	
  for	
  Hadoop	
  security	
  and	
  role-­‐
                                                                               based	
  access	
  control	
  to	
  external	
  systems	
  
High	
  risk	
  of	
  abusing	
  access	
  to	
  external	
                    Reduced	
  risk	
  of	
  abusing	
  access	
  to	
  external	
  
systems	
                                                                      systems	
  
No	
  resource	
  management	
  policy	
                                       Resource	
  management	
  policy	
  




                                               Hadoop	
  Summit	
  2012.	
  6/13/12	
  Apache	
  Sqoop	
  
                                                                                                                                            34	
  
                                              Copyright	
  2012	
  The	
  Apache	
  So=ware	
  FoundaAon	
  
Demo	
  Screenshots 	
  	
  




     Hadoop	
  Summit	
  2012.	
  6/13/12	
  Apache	
  Sqoop	
  
                                                                     35	
  
    Copyright	
  2012	
  The	
  Apache	
  So=ware	
  FoundaAon	
  
Demo	
  Screenshots 	
  	
  




     Hadoop	
  Summit	
  2012.	
  6/13/12	
  Apache	
  Sqoop	
  
                                                                     36	
  
    Copyright	
  2012	
  The	
  Apache	
  So=ware	
  FoundaAon	
  
Demo	
  Screenshots 	
  	
  




     Hadoop	
  Summit	
  2012.	
  6/13/12	
  Apache	
  Sqoop	
  
                                                                     37	
  
    Copyright	
  2012	
  The	
  Apache	
  So=ware	
  FoundaAon	
  
Demo	
  Screenshots 	
  	
  




     Hadoop	
  Summit	
  2012.	
  6/13/12	
  Apache	
  Sqoop	
  
                                                                     38	
  
    Copyright	
  2012	
  The	
  Apache	
  So=ware	
  FoundaAon	
  
Demo	
  Screenshots 	
  	
  




     Hadoop	
  Summit	
  2012.	
  6/13/12	
  Apache	
  Sqoop	
  
                                                                     39	
  
    Copyright	
  2012	
  The	
  Apache	
  So=ware	
  FoundaAon	
  
Takeaway	
  	
  
Sqoop	
  2	
  Highights:	
  
    –  Ease	
  of	
  Use:	
  Sqoop	
  as	
  a	
  Service	
  
    –  Ease	
  of	
  Extension:	
  Connectors	
  benefit	
  from	
  
       shared	
  funcAonality	
  
    –  Security:	
  ConnecAons	
  as	
  first-­‐class	
  objects	
  and	
  
       role-­‐based	
  security	
  




                           Hadoop	
  Summit	
  2012.	
  6/13/12	
  Apache	
  Sqoop	
  
                                                                                           40	
  
                          Copyright	
  2012	
  The	
  Apache	
  So=ware	
  FoundaAon	
  
Current	
  Status:	
  work-­‐in-­‐progress	
  
•  Sqoop2	
  Development:	
  
  	
  hQp://issues.apache.org/jira/browse/SQOOP-­‐365	
  

•  Sqoop2	
  Blog	
  Post:	
  
  	
  hQp://blogs.apache.org/sqoop/entry/apache_sqoop_highlights_of_sqoop	
  

•  Sqoop2	
  Design:	
  
  	
  hQp://cwiki.apache.org/confluence/display/SQOOP/Sqoop+2	
  




                               Hadoop	
  Summit	
  2012.	
  6/13/12	
  Apache	
  Sqoop	
  
                                                                                               41	
  
                              Copyright	
  2012	
  The	
  Apache	
  So=ware	
  FoundaAon	
  
Current	
  Status:	
  work-­‐in-­‐progress	
  
•  Sqoop2	
  Quickstart:	
  
  	
  hQp://cwiki.apache.org/confluence/display/SQOOP/Sqoop2+Quickstart	
  

•  Sqoop2	
  Resource	
  Layout:	
  
  	
  hQp://cwiki.apache.org/confluence/display/SQOOP/Sqoop2+-­‐+Resource+Layout	
  

•  Sqoop2	
  Feature	
  Requests:	
  
  	
  hQp://cwiki.apache.org/confluence/display/SQOOP/Sqoop2+Feature+Requests	
  




                            Hadoop	
  Summit	
  2012.	
  6/13/12	
  Apache	
  Sqoop	
  
                                                                                            42	
  
                           Copyright	
  2012	
  The	
  Apache	
  So=ware	
  FoundaAon	
  
Hadoop	
  Summit	
  2012.	
  6/13/12	
  Apache	
  Sqoop	
  
                                                                 43	
  
Copyright	
  2012	
  The	
  Apache	
  So=ware	
  FoundaAon	
  
Reception
Reception takes place in the Community Showcase (Hall 2)




                                                     Page 44
Sqoop	
  &	
  Flume	
  Meetups	
  Tonight	
  

          6:30pm	
  (Sqoop)	
  
          7:45pm	
  (Flume)	
  
                    at	
  
           Hilton	
  San	
  Jose	
  	
  
        (San	
  Carlos	
  Conf	
  Rm)	
  
                Hadoop	
  Summit	
  2012.	
  6/13/12	
  Apache	
  Sqoop	
  
                                                                                45	
  
               Copyright	
  2012	
  The	
  Apache	
  So=ware	
  FoundaAon	
  

More Related Content

What's hot

Apache Sqoop: A Data Transfer Tool for Hadoop
Apache Sqoop: A Data Transfer Tool for HadoopApache Sqoop: A Data Transfer Tool for Hadoop
Apache Sqoop: A Data Transfer Tool for HadoopCloudera, Inc.
 
Habits of Effective Sqoop Users
Habits of Effective Sqoop UsersHabits of Effective Sqoop Users
Habits of Effective Sqoop UsersKathleen Ting
 
Flexible and Real-Time Stream Processing with Apache Flink
Flexible and Real-Time Stream Processing with Apache FlinkFlexible and Real-Time Stream Processing with Apache Flink
Flexible and Real-Time Stream Processing with Apache FlinkDataWorks Summit
 
Big data: Loading your data with flume and sqoop
Big data:  Loading your data with flume and sqoopBig data:  Loading your data with flume and sqoop
Big data: Loading your data with flume and sqoopChristophe Marchal
 
Large-Scale Stream Processing in the Hadoop Ecosystem
Large-Scale Stream Processing in the Hadoop Ecosystem Large-Scale Stream Processing in the Hadoop Ecosystem
Large-Scale Stream Processing in the Hadoop Ecosystem DataWorks Summit/Hadoop Summit
 
A Container-based Sizing Framework for Apache Hadoop/Spark Clusters
A Container-based Sizing Framework for Apache Hadoop/Spark ClustersA Container-based Sizing Framework for Apache Hadoop/Spark Clusters
A Container-based Sizing Framework for Apache Hadoop/Spark ClustersDataWorks Summit/Hadoop Summit
 
January 2015 HUG: Using HBase Co-Processors to Build a Distributed, Transacti...
January 2015 HUG: Using HBase Co-Processors to Build a Distributed, Transacti...January 2015 HUG: Using HBase Co-Processors to Build a Distributed, Transacti...
January 2015 HUG: Using HBase Co-Processors to Build a Distributed, Transacti...Yahoo Developer Network
 
Productionizing Spark and the Spark Job Server
Productionizing Spark and the Spark Job ServerProductionizing Spark and the Spark Job Server
Productionizing Spark and the Spark Job ServerEvan Chan
 
Hortonworks.Cluster Config Guide
Hortonworks.Cluster Config GuideHortonworks.Cluster Config Guide
Hortonworks.Cluster Config GuideDouglas Bernardini
 
Near Real-Time Network Anomaly Detection and Traffic Analysis using Spark bas...
Near Real-Time Network Anomaly Detection and Traffic Analysis using Spark bas...Near Real-Time Network Anomaly Detection and Traffic Analysis using Spark bas...
Near Real-Time Network Anomaly Detection and Traffic Analysis using Spark bas...DataWorks Summit/Hadoop Summit
 
Building large scale transactional data lake using apache hudi
Building large scale transactional data lake using apache hudiBuilding large scale transactional data lake using apache hudi
Building large scale transactional data lake using apache hudiBill Liu
 
Big data components - Introduction to Flume, Pig and Sqoop
Big data components - Introduction to Flume, Pig and SqoopBig data components - Introduction to Flume, Pig and Sqoop
Big data components - Introduction to Flume, Pig and SqoopJeyamariappan Guru
 
Hadoop & cloud storage object store integration in production (final)
Hadoop & cloud storage  object store integration in production (final)Hadoop & cloud storage  object store integration in production (final)
Hadoop & cloud storage object store integration in production (final)Chris Nauroth
 
Low latency high throughput streaming using Apache Apex and Apache Kudu
Low latency high throughput streaming using Apache Apex and Apache KuduLow latency high throughput streaming using Apache Apex and Apache Kudu
Low latency high throughput streaming using Apache Apex and Apache KuduDataWorks Summit
 

What's hot (20)

SQOOP PPT
SQOOP PPTSQOOP PPT
SQOOP PPT
 
Sqoop tutorial
Sqoop tutorialSqoop tutorial
Sqoop tutorial
 
Apache Sqoop: A Data Transfer Tool for Hadoop
Apache Sqoop: A Data Transfer Tool for HadoopApache Sqoop: A Data Transfer Tool for Hadoop
Apache Sqoop: A Data Transfer Tool for Hadoop
 
Apache sqoop
Apache sqoopApache sqoop
Apache sqoop
 
Habits of Effective Sqoop Users
Habits of Effective Sqoop UsersHabits of Effective Sqoop Users
Habits of Effective Sqoop Users
 
SQOOP - RDBMS to Hadoop
SQOOP - RDBMS to HadoopSQOOP - RDBMS to Hadoop
SQOOP - RDBMS to Hadoop
 
Streamline Hadoop DevOps with Apache Ambari
Streamline Hadoop DevOps with Apache AmbariStreamline Hadoop DevOps with Apache Ambari
Streamline Hadoop DevOps with Apache Ambari
 
Flexible and Real-Time Stream Processing with Apache Flink
Flexible and Real-Time Stream Processing with Apache FlinkFlexible and Real-Time Stream Processing with Apache Flink
Flexible and Real-Time Stream Processing with Apache Flink
 
Big data: Loading your data with flume and sqoop
Big data:  Loading your data with flume and sqoopBig data:  Loading your data with flume and sqoop
Big data: Loading your data with flume and sqoop
 
Large-Scale Stream Processing in the Hadoop Ecosystem
Large-Scale Stream Processing in the Hadoop Ecosystem Large-Scale Stream Processing in the Hadoop Ecosystem
Large-Scale Stream Processing in the Hadoop Ecosystem
 
A Container-based Sizing Framework for Apache Hadoop/Spark Clusters
A Container-based Sizing Framework for Apache Hadoop/Spark ClustersA Container-based Sizing Framework for Apache Hadoop/Spark Clusters
A Container-based Sizing Framework for Apache Hadoop/Spark Clusters
 
January 2015 HUG: Using HBase Co-Processors to Build a Distributed, Transacti...
January 2015 HUG: Using HBase Co-Processors to Build a Distributed, Transacti...January 2015 HUG: Using HBase Co-Processors to Build a Distributed, Transacti...
January 2015 HUG: Using HBase Co-Processors to Build a Distributed, Transacti...
 
Productionizing Spark and the Spark Job Server
Productionizing Spark and the Spark Job ServerProductionizing Spark and the Spark Job Server
Productionizing Spark and the Spark Job Server
 
SQL on Hadoop in Taiwan
SQL on Hadoop in TaiwanSQL on Hadoop in Taiwan
SQL on Hadoop in Taiwan
 
Hortonworks.Cluster Config Guide
Hortonworks.Cluster Config GuideHortonworks.Cluster Config Guide
Hortonworks.Cluster Config Guide
 
Near Real-Time Network Anomaly Detection and Traffic Analysis using Spark bas...
Near Real-Time Network Anomaly Detection and Traffic Analysis using Spark bas...Near Real-Time Network Anomaly Detection and Traffic Analysis using Spark bas...
Near Real-Time Network Anomaly Detection and Traffic Analysis using Spark bas...
 
Building large scale transactional data lake using apache hudi
Building large scale transactional data lake using apache hudiBuilding large scale transactional data lake using apache hudi
Building large scale transactional data lake using apache hudi
 
Big data components - Introduction to Flume, Pig and Sqoop
Big data components - Introduction to Flume, Pig and SqoopBig data components - Introduction to Flume, Pig and Sqoop
Big data components - Introduction to Flume, Pig and Sqoop
 
Hadoop & cloud storage object store integration in production (final)
Hadoop & cloud storage  object store integration in production (final)Hadoop & cloud storage  object store integration in production (final)
Hadoop & cloud storage object store integration in production (final)
 
Low latency high throughput streaming using Apache Apex and Apache Kudu
Low latency high throughput streaming using Apache Apex and Apache KuduLow latency high throughput streaming using Apache Apex and Apache Kudu
Low latency high throughput streaming using Apache Apex and Apache Kudu
 

Viewers also liked

Big data processing using Hadoop with Cloudera Quickstart
Big data processing using Hadoop with Cloudera QuickstartBig data processing using Hadoop with Cloudera Quickstart
Big data processing using Hadoop with Cloudera QuickstartIMC Institute
 
สมุดกิจกรรม Code for Kids
สมุดกิจกรรม Code for Kidsสมุดกิจกรรม Code for Kids
สมุดกิจกรรม Code for KidsIMC Institute
 
Apache sqoop with an use case
Apache sqoop with an use caseApache sqoop with an use case
Apache sqoop with an use caseDavin Abraham
 
Big Data Analytics using Mahout
Big Data Analytics using MahoutBig Data Analytics using Mahout
Big Data Analytics using MahoutIMC Institute
 
Introduction to Apache Sqoop
Introduction to Apache SqoopIntroduction to Apache Sqoop
Introduction to Apache SqoopAvkash Chauhan
 
Thai Software & Software Market Survey 2015
Thai Software & Software Market Survey 2015Thai Software & Software Market Survey 2015
Thai Software & Software Market Survey 2015IMC Institute
 
Mobile User and App Analytics in China
Mobile User and App Analytics in ChinaMobile User and App Analytics in China
Mobile User and App Analytics in ChinaIMC Institute
 
Install Apache Hadoop for Development/Production
Install Apache Hadoop for  Development/ProductionInstall Apache Hadoop for  Development/Production
Install Apache Hadoop for Development/ProductionIMC Institute
 
Machine Learning using Apache Spark MLlib
Machine Learning using Apache Spark MLlibMachine Learning using Apache Spark MLlib
Machine Learning using Apache Spark MLlibIMC Institute
 
Kanban boards step by step
Kanban boards step by stepKanban boards step by step
Kanban boards step by stepGiulio Roggero
 

Viewers also liked (13)

Big data processing using Hadoop with Cloudera Quickstart
Big data processing using Hadoop with Cloudera QuickstartBig data processing using Hadoop with Cloudera Quickstart
Big data processing using Hadoop with Cloudera Quickstart
 
สมุดกิจกรรม Code for Kids
สมุดกิจกรรม Code for Kidsสมุดกิจกรรม Code for Kids
สมุดกิจกรรม Code for Kids
 
Apache sqoop with an use case
Apache sqoop with an use caseApache sqoop with an use case
Apache sqoop with an use case
 
Big Data Analytics using Mahout
Big Data Analytics using MahoutBig Data Analytics using Mahout
Big Data Analytics using Mahout
 
Introduction to Apache Sqoop
Introduction to Apache SqoopIntroduction to Apache Sqoop
Introduction to Apache Sqoop
 
Thai Software & Software Market Survey 2015
Thai Software & Software Market Survey 2015Thai Software & Software Market Survey 2015
Thai Software & Software Market Survey 2015
 
ITSS Overview
ITSS OverviewITSS Overview
ITSS Overview
 
Mobile User and App Analytics in China
Mobile User and App Analytics in ChinaMobile User and App Analytics in China
Mobile User and App Analytics in China
 
Advanced Sqoop
Advanced Sqoop Advanced Sqoop
Advanced Sqoop
 
Install Apache Hadoop for Development/Production
Install Apache Hadoop for  Development/ProductionInstall Apache Hadoop for  Development/Production
Install Apache Hadoop for Development/Production
 
Machine Learning using Apache Spark MLlib
Machine Learning using Apache Spark MLlibMachine Learning using Apache Spark MLlib
Machine Learning using Apache Spark MLlib
 
Kanban boards step by step
Kanban boards step by stepKanban boards step by step
Kanban boards step by step
 
Flume vs. kafka
Flume vs. kafkaFlume vs. kafka
Flume vs. kafka
 

Similar to New Data Transfer Tools for Hadoop: Sqoop 2

Data Science Day New York: The Platform for Big Data
Data Science Day New York: The Platform for Big DataData Science Day New York: The Platform for Big Data
Data Science Day New York: The Platform for Big DataCloudera, Inc.
 
Introduction to the Hadoop EcoSystem
Introduction to the Hadoop EcoSystemIntroduction to the Hadoop EcoSystem
Introduction to the Hadoop EcoSystemShivaji Dutta
 
Running Hadoop as Service in AltiScale Platform
Running Hadoop as Service in AltiScale PlatformRunning Hadoop as Service in AltiScale Platform
Running Hadoop as Service in AltiScale PlatformInMobi Technology
 
Impala: Real-time Queries in Hadoop
Impala: Real-time Queries in HadoopImpala: Real-time Queries in Hadoop
Impala: Real-time Queries in HadoopCloudera, Inc.
 
Apache Hadoop Now Next and Beyond
Apache Hadoop Now Next and BeyondApache Hadoop Now Next and Beyond
Apache Hadoop Now Next and BeyondDataWorks Summit
 
Discover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFS
Discover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFSDiscover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFS
Discover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFSHortonworks
 
Webinar: The Future of Hadoop
Webinar: The Future of HadoopWebinar: The Future of Hadoop
Webinar: The Future of HadoopCloudera, Inc.
 
Hadoop - Looking to the Future By Arun Murthy
Hadoop - Looking to the Future By Arun MurthyHadoop - Looking to the Future By Arun Murthy
Hadoop - Looking to the Future By Arun Murthyhuguk
 
Hadoop in the Enterprise - Dr. Amr Awadallah @ Microstrategy World 2011
Hadoop in the Enterprise - Dr. Amr Awadallah @ Microstrategy World 2011Hadoop in the Enterprise - Dr. Amr Awadallah @ Microstrategy World 2011
Hadoop in the Enterprise - Dr. Amr Awadallah @ Microstrategy World 2011Cloudera, Inc.
 
Hadoop: today and tomorrow
Hadoop: today and tomorrowHadoop: today and tomorrow
Hadoop: today and tomorrowSteve Loughran
 
Improving MySQL performance with Hadoop
Improving MySQL performance with HadoopImproving MySQL performance with Hadoop
Improving MySQL performance with HadoopSagar Jauhari
 
The power of hadoop in cloud computing
The power of hadoop in cloud computingThe power of hadoop in cloud computing
The power of hadoop in cloud computingJoey Echeverria
 
Big Data Hoopla Simplified - TDWI Memphis 2014
Big Data Hoopla Simplified - TDWI Memphis 2014Big Data Hoopla Simplified - TDWI Memphis 2014
Big Data Hoopla Simplified - TDWI Memphis 2014Rajan Kanitkar
 
Cloudera Impala - San Diego Big Data Meetup August 13th 2014
Cloudera Impala - San Diego Big Data Meetup August 13th 2014Cloudera Impala - San Diego Big Data Meetup August 13th 2014
Cloudera Impala - San Diego Big Data Meetup August 13th 2014cdmaxime
 
Houston Hadoop Meetup Presentation by Vikram Oberoi of Cloudera
Houston Hadoop Meetup Presentation by Vikram Oberoi of ClouderaHouston Hadoop Meetup Presentation by Vikram Oberoi of Cloudera
Houston Hadoop Meetup Presentation by Vikram Oberoi of ClouderaMark Kerzner
 
Introduction To Hadoop Administration - SpringPeople
Introduction To Hadoop Administration - SpringPeopleIntroduction To Hadoop Administration - SpringPeople
Introduction To Hadoop Administration - SpringPeopleSpringPeople
 
Hortonworks tech workshop in-memory processing with spark
Hortonworks tech workshop   in-memory processing with sparkHortonworks tech workshop   in-memory processing with spark
Hortonworks tech workshop in-memory processing with sparkHortonworks
 
Cloudera Manager Webinar | Cloudera Enterprise 3.7
Cloudera Manager Webinar | Cloudera Enterprise 3.7Cloudera Manager Webinar | Cloudera Enterprise 3.7
Cloudera Manager Webinar | Cloudera Enterprise 3.7Cloudera, Inc.
 
Distributed Stream Processing on Fluentd / #fluentd
Distributed Stream Processing on Fluentd / #fluentdDistributed Stream Processing on Fluentd / #fluentd
Distributed Stream Processing on Fluentd / #fluentdSATOSHI TAGOMORI
 
Dallas TDWI Meeting Dec. 2012: Hadoop
Dallas TDWI Meeting Dec. 2012: HadoopDallas TDWI Meeting Dec. 2012: Hadoop
Dallas TDWI Meeting Dec. 2012: Hadooplamont_lockwood
 

Similar to New Data Transfer Tools for Hadoop: Sqoop 2 (20)

Data Science Day New York: The Platform for Big Data
Data Science Day New York: The Platform for Big DataData Science Day New York: The Platform for Big Data
Data Science Day New York: The Platform for Big Data
 
Introduction to the Hadoop EcoSystem
Introduction to the Hadoop EcoSystemIntroduction to the Hadoop EcoSystem
Introduction to the Hadoop EcoSystem
 
Running Hadoop as Service in AltiScale Platform
Running Hadoop as Service in AltiScale PlatformRunning Hadoop as Service in AltiScale Platform
Running Hadoop as Service in AltiScale Platform
 
Impala: Real-time Queries in Hadoop
Impala: Real-time Queries in HadoopImpala: Real-time Queries in Hadoop
Impala: Real-time Queries in Hadoop
 
Apache Hadoop Now Next and Beyond
Apache Hadoop Now Next and BeyondApache Hadoop Now Next and Beyond
Apache Hadoop Now Next and Beyond
 
Discover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFS
Discover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFSDiscover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFS
Discover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFS
 
Webinar: The Future of Hadoop
Webinar: The Future of HadoopWebinar: The Future of Hadoop
Webinar: The Future of Hadoop
 
Hadoop - Looking to the Future By Arun Murthy
Hadoop - Looking to the Future By Arun MurthyHadoop - Looking to the Future By Arun Murthy
Hadoop - Looking to the Future By Arun Murthy
 
Hadoop in the Enterprise - Dr. Amr Awadallah @ Microstrategy World 2011
Hadoop in the Enterprise - Dr. Amr Awadallah @ Microstrategy World 2011Hadoop in the Enterprise - Dr. Amr Awadallah @ Microstrategy World 2011
Hadoop in the Enterprise - Dr. Amr Awadallah @ Microstrategy World 2011
 
Hadoop: today and tomorrow
Hadoop: today and tomorrowHadoop: today and tomorrow
Hadoop: today and tomorrow
 
Improving MySQL performance with Hadoop
Improving MySQL performance with HadoopImproving MySQL performance with Hadoop
Improving MySQL performance with Hadoop
 
The power of hadoop in cloud computing
The power of hadoop in cloud computingThe power of hadoop in cloud computing
The power of hadoop in cloud computing
 
Big Data Hoopla Simplified - TDWI Memphis 2014
Big Data Hoopla Simplified - TDWI Memphis 2014Big Data Hoopla Simplified - TDWI Memphis 2014
Big Data Hoopla Simplified - TDWI Memphis 2014
 
Cloudera Impala - San Diego Big Data Meetup August 13th 2014
Cloudera Impala - San Diego Big Data Meetup August 13th 2014Cloudera Impala - San Diego Big Data Meetup August 13th 2014
Cloudera Impala - San Diego Big Data Meetup August 13th 2014
 
Houston Hadoop Meetup Presentation by Vikram Oberoi of Cloudera
Houston Hadoop Meetup Presentation by Vikram Oberoi of ClouderaHouston Hadoop Meetup Presentation by Vikram Oberoi of Cloudera
Houston Hadoop Meetup Presentation by Vikram Oberoi of Cloudera
 
Introduction To Hadoop Administration - SpringPeople
Introduction To Hadoop Administration - SpringPeopleIntroduction To Hadoop Administration - SpringPeople
Introduction To Hadoop Administration - SpringPeople
 
Hortonworks tech workshop in-memory processing with spark
Hortonworks tech workshop   in-memory processing with sparkHortonworks tech workshop   in-memory processing with spark
Hortonworks tech workshop in-memory processing with spark
 
Cloudera Manager Webinar | Cloudera Enterprise 3.7
Cloudera Manager Webinar | Cloudera Enterprise 3.7Cloudera Manager Webinar | Cloudera Enterprise 3.7
Cloudera Manager Webinar | Cloudera Enterprise 3.7
 
Distributed Stream Processing on Fluentd / #fluentd
Distributed Stream Processing on Fluentd / #fluentdDistributed Stream Processing on Fluentd / #fluentd
Distributed Stream Processing on Fluentd / #fluentd
 
Dallas TDWI Meeting Dec. 2012: Hadoop
Dallas TDWI Meeting Dec. 2012: HadoopDallas TDWI Meeting Dec. 2012: Hadoop
Dallas TDWI Meeting Dec. 2012: Hadoop
 

More from DataWorks Summit

Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisDataWorks Summit
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiDataWorks Summit
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...DataWorks Summit
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...DataWorks Summit
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal SystemDataWorks Summit
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExampleDataWorks Summit
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberDataWorks Summit
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixDataWorks Summit
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiDataWorks Summit
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsDataWorks Summit
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureDataWorks Summit
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EngineDataWorks Summit
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...DataWorks Summit
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudDataWorks Summit
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiDataWorks Summit
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerDataWorks Summit
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...DataWorks Summit
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouDataWorks Summit
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkDataWorks Summit
 

More from DataWorks Summit (20)

Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
 
Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache Ratis
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal System
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist Example
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability Improvements
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
 

Recently uploaded

A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersNicole Novielli
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embeddingZilliz
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
What is Artificial Intelligence?????????
What is Artificial Intelligence?????????What is Artificial Intelligence?????????
What is Artificial Intelligence?????????blackmambaettijean
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESmohitsingh558521
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 

Recently uploaded (20)

A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
What is Artificial Intelligence?????????
What is Artificial Intelligence?????????What is Artificial Intelligence?????????
What is Artificial Intelligence?????????
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 

New Data Transfer Tools for Hadoop: Sqoop 2

  • 1. A  New  GeneraAon  of  Data  Transfer     Tools  for  Hadoop:  Sqoop  2   Bilung  Lee  (blee  at  cloudera  dot  com)   Kathleen  Ting  (kathleen  at  cloudera  dot  com)     Hadoop  Summit  2012.  6/13/12  Apache  Sqoop   Copyright  2012  The  Apache  So=ware  FoundaAon  
  • 2. Who  Are  We?   •  Bilung  Lee   –  Apache  Sqoop  CommiQer   –  So=ware  Engineer,  Cloudera     •  Kathleen  Ting   –  Apache  Sqoop  CommiQer   –  Support  Manager,  Cloudera   Hadoop  Summit  2012.  6/13/12  Apache  Sqoop   2   Copyright  2012  The  Apache  So=ware  FoundaAon  
  • 3. What  is  Sqoop?   •  Bulk  data  transfer  tool   –  Import/Export  from/to  relaAonal  databases,   enterprise  data  warehouses,  and  NoSQL  systems   –  Populate  tables  in  HDFS,  Hive,  and  HBase   –  Integrate  with  Oozie  as  an  acAon   –  Support  plugins  via  connector  based  architecture   May ‘09 March ‘10 August ‘11 April ‘12 First version Moved to Moved to Apache (HADOOP-5815) GitHub Apache Top Level Project Hadoop  Summit  2012.  6/13/12  Apache  Sqoop   3   Copyright  2012  The  Apache  So=ware  FoundaAon  
  • 4. Sqoop  1  Architecture   Document Enterprise Based Data Systems Warehouse Relational Database command Hadoop Map Task Sqoop HDFS/HBase/ Hive Hadoop  Summit  2012.  6/13/12  Apache  Sqoop   4   Copyright  2012  The  Apache  So=ware  FoundaAon  
  • 5. Sqoop  1  Challenges   •  CrypAc,  contextual  command  line  arguments   •  Tight  coupling  between  data  transfer  and   output  format     •  Security  concerns  with  openly  shared   credenAals   •  Not  easy  to  manage  installaAon/configuraAon   •  Connectors  are  forced  to  follow  JDBC  model   Hadoop  Summit  2012.  6/13/12  Apache  Sqoop   5   Copyright  2012  The  Apache  So=ware  FoundaAon  
  • 6. Sqoop  2  Architecture   Hadoop  Summit  2012.  6/13/12  Apache  Sqoop   6   Copyright  2012  The  Apache  So=ware  FoundaAon  
  • 7. Sqoop  2  Themes   •  Ease  of  Use   •  Ease  of  Extension   •  Security   Hadoop  Summit  2012.  6/13/12  Apache  Sqoop   7   Copyright  2012  The  Apache  So=ware  FoundaAon  
  • 8. Sqoop  2  Themes   •  Ease  of  Use   •  Ease  of  Extension   •  Security   Hadoop  Summit  2012.  6/13/12  Apache  Sqoop   8   Copyright  2012  The  Apache  So=ware  FoundaAon  
  • 9. Ease  of  Use   Sqoop  1   Sqoop  2   Client-­‐only  Architecture   Client/Server  Architecture   CLI  based   CLI  +  Web  based   Client  access  to  Hive,  HBase   Server  access  to  Hive,  HBase   Oozie  and  Sqoop  Aghtly  coupled   Oozie  finds  REST  API   Hadoop  Summit  2012.  6/13/12  Apache  Sqoop   9   Copyright  2012  The  Apache  So=ware  FoundaAon  
  • 10. Sqoop  1:  Client-­‐side  Tool   •  Client-­‐side  installaAon  +  configuraAon   –  Connectors  are  installed/configured  locally   –  Local  requires  root  privileges   –  JDBC  drivers  are  needed  locally     –  Database  connecAvity  is  needed  locally   Hadoop  Summit  2012.  6/13/12  Apache  Sqoop   10   Copyright  2012  The  Apache  So=ware  FoundaAon  
  • 11. Sqoop  2:  Sqoop  as  a  Service   •  Server-­‐side  installaAon  +  configuraAon   –  Connectors  are  installed/configured  in  one  place   –  Managed  by  administrator  and  run  by  operator   –  JDBC  drivers  are  needed  in  one  place   –  Database  connecAvity  is  needed  on  the  server   Hadoop  Summit  2012.  6/13/12  Apache  Sqoop   11   Copyright  2012  The  Apache  So=ware  FoundaAon  
  • 12. Client  Interface   •  Sqoop  1  client  interface:   –  Command  line  interface  (CLI)  based   –  Can  be  automated  via  scripAng   •  Sqoop  2  client  interface:   –  CLI  based  (in  either  interacAve  or  script  mode)   –  Web  based  (remotely  accessible)   –  REST  API  is  exposed  for  external  tool  integraAon   Hadoop  Summit  2012.  6/13/12  Apache  Sqoop   12   Copyright  2012  The  Apache  So=ware  FoundaAon  
  • 13. Sqoop  1:  Service  Level  IntegraAon   •  Hive,  HBase   –  Require  local  installaAon   •  Oozie   –  von  Neumann(esque)  integraAon:     •  Package  Sqoop  as  an  acAon   •  Then  run  Sqoop  from  node  machines,  causing  one  MR   job  to  be  dependent  on  another  MR  job   •  Error-­‐prone,  difficult  to  debug   Hadoop  Summit  2012.  6/13/12  Apache  Sqoop   13   Copyright  2012  The  Apache  So=ware  FoundaAon  
  • 14. Sqoop  2:  Service  Level  IntegraAon   •  Hive,  HBase   –  Server-­‐side  integraAon   •  Oozie   –  REST  API  integraAon   Hadoop  Summit  2012.  6/13/12  Apache  Sqoop   14   Copyright  2012  The  Apache  So=ware  FoundaAon  
  • 15. Ease  of  Use   Sqoop  1   Sqoop  2   Client-­‐only  Architecture   Client/Server  Architecture   CLI  based   CLI  +  Web  based   Client  access  to  Hive,  HBase   Server  access  to  Hive,  HBase   Oozie  and  Sqoop  Aghtly  coupled   Oozie  finds  REST  API   Hadoop  Summit  2012.  6/13/12  Apache  Sqoop   15   Copyright  2012  The  Apache  So=ware  FoundaAon  
  • 16. Sqoop  2  Themes   •  Ease  of  Use   •  Ease  of  Extension   •  Security   Hadoop  Summit  2012.  6/13/12  Apache  Sqoop   16   Copyright  2012  The  Apache  So=ware  FoundaAon  
  • 17. Ease  of  Extension   Sqoop  1   Sqoop  2   Connector  forced  to  follow  JDBC  model   Connector  given  free  rein   Connectors  must  implement  funcAonality   Connectors  benefit  from  common   framework  of  funcAonality   Connector  selecAon  is  implicit   Connector  selecAon  is  explicit     Hadoop  Summit  2012.  6/13/12  Apache  Sqoop   17   Copyright  2012  The  Apache  So=ware  FoundaAon  
  • 18. Sqoop  1:  ImplemenAng  Connectors   •  Connectors  are  forced  to  follow  JDBC  model   –  Connectors  are  limited/required  to  use  common   JDBC  vocabulary  (URL,  database,  table,  etc)   •  Connectors  must  implement  all  Sqoop   funcAonality  they  want  to  support   –  New  funcAonality  may  not  be  available  for   previously  implemented  connectors   Hadoop  Summit  2012.  6/13/12  Apache  Sqoop   18   Copyright  2012  The  Apache  So=ware  FoundaAon  
  • 19. Sqoop  2:  ImplemenAng  Connectors   •  Connectors  are  not  restricted  to  JDBC  model   –  Connectors  can  define  own  domain   •  Common  funcAonality  are  abstracted  out  of   connectors   –  Connectors  are  only  responsible  for  data  transfer   –  Common  Reduce  phase  implements  data   transformaAon  and  system  integraAon   –  Connectors  can  benefit  from  future  development   of  common  funcAonality   Hadoop  Summit  2012.  6/13/12  Apache  Sqoop   19   Copyright  2012  The  Apache  So=ware  FoundaAon  
  • 20. Different  OpAons,  Different  Results   Which  is  running  MySQL?   $ sqoop import --connect jdbc:mysql://localhost/db --username foo --table TEST $ sqoop import --connect jdbc:mysql://localhost/db --driver com.mysql.jdbc.Driver --username foo --table TEST   •  Different  opAons  may  lead  to  unpredictable   results   –  Sqoop  2  requires  explicit  selecAon  of  a  connector,   thus  disambiguaAng  the  process   Hadoop  Summit  2012.  6/13/12  Apache  Sqoop   20   Copyright  2012  The  Apache  So=ware  FoundaAon  
  • 21. Sqoop  1:  Using  Connectors   •  Choice  of  connector  is  implicit   –  In  a  simple  case,  based  on  the  URL  in  -­‐-­‐connect   string  to  access  the  database   –  SpecificaAon  of  different  opAons  can  lead  to   different  connector  selecAon   –  Error-­‐prone  but  good  for  power  users   Hadoop  Summit  2012.  6/13/12  Apache  Sqoop   21   Copyright  2012  The  Apache  So=ware  FoundaAon  
  • 22. Sqoop  1:  Using  Connectors   •  Require  knowledge  of  database  idiosyncrasies     –  e.g.  Couchbase  does  not  need  to  specify  a  table   name,  which  is  required,  causing  -­‐-­‐table  to  get   overloaded  as  backfill  or  dump  operaAon   –  e.g.  -­‐-­‐null-­‐string  representaAon  is  not  supported   by  all  connectors   •  FuncAonality  is  limited  to  what  the  implicitly   chosen  connector  supports   Hadoop  Summit  2012.  6/13/12  Apache  Sqoop   22   Copyright  2012  The  Apache  So=ware  FoundaAon  
  • 23. Sqoop  2:  Using  Connectors   •  Users  make  explicit  connector  choice   –  Less  error-­‐prone,  more  predictable   •  Users  need  not  be  aware  of  the  funcAonality   of  all  connectors   –  Couchbase  users  need  not  care  that  other   connectors  use  tables   Hadoop  Summit  2012.  6/13/12  Apache  Sqoop   23   Copyright  2012  The  Apache  So=ware  FoundaAon  
  • 24. Sqoop  2:  Using  Connectors   •  Common  funcAonality  is  available  to  all   connectors   –  Connectors  need  not  worry  about  common   downstream  funcAonality,  such  as  transformaAon   into  various  formats  and  integraAon  with  other   systems   Hadoop  Summit  2012.  6/13/12  Apache  Sqoop   24   Copyright  2012  The  Apache  So=ware  FoundaAon  
  • 25. Ease  of  Extension   Sqoop  1   Sqoop  2   Connector  forced  to  follow  JDBC  model   Connector  given  free  rein   Connectors  must  implement  funcAonality   Connectors  benefit  from  common   framework  of  funcAonality   Connector  selecAon  is  implicit   Connector  selecAon  is  explicit     Hadoop  Summit  2012.  6/13/12  Apache  Sqoop   25   Copyright  2012  The  Apache  So=ware  FoundaAon  
  • 26. Sqoop  2  Themes   •  Ease  of  Use   •  Ease  of  Extension   •  Security   Hadoop  Summit  2012.  6/13/12  Apache  Sqoop   26   Copyright  2012  The  Apache  So=ware  FoundaAon  
  • 27. Security   Sqoop  1   Sqoop  2   Support  only  for  Hadoop  security   Support  for  Hadoop  security  and  role-­‐ based  access  control  to  external  systems   High  risk  of  abusing  access  to  external   Reduced  risk  of  abusing  access  to  external   systems   systems   No  resource  management  policy   Resource  management  policy   Hadoop  Summit  2012.  6/13/12  Apache  Sqoop   27   Copyright  2012  The  Apache  So=ware  FoundaAon  
  • 28. Sqoop  1:  Security   •  Inherit/Propagate  Kerberos  principal  for  the   jobs  it  launches   •  Access  to  files  on  HDFS  can  be  controlled  via   HDFS  security   •  Limited  support  (user/password)  for  secure   access  to  external  systems   Hadoop  Summit  2012.  6/13/12  Apache  Sqoop   28   Copyright  2012  The  Apache  So=ware  FoundaAon  
  • 29. Sqoop  2:  Security   •  Inherit/Propagate  Kerberos  principal  for  the   jobs  it  launches   •  Access  to  files  on  HDFS  can  be  controlled  via   HDFS  security   •  Support  for  secure  access  to  external  systems   via  role-­‐based  access  to  connecAon  objects   –  Administrators  create/edit/delete  connecAons   –  Operators  use  connecAons   Hadoop  Summit  2012.  6/13/12  Apache  Sqoop   29   Copyright  2012  The  Apache  So=ware  FoundaAon  
  • 30. Sqoop  1:  External  System  Access   •  Every  invocaAon  requires  necessary   credenAals  to  access  external  systems  (e.g.   relaAonal  database)   –  Workaround:  create  a  user  with  limited  access  in   lieu  of  giving  out  password   •  Does  not  scale   •  Permission  granularity  is  hard  to  obtain   •  Hard  to  prevent  misuse  once  credenAals  are   given   Hadoop  Summit  2012.  6/13/12  Apache  Sqoop   30   Copyright  2012  The  Apache  So=ware  FoundaAon  
  • 31. Sqoop  2:  External  System  Access   •  ConnecAons  are  enabled  as  first-­‐class  objects   –  ConnecAons  encompass  credenAals   –  ConnecAons  are  created  once  and  then  used   many  Ames  for  various  import/export  jobs   –  ConnecAons  are  created  by  administrator  and   used  by  operator   •  Safeguard  credenAal  access  from  end  users   •  ConnecAons  can  be  restricted  in  scope  based   on  operaAon  (import/export)   –  Operators  cannot  abuse  credenAals   Hadoop  Summit  2012.  6/13/12  Apache  Sqoop   31   Copyright  2012  The  Apache  So=ware  FoundaAon  
  • 32. Sqoop  1:  Resource  Management   •  No  explicit  resource  management  policy   –  Users  specify  the  number  of  map  jobs  to  run   –  Cannot  throQle  load  on  external  systems   Hadoop  Summit  2012.  6/13/12  Apache  Sqoop   32   Copyright  2012  The  Apache  So=ware  FoundaAon  
  • 33. Sqoop  2:  Resource  Management   •  ConnecAons  allow  specificaAon  of  resource   management  policy     –  Administrators  can  limit  the  total  number  of   physical  connecAons  open  at  one  Ame   –  ConnecAons  can  also  be  disabled   Hadoop  Summit  2012.  6/13/12  Apache  Sqoop   33   Copyright  2012  The  Apache  So=ware  FoundaAon  
  • 34. Security   Sqoop  1   Sqoop  2   Support  only  for  Hadoop  security   Support  for  Hadoop  security  and  role-­‐ based  access  control  to  external  systems   High  risk  of  abusing  access  to  external   Reduced  risk  of  abusing  access  to  external   systems   systems   No  resource  management  policy   Resource  management  policy   Hadoop  Summit  2012.  6/13/12  Apache  Sqoop   34   Copyright  2012  The  Apache  So=ware  FoundaAon  
  • 35. Demo  Screenshots     Hadoop  Summit  2012.  6/13/12  Apache  Sqoop   35   Copyright  2012  The  Apache  So=ware  FoundaAon  
  • 36. Demo  Screenshots     Hadoop  Summit  2012.  6/13/12  Apache  Sqoop   36   Copyright  2012  The  Apache  So=ware  FoundaAon  
  • 37. Demo  Screenshots     Hadoop  Summit  2012.  6/13/12  Apache  Sqoop   37   Copyright  2012  The  Apache  So=ware  FoundaAon  
  • 38. Demo  Screenshots     Hadoop  Summit  2012.  6/13/12  Apache  Sqoop   38   Copyright  2012  The  Apache  So=ware  FoundaAon  
  • 39. Demo  Screenshots     Hadoop  Summit  2012.  6/13/12  Apache  Sqoop   39   Copyright  2012  The  Apache  So=ware  FoundaAon  
  • 40. Takeaway     Sqoop  2  Highights:   –  Ease  of  Use:  Sqoop  as  a  Service   –  Ease  of  Extension:  Connectors  benefit  from   shared  funcAonality   –  Security:  ConnecAons  as  first-­‐class  objects  and   role-­‐based  security   Hadoop  Summit  2012.  6/13/12  Apache  Sqoop   40   Copyright  2012  The  Apache  So=ware  FoundaAon  
  • 41. Current  Status:  work-­‐in-­‐progress   •  Sqoop2  Development:    hQp://issues.apache.org/jira/browse/SQOOP-­‐365   •  Sqoop2  Blog  Post:    hQp://blogs.apache.org/sqoop/entry/apache_sqoop_highlights_of_sqoop   •  Sqoop2  Design:    hQp://cwiki.apache.org/confluence/display/SQOOP/Sqoop+2   Hadoop  Summit  2012.  6/13/12  Apache  Sqoop   41   Copyright  2012  The  Apache  So=ware  FoundaAon  
  • 42. Current  Status:  work-­‐in-­‐progress   •  Sqoop2  Quickstart:    hQp://cwiki.apache.org/confluence/display/SQOOP/Sqoop2+Quickstart   •  Sqoop2  Resource  Layout:    hQp://cwiki.apache.org/confluence/display/SQOOP/Sqoop2+-­‐+Resource+Layout   •  Sqoop2  Feature  Requests:    hQp://cwiki.apache.org/confluence/display/SQOOP/Sqoop2+Feature+Requests   Hadoop  Summit  2012.  6/13/12  Apache  Sqoop   42   Copyright  2012  The  Apache  So=ware  FoundaAon  
  • 43. Hadoop  Summit  2012.  6/13/12  Apache  Sqoop   43   Copyright  2012  The  Apache  So=ware  FoundaAon  
  • 44. Reception Reception takes place in the Community Showcase (Hall 2) Page 44
  • 45. Sqoop  &  Flume  Meetups  Tonight   6:30pm  (Sqoop)   7:45pm  (Flume)   at   Hilton  San  Jose     (San  Carlos  Conf  Rm)   Hadoop  Summit  2012.  6/13/12  Apache  Sqoop   45   Copyright  2012  The  Apache  So=ware  FoundaAon