Cloud Architecture Tutorial
Platform Component Architecture
Part 2 of 3

QCon London, March 5th, 2012
Adrian Cockcroft
@adrianco #netflixcloud
http://www.linkedin.com/in/adriancockcroft
  
Don’t Do That!

A Discussion of Anti-Architecture
(written as an Ignite talk)
  
Architecture
Patterns to guide detailed design and construction

Anti-Architecture
Constraints that limit detailed design and construction
  
Misplaced Enthusiasm

How could that happen?

Anatomy of a Failure
  
What I Wanted
•  Moving to Cassandra as primary data store
•  We need backups!
•  We are running on AWS…

I want Cassandra backups to S3
Start with full backup, incremental later
Restore to a different Cassandra cluster
  
Additional Goals
I would like it next week - Keep it simple
No single point of failure
Get once a day full backup working first
  
Prototype
•    Created S3 bucket
•    Carefully figured out a good S3 path hierarchy
•    Wrote a simple backup script
•    Added it to cron
•    ….
•    Profit!

(total time half a day)
  
Now comes the hard part!
Restore is trickier, Cassandra is written in Java, programmer from another team takes over…

Here’s the S3 bucket, backups are being collected already, please figure out how to restore it. Done by next week perhaps?
  
Days Pass…
•    Programmer is re-writing backup in Python
•    Installs Python 2.7 on CentOS, breaks yum
•    Backup remotely invoked from a central point
•    Cassandra patched to do incremental backups
  
Weeks Pass…
•    Python based full backup & restore works!
•    But only to the Cassandra cluster it came from
•    Incremental backup works!
•    Restore not done yet…
  
Cassandra in Production
We do have backups running now, right?
      We’ll get right on it…
I want the production backup restored in test.
      Oh, didn’t implement that feature yet…
  
Whoops!
Production data trashed while setting up backup
Luckily – it was recoverable from elsewhere
  
Months Pass
•    Python prototype re-written in Java (Priam)
•    Integrated with other management functions
•    Decentralized backups again (yay!)
•    Reliable backups
•    Restore to test
•    Not simple
•    Took too long…
  
Anti-Architecture
•    Define the things you don’t want
•    Constrain the outcome
•    Check that the constraints are being met
•    …
•    Profit!
  
Anti-Architecture Success
http://techblog.netflix.com/2011/04/lessons-netflix-learned-from-aws-outage.html
  

	
  
Anti-Architecture
Define the space the thing will inhabit

(All pictures in this section were found on Google Images)
  
Cloud Architecture Patterns
Where do we start?
  
Goals
•  Faster
     –  Lower latency than the equivalent datacenter web pages and API calls
     –  Measured as mean and 99th percentile
     –  For both first hit (e.g. home page) and in-session hits for the same user
•  Scalable
     –  Avoid needing any more datacenter capacity as subscriber count increases
     –  No central vertically scaled databases
     –  Leverage AWS elastic capacity effectively
•  Available
     –  Substantially higher robustness and availability than datacenter services
     –  Leverage multiple AWS availability zones
     –  No scheduled down time, no central database schema to change
•  Productive
     –  Optimize agility of a large development team with automation and tools
     –  Leave behind complex tangled datacenter code base (~8 year old architecture)
     –  Enforce clean layered interfaces and re-usable components
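
The "mean and 99th percentile" latency measure named in the goals could be computed along these lines. This is only a sketch: the class name LatencyStats and the nearest-rank percentile definition are assumptions made for illustration, not something the slides specify.

```java
import java.util.Arrays;

// Hypothetical helper: summarizes latency samples (in milliseconds)
// as a mean and a nearest-rank percentile.
final class LatencyStats {
    static double mean(long[] samplesMs) {
        if (samplesMs.length == 0) throw new IllegalArgumentException("no samples");
        double sum = 0;
        for (long s : samplesMs) sum += s;
        return sum / samplesMs.length;
    }

    // Nearest-rank percentile: the smallest sample such that at least
    // p percent of all samples are less than or equal to it.
    static long percentile(long[] samplesMs, int p) {
        if (samplesMs.length == 0) throw new IllegalArgumentException("no samples");
        if (p < 1 || p > 100) throw new IllegalArgumentException("p must be 1..100");
        long[] sorted = samplesMs.clone();   // leave the caller's array untouched
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length / 100.0);
        return sorted[rank - 1];
    }
}
```

Reporting both numbers matters because a good mean can hide a slow tail; LatencyStats.percentile(samples, 99) exposes the requests that the slowest 1% of users see.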
  
Datacenter Anti-Patterns
What do we currently do in the datacenter that prevents us from meeting our goals?
  
                       	
  
Architecture
•  Software Architecture
   –  The abstractions and interfaces that developers build against
•  Systems Architecture
   –  The service instances that define availability, scalability
•  Compose-ability
   –  software architecture that is independent of the systems architecture
   –  decoupled flexible building block components
  	
  
Rewrite from Scratch
Not everything is cloud specific
Pay down technical debt
Robust patterns
  
Netflix Datacenter vs. Cloud Arch
Datacenter                     Cloud
Central SQL Database           Distributed Key/Value NoSQL
Sticky In-Memory Session       Shared Memcached Session
Chatty Protocols               Latency Tolerant Protocols
Tangled Service Interfaces     Layered Service Interfaces
Instrumented Code              Instrumented Service Patterns
Fat Complex Objects            Lightweight Serializable Objects
Components as Jar Files        Components as Services
  
The Central SQL Database
•  Datacenter has a central database
   –  Everything in one place is convenient until it fails
   –  Customers, movies, history, configuration
•  Schema changes require downtime

Anti-pattern impacts scalability, availability
  
The Distributed Key-Value Store
•  Cloud has many key-value data stores
    –  More complex to keep track of, do backups etc.
    –  Each store is much simpler to administer (slide graphic: DBA)
    –  Joins take place in Java code
    –  No schema to change, no scheduled downtime

•  Mean Latency for Simple Key Lookup Queries
    –  Memcached is dominated by network latency <1ms
    –  Cassandra around one millisecond
    –  Oracle for simple queries is a few milliseconds
    –  DynamoDB around 5ms
    –  SimpleDB replication and REST overheads >10ms
  
The Sticky Session
•  Datacenter Sticky Load Balancing
   –  Efficient caching for low latency
   –  Tricky session handling code
•  Encourages concentrated functionality
   –  one service that does everything
   –  Middle tier load balancer had issues in practice

Anti-pattern impacts productivity, availability
  
Shared Session State
•  Elastic Load Balancer
    –  We don’t use the cookie based routing option
    –  External “session caching” with memcached

•  More flexible fine grain services
    –  Any instance can serve any request
    –  Works better with auto-scaled instance counts
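
A minimal sketch of the shared session idea, assuming a hypothetical SessionStore interface that stands in for the external memcached tier (an in-memory map is used here so the example is self-contained). None of these names come from the slides.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical abstraction over the external session cache.
interface SessionStore {
    String get(String sessionId);
    void put(String sessionId, String state);
}

// In-memory stand-in for memcached, used only for illustration.
final class InMemorySessionStore implements SessionStore {
    private final Map<String, String> data = new ConcurrentHashMap<>();
    public String get(String sessionId) { return data.get(sessionId); }
    public void put(String sessionId, String state) { data.put(sessionId, state); }
}

// A stateless service instance: all session state lives in the shared store,
// so the load balancer may route a user's next request to any instance.
final class ServiceInstance {
    private final SessionStore store;

    ServiceInstance(SessionStore store) { this.store = store; }

    // Appends the request to the session's history and returns the new history.
    String handle(String sessionId, String request) {
        String previous = store.get(sessionId);
        String updated = (previous == null) ? request : previous + "," + request;
        store.put(sessionId, updated);
        return updated;
    }
}
```

Because no instance keeps session state of its own, instances can be added or removed by the autoscaler without losing anyone's session.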
  
Chatty Opaque and Brittle Protocols
•  Datacenter service protocols
    –  Assumed low latency for many simple requests
•  Based on serializing existing Java objects
    –  Inefficient formats
    –  Incompatible when definitions change

Anti-pattern causes productivity, latency and availability issues
  
Robust and Flexible Protocols
•  Cloud service protocols
   –  JSR311/Jersey is used for REST/HTTP service calls
   –  Custom client code includes service discovery
   –  Support complex data types in a single request
•  Apache Avro
   –  Evolved from Protocol Buffers and Thrift
   –  Includes JSON header defining key/value protocol
   –  Avro serialization is half the size and several times faster than Java serialization, more work to code
  
Persisted Protocols
•  Persist Avro in Memcached
   –  Save space/latency (zigzag encoding, half the size)
   –  New keys are ignored
   –  Missing keys are handled cleanly
•  Avro protocol definitions
   –  Less brittle across versions
   –  Can be written in JSON or generated from POJOs
   –  It’s hard, needs better tooling
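
The two properties listed above (new keys ignored, missing keys handled cleanly) and the zigzag integer mapping can be illustrated without the Avro library. The sketch below is an assumption-laden stand-in: TolerantReader and its use of plain maps are hypothetical and only mimic the behaviour; the zigZag method shows the standard 32-bit zig-zag mapping that lets small negative numbers encode as small unsigned values.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical reader-side resolution; this does NOT use the Avro API.
final class TolerantReader {
    // readerDefaults lists the fields this reader knows, with default values.
    static Map<String, Object> resolve(Map<String, Object> written,
                                       Map<String, Object> readerDefaults) {
        Map<String, Object> result = new HashMap<>();
        for (Map.Entry<String, Object> field : readerDefaults.entrySet()) {
            // A key missing from the written record falls back to the default.
            // Keys the reader does not know are never copied, so they are ignored.
            result.put(field.getKey(),
                       written.getOrDefault(field.getKey(), field.getValue()));
        }
        return result;
    }

    // Zig-zag mapping for 32-bit ints: 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ...
    static int zigZag(int n) {
        return (n << 1) ^ (n >> 31);
    }
}
```

With this shape, a record written by a newer service version that added a field can still be read from the cache by an older reader, and vice versa.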
  
Tangled Service Interfaces
•  Datacenter implementation is exposed
   –  Oracle SQL queries mixed into business logic
•  Tangled code
   –  Deep dependencies, false sharing
•  Data providers with sideways dependencies
   –  Everything depends on everything else

Anti-pattern affects productivity, availability
  
Untangled Service Interfaces
•  New Cloud Code With Strict Layering
    –  Compile against interface jar
    –  Can use Spring runtime binding to enforce
    –  Fine grain services as components
•  Service interface is the service
    –  Implementation is completely hidden
    –  Can be implemented locally or remotely
    –  Implementation can evolve independently
  
Untangled Service Interfaces
Two layers:
•  SAL - Service Access Library
    –  Basic serialization and error handling
    –  REST or POJOs defined by data provider
•  ESL - Extended Service Library
    –  Caching, conveniences, can combine several SALs
    –  Exposes faceted type system (described later)
    –  Interface defined by data consumer in many cases
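
One way to picture the two layers, as a sketch only: IdentitySal and IdentityEsl are hypothetical names, and the caching policy shown (cache forever, per instance) is an assumption chosen to keep the example short.

```java
import java.util.HashMap;
import java.util.Map;

// SAL: thin access layer defined by the data provider.
// In a real system this would wrap a REST call plus serialization.
interface IdentitySal {
    String fetchCustomerName(long customerId);
}

// ESL: built on top of one or more SALs, adds caching and conveniences.
// Consumers compile against this, never against the SAL implementation.
final class IdentityEsl {
    private final IdentitySal sal;
    private final Map<Long, String> cache = new HashMap<>();
    int salCalls = 0; // exposed only to make the caching visible in this sketch

    IdentityEsl(IdentitySal sal) { this.sal = sal; }

    String customerName(long customerId) {
        // Only the first lookup for a given ID reaches the SAL.
        return cache.computeIfAbsent(customerId, id -> {
            salCalls++;
            return sal.fetchCustomerName(id);
        });
    }
}
```

Since the ESL depends only on the SAL interface, the implementation behind it can move from a local call to a remote service without touching consumers.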
  
Service Interaction Pattern
Sample Swimlane Diagram
  
Service Architecture Patterns
•  Internal Interfaces Between Services
   –  Common patterns as templates
   –  Highly instrumented, observable, analytics
   –  Service Level Agreements – SLAs
•  Library templates for generic features
   –  Instrumented Netflix Base Servlet template
   –  Instrumented generic client interface template
   –  Instrumented S3, SimpleDB, Memcached clients
  
Service Request Instruments Every Step in the Call
(diagram: timestamps recorded around one request, clockwise from the client)
CLIENT
   –  Request start timestamp
   –  Client outbound serialize start timestamp
   –  Client outbound serialize end timestamp
   –  Client network send timestamp
SERVICE
   –  Service network receive timestamp
   –  Service inbound serialize start timestamp
   –  Service inbound serialize end timestamp
   –  Execute request start timestamp
   –  Execute request end timestamp
   –  Service outbound serialize start timestamp
   –  Service outbound serialize end timestamp
   –  Service network send timestamp
CLIENT
   –  Client network receive timestamp
   –  Inbound deserialize start timestamp
   –  Inbound deserialize end timestamp
   –  Request end timestamp
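
Recording a timestamp at every step makes it possible to attribute latency to serialization, network, or execution. A minimal sketch of such a trace follows; CallTrace and the step names used with it are hypothetical and not taken from the Netflix libraries.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical per-request trace: one named timestamp per instrumented step.
final class CallTrace {
    // LinkedHashMap keeps the steps in the order they were recorded.
    private final Map<String, Long> marks = new LinkedHashMap<>();

    void mark(String step, long nanos) {
        marks.put(step, nanos);
    }

    // Elapsed nanoseconds between two recorded steps.
    long between(String from, String to) {
        Long start = marks.get(from);
        Long end = marks.get(to);
        if (start == null || end == null) {
            throw new IllegalStateException("missing mark: " + from + " or " + to);
        }
        return end - start;
    }
}
```

In use, each layer would call something like trace.mark("client.serialize.start", System.nanoTime()), and a reporting step would compute differences such as serialize time or time on the wire.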
  
Boundary Interfaces
•  Isolate teams from external dependencies
   –  Fake SAL built by cloud team
   –  Real SAL provided by data provider team later
   –  ESL built by cloud team using faceted objects
•  Fake data sources allow development to start
   –  e.g. Fake Identity SAL for a test set of customers
   –  Development solidifies dependencies early
   –  Helps external team provide the right interface
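
A fake SAL can be as small as a fixed map behind the agreed interface. The sketch below assumes a hypothetical IdentityAccess interface and invented test customers; the real SAL from the data provider team would later implement the same interface.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical boundary interface agreed between the two teams.
interface IdentityAccess {
    Optional<String> emailFor(long customerId);
}

// Fake implementation backed by a fixed test set of customers,
// so development can start before the real data source exists.
final class FakeIdentityAccess implements IdentityAccess {
    private static final Map<Long, String> TEST_CUSTOMERS = Map.of(
            1L, "alice@example.com",
            2L, "bob@example.com");

    @Override
    public Optional<String> emailFor(long customerId) {
        // Unknown customers yield an empty result rather than null.
        return Optional.ofNullable(TEST_CUSTOMERS.get(customerId));
    }
}
```

Writing the fake first also forces the consumer to state exactly which calls it needs, which is the "right interface" the last bullet refers to.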
  
One Object That Does Everything
•  Datacenter uses a few big complex objects
    –  Movie and Customer objects are the foundation
    –  Good choice for a small team and one instance
    –  Problematic for large teams and many instances
•  False sharing causes tangled dependencies
    –  Unproductive re-integration work

Anti-pattern impacting productivity and availability
  
An Interface For Each Component
•  Cloud uses faceted Video and Visitor
    –  Basic types hold only the identifier
    –  Facets scope the interface you actually need
    –  Each component can define its own facets
•  No false-sharing and dependency chains
    –  Type manager converts between facets as needed
    –  video.asA(PresentationVideo) for www
    –  video.asA(MerchableVideo) for middle tier
  
Basic Types
Epistemology and Design
By Stan Lanning
  
Avoiding “Level Confusion” [Catataxis]
•  Business Level Objects (BLO?)
    –  Customers, Movies, etc
    –  Conceptual: Exist only between the ears
•  Abstract Types
    –  Abstractions that try to model aspects of the business level objects
    –  Often captured by Java interfaces
•  Implementations
    –  Specific coded implementations of the abstract types
    –  Java class, or a collection of rows in a database…
  
Facets
•  No single Abstract Type captures everything about a BLO
    –  Different teams see different “facets”
         •  Customer: Account status; Billing history; Viewing history; A/B test assignments
         •  Movie: Availability; Popularity; Synopsis; Cast
    –  Loosely coupled, tightly aligned(!)
•  All facets for a BLO should inherit from one “basic” type that has minimal behavior
  
Basic Types
•  Module external interfaces deal in basic types; internal calls are free to use more complex facets
•  Generic machinery to switch between facets

       Business Level Object      Java Basic Type
       Movie (TV show…)           Video
       Customer                   Visitor
       Category                   VTag
       Country                    ISOCountry
  
Type Manager
•  Holds the “factory” objects that manage instances of facets
   –  Typically one factory per facet
   –  Factories free to implement any instance management policy they want
•  Factories register with the Type Manager
   –  callers never interact directly with the factories
   –  Mock managers?
  
Switching Facets
•  Each Basic Type B implements a method that uses the Type Manager to find facet implementations of the same BLO
       <T extends B> T asA(Class<T> c)
•  Example:
       Visitor visitor = xxx;
       ABClient abClient = visitor.asA(ABClient.class);
       assert(visitor.equals(abClient));
•  Look Ma, no cast!
    –  Facets are equal, but not necessarily ==.
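
The slides show only the asA signature and a call site. The following is one possible way the pieces could fit together, written as a sketch: the static registry inside TypeManager, the LongFunction-based factories, and the testCell method are all assumptions made for illustration and are not the actual Netflix implementation.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongFunction;

// Hypothetical registry of facet factories, keyed by facet class.
final class TypeManager {
    private static final Map<Class<?>, LongFunction<?>> FACTORIES = new HashMap<>();

    // Factories register themselves; callers never touch them directly.
    static <T> void register(Class<T> facet, LongFunction<T> factory) {
        FACTORIES.put(facet, factory);
    }

    static <T> T find(Class<T> facet, long id) {
        LongFunction<?> factory = FACTORIES.get(facet);
        if (factory == null) {
            throw new IllegalArgumentException("no factory registered for " + facet);
        }
        // Class.cast keeps the conversion type-safe without a raw cast at call sites.
        return facet.cast(factory.apply(id));
    }
}

// Basic type: holds only the identifier; equality is decided by the ID.
class Visitor {
    private final long id;

    Visitor(long id) { this.id = id; }

    long getId() { return id; }

    <T extends Visitor> T asA(Class<T> facet) {
        return TypeManager.find(facet, id);
    }

    @Override public boolean equals(Object o) {
        return o instanceof Visitor && ((Visitor) o).id == id;
    }

    @Override public int hashCode() { return Long.hashCode(id); }
}

// A facet defined by one component (here, a made-up A/B testing view).
class ABClient extends Visitor {
    ABClient(long id) { super(id); }

    String testCell() { return getId() % 2 == 0 ? "A" : "B"; }
}
```

After TypeManager.register(ABClient.class, ABClient::new), a call such as new Visitor(42).asA(ABClient.class) returns a distinct object that still equals the original visitor, which is the "equal, but not necessarily ==" property.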
  
IDs! (huh) What are they good for?
•  IDs exist because implementations need to externalize objects and maintain their identity
       –  Persist in a DB, or talk to a remote service
       –  Different implementations of a type of BLO model the same object iff they have the same ID
       –  Basic Types use IDs to manage facets, determine equality, etc
  
	
  
Converting IDs ↔ Objects
Long id = xx;
MyVisitor visitor =
    TypeManager.findObject(Visitor.class, id)
               .asA(MyVisitor.class);
assert(id.equals(visitor.getId()));
// Or more efficiently…
MyVisitor visitor2 =
    TypeManager.findObject(Visitor.class, id,
                            MyVisitor.class);
// There are also efficient bulk conversion methods
Collection<Long> ids = xxx;
List<MyVisitor> visitors =
    TypeManager.findObjects(Visitor.class, ids,
                             MyVisitor.class);

	
  
Stan’s Soap Box
•  Don’t pass around IDs when you mean to refer to the BLO; that is Level Confusion
•  Using Basic Types helps the compiler help you; compile time problems are better than run time problems
•  More readable by people, but beware that asA operations may be a lot of work
•  (Is this a way to approximate multiple-inheritance in Java?)
  
Software Architecture Patterns
•  Object Models
   –  Basic and derived types, facets, serializable
   –  Pass by reference within a service
   –  Pass by value between services
•  Computation and I/O Models
   –  Service Execution using Best Effort / Futures
   –  Common thread pool management
   –  Circuit breakers to manage and contain failures
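
The circuit breaker bullet can be illustrated with a deliberately simple, count-based sketch. The class below is hypothetical: it opens after a fixed number of consecutive failures and needs an explicit reset, whereas production breakers usually add a timed half-open state.

```java
import java.util.function.Supplier;

// Hypothetical count-based circuit breaker (no timed reset, for brevity).
final class CircuitBreaker {
    private final int failureThreshold;
    private int consecutiveFailures = 0;

    CircuitBreaker(int failureThreshold) {
        this.failureThreshold = failureThreshold;
    }

    synchronized boolean isOpen() {
        return consecutiveFailures >= failureThreshold;
    }

    // Runs the call unless the breaker is open.
    // Returns the fallback on failure or when the breaker is open.
    synchronized <T> T call(Supplier<T> remoteCall, T fallback) {
        if (isOpen()) {
            return fallback;               // fail fast, do not touch the dependency
        }
        try {
            T result = remoteCall.get();
            consecutiveFailures = 0;       // any success closes the breaker again
            return result;
        } catch (RuntimeException e) {
            consecutiveFailures++;
            return fallback;
        }
    }

    synchronized void reset() {
        consecutiveFailures = 0;
    }
}
```

The point of the pattern is containment: once a dependency is known to be failing, callers get a best-effort fallback immediately instead of tying up threads waiting on it.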
  
Model Driven Architecture
•  Traditional Datacenter Practices
   –  Lots of unique hand-tweaked systems
   –  Hard to enforce patterns
   –  Some use of Puppet to automate changes

•  Model Driven Cloud Architecture
   –  Perforce/Ivy/Jenkins based builds for everything
   –  Every production instance is a pre-baked AMI
   –  Every application is managed by an Autoscaler

Every change is a new AMI
  
Netflix Cloud Platform
(diagram: layered stack, top to bottom)
   Netflix Applications
   Netflix Cloud Platform / PaaS
   AWS Specific Code     |  Netflix Legacy        |  Partner Interfaces
   AWS Services          |  Datacenter Services   |  Partner Services
  
Netflix PaaS Principles
•  Maximum Functionality
    –  Developer productivity and agility
•  Leverage as much of AWS as possible
    –  AWS is making huge investments in features/scale
•  Interfaces that isolate Apps from AWS
    –  Avoid lock-in to specific AWS API details
•  Portability is a long term goal
    –  Gets easier as other vendors catch up with AWS
  
Netflix Global PaaS
•    Architecture Features and Overview
•    Portals and Explorers
•    Platform Services
•    Platform APIs
•    Platform Frameworks
•    Persistence
•    Scalability Benchmark
  
Global PaaS?
Toys are nice, but this is the real thing…
•    Supports all AWS Availability Zones and Regions
•    Supports multiple AWS accounts {test, prod, etc.}
•    Cross Region/Acct Data Replication and Archiving
•    Internationalized, Localized and GeoIP routing
•    Security is fine grain, dynamic AWS keys
•    Autoscaling to thousands of instances
•    Monitoring for millions of metrics
•    Productive for 100s of developers on one product
•    23M+ users USA, Canada, Latin America, UK, Eire
  
Basic PaaS Entities
•  AWS Based Entities
    –  Instances and Machine Images, Elastic IP Addresses
    –  Security Groups, Load Balancers, Autoscale Groups
    –  Availability Zones and Geographic Regions

•  Netflix PaaS Entities
    –  Applications (registered services)
    –  Clusters (versioned Autoscale Groups for an App)
    –  Properties (dynamic hierarchical configuration)
  
Core PaaS Services
•  AWS Based Services
    –  S3 storage, to 5TB files, parallel multipart writes
    –  SQS – Simple Queue Service. Messaging layer.

•  Netflix Based Services
    –  EVCache – memcached based ephemeral cache
    –  Cassandra – distributed data store

•  External Services
    –  GeoIP Lookup interfaced to a vendor
    –  Keystore HSM in Netflix Datacenter
  
Instance Architecture
(diagram: contents of one instance)
Linux Base AMI (CentOS or Ubuntu)
•  Optional Apache frontend, memcached, non-java apps
•  Monitoring
    –  Log rotation to S3
    –  AppDynamics machineagent
    –  Epic
•  Java (JDK 6 or 7)
    –  AppDynamics appagent
    –  GC and thread dump logging
•  Tomcat
    –  Application servlet, base server, platform, interface jars for dependent services
    –  Healthcheck, status servlets, JMX interface, Servo autoscale
  	
  
Security Architecture
•  Instance Level Security baked into base AMI
    –  Login: ssh only allowed via portal (not between instances)
    –  Each app type runs as its own userid app{test|prod}

•  AWS Security, Identity and Access Management
    –  Each app has its own security group (firewall ports)
    –  Fine grain user roles and resource ACLs

•  Key Management
    –  AWS Keys dynamically provisioned, easy updates
    –  High grade app specific key management support
  
Core Platform Frameworks and APIs
  
Portals and Explorers
•  Netflix Application Console (NAC)
   –  Primary AWS provisioning/config interface
•  AWS Usage Analyzer
   –  Breaks down costs by application and resource
•  Cassandra Explorer
   –  Browse clusters, keyspaces, column families
•  Base Server Explorer
   –  Browse service endpoints configuration, perf
  
AWS Usage
(for test, carefully omitting any $ numbers…)

Cassandra Explorer
  
Platform Services
•    Discovery – service registry for “Applications”
•    Introspection – Entrypoints
•    Cryptex – Dynamic security key management
•    Geo – Geographic IP lookup
•    Platformservice – Dynamic property configuration
•    Localization – manage and lookup local translations
•    Evcache – ephemeral volatile cache
•    Cassandra – Cross zone/region distributed data store
•    Zookeeper – Distributed Coordination (Curator)
•    Various proxies – access to old datacenter stuff
  
Introspection - Entrypoints
•  REST API for tools, apps, explorers, monkeys…
   –  E.g. GET /REST/v1/instance/$INSTANCE_ID

•  AWS Resources
   –  Autoscaling Groups, EIP Groups, Instances

•  Netflix PaaS Resources
   –  Discovery Applications, Clusters of ASGs, History
  
Entrypoints Queries
MongoDB used for low traffic complex queries against complex objects

Description                                                               Range expression
Find all active instances.                                                all()
Find all instances associated with a group name.                          %(cloudmonkey)
Find all instances associated with a discovery group.                     /^cloudmonkey$/discovery()
Find all auto scale groups with no instances.                             asg(),-has(INSTANCES;asg())
How many instances are not in an auto scale group?                        count(all(),-info(eval(INSTANCES;asg())))
What groups include an instance?                                          *(i-4e108521)
What auto scale groups and elastic load balancers include an instance?    filter(TYPE;asg,elb;*(i-4e108521))
What instance has a given public ip?                                      filter(PUBLIC_IP;174.129.188.{0..255};all())
  
Metrics Framework
•  System and Application
    –  Collection, Aggregation, Querying and Reporting
    –  Non-blocking logging, avoids log4j lock contention
    –  Honu-Streaming -> S3 -> EMR -> Hive
•  Performance, Robustness, Monitoring, Analysis
    –  Tracers, Counters – explicit code instrumentation log
    –  Real Time Tracers/Counters
    –  SLA – service level response time percentiles
    –  Servo annotated JMX extract to Cloudwatch
•  Latency Monkey Infrastructure
    –  Inject random delays into service responses
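
The idea behind injecting random delays can be shown with a small wrapper around a service call. This is a sketch under assumptions: LatencyInjector is a hypothetical name, and the uniform delay between 0 and a configured maximum is chosen for simplicity; the slides do not describe how Latency Monkey picks its delays.

```java
import java.util.Random;
import java.util.function.Supplier;

// Hypothetical delay injector for testing latency tolerance of callers.
final class LatencyInjector {
    private final Random random;
    private final int maxDelayMs;

    LatencyInjector(Random random, int maxDelayMs) {
        if (maxDelayMs < 0) throw new IllegalArgumentException("maxDelayMs must be >= 0");
        this.random = random;
        this.maxDelayMs = maxDelayMs;
    }

    // Uniformly distributed delay in the range 0..maxDelayMs inclusive.
    int nextDelayMs() {
        return random.nextInt(maxDelayMs + 1);
    }

    // Sleeps for a random delay, then invokes the wrapped service call.
    <T> T call(Supplier<T> service) {
        try {
            Thread.sleep(nextDelayMs());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
        }
        return service.get();
    }
}
```

Passing a seeded Random makes a test run repeatable, which helps when checking that client timeouts and fallbacks behave as intended.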
  
Configuration Management
•  NetflixConfiguration
     –  Validation Framework
     –  Sitewide Properties Explorer
•    PlatformService
•    Mapping Service
•    ZooKeeper (Curator)
•    InstanceIdentity
  
Interprocess Communication
•  Discovery Service registry for “applications”
    –  “here I am” call every 30s, drop after 3 missed
    –  “where is everyone” call
    –  Redundant, distributed, moving to Zookeeper
•  NIWS – Netflix Internal Web Service client
    –  Software Middle Tier Load Balancer
    –  Failure retry moves to next instance
    –  Many options for encoding, etc.
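The "here I am" and "where is everyone" calls can be sketched as a small in-memory registry. This is only an illustration of the heartbeat rule on the slide. The class `HeartbeatRegistry` is hypothetical, and reading "drop after 3 missed" as "older than 3 x 30s" is an assumption.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical registry: instances heartbeat, stale ones are dropped on lookup. */
final class HeartbeatRegistry {
    static final long HEARTBEAT_MILLIS = 30_000L; // "here I am" every 30s
    static final int MISSED_BEFORE_DROP = 3;      // drop after 3 missed

    private final Map<String, Long> lastSeen = new ConcurrentHashMap<>();

    /** "here I am": record the time of the latest heartbeat for an instance. */
    void heartbeat(String instanceId, long nowMillis) {
        lastSeen.put(instanceId, nowMillis);
    }

    /** "where is everyone": evict expired instances, then list the live ones. */
    List<String> liveInstances(long nowMillis) {
        long maxAge = HEARTBEAT_MILLIS * MISSED_BEFORE_DROP;
        lastSeen.values().removeIf(seen -> nowMillis - seen > maxAge);
        List<String> live = new ArrayList<>(lastSeen.keySet());
        Collections.sort(live); // stable order for callers and tests
        return live;
    }
}
```

Passing the clock in as `nowMillis` keeps the sketch deterministic. A real service would also need replication between registry nodes, which is omitted here.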
  
Security Key Management
•  AKMS
    –  Dynamic Key Management interface
    –  Update AWS keys at runtime, no restart
    –  All keys stored securely, none on disk or in AMI
•  Cryptex - Flexible key store
    –  Low grade keys processed in client
    –  Medium grade keys processed by Cryptex service
    –  High grade keys processed by hardware (Ingrian)
  
AWS Persistence Services
•  SimpleDB
    –  Got us started, migrated to Cassandra now
    –  NFSDB - Instrumented wrapper library
    –  Domain and Item sharding (workarounds)
•  S3
    –  Upgraded/Instrumented JetS3t based interface
    –  Supports multipart upload and 5TB files
    –  Global S3 endpoint management
  
Aside: Adrian’s Rant on CAP Theorem
         Choose Consistency or Availability when Partitioned

•  Instances and Networks will fail
•  Network failure = Partition “P” is a given
•  Distributed Systems: two choices – CP or AP
•  “Vendor claims CA”
    –  Usually they mean available when instances fail
•  Master-Slave = Consistent when Partitioned
    –  You can’t write unless you can see the master
•  No-Master = Available when Partitioned
    –  Writes proceed, conflicts will be patched up later
  
What Netflix Needed from NoSQL

Basic Requirements
•  Supports running on Amazon EC2
•  Supports Amazon Availability Zones
•  Low latency, low latency variance
•  High and scalable read and write throughput
•  Large and scalable capacity, no external sharding
•  “AP” Eventually Consistent
•  Data integrity checks and repairs
•  Online Snapshot Backup, Restore/Rollback
  
Scenario – Immediate Read after Write
    Q1: Is routing and replication zone aware?

[Diagram: TV Device -> Round Robin Load Balancer -> API (zone A) and API (Zone B) -> Favorites (zone A) and Favorites (Zone B). Labels: "New Favorite" and "Append New Favorite" on the zone A side, "New Favorites List" on the Zone B side, "Replication" between the two Favorites stores.]
  
Network Partition
         Q2: What happens next?

[Diagram: TV Device -> Round Robin Load Balancer -> API (zone A) and API (Zone B) -> Favorites (zone A) and Favorites (Zone B). Labels: "New Favorite" and "Append New Favorite" on the zone A side, "New Favorites List" on the Zone B side, "No Replication" between the two Favorites stores.]
  
Network Partition
Q3: Supports Append vs. Read/Modify/Write?

[Diagram: TV Device -> Round Robin Load Balancer -> API (zone A) and API (Zone B) -> Favorites (zone A) and Favorites (Zone B). Labels: "New Favorite" and "RMW" with "Old Favorites List" and "New Favorites List" on the zone A side, "New Favorites List" on the Zone B side, "Replication" between the two Favorites stores.]
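Why Q3 matters can be shown with a toy model of two zones updating the same favorites list at once. This is a sketch under simple assumptions (last write wins for whole-list writes, set union for appended items) and the class `FavoritesDemo` is hypothetical, not any particular data store's behavior.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical comparison of read/modify/write and append under concurrent updates. */
final class FavoritesDemo {
    /** Read/modify/write: each zone rewrites the whole list, so the later write wins. */
    static List<String> readModifyWrite(List<String> old, String addedInZoneA, String addedInZoneB) {
        List<String> fromA = new ArrayList<>(old);
        fromA.add(addedInZoneA);
        List<String> fromB = new ArrayList<>(old); // zone B read the old list too
        fromB.add(addedInZoneB);
        List<String> stored = fromA; // zone A's write lands first
        stored = fromB;              // zone B's write then replaces the whole list
        return stored;
    }

    /** Append: each zone writes only its new item, and replication merges the items. */
    static Set<String> append(List<String> old, String addedInZoneA, String addedInZoneB) {
        Set<String> merged = new TreeSet<>(old);
        merged.add(addedInZoneA);
        merged.add(addedInZoneB);
        return merged;
    }
}
```

With an old list of one favorite and one addition per zone, `readModifyWrite` ends up with two entries and silently loses zone A's addition, while `append` keeps all three.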
  
Silent Data Corruption
Q4: How is it detected and corrected?

[Diagram: TV Device -> Round Robin Load Balancer -> API (zone A) and API (Zone B) -> Favorites (zone A) and Favorites (Zone B). Labels: "New Favorite" and "Append New Favorite" on the zone A side, "New Favorites List" on the Zone B side, "Replication corrupted on disk or via network" between the two Favorites stores.]
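One common answer to Q4 is to store a checksum with each value, verify it on read, and fall back to another replica when it does not match. The sketch below illustrates that general technique with CRC32. It is an assumption for illustration, not a description of how any specific store implements integrity checks, and `ChecksummedValue` is a hypothetical class.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

/** Hypothetical value that carries the checksum computed when it was written. */
final class ChecksummedValue {
    final byte[] data;
    final long checksum;

    ChecksummedValue(byte[] data, long checksum) {
        this.data = data;
        this.checksum = checksum;
    }

    static long crc(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        return crc.getValue();
    }

    /** Builds a value and records its checksum at write time. */
    static ChecksummedValue of(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        return new ChecksummedValue(bytes, crc(bytes));
    }

    /** Detection: recompute the checksum and compare with the stored one. */
    boolean isIntact() {
        return crc(data) == checksum;
    }

    /** Correction: prefer the local copy, fall back to the replica if it is damaged. */
    static ChecksummedValue readRepair(ChecksummedValue local, ChecksummedValue replica) {
        if (local.isIntact()) return local;
        if (replica.isIntact()) return replica;
        throw new IllegalStateException("both copies are corrupt");
    }
}
```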
  
Netflix Platform Persistence
•  Ephemeral Volatile Cache – evcache
    –  Discovery-aware memcached based backend
    –  Client abstractions for zone aware replication
    –  Option to write to all zones, fast read from local
•  Cassandra
    –  Highly available and scalable (more later…)
•  MongoDB
    –  Complex object/query model for small scale use
•  MySQL
    –  Hard to scale, legacy and small relational models
  
Why Cassandra?
•  We value Availability over Consistency – AP
    –  Cassandra is a Java distributed systems toolkit
•  We have a building full of Java engineers
    –  Riak is in Erlang – a blessing and a curse…
•  We want FOSS + Support
    –  Voldemort doesn’t have a support model
•  Writes are intrinsically harder than reads
    –  HBase is CP optimized for reads & single namenode issues
•  Cassandra works, running ~55 clusters
    –  Step by step into full production over the last year
  
Priam – Cassandra Automation
                Available at http://github.com/netflix

•  Netflix Platform Tomcat Code
•  Zero touch auto-configuration
•  State management for Cassandra JVM
•  Token allocation and assignment
•  Broken node auto-replacement
•  Full and incremental backup to S3
•  Restore sequencing from S3
•  Grow/Shrink Cassandra “ring”
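Token allocation can be illustrated with the commonly documented formula for evenly spaced tokens in Cassandra's RandomPartitioner token space (0 to 2^127). This is a sketch of that general formula only. Whether Priam allocates tokens exactly this way is not stated on the slide, and `TokenAllocator` is a hypothetical class.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that spaces initial tokens evenly around the ring. */
final class TokenAllocator {
    /** Size of the RandomPartitioner token space: 2^127. */
    static final BigInteger RING_SIZE = BigInteger.ONE.shiftLeft(127);

    /** Evenly spaced tokens: token(i) = i * 2^127 / nodeCount. */
    static List<BigInteger> evenTokens(int nodeCount) {
        if (nodeCount <= 0) {
            throw new IllegalArgumentException("nodeCount must be positive");
        }
        List<BigInteger> tokens = new ArrayList<>();
        for (int i = 0; i < nodeCount; i++) {
            tokens.add(RING_SIZE
                    .multiply(BigInteger.valueOf(i))
                    .divide(BigInteger.valueOf(nodeCount)));
        }
        return tokens;
    }
}
```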
  
Astyanax
                         Available at http://github.com/netflix

•  Cassandra java client
•  API abstraction on top of Thrift protocol
•  “Fixed” Connection Pool abstraction (vs. Hector)
    –  Round robin with Failover
    –  Retry-able operations not tied to a connection
    –  Netflix PaaS Discovery service integration
    –  Host reconnect (fixed interval or exponential backoff)
    –  Token aware to save a network hop – lower latency
    –  Latency aware to avoid compacting/repairing nodes – lower variance
•  Batch mutation: set, put, delete, increment
•  Simplified use of serializers via method overloading (vs. Hector)
•  ConnectionPoolMonitor interface for counters and tracers
•  Composite Column Names replacing deprecated SuperColumns
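Two of the connection pool ideas above, round robin with failover and exponential backoff for host reconnect, can be sketched in a few lines. This is not Astyanax code or its API. `RoundRobinFailover` and its methods are hypothetical names used only to show the technique.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Hypothetical pool: rotates the starting host and retries on the next host on failure. */
final class RoundRobinFailover {
    private final List<String> hosts;
    private final AtomicInteger next = new AtomicInteger();

    RoundRobinFailover(List<String> hosts) {
        if (hosts.isEmpty()) {
            throw new IllegalArgumentException("no hosts");
        }
        this.hosts = new ArrayList<>(hosts);
    }

    /** Runs the operation against one host, failing over until every host was tried once. */
    <T> T execute(Function<String, T> operation) {
        RuntimeException last = null;
        int start = Math.floorMod(next.getAndIncrement(), hosts.size());
        for (int i = 0; i < hosts.size(); i++) {
            String host = hosts.get((start + i) % hosts.size());
            try {
                return operation.apply(host); // the operation is not tied to one connection
            } catch (RuntimeException e) {
                last = e; // remember the failure and move to the next host
            }
        }
        throw last;
    }

    /** Reconnect delay that doubles per attempt, capped at maxMillis. */
    static long backoffMillis(int attempt, long baseMillis, long maxMillis) {
        long delay = baseMillis;
        for (int i = 0; i < attempt && delay < maxMillis; i++) {
            delay *= 2;
        }
        return Math.min(delay, maxMillis);
    }
}
```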
  
Initializing Astyanax

// Configuration either set in code or nfastyanax.properties
platform.ListOfComponentsToInit=LOGGING,APPINFO,DISCOVERY
netflix.environment=test
default.astyanax.readConsistency=CL_QUORUM
default.astyanax.writeConsistency=CL_QUORUM
MyCluster.MyKeyspace.astyanax.servers=127.0.0.1

// Must initialize platform for discovery to work
NFLibraryManager.initLibrary(PlatformManager.class, props, false, true);
NFLibraryManager.initLibrary(NFAstyanaxManager.class, props, true, false);

// Open a keyspace instance
Keyspace keyspace = KeyspaceFactory.openKeyspace("MyCluster","MyKeyspace");
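Astyanax Query Example
Paginate through all columns in a row

ColumnList<String> columns;
int pageSize = 10;
try {
    // Build a paginating query for row "A", one page of columns at a time
    RowQuery<String, String> query = keyspace
        .prepareQuery(CF_STANDARD1)
        .getKey("A")
        .setIsPaginating()
        .withColumnRange(new RangeBuilder().setMaxSize(pageSize).build());

    // Each execute() returns the next page; an empty page ends the loop
    while (!(columns = query.execute().getResult()).isEmpty()) {
        for (Column<String> c : columns) {
            // process each column of the current page here
        }
    }
} catch (ConnectionException e) {
    // handle or log the connection failure here
}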
  	
  
	
  
Data Migration to Cassandra

Distributed Key-Value Stores
•  Cloud has many key-value data stores
    –  More complex to keep track of, do backups etc.
    –  Each store is much simpler to administer  DBA
    –  Joins take place in java code
•  No schema to change, no scheduled downtime
•  Latency for typical queries
    –  Memcached is dominated by network latency <1ms
    –  Cassandra takes a few milliseconds
    –  SimpleDB replication and REST auth overheads >10ms
  
Multi-Regional Data Replication
•  IR Framework – Datacenter Item Replicator
    –  Built in 2009, first step to the cloud
    –  Oracle to SimpleDB or Cassandra via poll and push
    –  Return updates to Oracle via SQS message queue
•  SimpleDB or S3 to Cassandra
    –  Data migration tool for global Netflix
•  Global SimpleDB and S3 Replication
    –  Cross region async updates USA to Europe
  
Transitional Steps
•  Bidirectional Replication
    –  Oracle to SimpleDB
    –  Queued reverse path using SQS
    –  Backups remain in Datacenter via Oracle
•  New Cloud-Only Data Sources
    –  Cassandra based
    –  No replication to Datacenter
    –  Backups performed in the cloud
  
API

[Diagram: AWS EC2 containing Front End Load Balancer, Discovery Service, API Proxy, "API etc.", Load Balancer, Component Services, API, SQS, Cassandra on EC2 Internal Disks, memcached, S3 and SimpleDB; Netflix Data Center containing Oracle instances; a link labeled "Replication".]
  
Cutting the Umbilical
•  Transition Oracle Data Sources to Cassandra
    –  Offload Datacenter Oracle hardware
    –  Free up capacity for growth of remaining services
•  Transition SimpleDB+Memcached to Cassandra
    –  Primary data sources that need backup
    –  Keep simplest small use cases for now
•  New challenges
    –  Backup, restore, archive, business continuity
    –  Business Intelligence integration
  
API

[Diagram: AWS EC2 containing Front End Load Balancer, Discovery Service, API Proxy, Load Balancer, Component Services, API, memcached, Cassandra on EC2 Internal Disks, S3 labeled "Backup", and SimpleDB.]
  
High Availability
•  Cassandra stores 3 local copies, 1 per zone
    –  Synchronous access, durable, highly available
    –  Read/Write One fastest, least consistent - ~1ms
    –  Read/Write Quorum 2 of 3, consistent - ~3ms
•  AWS Availability Zones
    –  Separate buildings
    –  Separate power etc.
    –  Fairly close together
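The "Quorum 2 of 3, consistent" line follows from simple arithmetic: a read is guaranteed to overlap the latest write when read acks plus write acks exceed the replica count. A minimal sketch, with the hypothetical class `Quorum` used only for illustration:

```java
/** Hypothetical helper for quorum arithmetic over N replicas. */
final class Quorum {
    /** A quorum is a strict majority of the replicas, e.g. 2 of 3. */
    static int quorumSize(int replicas) {
        return replicas / 2 + 1;
    }

    /** A read overlaps the latest write when readAcks + writeAcks > replicas. */
    static boolean readSeesLatestWrite(int replicas, int readAcks, int writeAcks) {
        return readAcks + writeAcks > replicas;
    }
}
```

With 3 replicas, quorum reads and quorum writes (2 + 2 > 3) always share at least one replica, while Read/Write One (1 + 1 <= 3) trades that guarantee for lower latency.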
  
	
  
“Traditional” Cassandra Write Data Flows
            Single Region, Multiple Availability Zone, Not Token Aware

1.  Client writes to any Cassandra node
2.  Coordinator node replicates to nodes and zones
3.  Nodes return ack to coordinator
4.  Coordinator returns ack to client
5.  Data written to internal commit log disk (no more than 10 seconds later)

If a node goes offline, hinted handoff completes the write when the node comes back up.
Requests can choose to wait for one node, a quorum, or all nodes to ack the write.
SSTable disk writes and compactions occur asynchronously.

[Diagram: non token aware clients and six Cassandra nodes with disks, two each in zones A, B and C, with arrows numbered 1 to 5 matching the steps above.]
  
Astyanax - Cassandra Write Data Flows
                Single Region, Multiple Availability Zone, Token Aware

1.  Client writes to nodes and zones
2.  Nodes return ack to client
3.  Data written to internal commit log disks (no more than 10 seconds later)

If a node goes offline, hinted handoff completes the write when the node comes back up.
Requests can choose to wait for one node, a quorum, or all nodes to ack the write.
SSTable disk writes and compactions occur asynchronously.

[Diagram: token aware clients and six Cassandra nodes with disks, two each in zones A, B and C, with arrows numbered 1 to 3 matching the steps above.]
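Token awareness means the client works out which node owns a key and writes to it directly, skipping the coordinator hop. A minimal sketch of the ring lookup follows, assuming each node owns the range ending at its token. `TokenRing` is a hypothetical class, not the Astyanax implementation, and key hashing and replica placement are left out.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical token ring: maps a key's token to the node that owns it. */
final class TokenRing {
    private final TreeMap<Long, String> nodesByToken = new TreeMap<>();

    void addNode(long token, String node) {
        nodesByToken.put(token, node);
    }

    /** The owner is the first node whose token is >= the key's token, wrapping around. */
    String ownerOf(long keyToken) {
        if (nodesByToken.isEmpty()) {
            throw new IllegalStateException("empty ring");
        }
        Map.Entry<Long, String> owner = nodesByToken.ceilingEntry(keyToken);
        return (owner != null ? owner : nodesByToken.firstEntry()).getValue();
    }
}
```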
  
Data Flows for Multi-Region Writes
              Token Aware, Consistency Level = Local Quorum

1.  Client writes to local replicas
2.  Local write acks returned to Client which continues when 2 of 3 local nodes are committed
3.  Local coordinator writes to remote coordinator.
4.  When data arrives, remote coordinator node acks and copies to other remote zones
5.  Remote nodes ack to local coordinator
6.  Data flushed to internal commit log disks (no more than 10 seconds later)

If a node or region goes offline, hinted handoff completes the write when the node comes back up.
Nightly global compare and repair jobs ensure everything stays consistent.

[Diagram: US clients and EU clients, each with six Cassandra nodes with disks across zones A, B and C, a link between the regions labeled "100+ms latency", and arrows numbered 1 to 6 matching the steps above.]
  
Remote Copies
•  Cassandra duplicates across AWS regions
    –  Asynchronous write, replicates at destination
    –  Doesn’t directly affect local read/write latency
•  Global Coverage
    –  Business agility
    –  Follow AWS…
•  Local Access
    –  Better latency
    –  Fault Isolation
  
    	
  
Cassandra Backup
•  Full Backup
    –  Time based snapshot
    –  SSTable compress -> S3
•  Incremental
    –  SSTable write triggers compressed copy to S3
•  Archive
    –  Copy cross region

[Diagram: ring of Cassandra nodes writing to S3 Backup, with an archive copy labeled A.]
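The "SSTable compress -> S3" step can be sketched as compressing a file's bytes and choosing an object key before upload. This is an illustration only: gzip stands in for whatever codec the real backup uses, the key layout in `backupKey` is invented for the example, and the S3 upload itself is left out.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical helper for preparing an SSTable file for backup. */
final class BackupCompression {
    /** Compresses the raw file bytes before they are copied to the backup store. */
    static byte[] compress(byte[] raw) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(out)) {
            gzip.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    /** Reverses compress(), as a restore would do after downloading the object. */
    static byte[] decompress(byte[] compressed) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] buffer = new byte[8192];
            int n;
            while ((n = gzip.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    /** Invented key layout: cluster/token/snapshotTime/fileName.gz */
    static String backupKey(String cluster, String token, String snapshotTime, String fileName) {
        return cluster + "/" + token + "/" + snapshotTime + "/" + fileName + ".gz";
    }
}
```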
  
Cassandra Restore
•  Full Restore
    –  Replace previous data
•  New Ring from Backup
    –  New name old data
•  Scripted
    –  Create new instances
    –  Parallel load - fast

[Diagram: S3 Backup feeding a ring of Cassandra nodes.]
  
Cassandra Online Analytics
•  Brisk = Hadoop + Cass
    –  “Cassandra Enterprise”
    –  Use split Brisk ring
    –  Size each separately
•  Direct Access
    –  Keyspaces
    –  Hive/Pig/Map-Reduce
    –  Hdfs as a keyspace
    –  Distributed namenode

[Diagram: ring of Cassandra nodes, two of them labeled Brisk, with S3 Backup.]
  
ETL for Cassandra
•    Data is de-normalized over many clusters!
•    Too many to restore from backups for ETL
•    Solution – read backup files using Hadoop
•    Aegisthus
      –  http://techblog.netflix.com/2012/02/aegisthus-bulk-data-pipeline-out-of.html
      –  High throughput raw SSTable processing
      –  Re-normalizes many clusters to a consistent view
      –  Extract, Transform, then Load into Teradata
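One way to picture the "consistent view" step: the same column can show up in several backup files with different write timestamps, and the view keeps only the newest version of each. The sketch below assumes simplified `(row_key, column, value, timestamp)` tuples rather than the real SSTable format, ignores deletions, and is not taken from Aegisthus itself.

```python
def latest_view(records):
    """Collapse many versions of each column into the newest one.

    records: iterable of (row_key, column, value, timestamp) tuples, an
    assumed simplification of what reading backup files would yield.
    """
    newest = {}
    for row_key, column, value, timestamp in records:
        key = (row_key, column)
        # Keep the version with the highest write timestamp.
        if key not in newest or timestamp > newest[key][1]:
            newest[key] = (value, timestamp)

    # Regroup the surviving columns by row.
    view = {}
    for (row_key, column), (value, _) in newest.items():
        view.setdefault(row_key, {})[column] = value
    return view


if __name__ == "__main__":
    records = [
        ("user1", "plan", "basic", 10),
        ("user1", "plan", "premium", 20),  # newer write wins
        ("user2", "plan", "basic", 7),
    ]
    print(latest_view(records))
```

In a Hadoop job the same comparison would sit in the reduce step, with `(row_key, column)` as the grouping key.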
  
Cassandra Archive
Appropriate level of paranoia needed…
•  Archive could be un-readable
     –  Restore S3 backups weekly from prod to test, and daily ETL
•  Archive could be stolen
     –  PGP Encrypt archive
•  AWS East Region could have a problem
     –  Copy data to AWS West
•  Production AWS Account could have an issue
     –  Separate Archive account with no-delete S3 ACL
•  AWS S3 could have a global problem
     –  Create an extra copy on a different cloud vendor….
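The "no-delete" idea for the separate archive account can be illustrated with a bucket policy that denies delete actions. This is a sketch under assumptions: the slide says "ACL" and does not show the actual configuration, the bucket name is hypothetical, and a policy document is used here simply because it is an easy way to express the intent.

```python
import json


def no_delete_policy(bucket):
    """Build a bucket policy document that denies object deletion.

    `bucket` is a hypothetical archive bucket name.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyArchiveDeletes",
                "Effect": "Deny",
                "Principal": "*",
                # Block deleting current objects and older versions.
                "Action": ["s3:DeleteObject", "s3:DeleteObjectVersion"],
                "Resource": f"arn:aws:s3:::{bucket}/*",
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(no_delete_policy("example-cassandra-archive"), indent=2))
```

Keeping the archive in a different account means a compromised or misbehaving production account cannot relax this rule on its own.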
  
Extending to Multi-Region
In production for UK/Eire support
1.    Create cluster in EU
2.    Backup US cluster to S3
3.    Restore backup in EU
4.    Local repair EU cluster
5.    Global repair/join
Take a Boeing 737 on a domestic flight, upgrade it to a 747 by adding more engines, fuel and bigger wings and fly it to Europe without landing it on the way…
[Diagram: US and EU Cassandra rings, each with six nodes (with disks) spread over Zones A, B and C and serving local clients, linked with 100+ms latency; backups go through S3; numbers 1-5 mark the steps above]
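The five steps above only make sense in order, and a later step should not run if an earlier one failed. A minimal sketch of that sequencing follows; the callables passed in are placeholders for whatever tooling performs each step, which the slides do not describe.

```python
STEPS = [
    "Create cluster in EU",
    "Backup US cluster to S3",
    "Restore backup in EU",
    "Local repair EU cluster",
    "Global repair/join",
]


def run_migration(actions):
    """Run the five steps in order and stop at the first one that fails.

    actions maps each step name to a zero-argument callable returning
    True on success. The callables are placeholders for real tooling.
    Returns the list of steps that completed.
    """
    completed = []
    for step in STEPS:
        if not actions[step]():
            break
        completed.append(step)
    return completed


if __name__ == "__main__":
    # Dry run: every step "succeeds" without touching any cluster.
    dry_run = {step: (lambda: True) for step in STEPS}
    for number, step in enumerate(run_migration(dry_run), start=1):
        print(number, step)
```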
  
Tools and Automation
•  Developer and Build Tools
      –  Jira, Perforce, Eclipse, Jenkins, Ivy, Artifactory
      –  Builds, creates .war file, .rpm, bakes AMI and launches
•  Custom Netflix Application Console
      –  AWS Features at Enterprise Scale (hide the AWS security keys!)
      –  Auto Scaler Group is unit of deployment to production
•  Open Source + Support
      –  Apache, Tomcat, Cassandra, Hadoop
      –  Datastax support for Cassandra, AWS support for Hadoop via EMR
•  Monitoring Tools
      –  Alert processing gateway into Pagerduty
      –  AppDynamics – Developer focus for cloud http://appdynamics.com
  
NoSQL Developer Migration
•  Jason Brown @jasobrown
   –  Cassandra from the Trenches
   –  slideshare.net/netflix
•  Mark Atwood, "Guide to NoSQL, redux"
   –  YouTube http://youtu.be/zAbFRiyT3LU
  
Open Sourcing the Netflix PaaS
  
Open Source Strategy
•  Release PaaS Components git-by-git
   –  Source at github.com/netflix
   –  Intros and techniques at techblog.netflix.com
   –  Blog post or new code every week or so
•  Motivations
   –  Give back to Apache licensed OSS community
   –  Motivate, retain, hire top engineers
   –  Create a community that adds features and fixes
  
Current OSS Projects and Posts
Legend: Github / Techblog, Apache Project, Techblog Post
Priam          Exhibitor      Servo
Astyanax       Curator        Autoscaling scripts
CassJMeter     Zookeeper      Honu
Cassandra      EVCache        Circuit Breaker
Aegisthus
  
Takeaway
Netflix has built and deployed a scalable global Platform as a Service.
Key components of the Netflix PaaS are being released as Open Source projects so you can build your own custom PaaS.
http://github.com/Netflix
http://techblog.netflix.com
http://slideshare.net/Netflix
http://www.linkedin.com/in/adriancockcroft
@adrianco #netflixcloud
End of Part 2 of 3
  

Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 

Recently uploaded (20)

DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 

Cloud Architecture Tutorial: Platform Component Architecture

  • 1. Cloud Architecture Tutorial: Platform Component Architecture, Part 2 of 3. QCon London, March 5th, 2012. Adrian Cockcroft, @adrianco #netflixcloud, http://www.linkedin.com/in/adriancockcroft
  • 2. Don’t Do That! A Discussion of Anti-Architecture (written as an Ignite talk)
  • 3. Architecture: Patterns to guide detailed design and construction
  • 4. Anti-Architecture: Constraints that limit detailed design and construction
  • 6. How could that happen?
  • 7. Anatomy of a Failure
  • 8. What I Wanted • Moving to Cassandra as primary data store • We need backups! • We are running on AWS… I want Cassandra backups to S3. Start with full backup, incremental later. Restore to a different Cassandra cluster.
  • 9. Additional Goals: I would like it next week - Keep it simple. No single point of failure. Get once a day full backup working first.
  • 10. Prototype • Created S3 bucket • Carefully figured out a good S3 path hierarchy • Wrote a simple backup script • Added it to cron • …. • Profit! (total time half a day)
  • 11. Now comes the hard part! Restore is trickier, Cassandra is written in Java, programmer from another team takes over… Here’s the S3 bucket, backups are being collected already, please figure out how to restore it. Done by next week perhaps?
  • 12. Days Pass… • Programmer is re-writing backup in Python • Installs Python 2.7 on CentOS, breaks yum • Backup remotely invoked from a central point • Cassandra patched to do incremental backups
  • 13. Weeks Pass… • Python based full backup & restore works! • But only to the Cassandra cluster it came from • Incremental backup works! • Restore not done yet…
  • 14. Cassandra in Production: We do have backups running now, right? We’ll get right on it… I want the production backup restored in test. Oh, didn’t implement that feature yet…
  • 15. Whoops! Production data trashed while setting up backup. Luckily – it was recoverable from elsewhere
  • 16. Months Pass • Python prototype re-written in Java (Priam) • Integrated with other management functions • Decentralized backups again (yay!) • Reliable backups • Restore to test • Not simple • Took too long…
  • 17. Anti-Architecture • Define the things you don’t want • Constrain the outcome • Check that the constraints are being met • … • Profit!
  • 19. Anti-Architecture: Define the space the thing will inhabit (All pictures in this section were found on Google Images)
  • 20. Cloud Architecture Patterns: Where do we start?
  • 21. Goals • Faster – Lower latency than the equivalent datacenter web pages and API calls – Measured as mean and 99th percentile – For both first hit (e.g. home page) and in-session hits for the same user • Scalable – Avoid needing any more datacenter capacity as subscriber count increases – No central vertically scaled databases – Leverage AWS elastic capacity effectively • Available – Substantially higher robustness and availability than datacenter services – Leverage multiple AWS availability zones – No scheduled down time, no central database schema to change • Productive – Optimize agility of a large development team with automation and tools – Leave behind complex tangled datacenter code base (~8 year old architecture) – Enforce clean layered interfaces and re-usable components
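
The "Faster" goal is stated as a mean and a 99th percentile, and the two can tell very different stories. A minimal sketch of both measurements, assuming the nearest-rank definition of a percentile (the deck does not say which definition was used) and a hypothetical `LatencyStats` helper:

```java
import java.util.Arrays;

// Hypothetical helper for illustration; not part of any Netflix library.
class LatencyStats {
    // Arithmetic mean of the samples, in milliseconds.
    static double mean(double[] samplesMs) {
        double sum = 0.0;
        for (double s : samplesMs) {
            sum += s;
        }
        return sum / samplesMs.length;
    }

    // Nearest-rank percentile: the smallest sample that is greater than or
    // equal to at least p percent of all samples (p in the range (0, 100]).
    static double percentile(double[] samplesMs, double p) {
        double[] sorted = samplesMs.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length / 100.0);
        return sorted[Math.max(rank, 1) - 1];
    }
}
```

With 98 requests at 1 ms and two at 500 ms, `mean` gives 10.98 ms while `percentile(samples, 99)` gives 500 ms, which is why the slide asks for both numbers.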
  • 22. Datacenter Anti-Patterns: What do we currently do in the datacenter that prevents us from meeting our goals?
  • 23. Architecture • Software Architecture – The abstractions and interfaces that developers build against • Systems Architecture – The service instances that define availability, scalability • Compose-ability – software architecture that is independent of the systems architecture – decoupled flexible building block components
  • 24. Rewrite from Scratch: Not everything is cloud specific. Pay down technical debt. Robust patterns.
  • 25. Netflix Datacenter vs. Cloud Arch
        Datacenter                      Cloud
        Central SQL Database            Distributed Key/Value NoSQL
        Sticky In-Memory Session        Shared Memcached Session
        Chatty Protocols                Latency Tolerant Protocols
        Tangled Service Interfaces      Layered Service Interfaces
        Instrumented Code               Instrumented Service Patterns
        Fat Complex Objects             Lightweight Serializable Objects
        Components as Jar Files         Components as Services
  • 26. The Central SQL Database • Datacenter has a central database – Everything in one place is convenient until it fails – Customers, movies, history, configuration • Schema changes require downtime. Anti-pattern impacts scalability, availability
  • 27. The Distributed Key-Value Store • Cloud has many key-value data stores – More complex to keep track of, do backups etc. – Each store is much simpler to administer [DBA] – Joins take place in Java code – No schema to change, no scheduled downtime • Mean Latency for Simple Key Lookup Queries – Memcached is dominated by network latency <1ms – Cassandra around one millisecond – Oracle for simple queries is a few milliseconds – DynamoDB around 5ms – SimpleDB replication and REST overheads >10ms
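
"Joins take place in Java code" means the application performs the second lookup that a SQL JOIN would have done. A minimal sketch, assuming two hypothetical key-value stores (viewing history keyed by customer, titles keyed by movie) and using plain maps as stand-ins for the real stores:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical example; the maps stand in for two separate key-value stores.
class AppSideJoin {
    // Joins a customer's viewing history (movie ids) against a separate
    // title store, one key lookup per movie id.
    static List<String> titlesWatched(String customerId,
                                      Map<String, List<String>> historyStore,
                                      Map<String, String> titleStore) {
        List<String> titles = new ArrayList<>();
        List<String> movieIds =
            historyStore.getOrDefault(customerId, Collections.emptyList());
        for (String movieId : movieIds) {
            String title = titleStore.get(movieId); // second key lookup
            if (title != null) {                    // skip dangling references
                titles.add(title);
            }
        }
        return titles;
    }
}
```

The join has to tolerate ids that are missing from the second store, because nothing enforces referential integrity across stores.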
  • 28. The Sticky Session • Datacenter Sticky Load Balancing – Efficient caching for low latency – Tricky session handling code • Encourages concentrated functionality – one service that does everything – Middle tier load balancer had issues in practice. Anti-pattern impacts productivity, availability
  • 29. Shared Session State • Elastic Load Balancer – We don’t use the cookie based routing option – External “session caching” with memcached • More flexible fine grain services – Any instance can serve any request – Works better with auto-scaled instance counts
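
The point of external session caching is that no instance holds session state of its own, so the load balancer can send each request anywhere. A minimal sketch, assuming a hypothetical `SessionStore` class backed by an in-process map as a stand-in for the memcached tier:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Stand-in for an external memcached cluster (hypothetical class).
class SessionStore {
    private final Map<String, String> data = new ConcurrentHashMap<>();

    void put(String sessionId, String state) { data.put(sessionId, state); }

    String get(String sessionId) { return data.get(sessionId); }
}

// A service instance that keeps no session state in its own memory.
class StatelessInstance {
    private final String name;
    private final SessionStore sessions;

    StatelessInstance(String name, SessionStore sessions) {
        this.name = name;
        this.sessions = sessions;
    }

    // Loads the session, appends the new item, and writes it back.
    String handle(String sessionId, String item) {
        String previous = sessions.get(sessionId);
        String updated = previous == null ? item : previous + "," + item;
        sessions.put(sessionId, updated);
        return name + " saw [" + updated + "]";
    }
}
```

Two instances sharing one store see the same session: after instance `a` handles item `x`, instance `b` handling item `y` for the same session reports `[x,y]`.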
  • 30. Chatty Opaque and Brittle Protocols • Datacenter service protocols – Assumed low latency for many simple requests • Based on serializing existing Java objects – Inefficient formats – Incompatible when definitions change. Anti-pattern causes productivity, latency and availability issues
  • 31. Robust and Flexible Protocols • Cloud service protocols – JSR311/Jersey is used for REST/HTTP service calls – Custom client code includes service discovery – Support complex data types in a single request • Apache Avro – Evolved from Protocol Buffers and Thrift – Includes JSON header defining key/value protocol – Avro serialization is half the size and several times faster than Java serialization, more work to code
  • 32. Persisted Protocols • Persist Avro in Memcached – Save space/latency (zigzag encoding, half the size) – New keys are ignored – Missing keys are handled cleanly • Avro protocol definitions – Less brittle across versions – Can be written in JSON or generated from POJOs – It’s hard, needs better tooling
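
The space saving mentioned for zigzag encoding comes from mapping small negative numbers to small unsigned ones, so that a variable-length integer encoding needs fewer bytes. A minimal sketch of the standard 32-bit zigzag transform; the `ZigZag` class and the `varintLength` helper are illustrative and are not Avro's own API:

```java
// Illustrative only; Avro's real encoder lives in its own library.
class ZigZag {
    // Maps 0, -1, 1, -2, 2, ... to 0, 1, 2, 3, 4, ...
    static int encode(int n) {
        return (n << 1) ^ (n >> 31);
    }

    // Inverse of encode.
    static int decode(int z) {
        return (z >>> 1) ^ -(z & 1);
    }

    // Bytes needed by a variable-length encoding that stores 7 bits per byte.
    static int varintLength(int z) {
        int length = 1;
        while ((z & ~0x7F) != 0) {
            z >>>= 7;
            length++;
        }
        return length;
    }
}
```

Without the transform, `-1` is all ones in two's complement and needs 5 bytes as a varint; after `encode` it becomes `1` and fits in a single byte.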
  • 33. Tangled Service Interfaces • Datacenter implementation is exposed – Oracle SQL queries mixed into business logic • Tangled code – Deep dependencies, false sharing • Data providers with sideways dependencies – Everything depends on everything else. Anti-pattern affects productivity, availability
  • 34. Untangled Service Interfaces • New Cloud Code With Strict Layering – Compile against interface jar – Can use Spring runtime binding to enforce – Fine grain services as components • Service interface is the service – Implementation is completely hidden – Can be implemented locally or remotely – Implementation can evolve independently
  • 35. Untangled Service Interfaces. Two layers: • SAL - Service Access Library – Basic serialization and error handling – REST or POJOs defined by data provider • ESL - Extended Service Library – Caching, conveniences, can combine several SALs – Exposes faceted type system (described later) – Interface defined by data consumer in many cases
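
The two layers can be pictured as an interface the provider owns (the SAL) wrapped by a consumer-facing library that adds caching (the ESL). A minimal sketch; `IdentitySal`, `InMemoryIdentitySal` and `CachingIdentityEsl` are hypothetical names invented for this example, not Netflix classes:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical SAL: the only type that callers compile against.
interface IdentitySal {
    String lookupName(long customerId);
}

// One possible implementation; a remote REST client could replace it
// without callers noticing. It counts calls so the caching is visible.
class InMemoryIdentitySal implements IdentitySal {
    int calls = 0;

    public String lookupName(long customerId) {
        calls++;
        return "customer-" + customerId;
    }
}

// Hypothetical ESL: wraps a SAL and adds caching as a convenience.
class CachingIdentityEsl {
    private final IdentitySal sal;
    private final Map<Long, String> cache = new HashMap<>();

    CachingIdentityEsl(IdentitySal sal) {
        this.sal = sal;
    }

    // Calls the SAL only on a cache miss.
    String displayName(long customerId) {
        return cache.computeIfAbsent(customerId, sal::lookupName);
    }
}
```

Because the ESL depends only on the interface, the same consumer code runs against a local stand-in during development and a remote implementation later.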
  • 36. Service Interaction Pattern: Sample Swimlane Diagram
  • 37. Service Architecture Patterns • Internal Interfaces Between Services – Common patterns as templates – Highly instrumented, observable, analytics – Service Level Agreements – SLAs • Library templates for generic features – Instrumented Netflix Base Servlet template – Instrumented generic client interface template – Instrumented S3, SimpleDB, Memcached clients
  • 38. Service Request Instruments Every Step in the Call [diagram: on the CLIENT, request start and end timestamps, outbound serialize start/end, network send and receive, and inbound deserialize start/end timestamps; on the SERVICE, network receive and send, inbound serialize start/end, execute request start/end, and outbound serialize start/end timestamps]
  • 39. Boundary Interfaces • Isolate teams from external dependencies – Fake SAL built by cloud team – Real SAL provided by data provider team later – ESL built by cloud team using faceted objects • Fake data sources allow development to start – e.g. Fake Identity SAL for a test set of customers – Development solidifies dependencies early – Helps external team provide the right interface
  • 40. One Object That Does Everything • Datacenter uses a few big complex objects – Movie and Customer objects are the foundation – Good choice for a small team and one instance – Problematic for large teams and many instances • False sharing causes tangled dependencies – Unproductive re-integration work. Anti-pattern impacting productivity and availability
  • 41. An Interface For Each Component • Cloud uses faceted Video and Visitor – Basic types hold only the identifier – Facets scope the interface you actually need – Each component can define its own facets • No false-sharing and dependency chains – Type manager converts between facets as needed – video.asA(PresentationVideo) for www – video.asA(MerchableVideo) for middle tier
  • 42. Basic Types: Epistemology and Design, by Stan Lanning
  • 43. Avoiding “Level Confusion” [Catataxis] • Business Level Objects (BLO?) – Customers, Movies, etc – Conceptual: Exist only between the ears • Abstract Types – Abstractions that try to model aspects of the business level objects – Often captured by Java interfaces • Implementations – Specific coded implementations of the abstract types – Java class, or a collection of rows in a database…
  • 44. Facets • No single Abstract Type captures everything about a BLO – Different teams see different “facets” • Customer: Account status; Billing history; Viewing history; A/B test assignments • Movie: Availability; Popularity; Synopsis; Cast – Loosely coupled, tightly aligned(!) • All facets for a BLO should inherit from one “basic” type that has minimal behavior
  • 45. Basic Types • Module external interfaces deal in basic types; internal calls are free to use more complex facets • Generic machinery to switch between facets
        Business Level Object       Java Basic Type
        Movie (TV show…)            Video
        Customer                    Visitor
        Category                    VTag
        Country                     ISOCountry
  • 46. Type Manager • Holds the “factory” objects that manage instances of facets – Typically one factory per facet – Factories free to implement any instance management policy they want • Factories register with the Type Manager – callers never interact directly with the factories – Mock managers?
  • 47. Switching Facets • Each Basic Type B implements a method that uses the Type Manager to find facet implementations of the same BLO:
        <T extends B> T asA(Class<T> c)
    • Example:
        Visitor visitor = xxx;
        ABClient abClient = visitor.asA(ABClient.class);
        assert(visitor.equals(abClient));
    • Look Ma, no cast! – Facets are equal, but not necessarily ==.
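
The slides give only the `asA` signature, so the following is a guess at the machinery behind it: a registry of per-facet factories keyed by class, with equality based on the shared ID. This `TypeManager` is a simplified stand-in written for this example and its `register` and `find` methods are assumptions, not the API shown on the later slides:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongFunction;

// Simplified stand-in for the Type Manager: one factory per facet class.
class TypeManager {
    private static final Map<Class<?>, LongFunction<?>> FACTORIES = new HashMap<>();

    static <T extends Visitor> void register(Class<T> facet, LongFunction<T> factory) {
        FACTORIES.put(facet, factory);
    }

    static <T extends Visitor> T find(Class<T> facet, long id) {
        LongFunction<?> factory = FACTORIES.get(facet);
        if (factory == null) {
            throw new IllegalArgumentException("no factory for " + facet);
        }
        return facet.cast(factory.apply(id));
    }
}

// Basic type: holds only the identifier.
class Visitor {
    private final long id;

    Visitor(long id) { this.id = id; }

    long getId() { return id; }

    // Callers switch facets without casting.
    <T extends Visitor> T asA(Class<T> facet) {
        return TypeManager.find(facet, id);
    }

    // Facets of the same business object are equal when their IDs match.
    @Override public boolean equals(Object o) {
        return o instanceof Visitor && ((Visitor) o).id == id;
    }

    @Override public int hashCode() { return Long.hashCode(id); }
}

// A facet that adds behavior one team cares about (hypothetical rule).
class ABClient extends Visitor {
    ABClient(long id) { super(id); }

    String testCell() { return getId() % 2 == 0 ? "A" : "B"; }
}
```

After `TypeManager.register(ABClient.class, ABClient::new)`, calling `new Visitor(42).asA(ABClient.class)` returns a distinct object that is `equals` to the original visitor, matching the "equal, but not necessarily ==" remark.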
  • 48. IDs! (huh) What are they good for? • IDs exist because implementations need to externalize objects and maintain their identity – Persist in a DB, or talk to a remote service – Different implementations of a type of BLO model the same object iff they have the same ID – Basic Types use IDs to manage facets, determine equality, etc
  • 49. Converting IDs ↔ Objects
        Long id = xx;
        MyVisitor visitor =
            TypeManager.findObject(Visitor.class, id)
                       .asA(MyVisitor.class);
        assert(id.equals(visitor.getId()));
        // Or more efficiently…
        MyVisitor visitor2 =
            TypeManager.findObject(Visitor.class, id, MyVisitor.class);
        // There are also efficient bulk conversion methods
        Collection<Long> ids = xxx;
        List<MyVisitor> visitors =
            TypeManager.findObjects(Visitor.class, ids, MyVisitor.class);
  • 50. Stan’s Soap Box • Don’t pass around IDs when you mean to refer to the BLO; that is Level Confusion • Using Basic Types helps the compiler help you; compile time problems are better than run time problems • More readable by people, but beware that asA operations may be a lot of work • (Is this a way to approximate multiple-inheritance in Java?)
  • 51. Software Architecture Patterns • Object Models – Basic and derived types, facets, serializable – Pass by reference within a service – Pass by value between services • Computation and I/O Models – Service Execution using Best Effort / Futures – Common thread pool management – Circuit breakers to manage and contain failures
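
A circuit breaker contains a failure by refusing to call a dependency that keeps failing and serving a fallback instead. The deck does not show Netflix's implementation, so this is a minimal sketch under assumed rules: open after a fixed number of consecutive failures, stay open for a fixed interval, then allow one trial call. The clock is passed in as a parameter to keep the example deterministic:

```java
import java.util.function.Supplier;

// Hypothetical, simplified circuit breaker for illustration only.
class CircuitBreaker {
    private final int failureThreshold;
    private final long openMillis;
    private int consecutiveFailures = 0;
    private long openedAt = -1; // -1 means the circuit is closed

    CircuitBreaker(int failureThreshold, long openMillis) {
        this.failureThreshold = failureThreshold;
        this.openMillis = openMillis;
    }

    synchronized <T> T call(Supplier<T> primary, Supplier<T> fallback, long nowMillis) {
        boolean open = openedAt >= 0 && nowMillis - openedAt < openMillis;
        if (open) {
            return fallback.get(); // fail fast, do not touch the dependency
        }
        try {
            T result = primary.get();
            consecutiveFailures = 0; // success closes the circuit
            openedAt = -1;
            return result;
        } catch (RuntimeException e) {
            consecutiveFailures++;
            if (consecutiveFailures >= failureThreshold) {
                openedAt = nowMillis; // open, or re-open after a failed trial
            }
            return fallback.get();
        }
    }
}
```

While the circuit is open the caller pays no latency for the broken dependency, which is what keeps one failing service from tying up the shared thread pool.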
  • 52. Model Driven Architecture • Traditional Datacenter Practices – Lots of unique hand-tweaked systems – Hard to enforce patterns – Some use of Puppet to automate changes • Model Driven Cloud Architecture – Perforce/Ivy/Jenkins based builds for everything – Every production instance is a pre-baked AMI – Every application is managed by an Autoscaler. Every change is a new AMI
  • 53. Netflix Cloud Platform [layer diagram; labels in source order: Netflix Applications; Netflix Cloud Platform / PaaS; AWS Specific, Partner, Netflix Legacy; Code, Interfaces, Datacenter; AWS Services, Partner Services, Services]
  • 54. Netflix PaaS Principles • Maximum Functionality – Developer productivity and agility • Leverage as much of AWS as possible – AWS is making huge investments in features/scale • Interfaces that isolate Apps from AWS – Avoid lock-in to specific AWS API details • Portability is a long term goal – Gets easier as other vendors catch up with AWS
  • 55. Netflix Global PaaS • Architecture Features and Overview • Portals and Explorers • Platform Services • Platform APIs • Platform Frameworks • Persistence • Scalability Benchmark
  • 56. Global PaaS? Toys are nice, but this is the real thing… • Supports all AWS Availability Zones and Regions • Supports multiple AWS accounts {test, prod, etc.} • Cross Region/Acct Data Replication and Archiving • Internationalized, Localized and GeoIP routing • Security is fine grain, dynamic AWS keys • Autoscaling to thousands of instances • Monitoring for millions of metrics • Productive for 100s of developers on one product • 23M+ users USA, Canada, Latin America, UK, Eire
  • 57. Basic PaaS Entities • AWS Based Entities – Instances and Machine Images, Elastic IP Addresses – Security Groups, Load Balancers, Autoscale Groups – Availability Zones and Geographic Regions • Netflix PaaS Entities – Applications (registered services) – Clusters (versioned Autoscale Groups for an App) – Properties (dynamic hierarchical configuration)
  • 58. Core PaaS Services • AWS Based Services – S3 storage, to 5TB files, parallel multipart writes – SQS – Simple Queue Service. Messaging layer. • Netflix Based Services – EVCache – memcached based ephemeral cache – Cassandra – distributed data store • External Services – GeoIP Lookup interfaced to a vendor – Keystore HSM in Netflix Datacenter
  • 59. Instance Architecture [stack diagram: Linux Base AMI (CentOS or Ubuntu); Optional Apache frontend, memcached, non-java apps; Java (JDK 6 or 7); Tomcat; AppDynamics appagent monitoring; AppDynamics machineagent; Epic; Log rotation to S3; GC and thread dump logging; Application servlet, base server, platform, interface jars for dependent services; Healthcheck, status servlets, JMX interface, Servo autoscale]
  • 60. Security Architecture • Instance Level Security baked into base AMI – Login: ssh only allowed via portal (not between instances) – Each app type runs as its own userid app{test|prod} • AWS Security, Identity and Access Management – Each app has its own security group (firewall ports) – Fine grain user roles and resource ACLs • Key Management – AWS Keys dynamically provisioned, easy updates – High grade app specific key management support
  • 61. Core Platform Frameworks and APIs
  • 62. Portals and Explorers • Netflix Application Console (NAC) – Primary AWS provisioning/config interface • AWS Usage Analyzer – Breaks down costs by application and resource • Cassandra Explorer – Browse clusters, keyspaces, column families • Base Server Explorer – Browse service endpoints configuration, perf
  • 63.
  • 64.
  • 65. AWS Usage for test, carefully omitting any $ numbers…
  • 68. Platform Services • Discovery – service registry for “Applications” • Introspection – Entrypoints • Cryptex – Dynamic security key management • Geo – Geographic IP lookup • Platformservice – Dynamic property configuration • Localization – manage and lookup local translations • Evcache – ephemeral volatile cache • Cassandra – Cross zone/region distributed data store • Zookeeper – Distributed Coordination (Curator) • Various proxies – access to old datacenter stuff
  • 69. Introspection - Entrypoints • REST API for tools, apps, explorers, monkeys… – E.g. GET /REST/v1/instance/$INSTANCE_ID • AWS Resources – Autoscaling Groups, EIP Groups, Instances • Netflix PaaS Resources – Discovery Applications, Clusters of ASGs, History
  • 70. Entrypoints Queries: MongoDB used for low traffic complex queries against complex objects
        Description                                                               Range expression
        Find all active instances.                                                all()
        Find all instances associated with a group name.                          %(cloudmonkey)
        Find all instances associated with a discovery group.                     /^cloudmonkey$/discovery()
        Find all auto scale groups with no instances.                             asg(),-has(INSTANCES;asg())
        How many instances are not in an auto scale group?                        count(all(),-info(eval(INSTANCES;asg())))
        What groups include an instance?                                          *(i-4e108521)
        What auto scale groups and elastic load balancers include an instance?    filter(TYPE;asg,elb;*(i-4e108521))
        What instance has a given public ip?                                      filter(PUBLIC_IP;174.129.188.{0..255};all())
  • 71. Metrics Framework • System and Application – Collection, Aggregation, Querying and Reporting – Non-blocking logging, avoids log4j lock contention – Honu-Streaming -> S3 -> EMR -> Hive • Performance, Robustness, Monitoring, Analysis – Tracers, Counters – explicit code instrumentation log – Real Time Tracers/Counters – SLA – service level response time percentiles – Servo annotated JMX extract to Cloudwatch • Latency Monkey Infrastructure – Inject random delays into service responses
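
Injecting random delays into service responses can be sketched as a wrapper around the call. The deck gives no detail on how Latency Monkey chooses its delays, so the probability and maximum delay below are assumed parameters and `LatencyInjector` is a hypothetical class:

```java
import java.util.Random;
import java.util.function.Supplier;

// Hypothetical delay injector; the parameters are assumptions.
class LatencyInjector {
    private final Random random;
    private final double probability; // fraction of calls to delay, 0.0 to 1.0
    private final int maxDelayMillis; // upper bound on an injected delay

    LatencyInjector(Random random, double probability, int maxDelayMillis) {
        this.random = random;
        this.probability = probability;
        this.maxDelayMillis = maxDelayMillis;
    }

    // Returns 0 for an undelayed call, otherwise a delay in [1, maxDelayMillis].
    int nextDelayMillis() {
        if (random.nextDouble() >= probability) {
            return 0;
        }
        return 1 + random.nextInt(maxDelayMillis);
    }

    // Runs the service call after sleeping for the chosen delay, if any.
    <T> T call(Supplier<T> service) {
        int delay = nextDelayMillis();
        if (delay > 0) {
            try {
                Thread.sleep(delay);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt
            }
        }
        return service.get();
    }
}
```

Running callers against such a wrapper is one way to check that their timeouts and fallbacks behave before a real slowdown happens.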
  • 72. Configuration Management • NetflixConfiguration – Validation Framework – Sitewide Properties Explorer • PlatformService • Mapping Service • ZooKeeper (Curator) • InstanceIdentity
  • 73. Interprocess Communication • Discovery Service registry for “applications” – “here I am” call every 30s, drop after 3 missed – “where is everyone” call – Redundant, distributed, moving to Zookeeper • NIWS – Netflix Internal Web Service client – Software Middle Tier Load Balancer – Failure retry moves to next instance – Many options for encoding, etc.
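
The two halves of this slide fit together: the registry drops instances that stop sending heartbeats, and the client walks the remaining list until one answers. A minimal sketch using the slide's 30 second interval and three missed heartbeats; the class names, the exact expiry rule, and the explicit clock parameter are assumptions made for this example:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

// Hypothetical registry; expiry is "more than three intervals since last seen".
class DiscoveryRegistry {
    static final long HEARTBEAT_MILLIS = 30_000L; // "every 30s" from the slide
    static final int MISSED_BEFORE_DROP = 3;      // "drop after 3 missed"

    private final Map<String, Long> lastSeen = new TreeMap<>();

    // "here I am"
    void heartbeat(String instance, long nowMillis) {
        lastSeen.put(instance, nowMillis);
    }

    // "where is everyone": instances seen recently enough, in name order
    List<String> instances(long nowMillis) {
        List<String> live = new ArrayList<>();
        for (Map.Entry<String, Long> e : lastSeen.entrySet()) {
            if (nowMillis - e.getValue() <= HEARTBEAT_MILLIS * MISSED_BEFORE_DROP) {
                live.add(e.getKey());
            }
        }
        return live;
    }
}

// Hypothetical client: a failed request moves on to the next instance.
class RetryingClient {
    static <T> T call(List<String> instances, Function<String, T> request) {
        RuntimeException last = new IllegalStateException("no instances available");
        for (String instance : instances) {
            try {
                return request.apply(instance);
            } catch (RuntimeException e) {
                last = e; // remember the failure and try the next instance
            }
        }
        throw last;
    }
}
```

An instance whose last heartbeat is more than 90 seconds old disappears from the list, so clients stop being routed to it without any central health checker.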
  • 74. Security  Key  Management   •  AKMS   –  Dynamic  Key  Management  interface   –  Update  AWS  keys  at  runMme,  no  restart   –  All  keys  stored  securely,  none  on  disk  or  in  AMI   •  Cryptex  -­‐  Flexible  key  store   –  Low  grade  keys  processed  in  client   –  Medium  grade  keys  processed  by  Cryptex  service   –  High  grade  keys  processed  by  hardware  (Ingrian)  
  • 75. AWS  Persistence  Services   •  SimpleDB   –  Got  us  started,  migrated  to  Cassandra  now   –  NFSDB  -­‐  Instrumented  wrapper  library   –  Domain  and  Item  sharding  (workarounds)   •  S3   –  Upgraded/Instrumented  JetS3t  based  interface   –  Supports  mulMpart  upload  and  5TB  files   –  Global  S3  endpoint  management  
  • 76. Aside:  Adrian’s  Rant  on  CAP  Theorem   Choose  Consistency  or  Availability  when  ParAAoned   •  Instances  and  Networks  will  fail   •  Network  failure  =  ParMMon  “P”  is  a  given   •  Distributed  Systems:  two  choices  –  CP  or  AP   •  “Vendor  claims  CA”   –  Usually  they  mean  available  when  instances  fail   •  Master-­‐Slave  =  Consistent  when  ParMMoned   –  You  can’t  write  unless  you  can  see  the  master   •  No-­‐Master  =  Available  when  ParMMoned   –  Writes  proceed,  conflicts  will  be  patched  up  later  
  • 77. What Netflix Needed from NoSQL
  • 78. Basic  Requirements   •  Supports  running  on  Amazon  EC2   •  Supports  Amazon  Availability  Zones   •  Low  latency,  low  latency  variance   •  High  and  scalable  read  and  write  throughput   •  Large  and  scalable  capacity,  no  external  sharding   •  “AP”  Eventually  Consistent   •  Data  integrity  checks  and  repairs   •  Online  Snapshot  Backup,  Restore/Rollback  
  • 79. Scenario – Immediate Read after Write   Q1: Is routing and replication zone aware?   [Diagram: a TV Device sends “New Favorite” and reads “New Favorites List” through a Round Robin Load Balancer to API (zone A) and API (zone B); “Append New Favorite” goes to Favorites (zone A), “New Favorites List” is read from Favorites (zone B); Replication links the two Favorites stores]
  • 80. Network Partition   Q2: What happens next?   [Diagram: same layout as the previous slide (TV Device, Round Robin Load Balancer, API and Favorites store in zone A and zone B, “Append New Favorite”, “New Favorites List”), but with No Replication between the two Favorites stores]
  • 81. Network Partition   Q3: Supports Append vs. Read/Modify/Write?   [Diagram: TV Device sends “New Favorite” through a Round Robin Load Balancer to API (zone A) and API (zone B); the API performs RMW, reading the “Old Favorites List” and writing a “New Favorites List” to Favorites (zone A) and Favorites (zone B); Replication links the two Favorites stores]
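The difference Q3 is probing can be sketched in a few lines. This is a simplified model under stated assumptions, not how any particular store resolves conflicts: each zone accepts one update during the partition, "append" is modeled as merging the additions from both zones, and "read/modify/write" is modeled as overwriting the whole list with the newest timestamp winning. All names are hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical model of two conflict outcomes after a partition heals. */
class FavoritesMergeSketch {
    /** Append model: every added item is kept, so healing is a union. */
    static Set<String> mergeAppends(Set<String> zoneA, Set<String> zoneB) {
        Set<String> merged = new LinkedHashSet<>(zoneA);
        merged.addAll(zoneB);
        return merged;
    }

    /** Read/modify/write model: the whole list is one value, newest write wins. */
    static Set<String> lastWriterWins(Set<String> zoneA, long tsA,
                                      Set<String> zoneB, long tsB) {
        return tsA >= tsB ? zoneA : zoneB;
    }

    public static void main(String[] args) {
        // Both zones started from {m1}; each accepted one new favorite while partitioned.
        Set<String> a = new LinkedHashSet<>(Arrays.asList("m1", "m2"));
        Set<String> b = new LinkedHashSet<>(Arrays.asList("m1", "m3"));
        System.out.println("append: " + mergeAppends(a, b));         // keeps m2 and m3
        System.out.println("rmw:    " + lastWriterWins(a, 1, b, 2)); // m2 is lost
    }
}
```

With appends both new favorites survive the partition; with whole-list overwrites the older write silently disappears, which is why the question matters for an "AP" store.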
  • 82. Silent Data Corruption   Q4: How is it detected and corrected?   [Diagram: same layout as slide 79 (TV Device, Round Robin Load Balancer, API and Favorites store in zone A and zone B, “Append New Favorite”, “New Favorites List”), with Replication corrupted on disk or via network]
  • 83. Netflix Platform Persistence   •  Ephemeral Volatile Cache – evcache   –  Discovery-aware memcached based backend   –  Client abstractions for zone aware replication   –  Option to write to all zones, fast read from local   •  Cassandra   –  Highly available and scalable (more later…)   •  MongoDB   –  Complex object/query model for small scale use   •  MySQL   –  Hard to scale, legacy and small relational models
  • 84. Why Cassandra?   •  We value Availability over Consistency – AP   –  Cassandra is a Java distributed systems toolkit   •  We have a building full of Java engineers   –  Riak is in Erlang – a blessing and a curse…   •  We want FOSS + Support   –  Voldemort doesn’t have a support model   •  Writes are intrinsically harder than reads   –  Hbase is CP optimized for reads & single namenode issues   •  Cassandra works, running ~55 clusters   –  Step by step into full production over the last year
  • 85. Priam – Cassandra Automation   Available at http://github.com/netflix   •  Netflix Platform Tomcat Code   •  Zero touch auto-configuration   •  State management for Cassandra JVM   •  Token allocation and assignment   •  Broken node auto-replacement   •  Full and incremental backup to S3   •  Restore sequencing from S3   •  Grow/Shrink Cassandra “ring”
  • 86. Astyanax   Available at http://github.com/netflix   •  Cassandra java client   •  API abstraction on top of Thrift protocol   •  “Fixed” Connection Pool abstraction (vs. Hector)   –  Round robin with Failover   –  Retry-able operations not tied to a connection   –  Netflix PaaS Discovery service integration   –  Host reconnect (fixed interval or exponential backoff)   –  Token aware to save a network hop – lower latency   –  Latency aware to avoid compacting/repairing nodes – lower variance   •  Batch mutation: set, put, delete, increment   •  Simplified use of serializers via method overloading (vs. Hector)   •  ConnectionPoolMonitor interface for counters and tracers   •  Composite Column Names replacing deprecated SuperColumns
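"Token aware to save a network hop" means the client works out which node owns a key and sends the request there directly. A minimal sketch of that lookup follows, assuming the usual ring rule that a key belongs to the first node whose token is greater than or equal to the key's token, wrapping around at the end. `TokenRingSketch` is a hypothetical name and this is not Astyanax's connection pool code.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical token ring lookup; illustrates token-aware routing only. */
class TokenRingSketch {
    // Node tokens kept in sorted order so ownership is a ceiling lookup.
    private final TreeMap<Long, String> ring = new TreeMap<>();

    void addNode(long token, String host) {
        ring.put(token, host);
    }

    /** Returns the host owning keyToken; the ring must contain at least one node. */
    String ownerOf(long keyToken) {
        Map.Entry<Long, String> owner = ring.ceilingEntry(keyToken);
        if (owner == null) {
            owner = ring.firstEntry(); // past the highest token: wrap to the start
        }
        return owner.getValue();
    }

    public static void main(String[] args) {
        TokenRingSketch ring = new TokenRingSketch();
        ring.addNode(100, "node-a");
        ring.addNode(200, "node-b");
        ring.addNode(300, "node-c");
        System.out.println(ring.ownerOf(150)); // node-b
        System.out.println(ring.ownerOf(301)); // node-a (wraps around)
    }
}
```

A client that is not token aware sends the request to any node, which then forwards it to the owner; the lookup above is what removes that extra hop.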
  • 87. Initializing Astyanax
      // Configuration either set in code or nfastyanax.properties
      platform.ListOfComponentsToInit=LOGGING,APPINFO,DISCOVERY
      netflix.environment=test
      default.astyanax.readConsistency=CL_QUORUM
      default.astyanax.writeConsistency=CL_QUORUM
      MyCluster.MyKeyspace.astyanax.servers=127.0.0.1
      // Must initialize platform for discovery to work
      NFLibraryManager.initLibrary(PlatformManager.class, props, false, true);
      NFLibraryManager.initLibrary(NFAstyanaxManager.class, props, true, false);
      // Open a keyspace instance
      Keyspace keyspace = KeyspaceFactory.openKeyspace("MyCluster", "MyKeyspace");
  • 88. Astyanax Query Example   Paginate through all columns in a row
      ColumnList<String> columns;
      int pageSize = 10;
      try {
          RowQuery<String, String> query = keyspace
              .prepareQuery(CF_STANDARD1)
              .getKey("A")
              .setIsPaginating()
              .withColumnRange(new RangeBuilder().setMaxSize(pageSize).build());
          // Each execute() returns the next page; an empty page ends the loop
          while (!(columns = query.execute().getResult()).isEmpty()) {
              for (Column<String> c : columns) {
                  // process column c
              }
          }
      } catch (ConnectionException e) {
          // handle or log the connection failure
      }
  • 89. Data Migration to Cassandra
  • 90. Distributed Key-Value Stores   •  Cloud has many key-value data stores   –  More complex to keep track of, do backups etc.   –  Each store is much simpler to administer   –  Joins take place in java code   •  No schema to change, no scheduled downtime   •  Latency for typical queries   –  Memcached is dominated by network latency <1ms   –  Cassandra takes a few milliseconds   –  SimpleDB replication and REST auth overheads >10ms   [Graphic label: DBA]
  • 91. Multi-Regional Data Replication   •  IR Framework – Datacenter Item Replicator   –  Built in 2009, first step to the cloud   –  Oracle to SimpleDB or Cassandra via poll and push   –  Return updates to Oracle via SQS message queue   •  SimpleDB or S3 to Cassandra   –  Data migration tool for global Netflix   •  Global SimpleDB and S3 Replication   –  Cross region async updates USA to Europe
  • 92. Transitional Steps   •  Bidirectional Replication   –  Oracle to SimpleDB   –  Queued reverse path using SQS   –  Backups remain in Datacenter via Oracle   •  New Cloud-Only Data Sources   –  Cassandra based   –  No replication to Datacenter   –  Backups performed in the cloud
  • 93. [Architecture diagram; recoverable labels: API, AWS EC2, Front End Load Balancer, Discovery Service, API Proxy, API etc., Load Balancer, Component Services, API, SQS, Oracle (three instances), Cassandra, memcached, Replication, memcached, EC2 Internal Disks, Netflix Data Center, S3, SimpleDB]
  • 94. Cutting the Umbilical   •  Transition Oracle Data Sources to Cassandra   –  Offload Datacenter Oracle hardware   –  Free up capacity for growth of remaining services   •  Transition SimpleDB+Memcached to Cassandra   –  Primary data sources that need backup   –  Keep simplest small use cases for now   •  New challenges   –  Backup, restore, archive, business continuity   –  Business Intelligence integration
  • 95. [Architecture diagram; recoverable labels: API, AWS EC2, Front End Load Balancer, Discovery Service, API Proxy, Load Balancer, Component Services, API, memcached, Cassandra, EC2 Internal Disks, Backup, S3, SimpleDB]
  • 96. High Availability   •  Cassandra stores 3 local copies, 1 per zone   –  Synchronous access, durable, highly available   –  Read/Write One fastest, least consistent - ~1ms   –  Read/Write Quorum 2 of 3, consistent - ~3ms   •  AWS Availability Zones   –  Separate buildings   –  Separate power etc.   –  Fairly close together
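The "Quorum 2 of 3, consistent" versus "One ... least consistent" trade-off follows from simple arithmetic: a read is guaranteed to overlap the latest acknowledged write when the number of replicas read plus the number written exceeds the replication factor. A small sketch of that rule, with hypothetical names:

```java
/** Hypothetical helper illustrating quorum arithmetic for N replicas. */
class QuorumSketch {
    /** Majority of the replicas: 2 of 3, 3 of 5, and so on. */
    static int quorum(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    /** True when every read set must share at least one replica with every write set. */
    static boolean readOverlapsWrite(int replicas, int readCount, int writeCount) {
        return readCount + writeCount > replicas;
    }

    public static void main(String[] args) {
        int n = 3; // 3 local copies, 1 per zone
        int q = quorum(n);
        System.out.println("quorum of " + n + " = " + q);
        System.out.println("quorum/quorum overlaps: " + readOverlapsWrite(n, q, q));
        System.out.println("one/one overlaps: " + readOverlapsWrite(n, 1, 1));
    }
}
```

With three copies, quorum reads and writes (2 + 2 > 3) always intersect, while reading and writing at level One (1 + 1 <= 3) can miss the newest value until replication catches up.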
  • 97. “Traditional” Cassandra Write Data Flows   Single Region, Multiple Availability Zone, Not Token Aware   1.  Client writes to any Cassandra node   2.  Coordinator node replicates to nodes and zones   3.  Nodes return ack to coordinator   4.  Coordinator returns ack to client   5.  Data written to internal commit log disk (no more than 10 seconds later)   •  If a node goes offline, hinted handoff completes the write when the node comes back up.   •  Requests can choose to wait for one node, a quorum, or all nodes to ack the write   •  SSTable disk writes and compactions occur asynchronously   [Diagram: non-token-aware clients and a ring of six Cassandra nodes with disks, two each in zones A, B and C, annotated with step numbers 1 to 5]
  • 98. Astyanax - Cassandra Write Data Flows   Single Region, Multiple Availability Zone, Token Aware   1.  Client writes to nodes and zones   2.  Nodes return ack to client   3.  Data written to internal commit log disks (no more than 10 seconds later)   •  If a node goes offline, hinted handoff completes the write when the node comes back up.   •  Requests can choose to wait for one node, a quorum, or all nodes to ack the write   •  SSTable disk writes and compactions occur asynchronously   [Diagram: token-aware clients and a ring of six Cassandra nodes with disks, two each in zones A, B and C, annotated with step numbers 1 to 3]
  • 99. Data Flows for Multi-Region Writes   Token Aware, Consistency Level = Local Quorum   1.  Client writes to local replicas   2.  Local write acks returned to client, which continues when 2 of 3 local nodes are committed   3.  Local coordinator writes to remote coordinator   4.  When data arrives, remote coordinator node acks and copies to other remote zones   5.  Remote nodes ack to local coordinator   6.  Data flushed to internal commit log disks (no more than 10 seconds later)   •  If a node or region goes offline, hinted handoff completes the write when the node comes back up.   •  Nightly global compare and repair jobs ensure everything stays consistent.   [Diagram: US clients and EU clients, each with a ring of six Cassandra nodes with disks in zones A, B and C; 100+ms latency between the two regions; annotated with step numbers 1 to 6]
  • 100. Remote Copies   •  Cassandra duplicates across AWS regions   –  Asynchronous write, replicates at destination   –  Doesn’t directly affect local read/write latency   •  Global Coverage   –  Business agility   –  Follow AWS…   •  Local Access   –  Better latency   –  Fault Isolation
  • 101. Cassandra Backup   •  Full Backup   –  Time based snapshot   –  SSTable compress -> S3   •  Incremental   –  SSTable write triggers compressed copy to S3   •  Archive   –  Copy cross region   [Diagram: ring of Cassandra nodes writing to an S3 Backup store]
  • 102. Cassandra Restore   •  Full Restore   –  Replace previous data   •  New Ring from Backup   –  New name old data   •  Scripted   –  Create new instances   –  Parallel load - fast   [Diagram: ring of Cassandra nodes loading from an S3 Backup store]
  • 103. Cassandra Online Analytics   •  Brisk = Hadoop + Cass   –  “Cassandra Enterprise”   –  Use split Brisk ring   –  Size each separately   •  Direct Access   –  Keyspaces   –  Hive/Pig/Map-Reduce   –  Hdfs as a keyspace   –  Distributed namenode   [Diagram: ring of Cassandra nodes, two of them labelled Brisk, with an S3 Backup store]
  • 104. ETL for Cassandra   •  Data is de-normalized over many clusters!   •  Too many to restore from backups for ETL   •  Solution – read backup files using Hadoop   •  Aegisthus   –  http://techblog.netflix.com/2012/02/aegisthus-bulk-data-pipeline-out-of.html   –  High throughput raw SSTable processing   –  Re-normalizes many clusters to a consistent view   –  Extract, Transform, then Load into Teradata
  • 105. Cassandra Archive   Appropriate level of paranoia needed…   •  Archive could be un-readable   –  Restore S3 backups weekly from prod to test, and daily ETL   •  Archive could be stolen   –  PGP Encrypt archive   •  AWS East Region could have a problem   –  Copy data to AWS West   •  Production AWS Account could have an issue   –  Separate Archive account with no-delete S3 ACL   •  AWS S3 could have a global problem   –  Create an extra copy on a different cloud vendor…
  • 106. Extending to Multi-Region   In production for UK/Eire support   1.  Create cluster in EU   2.  Backup US cluster to S3   3.  Restore backup in EU   4.  Local repair EU cluster   5.  Global repair/join   Take a Boeing 737 on a domestic flight, upgrade it to a 747 by adding more engines, fuel and bigger wings and fly it to Europe without landing it on the way…   [Diagram: US clients and EU clients, each with a ring of six Cassandra nodes with disks in zones A, B and C; 100+ms latency between the regions; S3 between them; annotated with step numbers 1 to 5]
  • 107. Tools and Automation   •  Developer and Build Tools   –  Jira, Perforce, Eclipse, Jenkins, Ivy, Artifactory   –  Builds, creates .war file, .rpm, bakes AMI and launches   •  Custom Netflix Application Console   –  AWS Features at Enterprise Scale (hide the AWS security keys!)   –  Auto Scaler Group is unit of deployment to production   •  Open Source + Support   –  Apache, Tomcat, Cassandra, Hadoop   –  Datastax support for Cassandra, AWS support for Hadoop via EMR   •  Monitoring Tools   –  Alert processing gateway into Pagerduty   –  AppDynamics – Developer focus for cloud http://appdynamics.com
  • 108. NoSQL Developer Migration   •  Jason Brown @jasobrown   –  Cassandra from the Trenches   –  slideshare.net/netflix   •  Mark Atwood, “Guide to NoSQL, redux”   –  YouTube http://youtu.be/zAbFRiyT3LU
  • 109. Open Sourcing the Netflix PaaS
  • 110. Open Source Strategy   •  Release PaaS Components git-by-git   –  Source at github.com/netflix   –  Intros and techniques at techblog.netflix.com   –  Blog post or new code every week or so   •  Motivations   –  Give back to Apache licensed OSS community   –  Motivate, retain, hire top engineers   –  Create a community that adds features and fixes
  • 111. Current OSS Projects and Posts   Legend: Github / Techblog, Apache Project, Techblog Post   Projects: Priam, Exhibitor, Servo, Astyanax, Curator, Autoscaling scripts, CassJMeter, Zookeeper, Honu, Cassandra, EVCache, Circuit Breaker, Aegisthus
  • 112. Takeaway   Netflix has built and deployed a scalable global Platform as a Service.   Key components of the Netflix PaaS are being released as Open Source projects so you can build your own custom PaaS.   http://github.com/Netflix   http://techblog.netflix.com   http://slideshare.net/Netflix   http://www.linkedin.com/in/adriancockcroft   @adrianco #netflixcloud   End of Part 2 of 3