
StackWatch: A prototype CloudWatch service for CloudStack

Presented at CloudStack Collab 2014 in Denver. The presentation explores adding a CloudWatch-like service to Apache CloudStack, along with some of the interesting design decisions and their consequences.

  1. StackWatch: Monitoring-as-a-service for Apache CloudStack
     (actually an exploration around the edges of Apache CloudStack)
     @chiradeep
  2. Disclaimer
     • Developer talk
     • No demo
     • Designed to make you think
  3. Agenda
     • Introduction to StackWatch
     • The design of StackWatch
     • Lessons learned
     • Tips for building your own service
  4. What is StackWatch?
     Monitoring-as-a-service for the users of a CloudStack cloud (like AWS CloudWatch)
     ✔ Store metrics at high fidelity
     ✔ Retrieve metric statistics
     ✔ Graph metrics
     ✔ Alarms on threshold crossings
     ✔ Alarm and metric manipulation
     ✔ Large scale (>100k metrics / min)
     ✔ Multi-tenant
  5. StackWatch Motivation
     • The AutoScale implementation in Apache CloudStack is adequate but limiting
     • It either
        – requires NetScaler as a load balancer
       OR
        – uses hypervisor metrics (and still requires HAProxy)
  6. AutoScale potential improvements
     • Use application metrics
        – Better indication of application load
     • Scalable implementation
        – No polling
        – Alarm driven
     • Fidelity to the AWS AutoScale API
     • Independent of the LB
     • Flexible scaling action (not just add / remove VM)
     Works better with Monitoring-as-a-Service
  7. Non-functional requirements
     • Develop in a different (i.e., not Java) language
        – More on this later
     • Testable independent of ACS
        – Faster development time
     • Limited changes to ACS
        – The master branch is hard to keep up with, especially if you code just a few hours every week!
  8. Digression
     • Apache CloudStack can be intimidating
        – Lots of features baked in
        – Limited test cases
        – Requirements behind every logic point
        – Well-defined extensibility, but hard to go beyond the plugin API
     • Java is the lingua franca
        – What if I want to use something else?
  9. The Narrow Waist Model of the Internet
     [Diagram labels: Innovation / Innovation / Hard to change]
  10. Apache CloudStack Narrow Waist
     [Diagram, with ACS Core at the waist]
     • Technology (innovation): XenServer, KVM, Hyper-V, vSphere, NFS, iSCSI, FC, VLAN, Overlay, CPU, vCenter, libvirt, WMI, SDN
     • ACS Core (harder to change)
     • Applications (innovation): StackMate, DBaaS, LBaaS, MRaaS, PaaS, FWaaS, Analytics*aaS, MLaaS
     Where do StackWatch and AutoScale belong?
     Should network services be applications?
  11. Example: The VR model inside-out
     Current model
     [Diagram labels: ACS: 1. create network, 2. create VR; Hypervisor: 3. create VR VM; VR: 4. program rules]
     • Easier to consume
     • Just works
     • Harder to change
     • Harder to test VR operations in isolation
     • Requires developer discipline to not leak concerns between internal layers
     vs. Inside out
     [Diagram labels: ACS: 1. create network; VR Service: 2. create VR VM; Hypervisor: 3. create VR VM; VR: 4. program rules]
     • Easier to change
     • Requires more work from the consumer (additional orchestration)
     • Operational challenges (HA, state storage, failure model)
  12. Micro Services?
     "a particular way of designing software applications as suites of independently deployable services."
     "common characteristics around organization around business capability, automated deployment, intelligence in the endpoints, and decentralized control of languages and data."
     - Martin Fowler, http://martinfowler.com/articles/microservices.html
  13. Monolith vs. Microservice
     • Monolith:
        – Change is hard (-)
        – Service automatically gets horizontal scale, HA, throttling, monitoring (+)
        – Easy refactoring (+)
     • Microservice:
        – Easier to change/rewrite, test and deploy (+)
        – Developer falls for the fallacies of distributed computing (-)
           • http://en.wikipedia.org/wiki/Fallacies_of_Distributed_Computing
        – Fuzzy service boundaries (-)
        – Service boundaries are harder to change / refactor (-)
  14. AWS Example
     • Service boundaries are defined by API endpoints.
     • Separate API endpoints for
        – EC2
        – AutoScale
        – CloudWatch
        – ELB
        – But not VPC, Elastic IP, etc.
  15. StackWatch Architecture
     [Diagram labels]
     • Components: CloudStack, StackWatch, Riemann, OpenTSDB
     • API calls into StackWatch: PutMetrics / CreateAlarm / GetStats
     • StackWatch Cache: Credential Cache
     • StackWatch DB: Alarms, Metric Info, AlarmHistory
     • MetricData + Alarm Cfg →
     • ← Threshold Alarm
     • GetUser (call to CloudStack)
     ✔ Insert metrics
     ✔ Retrieve metric statistics
     ✔ Graph metrics
     ✔ Real-time alarms
  16. Components: OpenTSDB
     • Open Time Series Database
        – Front-end to Apache HBase
        – OSS project (LGPL license)
     • Stores billions of data points
        – Indefinitely, without losing resolution
        – Reliable (HDFS replication)
        – Scalable (HBase)
     • Simple API to store / query data
  17. Component: Riemann
     • High-performance stream processor designed for monitoring infrastructure
        – Flexible, powerful DSL
        – Open source (Eclipse license)
        – Written in Clojure
     • Used to generate alarms for StackWatch
  18. Riemann DSL Example
     Send an email whenever the average web application latency exceeds 6 ms over 3 periods of 3 seconds.

         ;; `email` is assumed to be bound to a configured Riemann mailer
         (streams
           (where (not (expired? event))
             ;; over time windows of 3 seconds...
             (fixed-time-window 3
               ;; calculate the average value of the metric and emit an average (summary) event
               (combine folds/mean
                 ;; if there are no events in the window, we can get nil events
                 (where (not (nil? event))
                   ;; collect the summary event over the last 3 fixed-time-windows
                   (moving-event-window 3
                     ;; find the summary event with the minimum average metric
                     (combine folds/minimum
                       ;; see if it breached the threshold
                       (where (> metric 6.0)
                         ;; send the event in an email
                         (email "me@myself.com")))))))))
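The windowing logic of the Riemann example can be sketched outside Riemann as plain Python. This is only an illustration under stated assumptions: events are hypothetical `(time, metric)` pairs, empty windows are simply skipped, and the alarm fires only once three full windows have been seen (Riemann's `moving-event-window` also emits before its window is full). The function names are invented for this sketch.

```python
from collections import deque
from statistics import mean


def window_means(events, window_secs=3):
    """Group (time, metric) pairs into fixed windows; return per-window means in time order."""
    buckets = {}
    for t, metric in events:
        buckets.setdefault(int(t // window_secs), []).append(metric)
    # Windows with no events never appear in `buckets`, so they are skipped.
    return [mean(buckets[k]) for k in sorted(buckets)]


def alarm_windows(events, threshold=6.0, window_secs=3, periods=3):
    """Return the indices of window means at which the alarm condition holds."""
    recent = deque(maxlen=periods)  # the last `periods` window averages
    fired = []
    for i, avg in enumerate(window_means(events, window_secs)):
        recent.append(avg)
        # Alarm only if even the smallest recent average breaches the threshold.
        if len(recent) == periods and min(recent) > threshold:
            fired.append(i)
    return fired
```

Taking the minimum of the last three averages is what makes the rule mean "above the threshold for three consecutive periods" rather than "above the threshold once".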
  19. Component: StackWatch
     • Frontend exposing a CloudWatch-like API
     • Stores metric metadata and alarm history in MySQL
     • API authentication using signatures
        – Authenticated using the secret key from CloudStack
     • Written in Clojure
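Signature-based authentication of this kind can be sketched as follows. The slides do not say which signing scheme StackWatch uses, so this sketch assumes the scheme documented for the CloudStack query API (sort the parameters, URL-encode the values, lowercase the query string, HMAC-SHA1 it with the user's secret key, then base64-encode). The function names and the `signature` parameter handling are illustrative only.

```python
import base64
import hashlib
import hmac
from urllib.parse import quote


def sign_request(params, secret_key):
    """Compute base64(HMAC-SHA1(secret, lowercased sorted query string))."""
    ordered = sorted(params.items(), key=lambda kv: kv[0].lower())
    query = "&".join(f"{key}={quote(str(value), safe='')}" for key, value in ordered)
    digest = hmac.new(secret_key.encode(), query.lower().encode(), hashlib.sha1).digest()
    return base64.b64encode(digest).decode()


def verify_request(params, secret_key):
    """Recompute the signature over everything except `signature` and compare."""
    unsigned = {k: v for k, v in params.items() if k.lower() != "signature"}
    supplied = next((v for k, v in params.items() if k.lower() == "signature"), "")
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(sign_request(unsigned, secret_key), supplied)
```

The service never sees the secret key on the wire; it only needs its own copy of the key to recompute the signature, which is why StackWatch has to obtain the key from CloudStack.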
  20. Event-based integration
     [Diagram: PutMetrics → StackWatch; StackWatch → Riemann (MetricData); StackWatch → OpenTSDB (MetricData)]
     • CloudWatch API (HTTP Query)

         mon-put-data --metric-name RequestLatency --namespace "WebFrontEnd" --dimensions "host=i-2c9e85,Stack=Test" --timestamp 2014-03-25T00:00:00.000Z --value 4

     • OpenTSDB API (telnet / REST)

         put RequestLatency 1395705600 4 host=i-2c9e85 Stack=Test namespace=WebFrontEnd acct_uuid=56A17202-36C2-46E8-8905-90423040AAA

     • Riemann event (ProtoBuf)

         {service: "RequestLatency", metric: 4, time: 1395705600, host: i-2c9e85, stack: Test, namespace: "WebFrontEnd", acct_uuid: "56A17202-36C2-46E8-8905-90423040AAA"}
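The fan-out from one incoming metric datum to the two downstream formats can be sketched like this. The function names, the argument layout and the lowercasing of dimension names for the Riemann event are assumptions made for illustration; only the output shapes follow the slide.

```python
from datetime import datetime, timezone


def to_epoch(timestamp):
    """Convert an ISO-8601 UTC timestamp such as 2014-03-25T00:00:00.000Z to epoch seconds."""
    parsed = datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%S.%fZ")
    return int(parsed.replace(tzinfo=timezone.utc).timestamp())


def to_opentsdb_put(name, namespace, dimensions, timestamp, value, acct_uuid):
    """Build an OpenTSDB telnet-style `put` line, tagging the point with the account UUID."""
    tags = dict(dimensions, namespace=namespace, acct_uuid=acct_uuid)
    tag_text = " ".join(f"{key}={val}" for key, val in tags.items())
    return f"put {name} {to_epoch(timestamp)} {value} {tag_text}"


def to_riemann_event(name, namespace, dimensions, timestamp, value, acct_uuid):
    """Build a Riemann-style event as a plain dict (the real event is sent as ProtoBuf)."""
    event = {"service": name, "metric": value, "time": to_epoch(timestamp)}
    event.update({key.lower(): val for key, val in dimensions.items()})
    event.update({"namespace": namespace, "acct_uuid": acct_uuid})
    return event
```

Carrying `acct_uuid` on every data point is what keeps tenants separated once the metric has left StackWatch.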
  21. CloudStack Integration
     • Needs the secret key from the CloudStack DB
        – The GetUser admin API returns
           • Secret key
           • UUID of the account
     • Secret key is used to authenticate the query API
     • Account UUID usage:
        – Tag metric events sent to OpenTSDB and Riemann
        – Part of the primary key in the DB
        – E.g., the metric table has columns account_uuid, namespace and metric_name. The primary key is a composite of these columns.
     • User information is cached inside the app for speed
        – Call the GetUser API on a cache miss
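The cache-miss behaviour and the composite key described above can be sketched as below. `fetch_user` is a hypothetical callable standing in for the GetUser admin API call, keying the cache by API key is an assumption, and the returned field names are invented; a real cache would also need expiry so that regenerated keys are picked up.

```python
class CredentialCache:
    """In-process cache of user records, filled lazily from CloudStack."""

    def __init__(self, fetch_user):
        self._fetch_user = fetch_user  # hypothetical wrapper around the GetUser admin API
        self._users = {}

    def lookup(self, api_key):
        # Only a cache miss costs a round trip to CloudStack.
        if api_key not in self._users:
            self._users[api_key] = self._fetch_user(api_key)
        return self._users[api_key]


def metric_primary_key(account_uuid, namespace, metric_name):
    """Composite key for the metric table: the account UUID scopes every metric to a tenant."""
    return (account_uuid, namespace, metric_name)
```

With this shape, two tenants can use the same namespace and metric name without colliding, because the account UUID differs.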
  22. StackWatch Current Status
     • Clojure web app
        – Uses the Ring web framework
        – Easy to scale up. E.g., 1000 tenants each sending 1000 events per minute = 1 million events per minute
     • API elements that work
        – PutMetricData
        – ListMetrics
        – GetStats
     • No web UI
  23. What about AutoScale?
     • The CloudStack AutoScale API is not fully compatible with AWS
     • AutoScaling service concept
        – StackScaler service (Ruby on Rails app)
        – Concept only, not implemented
  24. StackScaler Architecture
     [Diagram labels]
     • Components: CloudStack, StackScaler (RoR app), StackWatch (Clojure)
     • AutoScaling API calls into StackScaler: create autoscale group / create-launch-config / etc.
     • StackScaler Cache: Credential Cache
     • StackScaler DB: AutoScale Groups, Instance Info, Launch Config, History
     • Alarm Config →
     • ← Threshold Alarm
     • Calls to CloudStack: GetUser, deployVM / listVM
     Service interactions always use the public API
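Since StackScaler is described as a concept only, the following is just one way its alarm-driven decision step might look, not the author's design. The `AutoScaleGroup` fields, the alarm state names and the one-VM step are assumptions; `deployVirtualMachine` and `destroyVirtualMachine` are CloudStack public API commands, and the list of running VM ids would come from a `listVirtualMachines` call.

```python
from dataclasses import dataclass, field


@dataclass
class AutoScaleGroup:
    name: str
    min_size: int
    max_size: int
    launch_config: dict = field(default_factory=dict)  # parameters for new VMs


def plan_scaling(group, running_vm_ids, alarm_state):
    """Return the public API calls to issue, as (command, params) pairs."""
    count = len(running_vm_ids)
    if alarm_state == "ALARM" and count < group.max_size:
        # Threshold breached: add one VM, never exceeding the group's maximum.
        return [("deployVirtualMachine", dict(group.launch_config))]
    if alarm_state == "OK" and count > group.min_size:
        # Load is back to normal: remove the most recently listed VM.
        return [("destroyVirtualMachine", {"id": running_vm_ids[-1]})]
    return []
```

Because the function only returns commands for the public API, it can be tested without a running CloudStack, which is the property the deck argues for.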
  25. Lessons learnt
     • Service-oriented architecture is useful for
        – Rapid prototyping / evolution
        – Using your favorite language
        – Using the appropriate frameworks
           • E.g., it is undesirable to throw a million PutMetricData API requests/minute at CloudStack
     • Riemann and OpenTSDB both have incompatible licenses
  26. Lessons learnt
     • But
        – Reinvent API parsing, validation and authentication
        – Reinvent clustering, DB abstractions, etc.
        – Key management problem (admin keys distributed to each service)
        – Multitude of moving parts requires automated deployment and operation
        – Unified UI question
  27. Future
     • Test metric insertion at scale
        – Validate the architecture
     • Support the complete CloudWatch API
     • Start working on an AutoScale service triggered by StackWatch
  28. The case for a separate service
     • You don't want to code in Java
     • Your requirements aren't clear and you want to iterate quickly
     • Your audience is different (e.g., DBaaS vs. IaaS)
     • The CloudStack public API is perfectly adequate for your service
     • Your service serves a niche need
        – E.g., you want to evaluate hypervisors for patching
     • The operational envelope is sufficiently different
        – E.g., performance, API rate, DB needs, HA
     • License issues
  29. The case for an in-process service
     • Community advantages
        – Many eyes, many users
     • General-purpose service with clear-cut requirements
     • Similar operational envelope to CloudStack
     • Public API is insufficient; you need access to internal APIs
        – Consider enhancing the public API first
  30. Weaker case for in-process service
     • To use CloudStack clustering logic
     • To use the UI plugin infrastructure
     • To use the database layer, but only for new tables
     • Joins with existing tables (account / host / etc.)
        – The uuid column is your friend.
     • To use the API framework
        – Perhaps this needs to be an independent component
  31. Niche service examples
     • Hypervisor patching service
        – Use the admin API to list hypervisors and work off that list
     • Integrate with your datacenter monitoring / alarm / CMDB
     • Real-time reporting and correlation
     • Spot pricing
  32. References
     • OpenTSDB: http://opentsdb.net/
     • Riemann: http://riemann.io/index.html
     • Micro Services: http://martinfowler.com/articles/microservices.html
     • Fallacies of Distributed Computing: http://en.wikipedia.org/wiki/Fallacies_of_Distributed_Computing
