Developing a Service Dashboard: keeping an eye on things
12th July 2012




            Dr Malcolm Murray
Bb: a complex system to manage
People need a simple dashboard
What should it look like?
Complexity
A complex system to manage


              F5
Many interfaces
Too many interfaces
What should it do?

 Monitor
 Measure
 Alert
 Email
 Report
 Make Coffee?
Measurement
What should we measure?

Disk Space
Table Size
Load
Users
Database Measures


Disks used
Table Size
Trends
Database Measures
Detail from the Sparkline
Application Servers
Disk space
Load
Connections
Users
Digg Effect or DoS?
Who’s Online Now?




With thanks to Santo Nucifora at Seneca College
Implementation
A lot of shell scripts
cd /local/bboard/blackboard/content
df -P . | grep -v '1024-blocks' |
  awk '{print "insert into dur_dashboard_data (when, name, space, capacity, used, available) values (sysdate,?duocontent?,?"$1"?,?"$2"?,?"$3"?,?"$4"?);"}' |
  sed "s^?^'^g" >> /local/home/bbuser/sc/intotable.sql

Filesystem          1024-blocks      Used Available Capacity Mounted on
/dev/sda1               16246428   3231304 12176772      21% /
tmpfs                   32995944   2401084 30594860       8% /dev/shm
/dev/mapper/vg0-s01     51606140   3914360 45070340       8% /s01
/dev/mapper/vg0-s02     51606140 14266484 34718216       30% /s02
/dev/mapper/vg0-data01 258030980 181684776 63239004      75% /data01
/dev/mapper/vg0-data02 258030980 88587208 156336572      37% /data02
/dev/mapper/vg0-data03 309637120 284903840  9005280      97% /data03
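The same parse-and-insert step can be sketched in Python, which may be easier to unit test than a one-line pipeline. This is only an illustration, not the script used at Durham: the function name df_to_inserts and the sample text are assumptions, while the table and column names are copied from the slide. A production version would probably use bind variables rather than building SQL strings.

```python
# Sample `df -P` output, trimmed to one row from the slide.
DF_OUTPUT = """\
Filesystem          1024-blocks      Used Available Capacity Mounted on
/dev/mapper/vg0-data03 309637120 284903840  9005280      97% /data03
"""


def df_to_inserts(df_text, label, table="dur_dashboard_data"):
    """Turn `df -P` output into one INSERT statement per filesystem row."""
    statements = []
    for line in df_text.splitlines():
        fields = line.split()
        # Skip blank lines and the header row (the slide uses grep -v for this).
        if len(fields) < 6 or fields[1] == "1024-blocks":
            continue
        filesystem, blocks, used, available = fields[:4]
        statements.append(
            f"insert into {table} (when, name, space, capacity, used, available) "
            f"values (sysdate,'{label}','{filesystem}','{blocks}','{used}','{available}');"
        )
    return statements


if __name__ == "__main__":
    for statement in df_to_inserts(DF_OUTPUT, "duocontent"):
        print(statement)
```

As in the shell version, the first four df columns are stored as text alongside a fixed label.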




ssh bbuser@duoapp1 'w'
10:23:05 up 3 days, 23:44, 0 users, load average: 0.02, 0.04, 0.00
USER TTY FROM                LOGIN@ IDLE JCPU PCPU WHAT
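To turn the first line of the `w` output into numbers that can be stored, something like the following sketch would do. The function name parse_load_average is hypothetical, and the regular expression assumes the English-locale layout shown above.

```python
import re

# First line of `w` output, copied from the slide.
W_FIRST_LINE = "10:23:05 up 3 days, 23:44, 0 users, load average: 0.02, 0.04, 0.00"

LOAD_PATTERN = re.compile(r"load average:\s*([\d.]+),\s*([\d.]+),\s*([\d.]+)")


def parse_load_average(line):
    """Return the 1, 5 and 15 minute load averages as a tuple of floats."""
    match = LOAD_PATTERN.search(line)
    if match is None:
        raise ValueError(f"no load average found in: {line!r}")
    return tuple(float(value) for value in match.groups())


if __name__ == "__main__":
    print(parse_load_average(W_FIRST_LINE))
```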
Database Tablespaces
spool /local/home/bbuser/sc/tablespacereturn;

SELECT Total.name "Tablespace Name",
     nvl(Free_space, 0) Free_space,
     nvl(total_space-Free_space, 0) Used_space,
     total_space
FROM
 (select tablespace_name, sum(bytes/1024) Free_Space
   from sys.dba_free_space dfs
   group by tablespace_name
 ) Free,
 (select b.name, sum(bytes/1024) TOTAL_SPACE
   from sys.v_$datafile a, sys.v_$tablespace B
   where a.ts# = b.ts#
   group by b.name
 ) Total
WHERE Free.Tablespace_name(+) = Total.name
ORDER BY Total.name
/
spool off;
Database Tablespaces
Tablespace Name                   FREE_SPACE   USED_SPACE   TOTAL_SPACE
------------------------------ ---------- ---------- -----------
BBADMIN_DATA                        48384       2816       51200
BBADMIN_INDX                        18048       2432       20480
BB_BB60_DATA                      1766336 102734880    104501216
BB_BB60_INDX                      1128512   29519808    30648320
BB_BB60_STATS_DATA                3545344   76160752    79706096
BB_BB60_STATS_INDX                6362880 141035728    147398608
CMS_DATA                             4736      81664       86400
CMS_DOC_DATA                        42368     467712      510080
CMS_DOC_INDX                        23040     453184      476224
CMS_FILES_COURSES_DATA             139392    2779776     2919168
CMS_FILES_COURSES_INDX              93760    1867072     1960832
CMS_FILES_INST_DATA                204928    4090752     4295680
CMS_FILES_INST_INDX                414720    3681280     4096000
CMS_FILES_LIBRARY_DATA              41600     299904      341504
CMS_FILES_LIBRARY_INDX               9728     177664      187392
CMS_FILES_ORGS_DATA                 44160     430592      474752
CMS_FILES_ORGS_INDX                 13376     249792      263168
CMS_FILES_USERS_DATA                28672     557056      585728
CMS_FILES_USERS_INDX                25408     503424      528832
CMS_INDX                             3136      60096       63232
SYSAUX                              75392    1235328     1310720
SYSTEM                               7616     832064      839680
UNDOTBS1                          4676480     341120     5017600
USERS                             1445056    1880384     3325440
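One way to act on output like this is to parse the spooled rows, confirm that free plus used really equals total, and flag tablespaces that are nearly full. The sketch below is illustrative only: the function name parse_tablespaces and the 90% warning threshold are assumptions, not values from the talk. Figures are in KB, since the query sums bytes/1024.

```python
# A few rows of the spooled output, copied from the slide.
SAMPLE_SPOOL = """\
Tablespace Name                   FREE_SPACE   USED_SPACE   TOTAL_SPACE
------------------------------ ---------- ---------- -----------
BBADMIN_DATA                        48384       2816       51200
BB_BB60_DATA                      1766336 102734880    104501216
UNDOTBS1                          4676480     341120     5017600
"""

WARN_PCT = 90.0  # hypothetical warning threshold, not taken from the slides


def parse_tablespaces(spool_text, warn_pct=WARN_PCT):
    """Parse rows of: name, free KB, used KB, total KB."""
    rows = []
    for line in spool_text.splitlines():
        parts = line.split()
        # Data rows have exactly four fields, the last three numeric;
        # this skips the header and the dashed separator.
        if len(parts) != 4 or not all(p.isdigit() for p in parts[1:]):
            continue
        free_kb, used_kb, total_kb = (int(p) for p in parts[1:])
        pct_used = 100.0 * used_kb / total_kb if total_kb else 0.0
        rows.append({
            "name": parts[0],
            "pct_used": round(pct_used, 1),
            "adds_up": free_kb + used_kb == total_kb,  # do the numbers add up?
            "warn": pct_used >= warn_pct,
        })
    return rows


if __name__ == "__main__":
    for row in parse_tablespaces(SAMPLE_SPOOL):
        print(row)
```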
Stored in custom tables
Is there a better way?

Almost definitely!


                              e.g. Zabbix


More details available:
http://tinyurl.com/bbzabbix
Simplicity
Deployment Target
Must work from my desk

Hey folks, duoapp2 is in trouble…

… and you need to tidy your desk
Favourite?
[Images labelled 1 to 5]
Learn from others




Stephen Few     Edward Tufte
Careful use of colour
Informative Graphics

                      1991.1.1   65 months 2004.4.28     low    high

 Euro foreign exchange $1.1608              1.1907     .8252 1.2858
 Euro foreign exchange ¥121.32              130.17     89.30 140.31
 Euro foreign exchange £0.7111              0.6665     .5711 0.7235




Edward Tufte’s Sparklines
Simple Code




     <span class="sparkline">
     10 14 15 4.5 3.4 16
     </span>

 http://code.google.com/p/js-sparklines/
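On the server side, markup of this shape could be produced from a list of recent measurements with a small helper. This is a sketch under assumptions: the function name sparkline_span is hypothetical, and it simply emits space-separated numbers in the form shown on the slide, without relying on any other feature of the sparklines library.

```python
def sparkline_span(values):
    """Return a <span class="sparkline"> element holding space-separated values."""
    text = " ".join(str(value) for value in values)
    return f'<span class="sparkline">{text}</span>'


if __name__ == "__main__":
    # The sample values from the slide.
    print(sparkline_span([10, 14, 15, 4.5, 3.4, 16]))
```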
Deployment
Light touch on production



       3 million visits
       Oct – Dec 2011
Update only the key metrics
JavaScript Graphing




http://code.google.com/apis/chart/
Easy live reconfiguration
Tailor to the Live System
Use Meaningful Labels
Tabs provide easy links
What next?
              Better integration with the F5

               More on NetApps disk usage

                   Java Memory Utilization

             Number of downloads per user

               Decide what to make public!
F5 Reporting
Summary
Summary

Keep it simple
Light touch on system being monitored
Allow dynamic reconfiguration
Manage access using tabs & roles
Learn from others


   Slides available at: http://db.tt/rp2D88Nt
@malcolmmurray
          malcolm.murray@durham.ac.uk
            malcolm.murray@gmail.com





Editor's Notes

  1. Developing a Service Dashboard: keeping an eye on things. This session walks the user through the process of developing a custom service dashboard for the Blackboard Learn platform, which was deployed in production in December 2011. It displays metrics such as disk space, database tables, processor load, connections and the number of users, updating automatically. Malcolm Murray will share the measures used and how they were obtained, the APIs used to display the content and manage access, and the use of Ajax and Google charts to provide live updates. Some time is spent explaining the design philosophy so that viewers aren’t dazzled by an array of blinking lights. It will conclude by showing how we have incorporated other monitoring tools into the dashboard and our plans for the future. Audience: Sys Admins? Managers? Developers? Eyeball from: http://www.clker.com/cliparts/q/K/E/M/8/C/green-eye-md.png
  2. The key issue here is that Blackboard is a complicated service to manage. There’s a lot to it, and some parts are essentially black boxes (closed systems), but we need to keep it running smoothly. Out of the box, there aren’t many management tools (though the Admin Console is a good start). Image source: http://myhometheaterbuild.wordpress.com/2011/08/03/feeling-ocd-how-to-clean-your-car-engine/
  3. The team managing a service don’t need a report from every switch or transaction. They don’t need to know everything, but they do want to check quickly that all seems well, or be alerted to any problem. What’s more, this needs to be done in such a way that it doesn’t get in the way of their other activities. We can’t assume that all the team are Linux/database/Java/whatever gurus, so the tool needs to indicate where the concern is in plain English. Dashboard icon: http://www.veryicon.com/icons/system/smoothicons-5/dashboard-17.html
  4. So what should the Blackboard dashboard look like? Borrowing the term from automotive design is appropriate – the “dashboard” needs to convey the data we need, without distracting us from the road ahead. The second photo is an example of a poorly designed dashboard – your focus is on the steering column, not the windscreen. Image source: http://tutsplus.com/tutorial/creating-a-car-dashboard-using-the-brush-tool/ Photo source: http://static.stomp.com.sg/site/servlet/linkableblob/stomp/1148306/data/a98211_d13jpg1338954675095-data.jpg
  5. So just why is it so complicated?
  6. For most institutions, Blackboard is not a stand-alone service running on a single box. From our own experience, as demand increased, we brought in a load balancer; we currently run on four virtual servers, have a dedicated collab server and a dedicated database, and store data on a filestore (a NetApps appliance). Each server has different partitions, and the database spans multiple volumes, each with its own quota. There are a lot of interdependencies, and it is essentially a meta-service composed of lots of discrete units.
  7. When we started, each part had (if we were lucky) its own monitoring interface. Each had a different URL, some required a password, some were only accessible on-site, etc. Some of the pages were definitely not designed for quick inspection: finding key information such as server load may take careful scrutiny. Bringing these together introduces a further tension – the need to keep things simple, yet provide access to lots of interfaces in one place. Image icon source: http://www.caradvice.com.au/20229/cadillac-introduces-2010-srx-crossover/
  8. At this point there is a real danger of scope creep – what do we need the dashboard to do? The initial aim was some form of (semi) real-time monitoring, which can provide the metrics needed for service measurement if we persist them somewhere. Some hoped it would provide alerts – e.g. triggering emails if a measure exceeded a critical threshold. Personally I think this is the wrong place for that – don’t expect a failing server to email you that it is going down. It is hard to escape the ever-growing demand for KPIs – could this provide some? Should it? Could KPIs help shape what we measure? The scoping stage needs a lot of discipline or you end up trying to spec the impossible. My advice is to start small, and be prepared to learn, borrow, share and sometimes even start again from scratch (but wiser)! Image: http://img.ehowcdn.com/article-new/ehow/images/a05/ms/99/wire-switch-panel-race-cars-800x800.jpg
  9. A dashboard needs to provide some measurements
  10. The obvious candidates were used and available disk space, ditto for database tables, some measure of processor load, and an indicator of users to help us understand any sudden changes in the figures. We started with the easy things to measure and added more as we could. An important activity is verifying the results – do your numbers add up? If you say a disk is approaching 75% capacity, is there really 25% free? Image source: http://www.moates.net/innovate-auxbox-lma-3.html
  11. Our database is split over several logical volumes. Although the tables are set to autogrow, in the past we had sometimes come very close to running out of space. Thus an early task was to get these figures onto the screen. Here the figures for disk data03 are shown in red – indicating they had exceeded a warning threshold. (Panic not, new data are no longer written to data03.) As these figures change only relatively slowly (and there is a non-trivial overhead in calculating them), we decided they should be updated when the page is refreshed, drawing on data updated hourly. Image source: http://thinkinginrails.com/wp-content/uploads/2010/05/database-integration.jpg
  12. If you want more detail about the database – e.g. to try and understand why one volume is filling up – click on the database tab. This lists the volumes, tables and indexes, complete with sparklines showing the measures for the last 48 hours. These trends help to stop panics if a temp folder suddenly starts filling up. Clicking on the sparkline opens a new window…
  13. Here the disk usage is shown in two graphs, generated using the Google chart APIs. The left-hand graph (red) shows relative usage – how near to 100% of the disk space allocated to the system did we get? The right-hand graph (blue) shows disk space in absolute terms – the horizontal grey line along the top shows that this disk has been sitting at 250GB throughout the monitoring period. The highlighted section along the bottom allows you to zoom in on a particular section of the graph – all thanks to Google’s code!
  14. The performance of the app servers was another early candidate that made it to the dashboard. We wanted similar measures of disk usage and capacity, but also processor load and the number of connections to the load balancer. We also wanted some JVM stats, but that has to wait until version 2.01. As the load and connections are volatile, we needed these data to be regularly refreshed. Image source: http://www.7l.com/images/large-SL2600-Multi-Servers-icon.png
  15. For users, we can harvest the existing session data stored in the Blackboard database. Given that not everyone in the sessions table is likely to be an active user – we don’t all log out – we also plotted the number of logins. This tab provides a graph that the user can query, to see whether current usage is high, low or normal.
  16. What happened at 4.30 yesterday? Did someone try and download the entire content collection? Did something go viral? Or was there a denial-of-service attack? On the front page we plot a summary of current users, updated every minute using a Prototype query. A more detailed version of these data is available from the Online Now tab…
  17. The logic used to gather these data is based on Santo’s SENECA Who’s Online building block. Knowing who is actually online doing something can help diagnose strange events (load spikes) or provide a list of people to inform if things are about to go wrong!
  18. I’ve sort of skipped ahead, showing you the end result without really explaining how. The next section gives you a flavour of the scripts we use to generate and record the data. Note that they may not be the best or most efficient way of doing things. They are in-house solutions that work, and that’s good enough for us just now!
  19. Most of the data are gathered using a cron job that triggers a set of shell scripts running on one server – we chose the collab server as it is under-used. They follow a set pattern: (1) invoke a Linux command to generate a set of measures (e.g. df -P or netstat), possibly redirecting the output to a file; (2) massage the file using commands such as grep and awk to get just the bits we need, in the format we want; (3) append these to a text file in the form of SQL insert statements; (4) once all the measures are done, run the SQL to persist the data. Credit here goes to my colleague Stephen Applegarth.
  20. This example shows the query used to generate free, used and total space figures for the database tablespaces.
  21. The query on the previous slide generates output like this.
  22. This project had zero budget and limited time. At the developers’ conference this year, Noriaki Tatsumi from Blackboard gave a great presentation showing how to use the free (GPL2 license) tool Zabbix for system monitoring, which links to a custom building block, allowing JMX calls, security checks and lots of other goodness. I am sure this is the way to go – we will be investigating this when I get home! His presentation was recorded and should be available with all the other devcon2012 materials when they are released.
  23. OK – time to think more about the UI design decisions we made when developing the system dashboard.
  24. Hard to argue with Einstein (and win). But what does this mean for our dashboard? Image source: http://www.wallchan.com/wallpaper/19591/
  25. One of the design constraints/requirements was that the product needed to fit on this old 30” (75 cm) monitor, running at 1360 x 768, that we had lying around the office. I am a firm believer that the front page of a dashboard shouldn’t need to scroll.
  26. This wasn’t something that was going to be right under my nose all the time. It has to work from a distance.
  27. We now look at a selection of dashboard designs culled from the internet. There are many more: just google ‘system dashboard’. http://www.dashboardinsight.com/dashboards/product-demos/altosoft-insight-dashboard-for-system-center.aspx
  28. Interesting example created using Excel – lots of VBA, so no good for Mac users. http://dashboardspy.wordpress.com/2010/12/08/excel-dashboard-tutorial/
  29. Two of the more classic analogue control panel designs – do you like these? http://www.designvsart.com/blog/2008/08/14/designing-information-dashboards/#.T_YAi3BzWw8
  30. Red, amber and green lights. http://dashboardspy.wordpress.com/2006/11/02/business-analysis-monitoring-dashboards-bam-rolling-up-application-kpis-for-a-system-status-dashboard/
  31. Which is your favourite? No right answer!
  32. This example shows the way this dashboard appears to someone with deuteranopia, a red-green colour vision deficiency. Deutan-type deficiencies are the most common form of colour blindness, affecting roughly 5% of men. Can you tell which services are now in a state of alert? http://dashboardspy.wordpress.com/2006/11/02/business-analysis-monitoring-dashboards-bam-rolling-up-application-kpis-for-a-system-status-dashboard/
  33. My thinking has been informed by reading around the subject. I have found these two authors particularly informative and thought provoking. N.B. That is not the same as saying that I agree with everything they say! Stephen Few has over 20 years of experience as an innovator, consultant, and educator in the fields of business intelligence (a.k.a. data warehousing and decision support) and information design. Through his company, Perceptual Edge, he focuses on the effective analysis and presentation of quantitative business information. Stephen is recognized as a world leader in the field of data visualization. He teaches regularly at conferences such as those presented by The Data Warehousing Institute (TDWI) and DCI, and also in the MBA program at the Haas School of Business at U. C. Berkeley. He is also the author of the book “Show Me the Numbers: Designing Tables and Graphs to Enlighten” (Analytics Press). Edward Tufte is an American statistician and professor emeritus of political science, statistics, and computer science at Yale University. He is noted for his writings on information design and as a pioneer in the field of data visualization.
  34. When designing our dashboard we considered the phrase “Don’t bother me, I am busy!” It is designed so that if, after a quick glance, all is grey, then all is well and we can go back to our day job. If something is amiss it appears red and in bold font, drawing your attention to the issue. Consider how this would look if we had used red and green…
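The grey-unless-alerting rule is easy to express in code. The sketch below is a hedged illustration rather than our building block's code: the 90% threshold, the function name and the inline styles are all assumptions made for the example.

```python
from html import escape

ALERT_THRESHOLD_PCT = 90  # hypothetical value; set it to match your own risk appetite


def render_measure(name, used_pct, threshold_pct=ALERT_THRESHOLD_PCT):
    """Return an HTML span: quiet grey when healthy, bold red when over threshold."""
    text = "{}: {}%".format(escape(name), used_pct)
    if used_pct >= threshold_pct:
        return '<span style="color:red;font-weight:bold">{}</span>'.format(text)
    return '<span style="color:grey">{}</span>'.format(text)


quiet = render_measure("/data01", 75)
alert = render_measure("/data03", 97)
```

Because the alert is signalled by weight as well as colour, it still stands out for viewers who cannot easily separate red from grey.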
  35. Sparklines are clever data-rich graphics that manage to impose a low cognitive load on the viewer. Tufte’s examples list start and end values in the sequence, plus low and high points. In our case we simply show the last 48 results, marking just the start and end values. We felt that was enough for this application.
  36. They are rendered using a delightfully simple bit of JavaScript: you simply enclose the sequence of numbers in a custom span. This means that the browser builds the graph on the fly (delayed auto build). It doesn’t work in a grumpy browser like IE that won’t support the canvas element, but that is not a problem for our implementation: Safari, Firefox or Chrome would do nicely.
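On the server side, all the page has to do is trim the series to the most recent 48 results and write them into a span for the browser script to draw. A minimal sketch, assuming a hypothetical class name of "sparkline" (the real class depends on the sparkline script in use):

```python
from html import escape


def last_window(samples, window=48):
    """Keep only the most recent `window` results, oldest first."""
    return list(samples)[-window:]


def sparkline_span(values, css_class="sparkline"):
    """Wrap a comma-separated series in a span for a client-side script to render."""
    series = ",".join(str(v) for v in values)
    return '<span class="{}">{}</span>'.format(escape(css_class, quote=True), series)


recent = last_window(range(100))
start_value, end_value = recent[0], recent[-1]  # the two values shown beside the line
markup = sparkline_span(recent)
```

Keeping the numbers in the markup means the server never has to produce an image; the browser does the drawing.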
  37. Now let’s turn our attention to the deployment process.
  38. 2,996,403 visits from 1st Oct to 31 Dec 2011 (Google Analytics). The system is busy. We had to make sure that our monitoring wouldn’t tip it over the edge during busy periods. Image source: http://misterysnake.homepage24.de/bilder/indieluftgehen.jpg
  39. An AJAX query updates key data every minute: the number of live sessions, the app server sparklines and the number of connections. Database and app server disk space figures are pulled from the database when the page is first loaded, then cached for this user until a refresh is forced. The bottom performance graphic is read from a file, which is generated automatically by another building block.
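The "load once, cache per user, refresh on demand" behaviour for the disk space figures can be sketched as follows. This is an assumption-laden illustration, not the building block's code: the class name, the loader callback and the user identifiers are all hypothetical.

```python
class PerUserCache:
    """Cache an expensive query result per user until a refresh is forced."""

    def __init__(self, loader):
        self._loader = loader  # callable that runs the expensive database query
        self._store = {}       # user id -> cached result

    def get(self, user_id, force_refresh=False):
        # Only hit the database on first load or when the user asks for a refresh.
        if force_refresh or user_id not in self._store:
            self._store[user_id] = self._loader()
        return self._store[user_id]


calls = []


def load_disk_figures():
    """Stand-in for the real query; records each call so usage can be counted."""
    calls.append(1)
    return {"/data03": 97}


cache = PerUserCache(load_disk_figures)
cache.get("admin1")                      # first load: runs the query
cache.get("admin1")                      # served from the cache
cache.get("admin1", force_refresh=True)  # forced refresh: runs the query again
```

The volatile measures (sessions, load, connections) deliberately bypass this cache and are fetched by the one-minute AJAX call instead.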
  40. The rest of the graphing uses the Google Chart API. This makes the browser do the work: all the data is stored on the page and the API selects which portion to graph, so no JSP refresh is needed. The initial graph shows relative values on the LHS and actual values on the RHS; steps indicate physical growth in available disk space. It is then overlain with an interactive graph where we can change the metric displayed.
  41. Standard building block interface, giving a degree of future proofing. 1. Thresholds: match our current risk appetite. How full should a disk get before we colour it red? 2. URLs: allow us to change the location of external tools at will. 3. Access control: use institutional roles and tabs to manage who sees what. Some of the pages that allow dynamic reconfiguration are visible to sys admins but not senior managers!
  42. Standard building block interface, giving a degree of future proofing. Information provided here is displayed on individual server reports to help understand any differences. The server diagram is generated on the fly using the Google Chart API, which helps to ensure categorisation is correct. We can edit these data from here without a restart.
  43. This page allows you to provide a friendly name for the various tables and disks, so everyone knows who to contact and what to ask about if they see an alert. This information is echoed on graphs of individual disks/servers.
  44. Use tabs to easily switch views, including links to pages shown earlier. Key information is on the front page, but useful links are collected in one place on the others.
  45. This is only the start… Image source: http://www.tuvie.com/wp-content/uploads/solid-future-car-concept1.jpg
  46. This slide shows our first attempt at replacing some of the old Apache load balancer pages with custom reports generated by querying the F5 appliance. It is still very much a work in progress. This page updates automatically.
  47. So what have we learned?
  48. Keep it simple: one screen; grey is good, red is bad news. Lightweight: use AJAX calls to update the screen and display content pulled from other sites. Dynamic reconfiguration: it is important that it stays up to date and has all the information you need. Use tabs to organise content and control access. Learn from others: there are plenty of books, and look at other dashboards around. Which ones do you like or hate? It is doable: have a go yourself!