Living with a Cephalopod: Daily Care & Feeding of Ceph Storage

Why Ceph is the lowest cost-per-gigabyte storage solution, and how easy it is to deploy your own Ceph cluster.

1. SYSADMIN’S TOOLBOX: TOOLS FOR RUNNING CEPH IN PRODUCTION. Paul Evans, principal architect, daystrom technology group. Paul at Daystrom dot com. san francisco ceph day, March 12, 2015
2. WHAT’S IN THIS TALK • Tools to help Understand Ceph Status • Tips for Troubleshooting Faster • Won’t Cover it All • Maybe some Fun
3. HOW THINGS ARE IN THE LAB… AND THEN THERE IS PRODUCTION
4. Is there a Simple way to run Ceph that isn’t Rocket Science?
5. WHAT COULD BE SIMPLER THAN THE…. CLI? Ceph’s CLI is Great, but… REALITY: many Operations Teams juggle too many technologies already… Do they need to learn another CLI?
6. Need Info Fast? GUI: InkScope, VSM, Calamari, ceph-dash
7. GUI TOOL OPTIONS

                     Calamari                VSM                   InkScope                ceph-dash
   Backing From      Red Hat                 Intel                 Orange Labs             Christian Eichelmann
   Latest Version    1.2.3                   2014.12-0.9.1         1.1                     1.0
   Release Date      Sep 2014                Dec 2014              Jan 2015                Feb 2015
   Capabilities      Monitor + Light Config  Monitor + Config      Monitor + Light Config  Monitor Only
   Compatibility     Wide                    Limited               Wide                    Wide
8. MONITORING

                          Calamari   VSM          InkScope   ceph-dash
   Mon Status             Y          Y            Y          Y
   OSD Status             Y          Y            Y          Y
   OSD-Host Mapping       Y          Y            Y          Y
   PG Status              Y          Y            Y          Y
   PG-OSD Mapping         N          N            Y          N
   MDS Status             N          Y            Y          N
   Host Status            Y          Y            Y          Y
   Capacity Utilization   Y          via Groups   Y          Y
   Throughput (Cluster)   N          Y            Y          Y
   IOPS (Cluster)         Y          Y            Y          Y
   Errors/Warnings        Y          Y            Y          Y
   View Logs              Y          N            N          N
   Send Alerts (email)    N          N            N          via nagios plug-in
   Charts/Graphs          Y          N            N          via nagios plug-in
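On the alerting row above, only ceph-dash (through its Nagios plug-in) sends email out of the box. If all you need is a basic probe to feed Nagios or a cron mail, a minimal sketch wrapper around ceph health (not part of any of the four tools) could look like this:

    #!/bin/bash
    # check_ceph_health - minimal Nagios-style probe (illustrative sketch only).
    # Exit codes follow the Nagios convention: 0=OK, 1=WARNING, 2=CRITICAL, 3=UNKNOWN.
    STATUS=$(ceph health 2>/dev/null)
    case "$STATUS" in
      HEALTH_OK*)   echo "OK - $STATUS";       exit 0 ;;
      HEALTH_WARN*) echo "WARNING - $STATUS";  exit 1 ;;
      HEALTH_ERR*)  echo "CRITICAL - $STATUS"; exit 2 ;;
      *)            echo "UNKNOWN - no response from cluster"; exit 3 ;;
    esac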
9. MANAGEMENT

                                       Calamari      VSM       InkScope   ceph-dash
   Deploy a Cluster                    N             Y         N          N
   Deploy Hosts (add/remove)           N             Y         N          N
   Deploy Storage Groups (create)      N             Y         N          N
   Cluster Services (daemons)          OSD only      Y         N(?)       N
   Cluster Settings (ops flags)        Y             N         Y          N
   Cluster Settings (parameters)       Y             N         View       N
   Cluster Settings (CRUSH map/rules)  N             Partial   View       N
   Cluster Settings (EC Profiles)      N             Y         Y          N
   OSD (start/stop/in/out)             Partial       Y         Y          N
   Pools (Replicated)                  Y (limited)   Y         Y          N
   Pools (EC & Tiering)                N             Y         Partial    N
   RBDs                                N             Partial   N          N
   S3/Swift Users/Buckets              N             N         Y          N
   Link to OpenStack Nova              N             Y         N          N
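Where a GUI shows N in the table, the plain CLI still covers the operation. A few hedged examples matching rows above; the pool name, PG count, and OSD ID are made up:

    # OSD in/out (the "OSD (start/stop/in/out)" row)
    ceph osd out 9                 # stop mapping new data to osd.9
    ceph osd in 9                  # put it back in service

    # Restarting the daemon itself; init syntax varies by distro and release
    systemctl restart ceph-osd@9   # e.g. on systemd hosts

    # Replicated pools (the "Pools (Replicated)" row)
    ceph osd pool create mypool 256     # 256 placement groups (example value)
    ceph osd pool set mypool size 3     # keep three replicas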
10. CALAMARI
11. VIRTUAL STORAGE MANAGER
12. CEPH-DASH
13. INKSCOPE
14. NOW THAT WE HAVE VISIBILITY…. What are we looking for?
15. WHAT WE WANT ✓ Monitor Quorum ✓ Working OSDs ✓ Happy Placement Groups
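Each of those three checks has a one-line CLI equivalent if you want to confirm them without a GUI (a sketch):

    ceph mon stat   # monitor quorum: which mons are up and in quorum
    ceph osd stat   # working OSDs: the "up" and "in" counts should match the total
    ceph pg stat    # happy placement groups: ideally everything is active+clean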
16. OSD WORKBENCH
17. PG STATES ✓ I’m Making Things Right ๏ I (may) Need Help. All Good - Bring Data
18. THE ‘NO GUI’ OPTION
    Useful commands: ceph osd tree, ceph health, ceph health detail,
    ceph pg dump_stuck [stale | inactive | unclean]

    ceph@m01:~$ ceph osd tree
    # id   weight   type name        up/down   reweight
    -1     213.6    root default
    -2     43.44      host s01
    9      3.62         osd.9        down      0
    10     3.62         osd.10       down      0
    0      3.62         osd.0        down      0
    5      3.62         osd.5        down      0
    1      3.62         osd.1        down      0
    7      3.62         osd.7        down      0
    3      3.62         osd.3        down      0
    4      3.62         osd.4        down      0
    2      3.62         osd.2        down      0
    11     3.62         osd.11       down      0
    -4     43.44      host s03
    24     3.62         osd.24       up        1
    25     3.62         osd.25       up        1
    26     3.62         osd.26       up        1
    27     3.62         osd.27       up        1
    28     3.62         osd.28       up        1
    29     3.62         osd.29       up        1
    30     3.62         osd.30       up        1
    31     3.62         osd.31       up        1
    32     3.62         osd.32       up        1
    33     3.62         osd.33       up        1

    ceph health detail
    HEALTH_ERR 7 pgs degraded; 12 pgs down; 12 pgs peering; 1 pgs recovering;
    6 pgs stuck unclean; 114/3300 degraded (3.455%); 1/3 in osds are down
    ...
    pg 0.5 is down+peering
    pg 1.4 is down+peering
    ...
    osd.1 is down since epoch 69, last address 192.168.106.220:6801/865
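Those commands chain together naturally; a rough triage script along these lines (a sketch, assuming it runs on a node with the admin keyring) captures the same picture in one pass:

    #!/bin/bash
    # ceph-triage.sh - one-shot snapshot of the checks above (illustrative sketch).
    echo "== overall health =="
    ceph health detail
    echo "== OSD tree (watch for 'down' entries) =="
    ceph osd tree
    for state in stale inactive unclean; do
        echo "== stuck PGs: $state =="
        ceph pg dump_stuck "$state"
    done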
19. THE ‘NO GUI’ OPTION
    ceph pg dump_stuck

    pg_stat   objects   bytes         status            up                                          up_pri
    10.f8     3042      25474961408   active+remapped   [58,2147483647,24,20,55,59,2147483647,27]   58
    10.7fa    3029      25375584256   active+remapped   [51,20,60,28,2147483647,61,2147483647,11]   51
    10.716    2990      25052532736   inactive          [9,44,10,55,24,2147483647,47,2147483647]    9
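The 2147483647 entries in the up set are the placeholder for "no OSD assigned to this slot" (CRUSH could not find enough OSDs), which is why those PGs are remapped or inactive. The same data is available in machine-readable form, and a single PG can be inspected directly; a sketch, with PG IDs taken from the listing above:

    # Same listing, machine-readable (exact JSON layout varies by Ceph release)
    ceph pg dump_stuck unclean --format json-pretty | less

    # Dig into one problem PG: its up/acting OSD sets and full peering state
    ceph pg map 10.716
    ceph pg 10.716 query | less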
20. VSM: TROUBLESHOOTING
    Repeated auto-out, or an inability to restart an auto-out OSD, suggests a failed or failing disk.
    VSM periodically probes the drive path; a missing drive path indicates complete disk failure.
    A set of auto-out OSDs that share the same journal SSD suggests a failed or failing journal SSD.
    VSM periodically probes the drive path; a missing drive path indicates complete disk (or controller) failure.
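Outside of VSM, the same suspicion can be cross-checked by hand on the OSD host. A hedged sketch, assuming osd.9 is the repeat offender, its data partition is mounted under /var/lib/ceph/osd/ceph-9, and smartmontools is installed:

    # Which block device backs the suspect OSD?
    mount | grep /var/lib/ceph/osd/ceph-9

    # Kernel-level I/O errors for that device (replace sdc with the device found above)
    dmesg | grep -i -e sdc -e 'I/O error'

    # Ask the drive itself
    smartctl -H /dev/sdc
    smartctl -a /dev/sdc | grep -i -e reallocated -e pending -e uncorrect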
21. VSM: TROUBLESHOOTING - Restore OSDs
    Managing Storage Devices: Restore OSDs - Wait - Select - Sort - Confirm - Verify (may need to sort again)
22. When it’s time to go deep…
    /var/log/ceph/ceph.log
    /var/log/ceph/ceph-mon.[host].log
    /var/log/ceph/ceph-osd.[xx].log
    ceph tell osd.[xx] injectargs --debug-osd 0/5
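In that injectargs line, the first number is the level written to the log file and the second is the in-memory level kept for crash dumps; raising it is usually temporary. A sketch of turning OSD logging up while reproducing a problem, then restoring the quieter level shown on the slide (osd.12 is just an example ID):

    ceph tell osd.12 injectargs '--debug-osd 20/20'   # verbose while you reproduce the issue
    tail -f /var/log/ceph/ceph-osd.12.log             # watch what the daemon is doing
    ceph tell osd.12 injectargs '--debug-osd 0/5'     # back to the low-noise setting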
23. REMINDER ABOUT CLUSTERS: Clusters rarely do things instantly. Clusters can be like a Flock of Sheep - it starts to move in the right direction slowly and then picks up speed (don’t run it off a cliff).
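Rather than re-running status by hand while the flock gets moving, it is easier to watch the cluster converge; for example:

    ceph -w              # stream cluster events (recovery, backfill, health changes) live
    watch -n 10 ceph -s  # or poll the status summary every 10 seconds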
24. VSM: FAST DEPLOY - Create Cluster (Getting Started)
    Create new Ceph cluster: all servers present; correct subnets and IP addresses; correct number of disks identified; at least three monitors and an odd number of monitors; servers located in the correct zone; servers responsive; one zone with server-level replication.
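Those checklist items end up expressed in the cluster configuration VSM generates. For orientation only, a hand-written sketch of the equivalent ceph.conf fragment, with made-up hostnames, subnets, and a placeholder fsid:

    [global]
    fsid = <cluster-uuid>                  # placeholder
    mon initial members = m01, m02, m03    # at least three monitors, odd count
    mon host = 10.0.10.11, 10.0.10.12, 10.0.10.13
    public network = 10.0.10.0/24          # client-facing subnet
    cluster network = 10.0.20.0/24         # replication / backfill subnet
    osd pool default size = 3              # three copies by default
    osd crush chooseleaf type = 1          # 1 = host: server-level replication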
25. VSM: FAST DEPLOY - Create Cluster. Step 1. Step 2: Confirm.
26. VSM: FAST DEPLOY - Create Cluster, Status Sequence (Getting Started)
27. Remember to…
28. IN TIME YOUR CLUSTER WILL LEARN TO FOLLOW YOU
29. The 3 Keys to Ceph in Production: Happy PGs, Happy Monitors, Happy OSDs
30. thank you! Paul Evans, principal architect, daystrom technology group. Paul at Daystrom dot com. san francisco ceph days