OCS changes the underlying technology from GlusterFS to Ceph. Moreover, it is fully managed by an Operator, which means that in a disaster situation, such as losing one of the OSDs, you need to know how to recover it through the Operator.
This document explains how to recover OCS when you lose one of the OCS nodes on which an OSD was running.
Note that this document is for a PoC, so it is not supported by Red Hat.
Test Environment
Test Scenario
Step by step
  Remove one of the OCS nodes (worker-0)
    Stop the node (worker-0)
    Detach/Remove the OCS volumes
    Delete the node
  Create a new server (worker-5)
    Find ocs pods on worker-0 then scale down mon/osd/crashcollector
    Create a node (worker-5)
    Add volume to the new node via OpenStack console
    Apply infra label
  Recover Local Volume
    Delete pv/pvc that was attached to the old node
    Update localvolume
  Recover OCS
    Apply storage label
    Create a pvc for rook-ceph-mon-c
    Deploy mon-c
    Deploy rook toolbox to remove the old osd
    Delete deployment OSD-0/rook-ceph-crashcollector
    Verify OSD status
Appendix A. Why does a new server use a different hostname?
Appendix B. rook-toolbox.yaml
Reference
Test Environment:
- OpenStack 14
- OpenShift 4.3.28
- 3 Master nodes
- 3 Infra nodes
- 2 Worker nodes
- OpenShift Container Storage 4.3.0
- Local Volume 4.5.0
  - 4 filesystem volumes (worker-0/1/2/3): /dev/vdb
  - 4 block volumes (worker-0/1/2/3): /dev/vdc
Test Scenario:
- Remove one of the OCS nodes (worker-0)
  - Shut down the OCS node (worker-0)
  - Remove the worker-0 VM (instance)
  - Remove the volumes of the OCS node (worker-0) (this is a separate step from deleting the instance)
  - Remove worker-0 from OpenShift
- Create a new server (worker-5)
  - Use a different server name and hostname because of a known issue (see Appendix A)
  - Apply the infra MCP
- Recover Local Volume
  - Remove the deleted node from the LocalVolume object
  - Add the new node to the LocalVolume object
- Recover OCS
  - Remove the PV/PVC/OSD/crashcollector objects that are related to the deleted node
  - Wait for the operator to add new objects for the new OCS node
Step by step:
Remove one of the OCS nodes (worker-0)
This step explains how to remove one of the OCS nodes permanently.
Stop the node (worker-0)
Detach/Remove the OCS volumes
Delete the node:
1. From the load balancer (HAProxy)
   a. worker-0 should be removed from the OpenShift ingress endpoint
   b. Strictly speaking, worker-5 only has to be added after it is created, but for testing purposes it is added now.
2. From DNS
   a. The worker-0 record is no longer needed in the upstream DNS, so as with the load balancer, remove worker-0 and add worker-5.
3. From OpenStack
   a. Delete the instance.
4. From OpenShift
oc delete node worker-0.telus.tamlab.brq.redhat.com
Create a new server (worker-5)
Before you create a new VM (instance), you have to do the following first. Otherwise, you will hit this error [1]:
I0715 03:56:11.808818 450992 update.go:92] error when evicting pod "rook-ceph-mon-a-bcfc499c5-bm4lz" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
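This eviction error means a PodDisruptionBudget covering the mon pods would be violated. If you want to confirm which budget is blocking the drain before scaling anything down, a quick check could look like the following (a sketch; the exact budget names depend on your OCS version):
# List the PodDisruptionBudgets in the storage namespace
oc get pdb -n openshift-storage
# Inspect the mon-related budget to see how many disruptions are currently allowed
oc describe pdb -n openshift-storage | grep -i -B2 -A5 mon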
Find ocs pods on worker-0 then scale down mon/osd/crashcollector
# Check which mon and osd pods were running on the deleted node
oc get pod -o wide -n openshift-storage | grep worker-0
# Scale down the mon/osd/crashcollector deployments found above
oc scale deployment rook-ceph-mon-c --replicas=0 -n openshift-storage
oc scale deployment rook-ceph-osd-0 --replicas=0 -n openshift-storage
oc scale deployment --selector=app=rook-ceph-crashcollector,node_name=worker-0.telus.tamlab.brq.redhat.com --replicas=0 -n openshift-storage
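Before moving on, it is worth confirming that the scale-down actually took effect (a small verification step, not part of the original procedure):
# The deployments should now report 0/0 ready replicas
oc get deployment rook-ceph-mon-c rook-ceph-osd-0 -n openshift-storage
# No rook-ceph pods should be listed on worker-0 anymore
oc get pod -o wide -n openshift-storage | grep worker-0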
Create a node (worker-5)
- Update haproxy/dns
  - This is already done in the "Delete the node" step
- Create a new node
- Approve the CSRs
oc get csr -o json | jq -r '.items[] | select(.status == {}) | .metadata.name' | xargs oc adm certificate approve
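Once the CSRs are approved, confirm that the new node has joined the cluster and is Ready before continuing (it may take a few minutes):
# worker-5 should show up in Ready state
oc get node worker-5.telus.tamlab.brq.redhat.com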
Add volume to the new node via OpenStack console
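If you prefer the CLI over the console, the equivalent could look roughly like the following (a sketch; the volume names and sizes are assumptions, and the device paths have to match the original layout of /dev/vdb and /dev/vdc):
# Create the two volumes for the new OCS node
openstack volume create --size 100 worker-5-vdb
openstack volume create --size 100 worker-5-vdc
# Attach them to the new instance
openstack server add volume worker-5 worker-5-vdb --device /dev/vdb
openstack server add volume worker-5 worker-5-vdc --device /dev/vdc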
Apply infra label
# Add the infra role label and remove the worker role label (the trailing "-" removes a label)
oc label node worker-5.telus.tamlab.brq.redhat.com node-role.kubernetes.io/infra=""
oc label node worker-5.telus.tamlab.brq.redhat.com node-role.kubernetes.io/worker-
(Tip) If the new worker node does not come up with the infra MCP, check the machine-config-daemon
oc get pod -o wide -n openshift-machine-config-operator | grep worker-5
oc logs machine-config-daemon-XXX -c machine-config-daemon -n openshift-machine-config-operator
Recover Local Volume
The localvolume objects need to be updated because worker-0 was deleted and worker-5 was added.
Delete pv/pvc that was attached to the old node
Before you update the localvolume, you need to delete the PVs/PVCs that were related to worker-0.
# Back up the PVCs, then delete the PVCs and the local PVs of the old node
oc get pvc rook-ceph-mon-c -o yaml -n openshift-storage > mon-c.yaml
oc get pvc ocs-deviceset-0-0-494jh -o yaml -n openshift-storage > ocs-deviceset-0.yaml
oc delete pvc rook-ceph-mon-d rook-ceph-mon-c ocs-deviceset-0-0-494jh -n openshift-storage
oc delete pv local-pv-12cc2ec4 local-pv-74c2a064 local-pv-85537348 local-pv-addebda5
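The PV names above are specific to this cluster. If you are not sure which local PVs belong to the deleted node, you can look them up by their node affinity (a sketch; it assumes the Local Storage Operator set a single kubernetes.io/hostname match expression on each local PV, which is its usual behaviour):
# Show each PV together with the node its affinity points at, then filter for the old node
oc get pv -o custom-columns='NAME:.metadata.name,NODE:.spec.nodeAffinity.required.nodeSelectorTerms[0].matchExpressions[0].values[0]' | grep worker-0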
Update localvolume
Remove the worker-0 node and add the worker-5 node.
oc edit localvolume local-file -n local-storage
oc edit localvolume local-block -n local-storage
...
  nodeSelector:
    nodeSelectorTerms:
      - matchExpressions:
          - key: kubernetes.io/hostname
            operator: In
            values:
              - worker-1.telus.tamlab.brq.redhat.com
              - worker-2.telus.tamlab.brq.redhat.com
              - worker-3.telus.tamlab.brq.redhat.com
              - worker-5.telus.tamlab.brq.redhat.com
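After the LocalVolume objects are updated, the Local Storage Operator should create new local PVs for the devices on worker-5. You can watch for them to appear before recreating the mon-c PVC:
# New local PVs for worker-5 should show up in Available state
oc get pv | grep Available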
Recover OCS
Apply storage label
oc label nodes worker-5.telus.tamlab.brq.redhat.com cluster.ocs.openshift.io/openshift-storage=''
Create a pvc for rook-ceph-mon-c
New localvolume PVs are created, so you can re-create the PVC for mon-c from the backup taken earlier.
oc create -f mon-c.yaml
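Depending on the binding mode of the local storage class, the claim may stay Pending until the mon-c pod is scheduled in the next step; you can check its status with:
oc get pvc rook-ceph-mon-c -n openshift-storage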
Deploy mon-c
oc scale deployment rook-ceph-mon-c --replicas=1 -n openshift-storage
(Tip) If a rook-ceph-mon-d-canary deployment is created, you can delete it because we did not lose rook-ceph-mon-c.
oc delete deploy rook-ceph-mon-d-canary -n openshift-storage
oc delete pvc rook-ceph-mon-d -n openshift-storage
Deploy rook toolbox to remove the old osd
The manual script to remove the problematic osd is taken from here.
# Deploy the rook toolbox so you can use the ceph commands
# (the rook-toolbox.yaml manifest is in Appendix B)
oc create -f rook-toolbox.yaml
oc rsh rook-ceph-tools-XXXX
...
ceph status
...
osd.0 down
…
# Manual script to remove the problematic osd.
cat osd_clean_job.sh
~~~
#!/bin/bash
FAILED_OSD_ID=0   # This id should be updated depending on the situation
HOST_TO_REMOVE=$(ceph osd find osd.${FAILED_OSD_ID} | grep "host" | tail -n 1 | awk '{print $2}' | cut -d'"' -f 2)
osd_status=$(ceph osd tree | grep "osd.${FAILED_OSD_ID} " | awk '{print $5}')
if [[ "$osd_status" == "up" ]]; then
  echo "OSD ${FAILED_OSD_ID} is up and running."
  echo "Please check if you entered the correct ID of the failed osd!"
else
  echo "OSD ${FAILED_OSD_ID} is down. Proceeding to mark out and purge"
  ceph osd out osd.${FAILED_OSD_ID}
  ceph osd purge osd.${FAILED_OSD_ID} --force --yes-i-really-mean-it
  echo "Attempting to remove the parent host. Errors can be ignored if there are other OSDs on the same host"
  ceph osd crush rm $HOST_TO_REMOVE
fi
~~~
./osd_clean_job.sh
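A quick check from inside the toolbox pod that the OSD was actually purged before you clean up the deployments:
# osd.0 should no longer appear in the CRUSH tree
ceph osd tree
ceph status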
Delete deployment OSD-0/rook-ceph-crashcollector
The above step removes the OSD from the Ceph cluster. Once you delete the old OSD and crashcollector deployments and restart the operator, the operator will create a PVC/deployment for a new OSD automatically.
oc delete deployment rook-ceph-osd-0 -n openshift-storage
oc delete deployment --selector=app=rook-ceph-crashcollector,node_name=worker-0.telus.tamlab.brq.redhat.com -n openshift-storage
# Restart the operator by deleting its pod so it reconciles the new OSD
oc get -n openshift-storage pod -l app=rook-ceph-operator
oc delete -n openshift-storage pod rook-ceph-operator-XXXX
All steps are done, so now all you have to do is wait.
Appendix A. Why does a new server use a different hostname?
From "8.3. OpenShift Container Storage deployed using local storage devices":
IMPORTANT
While replacing a node, the hostname of the new OpenShift Container Storage node should not be the same as the hostname of any decommissioned OpenShift Container Storage node due to a known issue. As a workaround, we recommend using a new hostname for adding the replaced node back into the cluster.
Appendix B. rook-toolbox.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: rook-ceph-tools
labels:
app: rook-ceph-tools
spec:
replicas: 1
selector:
matchLabels:
app: rook-ceph-tools
template:
metadata:
labels:
app: rook-ceph-tools
spec:
dnsPolicy: ClusterFirstWithHostNet
containers:
- name: rook-ceph-tools
image: rook/ceph:v1.3.7
command: ["/tini"]
args: ["-g", "--", "/usr/local/bin/toolbox.sh"]
imagePullPolicy: IfNotPresent
env:
- name: ROOK_ADMIN_SECRET
valueFrom:
secretKeyRef:
name: rook-ceph-mon