OCS Recovery PoC
Senior Technical Account Manager
Jooho Lee
This document explains how to recover OCS (OpenShift Container Storage) after you lose one of the OCS nodes.
Note that this document is a PoC, so the procedure is not supported by Red Hat.
Test Environment
Test Scenario
Step by step
Remove one of OCS nodes(worker-0)
Stop the node (worker-0)
Detach/Remove ocs volumes
Delete the node
Create a new server(worker-5)
Find ocs pod on worker-0 then scale down mon-0/crashcollector
Create a node (worker-5)
Add volume to the new node via OpenStack console
Apply infra label
Recover Local Volume
Delete pv/pvc that was attached to the old node
Update localvolume
Recover OCS
Apply storage label
Create a pvc for rook-ceph-mon-c
Deploy mon-c
Deploy rook toolbox to remove the old osd
Delete deployment OSD-0/rook-ceph-crashcollector
Verify OSD status
Appendix A. Why does a new server use a different hostname?
Appendix B. rook-toolbox.yaml
Reference
Test Environment:
- OpenStack 14
- OpenShift 4.3.28
- 3 Master nodes
- 3 Infra nodes
- 2 Worker nodes
- OpenShift Container Storage 4.3.0
- Local Volume 4.5.0
- 4 filesystem(worker-0/1/2/3)
- /dev/vdb
- 4 block(worker-0/1/2/3)
- /dev/vdc
Test Scenario:
- Remove one of OCS nodes(worker-0)
- Shutdown the OCS node (worker-0)
- remove worker-0 vm(instance)
- remove volumes for OCS node(worker-0) ⇐ different thing
- remove worker-0 from openshift
- Create a new server(worker-5)
- Use a different server name and hostname because of a known issue (see Appendix A)
- Apply infra MCP
- Recover Local Volume
- Remove the deleted node from LocalVolume object
- Add a new node to LocalVolume object
- Recover OCS
- Remove PV/PVC/OSD/Crashcollector that are related with the deleted node.
- Wait for the operator to add new objects for the new OCS node.
Step by step:
Remove one of OCS nodes(worker-0)
This step explains how I permanently removed one of the OCS nodes.
Stop the node (worker-0)
Detach/Remove ocs volumes
Delete the node:
1. from load balancer(haproxy)
   a. for the ingress endpoint of openshift, worker-0 should be removed
   b. strictly speaking, worker-5 should be added only after it is created, but for testing purposes I add it now.
2. from dns
   a. for upstream DNS, the worker-0 record is not needed anymore, so as with the load balancer, I remove worker-0 and add worker-5
3. from openstack
   a. delete the instance
4. from openshift
oc delete node worker-0.telus.tamlab.brq.redhat.com
Create a new server(worker-5)
Before you create a new VM (instance), you have to do the following first. Otherwise, you will hit this error[1]:
I0715 03:56:11.808818 450992 update.go:92] error when evicting pod "rook-ceph-mon-a-bcfc499c5-bm4lz" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
Find ocs pod on worker-0 then scale down mon-0/crashcollector
# Check which mon and osd pods were running on the deleted node
oc get pod -o wide | grep worker-0
# Scale down the mon/osd deployments found above
oc scale deployment rook-ceph-mon-c --replicas=0 -n openshift-storage
oc scale deployment rook-ceph-osd-0 --replicas=0 -n openshift-storage
oc scale deployment \
  --selector=app=rook-ceph-crashcollector,node_name=worker-0.telus.tamlab.brq.redhat.com \
  --replicas=0 -n openshift-storage
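The lookup above is done by eye. As a minimal sketch, the mon/osd deployment names can be derived from the pod names, assuming the pods are owned by Deployments and therefore named `<deployment>-<replicaset-hash>-<pod-hash>`. The helper name `deployments_on_node` is hypothetical and not part of `oc`.

```shell
# Hypothetical helper: reads `oc get pod -o wide` output on stdin and prints
# the rook-ceph mon/osd deployment names of the pods listed for a node.
# Assumption: pod names are <deployment>-<replicaset-hash>-<pod-hash>.
deployments_on_node() {
  node="$1"
  awk -v node="$node" '
    # Keep mon/osd pods on the node; skip osd-prepare pods, which belong to Jobs.
    index($0, node) && $1 ~ /^rook-ceph-(mon|osd)-/ && $1 !~ /^rook-ceph-osd-prepare-/ {
      n = split($1, parts, "-")
      name = parts[1]
      # Drop the last two dash-separated fields (the two hashes).
      for (i = 2; i <= n - 2; i++) name = name "-" parts[i]
      print name
    }' | sort -u
}

# Usage (requires a logged-in oc session):
#   oc get pod -o wide -n openshift-storage | deployments_on_node worker-0
```

The node argument is matched as a substring, so `worker-0` would also match a node named `worker-01`; pass the full hostname if that matters.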
Create a node (worker-5)
- update haproxy/dns
- This is already done in the "Delete the node" step
- create a new node
- approve csr
oc get csr -o json | jq -r '.items[] | select(.status == {}) | .metadata.name' | xargs oc adm certificate approve
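If jq is not available, a rough alternative is to filter the plain `oc get csr` table. This sketch assumes CONDITION is the last column of that table; the function names are hypothetical. A new node normally needs two rounds of approval (client certificate first, then serving certificate), so the approval may have to be repeated.

```shell
# Hypothetical helper: reads `oc get csr --no-headers` output on stdin and
# prints the names of CSRs whose last column (CONDITION) is exactly "Pending".
pending_csrs() {
  awk '$NF == "Pending" { print $1 }'
}

# Approve every pending CSR; does nothing when none are pending.
approve_pending_csrs() {
  oc get csr --no-headers | pending_csrs | while read -r name; do
    oc adm certificate approve "$name"
  done
}

# Usage (requires a logged-in oc session with cluster-admin rights):
#   approve_pending_csrs
```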
Add volume to the new node via OpenStack console
Apply infra label
oc label node worker-5.telus.tamlab.brq.redhat.com node-role.kubernetes.io/infra=""
oc label node worker-5.telus.tamlab.brq.redhat.com node-role.kubernetes.io/worker-
(Tip) If the new worker node does not come up with the infra MCP, check the machine-config-daemon pod on that node
oc get pod -o wide -n openshift-machine-config-operator | grep worker-5
oc logs machine-config-daemon-XXX -c machine-config-daemon -n openshift-machine-config-operator
Recover Local Volume
The localvolume objects need to be updated because worker-0 was deleted and worker-5 was added.
Delete pv/pvc that was attached to the old node
Before you update localvolume, you need to delete the pv/pvc that were related to worker-0.
# Backup and delete
oc get pvc rook-ceph-mon-c -o yaml -n openshift-storage > mon-c.yaml
oc get pvc ocs-deviceset-0-0-494jh -o yaml -n openshift-storage > ocs-deviceset-0.yaml
oc delete pvc rook-ceph-mon-d rook-ceph-mon-c ocs-deviceset-0-0-494jh -n openshift-storage
oc delete pv local-pv-12cc2ec4 local-pv-74c2a064 local-pv-85537348 local-pv-addebda5
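The PV names above are specific to this environment. As a sketch of how they could be found, the following lists the PVs pinned to a given node. It assumes each local PV carries a node affinity whose first match expression holds the node's hostname, which is how the local storage operator usually creates them; `pvs_on_node` is a hypothetical helper name.

```shell
# Hypothetical helper: print the names of PVs whose node affinity points at
# the given hostname.
# Assumption: the hostname is the first value of the first matchExpression.
pvs_on_node() {
  node="$1"
  oc get pv -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.nodeAffinity.required.nodeSelectorTerms[0].matchExpressions[0].values[0]}{"\n"}{end}' |
    awk -F '\t' -v node="$node" '$2 == node { print $1 }'
}

# Usage (requires a logged-in oc session):
#   pvs_on_node worker-0.telus.tamlab.brq.redhat.com
```

PVs without a node affinity print an empty second column and are skipped.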
Update localvolume
Remove the worker-0 node and add the worker-5 node
oc edit localvolume local-file -n local-storage
oc edit localvolume local-block -n local-storage
...
  nodeSelector:
    nodeSelectorTerms:
    - matchExpressions:
      - key: kubernetes.io/hostname
        operator: In
        values:
        - worker-1.telus.tamlab.brq.redhat.com
        - worker-2.telus.tamlab.brq.redhat.com
        - worker-3.telus.tamlab.brq.redhat.com
        - worker-5.telus.tamlab.brq.redhat.com
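Instead of editing interactively, the hostname swap could be scripted. This is only a sketch under the assumption that each hostname sits on its own `- <hostname>` list line, as in the excerpt above; `swap_hostname` is a hypothetical helper, and piping the result into `oc replace -f -` is an assumed workflow that was not tested in this PoC.

```shell
# Hypothetical helper: reads YAML on stdin and rewrites the list item that
# holds exactly the old hostname so that it holds the new hostname.
swap_hostname() {
  old="$1"
  new="$2"
  # Escape the dots so they match literally in the sed pattern.
  old_re=$(printf '%s' "$old" | sed 's/\./\\./g')
  sed "s/^\([[:space:]]*-[[:space:]]*\)${old_re}[[:space:]]*\$/\1${new}/"
}

# Usage (assumed workflow, requires a logged-in oc session):
#   oc get localvolume local-block -n local-storage -o yaml \
#     | swap_hostname worker-0.telus.tamlab.brq.redhat.com worker-5.telus.tamlab.brq.redhat.com \
#     | oc replace -f -
```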
Recover OCS
Apply storage label
oc label nodes worker-5.telus.tamlab.brq.redhat.com cluster.ocs.openshift.io/openshift-storage=''
Create a pvc for rook-ceph-mon-c
New localvolume PVs have been created, so you can now create the pvc for mon-c.
oc create -f mon-c.yaml
Deploy mon-c
oc scale deployment rook-ceph-mon-c --replicas=1 -n openshift-storage
(Tip) If a rook-ceph-mon-d-canary deployment has been created, you can delete it because rook-ceph-mon-c has not been lost.
oc delete deploy rook-ceph-mon-d-canary
oc delete pv rook-ceph-mon-d
Deploy rook toolbox to remove the old osd
A manual script is used to remove the problematic osd.
# Deploy rook toolbox to use ceph command (rook-toolbox.yaml is in Appendix B)
oc create -f rook-toolbox.yaml
oc rsh rook-toolbox-XXXX
...
ceph status
...
osd.0 down
...
# Manual script to remove problematic osd.
cat osd_clean_job.sh
~~~
FAILED_OSD_ID=0 ​# This id should be updated depending on situation
HOST_TO_REMOVE=$(ceph osd find osd.${FAILED_OSD_ID} | grep "host" | tail -n 1 | awk
'{print $2}' | cut -d'"' -f 2)
osd_status=$(ceph osd tree | grep "osd.${FAILED_OSD_ID} " | awk '{print $5}')
if [[ "$osd_status" == "up" ]]; then
echo "OSD ${FAILED_OSD_ID} is up and running."
echo "Please check if you entered correct ID of failed osd!"
else
echo "OSD ${FAILED_OSD_ID} is down. Proceeding to mark out and purge"
ceph osd out osd.${FAILED_OSD_ID}
ceph osd purge osd.${FAILED_OSD_ID} --force --yes-i-really-mean-it
echo "Attempting to remove the parent host. Errors can be ignored if there are other OSDs
on the same host"
ceph osd crush rm $HOST_TO_REMOVE
fi
./osd_clean_job.sh
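FAILED_OSD_ID has to be set by hand. As a small sketch, the IDs of the OSDs that are currently down can be read from `ceph osd tree`, assuming the same column layout the script relies on (OSD name in column 4, status in column 5). `down_osds` is a hypothetical helper name.

```shell
# Hypothetical helper: reads `ceph osd tree` output on stdin and prints the
# numeric ID of every OSD whose status column is "down".
# Assumption: OSD rows look like "<id> <class> <weight> osd.<id> <status> ...".
down_osds() {
  awk '$4 ~ /^osd\.[0-9]+$/ && $5 == "down" { sub(/^osd\./, "", $4); print $4 }'
}

# Usage (inside the rook toolbox pod):
#   ceph osd tree | down_osds
```

If an OSD has no device class, the columns shift and the row is skipped, so check the raw `ceph osd tree` output when nothing is printed.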
Delete deployment OSD-0/rook-ceph-crashcollector
The step above removes the OSD from the Ceph cluster. Delete the old OSD deployment and crashcollector, then restart the operator pod; the operator will create the pvc/deployment for a new OSD automatically.
oc delete deployment rook-ceph-osd-0 -n openshift-storage
oc delete deployment \
  --selector=app=rook-ceph-crashcollector,node_name=worker-0.telus.tamlab.brq.redhat.com \
  -n openshift-storage
oc get -n openshift-storage pod -l app=rook-ceph-operator
oc delete -n openshift-storage pod rook-ceph-operator-XXXX
All steps are now done, so the only thing left to do is wait.
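The waiting can be scripted as a simple polling loop. This is a minimal sketch meant to run inside the rook toolbox pod; it assumes `ceph health` prints a line starting with HEALTH_OK once recovery is complete. The function name and the default attempt count and interval are arbitrary choices.

```shell
# Hypothetical helper: poll `ceph health` until it reports HEALTH_OK.
# $1 = maximum number of attempts (default 60)
# $2 = seconds to sleep between attempts (default 10)
# Returns 0 once HEALTH_OK is seen, 1 if the attempts run out.
wait_for_health_ok() {
  attempts="${1:-60}"
  interval="${2:-10}"
  i=0
  while [ "$i" -lt "$attempts" ]; do
    case "$(ceph health 2>/dev/null)" in
      HEALTH_OK*) return 0 ;;
    esac
    i=$((i + 1))
    sleep "$interval"
  done
  return 1
}

# Usage (inside the rook toolbox pod):
#   wait_for_health_ok 60 10 && echo "cluster is healthy"
```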
Verify OSD status
# Inside rook-toolbox
oc rsh rook-toolbox-XXXX
sh-4.2$ ceph status
  cluster:
    id:     d579609e-7440-4432-b6b4-b79173bf7a93
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum a,b,c (age 47m)
    mgr: a(active, since 17m)
    mds: ocs-storagecluster-cephfilesystem:1 {0=ocs-storagecluster-cephfilesystem-a=up:active} 1 up:standby-replay
    osd: 3 osds: 3 up (since 13m), 3 in (since 25m)
    rgw: 1 daemon active (ocs.storagecluster.cephobjectstore.a)

  task status:

  data:
    pools:   10 pools, 192 pgs
    objects: 437 objects, 332 MiB
    usage:   3.7 GiB used, 83 GiB / 87 GiB avail
    pgs:     192 active+clean

  io:
    client:   5.9 KiB/s rd, 6.0 KiB/s wr, 7 op/s rd, 3 op/s wr
Appendix A. Why does a new server use a different hostname?
8.3. OpenShift Container Storage deployed using local storage devices
IMPORTANT
While replacing a node, the hostname of the new Openshift Container Storage node should not be the
same as the hostname of any decommissioned Openshift Container Storage node due to a known issue.
As a workaround, we recommend to use a new hostname for adding the replaced node back into the
cluster.
Appendix B. rook-toolbox.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: rook-ceph-tools
  labels:
    app: rook-ceph-tools
spec:
  replicas: 1
  selector:
    matchLabels:
      app: rook-ceph-tools
  template:
    metadata:
      labels:
        app: rook-ceph-tools
    spec:
      dnsPolicy: ClusterFirstWithHostNet
      containers:
      - name: rook-ceph-tools
        image: rook/ceph:v1.3.7
        command: ["/tini"]
        args: ["-g", "--", "/usr/local/bin/toolbox.sh"]
        imagePullPolicy: IfNotPresent
        env:
        - name: ROOK_ADMIN_SECRET
          valueFrom:
            secretKeyRef:
              name: rook-ceph-mon
              key: admin-secret
        volumeMounts:
        - mountPath: /etc/ceph
          name: ceph-config
        - name: mon-endpoint-volume
          mountPath: /etc/rook
      volumes:
      - name: mon-endpoint-volume
        configMap:
          name: rook-ceph-mon-endpoints
          items:
          - key: data
            path: mon-endpoints
      - name: ceph-config
        emptyDir: {}
      tolerations:
      - key: "node.kubernetes.io/unreachable"
        operator: "Exists"
        effect: "NoExecute"
        tolerationSeconds: 5
Reference
- https://access.redhat.com/documentation/en-us/red_hat_openshift_container_storage/4.3/html-single/deploying_openshift_container_storage/index
- https://access.redhat.com/documentation/en-us/red_hat_openshift_container_storage/4.3/html-single/managing_openshift_container_storage/index#replacing-storage-nodes-for-openshift-container-storage_rhocs
- https://access.redhat.com/documentation/en-us/red_hat_openshift_container_storage/4.4/html-single/managing_openshift_container_storage/index#openshift_container_storage_deployed_using_local_storage_devices

More Related Content

Recently uploaded

%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrainmasabamasaba
 
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdf
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdfPayment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdf
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdfkalichargn70th171
 
%in Benoni+277-882-255-28 abortion pills for sale in Benoni
%in Benoni+277-882-255-28 abortion pills for sale in Benoni%in Benoni+277-882-255-28 abortion pills for sale in Benoni
%in Benoni+277-882-255-28 abortion pills for sale in Benonimasabamasaba
 
WSO2CON2024 - It's time to go Platformless
WSO2CON2024 - It's time to go PlatformlessWSO2CON2024 - It's time to go Platformless
WSO2CON2024 - It's time to go PlatformlessWSO2
 
%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...masabamasaba
 
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfonteinmasabamasaba
 
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital TransformationWSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital TransformationWSO2
 
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...masabamasaba
 
Artyushina_Guest lecture_YorkU CS May 2024.pptx
Artyushina_Guest lecture_YorkU CS May 2024.pptxArtyushina_Guest lecture_YorkU CS May 2024.pptx
Artyushina_Guest lecture_YorkU CS May 2024.pptxAnnaArtyushina1
 
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...SelfMade bd
 
WSO2CON 2024 - Building the API First Enterprise – Running an API Program, fr...
WSO2CON 2024 - Building the API First Enterprise – Running an API Program, fr...WSO2CON 2024 - Building the API First Enterprise – Running an API Program, fr...
WSO2CON 2024 - Building the API First Enterprise – Running an API Program, fr...WSO2
 
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...Shane Coughlan
 
Announcing Codolex 2.0 from GDK Software
Announcing Codolex 2.0 from GDK SoftwareAnnouncing Codolex 2.0 from GDK Software
Announcing Codolex 2.0 from GDK SoftwareJim McKeeth
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...Health
 
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...WSO2
 
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...chiefasafspells
 
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...panagenda
 
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfonteinmasabamasaba
 
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisamasabamasaba
 
%in Soweto+277-882-255-28 abortion pills for sale in soweto
%in Soweto+277-882-255-28 abortion pills for sale in soweto%in Soweto+277-882-255-28 abortion pills for sale in soweto
%in Soweto+277-882-255-28 abortion pills for sale in sowetomasabamasaba
 

Recently uploaded (20)

%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
 
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdf
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdfPayment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdf
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdf
 
%in Benoni+277-882-255-28 abortion pills for sale in Benoni
%in Benoni+277-882-255-28 abortion pills for sale in Benoni%in Benoni+277-882-255-28 abortion pills for sale in Benoni
%in Benoni+277-882-255-28 abortion pills for sale in Benoni
 
WSO2CON2024 - It's time to go Platformless
WSO2CON2024 - It's time to go PlatformlessWSO2CON2024 - It's time to go Platformless
WSO2CON2024 - It's time to go Platformless
 
%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...
 
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
 
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital TransformationWSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
 
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
 
Artyushina_Guest lecture_YorkU CS May 2024.pptx
Artyushina_Guest lecture_YorkU CS May 2024.pptxArtyushina_Guest lecture_YorkU CS May 2024.pptx
Artyushina_Guest lecture_YorkU CS May 2024.pptx
 
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
 
WSO2CON 2024 - Building the API First Enterprise – Running an API Program, fr...
WSO2CON 2024 - Building the API First Enterprise – Running an API Program, fr...WSO2CON 2024 - Building the API First Enterprise – Running an API Program, fr...
WSO2CON 2024 - Building the API First Enterprise – Running an API Program, fr...
 
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
 
Announcing Codolex 2.0 from GDK Software
Announcing Codolex 2.0 from GDK SoftwareAnnouncing Codolex 2.0 from GDK Software
Announcing Codolex 2.0 from GDK Software
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
 
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
 
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
 
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
 
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
 
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
 
%in Soweto+277-882-255-28 abortion pills for sale in soweto
%in Soweto+277-882-255-28 abortion pills for sale in soweto%in Soweto+277-882-255-28 abortion pills for sale in soweto
%in Soweto+277-882-255-28 abortion pills for sale in soweto
 

Featured

2024 State of Marketing Report – by Hubspot
2024 State of Marketing Report – by Hubspot2024 State of Marketing Report – by Hubspot
2024 State of Marketing Report – by HubspotMarius Sescu
 
Everything You Need To Know About ChatGPT
Everything You Need To Know About ChatGPTEverything You Need To Know About ChatGPT
Everything You Need To Know About ChatGPTExpeed Software
 
Product Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage EngineeringsProduct Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage EngineeringsPixeldarts
 
How Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental HealthHow Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental HealthThinkNow
 
AI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdfAI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdfmarketingartwork
 
PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024Neil Kimberley
 
Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)contently
 
How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024Albert Qian
 
Social Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsSocial Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsKurio // The Social Media Age(ncy)
 
Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Search Engine Journal
 
5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summarySpeakerHub
 
ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd Clark Boyd
 
Getting into the tech field. what next
Getting into the tech field. what next Getting into the tech field. what next
Getting into the tech field. what next Tessa Mero
 
Google's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentGoogle's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentLily Ray
 
Time Management & Productivity - Best Practices
Time Management & Productivity -  Best PracticesTime Management & Productivity -  Best Practices
Time Management & Productivity - Best PracticesVit Horky
 
The six step guide to practical project management
The six step guide to practical project managementThe six step guide to practical project management
The six step guide to practical project managementMindGenius
 
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...RachelPearson36
 

Featured (20)

2024 State of Marketing Report – by Hubspot
2024 State of Marketing Report – by Hubspot2024 State of Marketing Report – by Hubspot
2024 State of Marketing Report – by Hubspot
 
Everything You Need To Know About ChatGPT
Everything You Need To Know About ChatGPTEverything You Need To Know About ChatGPT
Everything You Need To Know About ChatGPT
 
Product Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage EngineeringsProduct Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage Engineerings
 
How Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental HealthHow Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental Health
 
AI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdfAI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdf
 
Skeleton Culture Code
Skeleton Culture CodeSkeleton Culture Code
Skeleton Culture Code
 
PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024
 
Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)
 
How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024
 
Social Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsSocial Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie Insights
 
Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024
 
5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary
 
ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd
 
Getting into the tech field. what next
Getting into the tech field. what next Getting into the tech field. what next
Getting into the tech field. what next
 
Google's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentGoogle's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search Intent
 
How to have difficult conversations
How to have difficult conversations How to have difficult conversations
How to have difficult conversations
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
 
Time Management & Productivity - Best Practices
Time Management & Productivity -  Best PracticesTime Management & Productivity -  Best Practices
Time Management & Productivity - Best Practices
 
The six step guide to practical project management
The six step guide to practical project managementThe six step guide to practical project management
The six step guide to practical project management
 
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
 

OpenShift Container Storage on OCP4 Recovery PoC

  • 1. OCS Recovery PoC Senior Technical Account Manager Jooho Lee
  • 2. This document explains how to recover OCS when you lost one of OCS nodes. Absolutely, this doc is for PoC so it is not supported by Red Hat.
  • 3. Test Environment: 4 Test Scenario: 4 Step by step: 5 Remove one of OCS nodes(worker-0) 5 Stop the node (worker-0) 5 Detach/Remove ocs volumes 5 Delete the node: 5 Create a new server(worker-5) 6 Find ocs pod on worker-0 then scale down mon-0/crashcollector 6 Create a node (worker-5) 6 Add volume to the new node via OpenStack console 6 Apply infra label 6 Recover Local Volume 7 Delete pv/pvc that was attached to the old node 7 Update localvolume 7 Recover OCS 8 Apply storage label 8 Create a pvc for rook-ceph-mon-c 8 Deploy mon-c 8 Deploy rook toolbox to remove the old osd 8 Delete deployment OSD-0/rook-ceph-crashcollector 9 Verify OSD status 10 Appendix A. Why does a new server use a different hostname? 11 Appendix B. rook-toolbox.yaml 11 Reference 12
  • 4. Test Environment: - OpenStack 14 - OpenShift 4.3.28 - 3 Master nodes - 3 Infra nodes - 2 Worker nodes - OpenShift Container Storage 4.3.0 - Local Volume 4.5.0 - 4 filesystem(worker-0/1/2/3) - /dev/vdb - 4 block(worker-0/1/2/3) - /dev/vdc Test Scenario: - Remove one of OCS nodes(worker-0) - Shutdown the OCS node (worker-0) - remove worker-0 vm(instance) - remove volumes for OCS node(worker-0) ​ ⇐ different thing - remove worker-0 from openshift - Create a new server(worker-5) - Use other server name and hostname because of ​this known issue - Apply infra MCP - Recover Local Volume - Remove the deleted node from LocalVolume object - Add a new node to LocalVolume object - Recover OCS - Remove PV/PVC/OSD/Crashcollector that are related with the deleted node. - Wait for the operator to add new objects for the new OCS node.
  • 5. Step by step: Remove one of OCS nodes(worker-0) This step explains how I remove one of OCS nodes permanently. Stop the node (worker-0) Detach/Remove ocs volumes Delete the node: 1. from load balancer(haproxy) a. for ingress endpoint of openshift, worker-0 should be removed b. actually, worker-5 has to be added after it is created but for testing purposes, I just add it now. 2. from dns a. for upstream DNS, worker-0 record is not need anymore so like load balancer, I remove worker-0 but add worker-5 3. from openstack a. delete the instance 4. from openshift oc delete node worker-0.telus.tamlab.brq.redhat.com
  • 6. Create a new server(worker-5) Before you create a new vm(instance), you have to do the following first. If not, you hit this error[1] I0715 03:56:11.808818 450992 update.go:92] error when evicting pod "rook-ceph-mon-a-bcfc499c5-bm4lz" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget. Find ocs pod on worker-0 then scale down mon-0/crashcollector # Check which mon and osd pods were running on the deleted node oc get pod -o wide|grep worker-0 #Scale down the mon/osd pod that found above oc scale deployment rook-ceph-mon-c --replicas=0 -n openshift-storage oc scale deployment rook-ceph-osd-0 --replicas=0 -n openshift-storage oc scale deployment --selector=app=rook-ceph-crashcollector,node_name=worker-0.telus.tamlab.brq.redhat.com --replicas=0 -n openshift-storage Create a node (worker-5) - update haproxy/dns - This is already done in the ​Delete the node step - create a new node - approve csr oc get csr -o json | jq -r '.items[] | select(.status == {}) | .metadata.name' | xargs oc adm certificate approve Add volume to the new node via OpenStack console Apply infra label oc label node worker-5.telus.tamlab.brq.redhat.com node-role.kubernetes.io/infra="" oc label node worker-5.telus.tamlab.brq.redhat.com node-role.kubernetes.io/worker-
  • 7. (Tip) If the new worker node does not come up with the infra MCP, check the machine-config-daemon pod on that node:
      oc get pod -o wide -n openshift-machine-config-operator |grep worker-5
      oc logs machine-config-daemon-XXX -c machine-config-daemon -n openshift-machine-config-operator
    Recover Local Volume
    The localvolume objects need to be updated because worker-0 was deleted and worker-5 was added.
    Delete pv/pvc that was attached to the old node
    Before you update localvolume, you need to delete the pv/pvc that were related to worker-0.
      # Backup and delete
      oc get pvc rook-ceph-mon-c -o yaml -n openshift-storage > mon-c.yaml
      oc get pvc ocs-deviceset-0-0-494jh -o yaml -n openshift-storage > ocs-deviceset-0.yaml
      oc delete pvc rook-ceph-mon-d rook-ceph-mon-c ocs-deviceset-0-0-494jh -n openshift-storage
      oc delete pv local-pv-12cc2ec4 local-pv-74c2a064 local-pv-85537348 local-pv-addebda5
    Update localvolume
    Remove the worker-0 node and add the worker-5 node.
      oc edit localvolume local-file -n local-storage
      oc edit localvolume local-block -n local-storage
      ...
        nodeSelector:
          nodeSelectorTerms:
          - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
              - worker-1.telus.tamlab.brq.redhat.com
              - worker-2.telus.tamlab.brq.redhat.com
              - worker-3.telus.tamlab.brq.redhat.com
              - worker-5.telus.tamlab.brq.redhat.com
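The nodeSelector change above amounts to swapping one hostname for another in the `values` list of each LocalVolume object. The following is a minimal sketch of that substitution as a text filter, for cases where a non-interactive edit is preferred over `oc edit`. The function name `swap_hostname` is hypothetical, and the filter only rewrites YAML list items that match the old hostname exactly; it has not been tried against a live cluster.

```shell
#!/usr/bin/env bash
# Sketch: replace one hostname list item with another in YAML text on stdin.
# Only lines of the form "<indent>- <old-hostname>" are rewritten;
# every other line passes through unchanged.
swap_hostname() {
  local old="$1" new="$2"
  awk -v old="$old" -v new="$new" '
    {
      item = $0
      sub(/^[ \t]*-[ \t]*/, "", item)      # drop indent and list marker
      if (item == old && item != $0) {
        # Keep the original indent and marker, swap only the hostname.
        print substr($0, 1, index($0, old) - 1) new
      } else {
        print
      }
    }
  '
}

# Possible usage (assumption, not part of the original steps):
#   oc get localvolume local-block -n local-storage -o yaml |
#     swap_hostname worker-0.telus.tamlab.brq.redhat.com \
#                   worker-5.telus.tamlab.brq.redhat.com |
#     oc replace -f -
```

Comparing whole list items rather than using a regular expression means the dots in the hostname are not treated as wildcards.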
  • 8. Recover OCS
    Apply storage label
      oc label nodes worker-5.telus.tamlab.brq.redhat.com cluster.ocs.openshift.io/openshift-storage=''
    Create a pvc for rook-ceph-mon-c
    The new localvolume PVs have been created, so you can now create the PVC for mon-c.
      oc create -f mon-c.yaml
    Deploy mon-c
      oc scale deployment rook-ceph-mon-c --replicas=1 -n openshift-storage
    (Tip) If a rook-ceph-mon-d-canary deployment has been created, you can delete it because rook-ceph-mon-c was not lost.
      oc delete deploy rook-ceph-mon-d-canary
      oc delete pv rook-ceph-mon-d
    Deploy rook toolbox to remove the old osd
    Manual script to remove problematic osd from here
      # Deploy rook toolbox to use ceph command
      oc create -f rook-toolbox.yaml   # check Appendix B
      oc rsh rook-toolbox-XXXX
      ...
      ceph status
      ...
      osd.0 down
      …
  • 9. # Manual script to remove problematic osd.
      cat osd_clean_job.sh
      ~~~
      FAILED_OSD_ID=0 # This id should be updated depending on situation
      HOST_TO_REMOVE=$(ceph osd find osd.${FAILED_OSD_ID} | grep "host" | tail -n 1 | awk '{print $2}' | cut -d'"' -f 2)
      osd_status=$(ceph osd tree | grep "osd.${FAILED_OSD_ID} " | awk '{print $5}')
      if [[ "$osd_status" == "up" ]]; then
        echo "OSD ${FAILED_OSD_ID} is up and running."
        echo "Please check if you entered correct ID of failed osd!"
      else
        echo "OSD ${FAILED_OSD_ID} is down. Proceeding to mark out and purge"
        ceph osd out osd.${FAILED_OSD_ID}
        ceph osd purge osd.${FAILED_OSD_ID} --force --yes-i-really-mean-it
        echo "Attempting to remove the parent host. Errors can be ignored if there are other OSDs on the same host"
        ceph osd crush rm $HOST_TO_REMOVE
      fi
      ~~~
      ./osd_clean_job.sh
    Delete deployment OSD-0/rook-ceph-crashcollector
    The step above removes the OSD from the Ceph cluster. Now delete the old deployments and restart the operator pod; the operator will then create the pvc/deployment for a new OSD automatically.
      oc delete deployment rook-ceph-osd-0 -n openshift-storage
      oc delete deployment --selector=app=rook-ceph-crashcollector,node_name=worker-0.telus.tamlab.brq.redhat.com -n openshift-storage
      oc get -n openshift-storage pod -l app=rook-ceph-operator
      oc delete -n openshift-storage pod rook-ceph-operator-XXXX
    All steps are now done; the only thing left to do is wait.
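The cleanup script above decides whether to purge by reading the fifth column of `ceph osd tree` for the failed OSD. As a hedged alternative sketch, the status can be read as the field that immediately follows the exact `osd.N` name, which does not depend on the column position. The function name `osd_state` is hypothetical, and the approach assumes the STATUS column directly follows the NAME column in `ceph osd tree` output.

```shell
#!/usr/bin/env bash
# Sketch: print the status (for example "up" or "down") of one OSD.
# Reads `ceph osd tree` text on stdin.
# Assumption: the STATUS field directly follows the osd.N name field.
osd_state() {
  local name="osd.$1"
  awk -v name="$name" '
    {
      # Exact field comparison, so osd.1 never matches osd.10.
      for (i = 1; i < NF; i++) {
        if ($i == name) { print $(i + 1); exit }
      }
    }
  '
}

# Possible usage inside the toolbox pod (not run here):
#   ceph osd tree | osd_state 0
```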
  • 10. Verify OSD status
      # Inside rook-toolbox
      oc rsh rook-toolbox-XXXX
      sh-4.2$ ceph status
        cluster:
          id:     d579609e-7440-4432-b6b4-b79173bf7a93
          health: HEALTH_OK
        services:
          mon: 3 daemons, quorum a,b,c (age 47m)
          mgr: a(active, since 17m)
          mds: ocs-storagecluster-cephfilesystem:1 {0=ocs-storagecluster-cephfilesystem-a=up:active} 1 up:standby-replay
          osd: 3 osds: 3 up (since 13m), 3 in (since 25m)
          rgw: 1 daemon active (ocs.storagecluster.cephobjectstore.a)
        task status:
        data:
          pools:   10 pools, 192 pgs
          objects: 437 objects, 332 MiB
          usage:   3.7 GiB used, 83 GiB / 87 GiB avail
          pgs:     192 active+clean
        io:
          client:   5.9 KiB/s rd, 6.0 KiB/s wr, 7 op/s rd, 3 op/s wr
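The two things to look for in the `ceph status` output above are `health: HEALTH_OK` and an `osd:` line where every OSD is both up and in. A small sketch of how those two checks could be scripted follows. The function names `ceph_health` and `osds_all_up_in` are hypothetical, and the parsing assumes the plain-text layout shown above rather than any stable machine-readable format.

```shell
#!/usr/bin/env bash
# Sketch: two checks over `ceph status` text read from stdin.

# Print the value of the "health:" line, for example HEALTH_OK.
ceph_health() {
  awk '$1 == "health:" { print $2; exit }'
}

# Succeed only if the "osd:" line reports total == up == in.
osds_all_up_in() {
  awk '
    $1 == "osd:" {
      total = $2
      for (i = 3; i <= NF; i++) {
        if ($i == "up" || $i == "up,") up = $(i - 1)
        if ($i == "in" || $i == "in,") inc = $(i - 1)
      }
      found = 1
    }
    END { exit !(found && total == up && total == inc) }
  '
}

# Possible usage inside the toolbox pod (not run here):
#   ceph status | ceph_health
#   ceph status | osds_all_up_in && echo "all OSDs are up and in"
```

Wrapping these checks in a retry loop would turn the waiting step into something scriptable, although how long recovery takes depends on the cluster.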
  • 11. Appendix A. Why does a new server use a different hostname?
    8.3. OpenShift Container Storage deployed using local storage devices
    IMPORTANT
    While replacing a node, the hostname of the new Openshift Container Storage node should not be the same as the hostname of any decommissioned Openshift Container Storage node due to a known issue. As a workaround, we recommend to use a new hostname for adding the replaced node back into the cluster.
    Appendix B. rook-toolbox.yaml
      apiVersion: apps/v1
      kind: Deployment
      metadata:
        name: rook-ceph-tools
        labels:
          app: rook-ceph-tools
      spec:
        replicas: 1
        selector:
          matchLabels:
            app: rook-ceph-tools
        template:
          metadata:
            labels:
              app: rook-ceph-tools
          spec:
            dnsPolicy: ClusterFirstWithHostNet
            containers:
            - name: rook-ceph-tools
              image: rook/ceph:v1.3.7
              command: ["/tini"]
              args: ["-g", "--", "/usr/local/bin/toolbox.sh"]
              imagePullPolicy: IfNotPresent
              env:
                - name: ROOK_ADMIN_SECRET
                  valueFrom:
                    secretKeyRef:
                      name: rook-ceph-mon
                      key: admin-secret
              volumeMounts:
                - mountPath: /etc/ceph
                  name: ceph-config
                - name: mon-endpoint-volume
                  mountPath: /etc/rook
            volumes:
              - name: mon-endpoint-volume
                configMap:
                  name: rook-ceph-mon-endpoints
                  items:
                  - key: data
                    path: mon-endpoints
              - name: ceph-config
                emptyDir: {}
            tolerations:
              - key: "node.kubernetes.io/unreachable"
                operator: "Exists"
                effect: "NoExecute"
                tolerationSeconds: 5
  • 12. Reference
    - https://access.redhat.com/documentation/en-us/red_hat_openshift_container_storage/4.3/html-single/deploying_openshift_container_storage/index
    - https://access.redhat.com/documentation/en-us/red_hat_openshift_container_storage/4.3/html-single/managing_openshift_container_storage/index#replacing-storage-nodes-for-openshift-container-storage_rhocs
    - https://access.redhat.com/documentation/en-us/red_hat_openshift_container_storage/4.4/html-single/managing_openshift_container_storage/index#openshift_container_storage_deployed_using_local_storage_devices