Pacemaker 操作方法メモ

ドキュメント
Clusters from Scratch Step-by-Step Instructions for Building Your First High-Availability Cluster
http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Clusters_from_Scratch/index.html
•
http://clusterlabs.org/doc/Cluster_from_Scratch.pdf
Configuration Explained An A-Z guide to Pacemaker's Configuration Options
http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html
•
Pacemaker HA 環境のリソース制御 (Start / Stop / Montor) 部分。リソース監視/制御を実施する
単体ではクラスタソフトとしては動作せず、クラスタ制御 (ノード監視等の死活監視) 用のコンポーネント (Corosync / Heartbeat) と連携を行う必要がある
Corosync HA 環境のクラスタ制御部分
Corosync は Heartbeat の後継ソフト
Corosync でノード停止を検知した後に、Pacemaker がリソースの操作 (Promote / Demoto / Stop 等) を実施する
リソース HA で制御をする必要がある対象 (Pacemaker の操作対象となるもの)
リソースエージェント (RA) リソースと Pacemaker を連携させるためのエージェント
Pacemaker はリソースエージェントに対して命令を実施することで、リソースの操作を行う
OCF (Open Cluster Framework) の仕様に従うことで、独自にシェルスクリプトで RA を作成することも可能
http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html/Pacemaker_Explained/ap-ocf.html
watchdog メインプロセスに障害があった場合に、OS を再起動させることができる監視機構 (have-watchdog の設定)
STONITH Shoot The Other Node In The Head
スプリットブレインを防ぐための機構で、ノード間の通信に異常が発生した場合に、強制的に対向のノードを再起動 (フェンシング) することで、
両ノードがマスターになる (リソースを同時にアクセスする) ことを防ぐ
/usr/lib/stonith/plugins/external にプラグインのスクリプトが格納されている
フェンシングスプリットブレインを防ぐために、強制的に対向のノードを再起動する動作
一般的には、ハードウェアの IPMI と連動して、停止 / 再起動の制御を実施している
両ノードが対向ノードを同時に落とす (落としあい/相打ち) ことを防ぐために、プラグインを使用することができる
stonith-wrapper
https://ja.osdn.net/projects/linux-ha/wiki/stonith-wrapper
STONITH (ストニス)
https://docs.microsoft.com/ja-jp/azure/virtual-machines/linux/classic/mysql-cluster1.
負荷分散セットを使用して Linux の MySQL をクラスター化する•
PacemakerのMaster/Slave構成の基本と事例紹介(DRBD、PostgreSQLレプリケーション) @Open Source Conference 2014
https://www.slideshare.net/tatsuyaw/pacemaker-osc2014tokyo-31882542
•
第4章フェンシング
https://access.redhat.com/documentation/ja-jp/red_hat_enterprise_linux/6/html/high_availability_add-on_overview/ch-fencing
•
HAクラスタをフェイルオーバ失敗から救おう
https://ja.osdn.net/projects/linux-ha/docs/Pacemaker_OSC2013Kyoto_20130803/ja/1/Pacemaker_OSC2013Kyoto_20130803.pdf
•
試して覚えるPacemaker入門排他制御編
http://linux-ha.osdn.jp/wp/wp-content/uploads/076783ca53a363270d253bbb98b59e83.pdf
•
VIPcheckリソースエージェント
https://ja.osdn.net/projects/linux-ha/wiki/VIPcheck
•
Pacemaker + Corosyncでのクラスタ環境の構築 [CentOS]
http://dan-project.blog.so-net.ne.jp/2016-05-09
•
グローバルクラスタオプション
https://www.suse.com/ja-jp/documentation/sle_ha/book_sleha/data/sec_ha_config_basics_global.html
•
2台でHA構築 (CentOS6.9) 書きかけ
https://qiita.com/tukiyo3/items/162e131007365fc4fe80
•
Fencing and Stonith
http://clusterlabs.org/doc/crm_fencing.html
•
Corosync の quorum の設定
https://qiita.com/ngyuki/items/f8111de17b470b5509c7
•
STONITHプラグイン「external/ssh」でシステムを自動的に再起動してみる。
http://labunix.hateblo.jp/entry/20130722/1374496401
•
クラウド環境での STONITH
STONITH を使用した SUSE での高可用性のセットアップ
https://docs.microsoft.com/ja-jp/azure/virtual-machines/workloads/sap/ha-setup-with-stonith
•
Azure Virtual Machines (VM) 上の SAP HANA の高可用性 | Microsoft Docs
https://docs.microsoft.com/ja-jp/azure/virtual-machines/workloads/sap/sap-hana-high-availability
•
SAP の場合、SAP on Azure 向けのフェンスエージェント (stonith:fence_azure_arm) が使用されている。
負荷分散セットを使用して Linux の MySQL をクラスター化する
https://docs.microsoft.com/ja-jp/azure/virtual-machines/linux/classic/mysql-cluster
•
外部スクリプトとして作成したフェンスエージェントを使用して、Azure cli からシャットダウンを実施している
https://github.com/bureado/aztonith
EC2でSTONITH
https://bcblog.sios.jp/ec2-stonith/
•
インターコネクト
Pacemaker-1.0 インストール方法 CentOS 5編
http://linux-ha.osdn.jp/wp/archives/4219
•
コンポーネント参考設定サービスログ
pacemaker
(cluster resource manager : クラスタのリソース制御)
https://github.com/ClusterLabs/pacemaker
可用性グループのリソースについては Master / Slave タイプのリ
ソースとして作成する
Corosync の quorum の設定
https://qiita.com/ngyuki/items/f8111de17b470b5509c7
/etc/default/pacemaker
/lib/systemd/system/pacemaker.service
/var/lib/pacemaker/cib/cib.xml
-> pcs cluster cib で確認可能
CIB : Cluster Information Base
/etc/logrotate.d/pacemaker
/var/log/pacemaker.log
Pacemaker
2017年11月1日 9:04
SQL Server on Linux - 1 ページ

ソースとして作成する CIB : Cluster Information Base
corosync
cluster engine daemon and utilities
ノードの死活監視を実施
https://github.com/ClusterLabs/corosync
/usr/sbin/corosync
/usr/sbin/corosync-cfgtool
/usr/sbin/corosync-cmapctl
/usr/sbin/corosync-cpgtool
/usr/sbin/corosync-keygen
/usr/sbin/corosync-quorumtool
/lib/systemd/system/corosync.service
/etc/corosync/corosync.conf
/etc/logrotate.d/corosync
/var/log/corosync/corosync.log
pcs
Pacemaker Configuration System
Pacemaker の制御コマンド (以前の crm コマンド)
https://github.com/ClusterLabs/pcs
/usr/sbin/pcs
(実体は Python のスクリプト)
/etc/default/pcsd
/etc/init.d/pcsd
/etc/logrotate.d/pcsd
/etc/pam.d/pcsd
/lib/systemd/system/pcsd.service
/etc/logrotate.d/pcsd
/var/log/pcsd
fence-agents
Fence Agents for Red Hat Cluster
クラスタ障害発生時のフェンス (遮断) の動作の制御
https://github.com/ClusterLabs/fence-agents
1.3.3. Fencing
https://access.redhat.com/documentation/ja-
jp/red_hat_enterprise_linux/5/html/cluster_suite_overview/s2-
fencing-overview-cso
resource-agents
Cluster Resource Agents
リソースの制御を実施 (Resource Agent : RA)
https://github.com/ClusterLabs/resource-agents
OCF(OCF : Open Cluster Framework) 格納先 /usr/lib/ocf/lib OCFリソースエージェント開発者ガイド
http://linux-ha.osdn.jp/wp/archives/4328
•
SQL Server 向け OCF ocf:mssql:ag - Availability Group resource agent.
ocf:mssql:fci - Failover Cluster Instance resource agent.
pcs resource list で取得
pcs resource describe ocf:mssql:ag
クラスター管理コマンド管理 pcs High Availability Add-On リファレンス•
https://access.redhat.com/documentation/ja-jp/red_hat_enterprise_linux/7/html/high_availability_add-on_reference/
第1章 Pacemaker を使用した Red Hat High Availability クラスターの作成•
https://access.redhat.com/documentation/ja-jp/red_hat_enterprise_linux/7/html/high_availability_add-on_administration/ch-startup-haaa>
第3章 pcs コマンドラインインターフェース
https://access.redhat.com/documentation/ja-jp/red_hat_enterprise_linux/7/html/high_availability_add-on_reference/ch-pcscommand-haar
•
付録B pcs コマンドの使用例
https://access.redhat.com/documentation/ja-JP/Red_Hat_Enterprise_Linux/6/html/Configuring_the_Red_Hat_High_Availability_Add-On_with_Pacemaker/ap-
configfile-HAAR.html
•
crm Pacemaker 1.1 以前で使用されていたコマンド。次のコマンドでインストーすることが可能。
Pacemakerの制御コマンドを「crm」に戻して運用する (1/3)
http://www.atmarkit.co.jp/ait/articles/1611/10/news005.html
•
apt install -y crmsh
モニタリング crm_mon
cibadmin --query
pcs コマンドクラスターの操作クラスターの破棄 sudo pcs cluster destroy --all
設定の確認状態の確認 pcs status
pcs status --full
pcs status resources
pcs status cluster
設定の確認 pcs config
ノードの操作ノードの停止全ノードの停止 : pcs cluster stop --all
特定ノードの停止 : pcs cluster stopp <ノード名>
ノードの開始全ノードの開始 : pcs cluster start --all
特定ノードの開始 : pcs cluster start <ノード名>
ノードをスタンバイ状態に移行全ノードをスタンバイ状態に移行 : pcs cluster standby --all
特定ノードをスタンバイ状態に移行 : pcs cluster standby <ノード名>
ノードをスタンバイ状態から解除全ノードをスタンバイ状態から解除 : pcs cluster unstandby --all
特定ノードをスタンバイ状態から解除 : pcs cluster unstandby <ノード名>
クラスターの操作クラスターの強制停止 pcs cluster kill
設定のダンプ pcs cluster cib <ファイル名 (～.cib)>
設定のロード pcs cluster cib-push <ファイル名>
ダンプしたファイルの設定確認 / 操作 pcs -f <ファイル名> <コマンド>
pcs -f cluster.cib config
リソースのフェールオーバー履歴のクリアすべてのリソース pcs resource cleanup
特定のリソース
(ノード単位のクリアが現状はない)
pcs resource cleanup ag_cluster-master
リソースの確認 pcs resource list
pcs resource show <リソース名>
STNOITH の設定無効化 sudo pcs property set stonith-enabled=false
有効化 sudo pcs property set stonith-enabled=true
true の場合は STONITHリソースが定義されていなと、Master でないノードの Promotion Score が
-INFINITY となり、リソースの開始が拒否される
STONITH (フェンス) エージェントの確認
第4章フェンス機能: STONITH の設定
https://access.redhat.com/documentation/ja-
jp/red_hat_enterprise_linux/6/html/configuring_the_red_hat_high_availability_add-on_with_pacemaker/ch-fencing-
haar
•
sudo stonith list
ログの確認 journalctl -xe -u pacemaker
tail -f 100 /var/log/syslog
tail -f /var/log/corosync/corosync.log

tail -f /var/log/corosync/corosync.log
クォーラム投票数の確認 pcs status corosync
corosync の設定ファイル /etc/corosync/corosync.conf
2 ノードクラスターの場合、quorum のセクションに次のような設定を行う (通常は自動で設定されているはずである）
quorum {
provider: corosync_votequorum
expected_votes: 2
two_node: 1
}
quorum {
provider: corosync_votequorum
two_node: 1
}
nodelist {
node {
ring0_addr: 192.168.1.1
}
node {
ring0_addr: 192.168.1.2
}
}
ヘルプ man corosync.conf
man votequorum
クォーラム総評数の過半数を満たさない場合でも起動させる
デフォルトは stop となっている
pcs property set no-quorum-policy=ignore
SQL Server 向け設定
クエリによるフェールオーバー ALTER AVAILABILITY GROUP [SoLAG1] SET (ROLE = SECONDARY)
GO
EXEC sp_set_session_context @key = N'external_cluster', @value = N'yes', @read_only = 1
GO
ALTER AVAILABILITY GROUP [SoLAG1] FAILOVER
GO
リソースの操作フェールオーバー pcs resource move ag_cluster-master SoL01 --master
pcs resource move ag_cluster-master SoL02 --master
同期コミット数の調整同期コミット数の変更 sudo pcs resource update ag_cluster required_synchronized_secondaries_to_commit=0
sudo pcs resource update ag_cluster required_synchronized_secondaries_to_commit=1
設定の確認 sudo pcs resource show ag_cluster
運用
設定の確認 Pacemaker の設定 pcs property
リソースの設定 pcs resource show ag_cluster --full
制約の設定 pcs constraint
ロケーションの設定 pcs constraint location show --full
Fence エージェントの一覧 pcs stonith list
投票数の確認 pcs status corosync
Pacemaker のコマンドによるフェールオー
バー
pcs resource move ag_cluster-master SoL02 --master
フェールオーバーカウントのリセット
第5回 Pacemakerを運用してみよう！［保守運用編(2)］
http://gihyo.jp/admin/serial/01/pacemaker/0005
•
pcs resource failcount reset ag_cluster
スコアの確認
Promotion Score
(合計スコア / location score + master
score)
Promote : Master に昇格
Demote : Slave に降格
制約の確認 pcs constraint
Promotion Score
このスコアが一番高いノードを Master に昇格させる
Promotion Score = location 設定値 + Master Score
Location 設定値 : pcs constraint location
Master Score : リソースエージェントがノードの状態に応じて設定
crm_simulate -sL | grep -i promotion
Master Score crm_mon -fAr
crm_mon -rfotcARj
PacemakerのMaster/Slave構成の基本と事例紹介(DRBD、PostgreSQLレプリケーション) @Open Source Conference 2014
https://www.slideshare.net/tatsuyaw/pacemaker-osc2014tokyo-31882542
•
動かして理解するPacemaker ～CRM設定編～その３
https://linux-ha.osdn.jp/wp/archives/3868
•
障害時にサブサーバへ自動で切り替える「高可用性WordPressシステム」の作り方後編 (1/3)
http://www.atmarkit.co.jp/ait/articles/1601/21/news007.html
•
アクティブ機のデータディスクが壊れたら遅滞なくフェイルオーバさせる方法
https://blog.3ware.co.jp/2015/04/アクティブ機のデータディスクが壊れたら遅滞な/
•
スコアの調整
stickiness = リソースがその場にとどまろう
とする強さ
プロパティの設定設定の変更 pcs property set default-resource-stickiness=200
最大値を設定 pcs property set default-resource-stickiness="INFINITY"
設定の解除 pcs property unset default-resource-stickiness

設定の確認 pcs property
試行回数の変更 (優先順位
に応じて試行)
pcs property set start-failure-is-fatal=false
起動の失敗をリソースに対して致命的と処理するかどうかを指定、false に設定するとリソースの failcount と migration-threshold の値を使用する
(リソース用の migration-threshold オプションの設定は「障害発生のためリソースを移動する」を参照)
https://access.redhat.com/documentation/ja-jp/red_hat_enterprise_linux/6/html/configuring_the_red_hat_high_availability_add-
on_with_pacemaker/ch-clusteropts-haar
リソースの初期値の設定最大値を設定 pcs resource defaults resource-stickiness=INFINITY migration-threshold=1
migration-threshold : ノードでのリトライ数の上限
設定の解除 pcs resource defaults resource-stickiness= migration-threshold=
設定の確認 pcs resource defaults
リソース個別リソースのメタ情報としてスコアを付与 pcs resource meta ag_cluster resource-stickiness="INFINITY" migration-threshold=1
pcs resource meta ag_cluster migration-threshold=1
pcs resource meta ag_cluster migration-threshold=
pcs resource show ag_cluster
pcs resource defaults
pcs resource meta ag_cluster resource-stickiness=200
pcs resource meta ag_cluster migration-threshold=
pcs resource show ag_cluster
ノード (ロケーション / コロケーション) の設定設定の確認 pcs constraint show --full
pcs constraint location show --full
設定の削除 pcs constraint location remove cli-prefer-ag_cluster-master
# 手動フェールオーバーした場合は、フェールオーバー先が INFINITY になる設定が残るため削除する必要がある
特定ノードでの実行を避ける pcs constraint location ag_cluster-master avoids SoL03
pcs constraint location remove location-ag_cluster-master-SoL03--INFINITY
優先度の調整 pcs constraint location ag_cluster-master prefers SoL01=100
pcs constraint location remove location-ag_cluster-master-SoL01-100
pcs constraint location ag_cluster-master prefers SoL02=100
pcs constraint location ag_cluster-master prefers SoL03=1

Pacemaker 操作方法メモ

Recommended

Recommended

More Related Content

What's hot

What's hot (20)

Similar to Pacemaker 操作方法メモ

Similar to Pacemaker 操作方法メモ (20)

More from Masayuki Ozawa

More from Masayuki Ozawa (20)

Pacemaker 操作方法メモ