jcsug21 20140912

4. VMware Support for DRS

2014/9/12 KDDI, Hojo (北条)

21st CloudStack User Group Meeting

What’s New in 4.4

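• Load Balancing
  Support for DRS (Distributed Resource Scheduler): evens out load imbalance across physical servers.

• Power Management
  Support for DPM (Distributed Power Management): consolidates load onto fewer physical servers to reduce power consumption.

• Affinity Rules
  Uses DRS affinity rules to control the placement of instances on hosts within a cluster.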

VMware DRS

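• Automatically places instances at power-on, and migrates running instances with vMotion, according to the CPU and memory load of the physical servers

• The DRS algorithm and CloudStack instance discovery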

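[Figure: two timelines, one for VMware and one for CloudStack]

VMware: every 300 seconds, DRS evaluates the migration threshold within the cluster. If the threshold is exceeded, DRS searches for a migration path that improves the load balance and runs vMotion.

CloudStack: every 60 seconds, the Management Server queries vCenter to discover instances (DirectAgent).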
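The periodic threshold check described above can be illustrated with a deliberately simplified sketch. It assumes that imbalance is measured as the standard deviation of normalized per-host load and compared against a target value. The function names, the sample loads, and the target are hypothetical, and the real DRS algorithm is considerably more elaborate.

```python
from statistics import pstdev

DRS_INTERVAL_S = 300   # DRS evaluation interval from the slide
SYNC_INTERVAL_S = 60   # CloudStack instance discovery interval from the slide


def cluster_imbalance(host_loads):
    """Population standard deviation of normalized host loads (0.0 to 1.0)."""
    return pstdev(host_loads)


def needs_rebalance(host_loads, target_stddev):
    """True when the measured imbalance exceeds the (hypothetical) target."""
    return cluster_imbalance(host_loads) > target_stddev


# Hypothetical two-host cluster: one busy host, one nearly idle host.
loads = [0.80, 0.20]
print(needs_rebalance(loads, target_stddev=0.1))
```

A more aggressive migration threshold would correspond to a smaller target value in this model, so that smaller imbalances already trigger a search for migrations.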

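Benefits of DRS (1)

Pooling physical server resources that would otherwise be siloed within a cluster, and distributing them dynamically, raises the instance consolidation ratio.
⇒ A good fit for CloudStack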

[Figure: CloudStack pools Cluster 1 through Cluster N, each with its own Primary Storage, within a zone. Without DRS, resources inside a cluster are siloed per physical server. With DRS, they are pooled within the cluster.]

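Benefits of DRS (2)

• Resources pooled within a cluster (CPU, memory) can be divided by purpose

• This is achieved with "resource pools" and "shares, reservations, and limits"

⇒ Not supported by CloudStack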

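[Figure: cluster resources of 100 GHz and 50 GB divided into two resource pools, 40 GHz / 20 GB and 60 GHz / 30 GB, for the Sales department and the Development department]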
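As a rough illustration of how shares divide a pooled resource under contention, the sketch below splits the cluster totals proportionally. The share values 2 and 3 and the pool names are hypothetical. The slide does not state how the 40/60 split was configured, and reservations and limits are left out of this model.

```python
def split_by_shares(total, shares):
    """Divide a resource total among pools in proportion to their shares."""
    total_shares = sum(shares.values())
    return {pool: total * value / total_shares for pool, value in shares.items()}


# Hypothetical share values that reproduce the split shown on the slide.
shares = {"pool_a": 2, "pool_b": 3}

cpu_ghz = split_by_shares(100, shares)   # cluster CPU: 100 GHz
memory_gb = split_by_shares(50, shares)  # cluster memory: 50 GB

print(cpu_ghz)
print(memory_gb)
```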

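Test environment

• CloudPlatform 4.3.0.1, which implements the DRS features of Apache CloudStack 4.4

• VMware vCenter 5.5

• ESXi 5.5 × 5 hosts (12 cores / 48 GB, identical CPU type): 2 hosts in a DRS cluster, 3 hosts in an HA cluster

• Primary Storage: iSCSI

• Secondary Storage: NFS

• 1 zone, 1 pod, 2 clusters

• Migration automation level: fully automated

• Migration threshold: 5.0 (aggressive)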

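[Figure: Zone 1 contains Pod 1 with Cluster 1 and Cluster 2, each with its own Primary Storage, plus Secondary Storage, connected through an L2 switch and an L3 switch. A "DRS configured" label marks the DRS cluster.]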

Test (1)

• Without DRS

• Manually vMotion a 2 vCPU / 4 GB instance from vCenter

17:25:39 vMotion started
17:25:57 vMotion completed

17:26:34,313 (DirectAgent) Detecting a new state but couldn't find a old state so adding it to the changes
17:26:36,708 (DirectAgent) VM is now missing from host report but we detected that it might be migrated to other host by vCenter
17:26:36,708 (DirectAgent) VM is now missing from host report and VM is not at starting/migrating state, remove it from host VM-sync map, oldState: Running
17:27:17,526 (DirectAgent:"<physical server name>") find VM on host

CloudStack detected the instance less than about 2 minutes after vMotion completed.

(Excerpt from management-server.log)
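The detection delay can be checked directly from the timestamps above. The following is a minimal sketch that assumes both timestamps fall on the same day. The helper names are hypothetical.

```python
from datetime import datetime


def parse_time(stamp):
    """Parse 'HH:MM:SS' or the log's 'HH:MM:SS,mmm' format."""
    fmt = "%H:%M:%S,%f" if "," in stamp else "%H:%M:%S"
    return datetime.strptime(stamp, fmt)


def detection_delay_seconds(vmotion_completed, found_on_host):
    """Seconds between vMotion completion and the 'find VM on host' log entry."""
    return (parse_time(found_on_host) - parse_time(vmotion_completed)).total_seconds()


# Timestamps taken from Test (1).
delay = detection_delay_seconds("17:25:57", "17:27:17,526")
print(f"{delay:.3f} s")
```

For Test (1) this gives roughly 80.5 seconds, consistent with the "less than about 2 minutes" observation.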

Test (2)

• With DRS (fully automated, threshold 5.0)

• Generate I/O and CPU load on 5 instances (2 vCPU / 4 GB)

• Then stop the commands to create a load imbalance

dd if=/dev/urandom of=/tmp/$(date +%Y%m%d%H%M%S) bs=$(expr $RANDOM % 900)K

yes >> /dev/null

“mackerel”, https://mackerel.io/

Test (2, cont.)

18:54 Stopped the I/O and CPU load on instance No. 5
19:01 DRS migrated instance No. 2 to the other host with vMotion

19:01:34,311 (DirectAgent) Detecting a new state but couldn't find a old state so adding it to the changes
19:01:37,192 (DirectAgent) VM is now missing from host report but we detected that it might be migrated to other host by vCenter
19:01:37,192 (DirectAgent) VM is now missing from host report and VM is not at starting/migrating state, remove it from host VM-sync map, oldState: Running
19:02:03,481 (DirectAgent:"<physical server name>") find VM on host
19:02:03,481 (DirectAgent:"<physical server name>") VM found in host cache

“mackerel”, https://mackerel.io/

CloudStack detected the instance less than about 2 minutes after vMotion completed.


Test (3)

• With DRS (fully automated, threshold 5.0)

• Generate I/O and CPU load on 5 instances (2 vCPU / 4 GB)

• Take a VM snapshot and a volume snapshot on arbitrary instances

Both the instance with a VM snapshot and the instance whose volume snapshot was still in progress became DRS migration targets, and vMotion was executed.

19:36:34,347 (DirectAgent) Detecting a new state but couldn't find a old state so adding it to the changes
19:36:34,347 (DirectAgent) VM is now missing from host report but we detected that it might be migrated to other host by vCenter
19:01:37,192 (DirectAgent) VM is now missing from host report and VM is not at starting/migrating state, remove it from host VM-sync map, oldState: Running
19:36:42,870 (DirectAgent:"<physical server name>") find VM on host
19:36:42,870 (DirectAgent:"<physical server name>") VM found in host cache


Test (4)

• In some cases, the log shows a StopCommand being executed while an instance is migrating.

• The instance did not actually stop when this was logged.

17:34:34,343 (DirectAgent) VM is now missing from host report but we detected that it might be migrated to other host by vCenter
17:34:34,343 (DirectAgent) VM is now missing from host report and VM is not at starting/migrating state, remove it from host VM-sync map, oldState: Running
17:34:34,439 INFO (DirectAgent) VM is at Running and we received a power-off report while there is no pending jobs on it
17:34:34,441 Sending "com.cloud.agent.api.StopCommand"
17:34:34,441 Executing "com.cloud.agent.api.StopCommand"
17:34:34,441 Executing resource StopCommand:
17:34:34,458 find VM on host
17:34:34,458 VM not found in host cache
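To spot this pattern in a longer log, one option is to flag StopCommand entries logged in the same second as a "might be migrated" message. The sketch below assumes one log entry per line in the abbreviated `HH:MM:SS,mmm message` form used in this excerpt. The function name and the same-second heuristic are hypothetical, not part of CloudStack.

```python
import re

# Abbreviated log line: time, milliseconds, then the message.
LOG_LINE = re.compile(r"^(\d{2}:\d{2}:\d{2}),(\d{3})\s+(.*)$")


def stop_commands_during_migration(lines):
    """Return StopCommand lines logged in the same second as a migration notice."""
    migration_seconds = set()
    flagged = []
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue
        second, _millis, message = match.groups()
        if "might be migrated to other host" in message:
            migration_seconds.add(second)
        elif "StopCommand" in message and second in migration_seconds:
            flagged.append(line)
    return flagged


sample = [
    "17:34:34,343 (DirectAgent) VM is now missing from host report but we "
    "detected that it might be migrated to other host by vCenter",
    '17:34:34,441 Sending "com.cloud.agent.api.StopCommand"',
    '17:34:34,441 Executing "com.cloud.agent.api.StopCommand"',
    "17:34:34,441 Executing resource StopCommand:",
    "17:34:34,458 find VM on host",
]
for entry in stop_commands_during_migration(sample):
    print(entry)
```

Entries flagged this way are candidates for the spurious StopCommand described above and still need to be checked against the actual instance state.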


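Test (5)

• Configuring DRS affinity rules

Instance   VM-to-host affinity   VM-to-VM anti-affinity
01         Physical server #1    AntiAffinity_Group
02         Physical server #1
03
04
05                               AntiAffinity_Group

Instances 01 and 02 run on the same server, and instances 01 and 05 run on different servers. If these conditions are not satisfied when the affinity rules are set, DRS migrates instances so that they are.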
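The two rule types can be expressed as a simple placement check. This is a sketch with hypothetical names (`rule_violations`, `server1`, `server2`) that only reports violations. It does not model how DRS chooses which instance to migrate.

```python
def rule_violations(placement, vm_host_affinity, anti_affinity_groups):
    """List rule violations for a mapping of instance name to host name."""
    problems = []
    # VM-to-host affinity: the instance must run on the named host.
    for vm, required_host in vm_host_affinity.items():
        if placement[vm] != required_host:
            problems.append(f"{vm} must run on {required_host}")
    # VM-to-VM anti-affinity: members of a group must not share a host.
    for group in anti_affinity_groups:
        first_vm_on_host = {}
        for vm in group:
            host = placement[vm]
            if host in first_vm_on_host:
                problems.append(f"{first_vm_on_host[host]} and {vm} must not share {host}")
            else:
                first_vm_on_host[host] = vm
    return problems


# Rules from the table: 01 and 02 pinned to server #1, 01 and 05 kept apart.
affinity = {"01": "server1", "02": "server1"}
anti_affinity = [["01", "05"]]

before = {"01": "server1", "02": "server2", "05": "server1"}
after = {"01": "server1", "02": "server1", "05": "server2"}

print(rule_violations(before, affinity, anti_affinity))
print(rule_violations(after, affinity, anti_affinity))
```

In the `before` placement both rules are violated, which is the situation in which DRS would migrate instances. The `after` placement satisfies both.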

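VMware DPM

• Automatically powers servers on and off according to load fluctuations in the cluster.

• Automatically migrates instances off lightly loaded servers with vMotion and moves the emptied servers into Standby Mode.

• Automatically powers servers back on when the cluster load becomes high.

• Controls server power through BMC, IPMI, and similar interfaces.

<Not tested>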

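Design best practices when using DRS

• DRS is said to raise the consolidation ratio by 40-60%.

• Overcommit CPU, but watch for instances waiting for CPU allocation.

• Do not overcommit memory. Memory reclamation can degrade instance performance.

• In CloudStack, set the resource allocation threshold per cluster.
  threshold = "allocated.capacity.disablethreshold" × "overprovisioning.factor"

  Start with a CPU threshold of 150%, monitor utilization, and raise the consolidation ratio gradually.
  Leave the memory threshold at its default, and install 1.5 to 3 times the physical memory that sizing without DRS would call for.

cluster.cpu.allocated.capacity.disablethreshold      0.85 (default)
cpu.overprovisioning.factor                          1.8
  CPU threshold: 0.85 × 1.8 = 1.53

cluster.memory.allocated.capacity.disablethreshold   0.85 (default)
mem.overprovisioning.factor                          1 (default)
  Memory threshold: 0.85 × 1 = 0.85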
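The threshold arithmetic above can be reproduced with a short calculation. This sketch uses `Decimal` to avoid floating-point rounding noise. The function name is hypothetical, and the values are the ones shown on the slide.

```python
from decimal import Decimal


def allocation_threshold(disable_threshold, overprovisioning_factor):
    """threshold = disablethreshold x overprovisioning factor."""
    return Decimal(disable_threshold) * Decimal(overprovisioning_factor)


# CPU: disable threshold 0.85 (default) with an overprovisioning factor of 1.8.
cpu_threshold = allocation_threshold("0.85", "1.8")
# Memory: disable threshold 0.85 (default) with the default factor of 1.
memory_threshold = allocation_threshold("0.85", "1")

print(f"CPU threshold:    {cpu_threshold:.2f}")
print(f"Memory threshold: {memory_threshold:.2f}")
```

With these settings, CPU can be allocated up to 153% of physical capacity before the cluster is excluded from further allocation, while memory stays capped at 85%.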

Thank you for your attention.
