Saturday, February 16, 2013

Monitoring and startup methods (foreground and background)

Until now, we triggered failover by running the masterha_master_switch command by hand.
This time, we start mha-manager and confirm that failover actually happens automatically.

Previous articles in this series:
Installing and configuring MySQL-MHA
About masterha_master_switch (part 1)
About masterha_master_switch (part 2)

Check the current server configuration before running
・mha-manager server
  IP:192.168.10.227
・master server
  IP:192.168.10.233
  VIP:192.168.10.234
・slave1
  IP:192.168.10.229
・slave2
  IP:192.168.10.228

■Step 1) Starting mha-manager
mha-manager starts in the foreground by default,
so we first verify it in the foreground and then verify how it behaves when started in the background.
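
For the background run mentioned above, one common approach is to detach the manager from the terminal with nohup. The following is a minimal sketch, not taken from this setup: the function name, the MHA_MANAGER_BIN override, and the output path are assumptions made for illustration; only masterha_manager and --conf=/etc/mha.cnf come from this article.

```shell
#!/bin/sh
# Sketch: start masterha_manager detached from the terminal.
# MHA_MANAGER_BIN is a hypothetical override (not an MHA setting),
# useful for dry runs on a machine without MHA installed.
start_mha_background() {
    conf=$1   # manager config, e.g. /etc/mha.cnf as used in this article
    out=$2    # file that receives the console output (assumed path)
    nohup "${MHA_MANAGER_BIN:-masterha_manager}" --conf="$conf" \
        < /dev/null > "$out" 2>&1 &
    echo "$!" # PID of the background process
}
```

It could be called as `start_mha_background /etc/mha.cnf /tmp/mha/manager.out`, where the second path is only an example. Redirecting stdin, stdout, and stderr keeps the process from holding on to the terminal after logout.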

Starting it in the foreground produces the following output.
# masterha_manager --conf=/etc/mha.cnf
Fri Feb 15 09:15:18 2013 - [info] Reading default configuratoins from /etc/masterha_default.cnf..
Fri Feb 15 09:15:18 2013 - [info] Reading application default configurations from /etc/mha.cnf..
Fri Feb 15 09:15:18 2013 - [info] Reading server configurations from /etc/mha.cnf..
From another console, check the mha-manager status and log.
# masterha_check_status --conf=/etc/mha.cnf
Take a look at the log file.
# view /tmp/mha/log/mha.log
Fri Feb 15 09:15:18 2013 - [info] MHA::MasterMonitor version 0.55.
Fri Feb 15 09:15:18 2013 - [info] Dead Servers:
Fri Feb 15 09:15:18 2013 - [info] Alive Servers:
Fri Feb 15 09:15:18 2013 - [info]   192.168.10.228(192.168.10.228:3306)
Fri Feb 15 09:15:18 2013 - [info]   192.168.10.229(192.168.10.229:3306)
Fri Feb 15 09:15:18 2013 - [info]   192.168.10.233(192.168.10.233:3306)
Fri Feb 15 09:15:18 2013 - [info] Alive Slaves:
Fri Feb 15 09:15:18 2013 - [info]   192.168.10.228(192.168.10.228:3306)  Version=5.5.29-log (oldest major version between slaves) log-bin:enabled
Fri Feb 15 09:15:18 2013 - [info]     Replicating from 192.168.10.233(192.168.10.233:3306)
Fri Feb 15 09:15:18 2013 - [info]   192.168.10.229(192.168.10.229:3306)  Version=5.5.29-log (oldest major version between slaves) log-bin:enabled
Fri Feb 15 09:15:18 2013 - [info]     Replicating from 192.168.10.233(192.168.10.233:3306)
Fri Feb 15 09:15:18 2013 - [info] Current Alive Master: 192.168.10.233(192.168.10.233:3306)
Fri Feb 15 09:15:18 2013 - [info] Checking slave configurations..
Fri Feb 15 09:15:18 2013 - [info]  read_only=1 is not set on slave 192.168.10.228(192.168.10.228:3306).
Fri Feb 15 09:15:18 2013 - [warning]  relay_log_purge=0 is not set on slave 192.168.10.228(192.168.10.228:3306).
Fri Feb 15 09:15:18 2013 - [warning]  relay_log_purge=0 is not set on slave 192.168.10.229(192.168.10.229:3306).
Fri Feb 15 09:15:18 2013 - [info] Checking replication filtering settings..
Fri Feb 15 09:15:18 2013 - [info]  binlog_do_db= , binlog_ignore_db=
Fri Feb 15 09:15:18 2013 - [info]  Replication filtering check ok.
Fri Feb 15 09:15:18 2013 - [info] Starting SSH connection tests..
Fri Feb 15 09:15:22 2013 - [info] All SSH connection tests passed successfully.
Fri Feb 15 09:15:22 2013 - [info] Checking MHA Node version..
Fri Feb 15 09:15:23 2013 - [info]  Version check ok.
Fri Feb 15 09:15:23 2013 - [info] Checking SSH publickey authentication settings on the current master..
Fri Feb 15 09:15:24 2013 - [info] HealthCheck: SSH to 192.168.10.233 is reachable.
Fri Feb 15 09:15:24 2013 - [info] Master MHA Node version is 0.54.
Fri Feb 15 09:15:24 2013 - [info] Checking recovery script configurations on the current master..
Fri Feb 15 09:15:24 2013 - [info]   Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/var/lib/mysql,/var/log/mysql --output_file=/tmp/mha/save_binary_logs_test --manager_version=0.55 --start_file=mysql-bin.000013
Fri Feb 15 09:15:24 2013 - [info]   Connecting to root@192.168.10.233(192.168.10.233)..
  Creating /tmp/mha if not exists..    ok.
  Checking output directory is accessible or not..
   ok.
  Binlog found at /var/lib/mysql, up to mysql-bin.000013
Fri Feb 15 09:15:25 2013 - [info] Master setting check done.
Fri Feb 15 09:15:25 2013 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..
Fri Feb 15 09:15:25 2013 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user='repl' --slave_host=192.168.10.228 --slave_ip=192.168.10.228 --slave_port=3306 --workdir=/tmp/mha --target_version=5.5.29-log --manager_version=0.55 --relay_log_info=/var/lib/mysql/relay-log.info  --relay_dir=/var/lib/mysql/  --slave_pass=xxx
Fri Feb 15 09:15:25 2013 - [info]   Connecting to root@192.168.10.228(192.168.10.228:22)..
  Checking slave recovery environment settings..
    Opening /var/lib/mysql/relay-log.info ... ok.
    Relay log found at /var/lib/mysql, up to mysqld-relay-bin.000004
    Temporary relay log file is /var/lib/mysql/mysqld-relay-bin.000004
    Testing mysql connection and privileges.. done.
    Testing mysqlbinlog output.. done.
    Cleaning up test file(s).. done.
Fri Feb 15 09:15:26 2013 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user='repl' --slave_host=192.168.10.229 --slave_ip=192.168.10.229 --slave_port=3306 --workdir=/tmp/mha --target_version=5.5.29-log --manager_version=0.55 --relay_log_info=/var/lib/mysql/relay-log.info  --relay_dir=/var/lib/mysql/  --slave_pass=xxx
Fri Feb 15 09:15:26 2013 - [info]   Connecting to root@192.168.10.229(192.168.10.229:22)..
  Checking slave recovery environment settings..
    Opening /var/lib/mysql/relay-log.info ... ok.
    Relay log found at /var/lib/mysql, up to mysql-relay-bin.000004
    Temporary relay log file is /var/lib/mysql/mysql-relay-bin.000004
    Testing mysql connection and privileges.. done.
    Testing mysqlbinlog output.. done.
    Cleaning up test file(s).. done.
Fri Feb 15 09:15:26 2013 - [info] Slaves settings check done.
Fri Feb 15 09:15:26 2013 - [info]
192.168.10.233 (current master)
 +--192.168.10.228
 +--192.168.10.229

Fri Feb 15 09:15:26 2013 - [info] Checking master_ip_failover_script status:
Fri Feb 15 09:15:26 2013 - [info]   /usr/bin/master_ip_failover --virtual_ip=192.168.10.234 --orig_master_vip_eth=eth0:234 --new_master_vip_eth=eth0:234 --command=status --ssh_user=root --orig_master_host=192.168.10.233 --orig_master_ip=192.168.10.233 --orig_master_port=3306
DEBUG PARAMETERS***********
command => status
ssh_user=s => root
orig_master_host => 192.168.10.233
orig_master_ip => 192.168.10.233
orig_master_port => 3306
virtual_ip => 192.168.10.234
orig_master_vip_eth => eth0:234
new_master_vip_eth => eth0:234
Fri Feb 15 09:15:27 2013 - [info]  OK.
Fri Feb 15 09:15:27 2013 - [warning] shutdown_script is not defined.
Fri Feb 15 09:15:27 2013 - [info] Set master ping interval 3 seconds.
Fri Feb 15 09:15:27 2013 - [warning] secondary_check_script is not defined. It is highly recommended setting it to check master reachability from two or more routes.
Fri Feb 15 09:15:27 2013 - [info] Starting ping health check on 192.168.10.233(192.168.10.233:3306)..
Fri Feb 15 09:15:27 2013 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..
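
The startup log above contains several [warning] lines (relay_log_purge, shutdown_script, secondary_check_script) that are easy to miss among the [info] lines. As a rough sketch, assuming the log format shown above, a small helper can list the distinct warnings with their counts. The function name mha_warnings is made up for this example.

```shell
#!/bin/sh
# List the distinct [warning] messages in an MHA manager log, most frequent first.
mha_warnings() {
    log=$1
    # Strip everything up to and including "- [warning]" so that
    # identical messages with different timestamps group together.
    sed -n 's/^.* - \[warning] *//p' "$log" | sort | uniq -c | sort -rn
}
```

For example, `mha_warnings /tmp/mha/log/mha.log` reads the log file used in this article.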
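■Step 2) Stopping the master (192.168.10.233)

With the manager running, stop mysql on the master (192.168.10.233).
# /etc/init.d/mysqld stop
Stopping mysqld:                                           [  OK  ]
The console of the mha-manager server started earlier and the log file
then show the following output, and the failover completes.

Console output of the foreground process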
# masterha_manager --conf=/etc/mha.cnf
Fri Feb 15 09:15:18 2013 - [info] Reading default configuratoins from /etc/masterha_default.cnf..
Fri Feb 15 09:15:18 2013 - [info] Reading application default configurations from /etc/mha.cnf..
Fri Feb 15 09:15:18 2013 - [info] Reading server configurations from /etc/mha.cnf..
  Creating /tmp/mha if not exists..    ok.
  Checking output directory is accessible or not..
   ok.
  Binlog found at /var/lib/mysql, up to mysql-bin.000013
Fri Feb 15 09:16:24 2013 - [info] Reading default configuratoins from /etc/masterha_default.cnf..
Fri Feb 15 09:16:24 2013 - [info] Reading application default configurations from /etc/mha.cnf..
Fri Feb 15 09:16:24 2013 - [info] Reading server configurations from /etc/mha.cnf..
Checking the log file
Fri Feb 15 09:15:18 2013 - [info] MHA::MasterMonitor version 0.55.
Fri Feb 15 09:15:18 2013 - [info] Dead Servers:
Fri Feb 15 09:15:18 2013 - [info] Alive Servers:
Fri Feb 15 09:15:18 2013 - [info]   192.168.10.228(192.168.10.228:3306)
Fri Feb 15 09:15:18 2013 - [info]   192.168.10.229(192.168.10.229:3306)
Fri Feb 15 09:15:18 2013 - [info]   192.168.10.233(192.168.10.233:3306)
Fri Feb 15 09:15:18 2013 - [info] Alive Slaves:
Fri Feb 15 09:15:18 2013 - [info]   192.168.10.228(192.168.10.228:3306)  Version=5.5.29-log (oldest major version between slaves) log-bin:enabled
Fri Feb 15 09:15:18 2013 - [info]     Replicating from 192.168.10.233(192.168.10.233:3306)
Fri Feb 15 09:15:18 2013 - [info]   192.168.10.229(192.168.10.229:3306)  Version=5.5.29-log (oldest major version between slaves) log-bin:enabled
Fri Feb 15 09:15:18 2013 - [info]     Replicating from 192.168.10.233(192.168.10.233:3306)
Fri Feb 15 09:15:18 2013 - [info] Current Alive Master: 192.168.10.233(192.168.10.233:3306)
Fri Feb 15 09:15:18 2013 - [info] Checking slave configurations..
Fri Feb 15 09:15:18 2013 - [info]  read_only=1 is not set on slave 192.168.10.228(192.168.10.228:3306).
Fri Feb 15 09:15:18 2013 - [warning]  relay_log_purge=0 is not set on slave 192.168.10.228(192.168.10.228:3306).
Fri Feb 15 09:15:18 2013 - [warning]  relay_log_purge=0 is not set on slave 192.168.10.229(192.168.10.229:3306).
Fri Feb 15 09:15:18 2013 - [info] Checking replication filtering settings..
Fri Feb 15 09:15:18 2013 - [info]  binlog_do_db= , binlog_ignore_db=
Fri Feb 15 09:15:18 2013 - [info]  Replication filtering check ok.
Fri Feb 15 09:15:18 2013 - [info] Starting SSH connection tests..
Fri Feb 15 09:15:22 2013 - [info] All SSH connection tests passed successfully.
Fri Feb 15 09:15:22 2013 - [info] Checking MHA Node version..
Fri Feb 15 09:15:23 2013 - [info]  Version check ok.
Fri Feb 15 09:15:23 2013 - [info] Checking SSH publickey authentication settings on the current master..
Fri Feb 15 09:15:24 2013 - [info] HealthCheck: SSH to 192.168.10.233 is reachable.
Fri Feb 15 09:15:24 2013 - [info] Master MHA Node version is 0.54.
Fri Feb 15 09:15:24 2013 - [info] Checking recovery script configurations on the current master..
Fri Feb 15 09:15:24 2013 - [info]   Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/var/lib/mysql,/var/log/mysql --output_file=/tmp/mha/save_binary_logs_test --manager_version=0.55 --start_file=mysql-bin.000013
Fri Feb 15 09:15:24 2013 - [info]   Connecting to root@192.168.10.233(192.168.10.233)..
  Creating /tmp/mha if not exists..    ok.
  Checking output directory is accessible or not..
   ok.
  Binlog found at /var/lib/mysql, up to mysql-bin.000013
Fri Feb 15 09:15:25 2013 - [info] Master setting check done.
Fri Feb 15 09:15:25 2013 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..
Fri Feb 15 09:15:25 2013 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user='repl' --slave_host=192.168.10.228 --slave_ip=192.168.10.228 --slave_port=3306 --workdir=/tmp/mha --target_version=5.5.29-log --manager_version=0.55 --relay_log_info=/var/lib/mysql/relay-log.info  --relay_dir=/var/lib/mysql/  --slave_pass=xxx
Fri Feb 15 09:15:25 2013 - [info]   Connecting to root@192.168.10.228(192.168.10.228:22)..
  Checking slave recovery environment settings..
    Opening /var/lib/mysql/relay-log.info ... ok.
    Relay log found at /var/lib/mysql, up to mysqld-relay-bin.000004
    Temporary relay log file is /var/lib/mysql/mysqld-relay-bin.000004
    Testing mysql connection and privileges.. done.
    Testing mysqlbinlog output.. done.
    Cleaning up test file(s).. done.
Fri Feb 15 09:15:26 2013 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user='repl' --slave_host=192.168.10.229 --slave_ip=192.168.10.229 --slave_port=3306 --workdir=/tmp/mha --target_version=5.5.29-log --manager_version=0.55 --relay_log_info=/var/lib/mysql/relay-log.info  --relay_dir=/var/lib/mysql/  --slave_pass=xxx
Fri Feb 15 09:15:26 2013 - [info]   Connecting to root@192.168.10.229(192.168.10.229:22)..
  Checking slave recovery environment settings..
    Opening /var/lib/mysql/relay-log.info ... ok.
    Relay log found at /var/lib/mysql, up to mysql-relay-bin.000004
    Temporary relay log file is /var/lib/mysql/mysql-relay-bin.000004
    Testing mysql connection and privileges.. done.
    Testing mysqlbinlog output.. done.
    Cleaning up test file(s).. done.
Fri Feb 15 09:15:26 2013 - [info] Slaves settings check done.
Fri Feb 15 09:15:26 2013 - [info]
192.168.10.233 (current master)
 +--192.168.10.228
 +--192.168.10.229

Fri Feb 15 09:15:26 2013 - [info] Checking master_ip_failover_script status:
Fri Feb 15 09:15:26 2013 - [info]   /usr/bin/master_ip_failover --virtual_ip=192.168.10.234 --orig_master_vip_eth=eth0:234 --new_master_vip_eth=eth0:234 --command=status --ssh_user=root --orig_master_host=192.168.10.233 --orig_master_ip=192.168.10.233 --orig_master_port=3306
DEBUG PARAMETERS***********
command => status
ssh_user=s => root
orig_master_host => 192.168.10.233
orig_master_ip => 192.168.10.233
orig_master_port => 3306
virtual_ip => 192.168.10.234
orig_master_vip_eth => eth0:234
new_master_vip_eth => eth0:234
Fri Feb 15 09:15:27 2013 - [info]  OK.
Fri Feb 15 09:15:27 2013 - [warning] shutdown_script is not defined.
Fri Feb 15 09:15:27 2013 - [info] Set master ping interval 3 seconds.
Fri Feb 15 09:15:27 2013 - [warning] secondary_check_script is not defined. It is highly recommended setting it to check master reachability from two or more routes.
Fri Feb 15 09:15:27 2013 - [info] Starting ping health check on 192.168.10.233(192.168.10.233:3306)..
Fri Feb 15 09:15:27 2013 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..
Fri Feb 15 09:16:15 2013 - [warning] Got error on MySQL select ping: 2006 (MySQL server has gone away)
Fri Feb 15 09:16:15 2013 - [info] Executing SSH check script: save_binary_logs --command=test --start_pos=4 --binlog_dir=/var/lib/mysql,/var/log/mysql --output_file=/tmp/mha/save_binary_logs_test --manager_version=0.55 --binlog_prefix=mysql-bin
Fri Feb 15 09:16:15 2013 - [info] HealthCheck: SSH to 192.168.10.233 is reachable.
Fri Feb 15 09:16:18 2013 - [warning] Got error on MySQL connect: 2013 (Lost connection to MySQL server at 'reading initial communication packet', system error: 111)
Fri Feb 15 09:16:18 2013 - [warning] Connection failed 1 time(s)..
Fri Feb 15 09:16:21 2013 - [warning] Got error on MySQL connect: 2013 (Lost connection to MySQL server at 'reading initial communication packet', system error: 111)
Fri Feb 15 09:16:21 2013 - [warning] Connection failed 2 time(s)..
Fri Feb 15 09:16:24 2013 - [warning] Got error on MySQL connect: 2013 (Lost connection to MySQL server at 'reading initial communication packet', system error: 111)
Fri Feb 15 09:16:24 2013 - [warning] Connection failed 3 time(s)..
Fri Feb 15 09:16:24 2013 - [warning] Master is not reachable from health checker!
Fri Feb 15 09:16:24 2013 - [warning] Master 192.168.10.233(192.168.10.233:3306) is not reachable!
Fri Feb 15 09:16:24 2013 - [warning] SSH is reachable.
Fri Feb 15 09:16:24 2013 - [info] Connecting to a master server failed. Reading configuration file /etc/masterha_default.cnf and /etc/mha.cnf again, and trying to connect to all servers to check server status..
Fri Feb 15 09:16:24 2013 - [info] Reading default configuratoins from /etc/masterha_default.cnf..
Fri Feb 15 09:16:24 2013 - [info] Reading application default configurations from /etc/mha.cnf..
Fri Feb 15 09:16:24 2013 - [info] Reading server configurations from /etc/mha.cnf..
Fri Feb 15 09:16:24 2013 - [info] Dead Servers:
Fri Feb 15 09:16:24 2013 - [info]   192.168.10.233(192.168.10.233:3306)
Fri Feb 15 09:16:24 2013 - [info] Alive Servers:
Fri Feb 15 09:16:24 2013 - [info]   192.168.10.228(192.168.10.228:3306)
Fri Feb 15 09:16:24 2013 - [info]   192.168.10.229(192.168.10.229:3306)
Fri Feb 15 09:16:24 2013 - [info] Alive Slaves:
Fri Feb 15 09:16:24 2013 - [info]   192.168.10.228(192.168.10.228:3306)  Version=5.5.29-log (oldest major version between slaves) log-bin:enabled
Fri Feb 15 09:16:24 2013 - [info]     Replicating from 192.168.10.233(192.168.10.233:3306)
Fri Feb 15 09:16:24 2013 - [info]   192.168.10.229(192.168.10.229:3306)  Version=5.5.29-log (oldest major version between slaves) log-bin:enabled
Fri Feb 15 09:16:24 2013 - [info]     Replicating from 192.168.10.233(192.168.10.233:3306)
Fri Feb 15 09:16:24 2013 - [info] Checking slave configurations..
Fri Feb 15 09:16:24 2013 - [info]  read_only=1 is not set on slave 192.168.10.228(192.168.10.228:3306).
Fri Feb 15 09:16:24 2013 - [warning]  relay_log_purge=0 is not set on slave 192.168.10.228(192.168.10.228:3306).
Fri Feb 15 09:16:24 2013 - [warning]  relay_log_purge=0 is not set on slave 192.168.10.229(192.168.10.229:3306).
Fri Feb 15 09:16:24 2013 - [info] Checking replication filtering settings..
Fri Feb 15 09:16:24 2013 - [info]  Replication filtering check ok.
Fri Feb 15 09:16:24 2013 - [info] Master is down!
Fri Feb 15 09:16:24 2013 - [info] Terminating monitoring script.
Fri Feb 15 09:16:24 2013 - [info] Got exit code 20 (Master dead).
Fri Feb 15 09:16:24 2013 - [info] MHA::MasterFailover version 0.55.
Fri Feb 15 09:16:24 2013 - [info] Starting master failover.
Fri Feb 15 09:16:24 2013 - [info]
Fri Feb 15 09:16:24 2013 - [info] * Phase 1: Configuration Check Phase..
Fri Feb 15 09:16:24 2013 - [info]
Fri Feb 15 09:16:24 2013 - [info] Dead Servers:
Fri Feb 15 09:16:24 2013 - [info]   192.168.10.233(192.168.10.233:3306)
Fri Feb 15 09:16:24 2013 - [info] Checking master reachability via mysql(double check)..
Fri Feb 15 09:16:24 2013 - [info]  ok.
Fri Feb 15 09:16:24 2013 - [info] Alive Servers:
Fri Feb 15 09:16:24 2013 - [info]   192.168.10.228(192.168.10.228:3306)
Fri Feb 15 09:16:24 2013 - [info]   192.168.10.229(192.168.10.229:3306)
Fri Feb 15 09:16:24 2013 - [info] Alive Slaves:
Fri Feb 15 09:16:24 2013 - [info]   192.168.10.228(192.168.10.228:3306)  Version=5.5.29-log (oldest major version between slaves) log-bin:enabled
Fri Feb 15 09:16:24 2013 - [info]     Replicating from 192.168.10.233(192.168.10.233:3306)
Fri Feb 15 09:16:24 2013 - [info]   192.168.10.229(192.168.10.229:3306)  Version=5.5.29-log (oldest major version between slaves) log-bin:enabled
Fri Feb 15 09:16:24 2013 - [info]     Replicating from 192.168.10.233(192.168.10.233:3306)
Fri Feb 15 09:16:24 2013 - [info] ** Phase 1: Configuration Check Phase completed.
Fri Feb 15 09:16:24 2013 - [info]
Fri Feb 15 09:16:24 2013 - [info] * Phase 2: Dead Master Shutdown Phase..
Fri Feb 15 09:16:24 2013 - [info]
Fri Feb 15 09:16:24 2013 - [info] Forcing shutdown so that applications never connect to the current master..
Fri Feb 15 09:16:24 2013 - [info] Executing master IP deactivatation script:
Fri Feb 15 09:16:24 2013 - [info]   /usr/bin/master_ip_failover --virtual_ip=192.168.10.234 --orig_master_vip_eth=eth0:234 --new_master_vip_eth=eth0:234 --orig_master_host=192.168.10.233 --orig_master_ip=192.168.10.233 --orig_master_port=3306 --command=stopssh --ssh_user=root
DEBUG PARAMETERS***********
command => stopssh
ssh_user=s => root
orig_master_host => 192.168.10.233
orig_master_ip => 192.168.10.233
orig_master_port => 3306
virtual_ip => 192.168.10.234
orig_master_vip_eth => eth0:234
new_master_vip_eth => eth0:234
Fri Feb 15 09:16:25 2013 - [info]  done.
Fri Feb 15 09:16:25 2013 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master.
Fri Feb 15 09:16:25 2013 - [info] * Phase 2: Dead Master Shutdown Phase completed.
Fri Feb 15 09:16:25 2013 - [info]
Fri Feb 15 09:16:25 2013 - [info] * Phase 3: Master Recovery Phase..
Fri Feb 15 09:16:25 2013 - [info]
Fri Feb 15 09:16:25 2013 - [info] * Phase 3.1: Getting Latest Slaves Phase..
Fri Feb 15 09:16:25 2013 - [info]
Fri Feb 15 09:16:25 2013 - [info] The latest binary log file/position on all slaves is mysql-bin.000013:107
Fri Feb 15 09:16:25 2013 - [info] Latest slaves (Slaves that received relay log files to the latest):
Fri Feb 15 09:16:25 2013 - [info]   192.168.10.228(192.168.10.228:3306)  Version=5.5.29-log (oldest major version between slaves) log-bin:enabled
Fri Feb 15 09:16:25 2013 - [info]     Replicating from 192.168.10.233(192.168.10.233:3306)
Fri Feb 15 09:16:25 2013 - [info]   192.168.10.229(192.168.10.229:3306)  Version=5.5.29-log (oldest major version between slaves) log-bin:enabled
Fri Feb 15 09:16:25 2013 - [info]     Replicating from 192.168.10.233(192.168.10.233:3306)
Fri Feb 15 09:16:25 2013 - [info] The oldest binary log file/position on all slaves is mysql-bin.000013:107
Fri Feb 15 09:16:25 2013 - [info] Oldest slaves:
Fri Feb 15 09:16:25 2013 - [info]   192.168.10.228(192.168.10.228:3306)  Version=5.5.29-log (oldest major version between slaves) log-bin:enabled
Fri Feb 15 09:16:25 2013 - [info]     Replicating from 192.168.10.233(192.168.10.233:3306)
Fri Feb 15 09:16:25 2013 - [info]   192.168.10.229(192.168.10.229:3306)  Version=5.5.29-log (oldest major version between slaves) log-bin:enabled
Fri Feb 15 09:16:25 2013 - [info]     Replicating from 192.168.10.233(192.168.10.233:3306)
Fri Feb 15 09:16:25 2013 - [info]
Fri Feb 15 09:16:25 2013 - [info] * Phase 3.2: Saving Dead Master's Binlog Phase..
Fri Feb 15 09:16:25 2013 - [info]
Fri Feb 15 09:16:25 2013 - [info] Fetching dead master's binary logs..
Fri Feb 15 09:16:25 2013 - [info] Executing command on the dead master 192.168.10.233(192.168.10.233:3306): save_binary_logs --command=save --start_file=mysql-bin.000013  --start_pos=107 --binlog_dir=/var/lib/mysql,/var/log/mysql --output_file=/tmp/mha/saved_master_binlog_from_192.168.10.233_3306_20130215091624.binlog --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.55
  Creating /tmp/mha if not exists..    ok.
 Concat binary/relay logs from mysql-bin.000013 pos 107 to mysql-bin.000013 EOF into /tmp/mha/saved_master_binlog_from_192.168.10.233_3306_20130215091624.binlog ..
  Dumping binlog format description event, from position 0 to 107.. ok.
  Dumping effective binlog data from /var/lib/mysql/mysql-bin.000013 position 107 to tail(126).. ok.
 Concat succeeded.
Fri Feb 15 09:16:27 2013 - [info] scp from root@192.168.10.233:/tmp/mha/saved_master_binlog_from_192.168.10.233_3306_20130215091624.binlog to local:/tmp/mha/saved_master_binlog_from_192.168.10.233_3306_20130215091624.binlog succeeded.
Fri Feb 15 09:16:28 2013 - [info] HealthCheck: SSH to 192.168.10.228 is reachable.
Fri Feb 15 09:16:29 2013 - [info] HealthCheck: SSH to 192.168.10.229 is reachable.
Fri Feb 15 09:16:30 2013 - [info]
Fri Feb 15 09:16:30 2013 - [info] * Phase 3.3: Determining New Master Phase..
Fri Feb 15 09:16:30 2013 - [info]
Fri Feb 15 09:16:30 2013 - [info] Finding the latest slave that has all relay logs for recovering other slaves..
Fri Feb 15 09:16:30 2013 - [info] All slaves received relay logs to the same position. No need to resync each other.
Fri Feb 15 09:16:30 2013 - [info] Searching new master from slaves..
Fri Feb 15 09:16:30 2013 - [info]  Candidate masters from the configuration file:
Fri Feb 15 09:16:30 2013 - [info]  Non-candidate masters:
Fri Feb 15 09:16:30 2013 - [info] New master is 192.168.10.228(192.168.10.228:3306)
Fri Feb 15 09:16:30 2013 - [info] Starting master failover..
Fri Feb 15 09:16:30 2013 - [info]
From:
192.168.10.233 (current master)
 +--192.168.10.228
 +--192.168.10.229

To:
192.168.10.228 (new master)
 +--192.168.10.229
Fri Feb 15 09:16:30 2013 - [info]
Fri Feb 15 09:16:30 2013 - [info] * Phase 3.3: New Master Diff Log Generation Phase..
Fri Feb 15 09:16:30 2013 - [info]
Fri Feb 15 09:16:30 2013 - [info]  This server has all relay logs. No need to generate diff files from the latest slave.
Fri Feb 15 09:16:30 2013 - [info] Sending binlog..
Fri Feb 15 09:16:31 2013 - [info] scp from local:/tmp/mha/saved_master_binlog_from_192.168.10.233_3306_20130215091624.binlog to root@192.168.10.228:/tmp/mha/saved_master_binlog_from_192.168.10.233_3306_20130215091624.binlog succeeded.
Fri Feb 15 09:16:31 2013 - [info]
Fri Feb 15 09:16:31 2013 - [info] * Phase 3.4: Master Log Apply Phase..
Fri Feb 15 09:16:31 2013 - [info]
Fri Feb 15 09:16:31 2013 - [info] *NOTICE: If any error happens from this phase, manual recovery is needed.
Fri Feb 15 09:16:31 2013 - [info] Starting recovery on 192.168.10.228(192.168.10.228:3306)..
Fri Feb 15 09:16:31 2013 - [info]  Generating diffs succeeded.
Fri Feb 15 09:16:31 2013 - [info] Waiting until all relay logs are applied.
Fri Feb 15 09:16:31 2013 - [info]  done.
Fri Feb 15 09:16:31 2013 - [info] Getting slave status..
Fri Feb 15 09:16:31 2013 - [info] This slave(192.168.10.228)'s Exec_Master_Log_Pos equals to Read_Master_Log_Pos(mysql-bin.000013:107). No need to recover from Exec_Master_Log_Pos.
Fri Feb 15 09:16:31 2013 - [info] Connecting to the target slave host 192.168.10.228, running recover script..
Fri Feb 15 09:16:31 2013 - [info] Executing command: apply_diff_relay_logs --command=apply --slave_user='repl' --slave_host=192.168.10.228 --slave_ip=192.168.10.228  --slave_port=3306 --apply_files=/tmp/mha/saved_master_binlog_from_192.168.10.233_3306_20130215091624.binlog --workdir=/tmp/mha --target_version=5.5.29-log --timestamp=20130215091624 --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.55 --slave_pass=xxx
Fri Feb 15 09:16:32 2013 - [info]
Applying differential binary/relay log files /tmp/mha/saved_master_binlog_from_192.168.10.233_3306_20130215091624.binlog on 192.168.10.228:3306. This may take long time...
Applying log files succeeded.
Fri Feb 15 09:16:32 2013 - [info]  All relay logs were successfully applied.
Fri Feb 15 09:16:32 2013 - [info] Getting new master's binlog name and position..
Fri Feb 15 09:16:32 2013 - [info]  mysql-bin.000012:107
Fri Feb 15 09:16:32 2013 - [info]  All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='192.168.10.228', MASTER_PORT=3306, MASTER_LOG_FILE='mysql-bin.000012', MASTER_LOG_POS=107, MASTER_USER='repl', MASTER_PASSWORD='xxx';
Fri Feb 15 09:16:32 2013 - [info] Executing master IP activate script:
Fri Feb 15 09:16:32 2013 - [info]   /usr/bin/master_ip_failover --virtual_ip=192.168.10.234 --orig_master_vip_eth=eth0:234 --new_master_vip_eth=eth0:234 --command=start --ssh_user=root --orig_master_host=192.168.10.233 --orig_master_ip=192.168.10.233 --orig_master_port=3306 --new_master_host=192.168.10.228 --new_master_ip=192.168.10.228 --new_master_port=3306 --new_master_user='repl' --new_master_password='repl'
DEBUG PARAMETERS***********
command => start
ssh_user=s => root
orig_master_host => 192.168.10.233
orig_master_ip => 192.168.10.233
orig_master_port => 3306
new_master_host => 192.168.10.228
new_master_ip => 192.168.10.228
new_master_port => 3306
virtual_ip => 192.168.10.234
orig_master_vip_eth => eth0:234
new_master_vip_eth => eth0:234
Set read_only=0 on the new master.
Fri Feb 15 09:16:33 2013 - [info]  OK.
Fri Feb 15 09:16:33 2013 - [info] ** Finished master recovery successfully.
Fri Feb 15 09:16:33 2013 - [info] * Phase 3: Master Recovery Phase completed.
Fri Feb 15 09:16:33 2013 - [info]
Fri Feb 15 09:16:33 2013 - [info] * Phase 4: Slaves Recovery Phase..
Fri Feb 15 09:16:33 2013 - [info]
Fri Feb 15 09:16:33 2013 - [info] * Phase 4.1: Starting Parallel Slave Diff Log Generation Phase..
Fri Feb 15 09:16:33 2013 - [info]
Fri Feb 15 09:16:33 2013 - [info] -- Slave diff file generation on host 192.168.10.229(192.168.10.229:3306) started, pid: 24226. Check tmp log /tmp/mha/192.168.10.229_3306_20130215091624.log if it takes time..
Fri Feb 15 09:16:33 2013 - [info]
Fri Feb 15 09:16:33 2013 - [info] Log messages from 192.168.10.229 ...
Fri Feb 15 09:16:33 2013 - [info]
Fri Feb 15 09:16:33 2013 - [info]  This server has all relay logs. No need to generate diff files from the latest slave.
Fri Feb 15 09:16:33 2013 - [info] End of log messages from 192.168.10.229.
Fri Feb 15 09:16:33 2013 - [info] -- 192.168.10.229(192.168.10.229:3306) has the latest relay log events.
Fri Feb 15 09:16:33 2013 - [info] Generating relay diff files from the latest slave succeeded.
Fri Feb 15 09:16:33 2013 - [info]
Fri Feb 15 09:16:33 2013 - [info] * Phase 4.2: Starting Parallel Slave Log Apply Phase..
Fri Feb 15 09:16:33 2013 - [info]
Fri Feb 15 09:16:33 2013 - [info] -- Slave recovery on host 192.168.10.229(192.168.10.229:3306) started, pid: 24228. Check tmp log /tmp/mha/192.168.10.229_3306_20130215091624.log if it takes time..
Fri Feb 15 09:16:35 2013 - [info]
Fri Feb 15 09:16:35 2013 - [info] Log messages from 192.168.10.229 ...
Fri Feb 15 09:16:35 2013 - [info]
Fri Feb 15 09:16:33 2013 - [info] Sending binlog..
Fri Feb 15 09:16:34 2013 - [info] scp from local:/tmp/mha/saved_master_binlog_from_192.168.10.233_3306_20130215091624.binlog to root@192.168.10.229:/tmp/mha/saved_master_binlog_from_192.168.10.233_3306_20130215091624.binlog succeeded.
Fri Feb 15 09:16:34 2013 - [info] Starting recovery on 192.168.10.229(192.168.10.229:3306)..
Fri Feb 15 09:16:34 2013 - [info]  Generating diffs succeeded.
Fri Feb 15 09:16:34 2013 - [info] Waiting until all relay logs are applied.
Fri Feb 15 09:16:34 2013 - [info]  done.
Fri Feb 15 09:16:34 2013 - [info] Getting slave status..
Fri Feb 15 09:16:34 2013 - [info] This slave(192.168.10.229)'s Exec_Master_Log_Pos equals to Read_Master_Log_Pos(mysql-bin.000013:107). No need to recover from Exec_Master_Log_Pos.
Fri Feb 15 09:16:34 2013 - [info] Connecting to the target slave host 192.168.10.229, running recover script..
Fri Feb 15 09:16:34 2013 - [info] Executing command: apply_diff_relay_logs --command=apply --slave_user='repl' --slave_host=192.168.10.229 --slave_ip=192.168.10.229  --slave_port=3306 --apply_files=/tmp/mha/saved_master_binlog_from_192.168.10.233_3306_20130215091624.binlog --workdir=/tmp/mha --target_version=5.5.29-log --timestamp=20130215091624 --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.55 --slave_pass=xxx
Fri Feb 15 09:16:35 2013 - [info]
Applying differential binary/relay log files /tmp/mha/saved_master_binlog_from_192.168.10.233_3306_20130215091624.binlog on 192.168.10.229:3306. This may take long time...
Applying log files succeeded.
Fri Feb 15 09:16:35 2013 - [info]  All relay logs were successfully applied.
Fri Feb 15 09:16:35 2013 - [info]  Resetting slave 192.168.10.229(192.168.10.229:3306) and starting replication from the new master 192.168.10.228(192.168.10.228:3306)..
Fri Feb 15 09:16:35 2013 - [info]  Executed CHANGE MASTER.
Fri Feb 15 09:16:35 2013 - [info]  Slave started.
Fri Feb 15 09:16:35 2013 - [info] End of log messages from 192.168.10.229.
Fri Feb 15 09:16:35 2013 - [info] -- Slave recovery on host 192.168.10.229(192.168.10.229:3306) succeeded.
Fri Feb 15 09:16:35 2013 - [info] All new slave servers recovered successfully.
Fri Feb 15 09:16:35 2013 - [info]
Fri Feb 15 09:16:35 2013 - [info] * Phase 5: New master cleanup phase..
Fri Feb 15 09:16:35 2013 - [info]
Fri Feb 15 09:16:35 2013 - [info] Resetting slave info on the new master..
Fri Feb 15 09:16:35 2013 - [info]  192.168.10.228: Resetting slave info succeeded.
Fri Feb 15 09:16:35 2013 - [info] Master failover to 192.168.10.228(192.168.10.228:3306) completed successfully.
Fri Feb 15 09:16:35 2013 - [info]

----- Failover Report -----

mha: MySQL Master failover 192.168.10.233 to 192.168.10.228 succeeded

Master 192.168.10.233 is down!

Check MHA Manager logs at local-dev1-vm004:/tmp/mha/log/mha.log for details.

Started automated(non-interactive) failover.
Invalidated master IP address on 192.168.10.233.
The latest slave 192.168.10.228(192.168.10.228:3306) has all relay logs for recovery.
Selected 192.168.10.228 as a new master.
192.168.10.228: OK: Applying all logs succeeded.
192.168.10.228: OK: Activated master IP address.
192.168.10.229: This host has the latest relay log events.
Generating relay diff files from the latest slave succeeded.
192.168.10.229: OK: Applying all logs succeeded. Slave started, replicating from 192.168.10.228.
192.168.10.228: Resetting slave info succeeded.
Master failover to 192.168.10.228(192.168.10.228:3306) completed successfully.
Failover complete!! (゚∀゚)

■Checking the state of each server

First, check the state of the new master (192.168.10.228).
The log contains the line
Master failover to 192.168.10.228(192.168.10.228:3306) completed successfully.
so the new master is 192.168.10.228.
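
Instead of reading the log by eye, the new master can also be pulled out with a small script. This is only a minimal Python sketch, assuming the log contains the "Master failover to ... completed successfully." line shown above; the function name find_new_master is made up for this example and is not part of MHA.

```python
import re

# Matches the completion line that MHA Manager wrote to the log above.
FAILOVER_DONE = re.compile(
    r"Master failover to (?P<host>[\w.\-]+)\((?P<ip>[\d.]+):(?P<port>\d+)\) "
    r"completed successfully\."
)


def find_new_master(log_text):
    """Return (host, port) of the last completed failover, or None if absent."""
    matches = list(FAILOVER_DONE.finditer(log_text))
    if not matches:
        return None
    last = matches[-1]  # a log may hold several failovers; take the newest
    return last.group("host"), int(last.group("port"))


if __name__ == "__main__":
    sample = (
        "Fri Feb 15 09:16:35 2013 - [info] Master failover to "
        "192.168.10.228(192.168.10.228:3306) completed successfully.\n"
    )
    print(find_new_master(sample))
```

The last match is used because the same log file may contain more than one failover.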

Check that the virtual IP has been added to the new master.
# ifconfig
eth0      Link encap:Ethernet  HWaddr 00:16:3E:47:5E:B8
          inet addr:192.168.10.228  Bcast:192.168.10.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:5180059 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1487820 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:1376260950 (1.2 GiB)  TX bytes:4242466911 (3.9 GiB)

eth0:234  Link encap:Ethernet  HWaddr 00:16:3E:47:5E:B8
          inet addr:192.168.10.234  Bcast:192.168.10.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:558 errors:0 dropped:0 overruns:0 frame:0
          TX packets:558 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:72882 (71.1 KiB)  TX bytes:72882 (71.1 KiB)
Check each state on the new master.
# Confirm that the slave information has been cleared
mysql> show slave status\G
Empty set (0.00 sec)

# Check which slave hosts are replicating from this server
mysql> show slave hosts;
+-----------+---------+------+-----------+
| Server_id | Host    | Port | Master_id |
+-----------+---------+------+-----------+
|        20 | slave20 | 3306 |        10 |
+-----------+---------+------+-----------+
1 row in set (0.00 sec)

# Check the master information
mysql> show master status;
+------------------+----------+--------------+------------------+
| File             | Position | Binlog_Do_DB | Binlog_Ignore_DB |
+------------------+----------+--------------+------------------+
| mysql-bin.000012 |      107 |              |                  |
+------------------+----------+--------------+------------------+
1 row in set (0.00 sec)

# Confirm that read_only=0
mysql> SELECT @@read_only;
+-------------+
| @@read_only |
+-------------+
|           0 |
+-------------+
1 row in set (0.00 sec)
Next, check the slave.

The original master (192.168.10.233) went down and slave2 (192.168.10.228) became the new master,
so slave1 (192.168.10.229) should still be a slave, with its master information updated.

Confirm that Master_Host is the new master.
Confirm that Slave_IO_Running and Slave_SQL_Running are both "Yes".
mysql> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 192.168.10.228
                  Master_User: repl
                  Master_Port: 3306
                Connect_Retry: 10
              Master_Log_File: mysql-bin.000012
          Read_Master_Log_Pos: 107
               Relay_Log_File: mysql-relay-bin.000002
                Relay_Log_Pos: 253
        Relay_Master_Log_File: mysql-bin.000012
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB:
          Replicate_Ignore_DB:
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table:
  Replicate_Wild_Ignore_Table:
                   Last_Errno: 0
                   Last_Error:
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 107
              Relay_Log_Space: 409
              Until_Condition: None
               Until_Log_File:
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File:
           Master_SSL_CA_Path:
              Master_SSL_Cert:
            Master_SSL_Cipher:
               Master_SSL_Key:
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error:
               Last_SQL_Errno: 0
               Last_SQL_Error:
  Replicate_Ignore_Server_Ids:
             Master_Server_Id: 10
1 row in set (0.00 sec)
Confirm that read_only=1.
mysql> SELECT @@read_only;
+-------------+
| @@read_only |
+-------------+
|           1 |
+-------------+
1 row in set (0.00 sec)
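
The slave checks above (Master_Host points at the new master, both replication threads say "Yes") can be automated once the `show slave status\G` output has been captured as text. A rough Python sketch follows; the helper names parse_vertical_output and slave_is_healthy are hypothetical, and how the output is captured (for example from the mysql client) is left out.

```python
def parse_vertical_output(text):
    """Parse `show slave status\\G` style output into a dict of field -> value."""
    fields = {}
    for line in text.splitlines():
        # Skip the "**** 1. row ****" banner and the "1 row in set" footer.
        if line.lstrip().startswith("*") or ":" not in line:
            continue
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    return fields


def slave_is_healthy(fields, expected_master):
    """True when the slave points at expected_master and both threads are running."""
    return (
        fields.get("Master_Host") == expected_master
        and fields.get("Slave_IO_Running") == "Yes"
        and fields.get("Slave_SQL_Running") == "Yes"
    )
```

For slave1 in this article, slave_is_healthy(fields, "192.168.10.228") should hold after the failover.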
Next, check the state of the original master (192.168.10.233).

MySQL was stopped earlier, so no mysqld process is running.
# ps axu|grep mysql
root      5495  0.0  0.5  23016  2776 xvc0     S+   Feb08   0:00 mysql -u root
root     30971  0.0  0.1   5120   796 pts/0    R+   09:38   0:00 grep mysql
Confirm that the virtual IP has been removed.
# ifconfig
eth0      Link encap:Ethernet  HWaddr 00:16:3E:48:5F:B8
          inet addr:192.168.10.233  Bcast:192.168.10.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:3147603 errors:0 dropped:0 overruns:0 frame:0
          TX packets:58914 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:714469742 (681.3 MiB)  TX bytes:6138453 (5.8 MiB)

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:400 errors:0 dropped:0 overruns:0 frame:0
          TX packets:400 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:46792 (45.6 KiB)  TX bytes:46792 (45.6 KiB)

lo:0      Link encap:Local Loopback
          inet addr:192.168.10.223  Mask:255.255.255.255
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
Just to be sure, confirm that the new master can be reached through the virtual IP.
# mysql -u repl -h 192.168.10.234 -p
Enter password:
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 29
Server version: 5.5.29-log MySQL Community Server (GPL) by Remi

Copyright (c) 2000, 2012, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> show slave hosts;
+-----------+---------+------+-----------+
| Server_id | Host    | Port | Master_id |
+-----------+---------+------+-----------+
|        20 | slave20 | 3306 |        10 |
+-----------+---------+------+-----------+
1 row in set (0.00 sec)
That completes all the checks, so the failover finished successfully.
※It is also a good idea to run masterha_check_ssh and masterha_check_repl to confirm (・∀・) Nice!!

Once failover completes, mha-manager exits,
so it has to be started again if you want to resume monitoring.

Check the monitoring state on the mha-manager server.
# masterha_check_status --conf=/etc/mha.cnf
■Looking at the working directory on each server

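About the files created on each server:
・saved_master_binlog_from_~ is the differential binary log that is transferred to each slave server.
・mha.failover.complete is a flag file indicating that failover has completed.
※Trying to start mha-manager again while mha.failover.complete exists results in an error.
 (By default it cannot be started again within 8 hours. This can be avoided with a configuration setting or a startup option.)
・relay_log_apply_for_~ is, judging by its _err.log suffix, probably the error output written while the logs were applied (unconfirmed).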
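
Before restarting mha-manager it can help to know whether a recent flag file is still lying around. The following is a minimal Python sketch under a few assumptions: the flag is named (app name).failover.complete as in the listing below, the window is the 8 hours (480 minutes) mentioned above, and the function name recent_failover_flag is hypothetical. It only inspects the file; it does not delete anything.

```python
import os
import time

# Assumption: the 8-hour default window described above, in minutes.
DEFAULT_WINDOW_MINUTES = 480


def recent_failover_flag(workdir, app_name,
                         window_minutes=DEFAULT_WINDOW_MINUTES, now=None):
    """Return the flag file path if it exists and is younger than the window.

    Returns None when there is no flag file or when it is old enough.
    """
    path = os.path.join(workdir, app_name + ".failover.complete")
    if not os.path.exists(path):
        return None
    now = time.time() if now is None else now
    age_minutes = (now - os.path.getmtime(path)) / 60.0
    return path if age_minutes < window_minutes else None


if __name__ == "__main__":
    # /tmp/mha and "mha" are the working directory and app name used in this article.
    print(recent_failover_flag("/tmp/mha", "mha"))
```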

・mha-manager(192.168.10.227)
# ls -alt /tmp/mha/
合計 36
drwxrwxrwx 3 root root 4096  2月 15 09:16 .
-rw-r--r-- 1 root root    0  2月 15 09:16 mha.failover.complete
-rw-r--r-- 1 root root  126  2月 15 09:16 saved_master_binlog_from_192.168.10.233_3306_20130215091624.binlog
・Original master(192.168.10.233)
# ls -alt /tmp/mha/
合計 24
drwxrwxrwt 5 root root 4096  2月 15 09:54 ..
drwxrwxrwx 2 root root 4096  2月 15 09:16 .
-rw-r--r-- 1 root root  126  2月 15 09:16 saved_master_binlog_from_192.168.10.233_3306_20130215091624.binlog
・New master(192.168.10.228)
# ls -alt /tmp/mha/
合計 32
drwxrwxrwx 2 root root 4096  2月 15 09:16 .
-rw-r--r-- 1 root root  767  2月 15 09:16 relay_log_apply_for_192.168.10.228_3306_20130215091624_err.log
-rw-r--r-- 1 root root  126  2月 15 09:16 saved_master_binlog_from_192.168.10.233_3306_20130215091624.binlog
drwxrwxrwt 5 root root 4096  2月 15 04:02 ..
・slave(192.168.10.229)
# ls -alt /tmp/mha/
合計 32
drwxrwxrwx 2 root root 4096  2月 15 09:16 .
drwxrwxrwt 5 root root 4096  2月 15 09:16 ..
-rw-r--r-- 1 root root  767  2月 15 09:16 relay_log_apply_for_192.168.10.229_3306_20130215091624_err.log
-rw-r--r-- 1 root root  126  2月 15 09:16 saved_master_binlog_from_192.168.10.233_3306_20130215091624.binlog
■Recovering the failed master as a slave

Now let's bring the master that was stopped for this test back up and run it as a slave.

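Before starting it, it is probably a good idea to add skip_slave_start = 1 to my.cnf.
(Without skip_slave_start = 1, MySQL tries to start replication on its own as soon as it starts.)
This server was originally the master, so it should not start replicating even without the setting... but just to be safe (゚д゚)(。_。)(゚д゚)(。_。) yep yep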

Next, to run it as a slave, a CHANGE MASTER TO statement has to be executed.
Do you have to look up the master information and assemble the CHANGE MASTER TO statement by hand each time?
No such chore is needed ( ´∀`)b

The mha-manager log actually contains the CHANGE MASTER TO statement that the slaves should run.
Using that log line makes it easy to recover the server as a slave (; ・`д・´) W-what!? (`・д´・ ;)

Find the following line in the log file.
Fri Feb 15 09:16:32 2013 - [info]  All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='192.168.10.228', MASTER_PORT=3306, MASTER_LOG_FILE='mysql-bin.000012', MASTER_LOG_POS=107, MASTER_USER='repl', MASTER_PASSWORD='xxx';
In the log output the MASTER_PASSWORD value is masked as 'xxx', so replace it with the correct value before running the statement.
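
Finding that line and swapping the masked password back in can be scripted. Below is a small Python sketch, assuming the log uses the "Statement should be: CHANGE MASTER TO ...;" wording shown above; the function name change_master_statement is hypothetical. It does no SQL escaping, so it is only suitable for passwords without quote characters.

```python
import re

# Captures the statement MHA Manager suggests in the log line shown above.
STATEMENT = re.compile(r"Statement should be: (CHANGE MASTER TO .*?;)")


def change_master_statement(log_text, password):
    """Return the last logged CHANGE MASTER TO statement with the password filled in.

    Returns None if the log has no such line.
    """
    found = STATEMENT.findall(log_text)
    if not found:
        return None
    # The log masks the password as 'xxx'; put the real value back.
    # Note: no escaping is done, so quotes in the password would break the SQL.
    return found[-1].replace(
        "MASTER_PASSWORD='xxx'", "MASTER_PASSWORD='%s'" % password
    )


if __name__ == "__main__":
    # Path taken from this article; the password value is a placeholder.
    with open("/tmp/mha/log/mha.log") as log_file:
        print(change_master_statement(log_file.read(), "your-password"))
```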
# Start mysql
# /etc/init.d/mysqld start
mysqld を起動中:                                           [  OK  ]

# Reset the slave information
mysql> RESET SLAVE;
Query OK, 0 rows affected (0.00 sec)

# Set the master information
mysql> CHANGE MASTER TO MASTER_HOST='192.168.10.228', MASTER_PORT=3306, MASTER_LOG_FILE='mysql-bin.000012', MASTER_LOG_POS=107, MASTER_USER='repl', MASTER_PASSWORD='repl';
Query OK, 0 rows affected (0.02 sec)

# Check the master information
mysql> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State:
                  Master_Host: 192.168.10.228
                  Master_User: repl
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysql-bin.000012
          Read_Master_Log_Pos: 107
               Relay_Log_File: mysql-relay-bin.000001
                Relay_Log_Pos: 4
        Relay_Master_Log_File: mysql-bin.000012
             Slave_IO_Running: No
            Slave_SQL_Running: No
              Replicate_Do_DB:
          Replicate_Ignore_DB:
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table:
  Replicate_Wild_Ignore_Table:
                   Last_Errno: 0
                   Last_Error:
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 107
              Relay_Log_Space: 107
              Until_Condition: None
               Until_Log_File:
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File:
           Master_SSL_CA_Path:
              Master_SSL_Cert:
            Master_SSL_Cipher:
               Master_SSL_Key:
        Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error:
               Last_SQL_Errno: 0
               Last_SQL_Error:
  Replicate_Ignore_Server_Ids:
             Master_Server_Id: 0
1 row in set (0.00 sec)

# Start replication
mysql> START SLAVE;
Query OK, 0 rows affected (0.00 sec)

# Check the state
mysql> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 192.168.10.228
                  Master_User: repl
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysql-bin.000012
          Read_Master_Log_Pos: 107
               Relay_Log_File: mysql-relay-bin.000002
                Relay_Log_Pos: 253
        Relay_Master_Log_File: mysql-bin.000012
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB:
          Replicate_Ignore_DB:
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table:
  Replicate_Wild_Ignore_Table:
                   Last_Errno: 0
                   Last_Error:
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 107
              Relay_Log_Space: 409
              Until_Condition: None
               Until_Log_File:
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File:
           Master_SSL_CA_Path:
              Master_SSL_Cert:
            Master_SSL_Cipher:
               Master_SSL_Key:
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error:
               Last_SQL_Errno: 0
               Last_SQL_Error:
  Replicate_Ignore_Server_Ids:
             Master_Server_Id: 10
1 row in set (0.00 sec)
With that, the old master has been recovered as a slave. Yay ヽ(゚∀゚)メ(゚∀゚)メ(゚∀゚)ノ

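So far mha-manager was started in the foreground.
Next, let's run it in the background and take the master down by shutting down the OS.
Basically only the startup command differs; everything else is the same.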

Check the current server layout before running
・mha-manager server
  IP:192.168.10.227
・master server
  IP:192.168.10.228
  VIP:192.168.10.234
・slave1
  IP:192.168.10.229
・slave2
  IP:192.168.10.233
■Starting mha-manager in the background
Run it in the background.
# nohup masterha_manager --conf=/etc/mha.cnf < /dev/null > /tmp/mha/log/mha.log 2>&1 &
[1] 24546
■Checking the state
Check the monitoring state.
# masterha_check_status --conf=/etc/mha.cnf
mha (pid:24546) is running(0:PING_OK), master:192.168.10.228
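
If the status is polled from a script (cron, a monitoring agent and so on), the line printed by masterha_check_status has to be interpreted somehow. The Python sketch below handles only the two line formats that appear in this article, "... is running(0:PING_OK), master:..." and "... is stopped(2:NOT_RUNNING)."; other states may be printed differently, and the function name parse_status is hypothetical.

```python
import re

# Patterns for the two status lines that appear in this article only.
RUNNING = re.compile(
    r"^(?P<app>\S+) \(pid:(?P<pid>\d+)\) is running"
    r"\((?P<code>\d+):(?P<state>\w+)\), master:(?P<master>\S+)$"
)
STOPPED = re.compile(
    r"^(?P<app>\S+) is stopped\((?P<code>\d+):(?P<state>\w+)\)\.$"
)


def parse_status(line):
    """Turn one masterha_check_status line into a dict, or None if unrecognized."""
    line = line.strip()
    match = RUNNING.match(line)
    if match:
        return {
            "app": match.group("app"),
            "running": True,
            "pid": int(match.group("pid")),
            "state": match.group("state"),
            "master": match.group("master"),
        }
    match = STOPPED.match(line)
    if match:
        return {
            "app": match.group("app"),
            "running": False,
            "pid": None,
            "state": match.group("state"),
            "master": None,
        }
    return None


if __name__ == "__main__":
    print(parse_status("mha (pid:24546) is running(0:PING_OK), master:192.168.10.228"))
```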
■Shutting down the master server
Shut down the master server.
# shutdown -h now
■Checking the mha-manager log file
Log file:
Fri Feb 15 10:52:42 2013 - [info] Reading default configuratoins from /etc/masterha_default.cnf..
Fri Feb 15 10:52:42 2013 - [info] Reading application default configurations from /etc/mha.cnf..
Fri Feb 15 10:52:42 2013 - [info] Reading server configurations from /etc/mha.cnf..
Fri Feb 15 10:52:42 2013 - [info] MHA::MasterMonitor version 0.55.
Fri Feb 15 10:52:42 2013 - [info] Dead Servers:
Fri Feb 15 10:52:42 2013 - [info] Alive Servers:
Fri Feb 15 10:52:42 2013 - [info]   192.168.10.228(192.168.10.228:3306)
Fri Feb 15 10:52:42 2013 - [info]   192.168.10.229(192.168.10.229:3306)
Fri Feb 15 10:52:42 2013 - [info]   192.168.10.233(192.168.10.233:3306)
Fri Feb 15 10:52:42 2013 - [info] Alive Slaves:
Fri Feb 15 10:52:42 2013 - [info]   192.168.10.229(192.168.10.229:3306)  Version=5.5.29-log (oldest major version between slaves) log-bin:enabled
Fri Feb 15 10:52:42 2013 - [info]     Replicating from 192.168.10.228(192.168.10.228:3306)
Fri Feb 15 10:52:42 2013 - [info]   192.168.10.233(192.168.10.233:3306)  Version=5.5.29-log (oldest major version between slaves) log-bin:enabled
Fri Feb 15 10:52:42 2013 - [info]     Replicating from 192.168.10.228(192.168.10.228:3306)
Fri Feb 15 10:52:42 2013 - [info] Current Alive Master: 192.168.10.228(192.168.10.228:3306)
Fri Feb 15 10:52:42 2013 - [info] Checking slave configurations..
Fri Feb 15 10:52:42 2013 - [warning]  relay_log_purge=0 is not set on slave 192.168.10.229(192.168.10.229:3306).
Fri Feb 15 10:52:42 2013 - [warning]  relay_log_purge=0 is not set on slave 192.168.10.233(192.168.10.233:3306).
Fri Feb 15 10:52:42 2013 - [info] Checking replication filtering settings..
Fri Feb 15 10:52:42 2013 - [info]  binlog_do_db= , binlog_ignore_db=
Fri Feb 15 10:52:42 2013 - [info]  Replication filtering check ok.
Fri Feb 15 10:52:42 2013 - [info] Starting SSH connection tests..
Fri Feb 15 10:52:46 2013 - [info] All SSH connection tests passed successfully.
Fri Feb 15 10:52:46 2013 - [info] Checking MHA Node version..
Fri Feb 15 10:52:47 2013 - [info]  Version check ok.
Fri Feb 15 10:52:47 2013 - [info] Checking SSH publickey authentication settings on the current master..
Fri Feb 15 10:52:48 2013 - [info] HealthCheck: SSH to 192.168.10.228 is reachable.
Fri Feb 15 10:52:49 2013 - [info] Master MHA Node version is 0.54.
Fri Feb 15 10:52:49 2013 - [info] Checking recovery script configurations on the current master..
Fri Feb 15 10:52:49 2013 - [info]   Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/var/lib/mysql,/var/log/mysql --output_file=/tmp/mha/save_binary_logs_test --manager_version=0.55 --start_file=mysql-bin.000013
Fri Feb 15 10:52:49 2013 - [info]   Connecting to root@192.168.10.228(192.168.10.228)..
  Creating /tmp/mha if not exists..    ok.
  Checking output directory is accessible or not..
   ok.
  Binlog found at /var/lib/mysql, up to mysql-bin.000013
Fri Feb 15 10:52:49 2013 - [info] Master setting check done.
Fri Feb 15 10:52:49 2013 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..
Fri Feb 15 10:52:49 2013 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user='repl' --slave_host=192.168.10.229 --slave_ip=192.168.10.229 --slave_port=3306 --workdir=/tmp/mha --target_version=5.5.29-log --manager_version=0.55 --relay_log_info=/var/lib/mysql/relay-log.info  --relay_dir=/var/lib/mysql/  --slave_pass=xxx
Fri Feb 15 10:52:49 2013 - [info]   Connecting to root@192.168.10.229(192.168.10.229:22)..
  Checking slave recovery environment settings..
    Opening /var/lib/mysql/relay-log.info ... ok.
    Relay log found at /var/lib/mysql, up to mysql-relay-bin.000004
    Temporary relay log file is /var/lib/mysql/mysql-relay-bin.000004
    Testing mysql connection and privileges.. done.
    Testing mysqlbinlog output.. done.
    Cleaning up test file(s).. done.
Fri Feb 15 10:52:50 2013 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user='repl' --slave_host=192.168.10.233 --slave_ip=192.168.10.233 --slave_port=3306 --workdir=/tmp/mha --target_version=5.5.29-log --manager_version=0.55 --relay_log_info=/var/lib/mysql/relay-log.info  --relay_dir=/var/lib/mysql/  --slave_pass=xxx
Fri Feb 15 10:52:50 2013 - [info]   Connecting to root@192.168.10.233(192.168.10.233:22)..
  Checking slave recovery environment settings..
    Opening /var/lib/mysql/relay-log.info ... ok.
    Relay log found at /var/lib/mysql, up to mysql-relay-bin.000004
    Temporary relay log file is /var/lib/mysql/mysql-relay-bin.000004
    Testing mysql connection and privileges.. done.
    Testing mysqlbinlog output.. done.
    Cleaning up test file(s).. done.
Fri Feb 15 10:52:51 2013 - [info] Slaves settings check done.
Fri Feb 15 10:52:51 2013 - [info]
192.168.10.228 (current master)
 +--192.168.10.229
 +--192.168.10.233

Fri Feb 15 10:52:51 2013 - [info] Checking master_ip_failover_script status:
Fri Feb 15 10:52:51 2013 - [info]   /usr/bin/master_ip_failover --virtual_ip=192.168.10.234 --orig_master_vip_eth=eth0:234 --new_master_vip_eth=eth0:234 --command=status --ssh_user=root --orig_master_host=192.168.10.228 --orig_master_ip=192.168.10.228 --orig_master_port=3306
DEBUG PARAMETERS***********
command => status
ssh_user=s => root
orig_master_host => 192.168.10.228
orig_master_ip => 192.168.10.228
orig_master_port => 3306
virtual_ip => 192.168.10.234
orig_master_vip_eth => eth0:234
new_master_vip_eth => eth0:234
Fri Feb 15 10:52:51 2013 - [info]  OK.
Fri Feb 15 10:52:51 2013 - [warning] shutdown_script is not defined.
Fri Feb 15 10:52:51 2013 - [info] Set master ping interval 3 seconds.
Fri Feb 15 10:52:51 2013 - [warning] secondary_check_script is not defined. It is highly recommended setting it to check master reachability from two or more routes.
Fri Feb 15 10:52:51 2013 - [info] Starting ping health check on 192.168.10.228(192.168.10.228:3306)..
Fri Feb 15 10:52:51 2013 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..
Fri Feb 15 10:55:57 2013 - [warning] Got error on MySQL select ping: 2006 (MySQL server has gone away)
Fri Feb 15 10:55:57 2013 - [info] Executing SSH check script: save_binary_logs --command=test --start_pos=4 --binlog_dir=/var/lib/mysql,/var/log/mysql --output_file=/tmp/mha/save_binary_logs_test --manager_version=0.55 --binlog_prefix=mysql-bin
Fri Feb 15 10:55:57 2013 - [warning] HealthCheck: SSH to 192.168.10.228 is NOT reachable.
Fri Feb 15 10:56:00 2013 - [warning] Got error on MySQL connect: 2013 (Lost connection to MySQL server at 'reading initial communication packet', system error: 111)
Fri Feb 15 10:56:00 2013 - [warning] Connection failed 1 time(s)..
Fri Feb 15 10:56:03 2013 - [warning] Got error on MySQL connect: 2013 (Lost connection to MySQL server at 'reading initial communication packet', system error: 111)
Fri Feb 15 10:56:03 2013 - [warning] Connection failed 2 time(s)..
Fri Feb 15 10:56:06 2013 - [warning] Got error on MySQL connect: 2013 (Lost connection to MySQL server at 'reading initial communication packet', system error: 111)
Fri Feb 15 10:56:06 2013 - [warning] Connection failed 3 time(s)..
Fri Feb 15 10:56:06 2013 - [warning] Master is not reachable from health checker!
Fri Feb 15 10:56:06 2013 - [warning] Master 192.168.10.228(192.168.10.228:3306) is not reachable!
Fri Feb 15 10:56:06 2013 - [warning] SSH is NOT reachable.
Fri Feb 15 10:56:06 2013 - [info] Connecting to a master server failed. Reading configuration file /etc/masterha_default.cnf and /etc/mha.cnf again, and trying to connect to all servers to check server status..
Fri Feb 15 10:56:06 2013 - [info] Reading default configuratoins from /etc/masterha_default.cnf..
Fri Feb 15 10:56:06 2013 - [info] Reading application default configurations from /etc/mha.cnf..
Fri Feb 15 10:56:06 2013 - [info] Reading server configurations from /etc/mha.cnf..
Fri Feb 15 10:56:06 2013 - [info] Dead Servers:
Fri Feb 15 10:56:06 2013 - [info]   192.168.10.228(192.168.10.228:3306)
Fri Feb 15 10:56:06 2013 - [info] Alive Servers:
Fri Feb 15 10:56:06 2013 - [info]   192.168.10.229(192.168.10.229:3306)
Fri Feb 15 10:56:06 2013 - [info]   192.168.10.233(192.168.10.233:3306)
Fri Feb 15 10:56:06 2013 - [info] Alive Slaves:
Fri Feb 15 10:56:06 2013 - [info]   192.168.10.229(192.168.10.229:3306)  Version=5.5.29-log (oldest major version between slaves) log-bin:enabled
Fri Feb 15 10:56:06 2013 - [info]     Replicating from 192.168.10.228(192.168.10.228:3306)
Fri Feb 15 10:56:06 2013 - [info]   192.168.10.233(192.168.10.233:3306)  Version=5.5.29-log (oldest major version between slaves) log-bin:enabled
Fri Feb 15 10:56:06 2013 - [info]     Replicating from 192.168.10.228(192.168.10.228:3306)
Fri Feb 15 10:56:06 2013 - [info] Checking slave configurations..
Fri Feb 15 10:56:06 2013 - [warning]  relay_log_purge=0 is not set on slave 192.168.10.229(192.168.10.229:3306).
Fri Feb 15 10:56:06 2013 - [warning]  relay_log_purge=0 is not set on slave 192.168.10.233(192.168.10.233:3306).
Fri Feb 15 10:56:06 2013 - [info] Checking replication filtering settings..
Fri Feb 15 10:56:06 2013 - [info]  Replication filtering check ok.
Fri Feb 15 10:56:06 2013 - [info] Master is down!
Fri Feb 15 10:56:06 2013 - [info] Terminating monitoring script.
Fri Feb 15 10:56:06 2013 - [info] Got exit code 20 (Master dead).
Fri Feb 15 10:56:06 2013 - [info] MHA::MasterFailover version 0.55.
Fri Feb 15 10:56:06 2013 - [info] Starting master failover.
Fri Feb 15 10:56:06 2013 - [info]
Fri Feb 15 10:56:06 2013 - [info] * Phase 1: Configuration Check Phase..
Fri Feb 15 10:56:06 2013 - [info]
Fri Feb 15 10:56:07 2013 - [info] Dead Servers:
Fri Feb 15 10:56:07 2013 - [info]   192.168.10.228(192.168.10.228:3306)
Fri Feb 15 10:56:07 2013 - [info] Checking master reachability via mysql(double check)..
Fri Feb 15 10:56:07 2013 - [info]  ok.
Fri Feb 15 10:56:07 2013 - [info] Alive Servers:
Fri Feb 15 10:56:07 2013 - [info]   192.168.10.229(192.168.10.229:3306)
Fri Feb 15 10:56:07 2013 - [info]   192.168.10.233(192.168.10.233:3306)
Fri Feb 15 10:56:07 2013 - [info] Alive Slaves:
Fri Feb 15 10:56:07 2013 - [info]   192.168.10.229(192.168.10.229:3306)  Version=5.5.29-log (oldest major version between slaves) log-bin:enabled
Fri Feb 15 10:56:07 2013 - [info]     Replicating from 192.168.10.228(192.168.10.228:3306)
Fri Feb 15 10:56:07 2013 - [info]   192.168.10.233(192.168.10.233:3306)  Version=5.5.29-log (oldest major version between slaves) log-bin:enabled
Fri Feb 15 10:56:07 2013 - [info]     Replicating from 192.168.10.228(192.168.10.228:3306)
Fri Feb 15 10:56:07 2013 - [info] ** Phase 1: Configuration Check Phase completed.
Fri Feb 15 10:56:07 2013 - [info]
Fri Feb 15 10:56:07 2013 - [info] * Phase 2: Dead Master Shutdown Phase..
Fri Feb 15 10:56:07 2013 - [info]
Fri Feb 15 10:56:07 2013 - [info] Forcing shutdown so that applications never connect to the current master..
Fri Feb 15 10:56:07 2013 - [info] Executing master IP deactivatation script:
Fri Feb 15 10:56:07 2013 - [info]   /usr/bin/master_ip_failover --virtual_ip=192.168.10.234 --orig_master_vip_eth=eth0:234 --new_master_vip_eth=eth0:234 --orig_master_host=192.168.10.228 --orig_master_ip=192.168.10.228 --orig_master_port=3306 --command=stop
DEBUG PARAMETERS***********
command => stop
orig_master_host => 192.168.10.228
orig_master_ip => 192.168.10.228
orig_master_port => 3306
virtual_ip => 192.168.10.234
orig_master_vip_eth => eth0:234
new_master_vip_eth => eth0:234
ssh: connect to host 192.168.10.228 port 22: Connection refused
Fri Feb 15 10:56:07 2013 - [info]  done.
Fri Feb 15 10:56:07 2013 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master.
Fri Feb 15 10:56:07 2013 - [info] * Phase 2: Dead Master Shutdown Phase completed.
Fri Feb 15 10:56:07 2013 - [info]
Fri Feb 15 10:56:07 2013 - [info] * Phase 3: Master Recovery Phase..
Fri Feb 15 10:56:07 2013 - [info]
Fri Feb 15 10:56:07 2013 - [info] * Phase 3.1: Getting Latest Slaves Phase..
Fri Feb 15 10:56:07 2013 - [info]
Fri Feb 15 10:56:07 2013 - [info] The latest binary log file/position on all slaves is mysql-bin.000013:107
Fri Feb 15 10:56:07 2013 - [info] Latest slaves (Slaves that received relay log files to the latest):
Fri Feb 15 10:56:07 2013 - [info]   192.168.10.229(192.168.10.229:3306)  Version=5.5.29-log (oldest major version between slaves) log-bin:enabled
Fri Feb 15 10:56:07 2013 - [info]     Replicating from 192.168.10.228(192.168.10.228:3306)
Fri Feb 15 10:56:07 2013 - [info]   192.168.10.233(192.168.10.233:3306)  Version=5.5.29-log (oldest major version between slaves) log-bin:enabled
Fri Feb 15 10:56:07 2013 - [info]     Replicating from 192.168.10.228(192.168.10.228:3306)
Fri Feb 15 10:56:07 2013 - [info] The oldest binary log file/position on all slaves is mysql-bin.000013:107
Fri Feb 15 10:56:07 2013 - [info] Oldest slaves:
Fri Feb 15 10:56:07 2013 - [info]   192.168.10.229(192.168.10.229:3306)  Version=5.5.29-log (oldest major version between slaves) log-bin:enabled
Fri Feb 15 10:56:07 2013 - [info]     Replicating from 192.168.10.228(192.168.10.228:3306)
Fri Feb 15 10:56:07 2013 - [info]   192.168.10.233(192.168.10.233:3306)  Version=5.5.29-log (oldest major version between slaves) log-bin:enabled
Fri Feb 15 10:56:07 2013 - [info]     Replicating from 192.168.10.228(192.168.10.228:3306)
Fri Feb 15 10:56:07 2013 - [info]
Fri Feb 15 10:56:07 2013 - [info] * Phase 3.2: Saving Dead Master's Binlog Phase..
Fri Feb 15 10:56:07 2013 - [info]
Fri Feb 15 10:56:07 2013 - [warning] Dead Master is not SSH reachable. Could not save it's binlogs. Transactions that were not sent to the latest slave (Read_Master_Log_Pos to the tail of the dead master's binlog) were lost.
Fri Feb 15 10:56:07 2013 - [info]
Fri Feb 15 10:56:07 2013 - [info] * Phase 3.3: Determining New Master Phase..
Fri Feb 15 10:56:07 2013 - [info]
Fri Feb 15 10:56:07 2013 - [info] Finding the latest slave that has all relay logs for recovering other slaves..
Fri Feb 15 10:56:07 2013 - [info] All slaves received relay logs to the same position. No need to resync each other.
Fri Feb 15 10:56:07 2013 - [info] Searching new master from slaves..
Fri Feb 15 10:56:07 2013 - [info]  Candidate masters from the configuration file:
Fri Feb 15 10:56:07 2013 - [info]  Non-candidate masters:
Fri Feb 15 10:56:07 2013 - [info] New master is 192.168.10.229(192.168.10.229:3306)
Fri Feb 15 10:56:07 2013 - [info] Starting master failover..
Fri Feb 15 10:56:07 2013 - [info]
From:
192.168.10.228 (current master)
 +--192.168.10.229
 +--192.168.10.233

To:
192.168.10.229 (new master)
 +--192.168.10.233
Fri Feb 15 10:56:07 2013 - [info]
Fri Feb 15 10:56:07 2013 - [info] * Phase 3.3: New Master Diff Log Generation Phase..
Fri Feb 15 10:56:07 2013 - [info]
Fri Feb 15 10:56:07 2013 - [info]  This server has all relay logs. No need to generate diff files from the latest slave.
Fri Feb 15 10:56:07 2013 - [info]
Fri Feb 15 10:56:07 2013 - [info] * Phase 3.4: Master Log Apply Phase..
Fri Feb 15 10:56:07 2013 - [info]
Fri Feb 15 10:56:07 2013 - [info] *NOTICE: If any error happens from this phase, manual recovery is needed.
Fri Feb 15 10:56:07 2013 - [info] Starting recovery on 192.168.10.229(192.168.10.229:3306)..
Fri Feb 15 10:56:07 2013 - [info]  This server has all relay logs. Waiting all logs to be applied..
Fri Feb 15 10:56:07 2013 - [info]   done.
Fri Feb 15 10:56:07 2013 - [info]  All relay logs were successfully applied.
Fri Feb 15 10:56:07 2013 - [info] Getting new master's binlog name and position..
Fri Feb 15 10:56:07 2013 - [info]  mysql-bin.000009:379
Fri Feb 15 10:56:07 2013 - [info]  All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='192.168.10.229', MASTER_PORT=3306, MASTER_LOG_FILE='mysql-bin.000009', MASTER_LOG_POS=379, MASTER_USER='repl', MASTER_PASSWORD='xxx';
Fri Feb 15 10:56:07 2013 - [info] Executing master IP activate script:
Fri Feb 15 10:56:07 2013 - [info]   /usr/bin/master_ip_failover --virtual_ip=192.168.10.234 --orig_master_vip_eth=eth0:234 --new_master_vip_eth=eth0:234 --command=start --ssh_user=root --orig_master_host=192.168.10.228 --orig_master_ip=192.168.10.228 --orig_master_port=3306 --new_master_host=192.168.10.229 --new_master_ip=192.168.10.229 --new_master_port=3306 --new_master_user='repl' --new_master_password='repl'
DEBUG PARAMETERS***********
command => start
ssh_user=s => root
orig_master_host => 192.168.10.228
orig_master_ip => 192.168.10.228
orig_master_port => 3306
new_master_host => 192.168.10.229
new_master_ip => 192.168.10.229
new_master_port => 3306
virtual_ip => 192.168.10.234
orig_master_vip_eth => eth0:234
new_master_vip_eth => eth0:234
Set read_only=0 on the new master.
Fri Feb 15 10:56:08 2013 - [info]  OK.
Fri Feb 15 10:56:08 2013 - [info] ** Finished master recovery successfully.
Fri Feb 15 10:56:08 2013 - [info] * Phase 3: Master Recovery Phase completed.
Fri Feb 15 10:56:08 2013 - [info]
Fri Feb 15 10:56:08 2013 - [info] * Phase 4: Slaves Recovery Phase..
Fri Feb 15 10:56:08 2013 - [info]
Fri Feb 15 10:56:08 2013 - [info] * Phase 4.1: Starting Parallel Slave Diff Log Generation Phase..
Fri Feb 15 10:56:08 2013 - [info]
Fri Feb 15 10:56:08 2013 - [info] -- Slave diff file generation on host 192.168.10.233(192.168.10.233:3306) started, pid: 24673. Check tmp log /tmp/mha/192.168.10.233_3306_20130215105606.log if it takes time..
Fri Feb 15 10:56:08 2013 - [info]
Fri Feb 15 10:56:08 2013 - [info] Log messages from 192.168.10.233 ...
Fri Feb 15 10:56:08 2013 - [info]
Fri Feb 15 10:56:08 2013 - [info]  This server has all relay logs. No need to generate diff files from the latest slave.
Fri Feb 15 10:56:08 2013 - [info] End of log messages from 192.168.10.233.
Fri Feb 15 10:56:08 2013 - [info] -- 192.168.10.233(192.168.10.233:3306) has the latest relay log events.
Fri Feb 15 10:56:08 2013 - [info] Generating relay diff files from the latest slave succeeded.
Fri Feb 15 10:56:08 2013 - [info]
Fri Feb 15 10:56:08 2013 - [info] * Phase 4.2: Starting Parallel Slave Log Apply Phase..
Fri Feb 15 10:56:08 2013 - [info]
Fri Feb 15 10:56:08 2013 - [info] -- Slave recovery on host 192.168.10.233(192.168.10.233:3306) started, pid: 24675. Check tmp log /tmp/mha/192.168.10.233_3306_20130215105606.log if it takes time..
Fri Feb 15 10:56:08 2013 - [info]
Fri Feb 15 10:56:08 2013 - [info] Log messages from 192.168.10.233 ...
Fri Feb 15 10:56:08 2013 - [info]
Fri Feb 15 10:56:08 2013 - [info] Starting recovery on 192.168.10.233(192.168.10.233:3306)..
Fri Feb 15 10:56:08 2013 - [info]  This server has all relay logs. Waiting all logs to be applied..
Fri Feb 15 10:56:08 2013 - [info]   done.
Fri Feb 15 10:56:08 2013 - [info]  All relay logs were successfully applied.
Fri Feb 15 10:56:08 2013 - [info]  Resetting slave 192.168.10.233(192.168.10.233:3306) and starting replication from the new master 192.168.10.229(192.168.10.229:3306)..
Fri Feb 15 10:56:08 2013 - [info]  Executed CHANGE MASTER.
Fri Feb 15 10:56:08 2013 - [info]  Slave started.
Fri Feb 15 10:56:08 2013 - [info] End of log messages from 192.168.10.233.
Fri Feb 15 10:56:08 2013 - [info] -- Slave recovery on host 192.168.10.233(192.168.10.233:3306) succeeded.
Fri Feb 15 10:56:08 2013 - [info] All new slave servers recovered successfully.
Fri Feb 15 10:56:08 2013 - [info]
Fri Feb 15 10:56:08 2013 - [info] * Phase 5: New master cleanup phase..
Fri Feb 15 10:56:08 2013 - [info]
Fri Feb 15 10:56:08 2013 - [info] Resetting slave info on the new master..
Fri Feb 15 10:56:08 2013 - [info]  192.168.10.229: Resetting slave info succeeded.
Fri Feb 15 10:56:08 2013 - [info] Master failover to 192.168.10.229(192.168.10.229:3306) completed successfully.
Fri Feb 15 10:56:08 2013 - [info]

----- Failover Report -----

mha: MySQL Master failover 192.168.10.228 to 192.168.10.229 succeeded

Master 192.168.10.228 is down!

Check MHA Manager logs at local-dev1-vm004:/tmp/mha/log/mha.log for details.

Started automated(non-interactive) failover.
Invalidated master IP address on 192.168.10.228.
The latest slave 192.168.10.229(192.168.10.229:3306) has all relay logs for recovery.
Selected 192.168.10.229 as a new master.
192.168.10.229: OK: Applying all logs succeeded.
192.168.10.229: OK: Activated master IP address.
192.168.10.233: This host has the latest relay log events.
Generating relay diff files from the latest slave succeeded.
192.168.10.233: OK: Applying all logs succeeded. Slave started, replicating from 192.168.10.229.
192.168.10.229: Resetting slave info succeeded.
Master failover to 192.168.10.229(192.168.10.229:3306) completed successfully.
Failover complete! (+・`ー'・)
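
When the manager runs in the background, everything above ends up in the log file, so a script can confirm completion by looking for the final "completed successfully" line. This is only a minimal sketch: the function name `failover_new_master` is made up for illustration, and the pattern is based solely on the log lines shown above.

```shell
# Print the new master's address if the given MHA log records a completed failover.
# Returns non-zero (and prints nothing) when no such line is present.
# failover_new_master is a hypothetical helper, not part of MHA.
failover_new_master() {
    log_file=$1
    # Matches lines such as:
    #   ... [info] Master failover to 192.168.10.229(192.168.10.229:3306) completed successfully.
    sed -n 's/.*Master failover to \([^ (]*\)(.*completed successfully\..*/\1/p' "$log_file" \
        | tail -n 1 \
        | grep .
}

# Usage (path taken from this post):
#   failover_new_master /tmp/mha/log/mha.log
```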

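Note: check the state of each server with the same steps used for the post-failover check in the foreground run earlier.

■Check the status on the mha-manager server

The manager that was running in the background has stopped.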
# masterha_check_status --conf=/etc/mha.cnf
mha is stopped(2:NOT_RUNNING).
[1]+  Done                    nohup masterha_manager --conf=/etc/mha.cnf < /dev/null > /tmp/mha/log/mha.log 2>&1

# Run it once more
# masterha_check_status --conf=/etc/mha.cnf
mha is stopped(2:NOT_RUNNING).
If you want monitoring to resume, you have to start the manager again yourself!!! m9(゚∀゚)
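
The "check, and start again only if stopped" step could be scripted roughly as follows. This is a sketch under assumptions: it relies only on the `is stopped(2:NOT_RUNNING)` text shown above, the function names are hypothetical, and the nohup line simply mirrors the background start used in this post (note that `>` overwrites the previous log; use `>>` to keep it).

```shell
# Succeeds (returns 0) when masterha_check_status output, read from stdin,
# reports the manager as stopped, e.g. "mha is stopped(2:NOT_RUNNING)."
# mha_is_stopped is a hypothetical helper name.
mha_is_stopped() {
    grep -q 'is stopped(2:NOT_RUNNING)'
}

# Hypothetical wrapper: start the manager in the background only when it is stopped.
# Default paths are the ones used in this post.
restart_mha_if_stopped() {
    conf=${1:-/etc/mha.cnf}
    log=${2:-/tmp/mha/log/mha.log}
    if masterha_check_status --conf="$conf" 2>&1 | mha_is_stopped; then
        nohup masterha_manager --conf="$conf" < /dev/null > "$log" 2>&1 &
    fi
}
```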

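Before that, though: for 8 hours (the default) after a completed failover,
MHA will not carry out another failover. It seems to use a file in the working directory
as a "failover completed" flag, so either delete the file whose name ends in complete,
or pass ignore_last_failover (or adjust last_failover_minute) as a startup option
to get past this check!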
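
Deleting the flag file can be sketched like this. The work directory /tmp/mha is an assumption inferred from the log paths in this post, the function name is hypothetical, and the glob just follows the "ends in complete" description above rather than a confirmed file name.

```shell
# Remove the failover completion flag file(s) from the manager work directory.
# Assumption: the work directory is /tmp/mha (inferred from the log paths in this post).
# clear_failover_flag is a hypothetical helper name.
clear_failover_flag() {
    workdir=${1:-/tmp/mha}
    for f in "$workdir"/*complete; do
        [ -e "$f" ] || continue   # the glob matched nothing
        echo "removing $f"
        rm -f -- "$f"
    done
}
```

The other route is to leave the file alone and start the manager with the option instead, for example `masterha_manager --conf=/etc/mha.cnf --ignore_last_failover`.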

That's all (`・ω・´)ゞ

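■Errors I ran into along the way
I have lost track of exactly when this one appeared, but
if a server that was promoted from slave to master still has its old slave information
when it takes over as master, mha-manager seems to treat the setup as multi-master
and fails with an error like the following.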
Fri Feb 15 09:11:40 2013 - [info] MHA::MasterMonitor version 0.55.
Fri Feb 15 09:11:40 2013 - [warning] SQL Thread is stopped(no error) on 192.168.10.233(192.168.10.233:3306)
Fri Feb 15 09:11:40 2013 - [info] Multi-master configuration is detected. Current primary(writable) master is 192.168.10.228(192.168.10.228:3306)
Fri Feb 15 09:11:40 2013 - [info] Master configurations are as below:
Master 192.168.10.233(192.168.10.233:3306), replicating from 192.168.10.228(192.168.10.228:3306), read-only
Master 192.168.10.228(192.168.10.228:3306), replicating from 192.168.10.233(192.168.10.233:3306)

Fri Feb 15 09:11:40 2013 - [warning] SQL Thread is stopped(no error) on 192.168.10.233(192.168.10.233:3306)
Fri Feb 15 09:11:40 2013 - [error][/usr/lib/perl5/vendor_perl/MHA/ServerManager.pm, ln677] Slave 192.168.10.229(192.168.10.229:3306) replicates from 192.168.10.233:3306, but real master is 192.168.10.228(192.168.10.228:3306)!
Fri Feb 15 09:11:40 2013 - [error][/usr/lib/perl5/vendor_perl/MHA/MasterMonitor.pm, ln386] Error happend on checking configurations.  at /usr/lib/perl5/vendor_perl/MHA/MasterMonitor.pm line 300
Fri Feb 15 09:11:40 2013 - [error][/usr/lib/perl5/vendor_perl/MHA/MasterMonitor.pm, ln482] Error happened on monitoring servers.
Fri Feb 15 09:11:40 2013 - [info] Got exit code 1 (Not master dead).
Running RESET SLAVE on the server that had become master fixed it ( ´∀`)b
mysql> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State:
                  Master_Host: 192.168.10.228
                  Master_User: repl
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysql-bin.000011
          Read_Master_Log_Pos: 3185
               Relay_Log_File: mysql-relay-bin.000002
                Relay_Log_Pos: 253
        Relay_Master_Log_File: mysql-bin.000011
             Slave_IO_Running: No
            Slave_SQL_Running: No
              Replicate_Do_DB:
          Replicate_Ignore_DB:
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table:
  Replicate_Wild_Ignore_Table:
                   Last_Errno: 0
                   Last_Error:
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 3185
              Relay_Log_Space: 535
              Until_Condition: None
               Until_Log_File:
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File:
           Master_SSL_CA_Path:
              Master_SSL_Cert:
            Master_SSL_Cipher:
               Master_SSL_Key:
        Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error:
               Last_SQL_Errno: 0
               Last_SQL_Error:
  Replicate_Ignore_Server_Ids:
             Master_Server_Id: 0
1 row in set (0.00 sec)

mysql> RESET SLAVE;
Query OK, 0 rows affected (0.02 sec)

mysql> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State:
                  Master_Host: 192.168.10.228
                  Master_User: repl
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File:
          Read_Master_Log_Pos: 4
               Relay_Log_File: mysql-relay-bin.000001
                Relay_Log_Pos: 4
        Relay_Master_Log_File:
             Slave_IO_Running: No
            Slave_SQL_Running: No
              Replicate_Do_DB:
          Replicate_Ignore_DB:
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table:
  Replicate_Wild_Ignore_Table:
                   Last_Errno: 0
                   Last_Error:
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 0
              Relay_Log_Space: 126
              Until_Condition: None
               Until_Log_File:
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File:
           Master_SSL_CA_Path:
              Master_SSL_Cert:
            Master_SSL_Cipher:
               Master_SSL_Key:
        Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error:
               Last_SQL_Errno: 0
               Last_SQL_Error:
  Replicate_Ignore_Server_Ids:
             Master_Server_Id: 0
1 row in set (0.00 sec)
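
To compare the before and after output without reading every row, a small filter can pull out a single field. This is just a sketch: `slave_status_field` is a hypothetical helper, and it assumes the `Name: value` layout of the `\G` output shown above.

```shell
# Print the value of one field from "SHOW SLAVE STATUS\G" output read on stdin.
# slave_status_field is a hypothetical helper name.
slave_status_field() {
    awk -v key="$1" '
        {
            line = $0
            sub(/^[ \t]+/, "", line)              # drop the right-alignment padding
            if (index(line, key ":") == 1) {      # exact field name at line start
                sub(/^[^:]*:[ \t]*/, "", line)    # drop "Name:" and following spaces
                print line
            }
        }'
}

# Usage:
#   mysql -e 'SHOW SLAVE STATUS\G' | slave_status_field Master_Log_File
```

In the two outputs above, Master_Log_File is mysql-bin.000011 before RESET SLAVE and empty afterwards.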
