Run a Test Failover
We recommend running test failovers regularly to check that your disaster recovery protection is configured correctly and that the VMs are correctly replicated at the recovery site.
During a test, real-time replication does not need to be stopped, and the production workload remains protected. There are two types of failover:
- Test failover: a clean shutdown (the final sync with the source completes cleanly) followed by a failover.
- Real failover: a non-clean failover (everything is dropped and the source is not checked). See Run a Real Failover.
We also recommend testing all the VPGs that recover to the same cluster together. For example, a cluster's high availability configuration includes admission control, which prevents VMs from being started if they would violate availability constraints. Testing the failover of every VPG configured to recover to that cluster at the same time shows whether those constraints are violated.
Topics
- The Test Failover Process
- Start a Test Failover
- Monitor the Test Failover Status
- Stop a Test Failover
- View Test Results
The Test Failover Process
The table below describes what happens during a test failover:
Stage | Description |
Start the Test Failover | Single or multiple VPGs can be tested. The test VMs are: |
| Note: By default, test VMs start with the same IP addresses as the protected VMs at the protected site. To avoid IP clashes, configure the VM NIC properties in the VPG so that different IPs are assigned to the test VMs when they start. If you configure the test VMs to use different IPs, the re-IP cannot be performed until each new machine has started: Virtual replication changes the machine IPs and then reboots the machines with their new IPs. |
During the Test | The VMs in the VPG are created as test machines in a sandbox. They are powered on for testing, use the test network specified in the VPG definition, and use the virtual disks managed by the VRA. While a test is running: |
Stop the Failover Test | The test VMs are powered off and removed from the inventory. |
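Before starting a test, it can help to check which test VMs would start with an IP address that is already used in production. The following Python sketch assumes you have collected, for each VM, its production IP and the different IP (if any) configured in the VPG NIC properties; `clashing_vms` and the sample data are hypothetical, and the check only compares addresses. It does not query the portal.

```python
from ipaddress import ip_address
from typing import Optional


def clashing_vms(nic_config: dict[str, tuple[str, Optional[str]]]) -> list[str]:
    """Return the VMs whose test IP would clash with a production IP.

    nic_config maps a VM name to (production_ip, test_ip). A test_ip of None
    means no different IP is configured in the VPG NIC properties, so the test
    VM starts with its production IP (the default behaviour).
    """
    production_ips = {ip_address(prod) for prod, _ in nic_config.values()}
    clashes = []
    for vm, (prod, test) in nic_config.items():
        effective_ip = ip_address(test if test is not None else prod)
        if effective_ip in production_ips:
            clashes.append(vm)
    return sorted(clashes)


if __name__ == "__main__":
    config = {
        "app01": ("10.0.1.10", None),          # default: same IP as production
        "app02": ("10.0.1.11", "10.99.1.11"),  # different IP configured in the VPG
        "app03": ("10.0.1.12", "10.0.1.10"),   # configured IP collides with app01
    }
    print(clashing_vms(config))  # ['app01', 'app03']
```

Comparing against the set of all production IPs, rather than only each VM's own address, also catches a configured test IP that collides with a different production VM.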
Start a Test Failover
Follow these steps to start a test failover:
1. From your target datacentre, open the Silver-lining DR self service portal. At the bottom right-hand side of the screen, set the operation to Test and click Failover.
2. The Select VPGs screen appears. By default, all VPGs are listed. Select the VPGs to test, then click Next. At the bottom of the screen, the selection details show the amount of data and the total number of VMs selected. The Direction arrow shows the direction of the process: from the protected site to the peer recovery site.
3. The Execution Parameters screen appears. By default, the latest checkpoint added to the journal is displayed. Then:
- To use this checkpoint, click Next and go to step 7.
- To use one of the checkpoints from the last 3 days, click the displayed checkpoint.
4. The Operations Checkpoints screen appears. Select the checkpoint you want to use for the test failover and click OK.
5. To locate a specific checkpoint, select from the following options (as shown in the screenshot above).
Filter option | Description |
Latest | The recovery, or clone, is to the latest checkpoint. This ensures that data is crash-consistent for the recovery or clone. If a checkpoint is added between this point and starting the failover or clone, the later checkpoint is not used. |
Latest Tagged Checkpoint | The recovery operation is to the latest manually created checkpoint. Checkpoints added to the VM journals in the VPG by the Zerto Virtual Manager ensure that data is crash-consistent to this point. If a checkpoint is added between this point and starting the operation, the later checkpoint is not used. |
Select from all available checkpoints | By default, this option displays all checkpoints in the system. |
| Refresh the list. |
6. The Execution Parameters screen reappears, showing the selected checkpoint. Click Next.
7. The Failover Test screen appears. The topology shows the number of VPGs and vApps being test-failed over to each recovery site. In the following example, one VPG, containing 2 vApps, will be failed over to the WPD2 site.
8. Click Start Failover Test. The test begins an initialisation period, during which the vApps are created in the target (recovery) site.
Note: Changes made in production continue to be replicated to the target site during the test. You still have the option to perform a live failover, which will include all changes up to that point.
9. The Silver-lining DR self service portal shows 'Testing Failover' in the Operation column of the VPGs being tested, and at the bottom left of the screen.
10. Once the initialisation has completed, the vApps appear in the target site. They are powered on but isolated so that they cannot affect the live workload. You can interact with these vApps and make changes to them, but all changes are lost when the failover test is stopped. Selecting Stop Failover Test deletes the vApps and discards any changes made during the test. See Stop a Test Failover below.
Source protected vApp during a failover test:
vApp at target site during a failover test:
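The checkpoint filter options described in step 5 can be modelled as simple selections over a list of journal checkpoints. This Python sketch is a hypothetical model, not the portal's implementation: the `Checkpoint` type and the helper names are assumptions, a tagged checkpoint is assumed to be one carrying a manual tag, and `selected_at` stands for the moment the option is chosen, so later checkpoints are ignored as the Latest option describes.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass(frozen=True)
class Checkpoint:
    """Hypothetical journal checkpoint: when it was taken and its manual tag."""
    taken_at: datetime
    tag: Optional[str] = None  # assumed to be set only for manually tagged checkpoints


def latest(checkpoints: list[Checkpoint], selected_at: datetime) -> Optional[Checkpoint]:
    """'Latest': newest checkpoint that existed when the option was chosen.

    Checkpoints taken after selected_at are ignored, mirroring the note that a
    checkpoint added after this point is not used.
    """
    eligible = [c for c in checkpoints if c.taken_at <= selected_at]
    return max(eligible, key=lambda c: c.taken_at, default=None)


def latest_tagged(checkpoints: list[Checkpoint], selected_at: datetime) -> Optional[Checkpoint]:
    """'Latest Tagged Checkpoint': newest manually tagged checkpoint."""
    return latest([c for c in checkpoints if c.tag], selected_at)


def last_days(checkpoints: list[Checkpoint], now: datetime, days: int = 3) -> list[Checkpoint]:
    """Checkpoints from the last `days` days, newest first (step 3 offers 3 days)."""
    cutoff = now - timedelta(days=days)
    recent = [c for c in checkpoints if cutoff <= c.taken_at <= now]
    return sorted(recent, key=lambda c: c.taken_at, reverse=True)


if __name__ == "__main__":
    now = datetime(2024, 5, 10, 12, 0)
    journal = [
        Checkpoint(now - timedelta(days=4)),
        Checkpoint(now - timedelta(hours=5), tag="pre-upgrade"),
        Checkpoint(now - timedelta(minutes=1)),
    ]
    print(latest(journal, now).taken_at)    # 2024-05-10 11:59:00
    print(latest_tagged(journal, now).tag)  # pre-upgrade
    print(len(last_days(journal, now)))     # 2
```

The "Select from all available checkpoints" option needs no helper in this model: it simply shows the whole list.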
Monitor the Test Failover Status
Follow these steps to monitor a test failover:
1. In the Silver-lining DR self service portal, click the VPGs tab to monitor the status of a failover test.
2. In the General view, the Operation field displays 'Testing Failover' and the completion percentage when a failover test is being performed.
3. Click the name of a VPG you are testing. A dynamic tab opens, displaying the details of that VPG, including the status of the failover test.
Stop a Test Failover
Follow these steps to stop a test failover:
1. In the Silver-lining DR self service portal, select the VPGs tab, then click the Stop icon in the Operation column.
2. The Stop Test screen appears. Use the table below as a guide to complete the fields, then click Stop.
Field | Description |
Result | Specify whether the test succeeded or failed. |
Notes | Add a description of the test (optional). For example, you can specify where the external files that describe the tests performed are saved. Notes are limited to 255 characters. |
3. After stopping a test, the following events occur:
- The vApps in the recovery site are powered off and removed.
- The checkpoint used for the test is tagged 'Tested at StartDateAndTimeOfTest' to identify the test. You can use this checkpoint as the point in time to which the VMs in the VPG are restored during a failover.
- In vCloud Director, the Testing Recovery vApp will be removed, including the testing VMs inside it. The original VMs remain untouched and unchanged.
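The constraints on the Stop Test screen, and the tag added to the tested checkpoint, can be expressed as a short Python sketch. `StopTestRecord` and `checkpoint_tag` are hypothetical names; the 255-character limit comes from the Notes field description, while the date and time format inside the tag is an assumption, since the exact format of StartDateAndTimeOfTest is not specified here.

```python
from dataclasses import dataclass
from datetime import datetime

MAX_NOTES_LENGTH = 255  # limit stated for the Notes field on the Stop Test screen


@dataclass(frozen=True)
class StopTestRecord:
    """Hypothetical record of the Stop Test screen fields."""
    succeeded: bool  # Result: whether the test succeeded or failed
    notes: str = ""  # Notes: optional description of the test

    def __post_init__(self) -> None:
        # Reject notes that exceed the documented limit.
        if len(self.notes) > MAX_NOTES_LENGTH:
            raise ValueError(f"Notes are limited to {MAX_NOTES_LENGTH} characters")


def checkpoint_tag(test_started_at: datetime) -> str:
    """Tag added to the tested checkpoint.

    The 'Tested at' prefix comes from the documentation; the date and time
    format used here is an assumption.
    """
    return f"Tested at {test_started_at:%Y-%m-%d %H:%M}"


if __name__ == "__main__":
    record = StopTestRecord(succeeded=True, notes="Test plan saved in the DR runbook")
    print(record.succeeded)                              # True
    print(checkpoint_tag(datetime(2024, 5, 10, 9, 30)))  # Tested at 2024-05-10 09:30
```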
View Test Results
The date and time of the last test is displayed in a column in the VPGs and VMs tabs.