vSAN Cluster Live-Migration to new vCenter instance

What can be done if the production vCenter Server appliance is damaged and you need to migrate a vSAN cluster to a new vCenter appliance?

In this post, I will show how to migrate a running vSAN cluster from one vCenter instance to a new vCenter under full load.

Anyone who works with vSAN will have a sinking feeling in their guts thinking about this. Why would one do such a thing? Wouldn’t it be better to put the cluster into maintenance mode? – In theory, yes. In practice, however, we repeatedly encounter constraints that do not allow a maintenance window in the near future.

Normally, vCenter Server appliances are solid and low-maintenance units. Either they work, or they are completely broken. In the latter case, a new appliance can be deployed and the configuration restored from backup. Neither applied to a recent project: VCSA 6.7 was still partially working, but key vSAN functionality was no longer operational in the UI. An initial idea to fix the problem with an upgrade to vCenter 7, and thus to a new appliance, proved unsuccessful. Cross-vCenter migration of VMs (XVM) to a new vSAN cluster was not possible either, firstly because this feature only became available with version 7.0 Update 1c, and secondly because only two new replacement hosts were available – too few for a new vSAN cluster. To make things worse, the source cluster was also at its capacity limit.

There was only one possible way out: stabilize the cluster and transfer it to a new vCenter under full load.

There is an old, but still valuable post by William Lam on this topic. With this, and the VMware KB 2151610 article, I was able to work out a strategy that I would like to briefly outline here.

The process actually works because, once set up and configured, a vSAN cluster can operate autonomously from the vCenter. The vCenter is only needed for purposes of monitoring and configuration changes.

Caution. This method poses risks and requires thorough preparation. Anyone who is in doubt here should definitely contact VMware support.

Phase 1 – Getting the cluster into proper condition

This phase could also be referred to as “pulling the skeletons out of the closet”. The prerequisite is a stable and compliant vSAN cluster. Even if the vCenter instance is faulty or damaged, all nodes must communicate without any problems. All object synchronizations in the background must be completed.

In the presented case, no reliable statement could be made from the vSphere Client, so the diagnostics had to be carried out on the CLI.

esxcli vsan network list

The command above lists all kernel adapters that are responsible for vSAN traffic.
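For scripting later checks, the adapter name can be extracted from that output. This is a minimal sketch: the sample output below is illustrative only, and on a real host you would pipe `esxcli vsan network list` into the same filter.

```shell
# Illustrative, trimmed sample of `esxcli vsan network list` output (assumed values).
sample='Interface
   VmkNic Name: vmk1
   IP Protocol: IP
   Traffic Type: vsan'

# Extract the name of the first vSAN kernel adapter.
vsan_vmk=$(printf '%s\n' "$sample" | awk -F': ' '/VmkNic Name/ {print $2; exit}')
echo "$vsan_vmk"   # prints: vmk1
```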

To check the vSAN network connectivity, we use the command listed below. Specify the vSAN kernel port (here vmk1) and the vSAN kernel port IP of the destination host (shown here as x.x.x.x).

esxcli network diag ping -I vmk1 -H x.x.x.x

As an alternative, you can also use the simpler vmkping command.

vmkping -I vmk1 x.x.x.x

It is essential to test the communication with all neighboring hosts in the cluster.
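A small helper can loop over all neighbors so that no host is skipped. This is only a sketch: the function takes the ping command and the IP list as arguments, and both values shown are assumptions – on an ESXi host you would pass `vmkping -I vmk1 -c 3` and the vSAN kernel port IPs of your own neighbors.

```shell
# Check vSAN connectivity to a list of neighbor IPs and report the failures.
# $1 is the ping command to use (e.g. 'vmkping -I vmk1 -c 3' on ESXi);
# the remaining arguments are the neighbor IPs.
check_vsan_neighbors() {
    ping_cmd=$1; shift
    failed=""
    for ip in "$@"; do
        $ping_cmd "$ip" >/dev/null 2>&1 || failed="$failed $ip"
    done
    # Unquoted on purpose: collapses the leading space.
    echo $failed
}

# On an ESXi host (hypothetical IPs) this would be:
# check_vsan_neighbors 'vmkping -I vmk1 -c 3' 10.0.0.11 10.0.0.12 10.0.0.13
```

An empty result means all neighbors answered; any IPs printed need investigation before continuing.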

Final check

esxcli vsan cluster get

It is important to note the number of cluster members listed here.
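To compare this value across hosts, the count can be parsed out of the output. A minimal sketch – the sample output is illustrative (assumed values); on a real host you would pipe `esxcli vsan cluster get` into the same filter.

```shell
# Illustrative, trimmed sample of `esxcli vsan cluster get` output (assumed values).
sample='Cluster Information
   Enabled: true
   Sub-Cluster Member Count: 6
   Sub-Cluster Member UUIDs: uuid-1, uuid-2'

# Extract the member count; it must match on every node of the cluster.
member_count=$(printf '%s\n' "$sample" | awk -F': ' '/Sub-Cluster Member Count/ {print $2}')
echo "$member_count"   # prints: 6
```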

esxcli vsan health cluster list

The cluster, which was originally oversubscribed, was brought to a stabilized state (green) by provisioning two more hosts. It is now in a healthy state and the migration of the hosts can begin.

Phase 2 – Preparation of the target vCenter instance

The target vCenter must be at the same build level as the source vCenter, or higher.

  • register all licenses for vCenter, vSAN and ESXi in target vCenter
  • connect to Active-Directory if that connection existed in the source vCenter
  • create a datacenter object on target site
  • create a cluster object on target site and enable vSAN
  • enable HA and DRS (manual or semi-automatic mode) [optional]
  • configure deduplication and compression according to source vCenter settings

If encryption was enabled on the source side, the target vCenter must also be connected to the KMS and a trust must be established with the identical cluster ID.

Sometimes it is not obvious from the GUI which cluster features are enabled. In the case above, the parameters were not identifiable from the vSphere Client, so once again we need to query the CLI. The code below is a one-line command.

esxcli vsan storage list | grep -i 'Device\|Is SSD\|Is Capacity Tier\|Deduplication\|Compression\|In CMMDS\|Encryption:' | sed 'N;N;N;N;N;N;s/\n//g' | sort -k9

The command above displays a table on the CLI showing, per device, whether cluster properties such as deduplication, compression, and encryption are active or inactive.
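To see what the somewhat cryptic sed stage does: each device produces seven matching grep lines, the six `N` commands pull the next six lines into sed's pattern space, and `s/\n//g` joins them into a single row. The sketch below replays that on illustrative sample data (assumed values) instead of live esxcli output.

```shell
# Seven property lines for one device, as grep would emit them (sample data).
sample='   Device: naa.111
   Is SSD: true
   Is Capacity Tier: false
   Deduplication: true
   Compression: true
   In CMMDS: true
   Encryption: false'

# Join every group of seven lines into one row, exactly like the one-liner does.
joined=$(printf '%s\n' "$sample" | sed 'N;N;N;N;N;N;s/\n//g')
echo "$joined"
```

With several devices, each one ends up on its own row, which is what makes the final `sort -k9` useful.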

Phase 3 – Migrate storage policies and vDS-settings

Export storage policies

Check policies in use

There can be a large number of storage policies in a vSAN cluster. However, only the ones that are actually in use are of significance.

esxcli vsan debug object list | grep spbmProfileId | sort | uniq

Only policies that have been applied to an object will be returned.
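Appending `uniq -c` additionally shows how many objects use each policy, which helps decide which exports matter most. A sketch on illustrative sample lines (the profile IDs are assumed) – on a live host, feed the real `esxcli vsan debug object list` output through the same pipe.

```shell
# Illustrative spbmProfileId lines, as grep would return them (assumed IDs).
sample='   spbmProfileId: aa000000-0000-0000-0000-000000000001
   spbmProfileId: aa000000-0000-0000-0000-000000000002
   spbmProfileId: aa000000-0000-0000-0000-000000000001
   spbmProfileId: aa000000-0000-0000-0000-000000000001'

# Count objects per policy, most-used policy first.
counts=$(printf '%s\n' "$sample" | sort | uniq -c | sort -rn)
echo "$counts"
```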

You can export all policies with a PowerCLI command.

Get-SpbmStoragePolicy | Export-SpbmStoragePolicy -FilePath C:\temp\

This command exports all policies as XML files; each file name matches the policy name, for example SP-ErasureCoding-R5.xml.

Importing the storage-policies

Connect a PowerCLI session to the target vCenter instance and import policy (xml) files.

Import-SpbmStoragePolicy -Name "MyPolicy" -FilePath C:\temp\MyPolicy.xml

After transferring the storage policies, the distributed vSwitches must also be exported and imported to the target.

Export of vDS configuration

We now export the vDistributed Switch (vDS) settings from the source cluster.

Networking > select vDS > Settings > Export Configuration

The configuration of the switch will be downloaded to the client as a ZIP archive.

Import of vDS configuration

In the target vCenter, select Datacenter > Distributed Switch > Import Distributed Switch. We’ll import the previously exported ZIP file. In the import dialog, do not select the option “Preserve original distributed switch port group identifiers”.

Phase 4 – Migrate Hosts

The transfer of the host to the new vCenter is carried out one at a time. To keep the vSAN cluster intact in the process, we need to put it in protected mode. ClusterMemberListUpdates from the vCenter are ignored from now on. Meaning, no host is going to leave the cluster and no host is going to be added. This is a crucial point, because we will remove hosts from the source vCenter one by one. Under normal conditions, this will result in member list updates by the vCenter to the remaining hosts and would split our cluster. Therefore, we instruct the hosts to ignore these member list updates coming from the vCenter.

In order to do this, either execute the command shown below on each host, or set it on all hosts in the cluster via PowerCLI.

esxcfg-advcfg -s 1 /VSAN/IgnoreClusterMemberListUpdates

We can use PowerCLI as an alternative. You need to adjust the name of the vSAN-Cluster (here: vSAN-Cluster).

Foreach ($vmhost in (Get-Cluster -Name vSAN-Cluster | Get-VMHost)) { $vmhost | Get-AdvancedSetting -Name "VSAN.IgnoreClusterMemberListUpdates" | Set-AdvancedSetting -Value 1 -Confirm:$false }

If executed successfully, the PowerCLI command returns one row of results per host.

Before we start the host migration, DRS in the source cluster should be set to either semi-automatic, or manual.

A final check on the cluster’s health is always a good idea, and a complete backup must also be in place. After all, we are migrating a vSAN cluster under full sail. Even with the most careful planning, the following holds true:

“The force of shit happens will be with you – always.”

Obi vSAN Kenobi

Migrate Host

The following procedure is executed for each host, one after the other. The sequence is marked with Sequence Start and Sequence End.

Sequence Start

We will disconnect the first host from the source vCenter, acknowledge the warning and after the task is complete, remove the host from inventory.

In the target vCenter, we select the datacenter object (not the vSAN cluster!) and add the host.

  • enter FQDN of the host
  • enter root password
  • accept certificate warning
  • check host details
  • Assign license. The host usually comes with its original license. This can be reassigned.
  • configure lockdown mode (disable)
  • choose datacenter as VM-target
  • read summary
  • finish

After the action is complete, the host is located outside the new vSAN cluster. We now drag it into the vSAN cluster object by using the mouse. This intermediate step is necessary because a direct import into the vSAN cluster would trigger a maintenance mode on the host. This must not happen since we are actively running VMs on the host. However, the move action doesn’t trigger a maintenance mode.

At this point, it is recommended to check the vSAN network connectivity to all hosts residing in the old vCenter.

vmkping -I vmk1 x.x.x.x

Adjust the vSAN kernel port (source) and the target IP addresses of the other cluster members.

Add host to imported vDS

Our vSAN network communication remains functional even though the vDS in the new vCenter is still empty. This is because a distributed vSwitch creates “hidden” standard vSwitches on each host. These move with the host and remain active. In order to be able to manage and monitor the vDS properties of the host in the future, we add it to the imported vDS.

  • Network > select vDS
  • Add and manage hosts
  • select migrated host
  • define uplinks (same as before)
  • assign kernel ports to port groups (vSAN, vMotion, Provisioning, etc.)
  • assign VM-networks (if applicable)
  • check summary
  • finish

Sequence End

Repeat this procedure with each host until they are all in the new vSAN cluster.

Once the transfer of all hosts is complete, we can exit the protected mode that we enabled at the beginning. This can be done either individually on each host, or centrally via PowerCLI.

esxcfg-advcfg -s 0 /VSAN/IgnoreClusterMemberListUpdates

The alternative PowerCLI command is practically the same as the one at the beginning. The only difference is that the value is set to 0 (= disabled). Adjust the vSAN cluster name accordingly.

Foreach ($vmhost in (Get-Cluster -Name vSAN-Cluster | Get-VMHost)) { $vmhost | Get-AdvancedSetting -Name "VSAN.IgnoreClusterMemberListUpdates" | Set-AdvancedSetting -Value 0 -Confirm:$false }

After completing the action, two vDS with the same name can be found on all hosts. One holds the hosts and port groups; the other is a leftover from the transfer and can be deleted.

Conclusion

All hosts were migrated to the new vCenter. During this process, all VMs were available without interruption.

It’s important to have a viable backup plan B (and perhaps a plan C) in addition to your plan A. Changing conditions or hidden constraints may require a change of strategy. If you are forced to abandon your preferred approach, an alternative that has been worked out in advance provides additional safety.
