DataCore SANsymphony offers software-defined storage with transparent mirroring in active/active mode.
The recently released version 10 PSP7 adds support for a witness to avoid split-brain scenarios.
The Problem
If both DataCore hosts (DC1, DC2) lose their mirror (MIR) paths and their LAN connection at the same time, a split-brain scenario occurs.
Both hosts remain functional and have a fully intact set of data on their storage. Both hosts can handle I/O from initiators in their (split) region. Both datastores receive writes that cannot be mirrored to the opposite site. Those changes cannot be merged when the mirror comes back up.
To rebuild the mirror, one site has to be declared master, and its data is copied to the other site in a full recovery. This means that all writes made on the other side during the split are lost.
Witness
This is where the witness comes in. As soon as a DataCore host loses both the SAN mirror and the LAN connection to its mirror partner, it tries to reach the witness. If it cannot reach the witness either, the host refuses all I/O to its datastore.
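The rule described above can be written down as a small decision function. This is only a minimal sketch of the behaviour as this post describes it, not DataCore's implementation. The function name and its parameters are made up for illustration. The post does not say what happens when only one of the two partner links fails, so the sketch assumes the host keeps serving in that case. It also does not model any tie-breaking for the case where both hosts can still reach the witness.

```python
def should_serve_io(mirror_up: bool, lan_up: bool, witness_reachable: bool) -> bool:
    """Decide whether a storage server keeps serving I/O (illustrative model only)."""
    # Assumption: while at least one path to the mirror partner is alive,
    # the partner is not considered lost and the witness is not consulted.
    if mirror_up or lan_up:
        return True
    # Mirror and LAN are both gone: keep serving only if the witness answers.
    return witness_reachable


# Partner lost, witness answers: keep serving.
print(should_serve_io(mirror_up=False, lan_up=False, witness_reachable=True))
# Partner lost, witness silent: refuse all I/O.
print(should_serve_io(mirror_up=False, lan_up=False, witness_reachable=False))
```

The point of the second case is that an isolated host prefers to stop rather than accept writes nobody else can see.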
Scenario 1 – without witness
Below I’ve outlined a scenario with two sites in active/active mode. The hosts (H1-H4) use their preferred local DataCore servers, but can also switch to the remote site in case of a fault condition.
Breaking the fibre connection between site1 and site2 interrupts the inter-switch link (ISL) of the SAN, the mirror link (MIR) and the LAN connection. Both datacenters now continue to work autonomously without data synchronization.
As a result, both sides of the mirror diverge and can’t be resynchronized by a log recovery. The only way to get back in sync is to declare one side as master and discard the changes on the other side. Data loss is the result.
Scenario 2 – with witness
Same scenario, but this time with a witness (W) on site1 close to DataCore Server DC1.
Right after the fibre links between the sites are broken, DC1 is still able to contact the witness. It continues to present LUNs to all initiators. Because the SAN ISL is broken, DC1 can only be reached by its local hosts on site1. DataCore host 2 (DC2) has lost all connections to DC1 and also to the witness. It immediately stops access to its datastores.
Hosts on site2 will face an all-paths-down (APD) condition, but the data on DC2’s storage remains frozen and consistent. Once the fibre links between site1 and site2 are reestablished, both sides of the mirror can be resynced by a log recovery (green delta arrow).
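To make the difference between the two scenarios concrete, here is a toy simulation of the link cut. It is a rough sketch under simplifying assumptions: the mirror and LAN connection between DC1 and DC2 are collapsed into a single link, the names `LINKS`, `INTER_SITE`, `serving` and `simulate` are invented for this example, and the logic only mirrors the behaviour described in this post, not SANsymphony internals.

```python
# Links that exist while everything is healthy (undirected pairs).
LINKS = {
    frozenset({"DC1", "DC2"}),  # mirror + LAN between the sites, collapsed into one link
    frozenset({"DC1", "W"}),    # the witness sits on site1, next to DC1
    frozenset({"DC2", "W"}),    # DC2 reaches the witness across the inter-site fibre
}

# Links that run over the fibre between site1 and site2.
INTER_SITE = {frozenset({"DC1", "DC2"}), frozenset({"DC2", "W"})}


def serving(server, partner, links, witness):
    """Return True if `server` keeps serving I/O given the surviving links."""
    if frozenset({server, partner}) in links:
        return True  # partner still reachable, nothing to decide
    if witness is None:
        return True  # no witness configured: the server carries on (scenario 1)
    return frozenset({server, witness}) in links  # scenario 2: witness decides


def simulate(witness):
    """Cut the inter-site fibre and report which servers keep serving."""
    links = LINKS - INTER_SITE
    return {
        "DC1": serving("DC1", "DC2", links, witness),
        "DC2": serving("DC2", "DC1", links, witness),
    }


print("without witness:", simulate(None))
print("with witness:   ", simulate("W"))
```

Without a witness both servers keep accepting writes and the mirror diverges. With the witness on site1, only DC1 keeps serving, so only one side of the mirror changes and a log recovery is enough afterwards.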
Setup
SANsymphony version 10 PSP7 is a prerequisite for having a witness. After installing or updating to PSP7, you need to do a blind activation of your license keys, i.e. reactivate the cluster without adding new keys. After successful reactivation you’ll see the new witness feature on the license tab.
To define a witness you need PowerShell cmdlets. There’s no GUI yet. Open a PowerShell session on one of the DataCore hosts and connect to the DataCore server. In the example below, the two DataCore servers are named sds1 and sds2. The witness is a physical server (a Veeam Backup proxy on site1).
The first step is essential and wasn’t well documented.
Connect-DcsServer sds1
Now you can define the witness. Choose a name for it. The witness must reply to ping requests.
Add-DcsWitness -Name "witness 1" -Address "172.22.7.110"
Now let’s check if both hosts can reach the witness.
Invoke-DcsWitnessContact
The cmdlet will ask for the given name of the witness (witness 1).
cmdlet Invoke-DcsWitnessContact at command pipeline position 1
Supply values for the following parameters:
Witness: witness 1

ServerId                   WitnessId                  ResponseStatus
--------                   ---------                  --------------
7233CB41-D6A1-4730-B9DB... 79506f86bfa748779b71eb1... Success
1503818E-E8E2-4370-9130... 79506f86bfa748779b71eb1... Success
Both DataCore hosts can reach the witness.
We can check an existing witness with the command below.
PS C:\Program Files\DataCore\SANsymphony> Get-DcsWitness

Alias           : witness 1
IPAddress       : 172.22.7.110
SequenceNumber  : 97879
Id              : 79506f86bfa748779b71eb12c1ad45b6
Caption         : witness 1
ExtendedCaption :
Internal        : False
Use witness as default for all vDisks
Now that the witness is defined, you can add it to each vDisk individually. Usually, though, it is enough to set it as the default for all vDisks.
Set-DcsServerGroupDefaultWitnessProperties -Address "172.22.7.110"

OurGroup                  : True
Alias                     : Server Group
Description               :
State                     : Present
SmtpSettings              : DataCore.Executive.SmtpServerSettings
LicenseSettings           : DataCore.Executive.LicenseSettings
LicenseType               : Regular
ContactData               : DataCore.Executive.ContactData
StorageUsed               : 49.82 TB
BulkStorageUsed           : 0 B
MaxStorage                : 100 TB
RecoverySpeed             : 32
ExistingProductKeys       : {, , , ...}
DataCoreStorageUsed       : 0 B
SupportBundleRelayAddress :
MirrorTrunkMappingEnabled : False
SelfHealingDelay          : 480
DefaultWitness            : f33551cf965e4dc887abe750130db21a
DefaultWitnessOption      : Automatic
WitnessAllowed            : True
SequenceNumber            : 103302
Id                        : 1f929299-c9f9-419c-a7e9-8c7c694102b5
Caption                   : Server Group
ExtendedCaption           : Server Group
Internal                  : False
Conclusion
Customers have long asked for a witness feature to prevent split-brain scenarios. We know witnesses from other high-availability solutions like vSAN or vCenter HA. DataCore has finally taken an important step towards data resiliency. Next on the wish list would be the ability to set up the witness from the DataCore console and better documentation for this new feature.