After a failed firmware update on my Intel x722 NICs one host came up without its 10 Gbit kernelports (vSAN Network). Every effort of recovery failed and I had to send in my “bricked” host to Supermicro. Normally this shouldn’t be a big issue in a 4-node cluster. But the fact that management interfaces were up and vSAN interfaces were not must have caused some “disturbance” on the cluster and all my VM objects were marked as “invalid” on the 3 remaining hosts.
I was busy on projects so I didn’t have much lab-time anyway, so I waited for the repair of the 4th host. Last week it finally arrived and I instantly assembled boot media, cache and capacity disks. I checked MAC addresses and settings on the repaired host and everything looked good. But after booting the reunited cluster still all objects were marked invalid.
Time for troubleshooting
First I opened SSH shells to each host. There’s a quick powerCLI one-liner to enable SSH throughout the cluster. Too bad I didn’t have a functional vCenter at that time, so I had to activate SSH on each host with the host client.
From the shell of the repaired host I’ve checked the vSAN-Network connection to all other vSAN kernel ports . The command below pings from interface vmk1 (vSAN) to IP 10.0.100.11 (vSAN kernel port of esx01 for example)
vmkping -I vmk1 10.0.100.11
I received ping responses from all hosts on all vSAN kernel ports. So I could conclude there’s no connection issue in the vSAN-network.
Continue reading “vSAN Objects invalid”