Just a little mind-boggling thing that has been troubling me over the last few months, and I finally realised what the issue was. I've been troubleshooting a problem on one of my ESXi clusters, which uses HPE StoreVirtual VSA and a Synology NAS for its datastores.
During this troubleshooting (the trouble was the Synology, a looong sad story) I had to reboot the hosts numerous times, and every time I was forced to shut down the switch ports carrying the iSCSI VMkernel traffic; otherwise the host wouldn't finish booting. It would just sit there, apparently waiting for a timeout. That timeout might have occurred if I had waited long enough, but it felt like I was patiently waiting 10-20 minutes at the time...

Then I suddenly read and understood the message it was giving me. My setup is three StoreVirtual datastores with Network RAID-10, one without Network RAID, and three Synology datastores. Whenever I wanted to reboot, I shut down the VSA appliance on the host and all my VMs were moved to the other host. The catch was that since one datastore had no Network RAID, it goes offline whenever I shut down any one of my VSA appliances.
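For reference, this is roughly how I check which adapter and targets the iSCSI VMkernel ports are serving from the ESXi shell, so I know what goes dark when the switch ports are shut. A minimal sketch using standard esxcli commands; the vmhba number is just an example and will differ on your host:

```shell
# List the software/hardware iSCSI adapters on the host
esxcli iscsi adapter list

# Show which VMkernel ports are bound to the iSCSI adapter
# (vmhba33 is an assumption, substitute your own adapter)
esxcli iscsi networkportal list --adapter vmhba33

# Show the active iSCSI sessions, i.e. which targets are currently connected
esxcli iscsi session list
```

If the sessions to the non-replicated volume are the only ones missing after a VSA shutdown, that narrows the hang down nicely.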
This creates the issue that the ESXi host cannot tell that volume it is going down for a reboot, and for some reason this is bad for the host, even though nothing on the host is using that datastore. VMs with a VMDK on that datastore simply shouldn't read from or write to that VMDK while it's offline, and then there would be no issue.

The problem is that I cannot turn the network ports back on until the host has completely booted, because for some reason the host also MUST connect to this offline volume during start-up; it just sits saying "starting up iSCSI" on the ESXi boot screen. If I keep the ports offline, the host boots quickly, and then I can power on the VSA appliance, turn on the network ports, and do a rescan of the iSCSI adapter, and it connects to the datastores. Sometimes, though, the Summary page shows the error message "All shared datastores has failed".
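The rescan step after re-enabling the ports can be done from the ESXi shell as well as from the client. A hedged sketch using standard esxcli commands; again, the adapter name is an assumption:

```shell
# Rescan every storage adapter on the host for new/returned devices
esxcli storage core adapter rescan --all

# Or rescan just the iSCSI adapter (vmhba33 is an example name)
esxcli storage core adapter rescan --adapter vmhba33

# Then refresh VMFS volumes so the returned datastores mount again
esxcli storage filesystem rescan
```

After that, the datastores listed under `esxcli storage filesystem list` should come back as mounted, and the "All shared datastores has failed" warning usually clears once vSphere HA re-checks the heartbeats.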
To me this seems like a design problem, and I suspect a lot of people could run into the same issue and have the same bad experience if they use volumes without Network RAID-10.