I have a Self-hosted Engine (SE) running on iSCSI, as well as a couple of Storage Domains
using iSCSI. Both the SE and those Storage Domains use the same two target portals.
I can see the iSCSI sessions and multipath working well from the hosts' point of view.
Yesterday, after restarting “ovirt-engine”, all the hosts except the one that runs the SE
and the SPM went into “Unassigned” status, with an error stating that the ovirt-engine
can’t communicate with the hosts.
Network-wise, everything looks good: I can reach all the ports and the network is
configured correctly, so I ruled this out.
Looking at the VDSM logs on those “Unassigned” hosts, it looks like VDSM can’t find the
Storage Pool:
(vmrecovery) [vdsm.api] START getConnectedStoragePoolsList(options=None) from=internal, task_id=217ec32b-591c-4376-8dc0-8d62200557ee (api:48)
(vmrecovery) [vdsm.api] FINISH getConnectedStoragePoolsList return={'poollist': []} from=internal, task_id=217ec32b-591c-4376-8dc0-8d62200557ee (api:54)
(vmrecovery) [vds] recovery: waiting for storage pool to go up (clientIF:723)
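To compare hosts quickly, the check I am doing by eye on these log lines could be sketched in Python roughly as follows. This is only a minimal sketch: the function names are made up for illustration, the pattern is based solely on the FINISH line format shown above, and it assumes the log text has already been read from wherever VDSM writes it on the host (usually /var/log/vdsm/vdsm.log, but check your setup).

```python
import re

# Pattern derived from the FINISH line quoted above (an assumption about the
# format, not an official VDSM log specification). \s* tolerates a wrapped line.
POOLLIST_RE = re.compile(
    r"FINISH getConnectedStoragePoolsList return=\{'poollist':\s*\[(.*?)\]\}",
    re.DOTALL,
)


def connected_pool_counts(log_text):
    """Return the number of pools reported by each FINISH entry found."""
    counts = []
    for match in POOLLIST_RE.finditer(log_text):
        body = match.group(1).strip()
        # An empty body means 'poollist': [] (no connected storage pool).
        items = [item for item in body.split(",") if item.strip()]
        counts.append(len(items))
    return counts


def host_sees_no_pool(log_text):
    """True if at least one entry exists and every entry reports zero pools."""
    counts = connected_pool_counts(log_text)
    return bool(counts) and all(count == 0 for count in counts)


if __name__ == "__main__":
    sample = (
        "(vmrecovery) [vdsm.api] FINISH getConnectedStoragePoolsList "
        "return={'poollist': []} from=internal, "
        "task_id=217ec32b-591c-4376-8dc0-8d62200557ee (api:54)"
    )
    print(connected_pool_counts(sample))
    print(host_sees_no_pool(sample))
```

Running it against the log of a working host and of an “Unassigned” host should show whether the empty pool list is specific to the affected hosts.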
If I look at the VDSM logs on the host where the SE and SPM are running, there are no
issues there and the node appears up (green) in the ovirt-engine UI.
I managed to put the hosted-engine into maintenance, shut it down, and then start it again
on another host. When it starts on that host, the host goes “green”, and if the SPM stays
on the previous host, I have two hosts working while the rest remain “Unassigned”.
All the “ovirt-ha-agent”/“ovirt-ha-broker” services seem OK. I restarted them, and I also
tried restarting VDSM on the hosts, with no luck.
The VMs are still running. I did shut down one host (I even used “SSH restart” from
the WebUI) to see if that would help; it came back up and still went into “Unassigned”.
It seems like the hosts can’t see the Storage pool.
Where should I start to troubleshoot this?
Thanks