Matthew, good morning.

Is iSCSI Target configured with ACL?
Do all Gateways have the same amount of active sessions? It could be that some Gateway has crashed the sessions (specifically gateway 3).

If you are not actually using Storage Domain on iSCSI, I recommend the following:
1- Logout through oVirt
2- Check if there is still an initiator in the multipath on each oVirt Host
3- Log out of all sessions and delete through iscsiadm on each oVirt Host
4- Check if there is still an active session in CEPH
5- Restart all Gateway Daemons in CEPH, it may take a while if there is a stuck session
6- Try to perform Discovery again through oVirt

Em ter., 29 de nov. de 2022 às 03:28, Matthew J Black <matthew@peregrineit.net> escreveu:
Hi All,

I've got some issues with connecting my oVirt Cluster to my Ceph Cluster via iSCSI. There are two issues, and I don't know if one is causing the other, if they are related at all, or if they are two separate, unrelated issues. Let me explain.

The Situation
-------------
- I have a working three node Ceph Cluster (Ceph Quincy on Rocky Linux 8.6)
- The Ceph Cluster has four Storage Pools of between 4 and 8 TB each
- The Ceph Cluster has three iSCSI Gateways
- There is a single iSCSI Target on the Ceph Cluster
- The iSCSI Target has all three iSCSI Gateways attached
- The iSCSI Target has all four Storage Pools attached
- The four Storage Pools have been assigned LUNs 0-3
- I have set up (Discovery) CHAP Authorisation on the iSCSI Target
- I have a working three node self-hosted oVirt Cluster (oVirt v4.5.3 on Rocky Linux 8.6)
- The oVirt Cluster has (in addition to the hosted_storage Storage Domain) three GlusterFS Storage Domains
- I can ping all three Ceph Cluster Nodes to/from all three oVirt Hosts
- The iSCSI Target on the Ceph Cluster has all three oVirt Hosts Initiators attached
- Each Initiator has all four Ceph Storage Pools attached
- I have set up CHAP Authorisation on the iSCSI Target's Initiators
- The Ceph Cluster Admin Portal reports that all three Initiators are "logged_in"
- I have previous connected Ceph iSCSI LUNs to the oVirt Cluster successfully (as an experiment), but had to remove and re-instate them for the "final" version(?).
- The oVirt Admin Portal (ie HostedEngine) reports that Initiators are 1 & 2 (ie oVirt Hosts 1 & 2) are "logged_in" to all three iSCSI Gateways
- The oVirt Admin Portal reports that Initiator 3 (ie oVirt Host 3) is "logged_in" to iSCSI Gateways 1 & 2
- I can "force" Initiator 3 to become "logged_in" to iSCSI Gateway 3, but when I do this it is *not* persistent
- oVirt Hosts 1 & 2 can/have discovered all three iSCSI Gateways
- oVirt Hosts 1 & 2 can/have discovered all four LUNs/Targets on all three iSCSI Gateways
- oVirt Host 3 can only discover 2 of the iSCSI Gateways
- For Target/LUN 0 oVirt Host 3 can only "see" the LUN provided by iSCSI Gateway 1
- For Targets/LUNs 1-3 oVirt Host 3 can only "see" the LUNs provided by iSCSI Gateways 1 & 2
- oVirt Host 3 can *not* "see" any of the Targets/LUNs provided by iSCSI Gateway 3
- When I create a new oVirt Storage Domain for any of the four LUNs:
  - I am presented with a message saying "The following LUNs are already in use..."
  - I am asked to "Approve operation" via a checkbox, which I do
  - As I watch the oVirt Admin Portal I can see the new iSCSI Storage Domain appear in the Storage Domain list, and then after a few minutes it is removed
  - After those few minutes I am presented with this failure message: "Error while executing action New SAN Storage Domain: Network error during communication with the Host."
- I have looked in the engine.log and all I could find that was relevant (as far as I know) was this:
~~~
2022-11-28 19:59:20,506+11 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.CreateStorageDomainVDSCommand] (default task-1) [77b0c12d] Command 'CreateStorageDomainVDSCommand(HostName = ovirt_node_1.mynet.local, CreateStorageDomainVDSCommandParameters:{hostId='967301de-be9f-472a-8e66-03c24f01fa71', storageDomain='StorageDomainStatic:{name='data', id='2a14e4bd-c273-40a0-9791-6d683d145558'}', args='s0OGKR-80PH-KVPX-Fi1q-M3e4-Jsh7-gv337P'})' execution failed: VDSGenericException: VDSNetworkException: Message timeout which can be caused by communication issues

2022-11-28 19:59:20,507+11 ERROR [org.ovirt.engine.core.bll.storage.domain.AddSANStorageDomainCommand] (default task-1) [77b0c12d] Command 'org.ovirt.engine.core.bll.storage.domain.AddSANStorageDomainCommand' failed: EngineException: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSNetworkException: VDSGenericException: VDSNetworkException: Message timeout which can be caused by communication issues (Failed with error VDS_NETWORK_ERROR and code 5022)
~~~

I cannot see/detect any "communication issue" - but then again I'm not 100% sure what I should be looking for

I have looked on-line for an answer, and apart from not being able to get past Red Hat's "wall" to see the solutions that they have, all I could find that was relevant was this: https://lists.ovirt.org/archives/list/devel@ovirt.org/thread/AVLORQNOLJHRWMHTM4WCDRVP7VSIZBGR/ . If this *is* relevant then there is not enough context here for me to proceed (ie/eg *where* (which host/vm) should that command be run?).

I also found (for a previous version of oVirt) notes about modifying the Postgres DB manual to resolve a similar issue. While I am more than comfortable doing this (I've been an SQL DBA for well over 20 years) this seems like asking for trouble - at least until I hear back from the oVirt Devs that this is OK to do - and of course, I'll need the relevant commands / locations / authorisations to get into the DB.

Questions
---------
- Are the two issues (oVirt Host 3 not having a full picture of the Ceph iSCSI environment and the oVirt iSCSI Storage Domain creation failure) related?
- Do I need to "refresh" the iSCSI info on the oVirt Hosts, and if so, how do I do this?
- Do I need to "flush" the old LUNs from the oVirt Cluster, and if so, how do I do this?
- Where else should I be looking for info in the logs (& which logs)?
- Does *anyone* have any other ideas how to resolve the situation - especially when using the Ceph iSCII Gateways?

Thanks in advance

Cheers

Dulux-Oz
_______________________________________________
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-leave@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/MCOJD4R6PS4BKUTUM3BXYWSX5RDPWR2N/