Matthew, good morning.
Is the iSCSI Target configured with an ACL?
Do all Gateways have the same number of active sessions? It could be that
the sessions on one Gateway (specifically Gateway 3) have crashed.
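One way to answer that on each oVirt Host is to count the sessions per Gateway portal in the output of `iscsiadm -m session`. With a single Target carrying all four LUNs, I'd expect one session per Gateway portal on every Host. The Python below is a minimal sketch of that check. The portal addresses and target IQN are hypothetical placeholders, and the regex assumes the usual `tcp: [sid] ip:port,tpgt iqn` line layout.

```python
import re
from collections import Counter

# One line of `iscsiadm -m session` output looks like (hypothetical values):
#   tcp: [1] 192.168.1.11:3260,1 iqn.2003-01.com.redhat.iscsi-gw:ceph-igw (non-flash)
# Groups: session id, portal (ip:port), target IQN.
SESSION_RE = re.compile(r"^\S+:\s+\[(\d+)\]\s+([^\s,]+),\d+\s+(\S+)")


def sessions_per_portal(output):
    """Count active sessions per Gateway portal in `iscsiadm -m session` output."""
    counts = Counter()
    for line in output.splitlines():
        match = SESSION_RE.match(line.strip())
        if match:
            counts[match.group(2)] += 1
    return counts


def portals_without_sessions(counts, expected_portals):
    """Return the expected Gateway portals that have no session on this Host."""
    return sorted(p for p in expected_portals if counts.get(p, 0) == 0)


if __name__ == "__main__":
    # Hypothetical output from a Host that only reached Gateways 1 and 2.
    sample = (
        "tcp: [1] 192.168.1.11:3260,1 iqn.2003-01.com.redhat.iscsi-gw:ceph-igw (non-flash)\n"
        "tcp: [2] 192.168.1.12:3260,2 iqn.2003-01.com.redhat.iscsi-gw:ceph-igw (non-flash)\n"
    )
    gateways = ["192.168.1.11:3260", "192.168.1.12:3260", "192.168.1.13:3260"]
    counts = sessions_per_portal(sample)
    print(dict(counts))
    print(portals_without_sessions(counts, gateways))  # ['192.168.1.13:3260']
```

Running the same check on all three Hosts and comparing the counts side by side should show quickly whether Gateway 3 is the odd one out.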
If you are not actually using any iSCSI Storage Domain, I recommend the
following:
1- Log out through oVirt
2- Check whether multipath still shows any of the initiator's paths on each
oVirt Host
3- Log out of all sessions and delete the node records with iscsiadm on each
oVirt Host
4- Check whether there is still an active session in Ceph
5- Restart all Gateway daemons in Ceph; this may take a while if there is a
stuck session
6- Perform Discovery again through oVirt
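Steps 2 and 3 can be scripted per Host. This is a minimal sketch, assuming step 1 has already been done through oVirt; the `RECORDS` list is hypothetical and should be filled in from `iscsiadm -m node` on the Host being cleaned up. By default it only prints the commands (a dry run), so nothing is changed until `dry_run=False` is passed.

```python
import shlex
import subprocess

# Hypothetical (target IQN, portal) pairs; take the real ones from
# `iscsiadm -m node` on the Host you are cleaning up.
RECORDS = [
    ("iqn.2003-01.com.redhat.iscsi-gw:ceph-igw", "192.168.1.11:3260"),
    ("iqn.2003-01.com.redhat.iscsi-gw:ceph-igw", "192.168.1.12:3260"),
]


def cleanup_plan(records):
    """Build argv lists for steps 2 and 3: check multipath, log out of and
    delete every node record, then list whatever sessions remain."""
    plan = [["multipath", "-ll"]]  # step 2: any maps still using the LUNs?
    for target, portal in records:
        base = ["iscsiadm", "-m", "node", "-T", target, "-p", portal]
        plan.append(base + ["--logout"])      # step 3a: end the session
        plan.append(base + ["-o", "delete"])  # step 3b: drop the node record
    plan.append(["iscsiadm", "-m", "session"])  # should report no sessions left
    return plan


def run(plan, dry_run=True):
    """Print each command and, unless dry_run is set, execute it."""
    for argv in plan:
        print(" ".join(shlex.quote(arg) for arg in argv))
        if not dry_run:
            # check=False: iscsiadm can exit non-zero when there is nothing
            # left to act on; it prints its own error and we carry on.
            subprocess.run(argv, check=False)


if __name__ == "__main__":
    run(cleanup_plan(RECORDS))
```

Steps 4 and 5 happen on the Ceph side and are not covered by this sketch.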
On Tue, 29 Nov 2022 at 03:28, Matthew J Black <matthew(a)peregrineit.net>
wrote:
Hi All,
I've got some issues connecting my oVirt Cluster to my Ceph Cluster
via iSCSI. There are two issues, and I don't know whether one is causing the
other, whether they are related at all, or whether they are two separate,
unrelated issues. Let me explain.
The Situation
-------------
- I have a working three node Ceph Cluster (Ceph Quincy on Rocky Linux 8.6)
- The Ceph Cluster has four Storage Pools of between 4 and 8 TB each
- The Ceph Cluster has three iSCSI Gateways
- There is a single iSCSI Target on the Ceph Cluster
- The iSCSI Target has all three iSCSI Gateways attached
- The iSCSI Target has all four Storage Pools attached
- The four Storage Pools have been assigned LUNs 0-3
- I have set up (Discovery) CHAP Authorisation on the iSCSI Target
- I have a working three node self-hosted oVirt Cluster (oVirt v4.5.3 on
Rocky Linux 8.6)
- The oVirt Cluster has (in addition to the hosted_storage Storage Domain)
three GlusterFS Storage Domains
- I can ping all three Ceph Cluster Nodes to/from all three oVirt Hosts
- The iSCSI Target on the Ceph Cluster has all three oVirt Hosts'
Initiators attached
- Each Initiator has all four Ceph Storage Pools attached
- I have set up CHAP Authorisation on the iSCSI Target's Initiators
- The Ceph Cluster Admin Portal reports that all three Initiators are
"logged_in"
- I have previously connected Ceph iSCSI LUNs to the oVirt Cluster
successfully (as an experiment), but had to remove and re-instate them for
the "final" version(?).
- The oVirt Admin Portal (ie HostedEngine) reports that Initiators 1 &
2 (ie oVirt Hosts 1 & 2) are "logged_in" to all three iSCSI Gateways
- The oVirt Admin Portal reports that Initiator 3 (ie oVirt Host 3) is
"logged_in" to iSCSI Gateways 1 & 2
- I can "force" Initiator 3 to become "logged_in" to iSCSI Gateway 3, but
when I do this it is *not* persistent
- oVirt Hosts 1 & 2 can/have discovered all three iSCSI Gateways
- oVirt Hosts 1 & 2 can/have discovered all four LUNs/Targets on all three
iSCSI Gateways
- oVirt Host 3 can only discover 2 of the iSCSI Gateways
- For Target/LUN 0 oVirt Host 3 can only "see" the LUN provided by iSCSI
Gateway 1
- For Targets/LUNs 1-3 oVirt Host 3 can only "see" the LUNs provided by
iSCSI Gateways 1 & 2
- oVirt Host 3 can *not* "see" any of the Targets/LUNs provided by iSCSI
Gateway 3
- When I create a new oVirt Storage Domain for any of the four LUNs:
  - I am presented with a message saying "The following LUNs are already
    in use..."
  - I am asked to "Approve operation" via a checkbox, which I do
  - As I watch the oVirt Admin Portal I can see the new iSCSI Storage
    Domain appear in the Storage Domain list, and then after a few minutes
    it is removed
  - After those few minutes I am presented with this failure message:
    "Error while executing action New SAN Storage Domain: Network error
    during communication with the Host."
- I have looked in the engine.log and all I could find that was relevant
(as far as I know) was this:
~~~
2022-11-28 19:59:20,506+11 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.CreateStorageDomainVDSCommand] (default task-1) [77b0c12d] Command 'CreateStorageDomainVDSCommand(HostName = ovirt_node_1.mynet.local, CreateStorageDomainVDSCommandParameters:{hostId='967301de-be9f-472a-8e66-03c24f01fa71', storageDomain='StorageDomainStatic:{name='data', id='2a14e4bd-c273-40a0-9791-6d683d145558'}', args='s0OGKR-80PH-KVPX-Fi1q-M3e4-Jsh7-gv337P'})' execution failed: VDSGenericException: VDSNetworkException: Message timeout which can be caused by communication issues
2022-11-28 19:59:20,507+11 ERROR [org.ovirt.engine.core.bll.storage.domain.AddSANStorageDomainCommand] (default task-1) [77b0c12d] Command 'org.ovirt.engine.core.bll.storage.domain.AddSANStorageDomainCommand' failed: EngineException: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSNetworkException: VDSGenericException: VDSNetworkException: Message timeout which can be caused by communication issues (Failed with error VDS_NETWORK_ERROR and code 5022)
~~~
I cannot see/detect any "communication issue", but then again I'm not
100% sure what I should be looking for.
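To pull entries like these out of a large engine.log (on the HostedEngine VM it normally lives at /var/log/ovirt-engine/engine.log), a small filter on the entry layout helps. This is a minimal sketch, assuming each entry is a single line laid out as in the excerpt above. The correlation ID it reports (here `77b0c12d`) is what I would then search for in the VDSM log of the Host that ran the command; the path /var/log/vdsm/vdsm.log is an assumption about a default install.

```python
import re

# Layout of one engine.log entry (each entry is a single line in the file):
#   TIMESTAMP LEVEL [LOGGER] (THREAD) [CORRELATION_ID] MESSAGE
ENTRY_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}[+-]\d{2}(?::?\d{2})?)\s+"
    r"(\w+)\s+\[([^\]]+)\]\s+\(([^)]*)\)\s+\[([^\]]*)\]\s+(.*)$"
)


def network_errors(lines, needle="VDSNetworkException"):
    """Yield (timestamp, logger, correlation_id, message) for every ERROR
    entry whose message contains `needle`."""
    for line in lines:
        match = ENTRY_RE.match(line.rstrip("\n"))
        if not match:
            continue  # stack-trace and other continuation lines are skipped
        timestamp, level, logger, _thread, corr_id, message = match.groups()
        if level == "ERROR" and needle in message:
            yield timestamp, logger, corr_id, message


def correlation_ids(lines, needle="VDSNetworkException"):
    """Distinct correlation IDs of the matching errors, in first-seen order."""
    seen = []
    for _ts, _logger, corr_id, _msg in network_errors(lines, needle):
        if corr_id not in seen:
            seen.append(corr_id)
    return seen
```

Typical use would be `with open("/var/log/ovirt-engine/engine.log") as log: print(correlation_ids(log))`; for the two entries quoted above that should give `['77b0c12d']`.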
I have looked online for an answer, and apart from not being able to get
past Red Hat's "wall" to see the solutions that they have, all I could find
that was relevant was this:
https://lists.ovirt.org/archives/list/devel@ovirt.org/thread/AVLORQNOLJHR...
If this *is* relevant, then there is not enough context in that thread for
me to proceed (eg *where* (on which host/VM) should that command be run?).
I also found (for a previous version of oVirt) notes about modifying the
Postgres DB manually to resolve a similar issue. While I am more than
comfortable doing this (I've been an SQL DBA for well over 20 years), this
seems like asking for trouble, at least until I hear back from the oVirt
Devs that this is OK to do. And of course, I'll need the relevant commands
/ locations / authorisations to get into the DB.
Questions
---------
- Are the two issues (oVirt Host 3 not having a full picture of the Ceph
iSCSI environment and the oVirt iSCSI Storage Domain creation failure)
related?
- Do I need to "refresh" the iSCSI info on the oVirt Hosts, and if so, how
do I do this?
- Do I need to "flush" the old LUNs from the oVirt Cluster, and if so, how
do I do this?
- Where else should I be looking for info in the logs (& which logs)?
- Does *anyone* have any other ideas on how to resolve the situation,
especially when using the Ceph iSCSI Gateways?
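For the first two questions, a quick way to confirm whether Host 3 really has a partial view is to run `iscsiadm -m discovery -t sendtargets -p GATEWAY_IP` on every Host and compare the portals returned against the three Gateways. Below is a minimal sketch of that comparison. The Host names and addresses are hypothetical, it assumes the discovery output has already been collected, and it does not handle the Discovery CHAP credentials the discovery run itself needs.

```python
def discovered_portals(output):
    """Portals (ip:port) listed in `iscsiadm -m discovery -t sendtargets` output.

    Each result line looks like (hypothetical values):
        192.168.1.11:3260,1 iqn.2003-01.com.redhat.iscsi-gw:ceph-igw
    """
    portals = set()
    for line in output.splitlines():
        fields = line.split()
        # Result lines are "ip:port,tpgt iqn"; error/info lines have no comma
        # in their first field and are ignored.
        if len(fields) >= 2 and "," in fields[0]:
            portals.add(fields[0].rsplit(",", 1)[0])  # drop the ",tpgt" suffix
    return portals


def missing_by_host(discovery_by_host, expected_portals):
    """Map each Host to the expected Gateway portals its discovery did not return."""
    report = {}
    for host, output in discovery_by_host.items():
        missing = sorted(set(expected_portals) - discovered_portals(output))
        if missing:
            report[host] = missing
    return report


if __name__ == "__main__":
    gateways = ["192.168.1.11:3260", "192.168.1.12:3260", "192.168.1.13:3260"]
    iqn = "iqn.2003-01.com.redhat.iscsi-gw:ceph-igw"  # hypothetical target name
    full = "\n".join(f"{gw},{i} {iqn}" for i, gw in enumerate(gateways, start=1))
    partial = "\n".join(full.splitlines()[:2])  # only Gateways 1 and 2 answered
    print(missing_by_host({"ovirt-host-1": full, "ovirt-host-3": partial}, gateways))
    # {'ovirt-host-3': ['192.168.1.13:3260']}
```

If Host 3 still misses Gateway 3 at this level, the problem sits below oVirt (network path, ACL or CHAP on that Gateway) rather than in the Storage Domain workflow.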
Thanks in advance
Cheers
Dulux-Oz
_______________________________________________
Users mailing list -- users(a)ovirt.org
List Archives:
https://lists.ovirt.org/archives/list/users@ovirt.org/message/MCOJD4R6PS4...