failed to mount hosted engine gluster storage - how to debug?

Hello, I have an issue that is probably related to my particular implementation, but I think some checks are missing. Here is the story. I have a cluster of two nodes on 4.4.10.3 with an upgraded kernel, because the CPU (Ryzen 5) suffers from an incompatibility issue with the kernel provided by the 4.4.10.x series. On each node there are three glusterfs "partitions" in replica mode: one for the hosted_engine, the other two for user usage. The third node was an old i3 workstation used only to provide the arbiter partition to the glusterfs cluster. I installed a new server (Ryzen processor) with 4.5.0, successfully installed glusterfs 10.1, and added the arbiter bricks on glusterfs 10.1 (while the replica bricks are on 8.6) after removing the bricks provided by the old i3. I successfully imported the new node into the oVirt engine (after updating the engine to 4.5). The problem is that ovirt-ha-broker doesn't start, complaining that it is not possible to connect to the storage (I suppose the hosted_engine storage), so I did some digging that I'm going to show here:

#### 1. The node seems to be correctly configured

[root@ovirt-node3 devices]# vdsm-tool validate-config
SUCCESS: ssl configured to true. No conflicts
[root@ovirt-node3 devices]# vdsm-tool configure
Checking configuration status...
libvirt is already configured for vdsm
SUCCESS: ssl configured to true. No conflicts
sanlock is configured for vdsm
Managed volume database is already configured
lvm is configured for vdsm
Current revision of multipath.conf detected, preserving
Running configure...
Done configuring modules to VDSM.
[root@ovirt-node3 devices]# vdsm-tool validate-config
SUCCESS: ssl configured to true. No conflicts

#### 2. The node refuses to mount via hosted-engine (same error in broker.log)

[root@ovirt-node3 devices]# hosted-engine --connect-storage
Traceback (most recent call last):
  File "/usr/lib64/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib64/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_setup/connect_storage_server.py", line 30, in <module>
    timeout=ohostedcons.Const.STORAGE_SERVER_TIMEOUT,
  File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/client/client.py", line 312, in connect_storage_server
    sserver.connect_storage_server(timeout=timeout)
  File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/lib/storage_server.py", line 451, in connect_storage_server
    'Connection to storage server failed'
RuntimeError: Connection to storage server failed

#### 3. Manual mount of the glusterfs volume works correctly

[root@ovirt-node3 devices]# grep storage /etc/ovirt-hosted-engine/hosted-engine.conf
storage=ovirt-node2.ovirt:/gveng
# The following are used only for iSCSI storage
[root@ovirt-node3 devices]#
[root@ovirt-node3 devices]# mount -t glusterfs ovirt-node2.ovirt:/gveng /mnt/tmp/
[root@ovirt-node3 devices]# ls -l /mnt/tmp
total 0
drwxr-xr-x. 6 vdsm kvm 64 Dec 15 19:04 7b8f1cc9-e3de-401f-b97f-8c281ca30482

What else should I check? Thank you, and sorry for the long message.
Diego

Hi, I think it may be a problem with vdsm and gluster 10; I've reported a similar issue in another thread. Vdsm throws an exception when parsing the XML from the gluster volume info when using the latest gluster version 10. This is particularly bad when the gluster server upgrade has been completed by bumping the op-version, as that is basically irreversible and it's not even possible to easily downgrade gluster. In fact, if you downgrade to, or stay on, gluster 8 it works. I'm digging into the vdsm code to see if I can find the root cause. Cheers, Alessandro
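As a rough illustration of that failure mode (a minimal sketch, not vdsm's actual code; element names are taken from the gluster CLI XML shown later in this thread): gluster 10 no longer emits <stripeCount> in the volume info XML, so an unguarded `find('stripeCount').text` lookup hits None and raises, which then surfaces in vdsm as a GlusterXmlErrorException.

```python
# Minimal sketch of the parsing failure with xml.etree.ElementTree.
# The sample XML mimics a gluster 10 volume element, which has no <stripeCount>.
import xml.etree.ElementTree as ET

GLUSTER10_VOLUME_XML = (
    "<volume>"
    "  <name>gveng</name>"
    "  <brickCount>3</brickCount>"
    "  <distCount>1</distCount>"
    "  <replicaCount>3</replicaCount>"
    "</volume>"
)

el = ET.fromstring(GLUSTER10_VOLUME_XML)
value = {}
value['brickCount'] = el.find('brickCount').text        # element exists, works
try:
    value['stripeCount'] = el.find('stripeCount').text  # find() returns None here
except AttributeError as err:
    print("parse failed:", err)   # 'NoneType' object has no attribute 'text'
```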

I did see your report, in fact; they suggested to downgrade jdbc. For completeness, I also found an error report in vdsm.log while issuing "hosted-engine --connect-storage", corresponding to what you are noticing. I report the log excerpt here in case it is useful. By the way, why is vdsm looking up the hosted-engine storage domain UUID as an LVM volume group name?

2022-04-25 10:53:35,506+0200 INFO (Reactor thread) [ProtocolDetector.AcceptorImpl] Accepted connection from ::1:47350 (protocoldetector:61)
2022-04-25 10:53:35,510+0200 INFO (Reactor thread) [ProtocolDetector.Detector] Detected protocol stomp from ::1:47350 (protocoldetector:125)
2022-04-25 10:53:35,510+0200 INFO (Reactor thread) [Broker.StompAdapter] Processing CONNECT request (stompserver:95)
2022-04-25 10:53:35,512+0200 INFO (JsonRpc (StompReactor)) [Broker.StompAdapter] Subscribe command received (stompserver:124)
2022-04-25 10:53:35,518+0200 INFO (jsonrpc/3) [vdsm.api] START getStorageDomainInfo(sdUUID='7b8f1cc9-e3de-401f-b97f-8c281ca30482') from=::1,47350, task_id=1803abb2-9e9a-4292-8349-678c793f7264 (api:48)
2022-04-25 10:53:35,518+0200 INFO (jsonrpc/3) [storage.storagedomaincache] Refreshing storage domain cache (resize=True) (sdc:80)
2022-04-25 10:53:35,518+0200 INFO (jsonrpc/3) [storage.iscsi] Scanning iSCSI devices (iscsi:462)
2022-04-25 10:53:35,532+0200 INFO (jsonrpc/3) [storage.iscsi] Scanning iSCSI devices: 0.01 seconds (utils:390)
2022-04-25 10:53:35,532+0200 INFO (jsonrpc/3) [storage.hba] Scanning FC devices (hba:59)
2022-04-25 10:53:35,565+0200 INFO (jsonrpc/3) [storage.hba] Scanning FC devices: 0.03 seconds (utils:390)
2022-04-25 10:53:35,565+0200 INFO (jsonrpc/3) [storage.multipath] Waiting until multipathd is ready (multipath:112)
2022-04-25 10:53:37,556+0200 INFO (periodic/3) [vdsm.api] START repoStats(domains=()) from=internal, task_id=f4266860-9162-417e-85a5-087f9cb5cd51 (api:48)
2022-04-25 10:53:37,556+0200 INFO (periodic/3) [vdsm.api] FINISH repoStats return={} from=internal, task_id=f4266860-9162-417e-85a5-087f9cb5cd51 (api:54)
2022-04-25 10:53:37,558+0200 WARN (periodic/3) [root] Failed to retrieve Hosted Engine HA info, is Hosted Engine setup finished?
(api:168) 2022-04-25 10:53:37,584+0200 INFO (jsonrpc/3) [storage.multipath] Waited 2.02 seconds for multipathd (tries=2, ready=2) (multipath:139) 2022-04-25 10:53:37,584+0200 INFO (jsonrpc/3) [storage.multipath] Resizing multipath devices (multipath:220) 2022-04-25 10:53:37,586+0200 INFO (jsonrpc/3) [storage.multipath] Resizing multipath devices: 0.00 seconds (utils:390) 2022-04-25 10:53:37,586+0200 INFO (jsonrpc/3) [storage.storagedomaincache] Refreshing storage domain cache: 2.07 seconds (utils:390) 2022-04-25 10:53:37,586+0200 INFO (jsonrpc/3) [storage.storagedomaincache] Looking up domain 7b8f1cc9-e3de-401f-b97f-8c281ca30482 (sdc:171) 2022-04-25 10:53:37,643+0200 WARN (jsonrpc/3) [storage.lvm] All 1 tries have failed: LVM command failed: 'cmd=[\'/sbin/lvm\', \'vgs\', \'--devices\', \'/dev/mapper/Samsung_SSD_870_EVO_4TB_S6BCNG0R300064E,/dev/mapper/Samsung_SSD_870_EVO_4TB_S6BCNG0R300066N,/dev/mapper/Samsung_SSD_870_EVO_4TB_S6BCNG0R300067L,/dev/mapper/Samsung_SSD_870_EVO_4TB_S6BCNG0R300230B\', \'--config\', \'devices { preferred_names=["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 hints="none" obtain_device_list_from_udev=0 } global { prioritise_write_locks=1 wait_for_locks=1 use_lvmpolld=1 } backup { retain_min=50 retain_days=0 }\', \'--noheadings\', \'--units\', \'b\', \'--nosuffix\', \'--separator\', \'|\', \'--ignoreskippedcluster\', \'-o\', \'uuid,name,attr,size,free,extent_size,extent_count,free_count,tags,vg_mda_size,vg_mda_free,lv_count,pv_count,pv_name\', \'7b8f1cc9-e3de-401f-b97f-8c281ca30482\'] rc=5 out=[] err=[\' Volume group "7b8f1cc9-e3de-401f -b97f-8c281ca30482" not found\', \' Cannot process volume group 7b8f1cc9-e3de-401f-b97f-8c281ca30482\']' (lvm:482) 2022-04-25 10:53:37,643+0200 INFO (jsonrpc/3) [storage.storagedomaincache] Looking up domain 7b8f1cc9-e3de-401f-b97f-8c281ca30482: 0.06 seconds (utils:390) 2022-04-25 10:53:37,643+0200 INFO (jsonrpc/3) [vdsm.api] FINISH getStorageDomainInfo error=Storage domain does not exist: ('7b8f1cc9-e3de-401f-b97f-8c281ca30482',) from=::1,47350, task_id=1803abb2-9e9a-4292-8349-678c793f7264 (api:52) 2022-04-25 10:53:37,643+0200 ERROR (jsonrpc/3) [storage.taskmanager.task] (Task='1803abb2-9e9a-4292-8349-678c793f7264') Unexpected error (task:877) Traceback (most recent call last): File "/usr/lib/python3.6/site-packages/vdsm/storage/task.py", line 884, in _run return fn(*args, **kargs) File "</usr/lib/python3.6/site-packages/decorator.py:decorator-gen-131>", line 2, in getStorageDomainInfo File "/usr/lib/python3.6/site-packages/vdsm/common/api.py", line 50, in method ret = func(*args, **kwargs) File "/usr/lib/python3.6/site-packages/vdsm/storage/hsm.py", line 2463, in getStorageDomainInfo dom = self.validateSdUUID(sdUUID) File "/usr/lib/python3.6/site-packages/vdsm/storage/hsm.py", line 152, in validateSdUUID sdDom = sdCache.produce(sdUUID=sdUUID) File "/usr/lib/python3.6/site-packages/vdsm/storage/sdc.py", line 115, in produce domain.getRealDomain() File "/usr/lib/python3.6/site-packages/vdsm/storage/sdc.py", line 51, in getRealDomain return self._cache._realProduce(self._sdUUID) File "/usr/lib/python3.6/site-packages/vdsm/storage/sdc.py", line 139, in _realProduce domain = self._findDomain(sdUUID) File "/usr/lib/python3.6/site-packages/vdsm/storage/sdc.py", line 156, in _findDomain return findMethod(sdUUID) File "/usr/lib/python3.6/site-packages/vdsm/storage/sdc.py", line 186, in _findUnfetchedDomain raise se.StorageDomainDoesNotExist(sdUUID) vdsm.storage.exception.StorageDomainDoesNotExist: 
Storage domain does not exist: ('7b8f1cc9-e3de-401f-b97f-8c281ca30482',) 2022-04-25 10:53:37,643+0200 INFO (jsonrpc/3) [storage.taskmanager.task] (Task='1803abb2-9e9a-4292-8349-678c793f7264') aborting: Task is aborted: "value=Storage domain does not exist: ('7b8f1cc9-e3de-401f-b97f-8c281ca30482',) abortedcode=358" (task:1182) 2022-04-25 10:53:37,643+0200 ERROR (jsonrpc/3) [storage.dispatcher] FINISH getStorageDomainInfo error=Storage domain does not exist: ('7b8f1cc9-e3de-401f-b97f-8c281ca30482',) (dispatcher:83) 2022-04-25 10:53:37,643+0200 INFO (jsonrpc/3) [jsonrpc.JsonRpcServer] RPC call StorageDomain.getInfo failed (error 358) in 2.13 seconds (__init__:312) 2022-04-25 10:53:37,692+0200 INFO (jsonrpc/4) [vdsm.api] START connectStorageServer(domType=7, spUUID='00000000-0000-0000-0000-000000000000', conList=[{'connection': 'ovirt-node2.ovirt:/gveng', 'user': 'kvm', 'id': 'e29cf818-5ee5-46e1-85c1-8aeefa33e95d', 'vfs_type': 'glusterfs'}]) from=::1,47350, task_id=51b9a69f-a90b-4867-86ec-19f9a4ebbc6f (api:48) 2022-04-25 10:53:37,746+0200 ERROR (jsonrpc/4) [storage.storageServer] Could not connect to storage server (storageServer:92) Traceback (most recent call last): File "/usr/lib/python3.6/site-packages/vdsm/storage/storageServer.py", line 90, in connect_all con.connect() File "/usr/lib/python3.6/site-packages/vdsm/storage/storageServer.py", line 233, in connect self.validate() File "/usr/lib/python3.6/site-packages/vdsm/storage/storageServer.py", line 365, in validate if not self.volinfo: File "/usr/lib/python3.6/site-packages/vdsm/storage/storageServer.py", line 352, in volinfo self._volinfo = self._get_gluster_volinfo() File "/usr/lib/python3.6/site-packages/vdsm/storage/storageServer.py", line 405, in _get_gluster_volinfo self._volfileserver) File "/usr/lib/python3.6/site-packages/vdsm/common/supervdsm.py", line 56, in __call__ return callMethod() File "/usr/lib/python3.6/site-packages/vdsm/common/supervdsm.py", line 54, in <lambda> **kwargs) File "<string>", line 2, in glusterVolumeInfo File "/usr/lib64/python3.6/multiprocessing/managers.py", line 772, in _callmethod raise convert_to_error(kind, result) vdsm.gluster.exception.GlusterXmlErrorException: XML error: rc=0 out=() err=[b'<cliOutput>\n <opRet>0</opRet>\n <opErrno>0</opErrno>\n <opErrstr />\n <volInfo>\n <volumes>\n <volume>\n <name>gveng</name>\n <id>aa080d92-215f-4b90-8fd4-2b60cff9f40e</id>\n <status>1</status>\n <statusStr>Started</statusStr>\n <snapshotCount>0</snapshotCount>\n <brickCount>3</brickCount>\n <distCount>1</distCount>\n <replicaCount>3</replicaCount>\n <arbiterCount>1</arbiterCount>\n <disperseCount>0</disperseCount>\n <redundancyCount>0</redundancyCount>\n <type>2</type>\n <typeStr>Replicate</typeStr>\n <transport>0</transport>\n <bricks>\n <brick uuid="e2b460d1-a0c6-4735-b82c-c5befdf31691">ovirt-node2.ovirt:/brick/glhosteng/gveng<name>ovirt-node2.ovirt:/brick/glhosteng/gveng</name><hostUuid>e2b460d1-a0c6-4735-b82c-c5befdf31691</hostUuid><isArbiter>0</isArbiter></brick>\n <b rick uuid="bff83488-7c84-4389-af47-27e3acdabd90">ovirt-node1.ovirt:/brick/glhosteng/gveng<name>ovirt-node1.ovirt:/brick/glhosteng/gveng</name><hostUuid>bff83488-7c84-4389-af47-27e3acdabd90</hostUuid><isArbiter>0</isArbiter></brick>\n <brick uuid="70823438-3804-4504-a148-d86f3ecc5f24">ovirt-node3.ovirt:/brickarbiter/gveng<name>ovirt-node3.ovirt:/brickarbiter/gveng</name><hostUuid>70823438-3804-4504-a148-d86f3ecc5f24</hostUuid><isArbiter>1</isArbiter></brick>\n </bricks>\n <optCount>31</optCount>\n <options>\n <option>\n 
<name>cluster.self-heal-daemon</name>\n <value>enable</value>\n </option>\n <option>\n <name>performance.client-io-threads</name>\n <value>on</value>\n </option>\n <option>\n <name>nfs.disable</name>\n <value>on</value>\n </option>\n <option>\n <name>transport.address-family</name>\n <value
inet</value>\n </option>\n <option>\n <name>storage.fips-mode-rchecksum</name>\n <value>on</value>\n </option>\n <option>\n <name>performance.quick-read</name>\n <value>off</value>\n </option>\n <option>\n <name>performance.read-ahead</name>\n <value>off</value>\n </option>\n <option>\n <name>performance.io-cache</name>\n <value>off</value>\n </option>\n <option>\n <name>performance.low-prio-threads</name>\n <value>32</value>\n </option>\n <option>\n <name>network.remote-dio</name>\n <value>disable</value>\n </option>\n <option>\n <name>performance.strict-o-direct</name>\n <value>on</value>\n </option>\n <option>\n <name>cluster.eager-lock</name>\n <value>enable</valu e>\n </option>\n <option>\n <name>cluster.quorum-type</name>\n <value>auto</value>\n </option>\n <option>\n <name>cluster.server-quorum-type</name>\n <value>server</value>\n </option>\n <option>\n <name>cluster.data-self-heal-algorithm</name>\n <value>full</value>\n </option>\n <option>\n <name>cluster.locking-scheme</name>\n <value>granular</value>\n </option>\n <option>\n <name>cluster.shd-max-threads</name>\n <value>8</value>\n </option>\n <option>\n <name>cluster.shd-wait-qlength</name>\n <value>10000</value>\n </option>\n <option>\n <name>features.shard</name>\n <value>on</value>\n </option>\n <option>\n <name>user.cifs</name>\n <value>off</value>\n </opt ion>\n <option>\n <name>cluster.choose-local</name>\n <value>off</value>\n </option>\n <option>\n <name>client.event-threads</name>\n <value>4</value>\n </option>\n <option>\n <name>server.event-threads</name>\n <value>4</value>\n </option>\n <option>\n <name>network.ping-timeout</name>\n <value>60</value>\n </option>\n <option>\n <name>server.tcp-user-timeout</name>\n <value>20</value>\n </option>\n <option>\n <name>server.keepalive-time</name>\n <value>10</value>\n </option>\n <option>\n <name>server.keepalive-interval</name>\n <value>2</value>\n </option>\n <option>\n <name>server.keepalive-count</name>\n <value>5</value>\n </option>\n <option>\n <name>cluster.lookup-optimize</name>\n <value>off</value>\n </option>\n <option>\n <name>storage.owner-uid</name>\n <value>36</value>\n </option>\n <option>\n <name>storage.owner-gid</name>\n <value>36</value>\n </option>\n </options>\n </volume>\n <count>1</count>\n </volumes>\n </volInfo>\n</cliOutput>'] 2022-04-25 10:53:37,746+0200 INFO (jsonrpc/4) [storage.storagedomaincache] Invalidating storage domain cache (sdc:74) 2022-04-25 10:53:37,746+0200 INFO (jsonrpc/4) [vdsm.api] FINISH connectStorageServer return={'statuslist': [{'id': 'e29cf818-5ee5-46e1-85c1-8aeefa33e95d', 'status': 4106}]} from=::1,47350, task_id=51b9a69f-a90b-4867-86ec-19f9a4ebbc6f (api:54)
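For what it's worth, the <cliOutput> dump in the traceback above contains no <stripeCount> element, which matches the parsing theory. A quick way to confirm that directly on a node is to inspect the CLI XML; a small sketch (assuming the gluster CLI is in PATH and using the volume name gveng from this thread):

```python
# Check whether the gluster CLI XML still carries a <stripeCount> element.
# Sketch only: assumes the gluster CLI is installed and the volume 'gveng' exists.
import subprocess
import xml.etree.ElementTree as ET

proc = subprocess.run(
    ["gluster", "volume", "info", "gveng", "--xml"],
    stdout=subprocess.PIPE, check=True,
)
root = ET.fromstring(proc.stdout)
for vol in root.iter("volume"):
    stripe = vol.find("stripeCount")
    print(vol.findtext("name"), "-> stripeCount present:", stripe is not None)
```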

Hi, please try this workaround: replace the following line in /usr/lib/python3.6/site-packages/vdsm/gluster/cli.py

value['stripeCount'] = el.find('stripeCount').text

with:

if (el.find('stripeCount')): value['stripeCount'] = el.find('stripeCount').text

Then restart vdsmd and supervdsmd and retry. It worked for me, and it looks like a serious bug for people upgrading to glusterfs 10. Cheers, Alessandro
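A side note on the guard, in case anyone adapts it (my reading of ElementTree semantics, not part of the message above): `el.find('stripeCount')` returns None when the element is missing, which is why the truthiness check avoids the crash on gluster 10; but an ElementTree Element with no children is also falsy, so `is not None` is the more explicit test. Roughly:

```python
# Sketch of the guarded lookup; the surrounding cli.py context is paraphrased.
import xml.etree.ElementTree as ET

def parse_counts(el):
    value = {}
    value['brickCount'] = el.find('brickCount').text
    value['distCount'] = el.find('distCount').text
    stripe = el.find('stripeCount')           # None on glusterfs 10 output
    if stripe is not None:                    # explicit: a childless Element is
        value['stripeCount'] = stripe.text    # falsy, None is what we guard for
    value['replicaCount'] = el.find('replicaCount').text
    return value

vol = ET.fromstring(
    "<volume><brickCount>3</brickCount><distCount>1</distCount>"
    "<replicaCount>3</replicaCount></volume>")
print(parse_counts(vol))    # no 'stripeCount' key, no exception
```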

Yes, it seems to have worked, thank you very much.

Bug is here: https://bugzilla.redhat.com/show_bug.cgi?id=2078569

Indeed, bug severity raised to urgent. Alessandro, would you mind sending a PR to https://github.com/oVirt/vdsm? +Arik Hadas <ahadas@redhat.com>: as Gobinda is on vacation, is there anyone in the storage team who can assist?
Cheers,
Alessandro
I saw your report infact, they suggested to downgrade jdbc, for completeness I found also error report in vdsm.log while issuing "hosted-engine --connect-storage" corresponding to what you are noticing. I report the log except here if it can be useful. by the way, why vdsm it's searching for the storage engine storage UUID in a lvm volumegroup name?
2022-04-25 10:53:35,506+0200 INFO (Reactor thread) [ProtocolDetector.AcceptorImpl] Accepted connection from ::1:47350 (protocoldetector:61) 2022-04-25 10:53:35,510+0200 INFO (Reactor thread) [ProtocolDetector.Detector] Detected protocol stomp from ::1:47350 (protocoldetector:125) 2022-04-25 10:53:35,510+0200 INFO (Reactor thread) [Broker.StompAdapter] Processing CONNECT request (stompserver:95) 2022-04-25 10:53:35,512+0200 INFO (JsonRpc (StompReactor)) [Broker.StompAdapter] Subscribe command received (stompserver:124) 2022-04-25 10:53:35,518+0200 INFO (jsonrpc/3) [vdsm.api] START getStorageDomainInfo(sdUUID='7b8f1cc9-e3de-401f-b97f-8c281ca30482') from=::1,47350, task_id=1803abb2-9e9a-4292-8349-678c793f7264 (api:48) 2022-04-25 10:53:35,518+0200 INFO (jsonrpc/3) [storage.storagedomaincache] Refreshing storage domain cache (resize=True) (sdc:80) 2022-04-25 10:53:35,518+0200 INFO (jsonrpc/3) [storage.iscsi] Scanning iSCSI devices (iscsi:462) 2022-04-25 10:53:35,532+0200 INFO (jsonrpc/3) [storage.iscsi] Scanning iSCSI devices: 0.01 seconds (utils:390) 2022-04-25 10:53:35,532+0200 INFO (jsonrpc/3) [storage.hba] Scanning FC devices (hba:59) 2022-04-25 10:53:35,565+0200 INFO (jsonrpc/3) [storage.hba] Scanning FC devices: 0.03 seconds (utils:390) 2022-04-25 10:53:35,565+0200 INFO (jsonrpc/3) [storage.multipath] Waiting until multipathd is ready (multipath:112) 2022-04-25 10:53:37,556+0200 INFO (periodic/3) [vdsm.api] START repoStats(domains=()) from=internal, task_id=f4266860-9162-417e-85a5-087f9cb5cd51 (api:48) 2022-04-25 10:53:37,556+0200 INFO (periodic/3) [vdsm.api] FINISH repoStats return={} from=internal, task_id=f4266860-9162-417e-85a5-087f9cb5cd51 (api:54) 2022-04-25 10:53:37,558+0200 WARN (periodic/3) [root] Failed to retrieve Hosted Engine HA info, is Hosted Engine setup finished? 
(api:168) 2022-04-25 10:53:37,584+0200 INFO (jsonrpc/3) [storage.multipath] Waited 2.02 seconds for multipathd (tries=2, ready=2) (multipath:139) 2022-04-25 10:53:37,584+0200 INFO (jsonrpc/3) [storage.multipath] Resizing multipath devices (multipath:220) 2022-04-25 10:53:37,586+0200 INFO (jsonrpc/3) [storage.multipath] Resizing multipath devices: 0.00 seconds (utils:390) 2022-04-25 10:53:37,586+0200 INFO (jsonrpc/3) [storage.storagedomaincache] Refreshing storage domain cache: 2.07 seconds (utils:390) 2022-04-25 10:53:37,586+0200 INFO (jsonrpc/3) [storage.storagedomaincache] Looking up domain 7b8f1cc9-e3de-401f-b97f-8c281ca30482 (sdc:171) 2022-04-25 10:53:37,643+0200 WARN (jsonrpc/3) [storage.lvm] All 1 tries have failed: LVM command failed: 'cmd=[\'/sbin/lvm\', \'vgs\', \'--devices\', \'/dev/mapper/Samsung_SSD_870_EVO_4TB_S6BCNG0R300064E,/dev/mapper/Samsung_SSD_870_EVO_4TB_S6BCNG0R300066N,/dev/mapper/Samsung_SSD_870_EVO_4TB_S6BCNG0R300067L,/dev/mapper/Samsung_SSD_870_EVO_4TB_S6BCNG0R300230B\', \'--config\', \'devices { preferred_names=["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 hints="none" obtain_device_list_from_udev=0 } global { prioritise_write_locks=1 wait_for_locks=1 use_lvmpolld=1 } backup { retain_min=50 retain_days=0 }\', \'--noheadings\', \'--units\', \'b\', \'--nosuffix\', \'--separator\', \'|\', \'--ignoreskippedcluster\', \'-o\', \'uuid,name,attr,size,free,extent_size,extent_count,free_count,tags,vg_mda_size,vg_mda_free,lv_count,pv_count,pv_name\', \'7b8f1cc9-e3de-401f-b97f-8c281ca30482\'] rc=5 out=[] err=[\' Volume group "7b8f1cc9-e3de-401f -b97f-8c281ca30482" not found\', \' Cannot process volume group 7b8f1cc9-e3de-401f-b97f-8c281ca30482\']' (lvm:482) 2022-04-25 10:53:37,643+0200 INFO (jsonrpc/3) [storage.storagedomaincache] Looking up domain 7b8f1cc9-e3de-401f-b97f-8c281ca30482: 0.06 seconds (utils:390) 2022-04-25 10:53:37,643+0200 INFO (jsonrpc/3) [vdsm.api] FINISH getStorageDomainInfo error=Storage domain does not exist: ('7b8f1cc9-e3de-401f-b97f-8c281ca30482',) from=::1,47350, task_id=1803abb2-9e9a-4292-8349-678c793f7264 (api:52) 2022-04-25 10:53:37,643+0200 ERROR (jsonrpc/3) [storage.taskmanager.task] (Task='1803abb2-9e9a-4292-8349-678c793f7264') Unexpected error (task:877) Traceback (most recent call last): File "/usr/lib/python3.6/site-packages/vdsm/storage/task.py", line 884, in _run return fn(*args, **kargs) File "</usr/lib/python3.6/site-packages/decorator.py:decorator-gen-131>", line 2, in getStorageDomainInfo File "/usr/lib/python3.6/site-packages/vdsm/common/api.py", line 50, in method ret = func(*args, **kwargs) File "/usr/lib/python3.6/site-packages/vdsm/storage/hsm.py", line 2463, in getStorageDomainInfo dom = self.validateSdUUID(sdUUID) File "/usr/lib/python3.6/site-packages/vdsm/storage/hsm.py", line 152, in validateSdUUID sdDom = sdCache.produce(sdUUID=sdUUID) File "/usr/lib/python3.6/site-packages/vdsm/storage/sdc.py", line 115, in produce domain.getRealDomain() File "/usr/lib/python3.6/site-packages/vdsm/storage/sdc.py", line 51, in getRealDomain return self._cache._realProduce(self._sdUUID) File "/usr/lib/python3.6/site-packages/vdsm/storage/sdc.py", line 139, in _realProduce domain = self._findDomain(sdUUID) File "/usr/lib/python3.6/site-packages/vdsm/storage/sdc.py", line 156, in _findDomain return findMethod(sdUUID) File "/usr/lib/python3.6/site-packages/vdsm/storage/sdc.py", line 186, in _findUnfetchedDomain raise se.StorageDomainDoesNotExist(sdUUID) vdsm.storage.exception.StorageDomainDoesNotExist: 
Storage domain does not exist: ('7b8f1cc9-e3de-401f-b97f-8c281ca30482',) 2022-04-25 10:53:37,643+0200 INFO (jsonrpc/3) [storage.taskmanager.task] (Task='1803abb2-9e9a-4292-8349-678c793f7264') aborting: Task is aborted: "value=Storage domain does not exist: ('7b8f1cc9-e3de-401f-b97f-8c281ca30482',) abortedcode=358" (task:1182) 2022-04-25 10:53:37,643+0200 ERROR (jsonrpc/3) [storage.dispatcher] FINISH getStorageDomainInfo error=Storage domain does not exist: ('7b8f1cc9-e3de-401f-b97f-8c281ca30482',) (dispatcher:83) 2022-04-25 10:53:37,643+0200 INFO (jsonrpc/3) [jsonrpc.JsonRpcServer] RPC call StorageDomain.getInfo failed (error 358) in 2.13 seconds (__init__:312) 2022-04-25 10:53:37,692+0200 INFO (jsonrpc/4) [vdsm.api] START connectStorageServer(domType=7, spUUID='00000000-0000-0000-0000-000000000000', conList=[{'connection': 'ovirt-node2.ovirt:/gveng', 'user': 'kvm', 'id': 'e29cf818-5ee5-46e1-85c1-8aeefa33e95d', 'vfs_type': 'glusterfs'}]) from=::1,47350, task_id=51b9a69f-a90b-4867-86ec-19f9a4ebbc6f (api:48) 2022-04-25 10:53:37,746+0200 ERROR (jsonrpc/4) [storage.storageServer] Could not connect to storage server (storageServer:92) Traceback (most recent call last): File "/usr/lib/python3.6/site-packages/vdsm/storage/storageServer.py", line 90, in connect_all con.connect() File "/usr/lib/python3.6/site-packages/vdsm/storage/storageServer.py", line 233, in connect self.validate() File "/usr/lib/python3.6/site-packages/vdsm/storage/storageServer.py", line 365, in validate if not self.volinfo: File "/usr/lib/python3.6/site-packages/vdsm/storage/storageServer.py", line 352, in volinfo self._volinfo = self._get_gluster_volinfo() File "/usr/lib/python3.6/site-packages/vdsm/storage/storageServer.py", line 405, in _get_gluster_volinfo self._volfileserver) File "/usr/lib/python3.6/site-packages/vdsm/common/supervdsm.py",
Il 25/04/22 10:58, diego.ercolani@ssis.sm ha scritto: line 56, in __call__
return callMethod() File "/usr/lib/python3.6/site-packages/vdsm/common/supervdsm.py",
line 54, in <lambda>
**kwargs) File "<string>", line 2, in glusterVolumeInfo File "/usr/lib64/python3.6/multiprocessing/managers.py", line 772, in
_callmethod
raise convert_to_error(kind, result) vdsm.gluster.exception.GlusterXmlErrorException: XML error: rc=0 out=()
err=[b'<cliOutput>\n <opRet>0</opRet>\n <opErrno>0</opErrno>\n <opErrstr />\n <volInfo>\n <volumes>\n <volume>\n <name>gveng</name>\n <id>aa080d92-215f-4b90-8fd4-2b60cff9f40e</id>\n <status>1</status>\n <statusStr>Started</statusStr>\n <snapshotCount>0</snapshotCount>\n <brickCount>3</brickCount>\n <distCount>1</distCount>\n <replicaCount>3</replicaCount>\n <arbiterCount>1</arbiterCount>\n <disperseCount>0</disperseCount>\n <redundancyCount>0</redundancyCount>\n <type>2</type>\n <typeStr>Replicate</typeStr>\n <transport>0</transport>\n <bricks>\n <brick uuid="e2b460d1-a0c6-4735-b82c-c5befdf31691">ovirt-node2.ovirt:/brick/glhosteng/gveng<name>ovirt-node2.ovirt:/brick/glhosteng/gveng</name><hostUuid>e2b460d1-a0c6-4735-b82c-c5befdf31691</hostUuid><isArbiter>0</isArbiter></brick>\n <b
rick uuid="bff83488-7c84-4389-af47-27e3acdabd90">ovirt-node1.ovirt:/brick/glhosteng/gveng<name>ovirt-node1.ovirt:/brick/glhosteng/gveng</name><hostUuid>bff83488-7c84-4389-af47-27e3acdabd90</hostUuid><isArbiter>0</isArbiter></brick>\n <brick uuid="70823438-3804-4504-a148-d86f3ecc5f24">ovirt-node3.ovirt:/brickarbiter/gveng<name>ovirt-node3.ovirt:/brickarbiter/gveng</name><hostUuid>70823438-3804-4504-a148-d86f3ecc5f24</hostUuid><isArbiter>1</isArbiter></brick>\n </bricks>\n <optCount>31</optCount>\n <options>\n <option>\n <name>cluster.self-heal-daemon</name>\n <value>enable</value>\n </option>\n <option>\n <name>performance.client-io-threads</name>\n <value>on</value>\n </option>\n <option>\n <name>nfs.disable</name>\n <value>on</value>\n </option>\n <option>\n <name>transport.address-family</name>\n <value
inet</value>\n </option>\n <option>\n <name>storage.fips-mode-rchecksum</name>\n <value>on</value>\n </option>\n <option>\n <name>performance.quick-read</name>\n <value>off</value>\n </option>\n <option>\n <name>performance.read-ahead</name>\n <value>off</value>\n </option>\n <option>\n <name>performance.io-cache</name>\n <value>off</value>\n </option>\n <option>\n <name>performance.low-prio-threads</name>\n <value>32</value>\n </option>\n <option>\n <name>network.remote-dio</name>\n <value>disable</value>\n </option>\n <option>\n <name>performance.strict-o-direct</name>\n <value>on</value>\n </option>\n <option>\n <name>cluster.eager-lock</name>\n <value>enable</valu e>\n </option>\n <option>\n <name>cluster.quorum-type</name>\n <value>auto</value>\n </option>\n <option>\n <name>cluster.server-quorum-type</name>\n <value>server</value>\n </option>\n <option>\n <name>cluster.data-self-heal-algorithm</name>\n <value>full</value>\n </option>\n <option>\n <name>cluster.locking-scheme</name>\n <value>granular</value>\n </option>\n <option>\n <name>cluster.shd-max-threads</name>\n <value>8</value>\n </option>\n <option>\n <name>cluster.shd-wait-qlength</name>\n <value>10000</value>\n </option>\n <option>\n <name>features.shard</name>\n <value>on</value>\n </option>\n <option>\n <name>user.cifs</name>\n <value>off</value>\n </opt ion>\n <option>\n <name>cluster.choose-local</name>\n <value>off</value>\n </option>\n <option>\n <name>client.event-threads</name>\n <value>4</value>\n </option>\n <option>\n <name>server.event-threads</name>\n <value>4</value>\n </option>\n <option>\n <name>network.ping-timeout</name>\n <value>60</value>\n </option>\n <option>\n <name>server.tcp-user-timeout</name>\n <value>20</value>\n </option>\n <option>\n <name>server.keepalive-time</name>\n <value>10</value>\n </option>\n <option>\n <name>server.keepalive-interval</name>\n <value>2</value>\n </option>\n <option>\n <name>server.keepalive-count</name>\n <value>5</value>\n </option>\n <option>\n <name>cluster.lookup-optimize</name>\n <value>off</value>\n </option>\n <option>\n <name>storage.owner-uid</name>\n <value>36</value>\n </option>\n <option>\n <name>storage.owner-gid</name>\n <value>36</value>\n </option>\n </options>\n </volume>\n <count>1</count>\n </volumes>\n </volInfo>\n</cliOutput>'] 2022-04-25 10:53:37,746+0200 INFO (jsonrpc/4) [storage.storagedomaincache] Invalidating storage domain cache (sdc:74) 2022-04-25 10:53:37,746+0200 INFO (jsonrpc/4) [vdsm.api] FINISH connectStorageServer return={'statuslist': [{'id': 'e29cf818-5ee5-46e1-85c1-8aeefa33e95d', 'status': 4106}]} from=::1,47350, task_id=51b9a69f-a90b-4867-86ec-19f9a4ebbc6f (api:54)

Same problem after upgrade to 4.5.0.2. I've reapplied the patch to make it work again, so please report it upstream:

yum --enablerepo=baseos install patch
cd /
patch -p0 <<EOF
*** /usr/lib/python3.6/site-packages/vdsm/gluster/cli.py.orig  Tue Apr 19 17:54:05 2022
--- /usr/lib/python3.6/site-packages/vdsm/gluster/cli.py       Mon Apr 25 14:29:29 2022
***************
*** 426,432 ****
  value["volumeStatus"] = VolumeStatus.OFFLINE
  value['brickCount'] = el.find('brickCount').text
  value['distCount'] = el.find('distCount').text
! value['stripeCount'] = el.find('stripeCount').text
  value['replicaCount'] = el.find('replicaCount').text
  value['disperseCount'] = el.find('disperseCount').text
  value['redundancyCount'] = el.find('redundancyCount').text
--- 426,432 ----
  value["volumeStatus"] = VolumeStatus.OFFLINE
  value['brickCount'] = el.find('brickCount').text
  value['distCount'] = el.find('distCount').text
! if (el.find('stripeCount')): value['stripeCount'] = el.find('stripeCount').text
  value['replicaCount'] = el.find('replicaCount').text
  value['disperseCount'] = el.find('disperseCount').text
  value['redundancyCount'] = el.find('redundancyCount').text
EOF
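
For reference, here is a minimal, self-contained sketch (illustrative only, not vdsm code) of why the unguarded line fails: the gluster 10 volume-info XML quoted in this thread has no <stripeCount> element, so find() returns None and the .text access raises AttributeError, while the guarded form from the patch above simply skips the missing field. The sample XML below is trimmed from the dump in the vdsm log.

# Minimal sketch, not vdsm code: reproduce the failure the patch above works around.
import xml.etree.ElementTree as ET

# Trimmed sample modelled on the gluster 10 volume-info XML quoted in this thread;
# note that there is no <stripeCount> element.
sample = """
<volume>
  <name>gveng</name>
  <brickCount>3</brickCount>
  <distCount>1</distCount>
  <replicaCount>3</replicaCount>
  <disperseCount>0</disperseCount>
  <redundancyCount>0</redundancyCount>
</volume>
"""

el = ET.fromstring(sample)
value = {}
value['brickCount'] = el.find('brickCount').text        # element exists, works
try:
    value['stripeCount'] = el.find('stripeCount').text  # the unpatched vdsm line
except AttributeError as err:
    print("unpatched lookup fails:", err)                # 'NoneType' object has no attribute 'text'

# Guarded lookup in the spirit of the workaround; "is not None" is the explicit
# presence test (plain truthiness of an Element depends on its child count).
node = el.find('stripeCount')
if node is not None:
    value['stripeCount'] = node.text
print(value)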

Same in 4.5.0.3, the workaround seems to work even in this version.

This workaround does work for me, for what it's worth.
------- Original Message -------
On Tuesday, April 26th, 2022 at 8:09 AM, Sandro Bonazzola <sbonazzo@redhat.com> wrote:
On Mon, 25 Apr 2022 at 13:42, Alessandro De Salvo <Alessandro.DeSalvo@roma1.infn.it> wrote:
Hi,
please try this workaround: replace the following line in /usr/lib/python3.6/site-packages/vdsm/gluster/cli.py
value['stripeCount'] = el.find('stripeCount').text
with:
if (el.find('stripeCount')): value['stripeCount'] = el.find('stripeCount').text
Then restart vdsmd and supervdsmd and retry. It worked for me, and it looks like a serious bug for people upgrading to glusterfs 10.
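
For what it's worth, a small hypothetical helper (not part of vdsm; the file path, guard string, and service names are the ones mentioned in this thread) to check that the installed cli.py already carries the workaround before restarting the daemons:

# Hypothetical helper: verify the workaround is in place, then restart the services.
from pathlib import Path
import subprocess

CLI_PY = Path("/usr/lib/python3.6/site-packages/vdsm/gluster/cli.py")

def workaround_present(path: Path = CLI_PY) -> bool:
    # Unpatched: el.find('stripeCount').text is dereferenced unconditionally.
    # Patched: the line is wrapped in an "if (el.find('stripeCount')):" guard.
    return "if (el.find('stripeCount'))" in path.read_text()

if __name__ == "__main__":
    if workaround_present():
        # Restart both daemons as suggested above (requires root).
        subprocess.run(["systemctl", "restart", "supervdsmd", "vdsmd"], check=True)
    else:
        print("cli.py still has the unguarded stripeCount lookup; apply the workaround first")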
Indeed, bug severity raised to urgent. Alessandro, would you mind sending a PR to https://github.com/oVirt/vdsm? +Arik Hadas, as Gobinda is on vacation: is there anyone in the storage team who can assist?
Cheers,
Alessandro
On 25/04/22 10:58, diego.ercolani@ssis.sm wrote:
I saw your report, in fact; they suggested downgrading jdbc. For completeness, I also found an error report in vdsm.log while issuing "hosted-engine --connect-storage", corresponding to what you are noticing. I report the log excerpt here in case it is useful. By the way, why is vdsm searching for the hosted-engine storage domain UUID as an LVM volume group name?
2022-04-25 10:53:35,506+0200 INFO (Reactor thread) [ProtocolDetector.AcceptorImpl] Accepted connection from ::1:47350 (protocoldetector:61) 2022-04-25 10:53:35,510+0200 INFO (Reactor thread) [ProtocolDetector.Detector] Detected protocol stomp from ::1:47350 (protocoldetector:125) 2022-04-25 10:53:35,510+0200 INFO (Reactor thread) [Broker.StompAdapter] Processing CONNECT request (stompserver:95) 2022-04-25 10:53:35,512+0200 INFO (JsonRpc (StompReactor)) [Broker.StompAdapter] Subscribe command received (stompserver:124) 2022-04-25 10:53:35,518+0200 INFO (jsonrpc/3) [vdsm.api] START getStorageDomainInfo(sdUUID='7b8f1cc9-e3de-401f-b97f-8c281ca30482') from=::1,47350, task_id=1803abb2-9e9a-4292-8349-678c793f7264 (api:48) 2022-04-25 10:53:35,518+0200 INFO (jsonrpc/3) [storage.storagedomaincache] Refreshing storage domain cache (resize=True) (sdc:80) 2022-04-25 10:53:35,518+0200 INFO (jsonrpc/3) [storage.iscsi] Scanning iSCSI devices (iscsi:462) 2022-04-25 10:53:35,532+0200 INFO (jsonrpc/3) [storage.iscsi] Scanning iSCSI devices: 0.01 seconds (utils:390) 2022-04-25 10:53:35,532+0200 INFO (jsonrpc/3) [storage.hba] Scanning FC devices (hba:59) 2022-04-25 10:53:35,565+0200 INFO (jsonrpc/3) [storage.hba] Scanning FC devices: 0.03 seconds (utils:390) 2022-04-25 10:53:35,565+0200 INFO (jsonrpc/3) [storage.multipath] Waiting until multipathd is ready (multipath:112) 2022-04-25 10:53:37,556+0200 INFO (periodic/3) [vdsm.api] START repoStats(domains=()) from=internal, task_id=f4266860-9162-417e-85a5-087f9cb5cd51 (api:48) 2022-04-25 10:53:37,556+0200 INFO (periodic/3) [vdsm.api] FINISH repoStats return={} from=internal, task_id=f4266860-9162-417e-85a5-087f9cb5cd51 (api:54) 2022-04-25 10:53:37,558+0200 WARN (periodic/3) [root] Failed to retrieve Hosted Engine HA info, is Hosted Engine setup finished? 
(api:168) 2022-04-25 10:53:37,584+0200 INFO (jsonrpc/3) [storage.multipath] Waited 2.02 seconds for multipathd (tries=2, ready=2) (multipath:139) 2022-04-25 10:53:37,584+0200 INFO (jsonrpc/3) [storage.multipath] Resizing multipath devices (multipath:220) 2022-04-25 10:53:37,586+0200 INFO (jsonrpc/3) [storage.multipath] Resizing multipath devices: 0.00 seconds (utils:390) 2022-04-25 10:53:37,586+0200 INFO (jsonrpc/3) [storage.storagedomaincache] Refreshing storage domain cache: 2.07 seconds (utils:390) 2022-04-25 10:53:37,586+0200 INFO (jsonrpc/3) [storage.storagedomaincache] Looking up domain 7b8f1cc9-e3de-401f-b97f-8c281ca30482 (sdc:171) 2022-04-25 10:53:37,643+0200 WARN (jsonrpc/3) [storage.lvm] All 1 tries have failed: LVM command failed: 'cmd=[\'/sbin/lvm\', \'vgs\', \'--devices\', \'/dev/mapper/Samsung_SSD_870_EVO_4TB_S6BCNG0R300064E,/dev/mapper/Samsung_SSD_870_EVO_4TB_S6BCNG0R300066N,/dev/mapper/Samsung_SSD_870_EVO_4TB_S6BCNG0R300067L,/dev/mapper/Samsung_SSD_870_EVO_4TB_S6BCNG0R300230B\', \'--config\', \'devices { preferred_names=["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 hints="none" obtain_device_list_from_udev=0 } global { prioritise_write_locks=1 wait_for_locks=1 use_lvmpolld=1 } backup { retain_min=50 retain_days=0 }\', \'--noheadings\', \'--units\', \'b\', \'--nosuffix\', \'--separator\', \'|\', \'--ignoreskippedcluster\', \'-o\', \'uuid,name,attr,size,free,extent_size,extent_count,free_count,tags,vg_mda_size,vg_mda_free,lv_count,pv_count,pv_name\', \'7b8f1cc9-e3de-401f-b97f-8c281ca30482\'] rc=5 out=[] err=[\' Volume group "7b8f1cc9-e3de-401f -b97f-8c281ca30482" not found\', \' Cannot process volume group 7b8f1cc9-e3de-401f-b97f-8c281ca30482\']' (lvm:482) 2022-04-25 10:53:37,643+0200 INFO (jsonrpc/3) [storage.storagedomaincache] Looking up domain 7b8f1cc9-e3de-401f-b97f-8c281ca30482: 0.06 seconds (utils:390) 2022-04-25 10:53:37,643+0200 INFO (jsonrpc/3) [vdsm.api] FINISH getStorageDomainInfo error=Storage domain does not exist: ('7b8f1cc9-e3de-401f-b97f-8c281ca30482',) from=::1,47350, task_id=1803abb2-9e9a-4292-8349-678c793f7264 (api:52) 2022-04-25 10:53:37,643+0200 ERROR (jsonrpc/3) [storage.taskmanager.task] (Task='1803abb2-9e9a-4292-8349-678c793f7264') Unexpected error (task:877) Traceback (most recent call last): File "/usr/lib/python3.6/site-packages/vdsm/storage/task.py", line 884, in _run return fn(*args, **kargs) File "</usr/lib/python3.6/site-packages/decorator.py:decorator-gen-131>", line 2, in getStorageDomainInfo File "/usr/lib/python3.6/site-packages/vdsm/common/api.py", line 50, in method ret = func(*args, **kwargs) File "/usr/lib/python3.6/site-packages/vdsm/storage/hsm.py", line 2463, in getStorageDomainInfo dom = self.validateSdUUID(sdUUID) File "/usr/lib/python3.6/site-packages/vdsm/storage/hsm.py", line 152, in validateSdUUID sdDom = sdCache.produce(sdUUID=sdUUID) File "/usr/lib/python3.6/site-packages/vdsm/storage/sdc.py", line 115, in produce domain.getRealDomain() File "/usr/lib/python3.6/site-packages/vdsm/storage/sdc.py", line 51, in getRealDomain return self._cache._realProduce(self._sdUUID) File "/usr/lib/python3.6/site-packages/vdsm/storage/sdc.py", line 139, in _realProduce domain = self._findDomain(sdUUID) File "/usr/lib/python3.6/site-packages/vdsm/storage/sdc.py", line 156, in _findDomain return findMethod(sdUUID) File "/usr/lib/python3.6/site-packages/vdsm/storage/sdc.py", line 186, in _findUnfetchedDomain raise se.StorageDomainDoesNotExist(sdUUID) vdsm.storage.exception.StorageDomainDoesNotExist: 
Storage domain does not exist: ('7b8f1cc9-e3de-401f-b97f-8c281ca30482',) 2022-04-25 10:53:37,643+0200 INFO (jsonrpc/3) [storage.taskmanager.task] (Task='1803abb2-9e9a-4292-8349-678c793f7264') aborting: Task is aborted: "value=Storage domain does not exist: ('7b8f1cc9-e3de-401f-b97f-8c281ca30482',) abortedcode=358" (task:1182) 2022-04-25 10:53:37,643+0200 ERROR (jsonrpc/3) [storage.dispatcher] FINISH getStorageDomainInfo error=Storage domain does not exist: ('7b8f1cc9-e3de-401f-b97f-8c281ca30482',) (dispatcher:83) 2022-04-25 10:53:37,643+0200 INFO (jsonrpc/3) [jsonrpc.JsonRpcServer] RPC call StorageDomain.getInfo failed (error 358) in 2.13 seconds (__init__:312) 2022-04-25 10:53:37,692+0200 INFO (jsonrpc/4) [vdsm.api] START connectStorageServer(domType=7, spUUID='00000000-0000-0000-0000-000000000000', conList=[{'connection': 'ovirt-node2.ovirt:/gveng', 'user': 'kvm', 'id': 'e29cf818-5ee5-46e1-85c1-8aeefa33e95d', 'vfs_type': 'glusterfs'}]) from=::1,47350, task_id=51b9a69f-a90b-4867-86ec-19f9a4ebbc6f (api:48) 2022-04-25 10:53:37,746+0200 ERROR (jsonrpc/4) [storage.storageServer] Could not connect to storage server (storageServer:92) Traceback (most recent call last): File "/usr/lib/python3.6/site-packages/vdsm/storage/storageServer.py", line 90, in connect_all con.connect() File "/usr/lib/python3.6/site-packages/vdsm/storage/storageServer.py", line 233, in connect self.validate() File "/usr/lib/python3.6/site-packages/vdsm/storage/storageServer.py", line 365, in validate if not self.volinfo: File "/usr/lib/python3.6/site-packages/vdsm/storage/storageServer.py", line 352, in volinfo self._volinfo = self._get_gluster_volinfo() File "/usr/lib/python3.6/site-packages/vdsm/storage/storageServer.py", line 405, in _get_gluster_volinfo self._volfileserver) File "/usr/lib/python3.6/site-packages/vdsm/common/supervdsm.py", line 56, in __call__ return callMethod() File "/usr/lib/python3.6/site-packages/vdsm/common/supervdsm.py", line 54, in <lambda> **kwargs) File "<string>", line 2, in glusterVolumeInfo File "/usr/lib64/python3.6/multiprocessing/managers.py", line 772, in _callmethod raise convert_to_error(kind, result) vdsm.gluster.exception.GlusterXmlErrorException: XML error: rc=0 out=() err=[b'<cliOutput>\n <opRet>0</opRet>\n <opErrno>0</opErrno>\n <opErrstr />\n <volInfo>\n <volumes>\n <volume>\n <name>gveng</name>\n <id>aa080d92-215f-4b90-8fd4-2b60cff9f40e</id>\n <status>1</status>\n <statusStr>Started</statusStr>\n <snapshotCount>0</snapshotCount>\n <brickCount>3</brickCount>\n <distCount>1</distCount>\n <replicaCount>3</replicaCount>\n <arbiterCount>1</arbiterCount>\n <disperseCount>0</disperseCount>\n <redundancyCount>0</redundancyCount>\n <type>2</type>\n <typeStr>Replicate</typeStr>\n <transport>0</transport>\n <bricks>\n <brick uuid="e2b460d1-a0c6-4735-b82c-c5befdf31691">ovirt-node2.ovirt:/brick/glhosteng/gveng<name>ovirt-node2.ovirt:/brick/glhosteng/gveng</name><hostUuid>e2b460d1-a0c6-4735-b82c-c5befdf31691</hostUuid><isArbiter>0</isArbiter></brick>\n <b rick uuid="bff83488-7c84-4389-af47-27e3acdabd90">ovirt-node1.ovirt:/brick/glhosteng/gveng<name>ovirt-node1.ovirt:/brick/glhosteng/gveng</name><hostUuid>bff83488-7c84-4389-af47-27e3acdabd90</hostUuid><isArbiter>0</isArbiter></brick>\n <brick uuid="70823438-3804-4504-a148-d86f3ecc5f24">ovirt-node3.ovirt:/brickarbiter/gveng<name>ovirt-node3.ovirt:/brickarbiter/gveng</name><hostUuid>70823438-3804-4504-a148-d86f3ecc5f24</hostUuid><isArbiter>1</isArbiter></brick>\n </bricks>\n <optCount>31</optCount>\n <options>\n <option>\n 
<name>cluster.self-heal-daemon</name>\n <value>enable</value>\n </option>\n <option>\n <name>performance.client-io-threads</name>\n <value>on</value>\n </option>\n <option>\n <name>nfs.disable</name>\n <value>on</value>\n </option>\n <option>\n <name>transport.address-family</name>\n <value
inet</value>\n </option>\n <option>\n <name>storage.fips-mode-rchecksum</name>\n <value>on</value>\n </option>\n <option>\n <name>performance.quick-read</name>\n <value>off</value>\n </option>\n <option>\n <name>performance.read-ahead</name>\n <value>off</value>\n </option>\n <option>\n <name>performance.io-cache</name>\n <value>off</value>\n </option>\n <option>\n <name>performance.low-prio-threads</name>\n <value>32</value>\n </option>\n <option>\n <name>network.remote-dio</name>\n <value>disable</value>\n </option>\n <option>\n <name>performance.strict-o-direct</name>\n <value>on</value>\n </option>\n <option>\n <name>cluster.eager-lock</name>\n <value>enable</valu e>\n </option>\n <option>\n <name>cluster.quorum-type</name>\n <value>auto</value>\n </option>\n <option>\n <name>cluster.server-quorum-type</name>\n <value>server</value>\n </option>\n <option>\n <name>cluster.data-self-heal-algorithm</name>\n <value>full</value>\n </option>\n <option>\n <name>cluster.locking-scheme</name>\n <value>granular</value>\n </option>\n <option>\n <name>cluster.shd-max-threads</name>\n <value>8</value>\n </option>\n <option>\n <name>cluster.shd-wait-qlength</name>\n <value>10000</value>\n </option>\n <option>\n <name>features.shard</name>\n <value>on</value>\n </option>\n <option>\n <name>user.cifs</name>\n <value>off</value>\n </opt ion>\n <option>\n <name>cluster.choose-local</name>\n <value>off</value>\n </option>\n <option>\n <name>client.event-threads</name>\n <value>4</value>\n </option>\n <option>\n <name>server.event-threads</name>\n <value>4</value>\n </option>\n <option>\n <name>network.ping-timeout</name>\n <value>60</value>\n </option>\n <option>\n <name>server.tcp-user-timeout</name>\n <value>20</value>\n </option>\n <option>\n <name>server.keepalive-time</name>\n <value>10</value>\n </option>\n <option>\n <name>server.keepalive-interval</name>\n <value>2</value>\n </option>\n <option>\n <name>server.keepalive-count</name>\n <value>5</value>\n </option>\n <option>\n <name>cluster.lookup-optimize</name>\n <value>off</value>\n </option>\n <option>\n <name>storage.owner-uid</name>\n <value>36</value>\n </option>\n <option>\n <name>storage.owner-gid</name>\n <value>36</value>\n </option>\n </options>\n </volume>\n <count>1</count>\n </volumes>\n </volInfo>\n</cliOutput>'] 2022-04-25 10:53:37,746+0200 INFO (jsonrpc/4) [storage.storagedomaincache] Invalidating storage domain cache (sdc:74) 2022-04-25 10:53:37,746+0200 INFO (jsonrpc/4) [vdsm.api] FINISH connectStorageServer return={'statuslist': [{'id': 'e29cf818-5ee5-46e1-85c1-8aeefa33e95d', 'status': 4106}]} from=::1,47350, task_id=51b9a69f-a90b-4867-86ec-19f9a4ebbc6f (api:54)
participants (5):
- Alessandro De Salvo
- David White
- Diego Ercolani
- diego.ercolani@ssis.sm
- Sandro Bonazzola