Can't connect vdsm storage: Command StorageDomain.getInfo with args failed: (code=350, message=Error in storage domain action

Hi! I am trying to upgrade my hosts and have a problem with it. After upgrading one host, I see that it is NonOperational. Everything was fine with vdsm-4.30.24-1.el7, but after upgrading to the new version vdsm-4.30.40-1.el7.x86_64 (and some other packages) I get errors. First of all, I see this in the oVirt Events:

    Host srv02 cannot access the Storage Domain(s) <UNKNOWN> attached to the Data Center Default. Setting Host state to Non-Operational.

My Default storage domain, with the HE VM data, is on NFS storage.

In the messages log of the host:

    srv02 journal: ovirt-ha-agent ovirt_hosted_engine_ha.agent.agent.Agent ERROR Traceback (most recent call last):
      File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 131, in _run_agent
        return action(he)
      File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 55, in action_proper
        return he.start_monitoring()
      File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 432, in start_monitoring
        self._initialize_broker()
      File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 556, in _initialize_broker
        m.get('options', {}))
      File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 89, in start_monitor
        ).format(t=type, o=options, e=e)
    RequestError: brokerlink - failed to start monitor via ovirt-ha-broker: [Errno 2] No such file or directory, [monitor: 'network', options: {'tcp_t_address': None, 'network_test': None, 'tcp_t_port': None, 'addr': '192.168.2.248'}]
    Feb 1 15:41:42 srv02 journal: ovirt-ha-agent ovirt_hosted_engine_ha.agent.agent.Agent ERROR Trying to restart agent

In the broker log:

    MainThread::WARNING::2020-02-01 15:43:35,167::storage_broker::97::ovirt_hosted_engine_ha.broker.storage_broker.StorageBroker::(__init__) Can't connect vdsm storage: Command StorageDomain.getInfo with args {'storagedomainID': 'bbdddea7-9cd6-41e7-ace5-fb9a6795caa8'} failed: (code=350, message=Error in storage domain action: (u'sdUUID=bbdddea7-9cd6-41e7-ace5-fb9a6795caa8',))

In vdsm.log:

    2020-02-01 15:44:19,930+0600 INFO (jsonrpc/0) [vdsm.api] FINISH getStorageDomainInfo error=[Errno 1] Operation not permitted from=::1,57528, task_id=40683f67-d7b0-4105-aab8-6338deb54b00 (api:52)
    2020-02-01 15:44:19,930+0600 ERROR (jsonrpc/0) [storage.TaskManager.Task] (Task='40683f67-d7b0-4105-aab8-6338deb54b00') Unexpected error (task:875)
    Traceback (most recent call last):
      File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 882, in _run
        return fn(*args, **kargs)
      File "<string>", line 2, in getStorageDomainInfo
      File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 50, in method
        ret = func(*args, **kwargs)
      File "/usr/lib/python2.7/site-packages/vdsm/storage/hsm.py", line 2753, in getStorageDomainInfo
        dom = self.validateSdUUID(sdUUID)
      File "/usr/lib/python2.7/site-packages/vdsm/storage/hsm.py", line 305, in validateSdUUID
        sdDom = sdCache.produce(sdUUID=sdUUID)
      File "/usr/lib/python2.7/site-packages/vdsm/storage/sdc.py", line 110, in produce
        domain.getRealDomain()
      File "/usr/lib/python2.7/site-packages/vdsm/storage/sdc.py", line 51, in getRealDomain
        return self._cache._realProduce(self._sdUUID)
      File "/usr/lib/python2.7/site-packages/vdsm/storage/sdc.py", line 134, in _realProduce
        domain = self._findDomain(sdUUID)
      File "/usr/lib/python2.7/site-packages/vdsm/storage/sdc.py", line 151, in _findDomain
        return findMethod(sdUUID)
      File "/usr/lib/python2.7/site-packages/vdsm/storage/nfsSD.py", line 145, in findDomain
        return NfsStorageDomain(NfsStorageDomain.findDomainPath(sdUUID))
      File "/usr/lib/python2.7/site-packages/vdsm/storage/fileSD.py", line 378, in __init__
        manifest.sdUUID, manifest.mountpoint)
      File "/usr/lib/python2.7/site-packages/vdsm/storage/fileSD.py", line 853, in _detect_block_size
        block_size = iop.probe_block_size(mountpoint)
      File "/usr/lib/python2.7/site-packages/vdsm/storage/outOfProcess.py", line 384, in probe_block_size
        return self._ioproc.probe_block_size(dir_path)
      File "/usr/lib/python2.7/site-packages/ioprocess/__init__.py", line 602, in probe_block_size
        "probe_block_size", {"dir": dir_path}, self.timeout)
      File "/usr/lib/python2.7/site-packages/ioprocess/__init__.py", line 448, in _sendCommand
        raise OSError(errcode, errstr)
    OSError: [Errno 1] Operation not permitted
    2020-02-01 15:44:19,930+0600 INFO (jsonrpc/0) [storage.TaskManager.Task] (Task='40683f67-d7b0-4105-aab8-6338deb54b00') aborting: Task is aborted: u'[Errno 1] Operation not permitted' - code 100 (task:1181)
    2020-02-01 15:44:19,930+0600 ERROR (jsonrpc/0) [storage.Dispatcher] FINISH getStorageDomainInfo error=[Errno 1] Operation not permitted (dispatcher:87)

But I see that this domain is mounted (by the mount command):

    storage:/volume3/ovirt-hosted on /rhev/data-center/mnt/storage:_volume3_ovirt-hosted type nfs4 (rw,relatime,vers=4.1,rsize=131072,wsize=131072,namlen=255,soft,nosharecache,proto=tcp,timeo=600,retrans=6,sec=sys,clientaddr=192.168.2.251,local_lock=none,addr=192.168.2.248)

I don't see a storage directory in /var/run/vdsm, and I see many differences compared with the other hosts. Here is the listing of /var/run/vdsm:

    bonding-defaults.json      dhclientmon    nets_restored              payload          svdsm.sock         v2v         vhostuser
    bonding-name2numeric.json  mom-vdsm.sock  ovirt-imageio-daemon.sock  supervdsmd.lock  trackedInterfaces  vdsmd.lock

What is the problem? Please help.
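The `mount` check above can also be scripted, which makes it easy to compare the NFS options across hosts. A minimal sketch, assuming a Linux host where /proc/mounts is readable; `find_mount` is a hypothetical helper, not part of vdsm:

```python
import os


def find_mount(mountpoint, mounts_file="/proc/mounts"):
    """Return (source, fstype, options) for mountpoint, or None if not mounted.

    /proc/mounts encodes spaces in paths as the octal escape "\\040";
    this sketch decodes only that case.
    """
    target = os.path.normpath(mountpoint)
    found = None
    with open(mounts_file) as f:
        for line in f:
            fields = line.split()
            if len(fields) < 4:
                continue
            source, mnt, fstype, opts = fields[:4]
            if os.path.normpath(mnt.replace("\\040", " ")) == target:
                # Keep the last match: a later mount on the same path hides earlier ones.
                found = (source, fstype, opts.split(","))
    return found
```

For example, `find_mount("/rhev/data-center/mnt/storage:_volume3_ovirt-hosted")` returns the export source, `nfs4`, and the option list (`vers=4.1`, `soft`, and so on), or `None` if the domain is not mounted on that host.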

This issue can be resolved by downgrading the packages:

    Installing : vdsm-api-4.30.32-1.el7.noarch 1/26
    Installing : vdsm-common-4.30.32-1.el7.noarch 2/26
    Installing : vdsm-yajsonrpc-4.30.32-1.el7.noarch 3/26
    Installing : vdsm-network-4.30.32-1.el7.x86_64 4/26
    Installing : vdsm-python-4.30.32-1.el7.noarch 5/26
    Installing : vdsm-jsonrpc-4.30.32-1.el7.noarch 6/26
    Installing : vdsm-http-4.30.32-1.el7.noarch 7/26
    Installing : vdsm-hook-vmfex-dev-4.30.32-1.el7.noarch 8/26
    Installing : vdsm-4.30.32-1.el7.x86_64 9/26
    Installing : vdsm-gluster-4.30.32-1.el7.x86_64 10/26
    Installing : vdsm-hook-ethtool-options-4.30.32-1.el7.noarch 11/26
    Installing : vdsm-hook-fcoe-4.30.32-1.el7.noarch 12/26
    Installing : vdsm-client-4.30.32-1.el7.noarch 13/26
    Cleanup    : vdsm-client-4.30.33-1.el7.noarch 14/26
    Cleanup    : vdsm-hook-ethtool-options-4.30.33-1.el7.noarch 15/26
    Cleanup    : vdsm-gluster-4.30.33-1.el7.x86_64 16/26
    Cleanup    : vdsm-hook-fcoe-4.30.33-1.el7.noarch 17/26
    Cleanup    : vdsm-hook-vmfex-dev-4.30.33-1.el7.noarch 18/26
    Cleanup    : vdsm-4.30.33-1.el7.x86_64 19/26
    Cleanup    : vdsm-jsonrpc-4.30.33-1.el7.noarch 20/26
    Cleanup    : vdsm-http-4.30.33-1.el7.noarch 21/26
    Cleanup    : vdsm-python-4.30.33-1.el7.noarch 22/26
    Cleanup    : vdsm-network-4.30.33-1.el7.x86_64 23/26
    Cleanup    : vdsm-common-4.30.33-1.el7.noarch 24/26
    Cleanup    : vdsm-api-4.30.33-1.el7.noarch 25/26
    Cleanup    : vdsm-yajsonrpc-4.30.33-1.el7.noarch 26/26

In other words, something changed after version 4.30.32-1. With 4.30.33-1 everything works fine; after installing 4.30.33-1 I see this issue.

It works fine with 4.30.32-1, of course. Sorry.

On Saturday, February 1, 2020, <asm@pioner.kz> wrote:
Seems like a reproduction of https://bugzilla.redhat.com/show_bug.cgi?id=1777726#c1: missing file creation/removal permissions on the NFS storage.
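The diagnosis above (missing create/remove permissions) can be checked directly on the host. A minimal sketch, assuming Python 3 and that it is run as the vdsm user (uid 36), for example with `sudo -u vdsm`, since a check run as root may be squashed differently by the NFS server; `check_create_remove` is a hypothetical helper, not a vdsm API:

```python
import os
import uuid


def check_create_remove(directory):
    """Create and then delete a uniquely named scratch file in directory.

    Returns None on success, or the OSError raised by the failing step.
    The operations are performed for real (rather than queried with
    os.access()) so the result reflects what the NFS server actually allows.
    """
    name = os.path.join(directory, ".perm-probe-" + uuid.uuid4().hex)
    try:
        fd = os.open(name, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
        os.close(fd)
        os.unlink(name)
    except OSError as e:
        return e
    return None
```

Calling `check_create_remove("/rhev/data-center/mnt/storage:_volume3_ovirt-hosted")` returns `None` when both steps succeed; otherwise the returned `OSError` carries the errno (for example EPERM or EACCES) and the path that failed.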
_______________________________________________
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-leave@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/IBUDRUOETQ5WCTZQMGIVBZZZUAITDVHL/

OK, I will try to set 777 permissions on the NFS storage. But why did this issue start after updating from 4.30.32-1 to 4.30.33-1, without any other changes?

On Sat, Feb 1, 2020 at 6:39 PM <asm@pioner.kz> wrote:
> OK, I will try to set 777 permissions on the NFS storage. But why did this issue start after updating from 4.30.32-1 to 4.30.33-1, without any other changes?
The commit that differs between 4.30.32 and 4.30.33 is the switch to block size probing done by ioprocess-1.3.0: https://github.com/oVirt/vdsm/commit/9bd210e340be0855126d1620cdb94840ced5612...
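The block size probing mentioned above can be pictured as follows. This is a simplified sketch of the general technique (try direct I/O writes of increasing size and keep the first one the storage accepts), assuming Linux and Python 3; it is not ioprocess's actual implementation, and the candidate sizes (1, 512, 4096) are an assumption. Note that it must first create a scratch file on the mount, which is the kind of step where missing create permission on the export would surface as an OSError such as EPERM or EACCES, consistent with the traceback in the original report. `probe_block_size` here is a hypothetical stand-in:

```python
import errno
import mmap
import os
import uuid

# Sizes tried in order; an assumption based on common logical block sizes.
CANDIDATE_SIZES = (1, 512, 4096)


def _remove_quietly(path):
    try:
        os.unlink(path)
    except OSError:
        pass


def probe_block_size(directory):
    """Return the smallest candidate size accepted by an O_DIRECT write.

    Linux only. Raises OSError if the scratch file cannot be created or
    opened with O_DIRECT, or if no candidate size is accepted.
    """
    name = os.path.join(directory, ".bs-probe-" + uuid.uuid4().hex)
    flags = os.O_WRONLY | os.O_CREAT | os.O_EXCL | os.O_DIRECT
    try:
        fd = os.open(name, flags, 0o600)
    except OSError:
        # O_CREAT may leave a file behind if O_DIRECT is then rejected.
        _remove_quietly(name)
        raise
    try:
        for size in CANDIDATE_SIZES:
            # Anonymous mmap memory is page-aligned, as O_DIRECT buffers must be.
            with mmap.mmap(-1, size) as buf:
                try:
                    os.pwrite(fd, buf, 0)
                except OSError as e:
                    if e.errno != errno.EINVAL:
                        raise
                    continue  # size not aligned for this storage; try the next one
            return size
        raise OSError(errno.EINVAL, "no candidate block size accepted")
    finally:
        os.close(fd)
        _remove_quietly(name)
```

The page-aligned mmap buffer is used because O_DIRECT generally requires aligned user buffers; an ordinary `bytes` object gives no alignment guarantee.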

On Sat, Feb 1, 2020 at 5:39 PM <asm@pioner.kz> wrote:
> OK, I will try to set 777 permissions on the NFS storage.
This is an invalid configuration. See the RHV documentation for the proper configuration: https://access.redhat.com/documentation/en-us/red_hat_virtualization/4.3/htm...
> But why did this issue start after updating from 4.30.32-1 to 4.30.33-1, without any other changes?
I guess you had wrong permissions and ownership on the storage before, but vdsm did not detect the issue because older versions were missing these validations. The current version validates that creating and deleting files, and using direct I/O, work on the storage when creating and activating a storage domain.

Nir
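The ownership and permission point above can be checked on the NFS server against the export directory itself. A minimal sketch, assuming the values from the RHV NFS preparation guidance (owner 36:36, i.e. vdsm:kvm, and mode 0755); verify them against the documentation for your version. `check_ownership` is a hypothetical helper:

```python
import os
import stat

# Assumed expected values: uid/gid 36 is vdsm:kvm on oVirt/RHV hosts,
# and 0o755 is the export directory mode recommended by the RHV docs.
EXPECTED_UID = 36
EXPECTED_GID = 36
EXPECTED_MODE = 0o755


def check_ownership(path):
    """Return a list of human-readable problems with path's owner and mode."""
    st = os.stat(path)
    problems = []
    if st.st_uid != EXPECTED_UID or st.st_gid != EXPECTED_GID:
        problems.append("owner is %d:%d, expected %d:%d"
                        % (st.st_uid, st.st_gid, EXPECTED_UID, EXPECTED_GID))
    mode = stat.S_IMODE(st.st_mode)
    if mode != EXPECTED_MODE:
        problems.append("mode is %o, expected %o" % (mode, EXPECTED_MODE))
    return problems
```

An empty list means the directory matches the assumed owner and mode; a 777 directory, for instance, yields `"mode is 777, expected 755"`.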

One thing that is not in any of the documentation I have found is the extra options required for the export. I followed all the docs and it was still failing. I had to add “sync,no_subtree_check,all_squash,anonuid=36,anongid=36” to my export to get it to work.

From: Nir Soffer <nsoffer@redhat.com>
Sent: Saturday, February 1, 2020 4:51 PM
To: asm@pioner.kz
Cc: users <users@ovirt.org>
Subject: [ovirt-users] Re: Can't connect vdsm storage: Command StorageDomain.getInfo with args failed: (code=350, message=Error in storage domain action
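The options above belong in the NFS server's /etc/exports entry. A minimal sketch that checks one exports entry for them; the parser handles only the simple `path client(options)` form described in exports(5), and the export path and client subnet used in the example are hypothetical:

```python
# The extra export options reported in this thread.
REQUIRED = frozenset({"sync", "no_subtree_check", "all_squash",
                      "anonuid=36", "anongid=36"})


def parse_exports_line(line):
    """Split one non-empty /etc/exports entry into (path, {client: options}).

    Handles only the simple form "path client(opt,opt) client2(opt)":
    no quoted paths, no "-opts" defaults, no line continuations.
    """
    tokens = line.split("#", 1)[0].split()
    path, clients = tokens[0], {}
    for token in tokens[1:]:
        if token.endswith(")") and "(" in token:
            client, opts = token[:-1].split("(", 1)
            clients[client] = {o for o in opts.split(",") if o}
        else:
            clients[token] = set()  # client listed without explicit options
    return path, clients


def missing_options(line, required=REQUIRED):
    """Return {client: required options absent from that client's entry}."""
    _, clients = parse_exports_line(line)
    return {c: required - opts for c, opts in clients.items() if required - opts}
```

For a hypothetical entry such as `/volume3/ovirt-hosted 192.168.2.0/24(rw,sync,no_subtree_check,all_squash,anonuid=36,anongid=36)`, `missing_options` returns an empty dict. After editing /etc/exports on a Linux NFS server, `exportfs -ra` re-exports all entries.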

Yes, the problem was with permissions, but not with the permissions on the export directory itself. There are two ways to resolve this:
1) Set 777 permissions on the directory. This is not the right solution.
2) Add anonuid=36,anongid=36 to the export. This is the right solution. It is very strange that this is not in any documentation, because it is very important!
I had the right permissions on my export directory, but it did not work until I ran chmod 777. After I changed the parameters in the exports file, I set the mode back to 755 and everything works fine now! Thank you all very much.
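One plausible way to see why both workarounds above helped, assuming the failing requests reached the server under a UID that did not match the directory owner (for example root, squashed to 65534 by the default root_squash): a 0755 directory owned by 36:36 then refuses file creation, while 0777 or all_squash with anonuid=36 both let the request through. This is a simplified model of exports(5) squashing plus POSIX directory permissions; it ignores supplementary groups, ACLs, NFSv4 ID mapping and vendor NAS behavior, and `effective_ids` and `can_create_in` are hypothetical helpers:

```python
# Simplified model of Linux NFS server UID/GID squashing (see exports(5)):
# root_squash is the default, all_squash maps every request, and
# anonuid/anongid default to 65534 ("nobody") when not set.
DEFAULT_ANON_ID = 65534


def effective_ids(client_uid, client_gid, options):
    """Return the (uid, gid) the server applies to a request."""
    flags, values = set(), {}
    for opt in options:
        if "=" in opt:
            key, value = opt.split("=", 1)
            values[key] = value
        else:
            flags.add(opt)
    anon_uid = int(values.get("anonuid", DEFAULT_ANON_ID))
    anon_gid = int(values.get("anongid", DEFAULT_ANON_ID))
    squash_all = "all_squash" in flags
    squash_root = "no_root_squash" not in flags  # root_squash is the default
    uid = anon_uid if squash_all or (squash_root and client_uid == 0) else client_uid
    gid = anon_gid if squash_all or (squash_root and client_gid == 0) else client_gid
    return uid, gid


def can_create_in(dir_uid, dir_gid, dir_mode, uid, gid):
    """Simplified POSIX check: may (uid, gid) create files in the directory?"""
    if uid == 0:
        return True
    if uid == dir_uid:
        bits = (dir_mode >> 6) & 0o7
    elif gid == dir_gid:
        bits = (dir_mode >> 3) & 0o7
    else:
        bits = dir_mode & 0o7
    return bits & 0o3 == 0o3  # needs write (2) and search/execute (1)
```

With the default options, a root request becomes 65534:65534 and `can_create_in(36, 36, 0o755, 65534, 65534)` is false, while mode 0o777 makes it true; with `all_squash,anonuid=36,anongid=36` the same request becomes 36:36 and the 0755 directory accepts it.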
participants (5)
- Alexandr Mikhailov
- Amit Bawer
- asm@pioner.kz
- Nir Soffer
- Robert Webb