Can't connect vdsm storage: Command StorageDomain.getInfo with args failed: (code=350, message=Error in storage domain action
by asm@pioner.kz
Hi! I am trying to upgrade my hosts and have a problem with it. After upgrading one host, I see that it is NonOperational. Everything was fine with vdsm-4.30.24-1.el7, but after upgrading to the new version vdsm-4.30.40-1.el7.x86_64 (and some other packages) I get errors.
First of all, I see in the oVirt Events: Host srv02 cannot access the Storage Domain(s) <UNKNOWN> attached to the Data Center Default. Setting Host state to Non-Operational. My Default storage domain, with the HE VM data, is on NFS storage.
In the messages log of the host:
srv02 journal: ovirt-ha-agent ovirt_hosted_engine_ha.agent.agent.Agent ERROR Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 131, in _run_agent
    return action(he)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 55, in action_proper
    return he.start_monitoring()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 432, in start_monitoring
    self._initialize_broker()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 556, in _initialize_broker
    m.get('options', {}))
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 89, in start_monitor
    ).format(t=type, o=options, e=e)
RequestError: brokerlink - failed to start monitor via ovirt-ha-broker: [Errno 2] No such file or directory, [monitor: 'network', options: {'tcp_t_address': None, 'network_test': None, 'tcp_t_port': None, 'addr': '192.168.2.248'}]
Feb 1 15:41:42 srv02 journal: ovirt-ha-agent ovirt_hosted_engine_ha.agent.agent.Agent ERROR Trying to restart agent
In the broker log:
MainThread::WARNING::2020-02-01 15:43:35,167::storage_broker::97::ovirt_hosted_engine_ha.broker.storage_broker.StorageBroker::(__init__) Can't connect vdsm storage: Command StorageDomain.getInfo with args {'storagedomainID': 'bbdddea7-9cd6-41e7-ace5-fb9a6795caa8'} failed: (code=350, message=Error in storage domain action: (u'sdUUID=bbdddea7-9cd6-41e7-ace5-fb9a6795caa8',))
In vdsm.log:
2020-02-01 15:44:19,930+0600 INFO (jsonrpc/0) [vdsm.api] FINISH getStorageDomainInfo error=[Errno 1] Operation not permitted from=::1,57528, task_id=40683f67-d7b0-4105-aab8-6338deb54b00 (api:52)
2020-02-01 15:44:19,930+0600 ERROR (jsonrpc/0) [storage.TaskManager.Task] (Task='40683f67-d7b0-4105-aab8-6338deb54b00') Unexpected error (task:875)
Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 882, in _run
return fn(*args, **kargs)
File "<string>", line 2, in getStorageDomainInfo
File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 50, in method
ret = func(*args, **kwargs)
File "/usr/lib/python2.7/site-packages/vdsm/storage/hsm.py", line 2753, in getStorageDomainInfo
dom = self.validateSdUUID(sdUUID)
File "/usr/lib/python2.7/site-packages/vdsm/storage/hsm.py", line 305, in validateSdUUID
sdDom = sdCache.produce(sdUUID=sdUUID)
File "/usr/lib/python2.7/site-packages/vdsm/storage/sdc.py", line 110, in produce
domain.getRealDomain()
File "/usr/lib/python2.7/site-packages/vdsm/storage/sdc.py", line 51, in getRealDomain
return self._cache._realProduce(self._sdUUID)
File "/usr/lib/python2.7/site-packages/vdsm/storage/sdc.py", line 134, in _realProduce
domain = self._findDomain(sdUUID)
File "/usr/lib/python2.7/site-packages/vdsm/storage/sdc.py", line 151, in _findDomain
return findMethod(sdUUID)
File "/usr/lib/python2.7/site-packages/vdsm/storage/nfsSD.py", line 145, in findDomain
return NfsStorageDomain(NfsStorageDomain.findDomainPath(sdUUID))
File "/usr/lib/python2.7/site-packages/vdsm/storage/fileSD.py", line 378, in __init__
manifest.sdUUID, manifest.mountpoint)
File "/usr/lib/python2.7/site-packages/vdsm/storage/fileSD.py", line 853, in _detect_block_size
block_size = iop.probe_block_size(mountpoint)
File "/usr/lib/python2.7/site-packages/vdsm/storage/outOfProcess.py", line 384, in probe_block_size
return self._ioproc.probe_block_size(dir_path)
File "/usr/lib/python2.7/site-packages/ioprocess/__init__.py", line 602, in probe_block_size
"probe_block_size", {"dir": dir_path}, self.timeout)
File "/usr/lib/python2.7/site-packages/ioprocess/__init__.py", line 448, in _sendCommand
raise OSError(errcode, errstr)
OSError: [Errno 1] Operation not permitted
2020-02-01 15:44:19,930+0600 INFO (jsonrpc/0) [storage.TaskManager.Task] (Task='40683f67-d7b0-4105-aab8-6338deb54b00') aborting: Task is aborted: u'[Errno 1] Operation not permitted' - code 100 (task:1181)
2020-02-01 15:44:19,930+0600 ERROR (jsonrpc/0) [storage.Dispatcher] FINISH getStorageDomainInfo error=[Errno 1] Operation not permitted (dispatcher:87)
But I can see that this domain is mounted (via the mount command):
storage:/volume3/ovirt-hosted on /rhev/data-center/mnt/storage:_volume3_ovirt-hosted type nfs4 (rw,relatime,vers=4.1,rsize=131072,wsize=131072,namlen=255,soft,nosharecache,proto=tcp,timeo=600,retrans=6,sec=sys,clientaddr=192.168.2.251,local_lock=none,addr=192.168.2.248)
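Since the failing probe_block_size call runs through ioprocess as the vdsm user, an EPERM here even though the mount is up usually points at the NFS server squashing vdsm's uid. A minimal check sketch, assuming the oVirt default vdsm:kvm uid/gid of 36:36 and the mount path shown above:

```shell
# Hypothetical quick check: can the vdsm user actually use the mounted domain?
MNT="/rhev/data-center/mnt/storage:_volume3_ovirt-hosted"
sudo -u vdsm stat "$MNT"                    # can vdsm stat the mount point at all?
sudo -u vdsm touch "$MNT/__perm_test" \
  && sudo -u vdsm rm "$MNT/__perm_test"     # can vdsm create and delete a file?
```

If the touch fails with "Operation not permitted", the export is mapping vdsm's requests to an unprivileged anonymous uid on the server side.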
I don't see a storage directory in /var/run/vdsm, and I see many differences from the other hosts. Here is a listing of /var/run/vdsm:
bonding-defaults.json
dhclientmon
nets_restored
payload
svdsm.sock
v2v
vhostuser
bonding-name2numeric.json
mom-vdsm.sock
ovirt-imageio-daemon.sock
supervdsmd.lock
trackedInterfaces
vdsmd.lock
What is the problem? Please help.
4 years, 10 months
Re: Can't connect vdsm storage: Command StorageDomain.getInfo with args failed: (code=350, message=Error in storage domain action
by Alexandr Mikhailov
Yes, the problem is with permissions, but not with the permissions on the export directory. Two ways to resolve this:
1) Set 777 permissions on the directory - not the right solution.
2) anonuid=36,anongid=36 in the export - this is the right solution. Very strange that this is not in any documentation, because it is very important!
I had the right permissions on my export directory, but it did not work until I chmod'ed the directory to 777. After I changed the parameters in the exports file, I returned the mode to 755 and everything works fine now!
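For reference, a minimal /etc/exports line with this mapping could look like the following (the path matches the mount shown earlier; the client network is an example, not taken from this thread):

```
/volume3/ovirt-hosted  192.168.2.0/24(rw,sync,no_subtree_check,anonuid=36,anongid=36)
```

After editing, re-export with `exportfs -ra`. The anonuid/anongid options make any squashed (anonymous) request map to uid/gid 36 (vdsm:kvm), so vdsm's I/O keeps its expected ownership even when the server squashes it.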
Thank you all very much.
Device /dev/sdb excluded by a filter.\n
by Steve Watkins
Since I managed to crash my last installation attempt by uploading an ISO, I wound up just reloading all the nodes and starting from scratch. Now one node gets "Device /dev/sdb excluded by a filter." and fails when creating the volumes. I can't seem to get past that -- the other machines are set up identically and don't fail, and it worked before when installed, but not now...
Any ideas?
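"Excluded by a filter" comes from LVM's device filter rejecting the disk, and after a reinstall the usual culprit is stale LVM or partition signatures left on it from the previous deployment. A sketch of the usual checks, assuming /dev/sdb holds nothing you still need (the last step is destructive):

```shell
# Hypothetical troubleshooting sequence for an LVM-filtered disk.
lsblk -o NAME,FSTYPE,SIZE /dev/sdb     # any leftover LVM/partition signatures?
grep filter /etc/lvm/lvm.conf          # is a filter explicitly rejecting sdb?
wipefs -a /dev/sdb                     # DESTRUCTIVE: clear the stale signatures
```

On oVirt hosts, `vdsm-tool config-lvm-filter` can also regenerate the LVM filter, which is worth re-running if the filter itself is what differs from the working nodes.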
oVirt upgrade problems...
by matteo fedeli
Hi all, I have many problems after upgrading my oVirt version. I came from 4.3.5.2.
When I upgraded my three hosts (hyperconverged environment) to 4.3.7, I went through a few days of instability: HA agent down, Gluster problems...
When I rebooted for the umpteenth time (after reinitializing the lockspace, heal, heal full...), everything came up, HA agent and broker included.
Two days later 4.3.8 arrived, so I decided to start the update: "america" updated with success, and "europa" came up after various problems (as described before).
On the "asia" host, 30 seconds into the update the engine went down together with all three nodes... Are there any known important bugs?
This is the state of my nodes:
https://pastebin.com/2XsnTyHi
https://pastebin.com/ZeKTdaZ7
https://pastebin.com/ZDUBg4vG
Gluster Heal Issue
by Christian Reiss
Hey folks,
in our production setup with 3 nodes (HCI) we took one host down (maintenance, stop gluster, poweroff via ssh/oVirt engine). Once it was back up, Gluster had about 2k healing entries, which went down to 2 within 10 minutes.
Those two give me a headache:
[root@node03:~] # gluster vol heal ssd_storage info
Brick node01:/gluster_bricks/ssd_storage/ssd_storage
<gfid:a121e4fb-0984-4e41-94d7-8f0c4f87f4b6>
<gfid:6f8817dc-3d92-46bf-aa65-a5d23f97490e>
Status: Connected
Number of entries: 2
Brick node02:/gluster_bricks/ssd_storage/ssd_storage
Status: Connected
Number of entries: 0
Brick node03:/gluster_bricks/ssd_storage/ssd_storage
<gfid:a121e4fb-0984-4e41-94d7-8f0c4f87f4b6>
<gfid:6f8817dc-3d92-46bf-aa65-a5d23f97490e>
Status: Connected
Number of entries: 2
No paths, only gfids. We took down node02, so it does not have the file:
[root@node01:~] # md5sum /gluster_bricks/ssd_storage/ssd_storage/.glusterfs/a1/21/a121e4fb-0984-4e41-94d7-8f0c4f87f4b6
75c4941683b7eabc223fc9d5f022a77c  /gluster_bricks/ssd_storage/ssd_storage/.glusterfs/a1/21/a121e4fb-0984-4e41-94d7-8f0c4f87f4b6
[root@node02:~] # md5sum /gluster_bricks/ssd_storage/ssd_storage/.glusterfs/a1/21/a121e4fb-0984-4e41-94d7-8f0c4f87f4b6
md5sum: /gluster_bricks/ssd_storage/ssd_storage/.glusterfs/a1/21/a121e4fb-0984-4e41-94d7-8f0c4f87f4b6: No such file or directory
[root@node03:~] # md5sum /gluster_bricks/ssd_storage/ssd_storage/.glusterfs/a1/21/a121e4fb-0984-4e41-94d7-8f0c4f87f4b6
75c4941683b7eabc223fc9d5f022a77c  /gluster_bricks/ssd_storage/ssd_storage/.glusterfs/a1/21/a121e4fb-0984-4e41-94d7-8f0c4f87f4b6
The other two files are md5-identical.
These flags are identical, too:
[root@node01:~] # getfattr -d -m . -e hex /gluster_bricks/ssd_storage/ssd_storage/.glusterfs/a1/21/a121e4fb-0984-4e41-94d7-8f0c4f87f4b6
getfattr: Removing leading '/' from absolute path names
# file: gluster_bricks/ssd_storage/ssd_storage/.glusterfs/a1/21/a121e4fb-0984-4e41-94d7-8f0c4f87f4b6
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.ssd_storage-client-1=0x0000004f0000000100000000
trusted.gfid=0xa121e4fb09844e4194d78f0c4f87f4b6
trusted.gfid2path.d4cf876a215b173f=0x62653331383633382d653861302d346336642d393737642d3761393337616138343830362f38366461303238392d663734662d343230302d393238342d3637386537626437363139352e31323030
trusted.glusterfs.mdata=0x010000000000000000000000005e349b1e000000001139aa2a000000005e349b1e000000001139aa2a000000005e34994900000000304a5eb2
getfattr: Removing leading '/' from absolute path names
# file: gluster_bricks/ssd_storage/ssd_storage/.glusterfs/a1/21/a121e4fb-0984-4e41-94d7-8f0c4f87f4b6
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.ssd_storage-client-1=0x0000004f0000000100000000
trusted.gfid=0xa121e4fb09844e4194d78f0c4f87f4b6
trusted.gfid2path.d4cf876a215b173f=0x62653331383633382d653861302d346336642d393737642d3761393337616138343830362f38366461303238392d663734662d343230302d393238342d3637386537626437363139352e31323030
trusted.glusterfs.mdata=0x010000000000000000000000005e349b1e000000001139aa2a000000005e349b1e000000001139aa2a000000005e34994900000000304a5eb2
Now, I don't dare simply proceed without some advice.
Anyone got a clue on how to resolve this issue? File #2 is identical to this one, from a problem point of view.
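One observation on the xattrs above: both node01 and node03 carry a non-zero trusted.afr.ssd_storage-client-1 (i.e. both blame node02), and node02 simply lacks the file, so this looks like a pending heal with an unambiguous source rather than a true split-brain. A sketch of how to confirm that and nudge the heal along, assuming the volume name from the output above:

```shell
# Hypothetical follow-up: confirm these gfids are not split-brain, then force a full crawl.
gluster volume heal ssd_storage info split-brain   # should report 0 entries if no split-brain
gluster volume heal ssd_storage full               # full self-heal crawl to recreate the files on node02
```

If the split-brain listing really is empty, the entries should clear once the full heal recreates the files on node02's brick.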
Have a great weekend!
-Chris.
--
with kind regards,
mit freundlichen Gruessen,
Christian Reiss
ovirt-hosted-engine-ha cannot see live status of Hosted Engine
by asm@pioner.kz
Good day to all.
I have some issues with oVirt 4.2.6. The main one is this:
I have two CentOS 7 nodes with the same config and the latest oVirt 4.2.6, with the HostedEngine disk on NFS storage.
Some of the virtual machines are also working fine.
When the HostedEngine runs on one node (srv02.local), everything is fine.
After migrating it to the other node (srv00.local), I see that the agent cannot check the liveliness of the HostedEngine. After a few minutes the HostedEngine reboots, and after some time I see the same situation again. After migrating it back to srv02.local, all looks OK.
Output of the hosted-engine --vm-status command when the HostedEngine is on the srv00 node:
--== Host 1 status ==--
conf_on_shared_storage : True
Status up-to-date : True
Hostname : srv02.local
Host ID : 1
Engine status : {"reason": "vm not running on this host", "health": "bad", "vm": "down_unexpected", "detail": "unknown"}
Score : 0
stopped : False
Local maintenance : False
crc32 : ecc7ad2d
local_conf_timestamp : 78328
Host timestamp : 78328
Extra metadata (valid at timestamp):
metadata_parse_version=1
metadata_feature_version=1
timestamp=78328 (Tue Sep 18 12:44:18 2018)
host-id=1
score=0
vm_conf_refresh_time=78328 (Tue Sep 18 12:44:18 2018)
conf_on_shared_storage=True
maintenance=False
state=EngineUnexpectedlyDown
stopped=False
timeout=Fri Jan 2 03:49:58 1970
--== Host 2 status ==--
conf_on_shared_storage : True
Status up-to-date : True
Hostname : srv00.local
Host ID : 2
Engine status : {"reason": "failed liveliness check", "health": "bad", "vm": "up", "detail": "Up"}
Score : 3400
stopped : False
Local maintenance : False
crc32 : 1d62b106
local_conf_timestamp : 326288
Host timestamp : 326288
Extra metadata (valid at timestamp):
metadata_parse_version=1
metadata_feature_version=1
timestamp=326288 (Tue Sep 18 12:44:21 2018)
host-id=2
score=3400
vm_conf_refresh_time=326288 (Tue Sep 18 12:44:21 2018)
conf_on_shared_storage=True
maintenance=False
state=EngineStarting
stopped=False
Log agent.log from srv00.local:
MainThread::INFO::2018-09-18 12:40:51,749::states::779::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume) VM is powering up..
MainThread::INFO::2018-09-18 12:40:52,052::hosted_engine::491::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_monitoring_loop) Current state EngineStarting (score: 3400)
MainThread::INFO::2018-09-18 12:41:01,066::states::779::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume) VM is powering up..
MainThread::INFO::2018-09-18 12:41:01,374::hosted_engine::491::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_monitoring_loop) Current state EngineStarting (score: 3400)
MainThread::INFO::2018-09-18 12:41:11,393::state_machine::169::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(refresh) Global metadata: {'maintenance': False}
MainThread::INFO::2018-09-18 12:41:11,393::state_machine::174::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(refresh) Host srv02.local.pioner.kz (id 1): {'conf_on_shared_storage': True, 'extra': 'metadata_parse_version=1\nmetadata_feature_version=1\ntimestamp=78128 (Tue Sep 18 12:40:58 2018)\nhost-id=1\nscore=0\nvm_conf_refresh_time=78128 (Tue Sep 18 12:40:58 2018)\nconf_on_shared_storage=True\nmaintenance=False\nstate=EngineUnexpectedlyDown\nstopped=False\ntimeout=Fri Jan 2 03:49:58 1970\n', 'hostname': 'srv02.local.pioner.kz', 'alive': True, 'host-id': 1, 'engine-status': {'reason': 'vm not running on this host', 'health': 'bad', 'vm': 'down_unexpected', 'detail': 'unknown'}, 'score': 0, 'stopped': False, 'maintenance': False, 'crc32': 'e18e3f22', 'local_conf_timestamp': 78128, 'host-ts': 78128}
MainThread::INFO::2018-09-18 12:41:11,393::state_machine::177::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(refresh) Local (id 2): {'engine-health': {'reason': 'failed liveliness check', 'health': 'bad', 'vm': 'up', 'detail': 'Up'}, 'bridge': True, 'mem-free': 12763.0, 'maintenance': False, 'cpu-load': 0.0364, 'gateway': 1.0, 'storage-domain': True}
MainThread::INFO::2018-09-18 12:41:11,393::states::779::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume) VM is powering up..
MainThread::INFO::2018-09-18 12:41:11,703::hosted_engine::491::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_monitoring_loop) Current state EngineStarting (score: 3400)
MainThread::INFO::2018-09-18 12:41:21,716::states::779::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume) VM is powering up..
MainThread::INFO::2018-09-18 12:41:22,020::hosted_engine::491::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_monitoring_loop) Current state EngineStarting (score: 3400)
MainThread::INFO::2018-09-18 12:41:31,033::states::779::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume) VM is powering up..
MainThread::INFO::2018-09-18 12:41:31,344::hosted_engine::491::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_monitoring_loop) Current state EngineStarting (score: 3400)
As we can see, the agent thinks that the HostedEngine is just powering up. I cannot do anything about it. I have already reinstalled the srv00 node many times without success.
One time I even had to uninstall the ovirt* and vdsm* packages. Another interesting point: after installing just "yum install http://resources.ovirt.org/pub/yum-repo/ovirt-release42.rpm" on this node, I tried to add the node from the engine web interface with the "Deploy" action. The installation was unsuccessful until I first installed ovirt-hosted-engine-ha on the node. I don't see anywhere in the documentation that this is needed before installing a new host, but I mention it for information and checking. After installing ovirt-hosted-engine-ha, the node was installed with HostedEngine support. But the main issue did not change.
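The "failed liveliness check" result means the agent could not fetch the engine's HTTP health page even though the VM is up, so it can be reproduced by hand from the problem host. A sketch, with the engine FQDN as a placeholder for your real one:

```shell
# Hypothetical manual liveliness check; engine.example.com stands in for your engine FQDN.
curl -s http://engine.example.com/ovirt-engine/services/health
# A healthy engine replies with a short status string (roughly "DB Up!Welcome to Health Status!").
```

If this curl hangs or fails from srv00 but works from srv02, the usual suspects are DNS resolution of the engine FQDN on that host or firewalling between the host and the engine VM, rather than the engine itself.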
Thanks in advance for help.
BR,
Alexandr