Gluster setup fails - Nearly there I think...
by rob.downer@orbitalsystems.co.uk
Gluster fails with
vdo: ERROR - Device /dev/sdb excluded by a filter.
However, I have run
[root@ovirt1 ~]# vdo create --name=vdo1 --device=/dev/sdb --force
Creating VDO vdo1
Starting VDO vdo1
Starting compression on VDO vdo1
VDO instance 1 volume is ready at /dev/mapper/vdo1
[root@ovirt1 ~]#
There are no filters in lvm.conf.
I have run
wipefs -a /dev/sdb --force
on all hosts before starting.
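For what it's worth, a minimal check sequence before re-running the deployment (a sketch only, assuming /dev/sdb is the intended Gluster/VDO disk):

# Show what currently claims the disk; a leftover VDO/LVM signature or a
# multipath map on top of sdb commonly triggers "excluded by a filter"
# even when lvm.conf itself has no filter line.
lsblk -o NAME,TYPE,FSTYPE,MOUNTPOINT /dev/sdb
blkid /dev/sdb
multipath -ll | grep -B2 -A4 sdb
pvs --all | grep sdb

# If a VDO volume was already created by hand (as above), remove it so
# the deployment can create its own VDO layer on the raw disk:
vdo remove --name=vdo1
wipefs -a /dev/sdb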
Re: Cannot obtain information from export domain
by Strahil
Hi, can you describe your actions?
Usually the export is like this:
1. You make a backup of the VM
2. You migrate the disks to the export storage domain
3. You shut down the VM
4. Set the storage domain in maintenance and then detach it from the oVirt
5. You attach it to the new oVirt
6. Once the domain is active - click on import VM tab and import all VMs (defining the cluster you want them to be running on)
7. Power up VM and then migrate the disks to the permanent storage.
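If the engine reports StorageDomainDoesNotExist when importing (as in the error below), a quick sanity check is to mount the export share on a host and compare the domain UUID in its metadata with the UUID from the error. A minimal sketch; the NFS server and path are placeholders:

# Mount the export share temporarily (placeholder server/path):
mount -t nfs nfs-server:/export/path /mnt/export
# The top-level directory name is the storage domain UUID and should
# match the UUID the engine complains about:
ls /mnt/export
# The domain metadata also records the UUID and the pool it was last
# attached to:
cat /mnt/export/*/dom_md/metadata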
Best Regards,
Strahil Nikolov

On Nov 26, 2019 19:41, Arthur Rodrigues Stilben <arthur.stilben(a)gmail.com> wrote:
>
> Hello everyone,
>
> I'm trying to export a virtual machine, but I'm getting the following error:
>
> 2019-11-26 16:30:06,250-02 ERROR
> [org.ovirt.engine.core.bll.exportimport.GetVmsFromExportDomainQuery]
> (default task-22) [b9a0b9d5-2127-4002-9cee-2e3525bccc89] Exception:
> org.ovirt.engine.core.common.errors.EngineException: EngineException:
> org.ovirt.engine.core.vdsbroker.irsbroker.IRSErrorException:
> IRSGenericException: IRSErrorException: Failed to GetVmsInfoVDS, error =
> Storage domain does not exist:
> (u'5ac6c35d-0406-4a06-a682-ed8fb2d1933f',), code = 358 (Failed with
> error StorageDomainDoesNotExist and code 358)
>
> 2019-11-26 16:30:06,249-02 ERROR
> [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
> (default task-22) [b9a0b9d5-2127-4002-9cee-2e3525bccc89] EVENT_ID:
> IMPORTEXPORT_GET_VMS_INFO_FAILED(200), Correlation ID: null, Call Stack:
> null, Custom ID: null, Custom Event ID: -1, Message: Failed to retrieve
> VM/Templates information from export domain BackupMV
>
> The version of the oVirt that I am using is 4.1.
>
> Best regards,
>
> --
> Arthur Rodrigues Stilben
> _______________________________________________
> Users mailing list -- users(a)ovirt.org
> To unsubscribe send an email to users-leave(a)ovirt.org
> Privacy Statement: https://www.ovirt.org/site/privacy-policy/
> oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
> List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/Z6FQ45UVQOD...
Cannot activate/deactivate storage domain
by Albl, Oliver
Hi all,
I run an oVirt 4.3.6.7-1.el7 installation (50+ hosts, 40+ FC storage domains on two all-flash arrays) and experienced a problem accessing individual storage domains.
As a result, hosts were set to "not operational" because they could not see all storage domains, and the SPM role started to move around between the hosts.
oVirt messages start with:
2019-11-04 15:10:22.739+01 | VDSM HOST082 command SpmStatusVDS failed: (-202, 'Sanlock resource read failure', 'IO timeout')
2019-11-04 15:10:22.781+01 | Invalid status on Data Center <name>. Setting Data Center status to Non Responsive (On host HOST82, Error: General Exception).
...
2019-11-04 15:13:58.836+01 | Host HOST017 cannot access the Storage Domain(s) HOST_LUN_204 attached to the Data Center <name>. Setting Host state to Non-Operational.
2019-11-04 15:13:58.85+01 | Host HOST005 cannot access the Storage Domain(s) HOST_LUN_204 attached to the Data Center <name>. Setting Host state to Non-Operational.
2019-11-04 15:13:58.85+01 | Host HOST012 cannot access the Storage Domain(s) HOST_LUN_204 attached to the Data Center <name>. Setting Host state to Non-Operational.
2019-11-04 15:13:58.851+01 | Host HOST002 cannot access the Storage Domain(s) HOST_LUN_204 attached to the Data Center <name>. Setting Host state to Non-Operational.
2019-11-04 15:13:58.851+01 | Host HOST010 cannot access the Storage Domain(s) HOST_LUN_204 attached to the Data Center <name>. Setting Host state to Non-Operational.
2019-11-04 15:13:58.851+01 | Host HOST011 cannot access the Storage Domain(s) HOST_LUN_204 attached to the Data Center <name>. Setting Host state to Non-Operational.
2019-11-04 15:13:58.852+01 | Host HOST004 cannot access the Storage Domain(s) HOST_LUN_204 attached to the Data Center <name>. Setting Host state to Non-Operational.
2019-11-04 15:13:59.011+01 | Host HOST017 cannot access the Storage Domain(s) <UNKNOWN> attached to the Data Center <UNKNOWN>. Setting Host state to Non-Operational.
2019-11-04 15:13:59.238+01 | Host HOST004 cannot access the Storage Domain(s) <UNKNOWN> attached to the Data Center <UNKNOWN>. Setting Host state to Non-Operational.
2019-11-04 15:13:59.249+01 | Host HOST005 cannot access the Storage Domain(s) <UNKNOWN> attached to the Data Center <UNKNOWN>. Setting Host state to Non-Operational.
2019-11-04 15:13:59.255+01 | Host HOST012 cannot access the Storage Domain(s) <UNKNOWN> attached to the Data Center <UNKNOWN>. Setting Host state to Non-Operational.
2019-11-04 15:13:59.273+01 | Host HOST002 cannot access the Storage Domain(s) <UNKNOWN> attached to the Data Center <UNKNOWN>. Setting Host state to Non-Operational.
2019-11-04 15:13:59.279+01 | Host HOST010 cannot access the Storage Domain(s) <UNKNOWN> attached to the Data Center <UNKNOWN>. Setting Host state to Non-Operational.
2019-11-04 15:13:59.386+01 | Host HOST011 cannot access the Storage Domain(s) <UNKNOWN> attached to the Data Center <UNKNOWN>. Setting Host state to Non-Operational.
2019-11-04 15:15:14.145+01 | Storage domain HOST_LUN_221 experienced a high latency of 9.60953 seconds from host HOST038. This may cause performance and functional issues. Please consult your Storage Administrator.
The problem mainly affected two storage domains (on the same array), but I also saw occasional messages for other storage domains (on the other array as well).
Storage domains stayed available to the hosts, all VMs continued to run.
When constantly reading from the storage domains (/bin/dd iflag=direct if=<metadata> bs=4096 count=1 of=/dev/null) we got the expected 20+ MB/s on all but a few storage domains. One of them showed "transfer rates" around 200 bytes/s, but went up to normal performance from time to time. The transfer rate to this domain also differed between hosts.
/var/log/messages contains qla2xxx abort messages on almost all hosts. There are no errors on the SAN switches or the storage array (but the vendor is still investigating). I did not see high load on the storage array.
The system seemed to stabilize when I stopped all VMs on the affected storage domain and this storage domain became "inactive". Currently, this storage domain is still inactive and we can neither place it in maintenance mode ("Failed to deactivate Storage Domain") nor activate it. The OVF metadata seems to be corrupt as well (failed to update OVF disks <id>, OVF data isn't updated on those OVF stores). The first six 512-byte blocks of /dev/<id>/metadata seem to contain only zeros.
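To double-check that, one can dump the start of the metadata LV directly; a minimal sketch, with <id> standing for the storage domain's VG as in the path above:

# Read the first 4 KiB of the metadata LV; an intact domain normally
# shows readable key=value fields (SDUUID=..., ROLE=..., etc.), while
# all zeros matches the corruption described above.
dd if=/dev/<id>/metadata bs=4096 count=1 iflag=direct 2>/dev/null | hexdump -C | head -n 40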
Any advice on how to proceed here?
Is there a way to recover this storage domain?
All the best,
Oliver
Re: Certificate of host is invalid
by Strahil
Hi,
You can try the following:
1. Set the host in maintenance
2. From the Install dropdown, select 'Reinstall' and then configure the necessary info, including whether you would like to use the host as a host for the HostedEngine VM.
Once the reinstall (of the oVirt software) is OK, the node will be activated automatically.
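To verify that the new certificate actually carries the subject alternative name, you can inspect it on the host; a minimal sketch, assuming the default VDSM certificate path:

# The SAN extension should list the host's FQDN after a successful
# reinstall/enrollment:
openssl x509 -in /etc/pki/vdsm/certs/vdsmcert.pem -noout -text | grep -A1 'Subject Alternative Name'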
Best Regards,
Strahil Nikolov

On Nov 27, 2019 18:01, Jon bae <jonbae77(a)gmail.com> wrote:
>
> Hello everybody,
> since the last update to 4.3.7 I get this error message:
>
> Certificate of host host.name is invalid. The certificate doesn't contain valid subject alternative name, please enroll new certificate for the host.
>
> Do you have an idea of how I can fix that?
>
> Regards
> Jonathan
Disk move succeed but didn't move content
by Juan Pablo Lorier
Hi,
I have a fresh install of oVirt 4.3 and tried to import a Gluster
vmstore. I managed to import the former data domain via NFS. The problem
is that when I moved the disks of the VMs to the new iSCSI data domain,
I got a warning that sparse disks would be converted to qcow2 disks,
and after accepting, the disks were moved with no error.
The disks now show as <1 GB instead of their original size, and thus
the VMs fail to start.
Is there any way to recover those disks? I have no backup of the VMs :-(
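For reference, one way to check whether the data is still on the moved volumes is to inspect them with qemu-img on a host; a minimal sketch, with the UUIDs as placeholders (on a block/iSCSI domain the volume LV may first need to be activated):

# Activate and inspect the moved volume; a qcow2 header with the
# original virtual size suggests the data is intact and only the size
# reported in the UI is wrong.
lvchange -ay <storage-domain-uuid>/<volume-uuid>
qemu-img info /dev/<storage-domain-uuid>/<volume-uuid>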
Regards
Current status of Ceph support in oVirt (2019)?
by victorhooi@yahoo.com
Hi,
I currently have a 3-node HA cluster running Proxmox (with integrated Ceph). oVirt looks pretty neat, however, and I'm excited to check it out.
One of the things I love about Proxmox is the integrated Ceph support.
I saw on the mailing lists that there was some talk of Ceph support earlier, but it was via OpenStack/Cinder. What exactly does this mean?
1. Does this require you to install OpenStack, or will a vanilla Ceph installation work?
2. Is it possible to deploy Ceph on the same nodes that run oVirt? (i.e. is a 3-node oVirt + Ceph cluster possible?)
3. Is there any monitoring/management of Ceph from within oVirt? (Guessing no?)
4. Are all the normal VM features working yet, or is this planned?
5. Is making Ceph a first-class citizen (like Gluster) on oVirt on the roadmap?
Thanks,
Victor
https://www.reddit.com/r/ovirt/comments/ci38zp/ceph_rbd_support_in_ovirt_...
oVirt nodes not responding
by Tim Herrmann
Hi everyone,
we have an oVirt cluster with 5 nodes, 3 of which provide the storage
with GlusterFS and replica 3.
The cluster is running 87 VMs and has 9 TB of storage, of which 4 TB is in use.
The oVirt Engine version is 4.1.8.2 and GlusterFS is 3.8.15.
The servers are running in an HP blade center and are connected to each
other with 10 Gbit.
Currently we have a problem where all oVirt nodes periodically stop
responding in the cluster, with the following error messages in the oVirt
web interface:
VDSM glustervirt05 command GetGlusterVolumeHealInfoVDS failed: Message
timeout which can be caused by communication issues
Host glustervirt05 is not responding. It will stay in Connecting state for
a grace period of 68 seconds and after that an attempt to fence the host
will be issued.
Host glustervirt05 does not enforce SELinux. Current status: PERMISSIVE
Executing power management status on Host glustervirt05 using Proxy Host
glustervirt02 and Fence Agent ilo4:xxx.xxx.xxx.xxx.
Manually synced the storage devices from host glustervirt05
Status of host glustervirt05 was set to Up.
In the vdsm logfile I can find the following message:
2019-11-26 11:18:22,909+0100 WARN (vdsm.Scheduler) [Executor] Worker
blocked: <Worker name=jsonrpc/7 running <Task <JsonRpcTask {'params':
{u'volumeName': u'data'}, 'jsonrpc': '2.0', 'method':
u'GlusterVolume.healInfo', 'id': u'2e86ed2c-
3e79-42c1-a7e4-c09bfbfc7794'} at 0x7fb938373190> timeout=60, duration=180
at 0x316a6d0> task#=2859802 at 0x1b70dd0> (executor:351)
And I figured out that the gluster heal info command takes very long:
[root@glustervirt01 ~]# time gluster volume heal data info
Brick glustervirt01:/gluster/data/brick1
Status: Connected
Number of entries: 0
Brick glustervirt02:/gluster/data/brick1
Status: Connected
Number of entries: 0
Brick glustervirt03:/gluster/data/brick2
Status: Connected
Number of entries: 0
real 3m3.626s
user 0m0.593s
sys 0m0.559s
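Before blaming VDSM itself, it may be worth checking whether all bricks and self-heal daemons are connected while heal info is slow; a minimal sketch for the volume named data, as above:

# A down self-heal daemon or a half-connected client is a common reason
# for "heal info" taking minutes instead of seconds:
gluster volume status data
gluster volume status data clients
gluster volume heal data statistics heal-count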
Another strange behavior is that one virtual machine (a PostgreSQL
database) stops running unexpectedly every one or two days ...
The only thing that has been changed on the VM recently was a
resize of the disk.
VM replication-zabbix is down with error. Exit message: Lost connection
with qemu process.
And when we add or delete a larger disk of approximately 100 GB in
GlusterFS, the Gluster cluster freaks out and won't respond anymore.
This also results in paused VMs ...
Does anyone have an idea what could cause such problems?
oVirt Networking: VM not pinging
by Vijay Sachdeva
Dear Community,
I have installed oVirt Engine 4.3 and oVirt Node 4.3. The node was successfully added to the engine and the host network setup is also done. When a VM uses "ovirtmgmt" as its vNIC profile, it is not even able to ping its host or any other machine on that same network. I also added a VLAN network that is passed via the same uplink of the node interface where "ovirtmgmt" is passed; that is not working either.
The vNIC type of this vnet is VirtIO and its state shows "UNKNOWN"; would this be a problem?
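(For reference, an UNKNOWN operstate on a vnet/tap interface is usually normal, so that alone is probably not the issue.) A minimal first check on the host, assuming the VM's tap device is something like vnet0:

# Confirm the tap device is enslaved to the ovirtmgmt bridge and watch
# whether the VM's ARP requests reach the bridge at all:
ip -d link show ovirtmgmt
bridge link show | grep -E 'ovirtmgmt|vnet'
tcpdump -i ovirtmgmt -nn arp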
Any help would be highly appreciated.
Thanks
Vijay Sachdeva
Senior Manager – Service Delivery
IndiQus Technologies
O +91 11 4055 1411 | M +91 8826699409
www.indiqus.com
hyperconverged single node with SSD cache fails gluster creation
by thomas@hoberg.net
I am seeing more successes than failures at creating single- and triple-node hyperconverged setups after some weeks of experimentation, so I am branching out to additional features: in this case, the ability to use SSDs as cache media for hard disks.
I first tried a single node that combined caching and compression, and that failed during the creation of the LVM volumes.
I tried again without the VDO compression, but the results were identical, whereas VDO compression without the LV cache worked OK.
I tried various combinations, using less space etc., but the results are always the same and unfortunately rather cryptic (I substituted the physical disk label with {disklabel}):
TASK [gluster.infra/roles/backend_setup : Extend volume group] *****************
failed: [{hostname}] (item={u'vgname': u'gluster_vg_{disklabel}p1', u'cachethinpoolname': u'gluster_thinpool_gluster_vg_{disklabel}p1', u'cachelvname': u'cachelv_gluster_thinpool_gluster_vg_{disklabel}p1', u'cachedisk': u'/dev/sda4', u'cachemetalvname': u'cache_gluster_thinpool_gluster_vg_{disklabel}p1', u'cachemode': u'writeback', u'cachemetalvsize': u'70G', u'cachelvsize': u'630G'}) => {"ansible_loop_var": "item", "changed": false, "err": " Physical volume \"/dev/mapper/vdo_{disklabel}p1\" still in use\n", "item": {"cachedisk": "/dev/sda4", "cachelvname": "cachelv_gluster_thinpool_gluster_vg_{disklabel}p1", "cachelvsize": "630G", "cachemetalvname": "cache_gluster_thinpool_gluster_vg_{disklabel}p1", "cachemetalvsize": "70G", "cachemode": "writeback", "cachethinpoolname": "gluster_thinpool_gluster_vg_{disklabel}p1", "vgname": "gluster_vg_{disklabel}p1"}, "msg": "Unable to reduce gluster_vg_{disklabel}p1 by /dev/dm-15.", "rc": 5}
Somewhere within that I see something that points to a race condition ("still in use").
Unfortunately I have not been able to pinpoint the raw logs used at that stage, so I wasn't able to obtain more info.
At this point quite a bit of storage setup is already done, so rolling back for a clean new attempt can be a bit complicated, with reboots to reconcile the kernel with the data on disk.
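For what it's worth, the manual rollback between attempts looks roughly like this; a sketch only, reusing the names from the error above (it destroys whatever the failed run created on those devices):

# Tear down the leftovers top-down: LVs, VG, the VDO volume that backs
# the VG's PV, and finally any signatures on the cache disk.
lvremove -f gluster_vg_{disklabel}p1
vgremove -f gluster_vg_{disklabel}p1
vdo remove --name=vdo_{disklabel}p1
wipefs -a /dev/sda4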
I don't actually believe it's related to the single-node setup, and I'd be quite happy to move the creation of the SSD cache to a later stage, but in a VDO setup this looks slightly complex to someone without intimate knowledge of LVM-with-cache-and-perhaps-thin/VDO/Gluster all thrown into one.
Needless to say, the feature set (SSD caching & compression/dedup) sounds terribly attractive, but when things don't just work, it's more terrifying.
Moving HostedEngine
by Joseph Goldman
Hi List,
In one of my installs, I set up the first storage domain (where
the HostedEngine lives) on a bigger NFS NAS. Since then I have created a
Gluster volume that spans the 3 hosts and I'm putting a few VMs in
there for higher reliability (as the SAN is a single point of failure).
In particular, I'd like to put the HostedEngine in there so it stays up no
matter what and can help report if issues occur (network issue to the NAS,
the NAS dies, etc.).
Looking through other posts and documentation, there's no real way to
move the HostedEngine storage, is this correct? The solution I've seen
is to back up the hosted engine DB, blow the deployment away, and re-deploy
it from the backup file, pointing it to the new storage domain in the
deploy script. Is this the only process? How likely is it to fail? Is it
likely that all VMs and settings will be picked straight back up and
continue to operate like normal? I don't have a test setup to play around
with at the moment, so I'm just trying to gauge confidence in such a solution.
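For reference, the backup-and-redeploy path boils down to two commands; a minimal sketch with placeholder file names:

# On the engine VM: take a full backup of the engine database and config.
engine-backup --mode=backup --scope=all --file=engine-backup.tar.gz --log=engine-backup.log
# Copy the backup off the engine VM, then on one of the hosts redeploy
# the hosted engine and point it at the new (Gluster) storage domain
# when the deploy script asks; hosts and VMs come back from the
# restored database.
hosted-engine --deploy --restore-from-file=engine-backup.tar.gz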
Thanks,
Joe