September 2021 - Users - oVirt List Archives

Cinderlib RBD ceph template issues
by Sketch 01 Feb '22

This is on oVirt 4.4.8, engine on CS8, hosts on C8; cluster and DC are both set to 4.6, with a newly configured cinderlib/Ceph RBD setup. I can create new VM images and copy existing VM images, but I can't copy existing template images to RBD. When I try, I get the error below in cinderlib.log, which sounds like the disk already exists there, but it definitely does not. This leaves me unable to create new VMs on RBD, only migrate existing VM disks.

2021-09-01 04:31:05,881 - cinder.volume.driver - INFO - Driver hasn't implemented _init_vendor_properties()
2021-09-01 04:31:05,882 - cinderlib-client - INFO - Creating volume '0e8b9aca-1eb1-4837-ac9e-cb3d8f4c1676', with size '500' GB [5c5d0a6b]
2021-09-01 04:31:05,943 - cinderlib-client - ERROR - Failure occurred when trying to run command 'create_volume': Entity '<class 'cinder.db.sqlalchemy.models.Volume'>' has no property 'glance_metadata' [5c5d0a6b]
2021-09-01 04:31:05,944 - cinder - CRITICAL - Unhandled error
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/cinderlib/objects.py", line 455, in create
    self._raise_with_resource()
  File "/usr/lib/python3.6/site-packages/cinderlib/objects.py", line 222, in _raise_with_resource
    six.reraise(*exc_info)
  File "/usr/lib/python3.6/site-packages/six.py", line 703, in reraise
    raise value
  File "/usr/lib/python3.6/site-packages/cinderlib/objects.py", line 448, in create
    model_update = self.backend.driver.create_volume(self._ovo)
  File "/usr/lib/python3.6/site-packages/cinder/volume/drivers/rbd.py", line 986, in create_volume
    features=client.features)
  File "/usr/lib/python3.6/site-packages/eventlet/tpool.py", line 190, in doit
    result = proxy_call(self._autowrap, f, *args, **kwargs)
  File "/usr/lib/python3.6/site-packages/eventlet/tpool.py", line 148, in proxy_call
    rv = execute(f, *args, **kwargs)
  File "/usr/lib/python3.6/site-packages/eventlet/tpool.py", line 129, in execute
    six.reraise(c, e, tb)
  File "/usr/lib/python3.6/site-packages/six.py", line 703, in reraise
    raise value
  File "/usr/lib/python3.6/site-packages/eventlet/tpool.py", line 83, in tworker
    rv = meth(*args, **kwargs)
  File "rbd.pyx", line 629, in rbd.RBD.create
rbd.ImageExists: [errno 17] RBD image already exists (error creating image)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib64/python3.6/site-packages/sqlalchemy/orm/base.py", line 399, in _entity_descriptor
    return getattr(entity, key)
AttributeError: type object 'Volume' has no attribute 'glance_metadata'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "./cinderlib-client.py", line 170, in main
    args.command(args)
  File "./cinderlib-client.py", line 208, in create_volume
    backend.create_volume(int(args.size), id=args.volume_id)
  File "/usr/lib/python3.6/site-packages/cinderlib/cinderlib.py", line 175, in create_volume
    vol.create()
  File "/usr/lib/python3.6/site-packages/cinderlib/objects.py", line 457, in create
    self.save()
  File "/usr/lib/python3.6/site-packages/cinderlib/objects.py", line 628, in save
    self.persistence.set_volume(self)
  File "/usr/lib/python3.6/site-packages/cinderlib/persistence/dbms.py", line 254, in set_volume
    self.db.volume_update(objects.CONTEXT, volume.id, changed)
  File "/usr/lib/python3.6/site-packages/cinder/db/sqlalchemy/api.py", line 236, in wrapper
    return f(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/cinder/db/sqlalchemy/api.py", line 184, in wrapper
    return f(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/cinder/db/sqlalchemy/api.py", line 2570, in volume_update
    result = query.filter_by(id=volume_id).update(values)
  File "/usr/lib64/python3.6/site-packages/sqlalchemy/orm/query.py", line 3818, in update
    update_op.exec_()
  File "/usr/lib64/python3.6/site-packages/sqlalchemy/orm/persistence.py", line 1670, in exec_
    self._do_pre_synchronize()
  File "/usr/lib64/python3.6/site-packages/sqlalchemy/orm/persistence.py", line 1743, in _do_pre_synchronize
    self._additional_evaluators(evaluator_compiler)
  File "/usr/lib64/python3.6/site-packages/sqlalchemy/orm/persistence.py", line 1912, in _additional_evaluators
    values = self._resolved_values_keys_as_propnames
  File "/usr/lib64/python3.6/site-packages/sqlalchemy/orm/persistence.py", line 1831, in _resolved_values_keys_as_propnames
    for k, v in self._resolved_values:
  File "/usr/lib64/python3.6/site-packages/sqlalchemy/orm/persistence.py", line 1818, in _resolved_values
    desc = _entity_descriptor(self.mapper, k)
  File "/usr/lib64/python3.6/site-packages/sqlalchemy/orm/base.py", line 402, in _entity_descriptor
    "Entity '%s' has no property '%s'" % (description, key)
sqlalchemy.exc.InvalidRequestError: Entity '<class 'cinder.db.sqlalchemy.models.Volume'>' has no property 'glance_metadata'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "./cinderlib-client.py", line 390, in <module>
    sys.exit(main(sys.argv[1:]))
  File "./cinderlib-client.py", line 176, in main
    sys.stderr.write(traceback.format_exc(e))
  File "/usr/lib64/python3.6/traceback.py", line 167, in format_exc
    return "".join(format_exception(*sys.exc_info(), limit=limit, chain=chain))
  File "/usr/lib64/python3.6/traceback.py", line 121, in format_exception
    type(value), value, tb, limit=limit).format(chain=chain))
  File "/usr/lib64/python3.6/traceback.py", line 498, in __init__
    _seen=_seen)
  File "/usr/lib64/python3.6/traceback.py", line 498, in __init__
    _seen=_seen)
  File "/usr/lib64/python3.6/traceback.py", line 509, in __init__
    capture_locals=capture_locals)
  File "/usr/lib64/python3.6/traceback.py", line 338, in extract
    if limit >= 0:
TypeError: '>=' not supported between instances of 'InvalidRequestError' and 'int'
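
A note on the final TypeError in the log: it appears to be a secondary failure in the error reporting itself. The traceback shows cinderlib-client.py calling traceback.format_exc(e), and the first positional parameter of format_exc is limit, so the exception object ends up compared with an integer. The sketch below shows the usual way to format the active exception. The names failing_command and run_command are hypothetical and are not taken from cinderlib-client.py, and this does nothing about the underlying ImageExists or glance_metadata errors.

```python
import traceback


def failing_command():
    """Hypothetical stand-in for a command that raises."""
    raise ValueError("simulated failure")


def run_command(command):
    """Run a callable; return the formatted traceback on failure, else None."""
    try:
        command()
    except Exception:
        # format_exc() has the signature (limit=None, chain=True) and reads
        # the exception currently being handled on its own, so the exception
        # object is not passed as an argument.
        return traceback.format_exc()
    return None


# Small demonstration: the report contains the exception type and message.
report = run_command(failing_command)
print(report)
```

With this pattern the original InvalidRequestError would be printed instead of being masked by a TypeError raised inside the traceback module.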

Import an exported VM using Ansible
by paolo＠airaldi.it 25 Jan '22

Hello everybody!

I'm trying to automate a copy of a VM from one Datacenter to another using an Ansible playbook. I'm able to:
- create a snapshot of the source VM
- create a clone from the snapshot
- remove the snapshot
- attach an Export Domain
- export the clone to the Export Domain
- remove the clone
- detach the Export Domain from the source Datacenter and attach it to the destination.

Unfortunately I cannot find a module to:
- import the VM from the Export Domain
- delete the VM image from the Export Domain.

Any hint on how to do that? Thanks in advance.

Cheers.
Paolo

PS: if someone is interested I can share the playbook.

did 4.3.9 reset bug https://bugzilla.redhat.com/show_bug.cgi?id=1590266
by kelley bryan 10 Jan '22

I am experiencing the following error message in the ovirt-hosted-engine-setup-ansible-create_target_vm log:

{2020-05-06 14:15:30,024-0500 ERROR ansible failed {'status': 'FAILED', 'ansible_type': 'task', 'ansible_task': u"Fail if Engine IP is different from engine's he_fqdn resolved IP", 'ansible_result': u'type: <type \'dict\'>\nstr: {\'msg\': u"Engine VM IP address is while the engine\'s he_fqdn ovirt1-engine.kelleykars.org resolves to 192.168.122.2. If you are using DHCP, check your DHCP reservation configuration", \'changed\': False, \'_ansible_no_log\': False}', 'task_duration': 1, 'ansible_host': u'localhost', 'ansible_playbook': u'/usr/share/ovirt-hosted-engine-setup/ansible/trigger_role.yml'}}

Bug 1590266 says it should report "the engine VM IP address xxx.xxx.xxx.xxx while the engine's he_fqdn is xxxxxxxxx". I need to see what it thinks is wrong, as both dig on the engine FQDN and dig -x on the IP return the correct information.

Now this bug looks like it may be in play, but I don't see the failed readiness check in this log (https://access.redhat.com/solutions/4462431). Or is it because the VM fails or dies, or something else?
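
The failing task compares the engine VM's IP address with the address that he_fqdn resolves to. In the message quoted in the post there is no address between "is" and "while", which suggests the engine VM reported no IP at the time of the check, so the comparison could not succeed however correct DNS is. The following is a minimal sketch of that kind of comparison, not the code of the Ansible role; resolve_ipv4 and check_engine_ip are hypothetical names, and the resolver is injectable so the logic can be tried without DNS.

```python
import socket


def resolve_ipv4(fqdn):
    """Return the sorted IPv4 addresses that fqdn resolves to."""
    infos = socket.getaddrinfo(fqdn, None, family=socket.AF_INET)
    return sorted({info[4][0] for info in infos})


def check_engine_ip(engine_vm_ip, fqdn, resolver=resolve_ipv4):
    """Return (ok, message) for the 'VM IP matches resolved FQDN' check."""
    resolved = resolver(fqdn)
    if not engine_vm_ip:
        # An empty IP can never match, whatever DNS returns.
        return False, "engine VM reported no IP; %s resolves to %s" % (
            fqdn, ", ".join(resolved))
    if engine_vm_ip not in resolved:
        return False, "engine VM IP %s differs from %s resolved to %s" % (
            engine_vm_ip, fqdn, ", ".join(resolved))
    return True, "engine VM IP %s matches %s" % (engine_vm_ip, fqdn)


# Demonstration with a fake resolver that mirrors the values in the log.
def fake_resolver(name):
    return ["192.168.122.2"]


print(check_engine_ip("", "ovirt1-engine.kelleykars.org", fake_resolver))
```

If the empty address is indeed the cause, the thing to investigate would be why the engine VM did not obtain or report an address, rather than the DNS records.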

Lots of storage.MailBox.SpmMailMonitor
by Fabrice Bacchella 06 Jan '22

My vdsm log files are huge:

-rw-r--r-- 1 vdsm kvm 1.8G Nov 22 11:32 vdsm.log

And this is just half an hour of logs:

$ head -1 vdsm.log
2018-11-22 11:01:12,132+0100 ERROR (mailbox-spm) [storage.MailBox.SpmMailMonitor] mailbox 2 checksum failed, not clearing mailbox, clearing new mail (data='...lots of data', expected='\xa4\x06\x08\x00') (mailbox:612)

I just upgraded vdsm:

$ rpm -qi vdsm
Name : vdsm
Version : 4.20.43
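
To confirm which logger is responsible for the volume, the log lines can be tallied by level and logger name. This is a rough sketch that assumes every entry starts like the sample line in the post (timestamp, level, thread in parentheses, logger in square brackets); LINE_RE and tally are hypothetical names, and lines that do not match that shape are simply skipped.

```python
import collections
import re

# Matches: "<date> <time+zone> LEVEL (thread) [logger] ..."
LINE_RE = re.compile(
    r"^\S+ \S+ (?P<level>[A-Z]+)\s+\((?P<thread>[^)]*)\) \[(?P<logger>[^\]]+)\]"
)


def tally(lines):
    """Count log entries per (level, logger) pair."""
    counts = collections.Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match:
            counts[(match.group("level"), match.group("logger"))] += 1
    return counts


# Demonstration on the line quoted in the post.
SAMPLE = [
    "2018-11-22 11:01:12,132+0100 ERROR (mailbox-spm) "
    "[storage.MailBox.SpmMailMonitor] mailbox 2 checksum failed, "
    "not clearing mailbox, clearing new mail (mailbox:612)",
]
for (level, logger), count in tally(SAMPLE).most_common():
    print(level, logger, count)
```

For a real file, passing an open file object works as well, for example tally(open("vdsm.log", errors="replace")), since the function only iterates over lines.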

How to renew vmconsole-proxy* certificates
by capelle＠labri.fr 19 Nov '21

Hi,

For a few weeks now, we have not been able to connect to the vmconsole proxy:

$ ssh -t -p 2222 ovirt-vmconsole@ovirt
ovirt-vmconsole@ovirt: Permission denied (publickey).

Last successful login record: Mar 29 11:31:32
First login failure record: Mar 31 17:28:51

We tracked the issue to the following log in /var/log/ovirt-engine/engine.log:

ERROR [org.ovirt.engine.core.services.VMConsoleProxyServlet] (default task-11) [] Error validating ticket: : sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target

Indeed, certificate /etc/pki/ovirt-engine/certs/vmconsole-proxy-helper.cer and others did expire:

--
# grep 'Not After' /etc/pki/ovirt-engine/certs/vmconsole-proxy-*
/etc/pki/ovirt-engine/certs/vmconsole-proxy-helper.cer: Not After : Mar 31 13:18:44 2021 GMT
/etc/pki/ovirt-engine/certs/vmconsole-proxy-host.cer: Not After : Mar 31 13:18:44 2021 GMT
/etc/pki/ovirt-engine/certs/vmconsole-proxy-user.cer: Not After : Mar 31 13:18:44 2021 GMT
--

But we did not manage to find how to renew them. Any advice?

--
Benoît
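
The grep output in the post can also be checked programmatically, which helps spot other certificates that are about to expire. The sketch below only parses "Not After" lines of the shape quoted in the post and compares them with a reference time; it does not renew anything. parse_not_after and expired are hypothetical helper names, and the date format is an assumption based on the three lines shown.

```python
from datetime import datetime, timezone


def parse_not_after(line):
    """Return the expiry time from a '<path>: Not After : <date> GMT' line."""
    _, marker, stamp = line.partition("Not After :")
    if not marker:
        return None  # the line carries no expiry information
    parsed = datetime.strptime(stamp.strip(), "%b %d %H:%M:%S %Y GMT")
    return parsed.replace(tzinfo=timezone.utc)


def expired(lines, now):
    """Return (path, expiry) for every certificate that expired before now."""
    result = []
    for line in lines:
        expiry = parse_not_after(line)
        if expiry is not None and expiry <= now:
            result.append((line.split(":", 1)[0], expiry))
    return result


# Demonstration with one of the lines from the post.
GREP_OUTPUT = [
    "/etc/pki/ovirt-engine/certs/vmconsole-proxy-helper.cer: "
    "Not After : Mar 31 13:18:44 2021 GMT",
]
for path, expiry in expired(GREP_OUTPUT, datetime.now(timezone.utc)):
    print(path, "expired on", expiry.isoformat())
```

The expiry time of Mar 31 13:18:44 2021 GMT is consistent with the first login failure recorded on Mar 31 17:28:51 in the post, depending on the time zone of the login records.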

Snapshot and disk size allocation
by jorgevisentini＠gmail.com 28 Oct '21

Hello everyone. I would like to know how disk size and snapshot allocation work, because every time I create a new snapshot, the VM's disk size increases by 1 GB, and when I remove the snapshot, that space is not returned to the storage domain. I'm using oVirt 4.3.10. How do I reprovision the VM disk? Thank you all.

fresh hyperconverged Gluster setup failed in ovirt 4.4.8
by dhanaraj.ramesh＠yahoo.com 20 Oct '21

Hi Team,

I'm trying to set up a 3-node Gluster + oVirt setup with the latest stable 4.4.8 version, but while deploying Gluster from Cockpit I get the error below. What could be the reason?

TASK [gluster.infra/roles/backend_setup : Set Gluster specific SeLinux context on the bricks] ***
failed: [beclovkvma03.bec.lab] (item={'path': '/gluster_bricks/engine', 'lvname': 'gluster_lv_engine', 'vgname': 'gluster_vg_sde'}) => {"ansible_loop_var": "item", "changed": false, "item": {"lvname": "gluster_lv_engine", "path": "/gluster_bricks/engine", "vgname": "gluster_vg_sde"}, "msg": "ValueError: Type glusterd_brick_t is invalid, must be a file or device type\n"}
failed: [beclovkvma01.bec.lab] (item={'path': '/gluster_bricks/engine', 'lvname': 'gluster_lv_engine', 'vgname': 'gluster_vg_sde'}) => {"ansible_loop_var": "item", "changed": false, "item": {"lvname": "gluster_lv_engine", "path": "/gluster_bricks/engine", "vgname": "gluster_vg_sde"}, "msg": "ValueError: Type glusterd_brick_t is invalid, must be a file or device type\n"}
failed: [beclovkvma02.bec.lab] (item={'path': '/gluster_bricks/engine', 'lvname': 'gluster_lv_engine', 'vgname': 'gluster_vg_sde'}) => {"ansible_loop_var": "item", "changed": false, "item": {"lvname": "gluster_lv_engine", "path": "/gluster_bricks/engine", "vgname": "gluster_vg_sde"}, "msg": "ValueError: Type glusterd_brick_t is invalid, must be a file or device type\n"}
failed: [beclovkvma03.bec.lab] (item={'path': '/gluster_bricks/data', 'lvname': 'gluster_lv_data', 'vgname': 'gluster_vg_sde'}) => {"ansible_loop_var": "item", "changed": false, "item": {"lvname": "gluster_lv_data", "path": "/gluster_bricks/data", "vgname": "gluster_vg_sde"}, "msg": "ValueError: Type glusterd_brick_t is invalid, must be a file or device type\n"}
failed: [beclovkvma01.bec.lab] (item={'path': '/gluster_bricks/data', 'lvname': 'gluster_lv_data', 'vgname': 'gluster_vg_sde'}) => {"ansible_loop_var": "item", "changed": false, "item": {"lvname": "gluster_lv_data", "path": "/gluster_bricks/data", "vgname": "gluster_vg_sde"}, "msg": "ValueError: Type glusterd_brick_t is invalid, must be a file or device type\n"}
failed: [beclovkvma02.bec.lab] (item={'path': '/gluster_bricks/data', 'lvname': 'gluster_lv_data', 'vgname': 'gluster_vg_sde'}) => {"ansible_loop_var": "item", "changed": false, "item": {"lvname": "gluster_lv_data", "path": "/gluster_bricks/data", "vgname": "gluster_vg_sde"}, "msg": "ValueError: Type glusterd_brick_t is invalid, must be a file or device type\n"}
failed: [beclovkvma03.bec.lab] (item={'path': '/gluster_bricks/vmstore', 'lvname': 'gluster_lv_vmstore', 'vgname': 'gluster_vg_sde'}) => {"ansible_loop_var": "item", "changed": false, "item": {"lvname": "gluster_lv_vmstore", "path": "/gluster_bricks/vmstore", "vgname": "gluster_vg_sde"}, "msg": "ValueError: Type glusterd_brick_t is invalid, must be a file or device type\n"}
failed: [beclovkvma01.bec.lab] (item={'path': '/gluster_bricks/vmstore', 'lvname': 'gluster_lv_vmstore', 'vgname': 'gluster_vg_sde'}) => {"ansible_loop_var": "item", "changed": false, "item": {"lvname": "gluster_lv_vmstore", "path": "/gluster_bricks/vmstore", "vgname": "gluster_vg_sde"}, "msg": "ValueError: Type glusterd_brick_t is invalid, must be a file or device type\n"}
failed: [beclovkvma02.bec.lab] (item={'path': '/gluster_bricks/vmstore', 'lvname': 'gluster_lv_vmstore', 'vgname': 'gluster_vg_sde'}) => {"ansible_loop_var": "item", "changed": false, "item": {"lvname": "gluster_lv_vmstore", "path": "/gluster_bricks/vmstore", "vgname": "gluster_vg_sde"}, "msg": "ValueError: Type glusterd_brick_t is invalid, must be a file or device type\n"}

NO MORE HOSTS LEFT *************************************************************

NO MORE HOSTS LEFT *************************************************************

PLAY RECAP *********************************************************************
beclovkvma01.bec.lab : ok=53 changed=14 unreachable=0 failed=1 skipped=116 rescued=0 ignored=1
beclovkvma02.bec.lab : ok=52 changed=13 unreachable=0 failed=1 skipped=116 rescued=0 ignored=1
beclovkvma03.bec.lab : ok=52 changed=13 unreachable=0 failed=1 skipped=116 rescued=0 ignored=1

UEFI Guest can only be started on UEFI host (4.4)
by nroach44＠nroach44.id.au 20 Oct '21

Hi All,

A problem I've just "dealt with" over the past months is that the two UEFI VMs I have installed (one Windows 10, one RHEL8) will only start on the oVirt Node (4.4.x, still an issue on 4.4.8) hosts that have been installed using UEFI.

In the case of both guests, they will "start" but get stuck on a small 640x480-ish black screen, with no CPU or disk activity. It looks as if the VM has been started with "Start paused" enabled, but the VM is not paused. I've noticed that this matches the normal startup of the guest, although it only spends a second or two like that before TianoCore takes over.

Occasionally, I'm able to migrate the VM to a BIOS host. When it fails, the following is seen on the /sending/ host:

2021-09-21 20:09:42,915+0800 ERROR (migsrc/86df93bc) [virt.vm] (vmId='86df93bc-3304-4002-8939-cbefdea4cc60') internal error: qemu unexpectedly closed the monitor: 2021-09-21T12:08:57.355188Z qemu-kvm: warning: Spice: reds.c:2305:reds_handle_read_link_done: spice channels 1 should be encrypted
2021-09-21T12:08:57.393585Z qemu-kvm: warning: Spice: reds.c:2305:reds_handle_read_link_done: spice channels 3 should be encrypted
2021-09-21T12:08:57.393805Z qemu-kvm: warning: Spice: reds.c:2305:reds_handle_read_link_done: spice channels 4 should be encrypted
2021-09-21T12:08:57.393960Z qemu-kvm: warning: Spice: reds.c:2305:reds_handle_read_link_done: spice channels 2 should be encrypted
2021-09-21T12:09:40.799119Z qemu-kvm: warning: TSC frequency mismatch between VM (3099980 kHz) and host (3392282 kHz), and TSC scaling unavailable
2021-09-21T12:09:40.799228Z qemu-kvm: error: failed to set MSR 0x204 to 0x1000000000
qemu-kvm: ../target/i386/kvm/kvm.c:2778: kvm_buf_set_msrs: Assertion `ret == cpu->kvm_msr_buf->nmsrs' failed. (migration:331)
2021-09-21 20:09:42,938+0800 INFO (migsrc/86df93bc) [virt.vm] (vmId='86df93bc-3304-4002-8939-cbefdea4cc60') Switching from State.STARTED to State.FAILED (migration:234)
2021-09-21 20:09:42,938+0800 ERROR (migsrc/86df93bc) [virt.vm] (vmId='86df93bc-3304-4002-8939-cbefdea4cc60') Failed to migrate (migration:503)
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/vdsm/virt/migration.py", line 477, in _regular_run
    time.time(), machineParams
  File "/usr/lib/python3.6/site-packages/vdsm/virt/migration.py", line 578, in _startUnderlyingMigration
    self._perform_with_conv_schedule(duri, muri)
  File "/usr/lib/python3.6/site-packages/vdsm/virt/migration.py", line 667, in _perform_with_conv_schedule
    self._perform_migration(duri, muri)
  File "/usr/lib/python3.6/site-packages/vdsm/virt/migration.py", line 596, in _perform_migration
    self._migration_flags)
  File "/usr/lib/python3.6/site-packages/vdsm/virt/virdomain.py", line 159, in call
    return getattr(self._vm._dom, name)(*a, **kw)
  File "/usr/lib/python3.6/site-packages/vdsm/virt/virdomain.py", line 101, in f
    ret = attr(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/vdsm/common/libvirtconnection.py", line 131, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/vdsm/common/function.py", line 94, in wrapper
    return func(inst, *args, **kwargs)
  File "/usr/lib64/python3.6/site-packages/libvirt.py", line 2126, in migrateToURI3
    raise libvirtError('virDomainMigrateToURI3() failed')
libvirt.libvirtError: internal error: qemu unexpectedly closed the monitor: 2021-09-21T12:08:57.355188Z qemu-kvm: warning: Spice: reds.c:2305:reds_handle_read_link_done: spice channels 1 should be encrypted
2021-09-21T12:08:57.393585Z qemu-kvm: warning: Spice: reds.c:2305:reds_handle_read_link_done: spice channels 3 should be encrypted
2021-09-21T12:08:57.393805Z qemu-kvm: warning: Spice: reds.c:2305:reds_handle_read_link_done: spice channels 4 should be encrypted
2021-09-21T12:08:57.393960Z qemu-kvm: warning: Spice: reds.c:2305:reds_handle_read_link_done: spice channels 2 should be encrypted
2021-09-21T12:09:40.799119Z qemu-kvm: warning: TSC frequency mismatch between VM (3099980 kHz) and host (3392282 kHz), and TSC scaling unavailable
2021-09-21T12:09:40.799228Z qemu-kvm: error: failed to set MSR 0x204 to 0x1000000000
qemu-kvm: ../target/i386/kvm/kvm.c:2778: kvm_buf_set_msrs: Assertion `ret == cpu->kvm_msr_buf->nmsrs' failed.

The receiving host simply sees:

2021-09-21 20:09:42,840+0800 INFO (libvirt/events) [virt.vm] (vmId='86df93bc-3304-4002-8939-cbefdea4cc60') underlying process disconnected (vm:1135)
2021-09-21 20:09:42,840+0800 INFO (libvirt/events) [virt.vm] (vmId='86df93bc-3304-4002-8939-cbefdea4cc60') Release VM resources (vm:5325)
2021-09-21 20:09:42,840+0800 INFO (libvirt/events) [virt.vm] (vmId='86df93bc-3304-4002-8939-cbefdea4cc60') Stopping connection (guestagent:438)
2021-09-21 20:09:42,840+0800 INFO (libvirt/events) [vdsm.api] START teardownImage(sdUUID='3f46f0f3-1cbb-4154-8af5-dcc3a09c6177', spUUID='924e5fbe-beba-11ea-b679-00163e03ad3e', imgUUID='d91282d3-2552-44d3-aa0f-84f7330be4ce', volUUID=None) from=internal, task_id=51eb32fc-1167-4c4c-bea8-4664c92d15e9 (api:48)
2021-09-21 20:09:42,841+0800 INFO (libvirt/events) [storage.StorageDomain] Removing image rundir link '/run/vdsm/storage/3f46f0f3-1cbb-4154-8af5-dcc3a09c6177/d91282d3-2552-44d3-aa0f-84f7330be4ce' (fileSD:601)
2021-09-21 20:09:42,841+0800 INFO (libvirt/events) [vdsm.api] FINISH teardownImage return=None from=internal, task_id=51eb32fc-1167-4c4c-bea8-4664c92d15e9 (api:54)
2021-09-21 20:09:42,841+0800 INFO (libvirt/events) [virt.vm] (vmId='86df93bc-3304-4002-8939-cbefdea4cc60') Stopping connection (guestagent:438)
2021-09-21 20:09:42,841+0800 INFO (libvirt/events) [vdsm.api] START inappropriateDevices(thiefId='86df93bc-3304-4002-8939-cbefdea4cc60') from=internal, task_id=1e3aafc2-62c7-4fe5-a807-69942709e936 (api:48)
2021-09-21 20:09:42,842+0800 INFO (libvirt/events) [vdsm.api] FINISH inappropriateDevices return=None from=internal, task_id=1e3aafc2-62c7-4fe5-a807-69942709e936 (api:54)
2021-09-21 20:09:42,847+0800 WARN (vm/86df93bc) [virt.vm] (vmId='86df93bc-3304-4002-8939-cbefdea4cc60') Couldn't destroy incoming VM: Domain not found: no domain with matching uuid '86df93bc-3304-4002-8939-cbefdea4cc60' (vm:4073)
2021-09-21 20:09:42,847+0800 INFO (vm/86df93bc) [virt.vm] (vmId='86df93bc-3304-4002-8939-cbefdea4cc60') Changed state to Down: VM destroyed during the startup (code=10) (vm:1921)
2021-09-21 20:09:42,849+0800 INFO (vm/86df93bc) [virt.vm] (vmId='86df93bc-3304-4002-8939-cbefdea4cc60') Stopping connection (guestagent:438)
2021-09-21 20:09:42,856+0800 INFO (jsonrpc/3) [api.virt] START destroy(gracefulAttempts=1) from=::ffff:10.1.2.30,59424, flow_id=47e0a91b, vmId=86df93bc-3304-4002-8939-cbefdea4cc60 (api:48)
2021-09-21 20:09:42,917+0800 INFO (jsonrpc/5) [api.virt] START destroy(gracefulAttempts=1) from=::ffff:10.1.2.7,50798, vmId=86df93bc-3304-4002-8939-cbefdea4cc60 (api:48)

The data center is configured with BIOS as the default. As an aside, *all* hosts have the following cmdline set (to allow nested virt):

intel_iommu=on kvm-intel.nested=1 kvm.ignore_msrs=1

Any suggestions?

HA VM and vm leases usage with site failure
by Gianluca Cecchi 17 Oct '21

Hello,

suppose a latest 4.4.7 environment installed with an external engine and two hosts, one in one site and one in another site. For storage I have one FC storage domain. I am trying to simulate a sort of "site failure scenario" to see what kind of HA I should expect.

The 2 hosts have power management configured through fence_ipmilan. I have 2 VMs, one configured as HA with a lease on storage (Resume Behavior: kill) and one not marked as HA. Initially host1 is SPM and it is the host that runs the two VMs.

Fencing of host1 from host2 initially works OK. I can also test it from the command line:

# fence_ipmilan -a 10.10.193.152 -P -l my_fence_user -A password -L operator -S /usr/local/bin/pwd.sh -o status
Status: ON

On host2 I then prevent reaching the host1 iDRAC:

firewall-cmd --direct --add-rule ipv4 filter OUTPUT 0 -d 10.10.193.152 -p udp --dport 623 -j DROP
firewall-cmd --direct --add-rule ipv4 filter OUTPUT 1 -j ACCEPT

so that:

# fence_ipmilan -a 10.10.193.152 -P -l my_fence_user -A password -L operator -S /usr/local/bin/pwd.sh -o status
2021-08-05 15:06:07,254 ERROR: Failed: Unable to obtain correct plug status or plug is not available

On host1 I generate a panic:

# date ; echo 1 > /proc/sys/kernel/sysrq ; echo c > /proc/sysrq-trigger
Thu Aug 5 15:06:24 CEST 2021

host1 correctly completes its crash dump (kdump integration is enabled) and reboots, but I stop it at the grub prompt, so that host1 is unreachable from host2's point of view and its power fencing state cannot be determined.

At this point I thought that the VM lease functionality would come into play and host2 would be able to restart the HA VM, as it is able to see that the lease is not held by the other host and so it can acquire the lock itself. Instead it goes through a loop of power fencing attempts. I waited about 25 minutes without any effect other than continuous attempts.

After 2 minutes host2 correctly becomes SPM and the VMs are marked as unknown.

At a certain point after the failures in power fencing host1, I see the event:

Failed to power fence host host1. Please check the host status and it's power management settings, and then manually reboot it and click "Confirm Host Has Been Rebooted"

If I select the host and choose "Confirm Host Has Been Rebooted", then the two VMs are marked as down and the HA one is correctly booted by host2. But this requires my manual intervention.

Is the behavior above the expected one, or should the use of VM leases have allowed host2 to bypass the inability to fence and start the HA VM with the lease? Otherwise I don't understand the reason to have the lease at all.

Thanks,
Gianluca

COarse-grained LOck-stepping for oVirt
by Harry O 05 Oct '21

Hi, Will COLO be implemented in oVirt? Is it possible to do it myself? I see qemu-kvm and lots of other QEMU packages installed on my oVirt nodes. COLO has been in upstream QEMU since v4.0: https://wiki.qemu.org/Features/COLO