December 2019 - Users - oVirt List Archives

Self hosted engine to hci
by Ernest Clyde Chua 18 Dec '19

18 Dec '19

Hello currently i have a server running glusterfs on distributed 1 and a self hosted engine. And planning to add two servers for hci. Do i add new hosts and change gluster volume type to replicated or backup all the vms then start from scratch? Is there any recommendation on this?

1 0

NMA nodes
by suporte＠logicworks.pt 18 Dec '19

18 Dec '19

Hi, My host only shows one NUMA node. It means that I cannot setup a high performence VM? Thanks -- Jose Ferradeira http://www.logicworks.pt

2 3

ovirt hosted-engine on iSCSI offering one target
by wodel youchi 18 Dec '19

18 Dec '19

Hi, We have an oVirt Platforme using the 4.1 version. when the platforme was installed, it was made of : - Two HP Proliant DL380 G9 as hypervisors - One HP MSA1040 for iSCSI - One Synology for NFS - Two switches, one for network/vm traffic, the second for storage traffic. The problem : the hosted-engine domain was created using iSCSI on the HP MSA. The problem is that this disk array does not give the possibility to create different targets, it presents just one target. At that time we create both the hosted-engine and the first data domain using the same target, and we didn't pay attention to the information saying "i*f you are using iSCSI storage, do not use the same iSCSI target for the shared storage domain and data storage domain*". Question : - what problems can be generated by this (mis-)configuration? - Is this a must to do (correct) configuration. Regards.

3 2

Updating oVirt with gluster-ansible and Gluster 7?
by Victor Hooi 18 Dec '19

18 Dec '19

Hi, I'm planning to bring up a 3-node Ovirt/Gluster cluster, to do some comparison against an existing 3-node Proxmox/Ceph cluster we have. However, after reading the documentation Firstly, the oVirt documentation <https://www.ovirt.org/documentation/gluster-hyperconverged/chap-Single_node…> mentions that it uses gDeploy to setup Gluster. However, I saw from this commit <https://github.com/gluster/gdeploy/commit/57c19a4c233cccef0ccda468f697f1564…> that gDeploy has been deprecated since August 2018, in favour of gluster-ansible <https://github.com/gluster/gluster-ansible>. Is there a plan to move to gluster-ansible? Secondly, I saw there was a recently a new major Gluster release - Gluster 7 <https://www.gluster.org/announcing-gluster-7-0/>. Are there any plans to add this to oVirt? (I saw there was some performance improvements, hoping this will help with VM hosting). Thanks, Victor

1 0

Cannot start VM 2 hosts
by Stefan Wolf 17 Dec '19

17 Dec '19

I ve got 4 Hosts after changeing the harddrive on every Host and normal updates I am not able anymore to start VM's on 2 of these 4 hosts. I am also not able to migrate a running vm to these hosts. This is the error log for start up 2019-12-17 16:01:28,326+01 INFO [org.ovirt.engine.core.bll.RunVmOnceCommand] (default task-5) [3beff980-0f1a-461b-a6da-89c6f4dea6b3] Lock Acquired to object 'EngineLock:{exclusiveLocks='[40cf6c27-6464-4fa7-bc01-9158cb03328b=VM]', sharedLocks=''}' 2019-12-17 16:01:28,609+01 INFO [org.ovirt.engine.core.vdsbroker.IsVmDuringInitiatingVDSCommand] (default task-5) [3beff980-0f1a-461b-a6da-89c6f4dea6b3] START, IsVmDuringInitiatingVDSCommand( IsVmDuringInitiatingVDSCommandParameters:{vmId='40cf6c27-6464-4fa7-bc01-9158cb03328b'}), log id: 5706d380 2019-12-17 16:01:28,611+01 INFO [org.ovirt.engine.core.vdsbroker.IsVmDuringInitiatingVDSCommand] (default task-5) [3beff980-0f1a-461b-a6da-89c6f4dea6b3] FINISH, IsVmDuringInitiatingVDSCommand, return: false, log id: 5706d380 2019-12-17 16:01:28,854+01 INFO [org.ovirt.engine.core.bll.RunVmOnceCommand] (default task-5) [3beff980-0f1a-461b-a6da-89c6f4dea6b3] Running command: RunVmOnceCommand internal: false. Entities affected : ID: 40cf6c27-6464-4fa7-bc01-9158cb03328b Type: VMAction group RUN_VM with role type USER, ID: 40cf6c27-6464-4fa7-bc01-9158cb03328b Type: VMAction group EDIT_ADMIN_VM_PROPERTIES with role type ADMIN 2019-12-17 16:01:28,999+01 INFO [org.ovirt.engine.core.vdsbroker.UpdateVmDynamicDataVDSCommand] (default task-5) [3beff980-0f1a-461b-a6da-89c6f4dea6b3] START, UpdateVmDynamicDataVDSCommand( UpdateVmDynamicDataVDSCommandParameters:{hostId='null', vmId='40cf6c27-6464-4fa7-bc01-9158cb03328b', vmDynamic='org.ovirt.engine.core.common.businessentities.VmDynamic@147620b'}), log id: 612dabdf 2019-12-17 16:01:29,097+01 INFO [org.ovirt.engine.core.vdsbroker.UpdateVmDynamicDataVDSCommand] (default task-5) [3beff980-0f1a-461b-a6da-89c6f4dea6b3] FINISH, UpdateVmDynamicDataVDSCommand, return: , log id: 612dabdf 2019-12-17 16:01:29,106+01 INFO [org.ovirt.engine.core.vdsbroker.CreateVDSCommand] (default task-5) [3beff980-0f1a-461b-a6da-89c6f4dea6b3] START, CreateVDSCommand( CreateVDSCommandParameters:{hostId='d38bae4c-8494-4861-ae5a-38992db338e5', vmId='40cf6c27-6464-4fa7-bc01-9158cb03328b', vm='VM [z-push]'}), log id: 56abde11 2019-12-17 16:01:29,110+01 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.CreateBrokerVDSCommand] (default task-5) [3beff980-0f1a-461b-a6da-89c6f4dea6b3] START, CreateBrokerVDSCommand(HostName = kvm360.durchhalten.intern, CreateVDSCommandParameters:{hostId='d38bae4c-8494-4861-ae5a-38992db338e5', vmId='40cf6c27-6464-4fa7-bc01-9158cb03328b', vm='VM [z-push]'}), log id: 36d088ab 2019-12-17 16:01:29,163+01 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.CreateBrokerVDSCommand] (default task-5) [3beff980-0f1a-461b-a6da-89c6f4dea6b3] VM <?xml version="1.0" encoding="UTF-8"?><domain type="kvm" xmlns:ovirt-tune="http://ovirt.org/vm/tune/1.0" xmlns:ovirt-vm="http://ovirt.org/vm/1.0"> <name>z-push</name> <uuid>40cf6c27-6464-4fa7-bc01-9158cb03328b</uuid> <memory>4194304</memory> <currentMemory>4194304</currentMemory> <iothreads>1</iothreads> <maxMemory slots="16">16777216</maxMemory> <vcpu current="4">16</vcpu> <sysinfo type="smbios"> <system> <entry name="manufacturer">oVirt</entry> <entry name="product">OS-NAME:</entry> <entry name="version">OS-VERSION:</entry> <entry name="serial">HOST-SERIAL:</entry> <entry name="uuid">40cf6c27-6464-4fa7-bc01-9158cb03328b</entry> </system> </sysinfo> <clock offset="variable" adjustment="0"> <timer name="rtc" tickpolicy="catchup"/> <timer name="pit" tickpolicy="delay"/> <timer name="hpet" present="no"/> </clock> <features> <acpi/> </features> <cpu match="exact"> <model>SandyBridge</model> <feature name="pcid" policy="require"/> <feature name="spec-ctrl" policy="require"/> <feature name="ssbd" policy="require"/> <topology cores="1" threads="1" sockets="16"/> <numa> <cell id="0" cpus="0,1,2,3" memory="4194304"/> </numa> </cpu> <cputune/> <devices> <input type="mouse" bus="ps2"/> <channel type="unix"> <target type="virtio" name="ovirt-guest-agent.0"/> <source mode="bind" path="/var/lib/libvirt/qemu/channels/40cf6c27-6464-4fa7-bc01-9158cb03328b.ovirt-guest-agent.0"/> </channel> <channel type="unix"> <target type="virtio" name="org.qemu.guest_agent.0"/> <source mode="bind" path="/var/lib/libvirt/qemu/channels/40cf6c27-6464-4fa7-bc01-9158cb03328b.org.qemu.guest_agent.0"/> </channel> <rng model="virtio"> <backend model="random">/dev/urandom</backend> <alias name="ua-375334ff-0392-4314-879d-d89d81af5b69"/> </rng> <watchdog model="i6300esb" action="none"> <alias name="ua-5a4b6796-47bd-40e4-99ac-81b9546e6824"/> <address bus="0x00" domain="0x0000" function="0x0" slot="0x08" type="pci"/> </watchdog> <memballoon model="virtio"> <stats period="5"/> <alias name="ua-6d47e6f5-fd4f-452d-bcca-895b08d896f8"/> <address bus="0x00" domain="0x0000" function="0x0" slot="0x06" type="pci"/> </memballoon> <video> <model type="qxl" vram="32768" heads="1" ram="65536" vgamem="16384"/> <alias name="ua-79117281-c935-4cfc-aa21-9f028d817d79"/> <address bus="0x00" domain="0x0000" function="0x0" slot="0x02" type="pci"/> </video> <controller type="usb" model="piix3-uhci" index="0"> <address bus="0x00" domain="0x0000" function="0x2" slot="0x01" type="pci"/> </controller> <controller type="virtio-serial" index="0" ports="16"> <alias name="ua-b82337cf-7f59-4390-ad5b-4c940e205fae"/> <address bus="0x00" domain="0x0000" function="0x0" slot="0x05" type="pci"/> </controller> <graphics type="spice" port="-1" autoport="yes" passwd="*****" passwdValidTo="1970-01-01T00:00:01" tlsPort="-1"> <channel name="main" mode="secure"/> <channel name="inputs" mode="secure"/> <channel name="cursor" mode="secure"/> <channel name="playback" mode="secure"/> <channel name="record" mode="secure"/> <channel name="display" mode="secure"/> <channel name="smartcard" mode="secure"/> <channel name="usbredir" mode="secure"/> <listen type="network" network="vdsm-ovirtmgmt"/> </graphics> <channel type="spicevmc"> <target type="virtio" name="com.redhat.spice.0"/> </channel> <interface type="bridge"> <model type="virtio"/> <link state="up"/> <source bridge="ovirtmgmt"/> <driver queues="4" name="vhost"/> <alias name="ua-db1017dd-9b76-4245-85fd-7490185434ac"/> <address bus="0x00" domain="0x0000" function="0x0" slot="0x03" type="pci"/> <boot order="2"/> <mac address="56:6f:c1:d7:00:02"/> <mtu size="1500"/> <filterref filter="vdsm-no-mac-spoofing"/> <bandwidth/> </interface> <disk type="file" device="cdrom" snapshot="no"> <driver name="qemu" type="raw" error_policy="report"/> <source file="" startupPolicy="optional"> <seclabel model="dac" type="none" relabel="no"/> </source> <target dev="hdc" bus="ide"/> <readonly/> <alias name="ua-c2fa60b6-be89-487c-b01f-ed14c2d4624f"/> <address bus="1" controller="0" unit="0" type="drive" target="0"/> </disk> <disk snapshot="no" type="file" device="disk"> <target dev="sda" bus="scsi"/> <source file="/rhev/data-center/0e279f02-0c73-11e9-bc47-00163e150480/80364cc8-8afd-4482-884d-d63d5a7988df/images/38f2e3bb-687f-4f01-b769-3a5403136181/1a9627d8-181b-4790-aa47-569dd1e6e9fb"> <seclabel model="dac" type="none" relabel="no"/> </source> <driver name="qemu" io="threads" type="qcow2" error_policy="stop" cache="none"/> <alias name="ua-38f2e3bb-687f-4f01-b769-3a5403136181"/> <address bus="0" controller="0" unit="0" type="drive" target="0"/> <boot order="1"/> <serial>38f2e3bb-687f-4f01-b769-3a5403136181</serial> </disk> </devices> <pm> <suspend-to-disk enabled="no"/> <suspend-to-mem enabled="no"/> </pm> <os> <type arch="x86_64" machine="pc-i440fx-rhel7.6.0">hvm</type> <smbios mode="sysinfo"/> <bootmenu enable="yes" timeout="30000"/> </os> <metadata> <ovirt-tune:qos/> <ovirt-vm:vm> <ovirt-vm:minGuaranteedMemoryMb type="int">4096</ovirt-vm:minGuaranteedMemoryMb> <ovirt-vm:clusterVersion>4.3</ovirt-vm:clusterVersion> <ovirt-vm:custom/> <ovirt-vm:device mac_address="56:6f:c1:d7:00:02"> <ovirt-vm:custom/> </ovirt-vm:device> <ovirt-vm:device devtype="disk" name="sda"> <ovirt-vm:poolID>0e279f02-0c73-11e9-bc47-00163e150480</ovirt-vm:poolID> <ovirt-vm:volumeID>1a9627d8-181b-4790-aa47-569dd1e6e9fb</ovirt-vm:volumeID> <ovirt-vm:imageID>38f2e3bb-687f-4f01-b769-3a5403136181</ovirt-vm:imageID> <ovirt-vm:domainID>80364cc8-8afd-4482-884d-d63d5a7988df</ovirt-vm:domainID> </ovirt-vm:device> <ovirt-vm:launchPaused>false</ovirt-vm:launchPaused> <ovirt-vm:resumeBehavior>auto_resume</ovirt-vm:resumeBehavior> </ovirt-vm:vm> </metadata> </domain> 2019-12-17 16:01:29,198+01 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.CreateBrokerVDSCommand] (default task-5) [3beff980-0f1a-461b-a6da-89c6f4dea6b3] FINISH, CreateBrokerVDSCommand, return: , log id: 36d088ab 2019-12-17 16:01:29,252+01 INFO [org.ovirt.engine.core.vdsbroker.CreateVDSCommand] (default task-5) [3beff980-0f1a-461b-a6da-89c6f4dea6b3] FINISH, CreateVDSCommand, return: WaitForLaunch, log id: 56abde11 2019-12-17 16:01:29,252+01 INFO [org.ovirt.engine.core.bll.RunVmOnceCommand] (default task-5) [3beff980-0f1a-461b-a6da-89c6f4dea6b3] Lock freed to object 'EngineLock:{exclusiveLocks='[40cf6c27-6464-4fa7-bc01-9158cb03328b=VM]', sharedLocks=''}' 2019-12-17 16:01:29,319+01 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-5) [3beff980-0f1a-461b-a6da-89c6f4dea6b3] EVENT_ID: USER_STARTED_VM(153), VM z-push was started by admin@internal-authz (Host: kvm360.durchhalten.intern). 2019-12-17 16:01:30,412+01 INFO [org.ovirt.engine.core.vdsbroker.monitoring.VmsStatisticsFetcher] (EE-ManagedThreadFactory-engineScheduled-Thread-56) [] Fetched 1 VMs from VDS 'd38bae4c-8494-4861-ae5a-38992db338e5' 2019-12-17 16:01:30,620+01 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-56) [] EVENT_ID: VM_CONSOLE_DISCONNECTED(168), User <UNKNOWN> got disconnected from VM z-push. 2019-12-17 16:01:30,766+01 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.DumpXmlsVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-56) [] START, DumpXmlsVDSCommand(HostName = kvm360.durchhalten.intern, Params:{hostId='d38bae4c-8494-4861-ae5a-38992db338e5', vmIds='[40cf6c27-6464-4fa7-bc01-9158cb03328b]'}), log id: 4be2467a 2019-12-17 16:01:30,774+01 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.DumpXmlsVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-56) [] FINISH, DumpXmlsVDSCommand, return: {40cf6c27-6464-4fa7-bc01-9158cb03328b=<?xml version="1.0" encoding="UTF-8"?><domain type="kvm" xmlns:ovirt-tune="http://ovirt.org/vm/tune/1.0" xmlns:ovirt-vm="http://ovirt.org/vm/1.0"><name>z-push</name><uuid>40cf6c27-6464-4fa7-bc01-9158cb03328b</uuid><memory>4194304</memory><currentMemory>4194304</currentMemory><iothreads>1</iothreads><maxMemory slots="16">16777216</maxMemory><vcpu current="4">16</vcpu><sysinfo type="smbios"><system><entry name="manufacturer">oVirt</entry><entry name="product">OS-NAME:</entry><entry name="version">OS-VERSION:</entry><entry name="serial">HOST-SERIAL:</entry><entry name="uuid">40cf6c27-6464-4fa7-bc01-9158cb03328b</entry></system></sysinfo><clock offset="variable" adjustment="0"><timer name="rtc" tickpolicy="catchup"></timer><timer name="pit" tickpolicy="delay"></timer ><timer name="hpet" present="no"></timer></clock><features><acpi></acpi></features><cpu match="exact"><model>SandyBridge</model><feature name="pcid" policy="require"></feature><feature name="spec-ctrl" policy="require"></feature><feature name="ssbd" policy="require"></feature><topology cores="1" threads="1" sockets="16"></topology><numa><cell id="0" cpus="0,1,2,3" memory="4194304"></cell></numa></cpu><cputune></cputune><devices><input type="mouse" bus="ps2"></input><channel type="unix"><target type="virtio" name="ovirt-guest-agent.0"></target><source mode="bind" path="/var/lib/libvirt/qemu/channels/40cf6c27-6464-4fa7-bc01-9158cb03328b.ovirt-guest-agent.0"></source></channel><channel type="unix"><target type="virtio" name="org.qemu.guest_agent.0"></target><source mode="bind" path="/var/lib/libvirt/qemu/channels/40cf6c27-6464-4fa7-bc01-9158cb03328b.org.qemu.guest_agent.0"></source></channel><rng model="virtio"><backend model="random">/dev/urandom</backend><alias name="ua-375334ff-0392 -4314-879d-d89d81af5b69"></alias></rng><watchdog model="i6300esb" action="none"><alias name="ua-5a4b6796-47bd-40e4-99ac-81b9546e6824"></alias><address bus="0x00" domain="0x0000" function="0x0" slot="0x08" type="pci"></address></watchdog><memballoon model="virtio"><stats period="5"></stats><alias name="ua-6d47e6f5-fd4f-452d-bcca-895b08d896f8"></alias><address bus="0x00" domain="0x0000" function="0x0" slot="0x06" type="pci"></address></memballoon><video><model type="qxl" vram="32768" heads="1" ram="65536" vgamem="16384"></model><alias name="ua-79117281-c935-4cfc-aa21-9f028d817d79"></alias><address bus="0x00" domain="0x0000" function="0x0" slot="0x02" type="pci"></address></video><controller type="usb" model="piix3-uhci" index="0"><address bus="0x00" domain="0x0000" function="0x2" slot="0x01" type="pci"></address></controller><controller type="virtio-serial" index="0" ports="16"><alias name="ua-b82337cf-7f59-4390-ad5b-4c940e205fae"></alias><address bus="0x00" domain="0x0000" function=" 0x0" slot="0x05" type="pci"></address></controller><graphics type="spice" port="-1" autoport="yes" passwd="*****" passwdValidTo="1970-01-01T00:00:01" tlsPort="-1"><channel name="main" mode="secure"></channel><channel name="inputs" mode="secure"></channel><channel name="cursor" mode="secure"></channel><channel name="playback" mode="secure"></channel><channel name="record" mode="secure"></channel><channel name="display" mode="secure"></channel><channel name="smartcard" mode="secure"></channel><channel name="usbredir" mode="secure"></channel><listen type="network" network="vdsm-ovirtmgmt"></listen></graphics><channel type="spicevmc"><target type="virtio" name="com.redhat.spice.0"></target></channel><interface type="bridge"><model type="virtio"></model><link state="up"></link><source bridge="ovirtmgmt"></source><driver queues="4" name="vhost"></driver><alias name="ua-db1017dd-9b76-4245-85fd-7490185434ac"></alias><address bus="0x00" domain="0x0000" function="0x0" slot="0x03" type="pci">< /address><boot order="2"></boot><mac address="56:6f:c1:d7:00:02"></mac><mtu size="1500"></mtu><filterref filter="vdsm-no-mac-spoofing"></filterref><bandwidth></bandwidth></interface><disk type="file" device="cdrom" snapshot="no"><driver name="qemu" type="raw" error_policy="report"></driver><source file="" startupPolicy="optional"><seclabel model="dac" type="none" relabel="no"></seclabel></source><target dev="hdc" bus="ide"></target><readonly></readonly><alias name="ua-c2fa60b6-be89-487c-b01f-ed14c2d4624f"></alias><address bus="1" controller="0" unit="0" type="drive" target="0"></address></disk><disk snapshot="no" type="file" device="disk"><target dev="sda" bus="scsi"></target><source file="/rhev/data-center/0e279f02-0c73-11e9-bc47-00163e150480/80364cc8-8afd-4482-884d-d63d5a7988df/images/38f2e3bb-687f-4f01-b769-3a5403136181/1a9627d8-181b-4790-aa47-569dd1e6e9fb"><seclabel model="dac" type="none" relabel="no"></seclabel></source><driver name="qemu" io="threads" type="qcow2" error_polic y="stop" cache="none"></driver><alias name="ua-38f2e3bb-687f-4f01-b769-3a5403136181"></alias><address bus="0" controller="0" unit="0" type="drive" target="0"></address><boot order="1"></boot><serial>38f2e3bb-687f-4f01-b769-3a5403136181</serial></disk></devices><pm><suspend-to-disk enabled="no"></suspend-to-disk><suspend-to-mem enabled="no"></suspend-to-mem></pm><os><type arch="x86_64" machine="pc-i440fx-rhel7.6.0">hvm</type><smbios mode="sysinfo"></smbios><bootmenu enable="yes" timeout="30000"></bootmenu></os><metadata><ovirt-tune:qos></ovirt-tune:qos><ovirt-vm:vm><ovirt-vm:minGuaranteedMemoryMb type="int">4096</ovirt-vm:minGuaranteedMemoryMb><ovirt-vm:clusterVersion>4.3</ovirt-vm:clusterVersion><ovirt-vm:custom></ovirt-vm:custom><ovirt-vm:device mac_address="56:6f:c1:d7:00:02"><ovirt-vm:custom></ovirt-vm:custom></ovirt-vm:device><ovirt-vm:device devtype="disk" name="sda"><ovirt-vm:poolID>0e279f02-0c73-11e9-bc47-00163e150480</ovirt-vm:poolID><ovirt-vm:volumeID>1a9627d8-181b-4790-aa4 7-569dd1e6e9fb</ovirt-vm:volumeID><ovirt-vm:imageID>38f2e3bb-687f-4f01-b769-3a5403136181</ovirt-vm:imageID><ovirt-vm:domainID>80364cc8-8afd-4482-884d-d63d5a7988df</ovirt-vm:domainID></ovirt-vm:device><ovirt-vm:launchPaused>false</ovirt-vm:launchPaused><ovirt-vm:resumeBehavior>auto_resume</ovirt-vm:resumeBehavior></ovirt-vm:vm></metadata></domain>}, log id: 4be2467a 2019-12-17 16:01:30,918+01 ERROR [org.ovirt.engine.core.vdsbroker.monitoring.VmDevicesMonitoring] (EE-ManagedThreadFactory-engineScheduled-Thread-56) [] VM '40cf6c27-6464-4fa7-bc01-9158cb03328b' managed non pluggable device was removed unexpectedly from libvirt: 'VmDevice:{id='VmDeviceId:{deviceId='5f17a444-36e8-4854-a51b-f355aab2d8a8', vmId='40cf6c27-6464-4fa7-bc01-9158cb03328b'}', device='virtio-scsi', type='CONTROLLER', specParams='[ioThreadId=1]', address='', managed='true', plugged='false', readOnly='false', deviceAlias='', customProperties='[]', snapshotId='null', logicalName='null', hostDevice='null'}' 2019-12-17 16:01:32,762+01 INFO [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (ForkJoinPool-1-worker-12) [] VM '40cf6c27-6464-4fa7-bc01-9158cb03328b' was reported as Down on VDS 'd38bae4c-8494-4861-ae5a-38992db338e5'(kvm360.durchhalten.intern) 2019-12-17 16:01:32,764+01 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.DestroyVDSCommand] (ForkJoinPool-1-worker-12) [] START, DestroyVDSCommand(HostName = kvm360.durchhalten.intern, DestroyVmVDSCommandParameters:{hostId='d38bae4c-8494-4861-ae5a-38992db338e5', vmId='40cf6c27-6464-4fa7-bc01-9158cb03328b', secondsToWait='0', gracefully='false', reason='', ignoreNoVm='true'}), log id: 3478fcba 2019-12-17 16:01:33,283+01 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.DestroyVDSCommand] (ForkJoinPool-1-worker-12) [] FINISH, DestroyVDSCommand, return: , log id: 3478fcba 2019-12-17 16:01:33,283+01 INFO [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (ForkJoinPool-1-worker-12) [] VM '40cf6c27-6464-4fa7-bc01-9158cb03328b'(z-push) moved from 'WaitForLaunch' --> 'Down' 2019-12-17 16:01:33,510+01 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (ForkJoinPool-1-worker-12) [] EVENT_ID: VM_DOWN_ERROR(119), VM z-push is down with error. Exit message: The name org.fedoraproject.FirewallD1 was not provided by any .service files. 2019-12-17 16:01:33,512+01 INFO [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (ForkJoinPool-1-worker-12) [] add VM '40cf6c27-6464-4fa7-bc01-9158cb03328b'(z-push) to rerun treatment 2019-12-17 16:01:33,743+01 ERROR [org.ovirt.engine.core.vdsbroker.monitoring.VmsMonitoring] (ForkJoinPool-1-worker-12) [] Rerun VM '40cf6c27-6464-4fa7-bc01-9158cb03328b'. Called from VDS 'kvm360.durchhalten.intern' 2019-12-17 16:01:33,877+01 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engine-Thread-753) [] EVENT_ID: USER_INITIATED_RUN_VM_FAILED(151), Failed to run VM z-push on Host kvm360.durchhalten.intern. 2019-12-17 16:01:34,057+01 INFO [org.ovirt.engine.core.bll.RunVmOnceCommand] (EE-ManagedThreadFactory-engine-Thread-753) [] Lock Acquired to object 'EngineLock:{exclusiveLocks='[40cf6c27-6464-4fa7-bc01-9158cb03328b=VM]', sharedLocks=''}' 2019-12-17 16:01:34,085+01 INFO [org.ovirt.engine.core.vdsbroker.IsVmDuringInitiatingVDSCommand] (EE-ManagedThreadFactory-engine-Thread-753) [] START, IsVmDuringInitiatingVDSCommand( IsVmDuringInitiatingVDSCommandParameters:{vmId='40cf6c27-6464-4fa7-bc01-9158cb03328b'}), log id: 4074b804 2019-12-17 16:01:34,086+01 INFO [org.ovirt.engine.core.vdsbroker.IsVmDuringInitiatingVDSCommand] (EE-ManagedThreadFactory-engine-Thread-753) [] FINISH, IsVmDuringInitiatingVDSCommand, return: false, log id: 4074b804 2019-12-17 16:01:34,134+01 WARN [org.ovirt.engine.core.bll.RunVmOnceCommand] (EE-ManagedThreadFactory-engine-Thread-753) [] Validation of action 'RunVmOnce' failed for user admin@internal-authz. Reasons: VAR__ACTION__RUN,VAR__TYPE__VM,VAR__ACTION__RUN,VAR__TYPE__VM,SCHEDULING_NO_HOSTS 2019-12-17 16:01:34,135+01 INFO [org.ovirt.engine.core.bll.RunVmOnceCommand] (EE-ManagedThreadFactory-engine-Thread-753) [] Lock freed to object 'EngineLock:{exclusiveLocks='[40cf6c27-6464-4fa7-bc01-9158cb03328b=VM]', sharedLocks=''}' 2019-12-17 16:01:34,300+01 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engine-Thread-753) [] EVENT_ID: USER_FAILED_RUN_VM(54), Failed to run VM z-push (User: admin@internal-authz). 2019-12-17 16:01:34,314+01 INFO [org.ovirt.engine.core.bll.ProcessDownVmCommand] (EE-ManagedThreadFactory-engine-Thread-754) [2ecacb63] Running command: ProcessDownVmCommand internal: true. 2019-12-17 16:01:46,128+01 INFO [org.ovirt.engine.core.vdsbroker.monitoring.VmsStatisticsFetcher] (EE-ManagedThreadFactory-engineScheduled-Thread-9) [] Fetched 0 VMs from VDS 'd38bae4c-8494-4861-ae5a-38992db338e5' and this for migration 2019-12-17 16:05:42,411+01 INFO [org.ovirt.engine.core.bll.MigrateVmToServerCommand] (default task-12) [ec2a5fc9-20e5-45f8-a38b-2a5dc96dadfe] Lock Acquired to object 'EngineLock:{exclusiveLocks='[46755b18-ac33-4e62-b1f0-a746bfeed59b=VM]', sharedLocks=''}' 2019-12-17 16:05:42,685+01 INFO [org.ovirt.engine.core.bll.MigrateVmToServerCommand] (default task-12) [ec2a5fc9-20e5-45f8-a38b-2a5dc96dadfe] Running command: MigrateVmToServerCommand internal: false. Entities affected : ID: 46755b18-ac33-4e62-b1f0-a746bfeed59b Type: VMAction group MIGRATE_VM with role type USER 2019-12-17 16:05:43,043+01 INFO [org.ovirt.engine.core.vdsbroker.MigrateVDSCommand] (default task-12) [ec2a5fc9-20e5-45f8-a38b-2a5dc96dadfe] START, MigrateVDSCommand( MigrateVDSCommandParameters:{hostId='7895de21-d174-471f-a0e5-340d59be1bb0', vmId='46755b18-ac33-4e62-b1f0-a746bfeed59b', srcHost='kvm380.durchhalten.intern', dstVdsId='d104ef2f-98a4-4a99-b28d-2725294f8626', dstHost='kvm320.durchhalten.intern:54321', migrationMethod='ONLINE', tunnelMigration='false', migrationDowntime='0', autoConverge='true', migrateCompressed='false', consoleAddress='null', maxBandwidth='62', enableGuestEvents='true', maxIncomingMigrations='2', maxOutgoingMigrations='2', convergenceSchedule='[init=[{name=setDowntime, params=[100]}], stalling=[{limit=1, action={name=setDowntime, params=[150]}}, {limit=2, action={name=setDowntime, params=[200]}}, {limit=3, action={name=setDowntime, params=[300]}}, {limit=4, action={name=setDowntime, params=[400]}}, {limit=6, action={name=setDowntime, params=[500]}}, {li mit=-1, action={name=abort, params=[]}}]]', dstQemu='192.168.200.231'}), log id: 57fb24a3 2019-12-17 16:05:43,048+01 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.MigrateBrokerVDSCommand] (default task-12) [ec2a5fc9-20e5-45f8-a38b-2a5dc96dadfe] START, MigrateBrokerVDSCommand(HostName = kvm380.durchhalten.intern, MigrateVDSCommandParameters:{hostId='7895de21-d174-471f-a0e5-340d59be1bb0', vmId='46755b18-ac33-4e62-b1f0-a746bfeed59b', srcHost='kvm380.durchhalten.intern', dstVdsId='d104ef2f-98a4-4a99-b28d-2725294f8626', dstHost='kvm320.durchhalten.intern:54321', migrationMethod='ONLINE', tunnelMigration='false', migrationDowntime='0', autoConverge='true', migrateCompressed='false', consoleAddress='null', maxBandwidth='62', enableGuestEvents='true', maxIncomingMigrations='2', maxOutgoingMigrations='2', convergenceSchedule='[init=[{name=setDowntime, params=[100]}], stalling=[{limit=1, action={name=setDowntime, params=[150]}}, {limit=2, action={name=setDowntime, params=[200]}}, {limit=3, action={name=setDowntime, params=[300]}}, {limit=4, action={name=setDowntime, params=[400] }}, {limit=6, action={name=setDowntime, params=[500]}}, {limit=-1, action={name=abort, params=[]}}]]', dstQemu='192.168.200.231'}), log id: 6765eb33 2019-12-17 16:05:44,068+01 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.MigrateBrokerVDSCommand] (default task-12) [ec2a5fc9-20e5-45f8-a38b-2a5dc96dadfe] FINISH, MigrateBrokerVDSCommand, return: , log id: 6765eb33 2019-12-17 16:05:44,126+01 INFO [org.ovirt.engine.core.vdsbroker.MigrateVDSCommand] (default task-12) [ec2a5fc9-20e5-45f8-a38b-2a5dc96dadfe] FINISH, MigrateVDSCommand, return: MigratingFrom, log id: 57fb24a3 2019-12-17 16:05:44,205+01 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-12) [ec2a5fc9-20e5-45f8-a38b-2a5dc96dadfe] EVENT_ID: VM_MIGRATION_START(62), Migration started (VM: TestVM, Source: kvm380.durchhalten.intern, Destination: kvm320.durchhalten.intern, User: admin@internal-authz). 2019-12-17 16:05:44,588+01 INFO [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (ForkJoinPool-1-worker-2) [] VM '46755b18-ac33-4e62-b1f0-a746bfeed59b' was reported as Down on VDS 'd104ef2f-98a4-4a99-b28d-2725294f8626'(kvm320.durchhalten.intern) 2019-12-17 16:05:44,591+01 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.DestroyVDSCommand] (ForkJoinPool-1-worker-2) [] START, DestroyVDSCommand(HostName = kvm320.durchhalten.intern, DestroyVmVDSCommandParameters:{hostId='d104ef2f-98a4-4a99-b28d-2725294f8626', vmId='46755b18-ac33-4e62-b1f0-a746bfeed59b', secondsToWait='0', gracefully='false', reason='', ignoreNoVm='true'}), log id: 1afbc395 2019-12-17 16:05:44,853+01 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.DestroyVDSCommand] (ForkJoinPool-1-worker-2) [] Failed to destroy VM '46755b18-ac33-4e62-b1f0-a746bfeed59b' because VM does not exist, ignoring 2019-12-17 16:05:44,854+01 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.DestroyVDSCommand] (ForkJoinPool-1-worker-2) [] FINISH, DestroyVDSCommand, return: , log id: 1afbc395 2019-12-17 16:05:44,854+01 INFO [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (ForkJoinPool-1-worker-2) [] VM '46755b18-ac33-4e62-b1f0-a746bfeed59b'(TestVM) was unexpectedly detected as 'Down' on VDS 'd104ef2f-98a4-4a99-b28d-2725294f8626'(kvm320.durchhalten.intern) (expected on '7895de21-d174-471f-a0e5-340d59be1bb0') 2019-12-17 16:05:44,854+01 ERROR [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (ForkJoinPool-1-worker-2) [] Migration of VM 'TestVM' to host 'kvm320.durchhalten.intern' failed: VM destroyed during the startup. 2019-12-17 16:05:44,861+01 INFO [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (ForkJoinPool-1-worker-9) [] VM '46755b18-ac33-4e62-b1f0-a746bfeed59b'(TestVM) moved from 'MigratingFrom' --> 'Up' 2019-12-17 16:05:44,862+01 INFO [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (ForkJoinPool-1-worker-9) [] Adding VM '46755b18-ac33-4e62-b1f0-a746bfeed59b'(TestVM) to re-run list 2019-12-17 16:05:44,927+01 ERROR [org.ovirt.engine.core.vdsbroker.monitoring.VmsMonitoring] (ForkJoinPool-1-worker-9) [] Rerun VM '46755b18-ac33-4e62-b1f0-a746bfeed59b'. Called from VDS 'kvm380.durchhalten.intern' 2019-12-17 16:05:44,944+01 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.MigrateStatusVDSCommand] (EE-ManagedThreadFactory-engine-Thread-879) [] START, MigrateStatusVDSCommand(HostName = kvm380.durchhalten.intern, MigrateStatusVDSCommandParameters:{hostId='7895de21-d174-471f-a0e5-340d59be1bb0', vmId='46755b18-ac33-4e62-b1f0-a746bfeed59b'}), log id: 127dcc70 2019-12-17 16:05:44,952+01 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.MigrateStatusVDSCommand] (EE-ManagedThreadFactory-engine-Thread-879) [] FINISH, MigrateStatusVDSCommand, return: , log id: 127dcc70 2019-12-17 16:05:45,228+01 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engine-Thread-879) [] EVENT_ID: VM_MIGRATION_TO_SERVER_FAILED(120), Migration failed (VM: TestVM, Source: kvm380.durchhalten.intern, Destination: kvm320.durchhalten.intern). 2019-12-17 16:05:45,329+01 INFO [org.ovirt.engine.core.bll.MigrateVmToServerCommand] (EE-ManagedThreadFactory-engine-Thread-879) [] Lock freed to object 'EngineLock:{exclusiveLocks='[46755b18-ac33-4e62-b1f0-a746bfeed59b=VM]', sharedLocks=''}' maybe someone can give me a hint thx shb

1 1

Re: Ovirt OVN help needed
by Strahil 16 Dec '19

16 Dec '19

Hi Dominik, All, I've checked 'https://lists.ovirt.org/archives/list/users@ovirt.org/thread/W6U4XJHNMYMD3W…' and the user managed to clear up and start over. I have removed the ovn-external-provider from UI, but I forgot to copy the data from the fields. Do you know any refference guide (or any tips & tricks) for adding OVN ? Thanks in advance. Best Regards, Strahil NikolovOn Dec 12, 2019 20:49, Strahil <hunter86_bg(a)yahoo.com> wrote: > > Hi Dominik, > > Thanks for the reply. > > Sadly the openstack module is missing on the engine and I have to figure it out. > > Can't I just undeploy the ovn and then redeploy it back ? > > Best Regards, > Strahil Nikolov > > On Dec 12, 2019 09:32, Dominik Holler <dholler(a)redhat.com> wrote: >> >> The cleanest way to clean up is to remove all entities on the OpenStack Network API on ovirt-provider-ovn, e.g. by something like >> https://gist.github.com/dominikholler/19bcdc5f14f42ab5f069086fd2ff5e37#file… >> This should work, if not, please report a bug. >> >> To bypass the ovirt-provider-ovn, which is not recommended and might end in an inconsistent state, you could use ovn-nbctl . >> >> >> >> On Thu, Dec 12, 2019 at 3:33 AM Strahil Nikolov <hunter86_bg(a)yahoo.com> wrote: >>> >>> Hi Community, >>> >>> can someone hint me how to get rid of some ports? I just want to 'reset' my ovn setup. >>> >>> Here is what I have so far: >>> >>> [root@ovirt1 openvswitch]# ovs-vsctl list interface >>> _uuid : be89c214-10e4-4a97-a9eb-1b82bc433a24 >>> admin_state : up >>> bfd : {} >>> bfd_status : {} >>> cfm_fault : [] >>> cfm_fault_status : [] >>> cfm_flap_count : [] >>> cfm_health : [] >>> cfm_mpid : [] >>> cfm_remote_mpids : [] >>> cfm_remote_opstate : [] >>> duplex : [] >>> error : [] >>> external_ids : {} >>> ifindex : 35 >>> ingress_policing_burst: 0 >>> ingress_policing_rate: 0 >>> lacp_current : [] >>> link_resets : 0 >>> link_speed : [] >>> link_state : up >>> lldp : {} >>> mac : [] >>> mac_in_use : "7a:7d:1d:a7:43:1d" >>> mtu : [] >>> mtu_request : [] >>> name : "ovn-25cc77-0" >>> ofport : 6 >>> ofport_request : [] >>> options : {csum="true", key=flow, remote_ip="192.168.1.64"} >>> other_config : {} >>> statistics : {rx_bytes=0, rx_packets=0, tx_bytes=0, tx_packets=0} >>> status : {tunnel_egress_iface=ovirtmgmt, tunnel_egress_iface_carrier=up} >>> type : geneve >>> >>> _uuid : ec6a6688-e5d6-4346-ac47-ece1b8379440 >>> admin_state : down >>> bfd : {} >>> bfd_status : {} >>> cfm_fault : [] >>> cfm_fault_status : [] >>> cfm_flap_count : [] >>> cfm_health : [] >>> cfm_mpid : [] >>> cfm_remote_mpids : [] >>> cfm_remote_opstate : [] >>> duplex : [] >>> error : [] >>> external_ids : {} >>> ifindex : 13 >>> ingress_policing_burst: 0 >>> ingress_policing_rate: 0 >>> lacp_current : [] >>> link_resets : 0 >>> link_speed : [] >>> link_state : down >>> lldp : {} >>> mac : [] >>> mac_in_use : "66:36:dd:63:dc:48" >>> mtu : 1500 >>> mtu_request : [] >>> name : br-int >>> ofport : 65534 >>> ofport_request : [] >>> options : {} >>> other_config : {} >>> statistics : {collisions=0, rx_bytes=0, rx_crc_err=0, rx_dropped=0, rx_errors=0, rx_frame_err=0, rx_over_err=0, rx_packets=0, tx_bytes=0, tx_dropped=0, tx_errors=0, tx_packets=0} >>> status : {driver_name=openvswitch} >>> type : internal >>> >>> _uuid : 1e511b4d-f7c2-499f-bd8c-07236e7bb7af >>> admin_state : up >>> bfd : {} >>> bfd_status : {} >>> cfm_fault : [] >>> cfm_fault_status : [] >>> cfm_flap_count : [] >>> cfm_health : [] >>> cfm_mpid : [] >>> cfm_remote_mpids : [] >>> cfm_remote_opstate : [] >>> duplex : [] >>> error : [] >>> external_ids : {} >>> ifindex : 35 >>> ingress_policing_burst: 0 >>> ingress_policing_rate: 0 >>> lacp_current : [] >>> link_resets : 0 >>> link_speed : [] >>> link_state : up >>> lldp : {} >>> mac : [] >>> mac_in_use : "1a:85:d1:d9:e2:a5" >>> mtu : [] >>> mtu_request : [] >>> name : "ovn-566849-0" >>> ofport : 5 >>> ofport_request : [] >>> options : {csum="true", key=flow, remote_ip="192.168.1.41"} >>> other_config : {} >>> statistics : {rx_bytes=0, rx_packets=0, tx_bytes=0, tx_packets=0} >>> status : {tunnel_egress_iface=ovirtmgmt, tunnel_egress_iface_carrier=up} >>> type : geneve >>> >>> >>> When I try to remove a port - it never ends (just hanging): >>> >>> [root@ovirt1 openvswitch]# ovs-vsctl --dry-run del-port br-int ovn-25cc77-0 >>> In journal I see only this: >>> дек 12 04:13:57 ovirt1.localdomain ovs-vsctl[22030]: ovs|00001|vsctl|INFO|Called as ovs-vsctl --dry-run del-port br-int ovn-25cc77-0 >>> >>> The stranger part to me is the log output: >>> >>> [root@ovirt1 openvswitch]# grep ovn-25cc77-0 /var/log/openvswitch/*.log >>> /var/log/openvswitch/ovs-vswitchd.log:2019-12-12T01:26:28.642Z|00032|bridge|INFO|bridge br-int: added interface ovn-25cc77-0 on port 14 >>> /var/log/openvswitch/ovs-vswitchd.log:2019-12-12T01:45:15.646Z|00113|bridge|INFO|bridge br-int: deleted interface ovn-25cc77-0 on port 14 >>> /var/log/openvswitch/ovs-vswitchd.log:2019-12-12T01:45:15.861Z|00116|bridge|INFO|bridge br-int: added interface ovn-25cc77-0 on port 2 >>> /var/log/openvswitch/ovs-vswitchd.log:2019-12-12T01:50:36.678Z|00118|bridge|INFO|bridge br-int: deleted interface ovn-25cc77-0 on port 2 >>> /var/log/openvswitch/ovs-vswitchd.log:2019-12-12T01:52:31.180Z|00121|bridge|INFO|bridge br-int: added interface ovn-25cc77-0 on port 3 >>> /var/log/openvswitch/ovs-vswitchd.log:2019-12-12T01:55:09.734Z|00125|bridge|INFO|bridge br-int: deleted interface ovn-25cc77-0 on port 3 >>> /var/log/openvswitch/ovs-vswitchd.log:2019-12-12T01:58:15.138Z|00127|bridge|INFO|bridge br-int: added interface ovn-25cc77-0 on port 6 >>> >>> I'm also attaching the verbose output of the dryrun. >>> >>> Thanks in advance. >>> >>> Best Regards, >>> Strahil Nikolov >>> _______________________________________________ >>> Users mailing list -- users(a)ovirt.org >>> To unsubscribe send an email to users-leave(a)ovirt.org >>> Privacy Statement: https://www.ovirt.org/site/privacy-policy/ >>> oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/ >>> List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/IG7YJPINVAF2B…

3 3

Cannot forward traffic through VXLAN
by k.betsis＠gmail.com 16 Dec '19

16 Dec '19

Hi all I have a VM network created with some hosts and I have included a vyos router acting as a Layer 2 extension to another hypervisor through VXLAN. I can see traffic reaching VMs from the other hypervisor to the ovirt hosted VMs. I can see traffic leaving the VMs hosted on the ovirt hypervisor. However, i do not see return traffic reaching the vyos VXLAN hosted on ovirt. I believe the VM network drops return traffic based on the destination MAC address. However, i have created the VM Network with security disabled. Can you please assist on how to troubleshoot?

2 9

Postgres stuck on 100% after Engine upgrade from 4.2.8 to 4.3.7
by Vrgotic, Marko 16 Dec '19

16 Dec '19

Hi oVirt, Since oVirt upgrade from 4.2.8 to 4.3.7, SHE postgres is running 100% CPU with 4 processes, related to ovirt_engine_history DB: PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 2617 postgres 20 0 372668 30876 26352 R 100.0 0.2 63:40.19 postgres: ovirt_engine_history ovirt_engine_history 127.0.0.1(50226) INSERT 3554 postgres 20 0 368680 15120 11716 R 100.0 0.1 56:45.72 postgres: ovirt_engine_history ovirt_engine_history 127.0.0.1(50448) BIND 2623 postgres 20 0 368068 17640 14896 R 99.3 0.1 42:35.36 postgres: ovirt_engine_history ovirt_engine_history 127.0.0.1(50228) BIND 2880 postgres 20 0 368680 20404 17004 R 99.0 0.1 65:13.96 postgres: ovirt_engine_history ovirt_engine_history 127.0.0.1(50252) BIND Output of select * from pg_stat_activity; is giving me following: 16401 | ovirt_engine_history | 2617 | 16386 | ovirt_engine_history | | 127.0.0.1 | | 50226 | 2019-12-09 13:37:07.779261+01 | 2019-12-09 13:37:09.555704+01 | 2019-12-09 13:39:04.915431+01 | 2019-12-09 13:39:04.915432+01 | | | active | 260332629 | 260332629 | INSERT INTO vm_disks_usage_samples_history (history_datetime,vm_id,disks_usage) VALUES ($1,$2,$3) | client backend 16401 | ovirt_engine_history | 2623 | 16386 | ovirt_engine_history | | 127.0.0.1 | | 50228 | 2019-12-09 13:37:07.820982+01 | 2019-12-09 14:00:00.042448+01 | 2019-12-09 14:00:00.997228+01 | 2019-12-09 14:00:00.997229+01 | | | active | 260333458 | 260332629 | SELECT +| client backend | | | | | | | | | | | | | | | | | | history_datetime, +| | | | | | | | | | | | | | | | | | | vm_id, +| | | | | | | | | | | | | | | | | | | disks_usage +| | | | | | | | | | | | | | | | | | | FROM vm_disks_usage_samples_history +| | | | | | | | | | | | | | | | | | | WHERE history_datetime >= '2019-12-09 07:00:00.000000+0100' +| | | | | | | | | | | | | | | | | | | AND history_datetime < '2019-12-09 08:00:00.000000+0100' +| | | | | | | | | | | | | | | | | | | ORDER BY history_id DESC, +| | | | | | | | | | | | | | | | | | | vm_id | 16401 | ovirt_engine_history | 2624 | 16386 | ovirt_engine_history | | 127.0.0.1 | | 50230 | 2019-12-09 13:37:07.826391+01 | | | 2019-12-09 13:37:07.835886+01 | Client | ClientRead | idle | | | | client backend 16401 | ovirt_engine_history | 2625 | 16386 | ovirt_engine_history | | 127.0.0.1 | | 50232 | 2019-12-09 13:37:07.847262+01 | | | 2019-12-09 13:37:07.850961+01 | Client | ClientRead | idle | | | | client backend 13214 | postgres | 6275 | 10 | postgres | psql | | | -1 | 2019-12-09 14:37:13.890996+01 | 2019-12-09 14:46:36.439887+01 | 2019-12-09 14:46:36.439887+01 | 2019-12-09 14:46:36.439897+01 | | | active | | 260332629 | select * from pg_stat_activity; | client backend [A screenshot of a computer Description automatically generated] Any ideas why this could be happening? Kindly awaiting your reply. ----- kind regards/met vrindelijke groet Marko Vrgotic ActiveVideo

2 3

Re: Problems after 4.3.8 update
by Strahil 15 Dec '19

15 Dec '19

I didn't report it as nobody has mentioned it and I thought it's a onetime issue. I am,now, quite confident that is a bug. Are you using the gluster fuse mounts ( the ones in /rhev...) or libgfapi ? Can you open a case in the bugzilla.redhat.com ? Best Regards, Strahil NikolovOn Dec 15, 2019 13:16, Jayme <jaymef(a)gmail.com> wrote: > > I compared each file across my nodes and synced them. It seems to have resolved my issue. > > I wonder if there is a problem with 6.5 to 6.6 upgrade that is causing the problem? It’s strange that it seems to have happened to more than one person. I was also following proper upgrade procedure. > > > > On Sun, Dec 15, 2019 at 3:09 AM <hunter86_bg(a)yahoo.com> wrote: >> >> I don't know. I had the same issues when I migrated my gluster from v6.5 to 6.6 (currently running v7.0). >> Just get the newest file and rsync it to the rest of the bricks. It will solve the '?????? ??????' problem. >> >> Best Regards, >> Strahil Nikolov >> В неделя, 15 декември 2019 г., 3:49:27 ч. Гринуич+2, Jayme <jaymef(a)gmail.com> написа: >> >> >> on that page it says to check open bugs and the migration bug you mention does not appear to be on the list. Has it been resolved or is it just missing from this page? >> >> On Sat, Dec 14, 2019 at 7:53 PM Strahil Nikolov <hunter86_bg(a)yahoo.com> wrote: >>> >>> Nah... this is not gonna fix your issue and is unnecessary. >>> Just compare the data from all bricks ... most probably the 'Last Updated' is different and the gfid of the file is different. >>> Find the brick that has the most fresh data, and replace (move away as a backup and rsync) the file from last good copy to the other bricks. >>> You can also run a 'full heal'. >>> >>> Best Regards, >>> Strahil Nikolov >>> >>> В събота, 14 декември 2019 г., 21:18:44 ч. Гринуич+2, Jayme <jaymef(a)gmail.com> написа: >>> >>> >>> *Update* >>> >>> Situation has improved. All VMs and engine are running. I'm left right now with about 2 heal entries in each glusterfs storage volume that will not heal. >>> >>> In all cases each heal entry is related to an OVF_STORE image and the problem appears to be an issue with the gluster metadata for those ovf_store images. When I look at the files shown in gluster volume heal info output I'm seeing question marks on the meta files which indicates an attribute/gluster problem (even though there is no split-brain). And I get input/output error when attempting to do anything with the files. >>> >>> If I look at the files on each host in /gluster_bricks they all look fine. I only see question marks on the meta files when look at the file in /rhev mounts >>> >>> Does anyone know how I can correct the attributes on these OVF_STORE files? I've tried putting each host in maintenance and re-activating to re-mount gluster volumes. I've also stopped and started all gluster volumes. >>> >>> I'm thinking I might be able to solve this by shutting down all VMs and placing all hosts in maintenance and safely restarting the entire cluster.. but that may not be necessary? >>> >>> On Fri, Dec 13, 2019 at 12:59 AM Jayme <jaymef(a)gmail.com> wrote: >>>> >>>> I believe I was able to get past this by stopping the engine volume then unmounting the glusterfs engine mount on all hosts and re-starting the volume. I was able to start hostedengine on host0. >>>> >>>> I'm still facing a few problems: >>>> >>>> 1. I'm still seeing this issue in each host's logs: >>>> >>>> Dec 13 00:57:54 orchard0 journal: ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config.vm ERROR Failed scanning for OVF_STORE due to Command Volume.getInfo with args {'storagepoolID': '00000000-0000-0000-0000-000000000000', 'storagedomainID': 'd70b171e-7488-4d52-8cad-bbc581dbf16e', 'volumeID': u'2632f423-ed89-43d9-93a9-36738420b866', 'imageID': u'd909dc74-5bbd-4e39-b9b5-755c167a6ee8'} failed:#012(code=201, message=Volume does not exist: (u'2632f423-ed89-43d9-93a9-36738420b866',)) >>>> Dec 13 00:57:54 orchard0 journal: ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config.vm ERROR Unable to identify the OVF_STORE volume, falling back to initial vm.conf. Please ensure you already added your first data domain for regular VMs >>>> >>>> >>>> 2. Most of my gluster volumes still have un-healed entires which I can't seem to heal. I'm not sure what the answer is here. >>>> >>>> On Fri, Dec 13, 2019 at 12:33 AM Jayme <jaymef(a)gmail.com> wrote: >>>>> >>>>> I was able to get the hosted engine started manually via Virsh after re-creating a missing symlink in /var/run/vdsm/storage -- I later shut it down and am still having the same problem with ha broker starting. It appears that the problem *might* be with a corrupt ha metadata file, although gluster is not stating there is split brain on the engine volume >>>>> >>>>> I'm seeing this: >>>>> >>>>> ls -al /rhev/data-center/mnt/glusterSD/orchard0\:_engine/d70b171e-7488-4d52-8cad-bbc581dbf16e/ha_agent/ >>>>> ls: cannot access /rhev/data-center/mnt/glusterSD/orchard0:_engine/d70b171e-7488-4d52-8cad-bbc581dbf16e/ha_agent/hosted-engine.metadata: Input/output error >>>>> total 0 >>>>> drwxr-xr-x. 2 vdsm kvm 67 Dec 13 00:30 . >>>>> drwxr-xr-x. 6 vdsm kvm 64 Aug 6 2018 .. >>>>> lrwxrwxrwx. 1 vdsm kvm 132 Dec 13 00:30 hosted-engine.lockspace -> /var/run/vdsm/storage/d70b171e-7488-4d52-8cad-bbc581dbf16e/03a8ee8e-91f5-4e06-904b-9ed92a9706eb/db2699ce-6349-4020-b52d-8ab11d01e26d >>>>> l?????????? ? ? ? ? ? hosted-engine.metadata >>>>> >>>>> Clearly showing some sort of issue with hosted-engine.metadata on the client mount. >>>>> >>>>> on each node in /gluster_bricks I see this: >>>>> >>>>> # ls -al /gluster_bricks/engine/engine/d70b171e-7488-4d52-8cad-bbc581dbf16e/ha_agent/ >>>>> total 0 >>>>> drwxr-xr-x. 2 vdsm kvm 67 Dec 13 00:31 . >>>>> drwxr-xr-x. 6 vdsm kvm 64 Aug 6 2018 .. >>>>> lrwxrwxrwx. 2 vdsm kvm 132 Dec 13 00:31 hosted-engine.lockspace -> /var/run/vdsm/storage/d70b171e-7488-4d52-8cad-bbc581dbf16e/03a8ee8e-91f5-4e06-904b-9ed92a9706eb/db2699ce-6349-4020-b52d-8ab11d01e26d >>>>> lrwxrwxrwx. 2 vdsm kvm 132 Dec 12 16:30 hosted-engine.metadata -> /var/run/vdsm/storage/d70b171e-7488-4d52-8cad-bbc581dbf16e/66bf05fa-bf50-45ec-98d8-d00002040317/a2250415-5ff0-42ab-8071-cd9d67c3048c >>>>> >>>>> ls -al /var/run/vdsm/storage/d70b171e-7488-4d52-8cad-bbc581dbf16e/66bf05fa-bf50-45ec-98d8-d00002040317/a2250415-5ff0-42ab-8071-cd9d67c3048c >>>>> -rw-rw----. 1 vdsm kvm 1073741824 Dec 12 16:48 /var/run/vdsm/storage/d70b171e-7488-4d52-8cad-bbc581dbf16e/66bf05fa-bf50-45ec-98d8-d00002040317/a2250415-5ff0-42ab-8071-cd9d67c3048c >>>>> >>>>> >>>>> I'm not sure how to proceed at this point. Do I have data corruption, a gluster split-brain issue or something else? Maybe I just need to re-generate metadata for the hosted engine? >>>>> >>>>> On Thu, Dec 12, 2019 at 6:36 PM Jayme <jaymef(a)gmail.com> wrote: >>>>>> >>>>>> I'm running a three server HCI. Up and running on 4.3.7 with no problems. Today I updated to 4.3.8. Engine upgraded fine, rebooted. First host updated fine, rebooted and let all gluster volumes heal. Put second host in maintenance, upgraded successfully, rebooted. Waited for gluster volumes to heal for over an hour but the heal process was not completing. I tried restarting gluster services as well as the host with no success. >>>>>> >>>>>> I'm in a state right now where there are pending heals on almost all of my volumes. Nothing is reporting split-brain, but the heals are not completing. >>>>>> >>>>>> All vms are still currently running except hosted engine. Hosted engine was running but on the 2nd host I upgraded I was seeing errors such as: >>>>>> >>>>>> Dec 12 16:34:39 orchard2 journal: ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config.vm ERROR Failed scanning for OVF_STORE due to Command Volume.getInfo with args {'storagepoolID': '00000000-0000-0000-0000-000000000000', 'storagedomainID': 'd70b171e-7488-4d52-8cad-bbc581dbf16e', 'volumeID': u'2632f423-ed89-43d9-93a9-36738420b866', 'imageID': u'd909dc74-5bbd-4e39-b9b5-755c167a6ee8'} failed:#012(code=201, message=Volume does not exist: (u'2632f423-ed89-43d9-93a9-36738420b866',)) >>>>>> >>>>>> I shut down the engine VM and attempted a manual heal on the engine volume. I cannot start the engine on any host now. I get: >>>>>> >>>>>> The hosted engine configuration has not been retrieved from shared storage. Please ensure that ovirt-ha-agent is running and the storage server is reachable. >>>>>> >>>>>> I'm seeing ovirt-ha-agent crashing on all three nodes: >>>>>> >>>>>> Dec 12 18:30:48 orchard0 python: detected unhandled Python exception in '/usr/share/ovirt-hosted-engine-ha/ovirt-ha-broker' >>>>>> Dec 12 18:30:48 orchard0 abrt-server: Duplicate: core backtrace >>>>>> Dec 12 18:30:48 orchard0 abrt-server: DUP_OF_DIR: /var/tmp/abrt/Python-2019-03-14-21:02:52-44318 >>>>>> Dec 12 18:30:48 orchard0 abrt-server: Deleting problem directory Python-2019-12-12-18:30:48-23193 (dup of Python-2019-03-14-21:02:52-44318) >>>>>> Dec 12 18:30:49 orchard0 vdsm[6087]: ERROR failed to retrieve Hosted Engine HA score '[Errno 2] No such file or directory'Is the Hosted Engine setup finished? >>>>>> Dec 12 18:30:49 orchard0 systemd: ovirt-ha-broker.service: main process exited, code=exited, status=1/FAILURE >>>>>> Dec 12 18:30:49 orchard0 systemd: Unit ovirt-ha-broker.service entered failed state. >>>>>> Dec 12 18:30:49 orchard0 systemd: ovirt-ha-broker.service failed. >>>>>> Dec 12 18:30:49 orchard0 systemd: ovirt-ha-broker.service holdoff time over, scheduling restart. >>>>>> Dec 12 18:30:49 orchard0 systemd: Cannot add dependency job for unit lvm2-lvmetad.socket, ignoring: Unit is masked. >>>>>> Dec 12 18:30:49 orchard0 systemd: Stopped oVirt Hosted Engine High Availability Communications Broker. >>>>>> >>>>>> >>>>>> Here is what gluster volume heal info on engine looks like, it's similar on other volumes as well (although more heals pending on some of those): >>>>>> >>>>>> gluster volume heal engine info >>>>>> Brick gluster0:/gluster_bricks/engine/engine >>>>>> /d70b171e-7488-4d52-8cad-bbc581dbf16e/images/d909dc74-5bbd-4e39-b9b5-755c167a6ee8/2632f423-ed89-43d9-93a9-36738420b866.meta >>>>>> /d70b171e-7488-4d52-8cad-bbc581dbf16e/images/053171e4-f782-42d7-9115-c602beb3c826/627b8f93-5373-48bb-bd20-a308a455e082.meta >>>>>> /d70b171e-7488-4d52-8cad-bbc581dbf16e/master/tasks/a9b11e33-9b93-46a0-a36e-85063fd53ebe.backup >>>>>> /d70b171e-7488-4d52-8cad-bbc581dbf16e/dom_md/ids >>>>>> Status: Connected >>>>>> Number of entries: 4 >>>>>> >>>>>> Brick gluster1:/gluster_bricks/engine/engine >>>>>> /d70b171e-7488-4d52-8cad-bbc581dbf16e/images/d909dc74-5bbd-4e39-b9b5-755c167a6ee8/2632f423-ed89-43d9-93a9-36738420b866.meta >>>>>> /d70b171e-7488-4d52-8cad-bbc581dbf16e/master/tasks/a9b11e33-9b93-46a0-a36e-85063fd53ebe.backup >>>>>> /d70b171e-7488-4d52-8cad-bbc581dbf16e/images/053171e4-f782-42d7-9115-c602beb3c826/627b8f93-5373-48bb-bd20-a308a455e082.meta >>>>>> /d70b171e-7488-4d52-8cad-bbc581dbf16e/dom_md/ids >>>>>> Status: Connected >>>>>> Number of entries: 4 >>>>>> >>>>>> Brick gluster2:/gluster_bricks/engine/engine >>>>>> Status: Connected >>>>>> Number of entries: 0 >>>>>> >>>>>> I don't see much in vdsm.log and gluster logs look fairly normal to me, I'm not seeing any obvious errors in the gluster logs. >>>>>> >>>>>> As far as I can tell the underlying storage is fine. Why are my gluster volumes not healing and why is self-hosted engine failing to start? >>>>>> >>>>>> More agent and broker logs: >>>>>> >>>>>> ==> agent.log <== >>>>>> MainThread::ERROR::2019-12-12 18:36:09,056::hosted_engine::559::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker) Failed to start necessary monitors >>>>>> MainThread::ERROR::2019-12-12 18:36:09,058::agent::144::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Traceback (most recent call last): >>>>>> File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 131, in _run_agent >>>>>> return action(he) >>>>>> File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 55, in action_proper >>>>>> return he.start_monitoring() >>>>>> File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 432, in start_monitoring >>>>>> self._initialize_broker() >>>>>> File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 556, in _initialize_broker >>>>>> m.get('options', {})) >>>>>> File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 89, in start_monitor >>>>>> ).format(t=type, o=options, e=e) >>>>>> RequestError: brokerlink - failed to start monitor via ovirt-ha-broker: [Errno 2] No such file or directory, [monitor: 'network', options: {'tcp_t_address': None, 'network_test': None, 'tcp_t_port': None, 'addr': '10.11.0.254'}] >>>>>> >>>>>> MainThread::ERROR::2019-12-12 18:36:09,058::agent::145::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Trying to restart agent >>>>>> MainThread::ERROR::2019-12-12 18:36:19,619::hosted_engine::559::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker) Failed to start necessary monitors >>>>>> MainThread::ERROR::2019-12-12 18:36:19,619::agent::144::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Traceback (most recent call last): >>>>>> File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 131, in _run_agent >>>>>> return action(he) >>>>>> File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 55, in action_proper >>>>>> return he.start_monitoring() >>>>>> File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 432, in start_monitoring >>>>>> self._initialize_broker() >>>>>> File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 556, in _initialize_broker >>>>>> m.get('options', {})) >>>>>> File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 89, in start_monitor >>>>>> ).format(t=type, o=options, e=e) >>>>>> RequestError: brokerlink - failed to start monitor via ovirt-ha-broker: [Errno 2] No such file or directory, [monitor: 'network', options: {'tcp_t_address': None, 'network_test': None, 'tcp_t_port': None, 'addr': '10.11.0.254'}] >>>>>> >>>>>> MainThread::ERROR::2019-12-12 18:36:19,619::agent::145::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Trying to restart agent >>>>>> MainThread::ERROR::2019-12-12 18:36:30,568::hosted_engine::559::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker) Failed to start necessary monitors >>>>>> MainThread::ERROR::2019-12-12 18:36:30,570::agent::144::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Traceback (most recent call last): >>>>>> File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 131, in _run_agent >>>>>> return action(he) >>>>>> File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 55, in action_proper >>>>>> return he.start_monitoring() >>>>>> File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 432, in start_monitoring >>>>>> self._initialize_broker() >>>>>> File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 556, in _initialize_broker >>>>>> m.get('options', {})) >>>>>> File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 89, in start_monitor >>>>>> ).format(t=type, o=options, e=e) >>>>>> RequestError: brokerlink - failed to start monitor via ovirt-ha-broker: [Errno 2] No such file or directory, [monitor: 'network', options: {'tcp_t_address': None, 'network_test': None, 'tcp_t_port': None, 'addr': '10.11.0.254'}] >>>>>> >>>>>> MainThread::ERROR::2019-12-12 18:36:30,570::agent::145::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Trying to restart agent >>>>>> MainThread::ERROR::2019-12-12 18:36:41,581::hosted_engine::559::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker) Failed to start necessary monitors >>>>>> MainThread::ERROR::2019-12-12 18:36:41,583::agent::144::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Traceback (most recent call last): >>>>>> File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 131, in _run_agent >>>>>> return action(he) >>>>>> File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 55, in action_proper >>>>>> return he.start_monitoring() >>>>>> File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 432, in start_monitoring >>>>>> self._initialize_broker() >>>>>> File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 556, in _initialize_broker >>>>>> m.get('options', {})) >>>>>> File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 89, in start_monitor >>>>>> ).format(t=type, o=options, e=e) >>>>>> RequestError: brokerlink - failed to start monitor via ovirt-ha-broker: [Errno 2] No such file or directory, [monitor: 'network', options: {'tcp_t_address': None, 'network_test': None, 'tcp_t_port': None, 'addr': '10.11.0.254'}] >>>>>> >>>>>> MainThread::ERROR::2019-12-12 18:36:41,583::agent::145::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Trying to restart agent >>>>>> >>>>>> >>> _______________________________________________ >>> Users mailing list -- users(a)ovirt.org >>> To unsubscribe send an email to users-leave(a)ovirt.org >>> Privacy Statement: https://www.ovirt.org/site/privacy-policy/ >>> oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/ >>> List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/U5YFDWCQJYNAL…

2 1

Problems after 4.3.8 update
by Jayme 15 Dec '19

15 Dec '19

I'm running a three server HCI. Up and running on 4.3.7 with no problems. Today I updated to 4.3.8. Engine upgraded fine, rebooted. First host updated fine, rebooted and let all gluster volumes heal. Put second host in maintenance, upgraded successfully, rebooted. Waited for gluster volumes to heal for over an hour but the heal process was not completing. I tried restarting gluster services as well as the host with no success. I'm in a state right now where there are pending heals on almost all of my volumes. Nothing is reporting split-brain, but the heals are not completing. All vms are still currently running except hosted engine. Hosted engine was running but on the 2nd host I upgraded I was seeing errors such as: Dec 12 16:34:39 orchard2 journal: ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config.vm ERROR Failed scanning for OVF_STORE due to Command Volume.getInfo with args {'storagepoolID': '00000000-0000-0000-0000-000000000000', 'storagedomainID': 'd70b171e-7488-4d52-8cad-bbc581dbf16e', 'volumeID': u'2632f423-ed89-43d9-93a9-36738420b866', 'imageID': u'd909dc74-5bbd-4e39-b9b5-755c167a6ee8'} failed:#012(code=201, message=Volume does not exist: (u'2632f423-ed89-43d9-93a9-36738420b866',)) I shut down the engine VM and attempted a manual heal on the engine volume. I cannot start the engine on any host now. I get: The hosted engine configuration has not been retrieved from shared storage. Please ensure that ovirt-ha-agent is running and the storage server is reachable. I'm seeing ovirt-ha-agent crashing on all three nodes: Dec 12 18:30:48 orchard0 python: detected unhandled Python exception in '/usr/share/ovirt-hosted-engine-ha/ovirt-ha-broker' Dec 12 18:30:48 orchard0 abrt-server: Duplicate: core backtrace Dec 12 18:30:48 orchard0 abrt-server: DUP_OF_DIR: /var/tmp/abrt/Python-2019-03-14-21:02:52-44318 Dec 12 18:30:48 orchard0 abrt-server: Deleting problem directory Python-2019-12-12-18:30:48-23193 (dup of Python-2019-03-14-21:02:52-44318) Dec 12 18:30:49 orchard0 vdsm[6087]: ERROR failed to retrieve Hosted Engine HA score '[Errno 2] No such file or directory'Is the Hosted Engine setup finished? Dec 12 18:30:49 orchard0 systemd: ovirt-ha-broker.service: main process exited, code=exited, status=1/FAILURE Dec 12 18:30:49 orchard0 systemd: Unit ovirt-ha-broker.service entered failed state. Dec 12 18:30:49 orchard0 systemd: ovirt-ha-broker.service failed. Dec 12 18:30:49 orchard0 systemd: ovirt-ha-broker.service holdoff time over, scheduling restart. Dec 12 18:30:49 orchard0 systemd: Cannot add dependency job for unit lvm2-lvmetad.socket, ignoring: Unit is masked. Dec 12 18:30:49 orchard0 systemd: Stopped oVirt Hosted Engine High Availability Communications Broker. Here is what gluster volume heal info on engine looks like, it's similar on other volumes as well (although more heals pending on some of those): gluster volume heal engine info Brick gluster0:/gluster_bricks/engine/engine /d70b171e-7488-4d52-8cad-bbc581dbf16e/images/d909dc74-5bbd-4e39-b9b5-755c167a6ee8/2632f423-ed89-43d9-93a9-36738420b866.meta /d70b171e-7488-4d52-8cad-bbc581dbf16e/images/053171e4-f782-42d7-9115-c602beb3c826/627b8f93-5373-48bb-bd20-a308a455e082.meta /d70b171e-7488-4d52-8cad-bbc581dbf16e/master/tasks/a9b11e33-9b93-46a0-a36e-85063fd53ebe.backup /d70b171e-7488-4d52-8cad-bbc581dbf16e/dom_md/ids Status: Connected Number of entries: 4 Brick gluster1:/gluster_bricks/engine/engine /d70b171e-7488-4d52-8cad-bbc581dbf16e/images/d909dc74-5bbd-4e39-b9b5-755c167a6ee8/2632f423-ed89-43d9-93a9-36738420b866.meta /d70b171e-7488-4d52-8cad-bbc581dbf16e/master/tasks/a9b11e33-9b93-46a0-a36e-85063fd53ebe.backup /d70b171e-7488-4d52-8cad-bbc581dbf16e/images/053171e4-f782-42d7-9115-c602beb3c826/627b8f93-5373-48bb-bd20-a308a455e082.meta /d70b171e-7488-4d52-8cad-bbc581dbf16e/dom_md/ids Status: Connected Number of entries: 4 Brick gluster2:/gluster_bricks/engine/engine Status: Connected Number of entries: 0 I don't see much in vdsm.log and gluster logs look fairly normal to me, I'm not seeing any obvious errors in the gluster logs. As far as I can tell the underlying storage is fine. Why are my gluster volumes not healing and why is self-hosted engine failing to start? More agent and broker logs: ==> agent.log <== MainThread::ERROR::2019-12-12 18:36:09,056::hosted_engine::559::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker) Failed to start necessary monitors MainThread::ERROR::2019-12-12 18:36:09,058::agent::144::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Traceback (most recent call last): File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 131, in _run_agent return action(he) File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 55, in action_proper return he.start_monitoring() File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 432, in start_monitoring self._initialize_broker() File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 556, in _initialize_broker m.get('options', {})) File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 89, in start_monitor ).format(t=type, o=options, e=e) RequestError: brokerlink - failed to start monitor via ovirt-ha-broker: [Errno 2] No such file or directory, [monitor: 'network', options: {'tcp_t_address': None, 'network_test': None, 'tcp_t_port': None, 'addr': '10.11.0.254'}] MainThread::ERROR::2019-12-12 18:36:09,058::agent::145::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Trying to restart agent MainThread::ERROR::2019-12-12 18:36:19,619::hosted_engine::559::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker) Failed to start necessary monitors MainThread::ERROR::2019-12-12 18:36:19,619::agent::144::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Traceback (most recent call last): File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 131, in _run_agent return action(he) File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 55, in action_proper return he.start_monitoring() File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 432, in start_monitoring self._initialize_broker() File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 556, in _initialize_broker m.get('options', {})) File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 89, in start_monitor ).format(t=type, o=options, e=e) RequestError: brokerlink - failed to start monitor via ovirt-ha-broker: [Errno 2] No such file or directory, [monitor: 'network', options: {'tcp_t_address': None, 'network_test': None, 'tcp_t_port': None, 'addr': '10.11.0.254'}] MainThread::ERROR::2019-12-12 18:36:19,619::agent::145::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Trying to restart agent MainThread::ERROR::2019-12-12 18:36:30,568::hosted_engine::559::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker) Failed to start necessary monitors MainThread::ERROR::2019-12-12 18:36:30,570::agent::144::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Traceback (most recent call last): File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 131, in _run_agent return action(he) File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 55, in action_proper return he.start_monitoring() File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 432, in start_monitoring self._initialize_broker() File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 556, in _initialize_broker m.get('options', {})) File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 89, in start_monitor ).format(t=type, o=options, e=e) RequestError: brokerlink - failed to start monitor via ovirt-ha-broker: [Errno 2] No such file or directory, [monitor: 'network', options: {'tcp_t_address': None, 'network_test': None, 'tcp_t_port': None, 'addr': '10.11.0.254'}] MainThread::ERROR::2019-12-12 18:36:30,570::agent::145::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Trying to restart agent MainThread::ERROR::2019-12-12 18:36:41,581::hosted_engine::559::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker) Failed to start necessary monitors MainThread::ERROR::2019-12-12 18:36:41,583::agent::144::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Traceback (most recent call last): File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 131, in _run_agent return action(he) File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 55, in action_proper return he.start_monitoring() File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 432, in start_monitoring self._initialize_broker() File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 556, in _initialize_broker m.get('options', {})) File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 89, in start_monitor ).format(t=type, o=options, e=e) RequestError: brokerlink - failed to start monitor via ovirt-ha-broker: [Errno 2] No such file or directory, [monitor: 'network', options: {'tcp_t_address': None, 'network_test': None, 'tcp_t_port': None, 'addr': '10.11.0.254'}] MainThread::ERROR::2019-12-12 18:36:41,583::agent::145::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Trying to restart agent

3 7