VM HostedEngine is down with error
by souvaliotimaria@mail.com
Hello everyone,
I have a replica 2 + arbiter installation, and this morning the Hosted Engine reported the error below in the UI and resumed on a different node (node3) from the one it was originally running on (node1). (The original node has more memory than the one it ended up on, but the latter had a better memory usage percentage at the time.) The only way I discovered that the migration had happened and that there was an Error in Events was that I logged into the oVirt web interface for a routine inspection. Besides that, everything was working properly and still is.
The error that popped is the following:
VM HostedEngine is down with error. Exit message: internal error: qemu unexpectedly closed the monitor:
2020-09-01T06:49:20.749126Z qemu-kvm: warning: All CPU(s) up to maxcpus should be described in NUMA config, ability to start up with partial NUMA mappings is obsoleted and will be removed in future
2020-09-01T06:49:20.927274Z qemu-kvm: -device virtio-blk-pci,iothread=iothread1,scsi=off,bus=pci.0,addr=0x7,drive=drive-ua-d5de54b6-9f8e-4fba-819b-ebf6780757d2,id=ua-d5de54b6-9f8e-4fba-819b-ebf6780757d2,bootindex=1,write-cache=on: Failed to get "write" lock
Is another process using the image?.
From what I could gather, this concerns the following snippet from HostedEngine.xml, which is the virtio disk of the Hosted Engine:
<disk type='file' device='disk' snapshot='no'>
  <driver name='qemu' type='raw' cache='none' error_policy='stop' io='threads' iothread='1'/>
  <source file='/var/run/vdsm/storage/80f6e393-9718-4738-a14a-64cf43c3d8c2/d5de54b6-9f8e-4fba-819b-ebf6780757d2/a48555f4-be23-4467-8a54-400ae7baf9d7'>
    <seclabel model='dac' relabel='no'/>
  </source>
  <target dev='vda' bus='virtio'/>
  <serial>d5de54b6-9f8e-4fba-819b-ebf6780757d2</serial>
  <alias name='ua-d5de54b6-9f8e-4fba-819b-ebf6780757d2'/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
</disk>
I've looked into the logs and the sar output, but I couldn't find anything I could relate to the above errors or that would explain why this happened. Is this a Gluster problem or a QEMU problem?
The Hosted Engine had been manually migrated to node1 five days earlier.
Is there a standard practice I could follow to determine what happened and secure my system?
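For reference, a minimal first check (a sketch only; the image path is the one from the XML snippet above, and the exact commands may need adjusting) would be to see whether another process still holds the disk image open on either node:

```bash
# Hedged sketch: run on node1 and node3 to see whether a second qemu process
# still holds the hosted-engine disk image open (path copied from the XML above).
IMG=/var/run/vdsm/storage/80f6e393-9718-4738-a14a-64cf43c3d8c2/d5de54b6-9f8e-4fba-819b-ebf6780757d2/a48555f4-be23-4467-8a54-400ae7baf9d7

lsof "$IMG"                  # any process with the file open
virsh -r list --all          # domains libvirt knows about on this host
hosted-engine --vm-status    # ha-agent view of where the engine VM runs
```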
Thank you very much for your time,
Maria Souvalioti
LACP across multiple switches
by Jorge Visentini
Hi all.
Is it possible to configure oVirt to work with two NICs in a bond/LACP
across two switches, as in the image below?
[image: LACP_Across_Two_Switchs.png]
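For reference, a hedged sketch of what this usually involves: the two switches have to present themselves as a single LACP peer (stacking, MLAG, vPC or similar), and on the oVirt side the bond is created in Setup Host Networks with 802.3ad options; the bond name and option values below are examples only.

```bash
# Hedged sketch: typical 802.3ad options entered in the oVirt "Setup Host
# Networks" custom bonding options field (example values):
#   mode=802.3ad miimon=100 lacp_rate=fast xmit_hash_policy=layer2+3
# Then, on the host, verify that both slaves joined the same aggregator:
grep -E 'Bonding Mode|Slave Interface|Aggregator ID' /proc/net/bonding/bond0
```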
Thank you all.
You guys do a wonderful job.
--
Att,
Jorge Visentini
+55 55 98432-9868
OVS switch type for hosted-engine
by Devin A. Bougie
Is it possible to set up a hosted engine using the OVS switch type instead of Legacy? If it's not possible to start out with OVS, instructions for switching from Legacy to OVS after the fact would be greatly appreciated.
Many thanks,
Devin
Zanata user request
by Temuri Doghonadze
Hello,
My name is Temuri and I'd like to request a Zanata account to help
with the translation of oVirt into Georgian.
I've done translations of zoiper, FreeCAD, part of gitlab, protonmail,
parts of gnome and KDE, among others.
In case of questions feel free to ask.
BR, Temuri
USB3 redirection
by Rik Theys
Hi,
I'm trying to assign a USB3 controller to a CentOS 7.4 VM in oVirt 4.1
with USB redirection enabled.
I've created the following file in /etc/ovirt-engine/osinfo.conf.d:
01-usb.properties with content
os.other.devices.usb.controller.value = nec-xhci
and have restarted ovirt-engine.
If I disable USB support in the web interface for the VM, the xhci
controller is added to the VM (I can see it on the qemu-kvm command
line), but USB redirection is not available.
If I enable USB support in the UI, no xhci controller is added (only 4
uhci controllers).
Is there a way to make the controllers used for USB redirection xhci controllers?
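For reference, a quick way to confirm which controller models the running VM actually received (a sketch; "myvm" is a placeholder for the VM name) is to dump its libvirt XML on the host:

```bash
# Hedged sketch: list the USB controllers of the running VM as libvirt sees them.
virsh -r dumpxml myvm | grep -A3 "controller type='usb'"
```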
Regards,
Rik
--
Rik Theys
System Engineer
KU Leuven - Dept. Elektrotechniek (ESAT)
Kasteelpark Arenberg 10 bus 2440 - B-3001 Leuven-Heverlee
+32(0)16/32.11.07
----------------------------------------------------------------
<<Any errors in spelling, tact or fact are transmission errors>>
OVN routing and firewalling in oVirt
by Gianluca Cecchi
Hello,
how do we manage routing between different OVN networks in oVirt?
And between OVN networks and physical ones?
Based on the architecture described here:
http://openvswitch.org/support/dist-docs/ovn-architecture.7.html
I see the terms logical router and gateway router respectively, but how do they
apply to an oVirt configuration?
Do I have to choose between setting up a dedicated VM or a physical machine,
or is it applicable/advisable to put the gateway functionality on the oVirt
host itself?
Is there any security policy (like security groups in OpenStack) to
implement?
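For reference, at the plain OVN level two logical switches are joined by a logical router; a minimal sketch with ovn-nbctl is below (switch names, MAC and subnet are invented, and objects created this way are not known to ovirt-provider-ovn/the engine).

```bash
# Hedged sketch: connect an existing OVN logical switch "net1" to a new
# logical router "lr0" on the host running the OVN central services.
ovn-nbctl lr-add lr0
ovn-nbctl lrp-add lr0 lr0-net1 40:44:00:00:00:01 192.168.10.1/24
ovn-nbctl lsp-add net1 net1-lr0 \
    -- lsp-set-type net1-lr0 router \
    -- lsp-set-addresses net1-lr0 router \
    -- lsp-set-options net1-lr0 router-port=lr0-net1
# Repeat the lrp-add/lsp-add pair for each additional network, e.g. net2.
```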
Thanks,
Gianluca
VM failed to start when host's network is down
by lizhijian@fujitsu.com
Posting again after subscribing to the mailing list.
Hi guys,
I have an all-in-one oVirt environment in which one node runs both
vdsm and ovirt-engine.
I have set up the oVirt environment and it works well.
For some reasons, I have to use this oVirt setup with the node's networking down (I unplugged the network cable).
In that case, I noticed that I cannot start a VM anymore.
I wonder if there is a configuration switch to let oVirt work with the node's networking down?
If not, is it possible to make it work in some easy way?
When I try to start a VM with the oVirt API, it responds with:
```bash
[root@74d2ab9cb0 ~]# sh start.sh
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<action>
<async>false</async>
<fault>
<detail>[Cannot run VM. Unknown Data Center status.]</detail>
<reason>Operation Failed</reason>
</fault>
<status>failed</status>
</action>
[root@74d2ab9cb0 ~]# sh start.sh
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<action>
<async>false</async>
<fault>
<detail>[Cannot run VM. Unknown Data Center status.]</detail>
<reason>Operation Failed</reason>
</fault>
<status>failed</status>
</action>
[root@74d2ab9cb0 ~]#
```
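For reference, a hedged way to see what the engine reports for the Data Center and the host while the cable is unplugged (URL, user and password below are placeholders; adjust them to whatever start.sh uses):

```bash
# Hedged sketch: query Data Center and host status through the REST API.
curl -s -k -u admin@internal:PASSWORD \
  https://localhost/ovirt-engine/api/datacenters | grep -E '<name>|<status>'
curl -s -k -u admin@internal:PASSWORD \
  https://localhost/ovirt-engine/api/hosts | grep -E '<name>|<status>'
```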
Attached are the vdsm and ovirt-engine logs.
Thanks
Zhijian
Wait for the engine to come up on the target vm
by Vladimir Belov
I'm trying to deploy oVirt as a self-hosted engine, but at the last step I get an engine startup error.
[ INFO ] TASK [Wait for the engine to come up on the target VM]
[ ERROR ] fatal: [localhost]: FAILED! => {"attempts": 120, "changed": true, "cmd": ["hosted-engine", "--vm-status", "--json"], "delta": "0:00:00.181846", "end": "2022-03-28 15:41:28.853150", "rc": 0, "start": "2022-03-28 15:41:28.671304", "stderr": "", "stderr_lines": [], "stdout": "{\"1\": {\"conf_on_shared_storage\": true, \"live-data\": true, \"extra\": \"metadata_parse_version=1\\nmetadata_feature_version=1\\ntimestamp=5537 (Mon Mar 28 15:41:20 2022)\\nhost-id=1\\nscore=3400\\nvm_conf_refresh_time=5537 (Mon Mar 28 15:41:20 2022)\\nconf_on_shared_storage=True\\nmaintenance=False\\nstate=EngineStarting\\nstopped=False\\n\", \"hostname\": \"v2.test.ru\", \"host-id\": 1, \"engine-status\": {\"reason\": \"failed liveliness check\", \"health\": \"bad\", \"vm\": \"up\", \"detail\": \"Up\"}, \"score\": 3400, \"stopped\": false, \"maintenance\": false, \"crc32\": \"4d2eeaea\", \"local_conf_timestamp\": 5537, \"host-ts\": 5537}, \"global_maintenance\": false}", "stdout_lines": ["{\"1\": {\"conf_on_shared_storage\": true, \"live-data\": true, \"extra\": \"metadata_parse_version=1\\nmetadata_feature_version=1\\ntimestamp=5537 (Mon Mar 28 15:41:20 2022)\\nhost-id=1\\nscore=3400\\nvm_conf_refresh_time=5537 (Mon Mar 28 15:41:20 2022)\\nconf_on_shared_storage=True\\nmaintenance=False\\nstate=EngineStarting\\nstopped=False\\n\", \"hostname\": \"v2.test.ru\", \"host-id\": 1, \"engine-status\": {\"reason\": \"failed liveliness check\", \"health\": \"bad\", \"vm\": \"up\", \"detail\": \"Up\"}, \"score\": 3400, \"stopped\": false, \"maintenance\": false, \"crc32\": \"4d2eeaea\", \"local_conf_timestamp\": 5537, \"host-ts\": 5537}, \"global_maintenance\": false}"]}
After the installation finishes, the engine status is as follows:
Engine status: {"reason": "failed liveliness check", "health": "bad", "vm": "up", "detail": "Up"}
After reading the vdsm logs, I found that qemu-guest-agent had failed to connect for some reason:
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 5400, in qemuGuestAgentShutdown
    self._dom.shutdownFlags(libvirt.VIR_DOMAIN_SHUTDOWN_GUEST_AGENT)
  File "/usr/lib/python2.7/site-packages/vdsm/virt/virdomain.py", line 98, in f
    ret = attr(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/common/libvirtconnection.py", line 130, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/common/function.py", line 92, in wrapper
    return func(inst, *args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 2517, in shutdownFlags
    if ret == -1: raise libvirtError ('virDomainShutdownFlags() failed', dom=self)
libvirtError: Guest agent is not responding: QEMU guest agent is not connected
During the installation phase, qemu-guest-agent on the guest VM is running.
Setting a temporary password (hosted-engine --add-console-password --password) and connecting via VNC also failed.
Using "hosted-engine --console" also failed to connect
The engine VM is running on this host
Connected to HostedEngine domain
Escaping character: ^]
error: internal error: character device <null> not found
The network settings are configured with static addressing, without DHCP.
It seems to me that this happens because the engine gets an IP address that does not match its entry in /etc/hosts, but I do not know how to fix it. Any help is welcome; I will provide the necessary logs. Thanks
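For reference, a minimal check of the name resolution the liveliness check depends on (a sketch; engine.example.com stands for the FQDN chosen during deployment, and the health URL is the one the ha-agent is understood to poll):

```bash
# Hedged sketch: verify the engine FQDN resolves to the static IP configured
# for the engine VM and that the engine health page answers.
getent hosts engine.example.com
ping -c 3 engine.example.com
curl -kL http://engine.example.com/ovirt-engine/services/health
```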
Gluster storage and TRIM VDO
by Oleh Horbachov
Hello everyone. I have a distributed-replicated Gluster cluster deployed. The cluster is the storage backend for oVirt, and the bricks are VDO volumes on top of raw disks. When discarding via 'fstrim -av', the storage hangs for a few seconds and the connection is lost. Does anyone know the best practices for using TRIM with VDO in the context of oVirt?
ovirt - v4.4.10
gluster - v8.6
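For reference, one common mitigation (a sketch only; the brick mount paths are examples) is to avoid `fstrim -av` and instead trim the brick filesystems one at a time, spacing the runs out:

```bash
# Hedged sketch: trim gluster brick mounts one by one rather than all
# filesystems at once, to limit how long any single VDO volume stalls.
for m in /gluster_bricks/*; do
    fstrim -v "$m"
    sleep 60
done
```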
Mac addresses pool issues
by Nicolas MAIRE
Hi,
We're encountering some issues on one of our production clusters running oVirt 4.2. We had an incident with the engine's database a few weeks back that we were able to recover from; however, since then we've been having a bunch of weird issues, mostly around MAC addresses.
It started with the engine being unable to find a free MAC when creating a VM, despite there being significantly fewer virtual interfaces (around 250) than the total number of MACs in the default pool (default configuration, so 65536 addresses). It escalated into the engine creating duplicate MACs (despite the pool not allowing them), and now we can't even modify the pool or remove VMs (since deleting the attached vNICs fails). So we're stuck with a cluster whose running VMs are fine as long as we don't touch them, but on which we can't create new VMs (or modify the existing ones).
In the engine's log we can see an "Unable to initialize MAC pool due to existing duplicates (Failed with error MAC_POOL_INITIALIZATION_FAILED and code 5010)" error from when we tried to reconfigure the pool this morning (full error stack here: https://pastebin.com/6bKMfbLn). Now, whenever we try to delete a VM or reconfigure the pool, we get a 'Pool for id="58ca604b-017d-0374-0220-00000000014e" does not exist' error (full error stack here: https://pastebin.com/Huy91iig). But if we check the engine's mac_pools table, we can see that it is there:
engine=# select * from mac_pools;
                  id                  |  name   |   description    | allow_duplicate_mac_addresses | default_pool
--------------------------------------+---------+------------------+-------------------------------+--------------
 58ca604b-017d-0374-0220-00000000014e | Default | Default MAC pool | f                             | t
(1 row)

engine=# select * from mac_pool_ranges;
             mac_pool_id              |     from_mac      |      to_mac
--------------------------------------+-------------------+-------------------
 58ca604b-017d-0374-0220-00000000014e | 56:6f:1a:1a:00:00 | 56:6f:1a:1a:ff:ff
(1 row)
I found this Bugzilla entry that seems to apply: https://bugzilla.redhat.com/show_bug.cgi?id=1554180. However, I don't really know how to "reinitialize engine", especially considering that the MAC pool was not configured to allow duplicate MACs to begin with, and I have no idea what impact that reinitialization would have on the current VMs.
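For reference, before touching the pool it may be worth confirming where the duplicates actually are; a hedged sketch (run on the engine host; table and column names as found in recent engine schemas) would be:

```bash
# Hedged sketch: list MAC addresses used by more than one vNIC in the engine DB.
sudo -u postgres psql engine -c \
  "SELECT mac_addr, count(*) FROM vm_interface GROUP BY mac_addr HAVING count(*) > 1;"
```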
I'm quite new to oVirt (only been using it for one year) so any help would be greatly appreciated.