vdsm hook after node upgrade
by Nathanaël Blanchet
Hi,
I've upgraded my hosts from 4.4.9 to 4.4.10 and none of my vdsm hooks
are present anymore... I believed this additional personal data was
persistent across updates...
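For reference, hook scripts copied in by hand (rather than installed via the vdsm-hook-* RPMs) live under /usr/libexec/vdsm/hooks and are not tracked by the package manager, so they can disappear on a node upgrade. A minimal sketch for preserving them, assuming the standard hook location (the backup path is just an example):

```shell
# Minimal sketch, assuming hooks were copied in by hand: vdsm looks for
# hook scripts under /usr/libexec/vdsm/hooks/<hook_point>/, and only the
# vdsm-hook-* RPM packages are reinstalled automatically on upgrade.
backup_hooks() {
    hooks_dir="$1"   # e.g. /usr/libexec/vdsm/hooks
    backup="$2"      # e.g. /root/vdsm-hooks-backup.tar.gz
    [ -d "$hooks_dir" ] || { echo "no hooks dir: $hooks_dir" >&2; return 1; }
    # Archive relative to the parent dir so the tarball contains "hooks/..."
    tar czf "$backup" -C "$(dirname "$hooks_dir")" "$(basename "$hooks_dir")"
    echo "backed up $hooks_dir to $backup"
}

# On a host, before the upgrade:
#   backup_hooks /usr/libexec/vdsm/hooks /root/vdsm-hooks-backup.tar.gz
# and afterwards:
#   tar xzf /root/vdsm-hooks-backup.tar.gz -C /usr/libexec/vdsm
#   systemctl restart vdsmd
```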
--
Nathanaël Blanchet
Network supervision
SIRE
227 avenue Professeur-Jean-Louis-Viala
34193 MONTPELLIER CEDEX 5
Tel. 33 (0)4 67 54 84 55
Fax 33 (0)4 67 54 84 14
blanchet(a)abes.fr
ovirt-dr generate
by Colin Coe
Hi all
I'm trying to run ovirt-dr generate but it's failing:
/usr/share/ansible/collections/ansible_collections/redhat/rhv/roles/disaster_recovery/files/ovirt-dr
generate
Log file: '/tmp/ovirt-dr-1649673243333.log'
[Generate Mapping File] Connection to setup has failed. Please check your
credentials:
URL: https://server.fqdn/ovirt-engine/api
user: admin@internal
CA file: ./ca.pem
[Generate Mapping File] Failed to generate var file.
When I examine the log file:
2022-04-11 18:34:03,332 INFO Start generate variable mapping file for oVirt
ansible disaster recovery
2022-04-11 18:34:03,333 INFO Site address:
https://server.fqdn/ovirt-engine/api
username: admin@internal
password: *******
ca file location: ./ca.pem
output file location: ./disaster_recovery_vars.yml
ansible play location: ./dr_play.yml
2022-04-11 18:34:03,343 ERROR Connection to setup has failed. Please check
your credentials:
URL: https://server.fqdn/ovirt-engine/api
user: admin@internal
CA file: ./ca.pem
2022-04-11 18:34:03,343 ERROR Error: Error while sending HTTP request: (60,
'SSL certificate problem: unable to get local issuer certificate')
2022-04-11 18:34:03,343 ERROR Failed to generate var file.
My suspicion is that the script doesn't like third-party certs.
Has anyone got this working with third-party certs? If so, what did you
need to do?
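For what it's worth, error 60 ("unable to get local issuer certificate") comes from curl/libcurl and usually means the CA file doesn't contain the full chain; with a third-party certificate, ca.pem needs the root CA and any intermediates concatenated in PEM form. A sketch for testing this outside ovirt-dr, assuming openssl is available (server.fqdn is the placeholder from the log above):

```shell
# Sketch: check whether ca.pem can actually validate the certificate the
# engine serves. openssl reports "OK" only when the whole chain resolves
# against the given CA file.
verify_against_ca() {
    ca_file="$1" cert_file="$2"
    openssl verify -CAfile "$ca_file" "$cert_file"
}

# On the real setup, capture the served certificate first:
#   echo | openssl s_client -connect server.fqdn:443 -showcerts 2>/dev/null \
#     | openssl x509 > engine-cert.pem
#   verify_against_ca ./ca.pem engine-cert.pem   # expect "engine-cert.pem: OK"
```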
Thanks
Couldn't connect to VDSM within 60 seconds
by pasquale.borrelli@synlab.it
Hi,
we have 3 hosts and a self-hosted engine VM (oVirt version 4.4.).
After rebooting all the hosts, we are unable to start the hosted-engine VM. In particular, the output of 'hosted-engine --vm-start' is as follows:
"The hosted engine configuration has not been retrieved from shared storage yet,
for more details please check sanlock status."
We checked the sanlock, ovirt-ha-agent and broker status but all are "active (running)".
Checking the log files, they all share the common error "RuntimeError: Couldn't connect to VDSM within 60 seconds", returned in a loop.
In the vdsm.log we found this error:
"
2022-04-05 16:20:11,786+0200 INFO (periodic/0) [vdsm.api] START repoStats(domains=()) from=internal, task_id=8541000f-e7fd-4b59-8ae6-522c87538688 (api:48)
2022-04-05 16:20:11,786+0200 INFO (periodic/0) [vdsm.api] FINISH repoStats return={} from=internal, task_id=8541000f-e7fd-4b59-8ae6-522c87538688 (api:54)
2022-04-05 16:20:11,789+0200 WARN (periodic/0) [root] Failed to retrieve Hosted Engine HA info, is Hosted Engine setup finished? (api:168)
"
We tried googling to resolve this issue, but unfortunately without success.
Can someone help us solve this critical issue?
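A sketch of the checks people usually run next in this situation (standard hosted-engine and sanlock tooling; this is a common diagnostic sequence, not a guaranteed fix): the "Failed to retrieve Hosted Engine HA info" warning often clears once the hosted-engine storage domain is reattached and the HA services restarted in order.

```shell
# Is the hosted-engine lockspace actually acquired? It should show up here.
sanlock client status

# Re-attach the hosted-engine storage domain, then restart the HA stack
# in dependency order and watch the status.
hosted-engine --connect-storage
systemctl restart vdsmd ovirt-ha-broker ovirt-ha-agent
hosted-engine --vm-status
```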
Best regards,
Pasquale
Ovirt 4.4.10 lab unstable on ESXi
by Mohamed Roushdy
Hello,
I'm used to running nested ESXi on another ESXi host, but we are testing oVirt these days, and we are facing strange behavior with the engine VM. I have a hyper-converged oVirt cluster with a hosted engine, and whenever I reboot or shut down one of the nodes for testing (a node that is not hosting the engine VM, of course), the engine becomes extremely slow, starts losing network connectivity to the remaining (healthy) nodes in the cluster, and I can hardly SSH into it. Ping also drops many packets while the other node is down.
Once the faulty node is up again, the engine works fine, but this is really limiting my ability to run further oVirt HA tests before moving to production. I thought maybe promiscuous mode on the physical host causes this, but I never faced this with nested virtualization on VMware, and turning off promiscuous mode isn't an option either... what do you think?
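On the ESXi side, nested setups generally need all three security overrides on the vSwitch (or port group) carrying the oVirt networks, not just promiscuous mode; a sketch with esxcli, where vSwitch0 is an assumed name:

```shell
# Accept promiscuous mode, MAC address changes and forged transmits.
# Nested hypervisors emit frames sourced from their guests' MACs, which
# the latter two settings control; leaving them on Reject can look like
# exactly this kind of intermittent connectivity loss.
esxcli network vswitch standard policy security set \
    --vswitch-name=vSwitch0 \
    --allow-promiscuous=true \
    --allow-mac-change=true \
    --allow-forged-transmits=true
```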
Mohamed Roushdy
Team Member – Systems Administrator
M: +31 61 55 94 300
VM failed to start when host's network is down
by lizhijian@fujitsu.com
Posting again after subscribing to the mailing list.
Hi guys,
I have an all-in-one oVirt environment where a single node has both
vdsm and ovirt-engine installed.
I have set up the oVirt environment and it worked well.
For some reasons, I have to use this oVirt setup with the node's networking down (I unplugged the network cable).
In that case, I noticed that I cannot start a VM anymore.
I wonder if there is a configuration switch to let oVirt work with the node's networking down?
If not, is it possible to make it work in an easy way?
When I try to start a VM with the oVirt API, it responds with:
```bash
[root@74d2ab9cb0 ~]# sh start.sh
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<action>
<async>false</async>
<fault>
<detail>[Cannot run VM. Unknown Data Center status.]</detail>
<reason>Operation Failed</reason>
</fault>
<status>failed</status>
</action>
[root@74d2ab9cb0 ~]# sh start.sh
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<action>
<async>false</async>
<fault>
<detail>[Cannot run VM. Unknown Data Center status.]</detail>
<reason>Operation Failed</reason>
</fault>
<status>failed</status>
</action>
[root@74d2ab9cb0 ~]#
```
The vdsm and ovirt-engine logs are attached.
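"Unknown Data Center status" means the engine has lost its view of the Data Center's host and storage, which is expected once the cable is unplugged unless everything the Data Center needs (vdsm, the storage domain) remains reachable over localhost. A sketch for confirming this over the API, using the same kind of credentials start.sh would use (PASSWORD is a placeholder):

```shell
# Pull each <status> value out of the API XML on stdin; a VM can only
# start when its Data Center reports "up".
dc_status() {
    grep -o '<status>[^<]*</status>' | sed 's/<[^>]*>//g'
}

# On the node (admin@internal credentials assumed):
#   curl -s -k -u 'admin@internal:PASSWORD' \
#       'https://localhost/ovirt-engine/api/datacenters' | dc_status
```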
Thanks
Zhijian
Wait for the engine to come up on the target vm
by Vladimir Belov
I'm trying to deploy oVirt with a self-hosted engine, but at the last step I get an engine startup error.
[ INFO ] TASK [Wait for the engine to come up on the target VM]
[ ERROR ] fatal: [localhost]: FAILED! => {"attempts": 120, "changed": true, "cmd": ["hosted-engine", "--vm-status", "--json"], "delta": "0:00:00.181846", "end": "2022-03-28 15:41:28.853150", "rc": 0, "start": "2022-03-28 15:41:28.671304", "stderr": "", "stderr_lines": [], "stdout": "{\"1\": {\"conf_on_shared_storage\": true, \"live-data\": true, \"extra\": \"metadata_parse_version=1\\nmetadata_feature_version=1\\ntimestamp=5537 (Mon Mar 28 15:41:20 2022)\\nhost-id=1\\nscore=3400\\nvm_conf_refresh_time=5537 (Mon Mar 28 15:41:20 2022)\\nconf_on_shared_storage=True\\nmaintenance=False\\nstate=EngineStarting\\nstopped=False\\n\", \"hostname\": \"v2.test.ru\", \"host-id\": 1, \"engine-status\": {\"reason\": \"failed liveliness check\", \"health\": \"bad\", \"vm\": \"up\", \"detail\": \"Up\"}, \"score\": 3400, \"stopped\": false, \"maintenance\": false, \"crc32\": \"4d2eeaea\", \"local_conf_timestamp\": 5537, \"host-ts\": 5537}, \"global_maintenance\": false}", "stdout_lines": ["{\"1\": {\"conf_on_shared_storage\": true, \"live-data\": true, \"extra\": \"metadata_parse_version=1\\nmetadata_feature_version=1\\ntimestamp=5537 (Mon Mar 28 15:41:20 2022)\\nhost-id=1\\nscore=3400\\nvm_conf_refresh_time=5537 (Mon Mar 28 15:41:20 2022)\\nconf_on_shared_storage=True\\nmaintenance=False\\nstate=EngineStarting\\nstopped=False\\n\", \"hostname\": \"v2.test.ru\", \"host-id\": 1, \"engine-status\": {\"reason\": \"failed liveliness check\", \"health\": \"bad\", \"vm\": \"up\", \"detail\": \"Up\"}, \"score\": 3400, \"stopped\": false, \"maintenance\": false, \"crc32\": \"4d2eeaea\", \"local_conf_timestamp\": 5537, \"host-ts\": 5537}, \"global_maintenance\": false}"]}
After the installation completes, the state of the engine is as follows:
Engine status: {"reason": "failed liveliness check", "health": "bad", "vm": "up", "detail": "Up"}
After reading the vdsm logs, I found that qemu-guest-agent failed to connect to the engine for some reason.
Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 5400, in qemuGuestAgentShutdown
self._dom.shutdownFlags(libvirt.VIR_DOMAIN_SHUTDOWN_GUEST_AGENT)
File "/usr/lib/python2.7/site-packages/vdsm/virt/virdomain.py", line 98, in f
ret = attr(*args, **kwargs)
File "/usr/lib/python2.7/site-packages/vdsm/common/libvirtconnection.py", line 130, in wrapper
ret = f(*args, **kwargs)
File "/usr/lib/python2.7/site-packages/vdsm/common/function.py", line 92, in wrapper
return func(inst, *args, **kwargs)
File "/usr/lib64/python2.7/site-packages/libvirt.py", line 2517, in shutdownFlags
if ret == -1: raise libvirtError ('virDomainShutdownFlags() failed', dom=self)
libvirtError: Guest agent is not responding: QEMU guest agent is not connected
During the installation phase, qemu-guest-agent on the guest VM is running.
Setting a temporary password (hosted-engine --add-console-password --password) and connecting via VNC also failed.
Using "hosted-engine --console" also fails to connect:
The engine VM is running on this host
Connected to HostedEngine domain
Escaping character: ^]
error: internal error: character device <null> not found
The network settings are configured using static addressing, without DHCP.
It seems to me that this happens because the engine receives an IP address that does not match the entry in /etc/hosts, but I do not know how to fix it. Any help is welcome; I will provide any necessary logs. Thanks
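The "failed liveliness check" is an HTTP probe of the engine's health page from the host, so a mismatch between the engine FQDN's /etc/hosts entry and the address the VM actually configured would produce exactly this "vm up, health bad" state. A sketch for checking the mapping (ENGINE_FQDN and the expected address are placeholders for your deployment):

```shell
# Compare what the host resolves for the engine FQDN against the address
# the engine VM was actually given during deploy.
check_fqdn() {
    fqdn="$1"; expected_ip="$2"
    actual="$(getent hosts "$fqdn" | awk '{print $1; exit}')"
    if [ "$actual" = "$expected_ip" ]; then
        echo "OK: $fqdn -> $actual"
    else
        echo "MISMATCH: $fqdn -> ${actual:-<unresolved>} (expected $expected_ip)"
        return 1
    fi
}

# check_fqdn ENGINE_FQDN 192.0.2.10
# If the mapping is right, probe the health page the check uses:
#   curl -s http://ENGINE_FQDN/ovirt-engine/services/health
```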
Gluster storage and TRIM VDO
by Oleh Horbachov
Hello everyone. I have a Gluster distributed-replicated cluster deployed; the cluster is the storage backend for oVirt. The bricks sit on VDO over a raw disk. When discarding via 'fstrim -av', the storage hangs for a few seconds and the connection is lost. Does anyone know the best practices for using TRIM with VDO in the context of oVirt?
ovirt - v4.4.10
gluster - v8.6
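A sketch of the commonly recommended pattern, instead of a single cluster-wide `fstrim -av` (brick paths are illustrative): VDO processes discards slowly, so trimming every brick on every host at once can stall the volume long enough for clients to time out.

```shell
# Trim one brick on one host at a time, and let the volume settle before
# moving to the next host.
fstrim -v /gluster_bricks/brick1

# Or spread trims out on a schedule instead of running them by hand;
# fstrim.timer is the stock util-linux timer unit.
systemctl enable --now fstrim.timer
```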
Mac addresses pool issues
by Nicolas MAIRE
Hi,
We're encountering some issues on one of our production clusters running oVirt 4.2. We've had an incident with the engine's database a few weeks back that we were able to recover from, however since then we've been having a bunch of weird issues, mostly around MACs.
It started off with the engine being unable to find a free MAC when creating a VM, despite there being significantly fewer virtual interfaces (around 250) than the total number of MACs in the default pool (default configuration, so 65536 addresses). It escalated into creating duplicate MACs (despite the pool not allowing them), and now we can't even modify the pool or remove VMs (since deleting the attached vnics fails), so we're kind of stuck with a cluster whose running VMs are fine as long as we don't touch them, but on which we can't create new VMs (or modify the existing ones).
In the engine's log we can see that we had an "Unable to initialize MAC pool due to existing duplicates (Failed with error MAC_POOL_INITIALIZATION_FAILED and code 5010)" error when we tried to reconfigure the pool this morning (see the full error stack here: https://pastebin.com/6bKMfbLn), and now whenever we try to delete a VM or reconfigure the pool we get a 'Pool for id="58ca604b-017d-0374-0220-00000000014e" does not exist' error (see the full error stack here: https://pastebin.com/Huy91iig). But if we check the engine's mac_pools table, we can see that it's there:
engine=# select * from mac_pools;
id | name | description | allow_duplicate_mac_addresses | default_pool
--------------------------------------+---------+------------------+-------------------------------+--------------
58ca604b-017d-0374-0220-00000000014e | Default | Default MAC pool | f | t
(1 row)
engine=# select * from mac_pool_ranges;
mac_pool_id | from_mac | to_mac
--------------------------------------+-------------------+-------------------
58ca604b-017d-0374-0220-00000000014e | 56:6f:1a:1a:00:00 | 56:6f:1a:1a:ff:ff
(1 row)
I found this bugzilla that seems to somehow apply: https://bugzilla.redhat.com/show_bug.cgi?id=1554180. However, I don't really know how to "reinitialize engine", especially considering that the MAC pool was not configured to allow duplicate MACs to begin with, and I've no idea what the impact of that reinitialization would be on the current VMs.
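Before any reinitialization, it may help to see which MACs are actually duplicated. A sketch against the engine database, run on the engine host (table and column names are from the engine schema as I understand it; worth confirming with \d vm_interface first):

```shell
# List MAC addresses assigned to more than one vnic in the engine DB;
# these are the rows MAC_POOL_INITIALIZATION_FAILED is complaining about.
su - postgres -c "psql engine -c \"
    SELECT mac_addr, count(*) AS vnics
    FROM vm_interface
    GROUP BY mac_addr
    HAVING count(*) > 1
    ORDER BY vnics DESC;\""
```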
I'm quite new to oVirt (only been using it for one year) so any help would be greatly appreciated.