oVirt 4.4.10 lab unstable on ESXi
by Mohamed Roushdy
Hello,
I’m used to running nested ESXi on another ESXi host, but we are testing oVirt these days, and we are facing strange behavior with the engine VM. I have a hyper-converged oVirt cluster with a hosted engine, and whenever I reboot or shut down one of the nodes for testing (a node that is not hosting the engine VM, of course), the engine becomes extremely slow, starts losing network connectivity to the remaining (healthy) nodes in the cluster, and I can hardly SSH into it. Ping also drops many packets while the other node is down.
Once the faulty node is up again the engine works fine, but this is really limiting my ability to run further oVirt HA tests before moving to production. I thought maybe promiscuous mode on the physical host causes this, but I never faced this with nested virtualization on VMware, and turning off promiscuous mode isn’t an option either. What do you think, please?
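For anyone comparing notes: nested hypervisors on ESXi usually need the port group (or vSwitch) carrying the nested hosts' traffic to accept promiscuous mode, MAC address changes and forged transmits, not just promiscuous mode alone. A minimal sketch of checking and setting that on a standard vSwitch from the ESXi shell (vSwitch0 is a placeholder for the actual switch name; this is a generic check, not a confirmed fix for this setup):
```bash
# Show the current security policy of the vSwitch carrying the nested hosts' traffic
esxcli network vswitch standard policy security get --vswitch-name=vSwitch0

# Accept promiscuous mode, MAC address changes and forged transmits
# (all three are commonly required for nested virtualization to behave)
esxcli network vswitch standard policy security set --vswitch-name=vSwitch0 \
    --allow-promiscuous=true --allow-mac-change=true --allow-forged-transmits=true
```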
Mohamed Roushdy
Team Member – Systems Administrator
M: +31 61 55 94 300
3 years
VM failed to start when host's network is down
by lizhijian@fujitsu.com
Posting again after subscribing to the mailing list.
Hi guys
I have an all-in-one oVirt environment where the node has both
vdsm and ovirt-engine installed.
I have set up the oVirt environment and it works well.
For some reasons, I have to use this oVirt setup with the node's networking down (I unplugged the network cable).
In that case, I noticed that I cannot start a VM anymore.
I wonder if there is a configuration switch that enables oVirt to keep working with the node's networking down?
If not, is it possible to make it work in an easy way?
When I try to start a VM with the oVirt API, it responds with:
```bash
[root@74d2ab9cb0 ~]# sh start.sh
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<action>
<async>false</async>
<fault>
<detail>[Cannot run VM. Unknown Data Center status.]</detail>
<reason>Operation Failed</reason>
</fault>
<status>failed</status>
</action>
[root@74d2ab9cb0 ~]# sh start.sh
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<action>
<async>false</async>
<fault>
<detail>[Cannot run VM. Unknown Data Center status.]</detail>
<reason>Operation Failed</reason>
</fault>
<status>failed</status>
</action>
[root@74d2ab9cb0 ~]#
```
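For context, start.sh itself is not shown; a minimal sketch of what such a call usually looks like against the oVirt REST API (the engine FQDN, password and VM id below are placeholders, and this is an assumption about the script, not the actual attachment):
```bash
#!/bin/bash
# Start a VM through the oVirt REST API.
# Placeholders: engine FQDN, admin password and VM id.
ENGINE="https://engine.example.com/ovirt-engine/api"
VM_ID="00000000-0000-0000-0000-000000000000"

curl -k -s -X POST \
     -u "admin@internal:password" \
     -H "Content-Type: application/xml" \
     -H "Accept: application/xml" \
     -d "<action/>" \
     "${ENGINE}/vms/${VM_ID}/start"
```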
Attached are the vdsm and ovirt-engine logs.
Thanks
Zhijian
3 years
Wait for the engine to come up on the target VM
by Vladimir Belov
I'm trying to deploy oVirt with a self-hosted engine, but at the last step I get an engine startup error.
[ INFO ] TASK [Wait for the engine to come up on the target VM]
[ ERROR ] fatal: [localhost]: FAILED! => {"attempts": 120, "changed": true, "cmd": ["hosted-engine", "--vm-status", "--json"], "delta": "0:00:00.181846", "end": "2022-03-28 15:41:28.853150", "rc": 0, "start": "2022-03-28 15:41:28.671304", "stderr": "", "stderr_lines": [], "stdout": "{\"1\": {\"conf_on_shared_storage\": true, \"live-data\": true, \"extra\": \"metadata_parse_version=1\\nmetadata_feature_version=1\\ntimestamp=5537 (Mon Mar 28 15:41:20 2022)\\nhost-id=1\\nscore=3400\\nvm_conf_refresh_time=5537 (Mon Mar 28 15:41:20 2022)\\nconf_on_shared_storage=True\\nmaintenance=False\\nstate=EngineStarting\\nstopped=False\\n\", \"hostname\": \"v2.test.ru\", \"host-id\": 1, \"engine-status\": {\"reason\": \"failed liveliness check\", \"health\": \"bad\", \"vm\": \"up\", \"detail\": \"Up\"}, \"score\": 3400, \"stopped\": false, \"maintenance\": false, \"crc32\": \"4d2eeaea\", \"local_conf_timestamp\": 5537, \"host-ts\": 5537}, \"global_maintenance\": false}", "stdout_lines": ["{\"1\": {\"conf_on_shared_storage\": true, \"live-data\": true, \"extra\": \"metadata_parse_version=1\\nmetadata_feature_version=1\\ntimestamp=5537 (Mon Mar 28 15:41:20 2022)\\nhost-id=1\\nscore=3400\\nvm_conf_refresh_time=5537 (Mon Mar 28 15:41:20 2022)\\nconf_on_shared_storage=True\\nmaintenance=False\\nstate=EngineStarting\\nstopped=False\\n\", \"hostname\": \"v2.test.ru\", \"host-id\": 1, \"engine-status\": {\"reason\": \"failed liveliness check\", \"health\": \"bad\", \"vm\": \"up\", \"detail\": \"Up\"}, \"score\": 3400, \"stopped\": false, \"maintenance\": false, \"crc32\": \"4d2eeaea\", \"local_conf_timestamp\": 5537, \"host-ts\": 5537}, \"global_maintenance\": false}"]}
After the installation completed, the status of the engine is as follows:
Engine status: {"reason": "failed liveliness check", "health": "bad", "vm": "up", "detail": "Up"}
After reading the vdsm logs, I found that the QEMU guest agent in the engine VM is not connecting for some reason.
Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 5400, in qemuGuestAgentShutdown
self._dom.shutdownFlags(libvirt.VIR_DOMAIN_SHUTDOWN_GUEST_AGENT)
File "/usr/lib/python2.7/site-packages/vdsm/virt/virdomain.py", line 98, in f
ret = attr(*args, **kwargs)
File "/usr/lib/python2.7/site-packages/vdsm/common/libvirtconnection.py", line 130, in wrapper
ret = f(*args, **kwargs)
File "/usr/lib/python2.7/site-packages/vdsm/common/function.py", line 92, in wrapper
return func(inst, *args, **kwargs)
File "/usr/lib64/python2.7/site-packages/libvirt.py", line 2517, in shutdownFlags
if ret == -1: raise libvirtError ('virDomainShutdownFlags() failed', dom=self)
libvirtError: Guest agent is not responding: QEMU guest agent is not connected
During the installation phase, qemu-guest-agent on the guest VM is running.
Setting a temporary password (hosted-engine --add-console-password --password) and connecting via VNC also failed.
Using "hosted-engine --console" also failed to connect
The engine VM is running on this host
Connected to HostedEngine domain
Escaping character: ^]
error: internal error: character device <null> not found
The network settings are configured using static addressing, without DHCP.
It seems to me that this happens because the engine VM receives an IP address that does not match its entry in /etc/hosts, but I do not know how to fix it. Any help is welcome; I will provide any necessary logs. Thanks
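In case it helps others debug the same state: the "failed liveliness check" is the HA agent polling the engine's health page via the engine FQDN, so it is worth confirming from the host that the FQDN resolves to the address the engine VM actually got and that the health page answers. A rough sketch (engine.test.ru is a placeholder for the real engine FQDN):
```bash
# The FQDN the host resolves must match the IP actually configured inside the engine VM
getent hosts engine.test.ru

# The liveliness check polls the engine health servlet; try it from the host
curl -sk https://engine.test.ru/ovirt-engine/services/health

# Inside the engine VM console, compare with the address it really received:
#   ip addr show
```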
3 years
Gluster storage and TRIM VDO
by Oleh Horbachov
Hello everyone. I have a Gluster distributed-replicated cluster deployed. The cluster is the storage backend for oVirt, and the bricks are VDO volumes on top of raw disks. When discarding via 'fstrim -av', the storage hangs for a few seconds and the connection is lost. Does anyone know the best practices for using TRIM with VDO in the context of oVirt?
ovirt - v4.4.10
gluster - v8.6
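Not a validated recommendation, but one workaround sometimes tried while investigating is to avoid 'fstrim -av' and discard the brick filesystems one at a time, so only one VDO volume is processing discards at any moment. A sketch, assuming the bricks are mounted under /gluster_bricks:
```bash
#!/bin/bash
# Trim Gluster brick mounts one by one instead of all at once with 'fstrim -av'.
# Assumes bricks are mounted under /gluster_bricks; adjust to your layout.
for brick in /gluster_bricks/*; do
    echo "Trimming ${brick}"
    fstrim -v "${brick}"
    # Give VDO time to work through the discards before the next brick
    sleep 30
done
```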
3 years
MAC address pool issues
by Nicolas MAIRE
Hi,
We're encountering some issues on one of our production clusters running oVirt 4.2. We had an incident with the engine's database a few weeks back that we were able to recover from; however, since then we've been having a bunch of weird issues, mostly around MAC addresses.
It started with the engine being unable to find a free MAC address when creating a VM, despite there being significantly fewer virtual interfaces (around 250) than the total number of MACs in the default pool (default configuration, so 65536 addresses). It then escalated into the engine creating duplicate MACs (despite the pool not allowing them), and now we can't even modify the pool or remove VMs (since deleting the attached vNICs fails). So we're stuck with a cluster whose running VMs are fine as long as we don't touch them, but on which we can't create new VMs or modify the existing ones.
In the engine's log we can see an "Unable to initialize MAC pool due to existing duplicates (Failed with error MAC_POOL_INITIALIZATION_FAILED and code 5010)" error from when we tried to reconfigure the pool this morning (full error stack here: https://pastebin.com/6bKMfbLn). Now, whenever we try to delete a VM or reconfigure the pool, we get a 'Pool for id="58ca604b-017d-0374-0220-00000000014e" does not exist' error (full error stack here: https://pastebin.com/Huy91iig). Yet if we check the engine's mac_pools table, we can see that it's there:
engine=# select * from mac_pools;
id | name | description | allow_duplicate_mac_addresses | default_pool
--------------------------------------+---------+------------------+-------------------------------+--------------
58ca604b-017d-0374-0220-00000000014e | Default | Default MAC pool | f | t
(1 row)
engine=# select * from mac_pool_ranges;
mac_pool_id | from_mac | to_mac
--------------------------------------+-------------------+-------------------
58ca604b-017d-0374-0220-00000000014e | 56:6f:1a:1a:00:00 | 56:6f:1a:1a:ff:ff
(1 row)
I found this Bugzilla entry that seems to apply: https://bugzilla.redhat.com/show_bug.cgi?id=1554180. However, I don't really know how to "reinitialize the engine", especially considering that the MAC pool was not configured to allow duplicate MACs to begin with, and I have no idea what the impact of that reinitialization would be on the current VMs.
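For what it's worth, before reinitializing anything it may help to see which vNICs actually hold duplicate MACs. A read-only sketch against the engine database, assuming the usual vm_interface table layout (run on the engine host with access to the engine DB):
```bash
# List MAC addresses used by more than one vNIC
psql -d engine -c "SELECT mac_addr, COUNT(*) AS uses
                   FROM vm_interface
                   GROUP BY mac_addr
                   HAVING COUNT(*) > 1;"

# Show which VMs/vNICs hold those duplicate MACs
psql -d engine -c "SELECT vm_guid, name, mac_addr
                   FROM vm_interface
                   WHERE mac_addr IN (SELECT mac_addr FROM vm_interface
                                      GROUP BY mac_addr HAVING COUNT(*) > 1)
                   ORDER BY mac_addr;"
```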
I'm quite new to oVirt (I've only been using it for one year), so any help would be greatly appreciated.
3 years
How to create a user in the web UI?
by jihwahn1018@naver.com
Hello,
According to the oVirt guide, we need to create users with ovirt-aaa-jdbc-tool,
and only users created by ovirt-aaa-jdbc-tool can be added.
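For reference, the CLI flow the guide describes looks roughly like this (the user name, attributes and validity date are placeholders):
```bash
# Create a user in the internal domain and set an initial password
ovirt-aaa-jdbc-tool user add testuser \
    --attribute=firstName=Test --attribute=lastName=User

ovirt-aaa-jdbc-tool user password-reset testuser \
    --password-valid-to="2030-01-01 00:00:00Z"

# The user still needs permissions afterwards, which can be granted
# in the web UI (Administration -> Users -> Add) or via the API.
```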
Is there any way to create a user in the web UI?
And if not, is there a reason why creating users in the web UI is blocked?
Thank you.
3 years
Duplicate nameserver on host causing unassigned state when adding. Possible bug?
by ravi k
Hello all,
We are running oVirt 4.3.10.4-1.0.22.el7. I noticed an interesting issue, possibly a bug, yesterday. I was trying to add a host and it kept failing, with the host status going into the 'Unassigned' state.
I saw the below error in the engine log.
/var/log/ovirt-engine/engine.log
2022-04-07 15:17:07,739+04 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.CollectVdsNetworkDataAfterInstallationVDSCommand] (EE-ManagedThreadFactory-engine-Thread-24723) [4917a348] HostName = olvsrv005u
2022-04-07 15:17:07,739+04 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.CollectVdsNetworkDataAfterInstallationVDSCommand] (EE-ManagedThreadFactory-engine-Thread-24723) [4917a348] Failed in 'CollectVdsNetworkDataAfterInstallationVDS' method, for vds: 'olvsrv005u'; host: '10.119.6.232': CallableStatementCallback; SQL [{call insertnameserver(?, ?, ?)}ERROR: duplicate key value violates unique constraint "name_server_pkey"
Detail: Key (dns_resolver_configuration_id, address)=(459b68e6-b684-4cf6-8834-755249a6bd3a, 10.119.10.212) already exists.
Where: SQL statement "INSERT INTO
name_server(
address,
position,
dns_resolver_configuration_id)
VALUES (
v_address,
v_position,
v_dns_resolver_configuration_id)"
PL/pgSQL function insertnameserver(uuid,character varying,smallint) line 3 at SQL statement; nested exception is org.postgresql.util.PSQLException: ERROR: duplicate key value violates unique constraint "name_server_pkey"
Detail: Key (dns_resolver_configuration_id, address)=(459b68e6-b684-4cf6-8834-755249a6bd3a, 10.119.10.212) already exists.
Then I checked resolv.conf on the host:
[root@olvsrv005u ~]# cat /etc/resolv.conf
# Version: 1.00
search uat.abc.com
nameserver 10.119.10.212
nameserver 10.119.10.212
Well, ideally there is no point in having a duplicate nameserver, but it was not affecting the functionality of the host. However, it was failing the addition of the host, probably because updating the host's configuration in the engine DB failed due to the duplicate nameserver.
To test this, I commented out the duplicate value and tried again. The host was then added successfully.
2022-04-07 15:33:37,301+04 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.GetHardwareInfoAsyncVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-39) [] START, GetHardwareInfoAsyncVDSCommand(HostName = olvsrv005u, VdsIdA
ndVdsVDSCommandParametersBase:{hostId='459b68e6-b684-4cf6-8834-755249a6bd3a', vds='Host[olvsrv005u,459b68e6-b684-4cf6-8834-755249a6bd3a]'}), log id: 52e7ec52
2022-04-07 15:33:37,301+04 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.GetHardwareInfoAsyncVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-39) [] FINISH, GetHardwareInfoAsyncVDSCommand, return: , log id: 52e7ec52
2022-04-07 15:33:37,356+04 INFO [org.ovirt.engine.core.bll.SetNonOperationalVdsCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-39) [3de72cb7] Running command: SetNonOperationalVdsCommand internal: true. Entities affected :
ID: 459b68e6-b684-4cf6-8834-755249a6bd3a Type: VDS
2022-04-07 15:33:37,360+04 INFO [org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-39) [3de72cb7] START, SetVdsStatusVDSCommand(HostName = olvsrv005u, SetVdsStatusVDSCommandPa
rameters:{hostId='459b68e6-b684-4cf6-8834-755249a6bd3a', status='NonOperational', nonOperationalReason='NETWORK_UNREACHABLE', stopSpmFailureLogged='false', maintenanceReason='null'}), log id: 1bfc90a3
2022-04-07 15:33:37,363+04 INFO [org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-39) [3de72cb7] FINISH, SetVdsStatusVDSCommand, return: , log id: 1bfc90a3
2022-04-07 15:33:37,404+04 ERROR [org.ovirt.engine.core.bll.SetNonOperationalVdsCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-39) [3de72cb7] Host 'olvsrv005u' is set to Non-Operational, it is missing the following networks
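For anyone hitting the same thing, the rows the engine was complaining about can be checked directly before retrying. A sketch using the dns_resolver_configuration_id from the error above (run on the engine host with access to the engine database):
```bash
# Inspect what the engine has recorded for this host's DNS resolver configuration
psql -d engine -c "SELECT * FROM name_server
                   WHERE dns_resolver_configuration_id =
                         '459b68e6-b684-4cf6-8834-755249a6bd3a';"
```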
Should I raise this as a bug? I'm of the opinion that it is one, because if a duplicate nameserver isn't breaking the host's functionality, then it shouldn't block adding the host either.
Regards,
Ravi
3 years
10Gbps iSCSI Bonding issue on HPE Gen10 server
by michael.li@hactlsolutions.com
Hi support,
I have an issue configuring the 10Gbps iSCSI ports on an HPE Gen10 server in oVirt 4.4.10. The behavior is as below:
1. The two 10Gbps ports are running at 10000Mb/s and are up, verified with ethtool.
2. I configure an iSCSI active-standby (active-backup) bond (bond0) in the oVirt manager.
3. When the server is rebooted, one of the ports is down (no carrier).
Temporary solution: I manually bring the port up with 'nmcli conn up "port name"' to restore the bond. Does anyone know how to resolve this? Let me know if any information is needed to investigate.
I enclose my configuration below.
[root@ovrphv04 network-scripts]# cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)
Bonding Mode: fault-tolerance (active-backup)
Primary Slave: None
Currently Active Slave: ens1f1np1
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0
Peer Notification Delay (ms): 0
Slave Interface: ens1f1np1
MII Status: up
Speed: 10000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: f4:03:43:e7:da:38
Slave queue ID: 0
Slave Interface: ens4f1np1
MII Status: down
Speed: Unknown
Duplex: Unknown
Link Failure Count: 0
Permanent HW addr: f4:03:43:e7:d7:68
Slave queue ID: 0
[root@ovrphv04 network-scripts]# ip link |grep ens1f1np1
7: ens1f1np1: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 9000 qdisc mq master bond0 state UP mode DEFAULT group default qlen 1000
[root@ovrphv04 network-scripts]# ip link |grep ens4f1np1
10: ens4f1np1: <NO-CARRIER,BROADCAST,MULTICAST,SLAVE,UP> mtu 9000 qdisc mq master bond0 state DOWN mode DEFAULT group default qlen 1000
[root@ovrphv04 network-scripts]# cat ifcfg-ens1f1np1
TYPE=Ethernet
MTU=9000
SRIOV_TOTAL_VFS=0
NAME=ens1f1np1
UUID=6072cd16-a45a-433e-bc9e-817557706fb2
DEVICE=ens1f1np1
ONBOOT=yes
LLDP=no
ETHTOOL_OPTS="speed 10000 duplex full autoneg on"
MASTER_UUID=77e39d1b-cf14-406f-87f6-fe954fde40f0
MASTER=bond0
SLAVE=yes
[root@ovrphv04 network-scripts]# cat ifcfg-ens4f1np1
TYPE=Ethernet
MTU=9000
SRIOV_TOTAL_VFS=0
NAME=ens4f1np1
UUID=cf6b28e7-8fdb-472f-bbe9-0eecc95c5c3c
DEVICE=ens4f1np1
ONBOOT=yes
LLDP=no
ETHTOOL_OPTS="speed 10000 duplex full autoneg on"
MASTER_UUID=77e39d1b-cf14-406f-87f6-fe954fde40f0
MASTER=bond0
SLAVE=yes
[root@ovrphv04 network-scripts]# cat ifcfg-bond0
BONDING_OPTS="mode=active-backup miimon=100"
TYPE=Bond
BONDING_MASTER=yes
HWADDR=
MTU=9000
PROXY_METHOD=none
BROWSER_ONLY=no
BOOTPROTO=none
IPADDR=172.26.14.139
PREFIX=24
DEFROUTE=yes
DHCP_CLIENT_ID=mac
IPV4_FAILURE_FATAL=no
IPV6_DISABLED=yes
IPV6INIT=no
DHCPV6_DUID=ll
DHCPV6_IAID=mac
IPV6_DEFROUTE=yes
IPV6_FAILURE_FATAL=no
NAME=bond0
UUID=77e39d1b-cf14-406f-87f6-fe954fde40f0
DEVICE=bond0
ONBOOT=yes
AUTOCONNECT_SLAVES=yes
LLDP=no
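Not a confirmed fix, but since bringing the port up manually with nmcli restores the bond, it may be worth checking how NetworkManager activates that port profile at boot. A purely diagnostic sketch, using the connection names from the config above and assuming the profiles are managed by NetworkManager:
```bash
# Check whether the bond and both port profiles are set to autoconnect at boot
nmcli -f NAME,UUID,AUTOCONNECT,DEVICE connection show

# If the ens4f1np1 profile is not set to autoconnect, enable it and re-activate it
nmcli connection modify ens4f1np1 connection.autoconnect yes
nmcli connection up ens4f1np1

# Confirm the port rejoined the bond
cat /proc/net/bonding/bond0
```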
3 years