May 2020 - Users - oVirt List Archives

supervdsm failing during network_caps
by Alan G 24 Mar '21

Hi,

I have issues with one host where supervdsm is failing in network_caps. I see the following trace in the log.

MainProcess|jsonrpc/1::ERROR::2020-01-06 03:01:05,558::supervdsm_server::100::SuperVdsm.ServerCallback::(wrapper) Error in network_caps
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/supervdsm_server.py", line 98, in wrapper
    res = func(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/network/api.py", line 56, in network_caps
    return netswitch.configurator.netcaps(compatibility=30600)
  File "/usr/lib/python2.7/site-packages/vdsm/network/netswitch/configurator.py", line 317, in netcaps
    net_caps = netinfo(compatibility=compatibility)
  File "/usr/lib/python2.7/site-packages/vdsm/network/netswitch/configurator.py", line 325, in netinfo
    _netinfo = netinfo_get(vdsmnets, compatibility)
  File "/usr/lib/python2.7/site-packages/vdsm/network/netinfo/cache.py", line 150, in get
    return _stringify_mtus(_get(vdsmnets))
  File "/usr/lib/python2.7/site-packages/vdsm/network/netinfo/cache.py", line 59, in _get
    ipaddrs = getIpAddrs()
  File "/usr/lib/python2.7/site-packages/vdsm/network/netinfo/addresses.py", line 72, in getIpAddrs
    for addr in nl_addr.iter_addrs():
  File "/usr/lib/python2.7/site-packages/vdsm/network/netlink/addr.py", line 33, in iter_addrs
    with _nl_addr_cache(sock) as addr_cache:
  File "/usr/lib64/python2.7/contextlib.py", line 17, in __enter__
    return self.gen.next()
  File "/usr/lib/python2.7/site-packages/vdsm/network/netlink/__init__.py", line 92, in _cache_manager
    cache = cache_allocator(sock)
  File "/usr/lib/python2.7/site-packages/vdsm/network/netlink/libnl.py", line 469, in rtnl_addr_alloc_cache
    raise IOError(-err, nl_geterror(err))
IOError: [Errno 16] Message sequence number mismatch

A restart of supervdsm will resolve the issue for a period, maybe 24 hours, then it will occur again. So I'm thinking it's resource exhaustion or a leak of some kind?

Running 4.2.8.2 with VDSM at 4.20.46.

I've had a look through the bugzilla and can't find an exact match, closest was this one https://bugzilla.redhat.com/show_bug.cgi?id=1666123 which seems to be a RHV only fix.

Thanks,
Alan
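
To check whether the failure really recurs on a roughly 24-hour cycle, the timestamps of the "Error in network_caps" lines can be pulled out of the supervdsm log and compared. The following is only a sketch: the function names are hypothetical, and the line layout is assumed from the single log line quoted above.

```python
import re
from datetime import datetime

# Timestamp layout as it appears in the quoted supervdsm log line.
TIMESTAMP_RE = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}")

# The header line quoted in the post, used here as sample input.
SAMPLE_LINE = (
    "MainProcess|jsonrpc/1::ERROR::2020-01-06 03:01:05,558::supervdsm_server"
    "::100::SuperVdsm.ServerCallback::(wrapper) Error in network_caps"
)


def failure_times(lines, call_name="network_caps"):
    """Return the timestamps of ERROR lines reporting 'Error in <call_name>'."""
    times = []
    for line in lines:
        if "::ERROR::" not in line or "Error in " + call_name not in line:
            continue
        match = TIMESTAMP_RE.search(line)
        if match:
            times.append(
                datetime.strptime(match.group(0), "%Y-%m-%d %H:%M:%S,%f")
            )
    return times


def gaps_in_hours(times):
    """Return the time between consecutive failures, in hours."""
    return [(b - a).total_seconds() / 3600 for a, b in zip(times, times[1:])]


if __name__ == "__main__":
    # In practice the lines would come from the supervdsm log file.
    print(failure_times([SAMPLE_LINE]))
```

If the gaps after each restart cluster around the same value, that would support the suspicion of something accumulating over time rather than a random failure.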

OVN and change of mgmt network
by Gianluca Cecchi 26 Jan '21

Hello,

I previously had OVN running on engine (as OVN provider with northd and northbound and southbound DBs) and hosts (with OVN controller). After changing mgmt ip of hosts (engine has retained instead the same ip), I executed again on them the command:

vdsm-tool ovn-config <ip_of_engine> <nel_local_ip_of_host>

Now I think I have to clean up some things, eg:

1) On engine where I get these lines below

systemctl status ovn-northd.service -l
. . .
Sep 29 14:41:42 ovmgr1 ovsdb-server[940]: ovs|00005|reconnect|ERR|tcp: 10.4.167.40:37272: no response to inactivity probe after 5 seconds, disconnecting
Oct 03 11:52:00 ovmgr1 ovsdb-server[940]: ovs|00006|reconnect|ERR|tcp: 10.4.167.41:52078: no response to inactivity probe after 5 seconds, disconnecting

The two IPs are the old ones of two hosts.
It seems that a restart of the services has fixed...
Can anyone confirm if I have to do anything else?

2) On hosts (there are 3 hosts with OVN on ip 10.4.192.32/33/34) where I currently have this output

[root@ov301 ~]# ovs-vsctl show
3a38c5bb-0abf-493d-a2e6-345af8aedfe3
    Bridge br-int
        fail_mode: secure
        Port "ovn-1dce5b-0"
            Interface "ovn-1dce5b-0"
                type: geneve
                options: {csum="true", key=flow, remote_ip="10.4.192.32"}
        Port "ovn-ddecf0-0"
            Interface "ovn-ddecf0-0"
                type: geneve
                options: {csum="true", key=flow, remote_ip="10.4.192.33"}
        Port "ovn-fd413b-0"
            Interface "ovn-fd413b-0"
                type: geneve
                options: {csum="true", key=flow, remote_ip="10.4.168.74"}
        Port br-int
            Interface br-int
                type: internal
    ovs_version: "2.7.2"
[root@ov301 ~]#

The IPs of kind 10.4.192.x are ok. But there is a left-over of an old host I initially used for tests, corresponding to 10.4.168.74, that now doesn't exist anymore.

How can I clean records for 1) and 2)?

Thanks,
Gianluca
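
As a way to spot left-over tunnels like the 10.4.168.74 one on every host, the remote_ip values in the ovs-vsctl show output can be compared with the list of current host addresses. This is a minimal sketch that only reports stale endpoints and does not remove anything; the function name is hypothetical and the sample text is abridged from the output above.

```python
import re

# Matches the remote_ip option of a geneve interface in `ovs-vsctl show`.
REMOTE_IP_RE = re.compile(r'remote_ip="([^"]+)"')

# Abridged from the output quoted in the post.
SAMPLE_OUTPUT = '''
        Port "ovn-1dce5b-0"
                options: {csum="true", key=flow, remote_ip="10.4.192.32"}
        Port "ovn-ddecf0-0"
                options: {csum="true", key=flow, remote_ip="10.4.192.33"}
        Port "ovn-fd413b-0"
                options: {csum="true", key=flow, remote_ip="10.4.168.74"}
'''

# Current OVN host addresses as stated in the post.
CURRENT_HOSTS = {"10.4.192.32", "10.4.192.33", "10.4.192.34"}


def stale_remote_ips(ovs_vsctl_show_output, current_host_ips):
    """Return tunnel endpoints that do not belong to any current host."""
    seen = set(REMOTE_IP_RE.findall(ovs_vsctl_show_output))
    return sorted(ip for ip in seen if ip not in current_host_ips)


if __name__ == "__main__":
    print(stale_remote_ips(SAMPLE_OUTPUT, CURRENT_HOSTS))
```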

encrypted GENEVE traffic
by Pavel Nakonechnyi 18 Jan '21

Dear oVirt Community,

From my understanding, oVirt does not support Open vSwitch IPsec tunneling for GENEVE traffic (which is described on the pages http://docs.openvswitch.org/en/latest/howto/ipsec/ and http://docs.openvswitch.org/en/latest/tutorials/ipsec/).

Are there plans to introduce such support? (or explicitly not to..)

Is it possible to somehow manually configure such tunneling for existing virtual networks? (even in a limited way)

Alternatively, is it possible to deploy oVirt on top of tunneled (i.e. via VXLAN, IPsec) interfaces? This would allow all management traffic to be encrypted.

This requirement arises when running an oVirt deployment on third-party premises with an untrusted network.

Thanks in advance for any clarifications. :)

--
WBR, Pavel
+32478910884

oVirt 4.4: Self-hosted engine deployment fails with backup restore from 4.3 engine
by Oliver Leinfelder 11 Dec '20

Hi there,

I'm a bit puzzled about the possible upgrade paths from a 4.3 cluster to version 4.4 in a self-hosted engine environment.

My idea was: set up a new host with a clean oVirt Node 4.4 installation, then deploy the hosted engine on it with a restored backup from the production cluster and go from there.

This however fails with the following error:

2020-05-27 00:17:08,886+0200 DEBUG otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:103 {'msg': 'non-zero return code', 'cmd': ['engine-setup', '--accept-defaults', '--config-append=/root/ovirt-engine-answers'], 'stdout': "[ INFO ] Stage: Initializing\n[ INFO ] Stage: Environment setup\n Configuration files: /etc/ovirt-engine-setup.conf.d/10-packaging-jboss.conf, /etc/ovirt-engine-setup.conf.d/10-packaging.conf, /etc/ovirt-engine-setup.conf.d/20-setup-ovirt-post.conf, /root/ovirt-engine-answers\n Log file: /var/log/ovirt-engine/setup/ovirt-engine-setup-20200527001657-fyeueu.log\n Version: otopi-1.9.1 (otopi-1.9.1-1.el8)\n[ INFO ] DNF Downloading 1 files, 0.00KB\n[ INFO ] DNF Downloaded CentOS-8 - AppStream\n[ INFO ] DNF Downloading 1 files, 0.00KB\n[ INFO ] DNF Downloaded CentOS-8 - Base\n[ INFO ] DNF Downloading 1 files, 0.00KB\n

[...]

... answers from backup config follow ....

[...]

2020-05-27 00:17:12,396+0200 DEBUG otopi.context context._executeMethod:145 method exception
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/otopi/context.py", line 132, in _executeMethod
    method['method']()
  File "/usr/share/ovirt-hosted-engine-setup/scripts/../plugins/gr-he-ansiblesetup/core/misc.py", line 403, in _closeup
    r = ah.run()
  File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_setup/ansible_utils.py", line 229, in run
    raise RuntimeError(_('Failed executing ansible-playbook'))

Is this approach (restoring from 4.3) generally supposed to work? If not, what is the appropriate upgrade path?

Thank you!

Regards
Oli

Can't import some VMs after storage domain detach and reattach to new datacenter.
by m black 10 Dec '20

"gluster-ansible-roles is not installed on Host" error on Cockpit
by Hesham Ahmed 26 Nov '20

On a new 4.3.1 oVirt Node installation, when trying to deploy HCI (and also when trying to add a new gluster volume to existing clusters) using Cockpit, an error is displayed: "gluster-ansible-roles is not installed on Host. To continue deployment, please install gluster-ansible-roles on Host and try again".

There is no package named gluster-ansible-roles in the repositories:

[root@localhost ~]# yum install gluster-ansible-roles
Loaded plugins: enabled_repos_upload, fastestmirror, imgbased-persist, package_upload, product-id, search-disabled-repos, subscription-manager, vdsmupgrade
This system is not registered with an entitlement server. You can use subscription-manager to register.
Loading mirror speeds from cached hostfile
 * ovirt-4.3-epel: mirror.horizon.vn
No package gluster-ansible-roles available.
Error: Nothing to do
Uploading Enabled Repositories Report
Cannot upload enabled repos report, is this client registered?

This is due to the check introduced here: https://gerrit.ovirt.org/#/c/98023/1/dashboard/src/helpers/AnsibleUtil.js

Changing the line from:

[ "rpm", "-qa", "gluster-ansible-roles" ], { "superuser":"require" }

to

[ "rpm", "-qa", "gluster-ansible" ], { "superuser":"require" }

resolves the issue. The above code snippet is installed at /usr/share/cockpit/ovirt-dashboard/app.js on oVirt Node and can be patched by running:

sed -i 's/gluster-ansible-roles/gluster-ansible/g' /usr/share/cockpit/ovirt-dashboard/app.js && systemctl restart cockpit
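
For anyone who prefers to keep a copy of the original file before patching, the same substitution as the sed command can be sketched in Python. The function names and the ".orig" backup suffix are assumptions for illustration; cockpit still has to be restarted afterwards, as in the one-liner above.

```python
import shutil
from pathlib import Path

OLD_NAME = "gluster-ansible-roles"
NEW_NAME = "gluster-ansible"

# Path given in the post for the installed dashboard bundle.
APP_JS = "/usr/share/cockpit/ovirt-dashboard/app.js"


def patch_text(text):
    """Replace every occurrence of the old package name, like sed's s///g."""
    return text.replace(OLD_NAME, NEW_NAME)


def patch_file(path):
    """Patch the file in place, keeping a '<name>.orig' backup next to it.

    Returns True if the file was changed, False if nothing needed patching.
    """
    path = Path(path)
    original = path.read_text(encoding="utf-8")
    patched = patch_text(original)
    if patched == original:
        return False
    shutil.copy2(path, path.with_name(path.name + ".orig"))
    path.write_text(patched, encoding="utf-8")
    return True


if __name__ == "__main__":
    changed = patch_file(APP_JS)
    print("patched" if changed else "nothing to patch")
```

Running it a second time is harmless: once no occurrence of the old name is left, the file and its backup are not touched again.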

Error exporting into ova
by Gianluca Cecchi 31 Aug '20

Hello,

I'm playing with export_vm_as_ova.py downloaded from the examples github:
https://github.com/oVirt/ovirt-engine-sdk/blob/master/sdk/examples/export_v…

My environment is oVirt 4.3.3.7 with iSCSI storage domain. It fails leaving an ova.tmp file.

In webadmin gui:

Starting to export Vm enginecopy1 as a Virtual Appliance
7/19/19 11:55:12 AM

VDSM ov301 command TeardownImageVDS failed: Cannot deactivate Logical Volume: ('General Storage Exception: ("5 [] [\' Logical volume fa33df49-b09d-4f86-9719-ede649542c21/0420ef47-0ad0-4cf9-babd-d89383f7536b in use.\']\\nfa33df49-b09d-4f86-9719-ede649542c21/[\'a7480dc5-b5ca-4cb3-986d-77bc12165be4\', \'0420ef47-0ad0-4cf9-babd-d89383f7536b\']",)',)
7/19/19 12:25:36 PM

Failed to export Vm enginecopy1 as a Virtual Appliance to path /save_ova/base/dump/myvm2.ova on Host ov301
7/19/19 12:25:37 PM

During export I have this qemu-img process creating the disk over the loop device:

root 30878 30871 0 11:55 pts/2 00:00:00 su -p -c qemu-img convert -T none -O qcow2 '/rhev/data-center/mnt/blockSD/fa33df49-b09d-4f86-9719-ede649542c21/images/59a4a324-4c99-4ff5-abb1-e9bbac83292a/0420ef47-0ad0-4cf9-babd-d89383f7536b' '/dev/loop1' vdsm
vdsm 30882 30878 10 11:55 ? 00:00:00 qemu-img convert -T none -O qcow2 /rhev/data-center/mnt/blockSD/fa33df49-b09d-4f86-9719-ede649542c21/images/59a4a324-4c99-4ff5-abb1-e9bbac83292a/0420ef47-0ad0-4cf9-babd-d89383f7536b /dev/loop1

The ova.tmp file is getting filled while command runs, eg:

[root@ov301 ]# du -sh /save_ova/base/dump/myvm2.ova.tmp
416M /save_ova/base/dump/myvm2.ova.tmp
[root@ov301 sysctl.d]#

[root@ov301 sysctl.d]# du -sh /save_ova/base/dump/myvm2.ova.tmp
911M /save_ova/base/dump/myvm2.ova.tmp
[root@ov301 ]#

and the final generated / not completed file is in this state:

[root@ov301 ]# qemu-img info /save_ova/base/dump/myvm2.ova.tmp
image: /save_ova/base/dump/myvm2.ova.tmp
file format: raw
virtual size: 30G (32217446400 bytes)
disk size: 30G
[root@ov301 sysctl.d]#

But I notice that the timestamp of the file is about 67 minutes after start of job and well after the notice of its failure....

[root@ov301 sysctl.d]# ll /save_ova/base/dump/
total 30963632
-rw-------. 1 root root 32217446400 Jul 19 13:02 myvm2.ova.tmp
[root@ov301 sysctl.d]#

[root@ov301 sysctl.d]# du -sh /save_ova/base/dump/myvm2.ova.tmp
30G /save_ova/base/dump/myvm2.ova.tmp
[root@ov301 sysctl.d]#

In engine.log the first error I see is 30 minutes after start:

2019-07-19 12:25:31,563+02 ERROR [org.ovirt.engine.core.common.utils.ansible.AnsibleExecutor] (EE-ManagedThreadFactory-engineScheduled-Thread-64) [2001ddf4] Ansible playbook execution failed: Timeout occurred while executing Ansible playbook.
2019-07-19 12:25:31,563+02 INFO [org.ovirt.engine.core.common.utils.ansible.AnsibleExecutor] (EE-ManagedThreadFactory-engineScheduled-Thread-64) [2001ddf4] Ansible playbook command has exited with value: 1
2019-07-19 12:25:31,564+02 ERROR [org.ovirt.engine.core.bll.CreateOvaCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-64) [2001ddf4] Failed to create OVA. Please check logs for more details: /var/log/ovirt-engine/ova/ovirt-export-ova-ansible-20190719115531-ov301-2001ddf4.log
2019-07-19 12:25:31,565+02 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.TeardownImageVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-64) [2001ddf4] START, TeardownImageVDSCommand(HostName = ov301, ImageActionsVDSCommandParameters:{hostId='8ef1ce6f-4e38-486c-b3a4-58235f1f1d06'}), log id: 3d2246f7
2019-07-19 12:25:36,569+02 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-64) [2001ddf4] EVENT_ID: VDS_BROKER_COMMAND_FAILURE(10,802), VDSM ov301 command TeardownImageVDS failed: Cannot deactivate Logical Volume: ('General Storage Exception: ("5 [] [\' Logical volume fa33df49-b09d-4f86-9719-ede649542c21/0420ef47-0ad0-4cf9-babd-d89383f7536b in use.\']\\nfa33df49-b09d-4f86-9719-ede649542c21/[\'a7480dc5-b5ca-4cb3-986d-77bc12165be4\', \'0420ef47-0ad0-4cf9-babd-d89383f7536b\']",)',)

In ansible playbook suggested log file I don't see anything useful. It ends with timestamps when the script has been launched. Last lines are:

2019-07-19 11:55:33,877 p=5699 u=ovirt | TASK [ovirt-ova-export-pre-pack : Retrieving the temporary path for the OVA file] ***
2019-07-19 11:55:34,198 p=5699 u=ovirt | changed: [ov301] => {
    "changed": true,
    "dest": "/save_ova/base/dump/myvm2.ova.tmp",
    "gid": 0,
    "group": "root",
    "mode": "0600",
    "owner": "root",
    "secontext": "system_u:object_r:nfs_t:s0",
    "size": 32217446912,
    "state": "file",
    "uid": 0
}
2019-07-19 11:55:34,204 p=5699 u=ovirt | TASK [ovirt-ova-pack : Run packing script] *************************************

It seems 30 minutes... for timeout? About what, ansible job? Or possibly implicit user session created when running the python script?

The snapshot has been correctly deleted (as I see also in engine.log), I don't see it in webadmin gui.

Any known problem?

Just for test I executed again at 14:24 and I see same Ansible error at 14:54. The snapshot gets deleted, while the qemu-img command still continues....

[root@ov301 sysctl.d]# ps -ef | grep qemu-img
root 13504 13501 0 14:24 pts/1 00:00:00 su -p -c qemu-img convert -T none -O qcow2 '/rhev/data-center/mnt/blockSD/fa33df49-b09d-4f86-9719-ede649542c21/images/59a4a324-4c99-4ff5-abb1-e9bbac83292a/0420ef47-0ad0-4cf9-babd-d89383f7536b' '/dev/loop0' vdsm
vdsm 13505 13504 3 14:24 ? 00:01:26 qemu-img convert -T none -O qcow2 /rhev/data-center/mnt/blockSD/fa33df49-b09d-4f86-9719-ede649542c21/images/59a4a324-4c99-4ff5-abb1-e9bbac83292a/0420ef47-0ad0-4cf9-babd-d89383f7536b /dev/loop0
root 17587 24530 0 15:05 pts/0 00:00:00 grep --color=auto qemu-img
[root@ov301 sysctl.d]#

[root@ov301 sysctl.d]# du -sh /save_ova/base/dump/myvm2.ova.tmp
24G /save_ova/base/dump/myvm2.ova.tmp
[root@ov301 sysctl.d]# ll /save_ova/base/dump/myvm2.ova.tmp
-rw-------. 1 root root 32217446400 Jul 19 15:14 /save_ova/base/dump/myvm2.ova.tmp
[root@ov301 sysctl.d]#

and then continues until image copy completes, but at this time the job has already aborted and so the completion of the ova composition doesn't go ahead... and I remain with the ova.tmp file...

How to extend timeout?

Thanks in advance,
Gianluca
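
The two intervals mentioned in the post can be checked directly from the quoted timestamps. This is only a sketch: it assumes the date-time embedded in the ansible log file name (20190719115531) is when the playbook was launched, it drops the milliseconds from the engine.log entry, and it uses the minute-precision mtime from the ll output.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"

# Webadmin event "Starting to export Vm enginecopy1 as a Virtual Appliance".
export_started = datetime.strptime("2019-07-19 11:55:12", FMT)
# Assumed launch time, taken from the ansible log file name.
playbook_started = datetime.strptime("2019-07-19 11:55:31", FMT)
# First "Timeout occurred while executing Ansible playbook" in engine.log.
playbook_timeout = datetime.strptime("2019-07-19 12:25:31", FMT)
# Last modification of myvm2.ova.tmp according to ll (minute precision).
tmp_file_mtime = datetime.strptime("2019-07-19 13:02:00", FMT)


def minutes_between(start, end):
    """Return the elapsed time between two datetimes, in minutes."""
    return (end - start).total_seconds() / 60


if __name__ == "__main__":
    print("playbook ran for (min):",
          minutes_between(playbook_started, playbook_timeout))
    print("copy finished after (min):",
          minutes_between(export_started, tmp_file_mtime))
```

Under these assumptions the playbook was cut off exactly 30 minutes after launch, while qemu-img needed a bit over an hour, which is consistent with a fixed timeout on the Ansible run rather than a failure of the copy itself.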

deprecating export domain?
by Charles Kozler 30 Aug '20

Hello,

I recently read on this list, from a Red Hat member, that the export domain is either being deprecated or being considered for deprecation.

To that end, can you share details? Can you share any notes/postings/BZs that document this? I would imagine something like this would be discussed with a larger audience.

This seems like a somewhat significant change to make, and I am curious where this is scheduled. Currently, a lot of my backups rely explicitly on an export domain for online snapshots, so I'd like to plan accordingly.

Thanks!

Upgrade Memory of oVirt Nodes
by souvaliotimaria＠mail.com 19 Aug '20

Hello everyone,

I have an oVirt 4.3.2.5 hyperconverged 3-node production environment and we want to add some RAM to it.

Can I upgrade the RAM without my users noticing any disruption, keeping the VMs running?

The way I thought I should do it was to migrate any running VMs to the other nodes, then set one node to maintenance mode, shut it down, install the new memory, bring it back up, take it out of maintenance mode, see how the installation reacts, and repeat for the other two nodes.

Is this correct, or should I go about it another way?

Will there be a problem during the time when the nodes are not identical in their resources?

Thank you for your time,
Souvalioti Maria
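
Before putting a node into maintenance, it may help to check that the other two nodes have room for its VMs. The sketch below is a rough illustration only: the node names and memory figures are made up, and it compares aggregate free memory without accounting for individual VM sizes, hypervisor overhead or gluster memory use.

```python
def can_evacuate(node, nodes):
    """Check whether the other nodes have enough free memory for node's VMs.

    `nodes` maps a node name to {"total_gb": ..., "used_gb": ...}, where
    used_gb is the memory currently taken by running VMs on that node.
    """
    needed = nodes[node]["used_gb"]
    spare = sum(
        info["total_gb"] - info["used_gb"]
        for name, info in nodes.items()
        if name != node
    )
    return spare >= needed


# Hypothetical figures for a three-node cluster.
EXAMPLE_NODES = {
    "node1": {"total_gb": 128, "used_gb": 60},
    "node2": {"total_gb": 128, "used_gb": 70},
    "node3": {"total_gb": 128, "used_gb": 50},
}

if __name__ == "__main__":
    for name in EXAMPLE_NODES:
        print(name, can_evacuate(name, EXAMPLE_NODES))
```

Repeating the check after each node comes back with more RAM shows how the headroom changes while the nodes are temporarily unequal.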

Support for Shared SAS storage
by Vinícius Ferrão 13 Aug '20

Hello,

I have two compute nodes with SAS direct-attached storage sharing the same disks.

Looking at the supported storage types, I can't see this in the documentation: https://www.ovirt.org/documentation/admin-guide/chap-Storage.html

There is local storage in this documentation, but my case is two machines, both using SAS, connected to the same machines. It's the VRTX hardware from Dell.

Is there any support for this? It should be just like Fibre Channel and iSCSI, but with SAS instead.

Thanks,
