Lots of storage.MailBox.SpmMailMonitor
by Fabrice Bacchella
My vdsm log files are huge:
-rw-r--r-- 1 vdsm kvm 1.8G Nov 22 11:32 vdsm.log
And this is just half an hour of logs:
$ head -1 vdsm.log
2018-11-22 11:01:12,132+0100 ERROR (mailbox-spm) [storage.MailBox.SpmMailMonitor] mailbox 2 checksum failed, not clearing mailbox, clearing new mail (data='...lots of data', expected='\xa4\x06\x08\x00') (mailbox:612)
I just upgraded vdsm:
$ rpm -qi vdsm
Name : vdsm
Version : 4.20.43
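To see which logger accounts for most of the volume, a quick tally per level and logger name can help. The following is only a sketch: it assumes every record starts with the same header layout as the line shown above (timestamp, level, thread in parentheses, logger in brackets), and the function name summarize is made up for illustration.

```python
import re
from collections import Counter

# Header layout assumed from the single line quoted in the post:
# 2018-11-22 11:01:12,132+0100 ERROR (mailbox-spm) [storage.MailBox.SpmMailMonitor] ...
LINE_RE = re.compile(
    r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}[+-]\d{4} "
    r"(?P<level>[A-Z]+) +\((?P<thread>[^)]*)\) \[(?P<logger>[^\]]*)\]"
)


def summarize(lines):
    """Return (counts, sizes): records and characters per (level, logger)."""
    counts = Counter()
    sizes = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if m is None:
            # Continuation line (e.g. a traceback) or an unknown format.
            continue
        key = (m.group("level"), m.group("logger"))
        counts[key] += 1
        sizes[key] += len(line)
    return counts, sizes


# Shortened copy of the line from the post, plus a line that does not match.
sample = [
    "2018-11-22 11:01:12,132+0100 ERROR (mailbox-spm) "
    "[storage.MailBox.SpmMailMonitor] mailbox 2 checksum failed",
    "Traceback (most recent call last):",
]

counts, sizes = summarize(sample)
for key, n in counts.most_common():
    print(key, n, sizes[key])
```

On a real host the same function could be fed an open file, for example summarize(open("vdsm.log", errors="replace")), which reads line by line and so should cope with a 1.8G file.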
1 year, 2 months
Template for Ubuntu 18.04 Server Issues
by jeremy_tourville@hotmail.com
I have built a system as a template on oVirt. Specifically, Ubuntu 18.04 server.
I am noticing an issue when creating new VMs from that template. I ticked the "seal template" checkbox when creating the template.
When I create a new Ubuntu VM I am getting duplicate IP addresses for all the machines created from the template.
It seems like the checkbox doesn't fully function as intended. I would need to do further manual steps to clear up this issue.
Has anyone else noticed this behavior? Is this expected or have I missed something?
Thanks for your input!
1 year, 7 months
Libgfapi considerations
by Jayme
Are there currently any known issues with using libgfapi in the latest
stable version of oVirt in HCI deployments? I have recently enabled it and
have noticed a significant (over 4x) increase in I/O performance on my VMs.
I’m concerned, however, since it does not seem to be an oVirt default
setting. Is libgfapi considered safe and stable to use in oVirt 4.3 HCI?
1 year, 9 months
poweroff and reboot with ovirt_vm ansible module
by Nathanaël Blanchet
Hello, is there a way to power off or reboot a VM (as opposed to using the
stopped and running states) with the ovirt_vm Ansible module?
--
Nathanaël Blanchet
Supervision réseau
Pôle Infrastrutures Informatiques
227 avenue Professeur-Jean-Louis-Viala
34193 MONTPELLIER CEDEX 5
Tél. 33 (0)4 67 54 84 55
Fax 33 (0)4 67 54 84 14
blanchet(a)abes.fr
1 year, 10 months
supervdsm failing during network_caps
by Alan G
Hi,
I have issues with one host where supervdsm is failing in network_caps.
I see the following trace in the log.
MainProcess|jsonrpc/1::ERROR::2020-01-06 03:01:05,558::supervdsm_server::100::SuperVdsm.ServerCallback::(wrapper) Error in network_caps
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/supervdsm_server.py", line 98, in wrapper
    res = func(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/network/api.py", line 56, in network_caps
    return netswitch.configurator.netcaps(compatibility=30600)
  File "/usr/lib/python2.7/site-packages/vdsm/network/netswitch/configurator.py", line 317, in netcaps
    net_caps = netinfo(compatibility=compatibility)
  File "/usr/lib/python2.7/site-packages/vdsm/network/netswitch/configurator.py", line 325, in netinfo
    _netinfo = netinfo_get(vdsmnets, compatibility)
  File "/usr/lib/python2.7/site-packages/vdsm/network/netinfo/cache.py", line 150, in get
    return _stringify_mtus(_get(vdsmnets))
  File "/usr/lib/python2.7/site-packages/vdsm/network/netinfo/cache.py", line 59, in _get
    ipaddrs = getIpAddrs()
  File "/usr/lib/python2.7/site-packages/vdsm/network/netinfo/addresses.py", line 72, in getIpAddrs
    for addr in nl_addr.iter_addrs():
  File "/usr/lib/python2.7/site-packages/vdsm/network/netlink/addr.py", line 33, in iter_addrs
    with _nl_addr_cache(sock) as addr_cache:
  File "/usr/lib64/python2.7/contextlib.py", line 17, in __enter__
    return self.gen.next()
  File "/usr/lib/python2.7/site-packages/vdsm/network/netlink/__init__.py", line 92, in _cache_manager
    cache = cache_allocator(sock)
  File "/usr/lib/python2.7/site-packages/vdsm/network/netlink/libnl.py", line 469, in rtnl_addr_alloc_cache
    raise IOError(-err, nl_geterror(err))
IOError: [Errno 16] Message sequence number mismatch
A restart of supervdsm will resolve the issue for a period, maybe 24 hours, then it will occur again. So I'm thinking it's resource exhaustion or a leak of some kind?
Running 4.2.8.2 with VDSM at 4.20.46.
I've had a look through Bugzilla and can't find an exact match; the closest was this one https://bugzilla.redhat.com/show_bug.cgi?id=1666123 which seems to be an RHV-only fix.
Thanks,
Alan
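To check whether the failures really start a fixed time after each restart, the error records could be pulled out of the supervdsm log with their timestamps. This is a rough sketch that assumes the "::"-separated header layout of the line quoted above; the function names and the returned dictionary keys are hypothetical.

```python
from datetime import datetime


def parse_supervdsm_header(line):
    """Split one header line into named fields, or return None.

    Assumed layout, taken from the single line quoted in the post:
    process|thread::LEVEL::timestamp::module::lineno::logger::message
    """
    parts = line.rstrip("\n").split("::", 6)
    if len(parts) != 7:
        return None  # traceback line or other continuation
    proc_thread, level, stamp, module, lineno, logger, message = parts
    try:
        when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S,%f")
        lineno = int(lineno)
    except ValueError:
        return None
    return {
        "thread": proc_thread,
        "level": level,
        "when": when,
        "module": module,
        "lineno": lineno,
        "logger": logger,
        "message": message,
    }


def network_caps_failures(lines):
    """Return the timestamps of all 'Error in network_caps' records."""
    hits = []
    for line in lines:
        rec = parse_supervdsm_header(line)
        if rec and rec["level"] == "ERROR" and rec["message"].endswith("Error in network_caps"):
            hits.append(rec["when"])
    return hits


sample = [
    "MainProcess|jsonrpc/1::ERROR::2020-01-06 03:01:05,558::supervdsm_server"
    "::100::SuperVdsm.ServerCallback::(wrapper) Error in network_caps",
    "Traceback (most recent call last):",
]
failures = network_caps_failures(sample)
print(failures)
```

Comparing the first timestamp after each restart with the restart time would show whether the roughly 24-hour interval is consistent.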
2 years
OVN and change of mgmt network
by Gianluca Cecchi
Hello,
I previously had OVN running on engine (as OVN provider with northd and
northbound and southbound DBs) and hosts (with OVN controller).
After changing the mgmt IP of the hosts (the engine has instead kept the same IP),
I ran the following command on them again:
vdsm-tool ovn-config <ip_of_engine> <nel_local_ip_of_host>
Now I think I have to clean up some things, e.g.:
1) On engine
where I get these lines below
systemctl status ovn-northd.service -l
. . .
Sep 29 14:41:42 ovmgr1 ovsdb-server[940]: ovs|00005|reconnect|ERR|tcp:
10.4.167.40:37272: no response to inactivity probe after 5 seconds,
disconnecting
Oct 03 11:52:00 ovmgr1 ovsdb-server[940]: ovs|00006|reconnect|ERR|tcp:
10.4.167.41:52078: no response to inactivity probe after 5 seconds,
disconnecting
The two IPs are the old addresses of two of the hosts.
It seems that a restart of the services has fixed this...
Can anyone confirm if I have to do anything else?
2) On hosts (there are 3 hosts with OVN on ip 10.4.192.32/33/34)
where I currently have this output
[root@ov301 ~]# ovs-vsctl show
3a38c5bb-0abf-493d-a2e6-345af8aedfe3
    Bridge br-int
        fail_mode: secure
        Port "ovn-1dce5b-0"
            Interface "ovn-1dce5b-0"
                type: geneve
                options: {csum="true", key=flow, remote_ip="10.4.192.32"}
        Port "ovn-ddecf0-0"
            Interface "ovn-ddecf0-0"
                type: geneve
                options: {csum="true", key=flow, remote_ip="10.4.192.33"}
        Port "ovn-fd413b-0"
            Interface "ovn-fd413b-0"
                type: geneve
                options: {csum="true", key=flow, remote_ip="10.4.168.74"}
        Port br-int
            Interface br-int
                type: internal
    ovs_version: "2.7.2"
[root@ov301 ~]#
The 10.4.192.x IPs are OK.
But there is a leftover from an old host I initially used for tests,
corresponding to 10.4.168.74, which no longer exists.
How can I clean up the records for 1) and 2)?
Thanks,
Gianluca
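For spotting leftovers like this across several hosts, the ovs-vsctl show output can be scanned for tunnel ports whose remote_ip is not one of the current hosts. The sketch below only reports candidates and removes nothing; the function names are made up, and the list of current addresses is taken from the 10.4.192.32/33/34 range mentioned above.

```python
import re

REMOTE_IP_RE = re.compile(r'remote_ip="([^"]+)"')


def tunnel_ports(show_output):
    """Return {port_name: remote_ip} for ports that carry a remote_ip option."""
    tunnels = {}
    port = None
    for raw in show_output.splitlines():
        line = raw.strip()  # works with or without the original indentation
        if line.startswith("Port "):
            port = line[len("Port "):].strip('"')
        elif line.startswith("options:") and port is not None:
            m = REMOTE_IP_RE.search(line)
            if m:
                tunnels[port] = m.group(1)
    return tunnels


def stale_tunnels(show_output, current_hosts):
    """Return the tunnel ports pointing at an address outside current_hosts."""
    return {
        name: ip
        for name, ip in tunnel_ports(show_output).items()
        if ip not in current_hosts
    }


# Output copied from the post.
SAMPLE = """\
3a38c5bb-0abf-493d-a2e6-345af8aedfe3
Bridge br-int
fail_mode: secure
Port "ovn-1dce5b-0"
Interface "ovn-1dce5b-0"
type: geneve
options: {csum="true", key=flow, remote_ip="10.4.192.32"}
Port "ovn-ddecf0-0"
Interface "ovn-ddecf0-0"
type: geneve
options: {csum="true", key=flow, remote_ip="10.4.192.33"}
Port "ovn-fd413b-0"
Interface "ovn-fd413b-0"
type: geneve
options: {csum="true", key=flow, remote_ip="10.4.168.74"}
Port br-int
Interface br-int
type: internal
ovs_version: "2.7.2"
"""

CURRENT_HOSTS = {"10.4.192.32", "10.4.192.33", "10.4.192.34"}
stale = stale_tunnels(SAMPLE, CURRENT_HOSTS)
print(stale)
```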
2 years, 2 months
encrypted GENEVE traffic
by Pavel Nakonechnyi
Dear oVirt Community,
From my understanding oVirt does not support Open vSwitch IPSEC tunneling for GENEVE traffic (which is described on pages http://docs.openvswitch.org/en/latest/howto/ipsec/ and http://docs.openvswitch.org/en/latest/tutorials/ipsec/).
Are there plans to introduce such support? (or explicitly not to..)
Is it possible to somehow manually configure such tunneling for existing virtual networks? (even in a limited way)
Alternatively, is it possible to deploy oVirt on top of tunneled (i.e. via VXLAN, IPsec) interfaces? This would allow encrypting all management traffic.
Such a requirement arises when deploying oVirt on third-party premises with an untrusted network.
Thanks in advance for any clarifications. :)
--
WBR, Pavel
+32478910884
2 years, 2 months
"gluster-ansible-roles is not installed on Host" error on Cockpit
by Hesham Ahmed
On a new 4.3.1 oVirt Node installation, when trying to deploy HCI
(also when trying to add a new gluster volume to existing clusters)
using Cockpit, an error is displayed "gluster-ansible-roles is not
installed on Host. To continue deployment, please install
gluster-ansible-roles on Host and try again". There is no package
named gluster-ansible-roles in the repositories:
[root@localhost ~]# yum install gluster-ansible-roles
Loaded plugins: enabled_repos_upload, fastestmirror, imgbased-persist,
package_upload, product-id, search-disabled-repos,
subscription-manager, vdsmupgrade
This system is not registered with an entitlement server. You can use
subscription-manager to register.
Loading mirror speeds from cached hostfile
* ovirt-4.3-epel: mirror.horizon.vn
No package gluster-ansible-roles available.
Error: Nothing to do
Uploading Enabled Repositories Report
Cannot upload enabled repos report, is this client registered?
This is due to a check introduced here:
https://gerrit.ovirt.org/#/c/98023/1/dashboard/src/helpers/AnsibleUtil.js
Changing the line from:
[ "rpm", "-qa", "gluster-ansible-roles" ], { "superuser":"require" }
to
[ "rpm", "-qa", "gluster-ansible" ], { "superuser":"require" }
resolves the issue. The above code snippet is installed at
/usr/share/cockpit/ovirt-dashboard/app.js on oVirt node and can be
patched by running "sed -i 's/gluster-ansible-roles/gluster-ansible/g'
/usr/share/cockpit/ovirt-dashboard/app.js && systemctl restart
cockpit"
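The same substitution can be done from Python while keeping a backup of the original file, which the sed one-liner does not. This is only a sketch: the helper name patch_package_name is made up, and the demonstration runs against a temporary file rather than the real app.js.

```python
import shutil
import tempfile
from pathlib import Path


def patch_package_name(path, old="gluster-ansible-roles", new="gluster-ansible"):
    """Replace every occurrence of old with new in the file at path.

    A copy of the untouched file is saved next to it with a .bak suffix.
    Returns the number of replacements made (0 leaves the file as it is).
    """
    path = Path(path)
    text = path.read_text(encoding="utf-8")
    count = text.count(old)
    if count:
        shutil.copy2(path, path.with_name(path.name + ".bak"))
        path.write_text(text.replace(old, new), encoding="utf-8")
    return count


# Demonstration on a throwaway file containing the line quoted above.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "app.js"
    demo.write_text(
        '[ "rpm", "-qa", "gluster-ansible-roles" ], { "superuser":"require" }\n',
        encoding="utf-8",
    )
    replaced = patch_package_name(demo)
    patched = demo.read_text(encoding="utf-8")
    backup_exists = (Path(tmp) / "app.js.bak").exists()

print(replaced, patched.strip())
```

On the node it would be pointed at /usr/share/cockpit/ovirt-dashboard/app.js and followed by the cockpit restart described above.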
2 years, 4 months
Error exporting into ova
by Gianluca Cecchi
Hello,
I'm playing with export_vm_as_ova.py downloaded from the examples github:
https://github.com/oVirt/ovirt-engine-sdk/blob/master/sdk/examples/export...
My environment is oVirt 4.3.3.7 with iSCSI storage domain.
It fails leaving an ova.tmp file
In webadmin gui:
Starting to export Vm enginecopy1 as a Virtual Appliance
7/19/19 11:55:12 AM
VDSM ov301 command TeardownImageVDS failed: Cannot deactivate Logical
Volume: ('General Storage Exception: ("5 [] [\' Logical volume
fa33df49-b09d-4f86-9719-ede649542c21/0420ef47-0ad0-4cf9-babd-d89383f7536b
in
use.\']\\nfa33df49-b09d-4f86-9719-ede649542c21/[\'a7480dc5-b5ca-4cb3-986d-77bc12165be4\',
\'0420ef47-0ad0-4cf9-babd-d89383f7536b\']",)',)
7/19/19 12:25:36 PM
Failed to export Vm enginecopy1 as a Virtual Appliance to path
/save_ova/base/dump/myvm2.ova on Host ov301
7/19/19 12:25:37 PM
During export I have this qemu-img process creating the disk over the loop
device:
root 30878 30871 0 11:55 pts/2 00:00:00 su -p -c qemu-img convert
-T none -O qcow2
'/rhev/data-center/mnt/blockSD/fa33df49-b09d-4f86-9719-ede649542c21/images/59a4a324-4c99-4ff5-abb1-e9bbac83292a/0420ef47-0ad0-4cf9-babd-d89383f7536b'
'/dev/loop1' vdsm
vdsm 30882 30878 10 11:55 ? 00:00:00 qemu-img convert -T none -O
qcow2
/rhev/data-center/mnt/blockSD/fa33df49-b09d-4f86-9719-ede649542c21/images/59a4a324-4c99-4ff5-abb1-e9bbac83292a/0420ef47-0ad0-4cf9-babd-d89383f7536b
/dev/loop1
The ova.tmp file keeps growing while the command runs,
e.g.:
[root@ov301 ]# du -sh /save_ova/base/dump/myvm2.ova.tmp
416M /save_ova/base/dump/myvm2.ova.tmp
[root@ov301 sysctl.d]#
[root@ov301 sysctl.d]# du -sh /save_ova/base/dump/myvm2.ova.tmp
911M /save_ova/base/dump/myvm2.ova.tmp
[root@ov301 ]#
and the final generated / not completed file is in this state:
[root@ov301 ]# qemu-img info /save_ova/base/dump/myvm2.ova.tmp
image: /save_ova/base/dump/myvm2.ova.tmp
file format: raw
virtual size: 30G (32217446400 bytes)
disk size: 30G
[root@ov301 sysctl.d]#
But I notice that the timestamp of the file is about 67 minutes after start
of job and well after the notice of its failure....
[root@ov301 sysctl.d]# ll /save_ova/base/dump/
total 30963632
-rw-------. 1 root root 32217446400 Jul 19 13:02 myvm2.ova.tmp
[root@ov301 sysctl.d]#
[root@ov301 sysctl.d]# du -sh /save_ova/base/dump/myvm2.ova.tmp
30G /save_ova/base/dump/myvm2.ova.tmp
[root@ov301 sysctl.d]#
In engine.log the first error I see is 30 minutes after start
2019-07-19 12:25:31,563+02 ERROR
[org.ovirt.engine.core.common.utils.ansible.AnsibleExecutor]
(EE-ManagedThreadFactory-engineScheduled-Thread-64) [2001ddf4] Ansible
playbook execution failed: Timeout occurred while executing Ansible
playbook.
2019-07-19 12:25:31,563+02 INFO
[org.ovirt.engine.core.common.utils.ansible.AnsibleExecutor]
(EE-ManagedThreadFactory-engineScheduled-Thread-64) [2001ddf4] Ansible
playbook command has exited with value: 1
2019-07-19 12:25:31,564+02 ERROR
[org.ovirt.engine.core.bll.CreateOvaCommand]
(EE-ManagedThreadFactory-engineScheduled-Thread-64) [2001ddf4] Failed to
create OVA. Please check logs for more details:
/var/log/ovirt-engine/ova/ovirt-export-ova-ansible-20190719115531-ov301-2001ddf4.log
2019-07-19 12:25:31,565+02 INFO
[org.ovirt.engine.core.vdsbroker.vdsbroker.TeardownImageVDSCommand]
(EE-ManagedThreadFactory-engineScheduled-Thread-64) [2001ddf4] START,
TeardownImageVDSCommand(HostName = ov301,
ImageActionsVDSCommandParameters:{hostId='8ef1ce6f-4e38-486c-b3a4-58235f1f1d06'}),
log id: 3d2246f7
2019-07-19 12:25:36,569+02 ERROR
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
(EE-ManagedThreadFactory-engineScheduled-Thread-64) [2001ddf4] EVENT_ID:
VDS_BROKER_COMMAND_FAILURE(10,802), VDSM ov301 command TeardownImageVDS
failed: Cannot deactivate Logical Volume: ('General Storage Exception: ("5
[] [\' Logical volume
fa33df49-b09d-4f86-9719-ede649542c21/0420ef47-0ad0-4cf9-babd-d89383f7536b
in
use.\']\\nfa33df49-b09d-4f86-9719-ede649542c21/[\'a7480dc5-b5ca-4cb3-986d-77bc12165be4\',
\'0420ef47-0ad0-4cf9-babd-d89383f7536b\']",)',)
In ansible playbook suggested log file I don't see anything useful.
It ends with timestamps when the script has been launched.
Last lines are:
2019-07-19 11:55:33,877 p=5699 u=ovirt | TASK [ovirt-ova-export-pre-pack :
Retrieving the temporary path for the OVA file] ***
2019-07-19 11:55:34,198 p=5699 u=ovirt | changed: [ov301] => {
"changed": true,
"dest": "/save_ova/base/dump/myvm2.ova.tmp",
"gid": 0,
"group": "root",
"mode": "0600",
"owner": "root",
"secontext": "system_u:object_r:nfs_t:s0",
"size": 32217446912,
"state": "file",
"uid": 0
}
2019-07-19 11:55:34,204 p=5699 u=ovirt | TASK [ovirt-ova-pack : Run
packing script] *************************************
It seems there is a 30-minute timeout... but on what, the Ansible job?
Or possibly the implicit user session created when running the Python script?
The snapshot has been correctly deleted (as I see also in engine.log), I
don't see it in webadmin gui.
Any known problem?
Just for test I executed again at 14:24 and I see same Ansible error at
14:54
The snapshot gets deleted, while the qemu-img command still continues....
[root@ov301 sysctl.d]# ps -ef | grep qemu-img
root 13504 13501 0 14:24 pts/1 00:00:00 su -p -c qemu-img convert
-T none -O qcow2
'/rhev/data-center/mnt/blockSD/fa33df49-b09d-4f86-9719-ede649542c21/images/59a4a324-4c99-4ff5-abb1-e9bbac83292a/0420ef47-0ad0-4cf9-babd-d89383f7536b'
'/dev/loop0' vdsm
vdsm 13505 13504 3 14:24 ? 00:01:26 qemu-img convert -T none -O
qcow2
/rhev/data-center/mnt/blockSD/fa33df49-b09d-4f86-9719-ede649542c21/images/59a4a324-4c99-4ff5-abb1-e9bbac83292a/0420ef47-0ad0-4cf9-babd-d89383f7536b
/dev/loop0
root 17587 24530 0 15:05 pts/0 00:00:00 grep --color=auto qemu-img
[root@ov301 sysctl.d]#
[root@ov301 sysctl.d]# du -sh /save_ova/base/dump/myvm2.ova.tmp
24G /save_ova/base/dump/myvm2.ova.tmp
[root@ov301 sysctl.d]# ll /save_ova/base/dump/myvm2.ova.tmp
-rw-------. 1 root root 32217446400 Jul 19 15:14
/save_ova/base/dump/myvm2.ova.tmp
[root@ov301 sysctl.d]#
and then continues until image copy completes, but at this time the job has
already aborted and so the completion of the ova composition doesn't go
ahead... and I remain with the ova.tmp file...
How can I extend the timeout?
Thanks in advance,
Gianluca
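Putting the timestamps from the first run side by side makes the gap between the failure and the actual end of the copy explicit. This is a small arithmetic sketch; the start time is assumed from the Ansible log file name (20190719115531), and the finish time is the minute-precision mtime from ll, so the derived rate is only approximate.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"

# Assumed start: timestamp embedded in the Ansible log file name.
started = datetime.strptime("2019-07-19 11:55:31", FMT)
# First ERROR in engine.log (milliseconds dropped).
failed = datetime.strptime("2019-07-19 12:25:31", FMT)
# mtime of myvm2.ova.tmp as shown by ll (minute precision only).
finished = datetime.strptime("2019-07-19 13:02:00", FMT)

image_bytes = 32217446400  # file size reported by ll

time_to_failure = failed - started
time_to_copy = finished - started
bytes_per_second = image_bytes / time_to_copy.total_seconds()

print("failure after:", time_to_failure)
print("copy finished after:", time_to_copy)
print("approx. MiB/s:", round(bytes_per_second / 2**20, 1))
```

The failure lands exactly 30 minutes after the start, and the second run (14:24 to 14:54) shows the same interval, which is consistent with a fixed timeout rather than an error in the copy itself.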
2 years, 7 months