supervdsm failing during network_caps
by Alan G
Hi,
I have issues with one host where supervdsm is failing in network_caps.
I see the following trace in the log.
MainProcess|jsonrpc/1::ERROR::2020-01-06 03:01:05,558::supervdsm_server::100::SuperVdsm.ServerCallback::(wrapper) Error in network_caps
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/supervdsm_server.py", line 98, in wrapper
    res = func(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/network/api.py", line 56, in network_caps
    return netswitch.configurator.netcaps(compatibility=30600)
  File "/usr/lib/python2.7/site-packages/vdsm/network/netswitch/configurator.py", line 317, in netcaps
    net_caps = netinfo(compatibility=compatibility)
  File "/usr/lib/python2.7/site-packages/vdsm/network/netswitch/configurator.py", line 325, in netinfo
    _netinfo = netinfo_get(vdsmnets, compatibility)
  File "/usr/lib/python2.7/site-packages/vdsm/network/netinfo/cache.py", line 150, in get
    return _stringify_mtus(_get(vdsmnets))
  File "/usr/lib/python2.7/site-packages/vdsm/network/netinfo/cache.py", line 59, in _get
    ipaddrs = getIpAddrs()
  File "/usr/lib/python2.7/site-packages/vdsm/network/netinfo/addresses.py", line 72, in getIpAddrs
    for addr in nl_addr.iter_addrs():
  File "/usr/lib/python2.7/site-packages/vdsm/network/netlink/addr.py", line 33, in iter_addrs
    with _nl_addr_cache(sock) as addr_cache:
  File "/usr/lib64/python2.7/contextlib.py", line 17, in __enter__
    return self.gen.next()
  File "/usr/lib/python2.7/site-packages/vdsm/network/netlink/__init__.py", line 92, in _cache_manager
    cache = cache_allocator(sock)
  File "/usr/lib/python2.7/site-packages/vdsm/network/netlink/libnl.py", line 469, in rtnl_addr_alloc_cache
    raise IOError(-err, nl_geterror(err))
IOError: [Errno 16] Message sequence number mismatch
Restarting supervdsm resolves the issue for a while, maybe 24 hours, and then it occurs again, so I suspect resource exhaustion or a leak of some kind.
Running 4.2.8.2 with VDSM at 4.20.46.
I've looked through Bugzilla and can't find an exact match. The closest was https://bugzilla.redhat.com/show_bug.cgi?id=1666123, which seems to be an RHV-only fix.
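To check whether the failures really recur on a roughly 24-hour cycle after each restart, the error timestamps can be pulled out of the supervdsm log. The following is a minimal sketch, assuming the log lines keep the format shown in the trace above; the second sample line is invented for illustration, and the one-hour quiet period used to group repeated errors into a single burst is an arbitrary choice.

```python
import re
from datetime import datetime, timedelta

# Matches ERROR lines in the format shown in the trace above and captures
# the timestamp without the millisecond part.
LINE_RE = re.compile(
    r"::ERROR::(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d+::.*Error in network_caps"
)


def failure_times(lines):
    """Return the timestamps of all 'Error in network_caps' entries."""
    times = []
    for line in lines:
        match = LINE_RE.search(line)
        if match:
            times.append(datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S"))
    return times


def burst_starts(times, quiet=timedelta(hours=1)):
    """Keep only failures preceded by at least `quiet` without any failure."""
    starts = []
    previous = None
    for current in sorted(times):
        if previous is None or current - previous > quiet:
            starts.append(current)
        previous = current
    return starts


sample = [
    "MainProcess|jsonrpc/1::ERROR::2020-01-06 03:01:05,558::supervdsm_server"
    "::100::SuperVdsm.ServerCallback::(wrapper) Error in network_caps",
    # Hypothetical second entry, invented for illustration only.
    "MainProcess|jsonrpc/1::ERROR::2020-01-06 03:01:20,101::supervdsm_server"
    "::100::SuperVdsm.ServerCallback::(wrapper) Error in network_caps",
]

for start in burst_starts(failure_times(sample)):
    print(start.isoformat())
```

On a real host the `sample` list would be replaced by the lines of the supervdsm log file, and the burst start times compared against the times supervdsm was restarted.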
Thanks,
Alan
2 years, 6 months
OVN and change of mgmt network
by Gianluca Cecchi
Hello,
I previously had OVN running on the engine (as OVN provider with northd and
the northbound and southbound DBs) and on the hosts (with the OVN controller).
After changing the management IP of the hosts (the engine kept the same IP),
I ran the following command on them again:
vdsm-tool ovn-config <ip_of_engine> <nel_local_ip_of_host>
Now I think I have to clean up some things, eg:
1) On engine
where I get these lines below
systemctl status ovn-northd.service -l
. . .
Sep 29 14:41:42 ovmgr1 ovsdb-server[940]: ovs|00005|reconnect|ERR|tcp:10.4.167.40:37272: no response to inactivity probe after 5 seconds, disconnecting
Oct 03 11:52:00 ovmgr1 ovsdb-server[940]: ovs|00006|reconnect|ERR|tcp:10.4.167.41:52078: no response to inactivity probe after 5 seconds, disconnecting
The two IPs are the old addresses of two of the hosts.
A restart of the services seems to have fixed this.
Can anyone confirm whether I have to do anything else?
2) On hosts (there are 3 hosts with OVN on ip 10.4.192.32/33/34)
where I currently have this output
[root@ov301 ~]# ovs-vsctl show
3a38c5bb-0abf-493d-a2e6-345af8aedfe3
    Bridge br-int
        fail_mode: secure
        Port "ovn-1dce5b-0"
            Interface "ovn-1dce5b-0"
                type: geneve
                options: {csum="true", key=flow, remote_ip="10.4.192.32"}
        Port "ovn-ddecf0-0"
            Interface "ovn-ddecf0-0"
                type: geneve
                options: {csum="true", key=flow, remote_ip="10.4.192.33"}
        Port "ovn-fd413b-0"
            Interface "ovn-fd413b-0"
                type: geneve
                options: {csum="true", key=flow, remote_ip="10.4.168.74"}
        Port br-int
            Interface br-int
                type: internal
    ovs_version: "2.7.2"
[root@ov301 ~]#
The 10.4.192.x IPs are OK.
But there is a leftover entry for an old host I initially used for tests,
10.4.168.74, which no longer exists.
How can I clean up the records for 1) and 2)?
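Spotting leftover tunnels like the one above can be automated by comparing each geneve port's remote_ip against the list of current host addresses. This is a minimal sketch that only parses the text of `ovs-vsctl show`; the function names are hypothetical, and it reports stale entries without removing anything.

```python
import re

PORT_RE = re.compile(r'^\s*Port\s+"?([^"\s]+)"?\s*$')
REMOTE_IP_RE = re.compile(r'remote_ip="([^"]+)"')


def tunnel_endpoints(ovs_show_output):
    """Map each port that has a remote_ip option to that address."""
    endpoints = {}
    port = None
    for line in ovs_show_output.splitlines():
        port_match = PORT_RE.match(line)
        if port_match:
            port = port_match.group(1)
            continue
        ip_match = REMOTE_IP_RE.search(line)
        if ip_match and port is not None:
            endpoints[port] = ip_match.group(1)
    return endpoints


def stale_tunnels(endpoints, valid_ips):
    """Return the ports whose remote_ip is not a known host address."""
    return {port: ip for port, ip in endpoints.items() if ip not in valid_ips}


# Output copied from the host above.
OVS_SHOW = '''
3a38c5bb-0abf-493d-a2e6-345af8aedfe3
    Bridge br-int
        fail_mode: secure
        Port "ovn-1dce5b-0"
            Interface "ovn-1dce5b-0"
                type: geneve
                options: {csum="true", key=flow, remote_ip="10.4.192.32"}
        Port "ovn-ddecf0-0"
            Interface "ovn-ddecf0-0"
                type: geneve
                options: {csum="true", key=flow, remote_ip="10.4.192.33"}
        Port "ovn-fd413b-0"
            Interface "ovn-fd413b-0"
                type: geneve
                options: {csum="true", key=flow, remote_ip="10.4.168.74"}
        Port br-int
            Interface br-int
                type: internal
    ovs_version: "2.7.2"
'''

CURRENT_HOSTS = {"10.4.192.32", "10.4.192.33", "10.4.192.34"}

print(stale_tunnels(tunnel_endpoints(OVS_SHOW), CURRENT_HOSTS))
```

With the output shown above, the only port flagged is the one pointing at 10.4.168.74.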
Thanks,
Gianluca
2 years, 8 months
CentOS Stream support
by Michal Skrivanek
Hi all,
we would like to ask about the community’s interest in oVirt moving to CentOS Stream.
There were some requests before, but it’s hard to see how many people would really like to see that.
With CentOS releases lagging behind RHEL for months, it’s interesting to consider moving to CentOS Stream, as it is much more up to date and allows us to fix bugs faster, with fewer workarounds and less overhead for maintaining old code. E.g. our current integration tests do not really pass on CentOS 8.1, and we can’t really do much about that other than wait for more up-to-date packages. It would also bring us closer to making oVirt run smoothly on RHEL, as that is also much closer to Stream than it is to outdated CentOS.
So, would you like us to support CentOS Stream?
We don’t really have the capacity to run 3 different platforms. Would you still want oVirt to support CentOS Stream if it means “less support” for regular CentOS?
There are some concerns about Stream being a bit less stable. Do you share those concerns?
Thank you for your comments,
michal
2 years, 8 months
encrypted GENEVE traffic
by Pavel Nakonechnyi
Dear oVirt Community,
From my understanding, oVirt does not support Open vSwitch IPsec tunneling for GENEVE traffic (which is described at http://docs.openvswitch.org/en/latest/howto/ipsec/ and http://docs.openvswitch.org/en/latest/tutorials/ipsec/).
Are there plans to introduce such support (or an explicit decision not to)?
Is it possible to somehow configure such tunneling manually for existing virtual networks (even in a limited way)?
Alternatively, is it possible to deploy oVirt on top of tunneled interfaces (e.g. via VXLAN or IPsec)? This would allow all management traffic to be encrypted.
Such a requirement arises when deploying oVirt on third-party premises with an untrusted network.
Thanks in advance for any clarifications. :)
--
WBR, Pavel
+32478910884
2 years, 8 months
oVirt 4.4: Self-hosted engine deployment fails with backup restore from 4.3 engine
by Oliver Leinfelder
Hi there,
I'm a bit puzzled about the possible upgrade paths from a 4.3 cluster to
version 4.4 in a self-hosted engine environment.
My idea was:
Set up a new host with a clean oVirt Node 4.4 installation, then deploy the
hosted engine on it with a restored backup from the production cluster
and go from there.
This, however, fails with the following error:
2020-05-27 00:17:08,886+0200 DEBUG otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:103 {'msg': 'non-zero return code', 'cmd': ['engine-setup', '--accept-defaults', '--config-append=/root/ovirt-engine-answers'], 'stdout': "[ INFO ] Stage: Initializing\n[ INFO ] Stage: Environment setup\n Configuration files: /etc/ovirt-engine-setup.conf.d/10-packaging-jboss.conf, /etc/ovirt-engine-setup.conf.d/10-packaging.conf, /etc/ovirt-engine-setup.conf.d/20-setup-ovirt-post.conf, /root/ovirt-engine-answers\n Log file: /var/log/ovirt-engine/setup/ovirt-engine-setup-20200527001657-fyeueu.log\n Version: otopi-1.9.1 (otopi-1.9.1-1.el8)\n[ INFO ] DNF Downloading 1 files, 0.00KB\n[ INFO ] DNF Downloaded CentOS-8 - AppStream\n[ INFO ] DNF Downloading 1 files, 0.00KB\n[ INFO ] DNF Downloaded CentOS-8 - Base\n[ INFO ] DNF Downloading 1 files, 0.00KB\n
[...]
... answers from backup config follow ....
[...]
2020-05-27 00:17:12,396+0200 DEBUG otopi.context context._executeMethod:145 method exception
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/otopi/context.py", line 132, in _executeMethod
    method['method']()
  File "/usr/share/ovirt-hosted-engine-setup/scripts/../plugins/gr-he-ansiblesetup/core/misc.py", line 403, in _closeup
    r = ah.run()
  File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_setup/ansible_utils.py", line 229, in run
    raise RuntimeError(_('Failed executing ansible-playbook'))
Is this approach (restoring from 4.3) generally supposed to work? If not,
what is the appropriate upgrade path?
Thank you!
Regards
Oli
2 years, 9 months
"gluster-ansible-roles is not installed on Host" error on Cockpit
by Hesham Ahmed
On a new 4.3.1 oVirt Node installation, when trying to deploy HCI
(and also when trying to add a new gluster volume to existing clusters)
using Cockpit, the following error is displayed: "gluster-ansible-roles is
not installed on Host. To continue deployment, please install
gluster-ansible-roles on Host and try again". There is no package
named gluster-ansible-roles in the repositories:
[root@localhost ~]# yum install gluster-ansible-roles
Loaded plugins: enabled_repos_upload, fastestmirror, imgbased-persist, package_upload, product-id, search-disabled-repos, subscription-manager, vdsmupgrade
This system is not registered with an entitlement server. You can use subscription-manager to register.
Loading mirror speeds from cached hostfile
 * ovirt-4.3-epel: mirror.horizon.vn
No package gluster-ansible-roles available.
Error: Nothing to do
Uploading Enabled Repositories Report
Cannot upload enabled repos report, is this client registered?
This is due to a check introduced here:
https://gerrit.ovirt.org/#/c/98023/1/dashboard/src/helpers/AnsibleUtil.js
Changing the line from:
[ "rpm", "-qa", "gluster-ansible-roles" ], { "superuser":"require" }
to
[ "rpm", "-qa", "gluster-ansible" ], { "superuser":"require" }
resolves the issue. The above code snippet is installed at
/usr/share/cockpit/ovirt-dashboard/app.js on oVirt Node and can be
patched by running:
sed -i 's/gluster-ansible-roles/gluster-ansible/g' /usr/share/cockpit/ovirt-dashboard/app.js && systemctl restart cockpit
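Because the sed expression uses the g flag, it rewrites every occurrence of the package name in app.js, not only the one in the rpm check. A minimal sketch of the same substitution as a dry run, useful for counting how many places would change before touching the file; the function name is hypothetical and the sample string is the line quoted above.

```python
OLD_NAME = "gluster-ansible-roles"
NEW_NAME = "gluster-ansible"


def patch_package_check(source):
    """Return the patched text and the number of replacements made."""
    count = source.count(OLD_NAME)
    return source.replace(OLD_NAME, NEW_NAME), count


snippet = '[ "rpm", "-qa", "gluster-ansible-roles" ], { "superuser":"require" }'
patched, replaced = patch_package_check(snippet)
print(replaced, patched)

# Running it again changes nothing, so the patch is safe to re-apply.
print(patch_package_check(patched)[1])
```

To apply it to the real file, read app.js into a string, pass it through the function, and write the result back after keeping a copy of the original.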
2 years, 10 months
ovirt-imageio-proxy not working after updating SSL certificates with a wildcard cert issued by AlphaSSL (intermediate)
by Lynn Dixon
All,
I recently bought a wildcard certificate for my lab domain (shadowman.dev)
and replaced all the certs on my RHV 4.3 machine per our documentation.
The WebUI presents the certs successfully and without any issues, and
everything seemed to be fine until I tried to upload a disk image (or an
ISO) to my storage domain. I get this error in the events tab:
https://share.getcloudapp.com/p9uPvegx
I also see that the disk is showing up in my storage domain, but it's
showing "Paused by System" and I can't do anything with it. I can't even
delete it!
I have tried following this document to fix the issue, but it didn't work:
https://access.redhat.com/solutions/4148361
I am seeing this error pop into my engine.log:
https://pastebin.com/kDLSEq1A
And I see this error in my image-proxy.log:
WARNING 2020-07-24 15:26:34,802 web:137:web:(log_error) ERROR [172.17.0.30] PUT /tickets/ [403] Error verifying signed ticket: Invalid ovirt ticket (data='------my_ticket_data-----', reason=Untrusted certificate) [request=0.002946/1]
Now, when I bought my wildcard, I was given a root certificate for the CA,
as well as a separate intermediate CA certificate from the provider.
Likewise, they gave me a certificate and a private key of course. The root
and intermediate CA's certificates have been added
to /etc/pki/ca-trust/source/anchors/ and I did an update-ca-trust.
I also started experiencing issues with the ovpn network provider at the
same time I replaced the SSL certs. I disregarded them at the time, but
now I think they're related. Any advice on what to look for to fix the
ovirt-imageio-proxy?
Thanks!
*Lynn Dixon* | Red Hat Certified Architect #100-006-188
*Solutions Architect* | NA Commercial
Google Voice: 423-618-1414
Cell/Text: 423-774-3188
Click here to view my Certification Portfolio <http://red.ht/1XMX2Mi>
2 years, 10 months
VM with illegal snapshots
by Giorgio Biacchi
Hi,
due to a bug in our oVirt-integrated backup system, we now have some VMs
with snapshots in an illegal state.
It seems that there's an inconsistency between the DB and the real
status of the images on disk.
Let me show an example:
engine=# select image_guid,parentid,imagestatus,vm_snapshot_id,volume_type,volume_format,active from images where image_group_id='e34f77cb-54d5-40d0-b539-e0a5fd512d2d';
              image_guid              |               parentid               | imagestatus |            vm_snapshot_id            | volume_type | volume_format | active
--------------------------------------+--------------------------------------+-------------+--------------------------------------+-------------+---------------+--------
 a107b6c4-842e-4b40-9215-c965431a0c0f | 00000000-0000-0000-0000-000000000000 |           4 | d19d6ca3-1989-4c67-8ee7-c0c43b3e6d74 |           2 |             4 | f
 a4c86a68-9123-454c-b417-1b15038a4bf2 | a107b6c4-842e-4b40-9215-c965431a0c0f |           1 | e7a405ee-8fd4-4733-ae9c-5252bf07c9d3 |           2 |             4 | f
 f6a61f2e-26bd-4b63-97c6-d66913ce48c5 | a4c86a68-9123-454c-b417-1b15038a4bf2 |           1 | 9d0958b9-4995-4e11-a027-a32d4bac52e4 |           2 |             4 | t
(3 rows)
[root@host02 ~]# lvs -o+lv_tags |grep e34f77cb-54d5-40d0-b539-e0a5fd512d2d
  a107b6c4-842e-4b40-9215-c965431a0c0f 459011cf-ebb6-46ff-831d-8ccfafd82c8a -wi------- 149.50g IU_e34f77cb-54d5-40d0-b539-e0a5fd512d2d,MD_68,PU_00000000-0000-0000-0000-000000000000
  f6a61f2e-26bd-4b63-97c6-d66913ce48c5 459011cf-ebb6-46ff-831d-8ccfafd82c8a -wi------- 10.00g IU_e34f77cb-54d5-40d0-b539-e0a5fd512d2d,MD_348,PU_a107b6c4-842e-4b40-9215-c965431a0c0f
So image GUID a4c86a68-9123-454c-b417-1b15038a4bf2 is not present on
disk. I think the image was merged correctly but not removed from
the database.
Any suggestions on how to fix the database to reflect the real situation
on disk?
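The mismatch can be made explicit by comparing the parent chain recorded in the engine database with the chain encoded in the PU_ tags of the logical volumes. This is a minimal sketch using only the values shown above; the function names are hypothetical, and it only reports differences without changing anything.

```python
ZERO_UUID = "00000000-0000-0000-0000-000000000000"

# image_guid -> parentid, copied from the engine `images` query above.
DB_PARENTS = {
    "a107b6c4-842e-4b40-9215-c965431a0c0f": ZERO_UUID,
    "a4c86a68-9123-454c-b417-1b15038a4bf2": "a107b6c4-842e-4b40-9215-c965431a0c0f",
    "f6a61f2e-26bd-4b63-97c6-d66913ce48c5": "a4c86a68-9123-454c-b417-1b15038a4bf2",
}

# Lines copied from the `lvs -o+lv_tags` output above.
LVS_LINES = [
    "a107b6c4-842e-4b40-9215-c965431a0c0f 459011cf-ebb6-46ff-831d-8ccfafd82c8a "
    "-wi------- 149.50g "
    "IU_e34f77cb-54d5-40d0-b539-e0a5fd512d2d,MD_68,PU_" + ZERO_UUID,
    "f6a61f2e-26bd-4b63-97c6-d66913ce48c5 459011cf-ebb6-46ff-831d-8ccfafd82c8a "
    "-wi------- 10.00g "
    "IU_e34f77cb-54d5-40d0-b539-e0a5fd512d2d,MD_348,"
    "PU_a107b6c4-842e-4b40-9215-c965431a0c0f",
]


def parents_from_lvs(lines):
    """Map each LV name to the parent UUID stored in its PU_ tag."""
    parents = {}
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        lv_name, tags = fields[0], fields[-1].split(",")
        for tag in tags:
            if tag.startswith("PU_"):
                parents[lv_name] = tag[len("PU_"):]
    return parents


def compare_chains(db_parents, disk_parents):
    """Return images missing on disk and images whose parent differs."""
    missing = sorted(set(db_parents) - set(disk_parents))
    mismatched = {
        guid: (db_parents[guid], disk_parents[guid])
        for guid in db_parents
        if guid in disk_parents and db_parents[guid] != disk_parents[guid]
    }
    return missing, mismatched


missing, mismatched = compare_chains(DB_PARENTS, parents_from_lvs(LVS_LINES))
print("in DB but not on disk:", missing)
print("parent differs (DB, disk):", mismatched)
```

For this disk it reports a4c86a68 as missing on disk and shows that f6a61f2e has a4c86a68 as its parent in the database but a107b6c4 on disk, which is consistent with a merge that completed on storage but was never recorded in the database.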
TIA
--
gb
PGP Key: http://pgp.mit.edu/
Primary key fingerprint: C510 0765 943E EBED A4F2 69D3 16CC DC90 B9CB 0F34
2 years, 10 months
rhv 4.3.11 but not ovirt 4.3.11
by Gianluca Cecchi
Hello,
I see that a 4.3.11 version is available for downstream RHV but not for
upstream oVirt.
Will it be released for oVirt too?
Does it contain only security fixes, or also bug fixes (like the few-line
one related to export as OVA)?
Thanks,
Gianluca
2 years, 10 months