Vdsm: ovirt-4.4.8 branch created
by Milan Zamazal
Hi,
The ovirt-4.4.8 branch has been created for 4.4.8 backports. You can merge
4.4.9 stuff to master now.
I'm aware of the current CI problems and that we may need fixes for
the new branch. Once the CI problems are resolved, we can decide which
contingent fixes to apply to the ovirt-4.4.8 branch.
Regards,
Milan
Vdsm 4.4.8 branch?
by Milan Zamazal
Hi,
we are past the code freeze for 4.4.8 now. Can we create a 4.4.8 branch in
Vdsm and start 4.4.9 development on master, or does anybody need to
postpone the branch?
Thanks,
Milan
[VDSM] CI fails in QemuGuestAgentTests.test_pci_devices
by Nir Soffer
VDSM CI is now failing consistently. The last related change is:
https://gerrit.ovirt.org/c/vdsm/+/116090
Build:
https://jenkins.ovirt.org/job/vdsm_standard-check-patch/29212//artifact/c...
_____________________ QemuGuestAgentTests.test_pci_devices _____________________
self = <virt.qemuguestagent_test.QemuGuestAgentTests testMethod=test_pci_devices>

    def test_pci_devices(self):
        devices = self.qga_poller._qga_call_get_devices(self.vm)['pci_devices']
        # Ethernet is returned twice by the agent but should appear only
        # once in the list
        assert len(devices) == 2
        eth = [d for d in devices if d['device_id'] == 4096][0]
        assert eth == {
            'device_id': 4096,
            'driver_date': '2019-08-12',
            'driver_name': 'Red Hat VirtIO Ethernet Adapter',
            'driver_version': '100.80.104.17300',
            'vendor_id': 6900,
        }
        balloon = [d for d in devices if d['device_id'] == 4098][0]
>       assert balloon == {
            'device_id': 4098,
            'driver_date': '2019-08-12',
            'driver_name': 'VirtIO Balloon Driver',
            'driver_version': '100.80.104.17300',
            'vendor_id': 6900,
        }
E       AssertionError: assert {'device_id':...4.17300', ...} == {'device_id':...4.17300', ...}
E         Omitting 4 identical items, use -vv to show
E         Differing items:
E         {'driver_date': '2019-08-11'} != {'driver_date': '2019-08-12'}
E         Use -v to get the full diff

Are we using real time in this test?
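If the driver date is derived from a raw timestamp using the host's local
timezone, the calendar date can come out one day off depending on where and
when the conversion runs. A minimal sketch of that effect (hypothetical
timestamp, not the actual qemu-guest-agent or test code):

    from datetime import datetime, timezone

    # Hypothetical timestamp: 2019-08-12 00:00:00 UTC.
    ts = 1565568000

    # Explicit UTC conversion always yields 2019-08-12.
    print(datetime.fromtimestamp(ts, tz=timezone.utc).date())

    # Naive local-time conversion can fall on the previous calendar day,
    # e.g. 2019-08-11 on a host whose timezone is behind UTC.
    print(datetime.fromtimestamp(ts).date())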
OST HE: Engine VM went down due to cpu load (was: [oVirt Jenkins] ovirt-system-tests_he-basic-suite-master - Build # 2126 - Failure!)
by Yedidyah Bar David
On Tue, Aug 3, 2021 at 7:50 AM <jenkins(a)jenkins.phx.ovirt.org> wrote:
>
> Project: https://jenkins.ovirt.org/job/ovirt-system-tests_he-basic-suite-master/
> Build: https://jenkins.ovirt.org/job/ovirt-system-tests_he-basic-suite-master/2126/
> Build Number: 2126
> Build Status: Failure
> Triggered By: Started by timer
>
> -------------------------------------
> Changes Since Last Success:
> -------------------------------------
> Changes for Build #2126
> [Michal Skrivanek] basic: skipping just the VNC console part of test_virtual_machines
>
>
>
>
> -----------------
> Failed Tests:
> -----------------
> 2 tests failed.
> FAILED: he-basic-suite-master.test-scenarios.test_012_local_maintenance_sdk.test_local_maintenance
>
> Error Message:
> ovirtsdk4.Error: Failed to read response: [(<pycurl.Curl object at 0x5555faf11228>, 7, 'Failed to connect to 192.168.200.99 port 443: Connection refused')]
This looks very similar to the issue we have with dns/dig failures
that cause the engine VM to go down, but it's not quite the same.
dig didn't fail (it now uses TCP); something else caused the agent
to stop the engine VM - a combination of high CPU load and low free
memory after restarting the engine VM as part of test_008.
https://jenkins.ovirt.org/job/ovirt-system-tests_he-basic-suite-master/21...
:
=====================================================================================
MainThread::INFO::2021-08-03
06:46:55,068::hosted_engine::517::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_monitoring_loop)
Current state ReinitializeFSM (score: 0)
MainThread::INFO::2021-08-03
06:47:04,089::state_decorators::51::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(check)
Global maintenance detected
MainThread::INFO::2021-08-03
06:47:04,169::brokerlink::73::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify)
Success, was notification of state_transition
(ReinitializeFSM-GlobalMaintenance) sent? ignored
MainThread::INFO::2021-08-03
06:47:05,249::hosted_engine::517::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_monitoring_loop)
Current state GlobalMaintenance (score: 3400)
MainThread::INFO::2021-08-03
06:47:14,439::state_decorators::51::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(check)
Global maintenance detected
MainThread::INFO::2021-08-03
06:47:25,526::states::176::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(score)
Penalizing score by 814 due to cpu load
MainThread::INFO::2021-08-03
06:47:25,527::hosted_engine::517::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_monitoring_loop)
Current state GlobalMaintenance (score: 2586)
MainThread::INFO::2021-08-03
06:47:25,537::state_decorators::51::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(check)
Global maintenance detected
MainThread::INFO::2021-08-03
06:47:26,029::hosted_engine::517::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_monitoring_loop)
Current state GlobalMaintenance (score: 2586)
MainThread::INFO::2021-08-03
06:47:35,050::state_decorators::51::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(check)
Global maintenance detected
MainThread::INFO::2021-08-03
06:47:35,576::hosted_engine::517::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_monitoring_loop)
Current state GlobalMaintenance (score: 2586)
MainThread::INFO::2021-08-03
06:47:45,597::state_decorators::51::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(check)
Global maintenance detected
MainThread::INFO::2021-08-03
06:47:46,521::hosted_engine::517::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_monitoring_loop)
Current state GlobalMaintenance (score: 2586)
MainThread::INFO::2021-08-03
06:47:55,577::state_decorators::51::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(check)
Global maintenance detected
MainThread::INFO::2021-08-03
06:47:56,559::hosted_engine::517::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_monitoring_loop)
Current state GlobalMaintenance (score: 2586)
MainThread::INFO::2021-08-03
06:47:56,559::hosted_engine::525::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_monitoring_loop)
Best remote host ost-he-basic-suite-master-host-1 (id: 2, score: 3400)
MainThread::INFO::2021-08-03
06:48:05,633::state_decorators::51::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(check)
Global maintenance detected
MainThread::INFO::2021-08-03
06:48:06,188::states::176::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(score)
Penalizing score by 820 due to cpu load
MainThread::INFO::2021-08-03
06:48:06,188::hosted_engine::517::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_monitoring_loop)
Current state GlobalMaintenance (score: 2580)
MainThread::INFO::2021-08-03
06:48:16,256::state_decorators::51::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(check)
Global maintenance detected
MainThread::INFO::2021-08-03
06:48:16,950::states::176::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(score)
Penalizing score by 831 due to cpu load
MainThread::INFO::2021-08-03
06:48:16,951::hosted_engine::517::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_monitoring_loop)
Current state GlobalMaintenance (score: 2569)
MainThread::INFO::2021-08-03
06:48:26,053::state_decorators::51::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(check)
Global maintenance detected
MainThread::INFO::2021-08-03
06:48:26,999::states::176::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(score)
Penalizing score by 839 due to cpu load
MainThread::INFO::2021-08-03
06:48:26,999::hosted_engine::517::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_monitoring_loop)
Current state GlobalMaintenance (score: 2561)
MainThread::INFO::2021-08-03
06:48:36,026::state_decorators::51::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(check)
Global maintenance detected
MainThread::INFO::2021-08-03
06:48:36,802::states::176::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(score)
Penalizing score by 844 due to cpu load
MainThread::INFO::2021-08-03
06:48:36,802::hosted_engine::517::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_monitoring_loop)
Current state GlobalMaintenance (score: 2556)
MainThread::INFO::2021-08-03
06:48:45,827::state_decorators::51::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(check)
Global maintenance detected
MainThread::INFO::2021-08-03
06:48:46,401::states::176::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(score)
Penalizing score by 849 due to cpu load
MainThread::INFO::2021-08-03
06:48:46,401::hosted_engine::517::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_monitoring_loop)
Current state GlobalMaintenance (score: 2551)
MainThread::INFO::2021-08-03
06:48:56,588::brokerlink::73::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify)
Success, was notification of state_transition
(GlobalMaintenance-ReinitializeFSM) sent? ignored
MainThread::INFO::2021-08-03
06:48:58,685::hosted_engine::517::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_monitoring_loop)
Current state ReinitializeFSM (score: 0)
MainThread::INFO::2021-08-03
06:49:05,729::brokerlink::73::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify)
Success, was notification of state_transition
(ReinitializeFSM-EngineStarting) sent? ignored
MainThread::INFO::2021-08-03
06:49:06,146::states::176::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(score)
Penalizing score by 830 due to cpu load
MainThread::INFO::2021-08-03
06:49:06,146::states::72::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_penalize_memory)
Penalizing score by 400 due to free memory 1782 being lower than
required 3171
MainThread::INFO::2021-08-03
06:49:06,146::hosted_engine::517::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_monitoring_loop)
Current state EngineStarting (score: 2170)
MainThread::INFO::2021-08-03
06:49:06,150::state_decorators::95::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(check)
Timeout cleared while transitioning <class
'ovirt_hosted_engine_ha.agent.states.EngineStarting'> -> <class
'ovirt_hosted_engine_ha.agent.states.EngineUp'>
MainThread::INFO::2021-08-03
06:49:06,172::brokerlink::73::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify)
Success, was notification of state_transition
(EngineStarting-EngineUp) sent? ignored
MainThread::INFO::2021-08-03
06:49:06,178::hosted_engine::517::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_monitoring_loop)
Current state EngineUp (score: 2570)
MainThread::ERROR::2021-08-03
06:49:16,197::states::398::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume)
Host ost-he-basic-suite-master-host-1 (id 2) score is significantly
better than local score, shutting down VM on this host
=====================================================================================
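(As a side note, the scores in the log add up: the base score of 3400 minus
the 814 CPU-load penalty gives the reported 2586, and the later EngineStarting
score of 2170 is 3400 minus the 830 CPU-load penalty and the 400 free-memory
penalty.)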
I looked a bit at /var/log/messages on the host, and while there is
quite some noise there, I can't tell specifically what might have caused
the high CPU load.
I also looked at logs of previous runs, and there is indeed a similar
pattern, where the CPU load on the host causes the agent to penalize
the score, but according to the agent log the load goes down faster,
to the point of being quite low by the time the engine is up, and only
then do we exit global maintenance, allowing the agent to take actions.
I now ran it again, but otherwise I think I'll ignore this for now,
unless we see more similar failures. If we do, we might want to
check/monitor/log the CPU load on the hosts, and/or change test_008 to
wait, after the engine is up, until the CPU load on the host goes down
a bit. Before the port to pytest we waited a hard-coded 5 minutes; I
then changed that to only wait until the engine VM is not migrating,
and this worked more or less OK. We might need to refine that, although
I'd rather not introduce another arbitrarily long delay but wait for
some condition, as sketched below.
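To make that concrete, a condition-based wait could look roughly like the
sketch below; wait_for_low_load and run_on_host are hypothetical names rather
than existing OST helpers, and the threshold/timeout values are just
placeholders:

    import time

    def wait_for_low_load(run_on_host, threshold=2.0, timeout=600, interval=10):
        """Poll the host's 1-minute load average until it drops below threshold.

        run_on_host(cmd) is assumed to run a shell command on the host and
        return its stdout as a string.
        """
        deadline = time.monotonic() + timeout
        while time.monotonic() < deadline:
            load1 = float(run_on_host("cat /proc/loadavg").split()[0])
            if load1 < threshold:
                return load1
            time.sleep(interval)
        raise TimeoutError(
            "host load did not drop below %s within %s seconds"
            % (threshold, timeout))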
I also noticed that this run was started at 03:04 AM (UTC) by timer,
whereas all previous timer-based runs started at 01:55 AM, somewhat
earlier - perhaps there are other things running at that time that
cause load.
Best regards,
--
Didi
Migration without shared storage is unsafe (was: Change in ovirt-system-tests[master]: HE: Use node image)
by Yedidyah Bar David
On Sun, Aug 8, 2021 at 5:42 PM Code Review <gerrit(a)ovirt.org> wrote:
>
> From Jenkins CI <jenkins(a)ovirt.org>:
>
> Jenkins CI has posted comments on this change. ( https://gerrit.ovirt.org/c/ovirt-system-tests/+/115392 )
>
> Change subject: HE: Use node image
> ......................................................................
>
>
> Patch Set 13: Continuous-Integration-1
>
> Build Failed
While trying to deactivate a host, the engine wanted to migrate a VM
(vm0) from host-0 to host-1. The vdsm log of host-0 says:
2021-08-08 14:31:10,076+0000 ERROR (migsrc/cde311f9) [virt.vm] (vmId='cde311f9-9a33-4eb9-8338-fa22ff49edc2') Failed to migrate (migration:503)
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/vdsm/virt/migration.py", line 477, in _regular_run
    time.time(), machineParams
  File "/usr/lib/python3.6/site-packages/vdsm/virt/migration.py", line 578, in _startUnderlyingMigration
    self._perform_with_conv_schedule(duri, muri)
  File "/usr/lib/python3.6/site-packages/vdsm/virt/migration.py", line 667, in _perform_with_conv_schedule
    self._perform_migration(duri, muri)
  File "/usr/lib/python3.6/site-packages/vdsm/virt/migration.py", line 596, in _perform_migration
    self._migration_flags)
  File "/usr/lib/python3.6/site-packages/vdsm/virt/virdomain.py", line 159, in call
    return getattr(self._vm._dom, name)(*a, **kw)
  File "/usr/lib/python3.6/site-packages/vdsm/virt/virdomain.py", line 101, in f
    ret = attr(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/vdsm/common/libvirtconnection.py", line 131, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/vdsm/common/function.py", line 94, in wrapper
    return func(inst, *args, **kwargs)
  File "/usr/lib64/python3.6/site-packages/libvirt.py", line 2126, in migrateToURI3
    raise libvirtError('virDomainMigrateToURI3() failed')
libvirt.libvirtError: Unsafe migration: Migration without shared storage is unsafe
Any idea?
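For context, and as far as I remember libvirt's behaviour: the QEMU driver
reports a migration as unsafe when a writable disk is not on storage it can
recognize as shared and its cache mode is not 'none', and the check can only
be bypassed by passing VIR_MIGRATE_UNSAFE, which as far as I know vdsm does
not set. A small python-libvirt sketch, purely illustrative and not vdsm code
(the domain name and destination URI are made up):

    import libvirt

    conn = libvirt.open("qemu:///system")
    dom = conn.lookupByName("vm0")  # made-up domain name

    flags = libvirt.VIR_MIGRATE_LIVE | libvirt.VIR_MIGRATE_PEER2PEER
    # Adding VIR_MIGRATE_UNSAFE would silence "Unsafe migration: Migration
    # without shared storage is unsafe", but the usual fix is to make sure
    # the disk really is on shared storage (or uses cache='none') rather
    # than to bypass the check.
    # flags |= libvirt.VIR_MIGRATE_UNSAFE
    dom.migrateToURI3("qemu+tls://host-1/system", {}, flags)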
--
Didi
Failing to build ovn2.11 / openvswitch2.11 for el9
by Sandro Bonazzola
Hi, I'm trying to rebuild the latest ovn RPMs from the CentOS build system
(built for el8s) in COPR for el9s.
It fails for me here:
sed -f ./build-aux/extract-odp-netlink-h < datapath/linux/compat/include/linux/openvswitch.h > include/odp-netlink.h
sh -f ./build-aux/extract-odp-netlink-macros-h include/odp-netlink.h > include/odp-netlink-macros.h
PYTHONPATH=./python":"$PYTHONPATH PYTHONDONTWRITEBYTECODE=yes /usr/bin/python3 ./ovsdb/ovsdb-idlc.in annotate ./vswitchd/vswitch.ovsschema ./lib/vswitch-idl.ann > lib/vswitch-idl.ovsidl.tmp && mv lib/vswitch-idl.ovsidl.tmp lib/vswitch-idl.ovsidl
PYTHONPATH=./python":"$PYTHONPATH PYTHONDONTWRITEBYTECODE=yes /usr/bin/python3 ./ovsdb/ovsdb-idlc.in c-idl-source lib/vswitch-idl.ovsidl > lib/vswitch-idl.c.tmp && mv lib/vswitch-idl.c.tmp lib/vswitch-idl.c
Traceback (most recent call last):
  File "/builddir/build/BUILD/ovn2.11-2.11.1/./ovsdb/ovsdb-idlc.in", line 1581, in <module>
    func(*args[1:])
  File "/builddir/build/BUILD/ovn2.11-2.11.1/./ovsdb/ovsdb-idlc.in", line 442, in printCIDLSource
    replace_cplusplus_keyword(schema)
  File "/builddir/build/BUILD/ovn2.11-2.11.1/./ovsdb/ovsdb-idlc.in", line 179, in replace_cplusplus_keyword
    for columnName in table.columns:
RuntimeError: dictionary keys changed during iteration
make: *** [Makefile:8534: lib/vswitch-idl.c] Error 1
The failure is similar for openvswitch2.11 (this one still with parallel make, -j2):
Traceback (most recent call last):
  File "/builddir/build/BUILD/ovs-2.11.3/build-shared/../ovsdb/ovsdb-idlc.in", line 1597, in <module>
    func(*args[1:])
  File "/builddir/build/BUILD/ovs-2.11.3/build-shared/../ovsdb/ovsdb-idlc.in", line 458, in printCIDLSource
    replace_cplusplus_keyword(schema)
  File "/builddir/build/BUILD/ovs-2.11.3/build-shared/../ovsdb/ovsdb-idlc.in", line 179, in replace_cplusplus_keyword
    for columnName in table.columns:
RuntimeError: dictionary keys changed during iteration
make: *** [Makefile:8383: lib/vswitch-idl.c] Error 1
make: *** Waiting for unfinished jobs....
Traceback (most recent call last):
  File "/builddir/build/BUILD/ovs-2.11.3/build-shared/../ovsdb/ovsdb-idlc.in", line 1597, in <module>
    func(*args[1:])
  File "/builddir/build/BUILD/ovs-2.11.3/build-shared/../ovsdb/ovsdb-idlc.in", line 185, in printCIDLHeader
    replace_cplusplus_keyword(schema)
  File "/builddir/build/BUILD/ovs-2.11.3/build-shared/../ovsdb/ovsdb-idlc.in", line 179, in replace_cplusplus_keyword
    for columnName in table.columns:
RuntimeError: dictionary keys changed during iteration
make: *** [Makefile:8385: lib/vswitch-idl.h] Error 1
I thought it could have been due to parallel make, so I forced make to run
with -j1, but that didn't change the result.
Any clue on how to get the build working?
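For reference, this RuntimeError is what newer Python 3 raises when a dict's
keys are changed while the dict is being iterated, which is apparently what
replace_cplusplus_keyword does when renaming columns; the usual fix is to
iterate over a snapshot of the keys. A minimal sketch of the pattern
(illustrative only, not the actual ovsdb-idlc.in code; the keyword set here
is made up):

    # Illustrative only; not the actual ovsdb-idlc.in code.
    CPLUSPLUS_KEYWORDS = {"class", "new", "delete"}  # made-up subset

    def replace_cplusplus_keyword(columns):
        # Renaming keys while iterating the dict itself raises
        # "RuntimeError: dictionary keys changed during iteration" on newer
        # Python 3. Iterating over a snapshot of the keys avoids that:
        for name in list(columns):
            if name in CPLUSPLUS_KEYWORDS:
                columns[name + "_"] = columns.pop(name)

    cols = {"new": "integer", "mtu": "integer"}
    replace_cplusplus_keyword(cols)
    print(cols)  # {'mtu': 'integer', 'new_': 'integer'}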
Thanks,
--
Sandro Bonazzola
MANAGER, SOFTWARE ENGINEERING, EMEA R&D RHV
Red Hat EMEA <https://www.redhat.com/>
sbonazzo(a)redhat.com
*Red Hat respects your work life balance. Therefore there is no need to
answer this email out of your office hours.*