[VDSM] CI fails in QemuGuestAgentTests.test_pci_devices

VDSM CI is failing now consistently. Last related change is: https://gerrit.ovirt.org/c/vdsm/+/116090 Build: https://jenkins.ovirt.org/job/vdsm_standard-check-patch/29212//artifact/chec... _____________________ QemuGuestAgentTests.test_pci_devices _____________________ self = <virt.qemuguestagent_test.QemuGuestAgentTests testMethod=test_pci_devices> def test_pci_devices(self): devices = self.qga_poller._qga_call_get_devices(self.vm)['pci_devices'] # Ethernet is returned twice by the agent but should appear only # once in the list assert len(devices) == 2 eth = [d for d in devices if d['device_id'] == 4096][0] assert eth == { 'device_id': 4096, 'driver_date': '2019-08-12', 'driver_name': 'Red Hat VirtIO Ethernet Adapter', 'driver_version': '100.80.104.17300', 'vendor_id': 6900, } balloon = [d for d in devices if d['device_id'] == 4098][0]
assert balloon == {
'device_id': 4098, 'driver_date': '2019-08-12', 'driver_name': 'VirtIO Balloon Driver', 'driver_version': '100.80.104.17300', 'vendor_id': 6900, } E AssertionError: assert {'device_id':...4.17300', ...} == {'device_id':...4.17300', ...} E Omitting 4 identical items, use -vv to show E Differing items: E {'driver_date': '2019-08-11'} != {'driver_date': '2019-08-12'} Are we using real time in this test? E Use -v to get the full diff

Nir Soffer <nsoffer@redhat.com> writes:
VDSM CI is failing now consistently. Last related change is: https://gerrit.ovirt.org/c/vdsm/+/116090
It doesn't look related.
Build: https://jenkins.ovirt.org/job/vdsm_standard-check-patch/29212//artifact/chec...
_____________________ QemuGuestAgentTests.test_pci_devices _____________________
self = <virt.qemuguestagent_test.QemuGuestAgentTests testMethod=test_pci_devices>
def test_pci_devices(self): devices = self.qga_poller._qga_call_get_devices(self.vm)['pci_devices'] # Ethernet is returned twice by the agent but should appear only # once in the list assert len(devices) == 2 eth = [d for d in devices if d['device_id'] == 4096][0] assert eth == { 'device_id': 4096, 'driver_date': '2019-08-12', 'driver_name': 'Red Hat VirtIO Ethernet Adapter', 'driver_version': '100.80.104.17300', 'vendor_id': 6900, } balloon = [d for d in devices if d['device_id'] == 4098][0]
assert balloon == {
'device_id': 4098, 'driver_date': '2019-08-12', 'driver_name': 'VirtIO Balloon Driver', 'driver_version': '100.80.104.17300', 'vendor_id': 6900, } E AssertionError: assert {'device_id':...4.17300', ...} == {'device_id':...4.17300', ...} E Omitting 4 identical items, use -vv to show E Differing items: E {'driver_date': '2019-08-11'} != {'driver_date': '2019-08-12'}
Are we using real time in this test?
No, I think the problem is related to the following line 'driver-date': 1565568000000000000, in qemuguestagent_test.py and the corresponding conversion date = datetime.date.fromtimestamp(date // 1e9).isoformat() in guestagenthelpers.py. The test passes on my computer so I wonder whether it may be a bug in some version of python3-dateutil. I also wonder whether the fact that it occurred on 12 August is just a coincidence.
E Use -v to get the full diff

Milan Zamazal <mzamazal@redhat.com> writes:
Nir Soffer <nsoffer@redhat.com> writes:
VDSM CI is failing now consistently. Last related change is: https://gerrit.ovirt.org/c/vdsm/+/116090
It doesn't look related.
Build: https://jenkins.ovirt.org/job/vdsm_standard-check-patch/29212//artifact/chec...
_____________________ QemuGuestAgentTests.test_pci_devices _____________________
self = <virt.qemuguestagent_test.QemuGuestAgentTests testMethod=test_pci_devices>
def test_pci_devices(self): devices = self.qga_poller._qga_call_get_devices(self.vm)['pci_devices'] # Ethernet is returned twice by the agent but should appear only # once in the list assert len(devices) == 2 eth = [d for d in devices if d['device_id'] == 4096][0] assert eth == { 'device_id': 4096, 'driver_date': '2019-08-12', 'driver_name': 'Red Hat VirtIO Ethernet Adapter', 'driver_version': '100.80.104.17300', 'vendor_id': 6900, } balloon = [d for d in devices if d['device_id'] == 4098][0]
assert balloon == {
'device_id': 4098, 'driver_date': '2019-08-12', 'driver_name': 'VirtIO Balloon Driver', 'driver_version': '100.80.104.17300', 'vendor_id': 6900, } E AssertionError: assert {'device_id':...4.17300', ...} == {'device_id':...4.17300', ...} E Omitting 4 identical items, use -vv to show E Differing items: E {'driver_date': '2019-08-11'} != {'driver_date': '2019-08-12'}
Are we using real time in this test?
No, I think the problem is related to the following line
'driver-date': 1565568000000000000,
in qemuguestagent_test.py and the corresponding conversion
date = datetime.date.fromtimestamp(date // 1e9).isoformat()
in guestagenthelpers.py.
The test passes on my computer so I wonder whether it may be a bug in some version of python3-dateutil. I also wonder whether the fact that it occurred on 12 August is just a coincidence.
Looks correct also on current CentOS Stream 8:
datetime.datetime.fromtimestamp(1565568000000000000 // 1e9).isoformat() '2019-08-12T00:00:00'
And there was no leap second this year that could cause it. I guess it could be fixed by shifting the driver timestamp a bit forward but then the test wouldn't check for these kinds of errors anymore. So I'd suggest disabling the test if it causes trouble until the failing part is identified.
E Use -v to get the full diff

Milan Zamazal <mzamazal@redhat.com> writes:
Milan Zamazal <mzamazal@redhat.com> writes:
Nir Soffer <nsoffer@redhat.com> writes:
VDSM CI is failing now consistently. Last related change is: https://gerrit.ovirt.org/c/vdsm/+/116090
It doesn't look related.
Build: https://jenkins.ovirt.org/job/vdsm_standard-check-patch/29212//artifact/chec...
_____________________ QemuGuestAgentTests.test_pci_devices _____________________
self = <virt.qemuguestagent_test.QemuGuestAgentTests testMethod=test_pci_devices>
def test_pci_devices(self): devices = self.qga_poller._qga_call_get_devices(self.vm)['pci_devices'] # Ethernet is returned twice by the agent but should appear only # once in the list assert len(devices) == 2 eth = [d for d in devices if d['device_id'] == 4096][0] assert eth == { 'device_id': 4096, 'driver_date': '2019-08-12', 'driver_name': 'Red Hat VirtIO Ethernet Adapter', 'driver_version': '100.80.104.17300', 'vendor_id': 6900, } balloon = [d for d in devices if d['device_id'] == 4098][0]
assert balloon == {
'device_id': 4098, 'driver_date': '2019-08-12', 'driver_name': 'VirtIO Balloon Driver', 'driver_version': '100.80.104.17300', 'vendor_id': 6900, } E AssertionError: assert {'device_id':...4.17300', ...} == {'device_id':...4.17300', ...} E Omitting 4 identical items, use -vv to show E Differing items: E {'driver_date': '2019-08-11'} != {'driver_date': '2019-08-12'}
Are we using real time in this test?
No, I think the problem is related to the following line
'driver-date': 1565568000000000000,
in qemuguestagent_test.py and the corresponding conversion
date = datetime.date.fromtimestamp(date // 1e9).isoformat()
in guestagenthelpers.py.
The test passes on my computer so I wonder whether it may be a bug in some version of python3-dateutil. I also wonder whether the fact that it occurred on 12 August is just a coincidence.
Looks correct also on current CentOS Stream 8:
datetime.datetime.fromtimestamp(1565568000000000000 // 1e9).isoformat() '2019-08-12T00:00:00'
And there was no leap second this year that could cause it.
I guess it could be fixed by shifting the driver timestamp a bit forward but then the test wouldn't check for these kinds of errors anymore. So I'd suggest disabling the test if it causes trouble until the failing part is identified.
Or could it be an issue, under some circumstances, with using float in the computation (which is wrong anyway)? Let's try: https://gerrit.ovirt.org/c/vdsm/+/116182
E Use -v to get the full diff

On Thu, Aug 12, 2021 at 7:00 PM Milan Zamazal <mzamazal@redhat.com> wrote:
Milan Zamazal <mzamazal@redhat.com> writes:
Milan Zamazal <mzamazal@redhat.com> writes:
Nir Soffer <nsoffer@redhat.com> writes:
VDSM CI is failing now consistently. Last related change is: https://gerrit.ovirt.org/c/vdsm/+/116090
It doesn't look related.
Build: https://jenkins.ovirt.org/job/vdsm_standard-check-patch/29212//artifact/chec...
_____________________ QemuGuestAgentTests.test_pci_devices _____________________
self = <virt.qemuguestagent_test.QemuGuestAgentTests testMethod=test_pci_devices>
def test_pci_devices(self): devices = self.qga_poller._qga_call_get_devices(self.vm)['pci_devices'] # Ethernet is returned twice by the agent but should appear only # once in the list assert len(devices) == 2 eth = [d for d in devices if d['device_id'] == 4096][0] assert eth == { 'device_id': 4096, 'driver_date': '2019-08-12', 'driver_name': 'Red Hat VirtIO Ethernet Adapter', 'driver_version': '100.80.104.17300', 'vendor_id': 6900, } balloon = [d for d in devices if d['device_id'] == 4098][0]
assert balloon == {
'device_id': 4098, 'driver_date': '2019-08-12', 'driver_name': 'VirtIO Balloon Driver', 'driver_version': '100.80.104.17300', 'vendor_id': 6900, } E AssertionError: assert {'device_id':...4.17300', ...} == {'device_id':...4.17300', ...} E Omitting 4 identical items, use -vv to show E Differing items: E {'driver_date': '2019-08-11'} != {'driver_date': '2019-08-12'}
Are we using real time in this test?
No, I think the problem is related to the following line
'driver-date': 1565568000000000000,
in qemuguestagent_test.py and the corresponding conversion
date = datetime.date.fromtimestamp(date // 1e9).isoformat()
in guestagenthelpers.py.
The test passes on my computer so I wonder whether it may be a bug in some version of python3-dateutil. I also wonder whether the fact that it occurred on 12 August is just a coincidence.
Looks correct also on current CentOS Stream 8:
datetime.datetime.fromtimestamp(1565568000000000000 // 1e9).isoformat() '2019-08-12T00:00:00'
And there was no leap second this year that could cause it.
I guess it could be fixed by shifting the driver timestamp a bit forward but then the test wouldn't check for these kinds of errors anymore. So I'd suggest disabling the test if it causes trouble until the failing part is identified.
Or could it be an issue, under some circumstances, with using float in the computation (which is wrong anyway)? Let's try: https://gerrit.ovirt.org/c/vdsm/+/116182
Creating datetime from timestamp means we use real time instead of monotonic time which is suspected. I hope we don't use real time in the actual code and this is only a test issue. We have other issues now: - consistent timeout in merge tests [2021-08-12T15:54:08.218Z] storage/merge_test.py .........x..x.. ============================================================= [2021-08-12T15:54:08.218Z] = Watched process timed out = [2021-08-12T15:54:08.218Z] ============================================================= [2021-08-12T15:54:08.218Z] [New LWP 31618] [2021-08-12T15:54:08.218Z] [Thread debugging using libthread_db enabled] [2021-08-12T15:54:08.218Z] Using host libthread_db library "/lib64/libthread_db.so.1". [2021-08-12T15:54:08.218Z] 0x00007f0bcb0a4a41 in poll () from /lib64/libc.so.6 [2021-08-12T15:54:08.218Z] [2021-08-12T15:54:08.218Z] Thread 2 (Thread 0x7f0bb1e35700 (LWP 31618)): [2021-08-12T15:54:08.218Z] Undefined command: "py-bt". Try "help". [2021-08-12T15:54:08.218Z] [Inferior 1 (process 22705) detached] I'm trying to debug it here: https://gerrit.ovirt.org/c/vdsm/+/116180 - network tests failing: [2021-08-12T15:56:17.928Z] network/integration/link_bond_test.py ...F...FFFFFF.FF..... [ 11%] Maybe changes in centos stream are breaking us again. Nir

Nir Soffer <nsoffer@redhat.com> writes:
On Thu, Aug 12, 2021 at 7:00 PM Milan Zamazal <mzamazal@redhat.com> wrote:
Milan Zamazal <mzamazal@redhat.com> writes:
Milan Zamazal <mzamazal@redhat.com> writes:
Nir Soffer <nsoffer@redhat.com> writes:
VDSM CI is failing now consistently. Last related change is: https://gerrit.ovirt.org/c/vdsm/+/116090
It doesn't look related.
Build: https://jenkins.ovirt.org/job/vdsm_standard-check-patch/29212//artifact/chec...
_____________________ QemuGuestAgentTests.test_pci_devices _____________________
self = <virt.qemuguestagent_test.QemuGuestAgentTests testMethod=test_pci_devices>
def test_pci_devices(self): devices = self.qga_poller._qga_call_get_devices(self.vm)['pci_devices'] # Ethernet is returned twice by the agent but should appear only # once in the list assert len(devices) == 2 eth = [d for d in devices if d['device_id'] == 4096][0] assert eth == { 'device_id': 4096, 'driver_date': '2019-08-12', 'driver_name': 'Red Hat VirtIO Ethernet Adapter', 'driver_version': '100.80.104.17300', 'vendor_id': 6900, } balloon = [d for d in devices if d['device_id'] == 4098][0]
assert balloon == {
'device_id': 4098, 'driver_date': '2019-08-12', 'driver_name': 'VirtIO Balloon Driver', 'driver_version': '100.80.104.17300', 'vendor_id': 6900, } E AssertionError: assert {'device_id':...4.17300', ...} == {'device_id':...4.17300', ...} E Omitting 4 identical items, use -vv to show E Differing items: E {'driver_date': '2019-08-11'} != {'driver_date': '2019-08-12'}
Are we using real time in this test?
No, I think the problem is related to the following line
'driver-date': 1565568000000000000,
in qemuguestagent_test.py and the corresponding conversion
date = datetime.date.fromtimestamp(date // 1e9).isoformat()
in guestagenthelpers.py.
The test passes on my computer so I wonder whether it may be a bug in some version of python3-dateutil. I also wonder whether the fact that it occurred on 12 August is just a coincidence.
Looks correct also on current CentOS Stream 8:
datetime.datetime.fromtimestamp(1565568000000000000 // 1e9).isoformat() '2019-08-12T00:00:00'
And there was no leap second this year that could cause it.
I guess it could be fixed by shifting the driver timestamp a bit forward but then the test wouldn't check for these kinds of errors anymore. So I'd suggest disabling the test if it causes trouble until the failing part is identified.
Or could it be an issue, under some circumstances, with using float in the computation (which is wrong anyway)? Let's try: https://gerrit.ovirt.org/c/vdsm/+/116182
Creating datetime from timestamp means we use real time instead of monotonic time which is suspected.
I hope we don't use real time in the actual code and this is only a test issue.
It is a conversion of nanosecond-precision driver timestamps obtained from QEMU to dates. It has nothing to do with Vdsm time tracking.
We have other issues now:
- consistent timeout in merge tests
[2021-08-12T15:54:08.218Z] storage/merge_test.py .........x..x.. ============================================================= [2021-08-12T15:54:08.218Z] = Watched process timed out = [2021-08-12T15:54:08.218Z] ============================================================= [2021-08-12T15:54:08.218Z] [New LWP 31618] [2021-08-12T15:54:08.218Z] [Thread debugging using libthread_db enabled] [2021-08-12T15:54:08.218Z] Using host libthread_db library "/lib64/libthread_db.so.1". [2021-08-12T15:54:08.218Z] 0x00007f0bcb0a4a41 in poll () from /lib64/libc.so.6 [2021-08-12T15:54:08.218Z] [2021-08-12T15:54:08.218Z] Thread 2 (Thread 0x7f0bb1e35700 (LWP 31618)): [2021-08-12T15:54:08.218Z] Undefined command: "py-bt". Try "help". [2021-08-12T15:54:08.218Z] [Inferior 1 (process 22705) detached]
I'm trying to debug it here: https://gerrit.ovirt.org/c/vdsm/+/116180
- network tests failing:
[2021-08-12T15:56:17.928Z] network/integration/link_bond_test.py ...F...FFFFFF.FF..... [ 11%]
Maybe changes in centos stream are breaking us again.
Hm, let's see.
participants (2)
-
Milan Zamazal
-
Nir Soffer