Failures in OST (4.0/master) (was: error msg from Jenkins)

Renaming title and adding devel.
On Sun, Nov 20, 2016 at 2:36 PM, Piotr Kliczewski <pkliczew@redhat.com> wrote:
The last failure seems to be storage related.
@Nir please take a look.
Here is the engine-side error:
2016-11-20 05:54:59,605 DEBUG [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStoragePoolVDSCommand] (default task-5) [59fc0074] Exception: org.ovirt.engine.core.vdsbroker.irsbroker.IRSNoMasterDomainException: IRSGenericException: IRSErrorException: IRSNoMasterDomainException: Cannot find master domain: u'spUUID=1ca141f1-b64d-4a52-8861-05c7de2a72b2, msdUUID=7d4bf750-4fb8-463f-bbb0-92156c47306e'
and here is the vdsm error:
jsonrpc.Executor/5::ERROR::2016-11-20 05:54:56,331::multipath::95::Storage.Multipath::(resize_devices) Could not resize device 360014052749733c7b8248628637b990f
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/multipath.py", line 93, in resize_devices
    _resize_if_needed(guid)
  File "/usr/share/vdsm/storage/multipath.py", line 101, in _resize_if_needed
    for slave in devicemapper.getSlaves(name)]
  File "/usr/share/vdsm/storage/multipath.py", line 158, in getDeviceSize
    bs, phyBs = getDeviceBlockSizes(devName)
  File "/usr/share/vdsm/storage/multipath.py", line 150, in getDeviceBlockSizes
    "queue", "logical_block_size")).read())
IOError: [Errno 2] No such file or directory: '/sys/block/sdb/queue/logical_block_size'
We now see a different error in master [1], which also indicates the hosts are in a problematic state (failing 'assign_hosts_network_label' test):
status: 409 reason: Conflict detail: Cannot add Label. Operation can be performed only when Host status is Maintenance, Up, NonOperational.
-------------------- >> begin captured logging << --------------------
[1] http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_master/3506/testRe...
On Sun, Nov 20, 2016 at 12:50 PM, Eyal Edri <eedri@redhat.com> wrote:
On Sun, Nov 20, 2016 at 1:42 PM, Yaniv Kaul <ykaul@redhat.com> wrote:
On Sun, Nov 20, 2016 at 1:30 PM, Yaniv Kaul <ykaul@redhat.com> wrote:
On Sun, Nov 20, 2016 at 1:18 PM, Eyal Edri <eedri@redhat.com> wrote:
The test fails to run a VM because no hosts are in UP state(?) [1]; not sure it is related to the triggering patch [2].
status: 400 reason: Bad Request detail: There are no hosts to use. Check that the cluster contains at least one host in Up state.
Thoughts? Shouldn't we fail the test earlier when hosts are not UP?
Yes. It's more likely that we are picking the wrong host or so, but who knows - where are the engine and VDSM logs?
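Failing early, as suggested above, could look roughly like the following minimal sketch. `assert_hosts_up` is a hypothetical helper, not part of OST or the oVirt SDK; it assumes the test has already collected a dict of host name to status string (with `"up"` as the healthy value) and turns the generic "There are no hosts to use" 400 into an error that names the offending hosts.

```python
def assert_hosts_up(host_statuses):
    """Fail fast, with a descriptive message, if any host is not 'up'.

    host_statuses is an assumed dict of host name -> status string,
    e.g. gathered from the engine API before a test that starts VMs.
    """
    if not host_statuses:
        raise AssertionError("no hosts found in the cluster")
    # Collect every host that is not up, sorted for a stable message.
    not_up = sorted(name for name, status in host_statuses.items()
                    if status != "up")
    if not_up:
        details = ", ".join("%s=%s" % (n, host_statuses[n]) for n in not_up)
        raise AssertionError("hosts not in Up state: " + details)
```

Calling it at the start of a VM test would surface the host state directly in the test report instead of as a side effect of the run-VM request.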
A simple grep on the engine.log [1] finds several unrelated issues that I'm not sure are reported; it's despairing to even begin... That being said, I don't see the issue there. We may need better logging at the API level, to see what is being sent. Is it consistent?
It just failed now for the first time; I didn't see it before.
Y.
[1] http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_4.0/3015/artifact/exported-artifacts/basic_suite_4.0.sh-el7/exported-artifacts/test_logs/basic-suite-4.0/post-004_basic_sanity.py/lago-basic-suite-4-0-engine/_var_log_ovirt-engine/engine.log
Y.
[1] http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_4.0/3015/testReport/junit/(root)/004_basic_sanity/vm_run/
[2] http://jenkins.ovirt.org/job/ovirt-engine_4.0_build-artifacts-el7-x86_64/1535/changes#detail
On Sun, Nov 20, 2016 at 1:00 PM, <jenkins@jenkins.phx.ovirt.org> wrote:
Build: http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_4.0/3015/,
Build Number: 3015,
Build Status: FAILURE
_______________________________________________
Infra mailing list
Infra@ovirt.org
http://lists.ovirt.org/mailman/listinfo/infra
-- Eyal Edri Associate Manager RHV DevOps EMEA ENG Virtualization R&D Red Hat Israel
phone: +972-9-7692018 irc: eedri (on #tlv #rhev-dev #rhev-integ)

On Sun, Nov 20, 2016 at 6:30 PM, Eyal Edri <eedri@redhat.com> wrote:
Renaming title and adding devel.
On Sun, Nov 20, 2016 at 2:36 PM, Piotr Kliczewski <pkliczew@redhat.com> wrote:
The last failure seems to be storage related.
@Nir please take a look.
Here is the engine-side error:
2016-11-20 05:54:59,605 DEBUG [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStoragePoolVDSCommand] (default task-5) [59fc0074] Exception: org.ovirt.engine.core.vdsbroker.irsbroker.IRSNoMasterDomainException: IRSGenericException: IRSErrorException: IRSNoMasterDomainException: Cannot find master domain: u'spUUID=1ca141f1-b64d-4a52-8861-05c7de2a72b2, msdUUID=7d4bf750-4fb8-463f-bbb0-92156c47306e'
and here is the vdsm error:
jsonrpc.Executor/5::ERROR::2016-11-20 05:54:56,331::multipath::95::Storage.Multipath::(resize_devices) Could not resize device 360014052749733c7b8248628637b990f
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/multipath.py", line 93, in resize_devices
    _resize_if_needed(guid)
  File "/usr/share/vdsm/storage/multipath.py", line 101, in _resize_if_needed
    for slave in devicemapper.getSlaves(name)]
  File "/usr/share/vdsm/storage/multipath.py", line 158, in getDeviceSize
    bs, phyBs = getDeviceBlockSizes(devName)
  File "/usr/share/vdsm/storage/multipath.py", line 150, in getDeviceBlockSizes
    "queue", "logical_block_size")).read())
IOError: [Errno 2] No such file or directory: '/sys/block/sdb/queue/logical_block_size'
Please open a bug for this. This is an expected situation (when the device is in the middle of a scan), and we should be able to cope with it. Adding Fred, who worked on this area.

Nir
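One way to cope with the missing sysfs entry is sketched below in Python, the language vdsm's traceback shows. `read_block_sizes` is a hypothetical helper, not vdsm's actual code or its eventual fix; it assumes that a missing `/sys/block/<dev>/queue/logical_block_size` or `physical_block_size` file (ENOENT, as in the `IOError` above) means the device is temporarily unavailable, and returns `None` instead of raising. Other I/O errors still propagate.

```python
import errno
import os


def read_block_sizes(dev_name, sysfs_root="/sys/block"):
    """Return (logical, physical) block sizes of dev_name, or None.

    Hypothetical helper: returns None instead of raising when the sysfs
    entries are missing, e.g. because the device is being rescanned or
    was just removed. sysfs_root is a parameter only to ease testing.
    """
    queue_dir = os.path.join(sysfs_root, dev_name, "queue")
    sizes = []
    for attr in ("logical_block_size", "physical_block_size"):
        path = os.path.join(queue_dir, attr)
        try:
            with open(path) as f:
                sizes.append(int(f.read()))
        except IOError as e:  # IOError is an alias of OSError on Python 3
            if e.errno == errno.ENOENT:
                return None  # device vanished or is mid-scan
            raise
    return tuple(sizes)
```

A caller such as the resize loop could then skip a slave for which this returns `None` and pick it up on a later scan, rather than failing the whole resize.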

On Nov 20, 2016 6:30 PM, "Eyal Edri" <eedri@redhat.com> wrote:
Renaming title and adding devel.
On Sun, Nov 20, 2016 at 2:36 PM, Piotr Kliczewski <pkliczew@redhat.com> wrote:
The last failure seems to be storage related.
@Nir please take a look.
Here is the engine-side error:
2016-11-20 05:54:59,605 DEBUG [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStoragePoolVDSCommand] (default task-5) [59fc0074] Exception: org.ovirt.engine.core.vdsbroker.irsbroker.IRSNoMasterDomainException: IRSGenericException: IRSErrorException: IRSNoMasterDomainException: Cannot find master domain: u'spUUID=1ca141f1-b64d-4a52-8861-05c7de2a72b2, msdUUID=7d4bf750-4fb8-463f-bbb0-92156c47306e'
and here is the vdsm error:
jsonrpc.Executor/5::ERROR::2016-11-20 05:54:56,331::multipath::95::Storage.Multipath::(resize_devices) Could not resize device 360014052749733c7b8248628637b990f
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/multipath.py", line 93, in resize_devices
    _resize_if_needed(guid)
  File "/usr/share/vdsm/storage/multipath.py", line 101, in _resize_if_needed
    for slave in devicemapper.getSlaves(name)]
  File "/usr/share/vdsm/storage/multipath.py", line 158, in getDeviceSize
    bs, phyBs = getDeviceBlockSizes(devName)
  File "/usr/share/vdsm/storage/multipath.py", line 150, in getDeviceBlockSizes
    "queue", "logical_block_size")).read())
IOError: [Errno 2] No such file or directory: '/sys/block/sdb/queue/logical_block_size'
We now see a different error in master [1], which also indicates the hosts are in a problematic state: ( failing 'assign_hosts_network_label' test )
status: 409 reason: Conflict detail: Cannot add Label. Operation can be performed only when Host status is Maintenance, Up, NonOperational.
I believe you are mixing unrelated issues. I've seen this once, and I have an unproven theory: the previous suite restarts Engine after the LDAP configuration and then performs its test, which is quite short (24 seconds on my poor laptop, plus a few additional seconds between suites). I'm not convinced that is enough time for the hosts' status to be updated in Engine back to UP.

Y.
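If this theory holds, one option is for the suite to wait, with a bounded timeout, until Engine reports every host as UP again after the restart, before running the next test. The sketch below is a hypothetical polling helper, not existing OST code: `get_host_statuses` is an assumed callable returning a dict of host name to status string, and the 300-second timeout and 3-second interval are illustrative defaults, not measured values.

```python
import time


def wait_for_hosts_up(get_host_statuses, timeout=300.0, interval=3.0):
    """Poll until every host reports 'up'; return True, or False on timeout.

    get_host_statuses is an assumed callable returning a dict of
    host name -> status string (e.g. read through the engine API).
    """
    deadline = time.monotonic() + timeout
    while True:
        statuses = get_host_statuses()
        # An empty result is treated as "not ready yet".
        if statuses and all(s == "up" for s in statuses.values()):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Returning a boolean lets the caller decide whether a timeout should fail the suite outright or just be logged together with the last observed statuses.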
-------------------- >> begin captured logging << --------------------
[1] http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_master/3506/testRe...
participants (3)
- Eyal Edri
- Nir Soffer
- Yaniv Kaul