From bkorren at redhat.com Sun Apr 1 10:33:19 2018 From: bkorren at redhat.com (Barak Korren) Date: Sun, 1 Apr 2018 13:33:19 +0300 Subject: [ovirt-devel] [ OST Failure Report ] [ oVirt 4.2 ] [ 2018-04-01 ] [007_sd_reattach.reattach_storage_domain] Message-ID: Test failed: [ 007_sd_reattach.reattach_storage_domain ] Link to suspected patches: https://gerrit.ovirt.org/c/89634/2 (ovirt-imageio): daemon: Fix OPTIONS to reflect actual capabilities Link to Job: http://jenkins.ovirt.org/job/ovirt-4.2_change-queue-tester/1502/ Link to all logs: http://jenkins.ovirt.org/job/ovirt-4.2_change-queue-tester/1502/artifact/exported-artifacts/basic-suit-4.2-el7/test_logs/basic-suite-4.2/post-007_sd_reattach.py/ Error snippet from log: Fault reason is "Operation Failed". Fault detail is "[Network error during communication with the Host.]". HTTP response code is 400. -- Barak Korren RHV DevOps team , RHCE, RHCi Red Hat EMEA redhat.com | TRIED. TESTED. TRUSTED. | redhat.com/trusted From didi at redhat.com Mon Apr 2 06:18:13 2018 From: didi at redhat.com (Yedidyah Bar David) Date: Mon, 2 Apr 2018 09:18:13 +0300 Subject: [ovirt-devel] [ OST Failure Report ] [ oVirt Master (ovirt-engine) ] [ 27-03-2018 ] [ 004_basic_sanity.disk_operations ] In-Reply-To: References: Message-ID: On Thu, Mar 29, 2018 at 1:51 PM, Benny Zlotnik wrote: > I forgot to add the polling after the second snapshot as well, it was added > in this patch: https://gerrit.ovirt.org/#/c/89577/ > > Sorry for the inconvenience NP :-) Now happened again: http://jenkins.ovirt.org/job/ovirt-system-tests_master_check-patch-el7-x86_64/4845/testReport/(root)/004_basic_sanity/snapshot_merge/ detail: Cannot remove Snapshot. Snapshot is currently being created for VM vm0. 
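The recurring failure above is a race: the test removes a snapshot while the engine is still creating it, and the fix the thread converges on is to poll until the snapshot settles before acting on it. The sketch below illustrates that polling pattern only — it is a hypothetical helper, not the code from the Gerrit patches referenced in this thread, and the `snapshot_ready` check is a stand-in for a real engine status query:

```python
# Generic wait-until helper illustrating the "add polling" fix discussed in
# this thread. Hypothetical sketch; not the actual OST patch.
import time

def wait_until(ready, timeout=600.0, interval=3.0, clock=time.monotonic,
               sleep=time.sleep):
    """Poll ready() until it returns True; raise TimeoutError on deadline."""
    deadline = clock() + timeout
    while True:
        if ready():
            return
        if clock() >= deadline:
            raise TimeoutError("condition not met within %.0fs" % timeout)
        sleep(interval)

# Simulate a snapshot that reports "locked" twice before becoming "ok",
# as the engine would while the snapshot is still being created.
states = iter(["locked", "locked", "ok"])
current = {"status": "locked"}

def snapshot_ready():
    current["status"] = next(states, "ok")
    return current["status"] == "ok"

wait_until(snapshot_ready, timeout=5.0, interval=0.0, sleep=lambda s: None)
print(current["status"])  # prints "ok"
```

Only after the poll succeeds would the test issue the remove call, avoiding the HTTP 409 "Snapshot is currently being created" fault seen above.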
> > On Thu, Mar 29, 2018 at 1:46 PM, Yedidyah Bar David wrote: >> >> On Wed, Mar 28, 2018 at 4:19 PM, Gal Ben Haim wrote: >> > >> > >> > On Wed, Mar 28, 2018 at 2:48 PM, Benny Zlotnik >> > wrote: >> >> >> >> https://gerrit.ovirt.org/#/c/89534/ >> >> https://gerrit.ovirt.org/#/c/89537/ >> > >> > >> > Merged. >> >> Seems like it failed again similarly: >> >> 08:05:07 vm1_snapshots_service.snapshot_service(snapshot.id).remove() >> 08:05:07 File >> "/usr/lib64/python2.7/site-packages/ovirtsdk4/services.py", line >> 20230, in remove >> 08:05:07 self._internal_remove(headers, query, wait) >> 08:05:07 File >> "/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py", line 271, >> in _internal_remove >> 08:05:07 return future.wait() if wait else future >> 08:05:07 File >> "/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py", line 55, in >> wait >> 08:05:07 return self._code(response) >> 08:05:07 File >> "/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py", line 268, >> in callback >> 08:05:07 self._check_fault(response) >> 08:05:07 File >> "/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py", line 132, >> in _check_fault >> 08:05:07 self._raise_error(response, body) >> 08:05:07 File >> "/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py", line 118, >> in _raise_error >> 08:05:07 raise error >> 08:05:07 Error: Fault reason is "Operation Failed". Fault detail is >> "[Cannot remove Snapshot. Snapshot is currently being created for VM >> vm1.]". HTTP response code is 409. >> >> >> http://jenkins.ovirt.org/job/ovirt-system-tests_master_check-patch-el7-x86_64/4770/consoleFull >> >> Can you please check? Thanks. >> >> >> >> >> >> >> >> >> On Wed, Mar 28, 2018 at 12:01 PM, Dafna Ron wrote: >> >>> >> >>> Thanks Beni. >> >>> >> >>> can you please paste the fix that you submit to the mail? 
>> >>> >> >>> Thanks, >> >>> Dafna >> >>> >> >>> >> >>> On Wed, Mar 28, 2018 at 9:13 AM, Benny Zlotnik >> >>> wrote: >> >>>> >> >>>> This issue looks like it's caused by the DB lock being released >> >>>> before >> >>>> the engine memory lock, there was a similar race with the LSM test a >> >>>> few >> >>>> months back. I'll apply the same fix for the snapshot_cold_merge. >> >>>> . >> >>>> >> >>>> On Tue, Mar 27, 2018 at 10:03 PM, Benny Zlotnik >> >>>> wrote: >> >>>>> >> >>>>> Looking into this >> >>>>> >> >>>>> On Tue, Mar 27, 2018 at 6:45 PM, Dafna Ron wrote: >> >>>>>> >> >>>>>> Hi, >> >>>>>> >> >>>>>> We have a failure on 004_basic_sanity.disk_operations. the reported >> >>>>>> change seems to me to be connected to the failure. >> >>>>>> >> >>>>>> Link and headline of suspected patches: >> >>>>>> >> >>>>>> core: introduce CreateSnapshotForVm - >> >>>>>> https://gerrit.ovirt.org/#/c/87671/ >> >>>>>> >> >>>>>> >> >>>>>> Link to Job: >> >>>>>> >> >>>>>> http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/6564/ >> >>>>>> >> >>>>>> Link to all logs: >> >>>>>> >> >>>>>> >> >>>>>> >> >>>>>> http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/6564/artifact/exported-artifacts/basic-suit-master-el7/test_logs/basic-suite-master/post-004_basic_sanity.py/ >> >>>>>> >> >>>>>> (Relevant) error snippet from the log: >> >>>>>> >> >>>>>> >> >>>>>> >> >>>>>> >> >>>>>> 2018-03-27 11:30:16,167-04 WARN >> >>>>>> [org.ovirt.engine.core.bll.snapshots.CreateSnapshotForVmCommand] >> >>>>>> (default >> >>>>>> task-11) [9f713a3d-82a9-4db8-8207-bb69a0a2c550] Validation of >> >>>>>> action >> >>>>>> 'CreateSnapshotForVm' failed for user admin at internal-authz. 
Re >> >>>>>> asons: >> >>>>>> >> >>>>>> VAR__ACTION__CREATE,VAR__TYPE__SNAPSHOT,ACTION_TYPE_FAILED_SNAPSHOT_IS_BEING_TAKEN_FOR_VM,$VmName >> >>>>>> vm1 >> >>>>>> 2018-03-27 11:30:16,176-04 DEBUG >> >>>>>> >> >>>>>> [org.ovirt.engine.core.common.di.interceptor.DebugLoggingInterceptor] >> >>>>>> (default task-11) [9f713a3d-82a9-4db8-8207-bb69a0a2c550] method: >> >>>>>> runAction, >> >>>>>> params: [CreateSnapshotForVm, CreateSnapshotForVmParameters >> >>>>>> :{commandId='d8bfdb0e-5ae4-4fed-8852-a0b29e377708', user='null', >> >>>>>> commandType='Unknown', >> >>>>>> vmId='65e7fea5-8b7d-4281-bd0d-6a84de207a00'}], >> >>>>>> timeElapsed: 57ms >> >>>>>> 2018-03-27 11:30:16,186-04 ERROR >> >>>>>> [org.ovirt.engine.api.restapi.resource.AbstractBackendResource] >> >>>>>> (default >> >>>>>> task-11) [] Operation Failed: [Cannot create Snapshot. Snapshot is >> >>>>>> currently >> >>>>>> being created for VM vm1.] >> >>>>>> 2018-03-27 11:30:16,820-04 INFO >> >>>>>> [org.ovirt.engine.core.bll.SerialChildCommandsExecutionCallback] >> >>>>>> (EE-ManagedThreadFactory-engineScheduled-Thread-100) >> >>>>>> [16c08ce6-9a73-489f-a1f2-56e21699f14a] Command >> >>>>>> 'CopyImageGroupVolumesData' >> >>>>>> (id: '4c4023 >> >>>>>> b4-21e8-4a98-8ac3-aba51967d160') waiting on child command id: >> >>>>>> '3d96b933-6e3b-4838-94bc-db3ff0cadcdc' type:'CopyData' to complete >> >>>>>> 2018-03-27 11:30:16,833-04 DEBUG >> >>>>>> >> >>>>>> [org.ovirt.engine.core.common.di.interceptor.DebugLoggingInterceptor] >> >>>>>> (EE-ManagedThreadFactory-engineScheduled-Thread-100) >> >>>>>> [16c08ce6-9a73-489f-a1f2-56e21699f14a] method: get, params: >> >>>>>> [a85992e9-40b8-4c80-9c >> >>>>>> 93-3bf767f6efa4], timeElapsed: 8ms >> >>>>>> >> >>>>>> >> >>>>>> >> >>>>>> >> >>>>>> >> >>>>> >> >>>> >> >>> >> >> >> >> >> >> _______________________________________________ >> >> Infra mailing list >> >> Infra at ovirt.org >> >> http://lists.ovirt.org/mailman/listinfo/infra >> >> >> > >> > >> > >> > -- >> > GAL bEN HAIM >> > 
RHV DEVOPS >> > >> > _______________________________________________ >> > Infra mailing list >> > Infra at ovirt.org >> > http://lists.ovirt.org/mailman/listinfo/infra >> > >> >> >> >> -- >> Didi > > -- Didi From didi at redhat.com Mon Apr 2 06:24:38 2018 From: didi at redhat.com (Yedidyah Bar David) Date: Mon, 2 Apr 2018 09:24:38 +0300 Subject: [ovirt-devel] [ OST Failure Report ] [ oVirt Master (ovirt-engine) ] [ 27-03-2018 ] [ 004_basic_sanity.disk_operations ] In-Reply-To: References: Message-ID: On Mon, Apr 2, 2018 at 9:18 AM, Yedidyah Bar David wrote: > On Thu, Mar 29, 2018 at 1:51 PM, Benny Zlotnik wrote: >> I forgot to add the polling after the second snapshot as well, it was added >> in this patch: https://gerrit.ovirt.org/#/c/89577/ >> >> Sorry for the inconvenience > > NP :-) > > Now happened again: > > http://jenkins.ovirt.org/job/ovirt-system-tests_master_check-patch-el7-x86_64/4845/testReport/(root)/004_basic_sanity/snapshot_merge/ > > detail: Cannot remove Snapshot. Snapshot is currently being created for VM vm0. Well, probably because your patch(es) were only for basic-suite, and above failure is in HE. Hopefully will be handled by Gal's '[WIP] he&basic master: Share tests': https://gerrit.ovirt.org/89615 > >> >> On Thu, Mar 29, 2018 at 1:46 PM, Yedidyah Bar David wrote: >>> >>> On Wed, Mar 28, 2018 at 4:19 PM, Gal Ben Haim wrote: >>> > >>> > >>> > On Wed, Mar 28, 2018 at 2:48 PM, Benny Zlotnik >>> > wrote: >>> >> >>> >> https://gerrit.ovirt.org/#/c/89534/ >>> >> https://gerrit.ovirt.org/#/c/89537/ >>> > >>> > >>> > Merged. 
>>> >>> Seems like it failed again similarly: >>> >>> 08:05:07 vm1_snapshots_service.snapshot_service(snapshot.id).remove() >>> 08:05:07 File >>> "/usr/lib64/python2.7/site-packages/ovirtsdk4/services.py", line >>> 20230, in remove >>> 08:05:07 self._internal_remove(headers, query, wait) >>> 08:05:07 File >>> "/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py", line 271, >>> in _internal_remove >>> 08:05:07 return future.wait() if wait else future >>> 08:05:07 File >>> "/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py", line 55, in >>> wait >>> 08:05:07 return self._code(response) >>> 08:05:07 File >>> "/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py", line 268, >>> in callback >>> 08:05:07 self._check_fault(response) >>> 08:05:07 File >>> "/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py", line 132, >>> in _check_fault >>> 08:05:07 self._raise_error(response, body) >>> 08:05:07 File >>> "/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py", line 118, >>> in _raise_error >>> 08:05:07 raise error >>> 08:05:07 Error: Fault reason is "Operation Failed". Fault detail is >>> "[Cannot remove Snapshot. Snapshot is currently being created for VM >>> vm1.]". HTTP response code is 409. >>> >>> >>> http://jenkins.ovirt.org/job/ovirt-system-tests_master_check-patch-el7-x86_64/4770/consoleFull >>> >>> Can you please check? Thanks. >>> >>> >> >>> >> >>> >> >>> >> On Wed, Mar 28, 2018 at 12:01 PM, Dafna Ron wrote: >>> >>> >>> >>> Thanks Beni. >>> >>> >>> >>> can you please paste the fix that you submit to the mail? >>> >>> >>> >>> Thanks, >>> >>> Dafna >>> >>> >>> >>> >>> >>> On Wed, Mar 28, 2018 at 9:13 AM, Benny Zlotnik >>> >>> wrote: >>> >>>> >>> >>>> This issue looks like it's caused by the DB lock being released >>> >>>> before >>> >>>> the engine memory lock, there was a similar race with the LSM test a >>> >>>> few >>> >>>> months back. I'll apply the same fix for the snapshot_cold_merge. >>> >>>> . 
>>> >>>> >>> >>>> On Tue, Mar 27, 2018 at 10:03 PM, Benny Zlotnik >>> >>>> wrote: >>> >>>>> >>> >>>>> Looking into this >>> >>>>> >>> >>>>> On Tue, Mar 27, 2018 at 6:45 PM, Dafna Ron wrote: >>> >>>>>> >>> >>>>>> Hi, >>> >>>>>> >>> >>>>>> We have a failure on 004_basic_sanity.disk_operations. the reported >>> >>>>>> change seems to me to be connected to the failure. >>> >>>>>> >>> >>>>>> Link and headline of suspected patches: >>> >>>>>> >>> >>>>>> core: introduce CreateSnapshotForVm - >>> >>>>>> https://gerrit.ovirt.org/#/c/87671/ >>> >>>>>> >>> >>>>>> >>> >>>>>> Link to Job: >>> >>>>>> >>> >>>>>> http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/6564/ >>> >>>>>> >>> >>>>>> Link to all logs: >>> >>>>>> >>> >>>>>> >>> >>>>>> >>> >>>>>> http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/6564/artifact/exported-artifacts/basic-suit-master-el7/test_logs/basic-suite-master/post-004_basic_sanity.py/ >>> >>>>>> >>> >>>>>> (Relevant) error snippet from the log: >>> >>>>>> >>> >>>>>> >>> >>>>>> >>> >>>>>> >>> >>>>>> 2018-03-27 11:30:16,167-04 WARN >>> >>>>>> [org.ovirt.engine.core.bll.snapshots.CreateSnapshotForVmCommand] >>> >>>>>> (default >>> >>>>>> task-11) [9f713a3d-82a9-4db8-8207-bb69a0a2c550] Validation of >>> >>>>>> action >>> >>>>>> 'CreateSnapshotForVm' failed for user admin at internal-authz. 
Re >>> >>>>>> asons: >>> >>>>>> >>> >>>>>> VAR__ACTION__CREATE,VAR__TYPE__SNAPSHOT,ACTION_TYPE_FAILED_SNAPSHOT_IS_BEING_TAKEN_FOR_VM,$VmName >>> >>>>>> vm1 >>> >>>>>> 2018-03-27 11:30:16,176-04 DEBUG >>> >>>>>> >>> >>>>>> [org.ovirt.engine.core.common.di.interceptor.DebugLoggingInterceptor] >>> >>>>>> (default task-11) [9f713a3d-82a9-4db8-8207-bb69a0a2c550] method: >>> >>>>>> runAction, >>> >>>>>> params: [CreateSnapshotForVm, CreateSnapshotForVmParameters >>> >>>>>> :{commandId='d8bfdb0e-5ae4-4fed-8852-a0b29e377708', user='null', >>> >>>>>> commandType='Unknown', >>> >>>>>> vmId='65e7fea5-8b7d-4281-bd0d-6a84de207a00'}], >>> >>>>>> timeElapsed: 57ms >>> >>>>>> 2018-03-27 11:30:16,186-04 ERROR >>> >>>>>> [org.ovirt.engine.api.restapi.resource.AbstractBackendResource] >>> >>>>>> (default >>> >>>>>> task-11) [] Operation Failed: [Cannot create Snapshot. Snapshot is >>> >>>>>> currently >>> >>>>>> being created for VM vm1.] >>> >>>>>> 2018-03-27 11:30:16,820-04 INFO >>> >>>>>> [org.ovirt.engine.core.bll.SerialChildCommandsExecutionCallback] >>> >>>>>> (EE-ManagedThreadFactory-engineScheduled-Thread-100) >>> >>>>>> [16c08ce6-9a73-489f-a1f2-56e21699f14a] Command >>> >>>>>> 'CopyImageGroupVolumesData' >>> >>>>>> (id: '4c4023 >>> >>>>>> b4-21e8-4a98-8ac3-aba51967d160') waiting on child command id: >>> >>>>>> '3d96b933-6e3b-4838-94bc-db3ff0cadcdc' type:'CopyData' to complete >>> >>>>>> 2018-03-27 11:30:16,833-04 DEBUG >>> >>>>>> >>> >>>>>> [org.ovirt.engine.core.common.di.interceptor.DebugLoggingInterceptor] >>> >>>>>> (EE-ManagedThreadFactory-engineScheduled-Thread-100) >>> >>>>>> [16c08ce6-9a73-489f-a1f2-56e21699f14a] method: get, params: >>> >>>>>> [a85992e9-40b8-4c80-9c >>> >>>>>> 93-3bf767f6efa4], timeElapsed: 8ms >>> >>>>>> >>> >>>>>> >>> >>>>>> >>> >>>>>> >>> >>>>>> >>> >>>>> >>> >>>> >>> >>> >>> >> >>> >> >>> >> _______________________________________________ >>> >> Infra mailing list >>> >> Infra at ovirt.org >>> >> http://lists.ovirt.org/mailman/listinfo/infra >>> 
>> >>> > >>> > >>> > >>> > -- >>> > GAL bEN HAIM >>> > RHV DEVOPS >>> > >>> > _______________________________________________ >>> > Infra mailing list >>> > Infra at ovirt.org >>> > http://lists.ovirt.org/mailman/listinfo/infra >>> > >>> >>> >>> >>> -- >>> Didi >> >> > > > > -- > Didi -- Didi From ykaul at redhat.com Mon Apr 2 10:07:27 2018 From: ykaul at redhat.com (Yaniv Kaul) Date: Mon, 2 Apr 2018 13:07:27 +0300 Subject: [ovirt-devel] oVirt log analyzer In-Reply-To: References: <87bmf7rrmr.fsf@redhat.com> Message-ID: On Thu, Mar 29, 2018 at 9:11 PM, Tomas Jelinek wrote: > > > On Thu, Mar 29, 2018 at 7:55 PM, Greg Sheremeta > wrote: > >> Nice! I think a nice RFE would be to surface this info in the UI. >> >> On Thu, Mar 29, 2018 at 8:30 AM, Milan Zamazal >> wrote: >> >>> Hi, during last year Outreachy internship a tool for analyzing oVirt >>> logs was created. When it is provided with oVirt logs (such as SOS >>> reports, logs gathered by Lago, single or multiple log files) it tries >>> to identify and classify important lines from the logs and present them >>> in a structured form. Its primary purpose is to get a quick and easy >>> overview of actions and errors. >>> >> > I would add that it can correlate more log files (from > engine/vdsm/libvirt/quemu) and show a unified view of them. > It can follow the life of one entity (such as a VM) and show what was > going on with it across the system. I have used it a lot to look for races > and it was pretty useful for that. > This is not very clear from the readme, which only says 'Assuming your *oVirt logs* are stored in DIRECTORY ' - what logs exactly are in that directory? Is that the result of logs from ovirt-log-collector ? Y. > >> >>> The tool analyses given logs and produces text files with the extracted >>> information. There is an Emacs user interface that presents the output >>> in a nice way with added functionality such as filtering. 
Emacs haters >>> can use the plain text files or write another user interface. :-) >>> >>> You can get ovirt-log-analyzer from >>> https://github.com/mz-pdm/ovirt-log-analyzer >>> README.md explains how to use it. >>> >>> Note that ovirt-log-analyzer has been created within the limited >>> resources of an Outreachy internship with some additional work and not >>> everything is perfect. Feel free to make improvements. >>> >>> Regards, >>> Milan >>> _______________________________________________ >>> Devel mailing list >>> Devel at ovirt.org >>> http://lists.ovirt.org/mailman/listinfo/devel >>> >> >> >> >> -- >> >> GREG SHEREMETA >> >> SENIOR SOFTWARE ENGINEER - TEAM LEAD - RHV UX >> >> Red Hat NA >> >> >> >> gshereme at redhat.com IRC: gshereme >> >> >> _______________________________________________ >> Devel mailing list >> Devel at ovirt.org >> http://lists.ovirt.org/mailman/listinfo/devel >> > > > _______________________________________________ > Devel mailing list > Devel at ovirt.org > http://lists.ovirt.org/mailman/listinfo/devel > -------------- next part -------------- An HTML attachment was scrubbed... URL: From yuvalt at redhat.com Mon Apr 2 10:41:46 2018 From: yuvalt at redhat.com (Yuval Turgeman) Date: Mon, 2 Apr 2018 13:41:46 +0300 Subject: [ovirt-devel] Failed To Build ovirt-node following the guide In-Reply-To: <201803291649568693113@coretek.com.cn> References: <201803291649568693113@coretek.com.cn> Message-ID: Hi, You are trying to build legacy node, while ovirt 4x supports next generation node (ovirt-node-ng). Start by cloning ovirt-node-ng (https://gerrit.ovirt.org/ovirt-node-ng), and follow the README. Thanks, Yuval. On Thu, Mar 29, 2018 at 11:49 AM, sundw wrote: > Hello,guys? > > I built ovirt-node following this guide(https://www.ovirt. > org/develop/projects/node/building/). > But I failed. 
> I got the following Error message after running "make iso publish":
>
> Error creating Live CD : Failed to find package 'ovirt-node-plugin-vdsm' :
> No package(s) available to install
>
> Could you please give some advice?
>
> *BTW: Is the content of this url
> "https://www.ovirt.org/develop/projects/node/building/" OUT OF DATE?*
>
> ------------------------------
> 13378105625

From gshereme at redhat.com Mon Apr 2 12:07:30 2018
From: gshereme at redhat.com (Greg Sheremeta)
Date: Mon, 2 Apr 2018 08:07:30 -0400
Subject: [ovirt-devel] Build virt-node on fedora26 failed
In-Reply-To: <201803301158105426622@coretek.com.cn>
References: <201803301158105426622@coretek.com.cn>
Message-ID:

That is a very outdated page (it talks about Fedora 18 -- that should be a
signal that it is outdated :)) Try this:
https://github.com/oVirt/ovirt-node-ng/blob/master/docs/book/build.md

On Thu, Mar 29, 2018 at 11:58 PM, sundw wrote:
> Hi!
> I built ovirt-node based on this guide
> (https://www.ovirt.org/develop/projects/node/building/).
> I got the following error messages after I ran "make publish":
>
> Installing tmp.repos/SRPMS/ovirt-node-3.7.0-0.0.master.fc27.src.rpm
> error: Failed build dependencies:
> /usr/share/selinux/devel/policyhelp is needed by ovirt-node-3.7.0-0.0.master.fc27.noarch
> make[1]: *** [Makefile:813: rpm] Error 1
> make[1]: Leaving directory '/home/coretek/git/ovirt-node'
> make: *** [Makefile:827: publish] Error 2
>
> Can anyone give me some suggestions?
> Thanks!
>
> ------------------------------
> 13378105625
>
> --
GREG SHEREMETA
SENIOR SOFTWARE ENGINEER - TEAM LEAD - RHV UX
Red Hat NA
gshereme at redhat.com IRC: gshereme

From gshereme at redhat.com Mon Apr 2 12:08:43 2018
From: gshereme at redhat.com (Greg Sheremeta)
Date: Mon, 2 Apr 2018 08:08:43 -0400
Subject: [ovirt-devel] Failed To Build ovirt-node following the guide
In-Reply-To:
References: <201803291649568693113@coretek.com.cn>
Message-ID:

https://github.com/oVirt/ovirt-node-ng/blob/master/docs/book/build.md

On Mon, Apr 2, 2018 at 6:41 AM, Yuval Turgeman wrote:
> Hi,
>
> You are trying to build the legacy node, while oVirt 4.x supports the
> next generation node (ovirt-node-ng).
>
> Start by cloning ovirt-node-ng (https://gerrit.ovirt.org/ovirt-node-ng),
> and follow the README.
>
> Thanks,
> Yuval.
>
> On Thu, Mar 29, 2018 at 11:49 AM, sundw wrote:
>
>> Hello, guys!
>>
>> I built ovirt-node following this guide
>> (https://www.ovirt.org/develop/projects/node/building/).
>> But I failed.
>> I got the following Error message after running "make iso publish":
>>
>> Error creating Live CD : Failed to find package 'ovirt-node-plugin-vdsm' :
>> No package(s) available to install
>>
>> Could you please give some advice?
>>
>> *BTW: Is the content of this url
>> "https://www.ovirt.org/develop/projects/node/building/" OUT OF DATE?*
>>
>> ------------------------------
>> 13378105625
>>
> --
GREG SHEREMETA
SENIOR SOFTWARE ENGINEER - TEAM LEAD - RHV UX
Red Hat NA
gshereme at redhat.com IRC: gshereme

From sabose at redhat.com Mon Apr 2 14:44:07 2018
From: sabose at redhat.com (Sahina Bose)
Date: Mon, 2 Apr 2018 20:14:07 +0530
Subject: [ovirt-devel] [OST][HC] HE fails to deploy
Message-ID:

HE fails to deploy at waiting for host to be up in the local HE VM.
The setup logs do not indicate why it failed - at least I couldn't find
anything.

---------- Forwarded message ----------
From:
Date: Mon, Apr 2, 2018 at 7:50 PM
Subject: [oVirt Jenkins] ovirt-system-tests_hc-basic-suite-master - Build # 276 - Still Failing!
To: infra at ovirt.org, sabose at redhat.com Project: http://jenkins.ovirt.org/job/ovirt-system-tests_hc-basic- suite-master/ Build: http://jenkins.ovirt.org/job/ovirt-system-tests_hc-basic- suite-master/276/ Build Number: 276 Build Status: Still Failing Triggered By: Started by timer ------------------------------------- Changes Since Last Success: ------------------------------------- Changes for Build #265 [Gal Ben Haim] Check if the prefix exists before printing its size [Sandro Bonazzola] ovirt-engine: add jobs for 4.1.10 async Changes for Build #266 [Gal Ben Haim] Check if the prefix exists before printing its size Changes for Build #267 [Gal Ben Haim] Check if the prefix exists before printing its size [Daniel Belenky] ppc repos: Use qemu EV release instead of test [Daniel Belenky] global_setup: Add generic package remove function [Daniel Belenky] Fix package verification in verify_packages Changes for Build #268 [Gal Ben Haim] Check if the prefix exists before printing its size Changes for Build #269 [Gal Ben Haim] Check if the prefix exists before printing its size Changes for Build #270 [Gal Ben Haim] Check if the prefix exists before printing its size Changes for Build #271 [Gal Ben Haim] Check if the prefix exists before printing its size Changes for Build #272 [Gal Ben Haim] Check if the prefix exists before printing its size Changes for Build #273 [Eitan Raviv] network: macpool: test disallowing dups while dups exist [Daniel Belenky] docker cleanup:Fix edge case for unamed containers [Daniel Belenky] nested_config: Count nesting level of options [Daniel Belenky] Introduce conditional execution in STDCI DSL [Daniel Belenky] Add OST STDCI V2 jobs Changes for Build #274 [Gal Ben Haim] he-iscsi-master: Temporarily exclude in check-patch Changes for Build #275 [Gal Ben Haim] he-iscsi-master: Temporarily exclude in check-patch Changes for Build #276 [Barak Korren] Force STDCI V2 job to use physical host [Daniel Belenky] Build container on changes to 
docker_cleanup ----------------- Failed Tests: ----------------- No tests ran. -------------- next part -------------- An HTML attachment was scrubbed... URL: From stirabos at redhat.com Tue Apr 3 08:14:56 2018 From: stirabos at redhat.com (Simone Tiraboschi) Date: Tue, 3 Apr 2018 10:14:56 +0200 Subject: [ovirt-devel] [OST][HC] HE fails to deploy In-Reply-To: References: Message-ID: On Mon, Apr 2, 2018 at 4:44 PM, Sahina Bose wrote: > HE fails to deploy at waiting for host to be up in the local HE VM. > The setup logs does not indicate why it failed - atleast I couldn't find > anything > I see: "status": "install_failed" So I think that something went wrong with host-deploy on that host but we definitively need host-deploy logs for that and they are just on the engine VM. > > ---------- Forwarded message ---------- > From: > Date: Mon, Apr 2, 2018 at 7:50 PM > Subject: [oVirt Jenkins] ovirt-system-tests_hc-basic-suite-master - Build > # 276 - Still Failing! > To: infra at ovirt.org, sabose at redhat.com > > > Project: http://jenkins.ovirt.org/job/ovirt-system-tests_hc-basic-sui > te-master/ > Build: http://jenkins.ovirt.org/job/ovirt-system-tests_hc-basic-sui > te-master/276/ > Build Number: 276 > Build Status: Still Failing > Triggered By: Started by timer > > ------------------------------------- > Changes Since Last Success: > ------------------------------------- > Changes for Build #265 > [Gal Ben Haim] Check if the prefix exists before printing its size > > [Sandro Bonazzola] ovirt-engine: add jobs for 4.1.10 async > > > Changes for Build #266 > [Gal Ben Haim] Check if the prefix exists before printing its size > > > Changes for Build #267 > [Gal Ben Haim] Check if the prefix exists before printing its size > > [Daniel Belenky] ppc repos: Use qemu EV release instead of test > > [Daniel Belenky] global_setup: Add generic package remove function > > [Daniel Belenky] Fix package verification in verify_packages > > > Changes for Build #268 > [Gal Ben Haim] Check 
if the prefix exists before printing its size > > > Changes for Build #269 > [Gal Ben Haim] Check if the prefix exists before printing its size > > > Changes for Build #270 > [Gal Ben Haim] Check if the prefix exists before printing its size > > > Changes for Build #271 > [Gal Ben Haim] Check if the prefix exists before printing its size > > > Changes for Build #272 > [Gal Ben Haim] Check if the prefix exists before printing its size > > > Changes for Build #273 > [Eitan Raviv] network: macpool: test disallowing dups while dups exist > > [Daniel Belenky] docker cleanup:Fix edge case for unamed containers > > [Daniel Belenky] nested_config: Count nesting level of options > > [Daniel Belenky] Introduce conditional execution in STDCI DSL > > [Daniel Belenky] Add OST STDCI V2 jobs > > > Changes for Build #274 > [Gal Ben Haim] he-iscsi-master: Temporarily exclude in check-patch > > > Changes for Build #275 > [Gal Ben Haim] he-iscsi-master: Temporarily exclude in check-patch > > > Changes for Build #276 > [Barak Korren] Force STDCI V2 job to use physical host > > [Daniel Belenky] Build container on changes to docker_cleanup > > > > > ----------------- > Failed Tests: > ----------------- > No tests ran. > > > _______________________________________________ > Devel mailing list > Devel at ovirt.org > http://lists.ovirt.org/mailman/listinfo/devel > -------------- next part -------------- An HTML attachment was scrubbed... URL: From stirabos at redhat.com Tue Apr 3 08:20:29 2018 From: stirabos at redhat.com (Simone Tiraboschi) Date: Tue, 3 Apr 2018 10:20:29 +0200 Subject: [ovirt-devel] [OST][HC] HE fails to deploy In-Reply-To: References: Message-ID: On Tue, Apr 3, 2018 at 10:14 AM, Simone Tiraboschi wrote: > > > On Mon, Apr 2, 2018 at 4:44 PM, Sahina Bose wrote: > >> HE fails to deploy at waiting for host to be up in the local HE VM. 
>> The setup logs does not indicate why it failed - atleast I couldn't find >> anything >> > > I see: > > "status": "install_failed" > > So I think that something went wrong with host-deploy on that host but we > definitively need host-deploy logs for that and they are just on the engine > VM. > According to the timestamps it could be related to: Apr 2 09:58:13 lago-hc-basic-suite-master-host-0 systemd: Starting Open vSwitch Database Unit... Apr 2 09:58:14 lago-hc-basic-suite-master-host-0 ovs-ctl: runuser: System error Apr 2 09:58:14 lago-hc-basic-suite-master-host-0 ovs-ctl: /etc/openvswitch/conf.db does not exist ... (warning). Apr 2 09:58:14 lago-hc-basic-suite-master-host-0 ovs-ctl: Creating empty database /etc/openvswitch/conf.db runuser: System error Apr 2 09:58:14 lago-hc-basic-suite-master-host-0 ovs-ctl: [FAILED] Apr 2 09:58:14 lago-hc-basic-suite-master-host-0 systemd: ovsdb-server.service: control process exited, code=exited status=1 Apr 2 09:58:14 lago-hc-basic-suite-master-host-0 systemd: Failed to start Open vSwitch Database Unit. Apr 2 09:58:14 lago-hc-basic-suite-master-host-0 systemd: Unit ovsdb-server.service entered failed state. Apr 2 09:58:14 lago-hc-basic-suite-master-host-0 systemd: ovsdb-server.service failed. Apr 2 09:58:14 lago-hc-basic-suite-master-host-0 systemd: Cannot add dependency job for unit lvm2-lvmetad.socket, ignoring: Invalid request descriptor Apr 2 09:58:14 lago-hc-basic-suite-master-host-0 systemd: Assertion failed for Open vSwitch Delete Transient Ports. Apr 2 09:58:14 lago-hc-basic-suite-master-host-0 systemd: ovsdb-server.service holdoff time over, scheduling restart. Apr 2 09:58:14 lago-hc-basic-suite-master-host-0 systemd: Cannot add dependency job for unit lvm2-lvmetad.socket, ignoring: Unit is masked. Apr 2 09:58:14 lago-hc-basic-suite-master-host-0 systemd: start request repeated too quickly for ovsdb-server.service Apr 2 09:58:14 lago-hc-basic-suite-master-host-0 systemd: Failed to start Open vSwitch Database Unit. 
Apr 2 09:58:14 lago-hc-basic-suite-master-host-0 systemd: Unit ovsdb-server.service entered failed state. Apr 2 09:58:14 lago-hc-basic-suite-master-host-0 systemd: ovsdb-server.service failed. > > >> >> ---------- Forwarded message ---------- >> From: >> Date: Mon, Apr 2, 2018 at 7:50 PM >> Subject: [oVirt Jenkins] ovirt-system-tests_hc-basic-suite-master - >> Build # 276 - Still Failing! >> To: infra at ovirt.org, sabose at redhat.com >> >> >> Project: http://jenkins.ovirt.org/job/ovirt-system-tests_hc-basic-sui >> te-master/ >> Build: http://jenkins.ovirt.org/job/ovirt-system-tests_hc-basic-sui >> te-master/276/ >> Build Number: 276 >> Build Status: Still Failing >> Triggered By: Started by timer >> >> ------------------------------------- >> Changes Since Last Success: >> ------------------------------------- >> Changes for Build #265 >> [Gal Ben Haim] Check if the prefix exists before printing its size >> >> [Sandro Bonazzola] ovirt-engine: add jobs for 4.1.10 async >> >> >> Changes for Build #266 >> [Gal Ben Haim] Check if the prefix exists before printing its size >> >> >> Changes for Build #267 >> [Gal Ben Haim] Check if the prefix exists before printing its size >> >> [Daniel Belenky] ppc repos: Use qemu EV release instead of test >> >> [Daniel Belenky] global_setup: Add generic package remove function >> >> [Daniel Belenky] Fix package verification in verify_packages >> >> >> Changes for Build #268 >> [Gal Ben Haim] Check if the prefix exists before printing its size >> >> >> Changes for Build #269 >> [Gal Ben Haim] Check if the prefix exists before printing its size >> >> >> Changes for Build #270 >> [Gal Ben Haim] Check if the prefix exists before printing its size >> >> >> Changes for Build #271 >> [Gal Ben Haim] Check if the prefix exists before printing its size >> >> >> Changes for Build #272 >> [Gal Ben Haim] Check if the prefix exists before printing its size >> >> >> Changes for Build #273 >> [Eitan Raviv] network: macpool: test disallowing dups 
while dups exist >> >> [Daniel Belenky] docker cleanup:Fix edge case for unamed containers >> >> [Daniel Belenky] nested_config: Count nesting level of options >> >> [Daniel Belenky] Introduce conditional execution in STDCI DSL >> >> [Daniel Belenky] Add OST STDCI V2 jobs >> >> >> Changes for Build #274 >> [Gal Ben Haim] he-iscsi-master: Temporarily exclude in check-patch >> >> >> Changes for Build #275 >> [Gal Ben Haim] he-iscsi-master: Temporarily exclude in check-patch >> >> >> Changes for Build #276 >> [Barak Korren] Force STDCI V2 job to use physical host >> >> [Daniel Belenky] Build container on changes to docker_cleanup >> >> >> >> >> ----------------- >> Failed Tests: >> ----------------- >> No tests ran. >> >> >> _______________________________________________ >> Devel mailing list >> Devel at ovirt.org >> http://lists.ovirt.org/mailman/listinfo/devel >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mzamazal at redhat.com Tue Apr 3 09:48:32 2018 From: mzamazal at redhat.com (Milan Zamazal) Date: Tue, 03 Apr 2018 11:48:32 +0200 Subject: [ovirt-devel] oVirt log analyzer In-Reply-To: (Yaniv Kaul's message of "Mon, 2 Apr 2018 13:07:27 +0300") References: <87bmf7rrmr.fsf@redhat.com> Message-ID: <87h8os4o3z.fsf@redhat.com> Yaniv Kaul writes: > On Thu, Mar 29, 2018 at 9:11 PM, Tomas Jelinek wrote: > >> >> >> On Thu, Mar 29, 2018 at 7:55 PM, Greg Sheremeta >> wrote: >> >>> Nice! I think a nice RFE would be to surface this info in the UI. >>> >>> On Thu, Mar 29, 2018 at 8:30 AM, Milan Zamazal >>> wrote: >>> >>>> Hi, during last year Outreachy internship a tool for analyzing oVirt >>>> logs was created. When it is provided with oVirt logs (such as SOS >>>> reports, logs gathered by Lago, single or multiple log files) it tries >>>> to identify and classify important lines from the logs and present them >>>> in a structured form. Its primary purpose is to get a quick and easy >>>> overview of actions and errors. 
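
A toy sketch of the kind of "important line" classification described above — the patterns and heuristics here are invented for illustration only and are not ovirt-log-analyzer's actual rules:

```python
import re

# Invented patterns -- illustration only, not the analyzer's real heuristics.
IMPORTANT = re.compile(r"\b(ERROR|WARN(?:ING)?|Traceback|Failed|[Tt]imeout)\b")
TIMESTAMP = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})")

def important_lines(lines):
    """Yield (timestamp, text) for lines that look important, carrying the
    last seen timestamp forward for lines that have none of their own."""
    last_ts = None
    for line in lines:
        m = TIMESTAMP.match(line)
        if m:
            last_ts = m.group(1)
        if IMPORTANT.search(line):
            yield last_ts, line.rstrip()

if __name__ == "__main__":
    sample = [
        "2018-04-03 05:33:31 INFO  polling host capabilities",
        "2018-04-03 05:33:32 ERROR Unable to RefreshCapabilities",
    ]
    for ts, text in important_lines(sample):
        print(ts, text)
```

The real tool does considerably more (correlation across files, per-entity tracking), but the core idea — filter log lines down to the ones worth reading — is this simple.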
>>>> >>> >> I would add that it can correlate more log files (from >> engine/vdsm/libvirt/quemu) and show a unified view of them. >> It can follow the life of one entity (such as a VM) and show what was >> going on with it across the system. I have used it a lot to look for races >> and it was pretty useful for that. >> > > This is not very clear from the readme, which only says 'Assuming your *oVirt > logs* are stored in DIRECTORY ' - what logs exactly are in that > directory? Whatever logs you have got, unpacked (i.e. not in tar), either uncompressed or compressed with xz or gzip. The analyzer looks for Engine, Vdsm, libvirt, QEMU, sanlock, and SPM lock logs by default, identified by file names. Directory structure doesn't matter. > Is that the result of logs from ovirt-log-collector ? Y. > > >> >>> >>>> The tool analyses given logs and produces text files with the extracted >>>> information. There is an Emacs user interface that presents the output >>>> in a nice way with added functionality such as filtering. Emacs haters >>>> can use the plain text files or write another user interface. :-) >>>> >>>> You can get ovirt-log-analyzer from >>>> https://github.com/mz-pdm/ovirt-log-analyzer >>>> README.md explains how to use it. >>>> >>>> Note that ovirt-log-analyzer has been created within the limited >>>> resources of an Outreachy internship with some additional work and not >>>> everything is perfect. Feel free to make improvements. 
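
Milan's description above (plain, gzip- or xz-compressed files, found anywhere in the directory tree and identified by file name) suggests a dispatch along these lines — a sketch under those assumptions, not the analyzer's actual code:

```python
import gzip
import lzma
import os

# File-name stems the analyzer is said to look for (Engine, Vdsm, libvirt,
# QEMU, sanlock, SPM lock); the exact matching rules here are assumptions.
KNOWN_STEMS = ("engine", "vdsm", "libvirt", "qemu", "sanlock", "spm-lock")

def open_log(path):
    """Open a log file transparently, whether plain, gzip- or xz-compressed."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt", errors="replace")
    if path.endswith(".xz"):
        return lzma.open(path, "rt", errors="replace")
    return open(path, "r", errors="replace")

def find_logs(root):
    """Yield paths anywhere under root whose file name starts with a known stem
    (directory structure does not matter, per the description above)."""
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            if name.startswith(KNOWN_STEMS):
                yield os.path.join(dirpath, name)
```

With this, e.g. `post-006_migrations.py/lago-.../engine.log.xz` and a plain `vdsm.log` would both be picked up and read the same way.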
>>>> >>>> Regards, >>>> Milan >>>> _______________________________________________ >>>> Devel mailing list >>>> Devel at ovirt.org >>>> http://lists.ovirt.org/mailman/listinfo/devel >>>> >>> >>> >>> >>> -- >>> >>> GREG SHEREMETA >>> >>> SENIOR SOFTWARE ENGINEER - TEAM LEAD - RHV UX >>> >>> Red Hat NA >>> >>> >>> >>> gshereme at redhat.com IRC: gshereme >>> >>> >>> _______________________________________________ >>> Devel mailing list >>> Devel at ovirt.org >>> http://lists.ovirt.org/mailman/listinfo/devel >>> >> >> >> _______________________________________________ >> Devel mailing list >> Devel at ovirt.org >> http://lists.ovirt.org/mailman/listinfo/devel >> > _______________________________________________ > Devel mailing list > Devel at ovirt.org > http://lists.ovirt.org/mailman/listinfo/devel From mzamazal at redhat.com Tue Apr 3 09:55:20 2018 From: mzamazal at redhat.com (Milan Zamazal) Date: Tue, 03 Apr 2018 11:55:20 +0200 Subject: [ovirt-devel] oVirt log analyzer In-Reply-To: (Barak Korren's message of "Thu, 29 Mar 2018 21:49:08 +0300") References: <87bmf7rrmr.fsf@redhat.com> Message-ID: <87d0zg4nsn.fsf@redhat.com> Barak Korren writes: > On 29 March 2018 at 21:11, Tomas Jelinek wrote: > >> >> >> On Thu, Mar 29, 2018 at 7:55 PM, Greg Sheremeta >> wrote: >> >>> Nice! I think a nice RFE would be to surface this info in the UI. >>> >>> On Thu, Mar 29, 2018 at 8:30 AM, Milan Zamazal >>> wrote: >>> >>>> Hi, during last year Outreachy internship a tool for analyzing oVirt >>>> logs was created. When it is provided with oVirt logs (such as SOS >>>> reports, logs gathered by Lago, single or multiple log files) it tries >>>> to identify and classify important lines from the logs and present them >>>> in a structured form. Its primary purpose is to get a quick and easy >>>> overview of actions and errors. >>>> >>> >> I would add that it can correlate more log files (from >> engine/vdsm/libvirt/quemu) and show a unified view of them. 
>> It can follow the life of one entity (such as a VM) and show what was >> going on with it across the system. I have used it a lot to look for races >> and it was pretty useful for that. >> > > > I wonder if we can automate running it on OST failures to get automated > failure analysis. Well, ovirt-log-analyzer doesn't provide failure analysis, it just tries to extract more important lines from the logs and add some information to them. It may or may not be useful to add its output to OST failures in future, but some feedback and testing are needed first, I assume there are still bugs and things to improve. >>>> The tool analyses given logs and produces text files with the extracted >>>> information. There is an Emacs user interface that presents the output >>>> in a nice way with added functionality such as filtering. Emacs haters >>>> can use the plain text files or write another user interface. :-) >>>> >>>> You can get ovirt-log-analyzer from >>>> https://github.com/mz-pdm/ovirt-log-analyzer >>>> README.md explains how to use it. >>>> >>>> Note that ovirt-log-analyzer has been created within the limited >>>> resources of an Outreachy internship with some additional work and not >>>> everything is perfect. Feel free to make improvements. 
>>>> >>>> Regards, >>>> Milan >>>> _______________________________________________ >>>> Devel mailing list >>>> Devel at ovirt.org >>>> http://lists.ovirt.org/mailman/listinfo/devel >>>> >>> >>> >>> >>> -- >>> >>> GREG SHEREMETA >>> >>> SENIOR SOFTWARE ENGINEER - TEAM LEAD - RHV UX >>> >>> Red Hat NA >>> >>> >>> >>> gshereme at redhat.com IRC: gshereme >>> >>> >>> _______________________________________________ >>> Devel mailing list >>> Devel at ovirt.org >>> http://lists.ovirt.org/mailman/listinfo/devel >>> >> >> >> _______________________________________________ >> Devel mailing list >> Devel at ovirt.org >> http://lists.ovirt.org/mailman/listinfo/devel >> From bkorren at redhat.com Tue Apr 3 10:52:17 2018 From: bkorren at redhat.com (Barak Korren) Date: Tue, 3 Apr 2018 13:52:17 +0300 Subject: [ovirt-devel] oVirt log analyzer In-Reply-To: <87d0zg4nsn.fsf@redhat.com> References: <87bmf7rrmr.fsf@redhat.com> <87d0zg4nsn.fsf@redhat.com> Message-ID: On 3 April 2018 at 12:55, Milan Zamazal wrote: > Barak Korren writes: > >> On 29 March 2018 at 21:11, Tomas Jelinek wrote: >> >>> >>> >>> On Thu, Mar 29, 2018 at 7:55 PM, Greg Sheremeta >>> wrote: >>> >>>> Nice! I think a nice RFE would be to surface this info in the UI. >>>> >>>> On Thu, Mar 29, 2018 at 8:30 AM, Milan Zamazal >>>> wrote: >>>> >>>>> Hi, during last year Outreachy internship a tool for analyzing oVirt >>>>> logs was created. When it is provided with oVirt logs (such as SOS >>>>> reports, logs gathered by Lago, single or multiple log files) it tries >>>>> to identify and classify important lines from the logs and present them >>>>> in a structured form. Its primary purpose is to get a quick and easy >>>>> overview of actions and errors. >>>>> >>>> >>> I would add that it can correlate more log files (from >>> engine/vdsm/libvirt/quemu) and show a unified view of them. >>> It can follow the life of one entity (such as a VM) and show what was >>> going on with it across the system. 
I have used it a lot to look for races >>> and it was pretty useful for that. >>> >> >> >> I wonder if we can automate running it on OST failures to get automated >> failure analysis. > > Well, ovirt-log-analyzer doesn't provide failure analysis, it just tries > to extract more important lines from the logs and add some information > to them. It may or may not be useful to add its output to OST failures > in future, but some feedback and testing are needed first, I assume > there are still bugs and things to improve. > I'll try to be more specific about the use case we're looking at. Right now, when a test fails in OST - all you get is the traceback from nose of the oVirt API call you've made. For some tests additional information is provided by having nose collect it from STDOUT. It could be very useful if we could use some information about the API call that had been made, and use it to extract relevant information about the call from the vdsm and engine logs. Can ovirt-log-analyzer be used for that? -- Barak Korren RHV DevOps team , RHCE, RHCi Red Hat EMEA redhat.com | TRIED. TESTED. TRUSTED. | redhat.com/trusted From bkorren at redhat.com Tue Apr 3 11:07:30 2018 From: bkorren at redhat.com (Barak Korren) Date: Tue, 3 Apr 2018 14:07:30 +0300 Subject: [ovirt-devel] [ OST Failure Report ] [ oVirt 4.2 ] [ 2018-04-03 ] [006_migrations.prepare_migration_attachments_ipv6] Message-ID: Test failed: [ 006_migrations.prepare_migration_attachments_ipv6 ] Link to suspected patches: (Patch seems unrelated - do we have sporadic communication issues arising in PST?) 
https://gerrit.ovirt.org/c/89737/1 - vdsm - automation: check-patch: attempt to install vdsm-gluster Link to Job: http://jenkins.ovirt.org/job/ovirt-4.2_change-queue-tester/1521/ Link to all logs: http://jenkins.ovirt.org/job/ovirt-4.2_change-queue-tester/1521/artifact/exported-artifacts/basic-suit-4.2-el7/test_logs/basic-suite-4.2/post-006_migrations.py/ Error snippet from log: Traceback (most recent call last): File "/usr/lib64/python2.7/unittest/case.py", line 369, in run testMethod() File "/usr/lib/python2.7/site-packages/nose/case.py", line 197, in runTest self.test(*self.arg) File "/usr/lib/python2.7/site-packages/ovirtlago/testlib.py", line 129, in wrapped_test test() File "/usr/lib/python2.7/site-packages/ovirtlago/testlib.py", line 59, in wrapper return func(get_test_prefix(), *args, **kwargs) File "/usr/lib/python2.7/site-packages/ovirtlago/testlib.py", line 78, in wrapper prefix.virt_env.engine_vm().get_api(api_ver=4), *args, **kwargs File "/home/jenkins/workspace/ovirt-4.2_change-queue-tester/ovirt-system-tests/basic-suite-4.2/test-scenarios/006_migrations.py", line 139, in prepare_migration_attachments_ipv6 engine, host_service, MIGRATION_NETWORK, ip_configuration) File "/home/jenkins/workspace/ovirt-4.2_change-queue-tester/ovirt-system-tests/basic-suite-4.2/test_utils/network_utils_v4.py", line 71, in modify_ip_config check_connectivity=True) File "/usr/lib64/python2.7/site-packages/ovirtsdk4/services.py", line 36729, in setup_networks return self._internal_action(action, 'setupnetworks', None, headers, query, wait) File "/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py", line 299, in _internal_action return future.wait() if wait else future File "/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py", line 55, in wait return self._code(response) File "/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py", line 296, in callback self._check_fault(response) File "/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py", line 132, in 
_check_fault self._raise_error(response, body) File "/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py", line 118, in _raise_error raise error Error: Fault reason is "Operation Failed". Fault detail is "[Network error during communication with the Host.]". HTTP response code is 400. -- Barak Korren RHV DevOps team , RHCE, RHCi Red Hat EMEA redhat.com | TRIED. TESTED. TRUSTED. | redhat.com/trusted From gbenhaim at redhat.com Tue Apr 3 11:13:32 2018 From: gbenhaim at redhat.com (Gal Ben Haim) Date: Tue, 3 Apr 2018 14:13:32 +0300 Subject: [ovirt-devel] Lago v0.43 Release Announcement Message-ID: Hi All, On behalf of the Lago team, I'm pleased to announce Lago v0.43 is available! For a detailed release announcement please refer to our Github page at: https://github.com/lago-project/lago/releases/tag/0.43 Resources ========== Lago Docs: http://lago.readthedocs.io/en/latest/ GitHub: https://github.com/lago-project/lago/ YUM Repository: http://resources.ovirt.org/repos/lago/stable/0.0/rpm/ Pypi: https://pypi.python.org/pypi/lago Changelog: http://lago.readthedocs.io/en/0.43/_static/ChangeLog.txt As always, if you find any problems, or willing to contribute, visit our GitHub page. Enjoy! -- *GAL bEN HAIM* RHV DEVOPS -------------- next part -------------- An HTML attachment was scrubbed... URL: From kyland\sundw at coretek.com.cn Tue Apr 3 10:56:48 2018 From: kyland\sundw at coretek.com.cn (sundw) Date: Tue, 3 Apr 2018 18:56:48 +0800 Subject: [ovirt-devel] How To Construct An Local Developing Enviroment? Message-ID: <201804031856386663754@coretek.com.cn> Hi Guys! I plan to construct an local developing environment for oVirt. But I can not find any relative pages from the web site(www.ovirt.org). Maybe I am not patient :). Could you please give me some advice or some links about How To Construct An Local Developing Enviroment? Thanks! ??? ????????????/?????? 13378105625 -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From bzlotnik at redhat.com Tue Apr 3 11:49:22 2018 From: bzlotnik at redhat.com (Benny Zlotnik) Date: Tue, 3 Apr 2018 14:49:22 +0300 Subject: [ovirt-devel] How To Construct An Local Developing Enviroment? In-Reply-To: <201804031856386663754@coretek.com.cn> References: <201804031856386663754@coretek.com.cn> Message-ID: Hi, Take a look here: https://www.ovirt.org/develop/developer-guide/engine/engine-development-environment/ On Tue, Apr 3, 2018 at 1:56 PM, sundw wrote: > Hi Guys! > I plan to construct an local developing environment for oVirt. > But I can not find any relative pages from the web site(www.ovirt.org). > Maybe I am not patient :). > > Could you please give me some advice or some links about > How To Construct An Local Developing Enviroment? > > Thanks! > > ------------------------------ > *???* > ????????????/?????? > 13378105625 <(337)%20810-5625> > > _______________________________________________ > Devel mailing list > Devel at ovirt.org > http://lists.ovirt.org/mailman/listinfo/devel > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bkorren at redhat.com Tue Apr 3 11:51:41 2018 From: bkorren at redhat.com (Barak Korren) Date: Tue, 3 Apr 2018 14:51:41 +0300 Subject: [ovirt-devel] [ OST Failure Report ] [ oVirt 4.2 ] [ 2018-04-03 ] [006_migrations.prepare_migration_attachments_ipv6] In-Reply-To: References: Message-ID: On 3 April 2018 at 14:07, Barak Korren wrote: > Test failed: [ 006_migrations.prepare_migration_attachments_ipv6 ] > > Link to suspected patches: > > (Patch seems unrelated - do we have sporadic communication issues > arising in PST?) 
> https://gerrit.ovirt.org/c/89737/1 - vdsm - automation: check-patch: > attempt to install vdsm-gluster > > Link to Job: > http://jenkins.ovirt.org/job/ovirt-4.2_change-queue-tester/1521/ > > Link to all logs: > http://jenkins.ovirt.org/job/ovirt-4.2_change-queue-tester/1521/artifact/exported-artifacts/basic-suit-4.2-el7/test_logs/basic-suite-4.2/post-006_migrations.py/ > > Error snippet from log: > > > > Traceback (most recent call last): > File "/usr/lib64/python2.7/unittest/case.py", line 369, in run > testMethod() > File "/usr/lib/python2.7/site-packages/nose/case.py", line 197, in runTest > self.test(*self.arg) > File "/usr/lib/python2.7/site-packages/ovirtlago/testlib.py", line > 129, in wrapped_test > test() > File "/usr/lib/python2.7/site-packages/ovirtlago/testlib.py", line > 59, in wrapper > return func(get_test_prefix(), *args, **kwargs) > File "/usr/lib/python2.7/site-packages/ovirtlago/testlib.py", line > 78, in wrapper > prefix.virt_env.engine_vm().get_api(api_ver=4), *args, **kwargs > File "/home/jenkins/workspace/ovirt-4.2_change-queue-tester/ovirt-system-tests/basic-suite-4.2/test-scenarios/006_migrations.py", > line 139, in prepare_migration_attachments_ipv6 > engine, host_service, MIGRATION_NETWORK, ip_configuration) > File "/home/jenkins/workspace/ovirt-4.2_change-queue-tester/ovirt-system-tests/basic-suite-4.2/test_utils/network_utils_v4.py", > line 71, in modify_ip_config > check_connectivity=True) > File "/usr/lib64/python2.7/site-packages/ovirtsdk4/services.py", > line 36729, in setup_networks > return self._internal_action(action, 'setupnetworks', None, > headers, query, wait) > File "/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py", line > 299, in _internal_action > return future.wait() if wait else future > File "/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py", line > 55, in wait > return self._code(response) > File "/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py", line > 296, in callback > 
self._check_fault(response) > File "/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py", line > 132, in _check_fault > self._raise_error(response, body) > File "/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py", line > 118, in _raise_error > raise error > Error: Fault reason is "Operation Failed". Fault detail is "[Network > error during communication with the Host.]". HTTP response code is > 400. > > > > > Same failure seems to have happened again - on a different patch - this time for ovirt-engine: https://gerrit.ovirt.org/#/c/89748/1 Failed test run: http://jenkins.ovirt.org/job/ovirt-4.2_change-queue-tester/1523/ -- Barak Korren RHV DevOps team , RHCE, RHCi Red Hat EMEA redhat.com | TRIED. TESTED. TRUSTED. | redhat.com/trusted From danken at redhat.com Tue Apr 3 11:57:17 2018 From: danken at redhat.com (Dan Kenigsberg) Date: Tue, 3 Apr 2018 14:57:17 +0300 Subject: [ovirt-devel] [ OST Failure Report ] [ oVirt 4.2 ] [ 2018-04-03 ] [006_migrations.prepare_migration_attachments_ipv6] In-Reply-To: References: Message-ID: On Tue, Apr 3, 2018 at 2:07 PM, Barak Korren wrote: > Test failed: [ 006_migrations.prepare_migration_attachments_ipv6 ] > > Link to suspected patches: > > (Patch seems unrelated - do we have sporadic communication issues > arising in PST?) 
> https://gerrit.ovirt.org/c/89737/1 - vdsm - automation: check-patch: > attempt to install vdsm-gluster > > Link to Job: > http://jenkins.ovirt.org/job/ovirt-4.2_change-queue-tester/1521/ > > Link to all logs: > http://jenkins.ovirt.org/job/ovirt-4.2_change-queue-tester/1521/artifact/exported-artifacts/basic-suit-4.2-el7/test_logs/basic-suite-4.2/post-006_migrations.py/ > > Error snippet from log: > > > > Traceback (most recent call last): > File "/usr/lib64/python2.7/unittest/case.py", line 369, in run > testMethod() > File "/usr/lib/python2.7/site-packages/nose/case.py", line 197, in runTest > self.test(*self.arg) > File "/usr/lib/python2.7/site-packages/ovirtlago/testlib.py", line > 129, in wrapped_test > test() > File "/usr/lib/python2.7/site-packages/ovirtlago/testlib.py", line > 59, in wrapper > return func(get_test_prefix(), *args, **kwargs) > File "/usr/lib/python2.7/site-packages/ovirtlago/testlib.py", line > 78, in wrapper > prefix.virt_env.engine_vm().get_api(api_ver=4), *args, **kwargs > File "/home/jenkins/workspace/ovirt-4.2_change-queue-tester/ovirt-system-tests/basic-suite-4.2/test-scenarios/006_migrations.py", > line 139, in prepare_migration_attachments_ipv6 > engine, host_service, MIGRATION_NETWORK, ip_configuration) > File "/home/jenkins/workspace/ovirt-4.2_change-queue-tester/ovirt-system-tests/basic-suite-4.2/test_utils/network_utils_v4.py", > line 71, in modify_ip_config > check_connectivity=True) > File "/usr/lib64/python2.7/site-packages/ovirtsdk4/services.py", > line 36729, in setup_networks > return self._internal_action(action, 'setupnetworks', None, > headers, query, wait) > File "/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py", line > 299, in _internal_action > return future.wait() if wait else future > File "/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py", line > 55, in wait > return self._code(response) > File "/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py", line > 296, in callback > 
self._check_fault(response) > File "/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py", line > 132, in _check_fault > self._raise_error(response, body) > File "/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py", line > 118, in _raise_error > raise error > Error: Fault reason is "Operation Failed". Fault detail is "[Network > error during communication with the Host.]". HTTP response code is > 400. The error occurred sometime in the interval 09:32:58 [basic-suit] @ Run test: 006_migrations.py: 09:33:55 [basic-suit] Error occured, aborting and indeed http://jenkins.ovirt.org/job/ovirt-4.2_change-queue-tester/1521/artifact/exported-artifacts/basic-suit-4.2-el7/test_logs/basic-suite-4.2/post-006_migrations.py/lago-basic-suite-4-2-engine/_var_log/ovirt-engine/engine.log/*view*/ has Engine disconnected from the host at 2018-04-03 05:33:32,307-04 ERROR [org.ovirt.engine.core.vdsbroker.monitoring.HostMonitoring] (EE-ManagedThreadFactory-engineScheduled-Thread-39) [] Unable to RefreshCapabilities: VDSNetworkException: VDSGenericException: VDSNetworkException: Vds timeout occured Maybe Piotr can read more into it. From mperina at redhat.com Tue Apr 3 11:59:21 2018 From: mperina at redhat.com (Martin Perina) Date: Tue, 3 Apr 2018 13:59:21 +0200 Subject: [ovirt-devel] How To Construct An Local Developing Enviroment? In-Reply-To: References: <201804031856386663754@coretek.com.cn> Message-ID: On Tue, Apr 3, 2018 at 1:49 PM, Benny Zlotnik wrote: > Hi, > > Take a look here: https://www.ovirt.org/develop/ > developer-guide/engine/engine-development-environment/ > Unfortunately we have a lot of outdated information in above page, please use below instead: https://github.com/oVirt/ovirt-engine/blob/master/README.adoc And please don't forget about postgresql-contrib/rh-postgresql95-postgresql-contrib package requirement: http://lists.ovirt.org/pipermail/devel/2018-January/032452.html ? > > > On Tue, Apr 3, 2018 at 1:56 PM, sundw wrote: > >> Hi Guys! 
>> I plan to construct an local developing environment for oVirt. >> But I can not find any relative pages from the web site(www.ovirt.org). >> Maybe I am not patient :). >> >> Could you please give me some advice or some links about >> How To Construct An Local Developing Enviroment? >> >> Thanks! >> >> ------------------------------ >> *???* >> ????????????/?????? >> 13378105625 <(337)%20810-5625> >> >> _______________________________________________ >> Devel mailing list >> Devel at ovirt.org >> http://lists.ovirt.org/mailman/listinfo/devel >> > > > _______________________________________________ > Devel mailing list > Devel at ovirt.org > http://lists.ovirt.org/mailman/listinfo/devel > -- Martin Perina Associate Manager, Software Engineering Red Hat Czech s.r.o. -------------- next part -------------- An HTML attachment was scrubbed... URL: From gshereme at redhat.com Tue Apr 3 12:01:12 2018 From: gshereme at redhat.com (Greg Sheremeta) Date: Tue, 3 Apr 2018 08:01:12 -0400 Subject: [ovirt-devel] [ OST Failure Report ] [ oVirt 4.2 ] [ 2018-04-03 ] [006_migrations.prepare_migration_attachments_ipv6] In-Reply-To: References: Message-ID: Barak, I was getting these 400s and 409s sporadically all last week while iterating on my docker stuff. I thought maybe it was my messing with the http_proxy stuff or doing docker rms. Is it possible I'm breaking things? I'm still working on it. Been working in straight for a while now: https://gerrit.ovirt.org/#/c/67166/ On Tue, Apr 3, 2018 at 7:51 AM, Barak Korren wrote: > On 3 April 2018 at 14:07, Barak Korren wrote: > > Test failed: [ 006_migrations.prepare_migration_attachments_ipv6 ] > > > > Link to suspected patches: > > > > (Patch seems unrelated - do we have sporadic communication issues > > arising in PST?) 
> > https://gerrit.ovirt.org/c/89737/1 - vdsm - automation: check-patch: > > attempt to install vdsm-gluster > > > > Link to Job: > > http://jenkins.ovirt.org/job/ovirt-4.2_change-queue-tester/1521/ > > > > Link to all logs: > > http://jenkins.ovirt.org/job/ovirt-4.2_change-queue-tester/ > 1521/artifact/exported-artifacts/basic-suit-4.2-el7/ > test_logs/basic-suite-4.2/post-006_migrations.py/ > > > > Error snippet from log: > > > > > > > > Traceback (most recent call last): > > File "/usr/lib64/python2.7/unittest/case.py", line 369, in run > > testMethod() > > File "/usr/lib/python2.7/site-packages/nose/case.py", line 197, in > runTest > > self.test(*self.arg) > > File "/usr/lib/python2.7/site-packages/ovirtlago/testlib.py", line > > 129, in wrapped_test > > test() > > File "/usr/lib/python2.7/site-packages/ovirtlago/testlib.py", line > > 59, in wrapper > > return func(get_test_prefix(), *args, **kwargs) > > File "/usr/lib/python2.7/site-packages/ovirtlago/testlib.py", line > > 78, in wrapper > > prefix.virt_env.engine_vm().get_api(api_ver=4), *args, **kwargs > > File "/home/jenkins/workspace/ovirt-4.2_change-queue-tester/ > ovirt-system-tests/basic-suite-4.2/test-scenarios/006_migrations.py", > > line 139, in prepare_migration_attachments_ipv6 > > engine, host_service, MIGRATION_NETWORK, ip_configuration) > > File "/home/jenkins/workspace/ovirt-4.2_change-queue-tester/ > ovirt-system-tests/basic-suite-4.2/test_utils/network_utils_v4.py", > > line 71, in modify_ip_config > > check_connectivity=True) > > File "/usr/lib64/python2.7/site-packages/ovirtsdk4/services.py", > > line 36729, in setup_networks > > return self._internal_action(action, 'setupnetworks', None, > > headers, query, wait) > > File "/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py", line > > 299, in _internal_action > > return future.wait() if wait else future > > File "/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py", line > > 55, in wait > > return self._code(response) > > File 
"/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py", line > > 296, in callback > > self._check_fault(response) > > File "/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py", line > > 132, in _check_fault > > self._raise_error(response, body) > > File "/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py", line > > 118, in _raise_error > > raise error > > Error: Fault reason is "Operation Failed". Fault detail is "[Network > > error during communication with the Host.]". HTTP response code is > > 400. > > > > > > > > > > > > > Same failure seems to have happened again - on a different patch - > this time foe ovirt-engine: > https://gerrit.ovirt.org/#/c/89748/1 > > Failed test run: > http://jenkins.ovirt.org/job/ovirt-4.2_change-queue-tester/1523/ > > > -- > Barak Korren > RHV DevOps team , RHCE, RHCi > Red Hat EMEA > redhat.com | TRIED. TESTED. TRUSTED. | redhat.com/trusted > _______________________________________________ > Devel mailing list > Devel at ovirt.org > http://lists.ovirt.org/mailman/listinfo/devel > -- GREG SHEREMETA SENIOR SOFTWARE ENGINEER - TEAM LEAD - RHV UX Red Hat NA gshereme at redhat.com IRC: gshereme -------------- next part -------------- An HTML attachment was scrubbed... URL: From bkorren at redhat.com Tue Apr 3 12:07:19 2018 From: bkorren at redhat.com (Barak Korren) Date: Tue, 3 Apr 2018 15:07:19 +0300 Subject: [ovirt-devel] [ OST Failure Report ] [ oVirt 4.2 ] [ 2018-04-03 ] [006_migrations.prepare_migration_attachments_ipv6] In-Reply-To: References: Message-ID: On 3 April 2018 at 15:01, Greg Sheremeta wrote: > Barak, I was getting these 400s and 409s sporadically all last week while > iterating on my docker stuff. I thought maybe it was my messing with the > http_proxy stuff or doing docker rms. Is it possible I'm breaking things? > I'm still working on it. Been working in straight for a while now: > https://gerrit.ovirt.org/#/c/67166/ > > Greg, your work shouldn't be affecting other things unless you merge it... 
-- Barak Korren RHV DevOps team , RHCE, RHCi Red Hat EMEA redhat.com | TRIED. TESTED. TRUSTED. | redhat.com/trusted -------------- next part -------------- An HTML attachment was scrubbed... URL: From pkliczew at redhat.com Tue Apr 3 12:57:52 2018 From: pkliczew at redhat.com (Piotr Kliczewski) Date: Tue, 3 Apr 2018 14:57:52 +0200 Subject: [ovirt-devel] [ OST Failure Report ] [ oVirt 4.2 ] [ 2018-04-03 ] [006_migrations.prepare_migration_attachments_ipv6] In-Reply-To: References: Message-ID: Dan, It looks like it was one of the calls triggered when vdsm was down: 2018-04-03 05:30:16,065-0400 INFO (mailbox-hsm) [storage.MailBox.HsmMailMonitor] HSM_MailMonitor sending mail to SPM - ['/usr/bin/dd', 'of=/rhev/data-center/ddb765d2-2137-437d-95f8-c46dbdbc7711/mastersd/dom_md/inbox', 'iflag=fullblock', 'oflag=direct', 'conv=notrunc', 'bs=4096', 'count=1', 'seek=1'] (mailbox:387) 2018-04-03 05:31:22,441-0400 INFO (MainThread) [vds] (PID: 20548) I am the actual vdsm 4.20.23-28.gitd11ed44.el7.centos lago-basic-suite-4-2-host-0 (3.10.0-693.21.1.el7.x86_64) (vdsmd:149) which failed and caused timeout. Thanks, Piotr On Tue, Apr 3, 2018 at 1:57 PM, Dan Kenigsberg wrote: > On Tue, Apr 3, 2018 at 2:07 PM, Barak Korren wrote: > > Test failed: [ 006_migrations.prepare_migration_attachments_ipv6 ] > > > > Link to suspected patches: > > > > (Patch seems unrelated - do we have sporadic communication issues > > arising in PST?) 
> > https://gerrit.ovirt.org/c/89737/1 - vdsm - automation: check-patch: > > attempt to install vdsm-gluster > > > > Link to Job: > > http://jenkins.ovirt.org/job/ovirt-4.2_change-queue-tester/1521/ > > > > Link to all logs: > > http://jenkins.ovirt.org/job/ovirt-4.2_change-queue-tester/ > 1521/artifact/exported-artifacts/basic-suit-4.2-el7/ > test_logs/basic-suite-4.2/post-006_migrations.py/ > > > > Error snippet from log: > > > > > > > > Traceback (most recent call last): > > File "/usr/lib64/python2.7/unittest/case.py", line 369, in run > > testMethod() > > File "/usr/lib/python2.7/site-packages/nose/case.py", line 197, in > runTest > > self.test(*self.arg) > > File "/usr/lib/python2.7/site-packages/ovirtlago/testlib.py", line > > 129, in wrapped_test > > test() > > File "/usr/lib/python2.7/site-packages/ovirtlago/testlib.py", line > > 59, in wrapper > > return func(get_test_prefix(), *args, **kwargs) > > File "/usr/lib/python2.7/site-packages/ovirtlago/testlib.py", line > > 78, in wrapper > > prefix.virt_env.engine_vm().get_api(api_ver=4), *args, **kwargs > > File "/home/jenkins/workspace/ovirt-4.2_change-queue-tester/ > ovirt-system-tests/basic-suite-4.2/test-scenarios/006_migrations.py", > > line 139, in prepare_migration_attachments_ipv6 > > engine, host_service, MIGRATION_NETWORK, ip_configuration) > > File "/home/jenkins/workspace/ovirt-4.2_change-queue-tester/ > ovirt-system-tests/basic-suite-4.2/test_utils/network_utils_v4.py", > > line 71, in modify_ip_config > > check_connectivity=True) > > File "/usr/lib64/python2.7/site-packages/ovirtsdk4/services.py", > > line 36729, in setup_networks > > return self._internal_action(action, 'setupnetworks', None, > > headers, query, wait) > > File "/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py", line > > 299, in _internal_action > > return future.wait() if wait else future > > File "/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py", line > > 55, in wait > > return self._code(response) > > File 
"/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py", line > > 296, in callback > > self._check_fault(response) > > File "/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py", line > > 132, in _check_fault > > self._raise_error(response, body) > > File "/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py", line > > 118, in _raise_error > > raise error > > Error: Fault reason is "Operation Failed". Fault detail is "[Network > > error during communication with the Host.]". HTTP response code is > > 400. > > The error occurred sometime in the interval > > 09:32:58 [basic-suit] @ Run test: 006_migrations.py: > 09:33:55 [basic-suit] Error occured, aborting > > and indeed > > http://jenkins.ovirt.org/job/ovirt-4.2_change-queue-tester/ > 1521/artifact/exported-artifacts/basic-suit-4.2-el7/ > test_logs/basic-suite-4.2/post-006_migrations.py/lago- > basic-suite-4-2-engine/_var_log/ovirt-engine/engine.log/*view*/ > > has Engine disconnected from the host at > > 2018-04-03 05:33:32,307-04 ERROR > [org.ovirt.engine.core.vdsbroker.monitoring.HostMonitoring] > (EE-ManagedThreadFactory-engineScheduled-Thread-39) [] Unable to > RefreshCapabilities: VDSNetworkException: VDSGenericException: > VDSNetworkException: Vds timeout occured > > Maybe Piotr can read more into it. > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From danken at redhat.com Tue Apr 3 14:43:00 2018 From: danken at redhat.com (Dan Kenigsberg) Date: Tue, 3 Apr 2018 17:43:00 +0300 Subject: [ovirt-devel] [ OST Failure Report ] [ oVirt 4.2 ] [ 2018-04-03 ] [006_migrations.prepare_migration_attachments_ipv6] In-Reply-To: References: Message-ID: On Tue, Apr 3, 2018 at 3:57 PM, Piotr Kliczewski wrote: > Dan, > > It looks like it was one of the calls triggered when vdsm was down: > > 2018-04-03 05:30:16,065-0400 INFO (mailbox-hsm) > [storage.MailBox.HsmMailMonitor] HSM_MailMonitor sending mail to SPM - > ['/usr/bin/dd', > 'of=/rhev/data-center/ddb765d2-2137-437d-95f8-c46dbdbc7711/mastersd/dom_md/inbox', > 'iflag=fullblock', 'oflag=direct', 'conv=notrunc', 'bs=4096', 'count=1', > 'seek=1'] (mailbox:387) > 2018-04-03 05:31:22,441-0400 INFO (MainThread) [vds] (PID: 20548) I am the > actual vdsm 4.20.23-28.gitd11ed44.el7.centos lago-basic-suite-4-2-host-0 > (3.10.0-693.21.1.el7.x86_64) (vdsmd:149) > > > which failed and caused timeout. > > Thanks, > Piotr > > On Tue, Apr 3, 2018 at 1:57 PM, Dan Kenigsberg wrote: >> >> On Tue, Apr 3, 2018 at 2:07 PM, Barak Korren wrote: >> > Test failed: [ 006_migrations.prepare_migration_attachments_ipv6 ] >> > >> > Link to suspected patches: >> > >> > (Patch seems unrelated - do we have sporadic communication issues >> > arising in PST?) 
>> > https://gerrit.ovirt.org/c/89737/1 - vdsm - automation: check-patch: >> > attempt to install vdsm-gluster >> > >> > Link to Job: >> > http://jenkins.ovirt.org/job/ovirt-4.2_change-queue-tester/1521/ >> > >> > Link to all logs: >> > >> > http://jenkins.ovirt.org/job/ovirt-4.2_change-queue-tester/1521/artifact/exported-artifacts/basic-suit-4.2-el7/test_logs/basic-suite-4.2/post-006_migrations.py/ >> > >> > Error snippet from log: >> > >> > >> > >> > Traceback (most recent call last): >> > File "/usr/lib64/python2.7/unittest/case.py", line 369, in run >> > testMethod() >> > File "/usr/lib/python2.7/site-packages/nose/case.py", line 197, in >> > runTest >> > self.test(*self.arg) >> > File "/usr/lib/python2.7/site-packages/ovirtlago/testlib.py", line >> > 129, in wrapped_test >> > test() >> > File "/usr/lib/python2.7/site-packages/ovirtlago/testlib.py", line >> > 59, in wrapper >> > return func(get_test_prefix(), *args, **kwargs) >> > File "/usr/lib/python2.7/site-packages/ovirtlago/testlib.py", line >> > 78, in wrapper >> > prefix.virt_env.engine_vm().get_api(api_ver=4), *args, **kwargs >> > File >> > "/home/jenkins/workspace/ovirt-4.2_change-queue-tester/ovirt-system-tests/basic-suite-4.2/test-scenarios/006_migrations.py", >> > line 139, in prepare_migration_attachments_ipv6 >> > engine, host_service, MIGRATION_NETWORK, ip_configuration) >> > File >> > "/home/jenkins/workspace/ovirt-4.2_change-queue-tester/ovirt-system-tests/basic-suite-4.2/test_utils/network_utils_v4.py", >> > line 71, in modify_ip_config >> > check_connectivity=True) >> > File "/usr/lib64/python2.7/site-packages/ovirtsdk4/services.py", >> > line 36729, in setup_networks >> > return self._internal_action(action, 'setupnetworks', None, >> > headers, query, wait) >> > File "/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py", line >> > 299, in _internal_action >> > return future.wait() if wait else future >> > File "/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py", line >> > 55, in 
wait >> > return self._code(response) >> > File "/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py", line >> > 296, in callback >> > self._check_fault(response) >> > File "/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py", line >> > 132, in _check_fault >> > self._raise_error(response, body) >> > File "/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py", line >> > 118, in _raise_error >> > raise error >> > Error: Fault reason is "Operation Failed". Fault detail is "[Network >> > error during communication with the Host.]". HTTP response code is >> > 400. >> >> The error occurred sometime in the interval >> >> 09:32:58 [basic-suit] @ Run test: 006_migrations.py: >> 09:33:55 [basic-suit] Error occured, aborting >> >> and indeed >> >> >> http://jenkins.ovirt.org/job/ovirt-4.2_change-queue-tester/1521/artifact/exported-artifacts/basic-suit-4.2-el7/test_logs/basic-suite-4.2/post-006_migrations.py/lago-basic-suite-4-2-engine/_var_log/ovirt-engine/engine.log/*view*/ >> >> has Engine disconnected from the host at >> >> 2018-04-03 05:33:32,307-04 ERROR >> [org.ovirt.engine.core.vdsbroker.monitoring.HostMonitoring] >> (EE-ManagedThreadFactory-engineScheduled-Thread-39) [] Unable to >> RefreshCapabilities: VDSNetworkException: VDSGenericException: >> VDSNetworkException: Vds timeout occured >> >> Maybe Piotr can read more into it. I should have thought of a down vdsm; but it was down because what seems to be soft fencing http://jenkins.ovirt.org/job/ovirt-4.2_change-queue-tester/1521/artifact/exported-artifacts/basic-suit-4.2-el7/test_logs/basic-suite-4.2/post-006_migrations.py/lago-basic-suite-4-2-host-0/_var_log/messages/*view*/ Apr 3 05:30:01 lago-basic-suite-4-2-host-0 systemd: Started Session 46 of user root. Apr 3 05:30:01 lago-basic-suite-4-2-host-0 systemd: Starting Session 46 of user root. Apr 3 05:30:07 lago-basic-suite-4-2-host-0 systemd: Stopped MOM instance configured for VDSM purposes. 
Apr 3 05:30:07 lago-basic-suite-4-2-host-0 systemd: Stopping Virtual Desktop Server Manager... Apr 3 05:30:16 lago-basic-suite-4-2-host-0 kernel: scsi_verify_blk_ioctl: 33 callbacks suppressed Apr 3 05:30:16 lago-basic-suite-4-2-host-0 kernel: dd: sending ioctl 80306d02 to a partition! Apr 3 05:30:17 lago-basic-suite-4-2-host-0 systemd: vdsmd.service stop-sigterm timed out. Killing. From bkorren at redhat.com Wed Apr 4 06:22:09 2018 From: bkorren at redhat.com (Barak Korren) Date: Wed, 4 Apr 2018 09:22:09 +0300 Subject: [ovirt-devel] [ OST Failure Report ] [ oVirt 4.2 ] [ 2018-04-03 ] [006_migrations.prepare_migration_attachments_ipv6] In-Reply-To: References: Message-ID: On 3 April 2018 at 17:43, Dan Kenigsberg wrote: > On Tue, Apr 3, 2018 at 3:57 PM, Piotr Kliczewski wrote: >> Dan, >> >> It looks like it was one of the calls triggered when vdsm was down: >> >> 2018-04-03 05:30:16,065-0400 INFO (mailbox-hsm) >> [storage.MailBox.HsmMailMonitor] HSM_MailMonitor sending mail to SPM - >> ['/usr/bin/dd', >> 'of=/rhev/data-center/ddb765d2-2137-437d-95f8-c46dbdbc7711/mastersd/dom_md/inbox', >> 'iflag=fullblock', 'oflag=direct', 'conv=notrunc', 'bs=4096', 'count=1', >> 'seek=1'] (mailbox:387) >> 2018-04-03 05:31:22,441-0400 INFO (MainThread) [vds] (PID: 20548) I am the >> actual vdsm 4.20.23-28.gitd11ed44.el7.centos lago-basic-suite-4-2-host-0 >> (3.10.0-693.21.1.el7.x86_64) (vdsmd:149) >> >> >> which failed and caused timeout. >> >> Thanks, >> Piotr >> >> On Tue, Apr 3, 2018 at 1:57 PM, Dan Kenigsberg wrote: >>> >>> On Tue, Apr 3, 2018 at 2:07 PM, Barak Korren wrote: >>> > Test failed: [ 006_migrations.prepare_migration_attachments_ipv6 ] >>> > >>> > Link to suspected patches: >>> > >>> > (Patch seems unrelated - do we have sporadic communication issues >>> > arising in PST?) 
>>> > https://gerrit.ovirt.org/c/89737/1 - vdsm - automation: check-patch: >>> > attempt to install vdsm-gluster >>> > >>> > Link to Job: >>> > http://jenkins.ovirt.org/job/ovirt-4.2_change-queue-tester/1521/ >>> > >>> > Link to all logs: >>> > >>> > http://jenkins.ovirt.org/job/ovirt-4.2_change-queue-tester/1521/artifact/exported-artifacts/basic-suit-4.2-el7/test_logs/basic-suite-4.2/post-006_migrations.py/ >>> > >>> > Error snippet from log: >>> > >>> > >>> > >>> > Traceback (most recent call last): >>> > File "/usr/lib64/python2.7/unittest/case.py", line 369, in run >>> > testMethod() >>> > File "/usr/lib/python2.7/site-packages/nose/case.py", line 197, in >>> > runTest >>> > self.test(*self.arg) >>> > File "/usr/lib/python2.7/site-packages/ovirtlago/testlib.py", line >>> > 129, in wrapped_test >>> > test() >>> > File "/usr/lib/python2.7/site-packages/ovirtlago/testlib.py", line >>> > 59, in wrapper >>> > return func(get_test_prefix(), *args, **kwargs) >>> > File "/usr/lib/python2.7/site-packages/ovirtlago/testlib.py", line >>> > 78, in wrapper >>> > prefix.virt_env.engine_vm().get_api(api_ver=4), *args, **kwargs >>> > File >>> > "/home/jenkins/workspace/ovirt-4.2_change-queue-tester/ovirt-system-tests/basic-suite-4.2/test-scenarios/006_migrations.py", >>> > line 139, in prepare_migration_attachments_ipv6 >>> > engine, host_service, MIGRATION_NETWORK, ip_configuration) >>> > File >>> > "/home/jenkins/workspace/ovirt-4.2_change-queue-tester/ovirt-system-tests/basic-suite-4.2/test_utils/network_utils_v4.py", >>> > line 71, in modify_ip_config >>> > check_connectivity=True) >>> > File "/usr/lib64/python2.7/site-packages/ovirtsdk4/services.py", >>> > line 36729, in setup_networks >>> > return self._internal_action(action, 'setupnetworks', None, >>> > headers, query, wait) >>> > File "/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py", line >>> > 299, in _internal_action >>> > return future.wait() if wait else future >>> > File 
"/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py", line >>> > 55, in wait >>> > return self._code(response) >>> > File "/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py", line >>> > 296, in callback >>> > self._check_fault(response) >>> > File "/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py", line >>> > 132, in _check_fault >>> > self._raise_error(response, body) >>> > File "/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py", line >>> > 118, in _raise_error >>> > raise error >>> > Error: Fault reason is "Operation Failed". Fault detail is "[Network >>> > error during communication with the Host.]". HTTP response code is >>> > 400. >>> >>> The error occurred sometime in the interval >>> >>> 09:32:58 [basic-suit] @ Run test: 006_migrations.py: >>> 09:33:55 [basic-suit] Error occured, aborting >>> >>> and indeed >>> >>> >>> http://jenkins.ovirt.org/job/ovirt-4.2_change-queue-tester/1521/artifact/exported-artifacts/basic-suit-4.2-el7/test_logs/basic-suite-4.2/post-006_migrations.py/lago-basic-suite-4-2-engine/_var_log/ovirt-engine/engine.log/*view*/ >>> >>> has Engine disconnected from the host at >>> >>> 2018-04-03 05:33:32,307-04 ERROR >>> [org.ovirt.engine.core.vdsbroker.monitoring.HostMonitoring] >>> (EE-ManagedThreadFactory-engineScheduled-Thread-39) [] Unable to >>> RefreshCapabilities: VDSNetworkException: VDSGenericException: >>> VDSNetworkException: Vds timeout occured >>> >>> Maybe Piotr can read more into it. > > I should have thought of a down vdsm; but it was down because what > seems to be soft fencing > > http://jenkins.ovirt.org/job/ovirt-4.2_change-queue-tester/1521/artifact/exported-artifacts/basic-suit-4.2-el7/test_logs/basic-suite-4.2/post-006_migrations.py/lago-basic-suite-4-2-host-0/_var_log/messages/*view*/ > > Apr 3 05:30:01 lago-basic-suite-4-2-host-0 systemd: Started Session > 46 of user root. > Apr 3 05:30:01 lago-basic-suite-4-2-host-0 systemd: Starting Session > 46 of user root. 
> Apr 3 05:30:07 lago-basic-suite-4-2-host-0 systemd: Stopped MOM > instance configured for VDSM purposes. > Apr 3 05:30:07 lago-basic-suite-4-2-host-0 systemd: Stopping Virtual > Desktop Server Manager... > Apr 3 05:30:16 lago-basic-suite-4-2-host-0 kernel: > scsi_verify_blk_ioctl: 33 callbacks suppressed > Apr 3 05:30:16 lago-basic-suite-4-2-host-0 kernel: dd: sending ioctl > 80306d02 to a partition! > Apr 3 05:30:17 lago-basic-suite-4-2-host-0 systemd: vdsmd.service > stop-sigterm timed out. Killing. This failure looks like another instance of the same issue: http://jenkins.ovirt.org/job/ovirt-4.2_change-queue-tester/1525/ -- Barak Korren RHV DevOps team , RHCE, RHCi Red Hat EMEA redhat.com | TRIED. TESTED. TRUSTED. | redhat.com/trusted From mzamazal at redhat.com Wed Apr 4 08:47:34 2018 From: mzamazal at redhat.com (Milan Zamazal) Date: Wed, 04 Apr 2018 10:47:34 +0200 Subject: [ovirt-devel] oVirt log analyzer In-Reply-To: (Barak Korren's message of "Tue, 3 Apr 2018 13:52:17 +0300") References: <87bmf7rrmr.fsf@redhat.com> <87d0zg4nsn.fsf@redhat.com> Message-ID: <87bmezcq8p.fsf@redhat.com> Barak Korren writes: > Right now, when a test fails in OST - all you get is the traceback from > nose of the oVirt API call you've made. For some tests additional > information is provided by having nose collect it from STDOUT. > > It could be very useful if we could use some information about the API > call that had been made, and use it to extract relevant information > about the call from the vdsm and engine logs, > > Can ovirt-log-analyzer be used for that? Maybe, but not out of the box. It can parse logs and find some relations between Engine and Vdsm logs, so it might be extended to do that. However it's more focused on guesswork. Depending on the kind of information to be provided, there could be simpler ways to get it.
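One of those "simpler ways" can be plain grep, given that Engine stamps each API call with a correlation id which, in recent versions, vdsm echoes back as a flow id. A minimal sketch — the log paths, line formats, and ids below are fabricated for illustration; on a real deployment the inputs would be the collected engine.log and vdsm.log:

```shell
# Fabricated excerpts standing in for /var/log/ovirt-engine/engine.log
# and /var/log/vdsm/vdsm.log; the exact id field differs between versions.
mkdir -p /tmp/ost-logs
cat > /tmp/ost-logs/engine.log <<'EOF'
2018-04-04 10:00:01,123+02 INFO  (default task-1) [4f7ac8c2] Running command: SetupNetworksCommand
2018-04-04 10:00:09,456+02 ERROR (default task-7) [11aa22bb] Some unrelated failure
EOF
cat > /tmp/ost-logs/vdsm.log <<'EOF'
2018-04-04 10:00:02,789+0200 INFO  (jsonrpc/3) [api.network] START setupNetworks flow_id=4f7ac8c2
2018-04-04 10:00:04,012+0200 INFO  (jsonrpc/5) [api.host] START getCapabilities flow_id=77cc88dd
EOF

# All lines touching one flow, from both sides, merged in timestamp order
# (-h suppresses the filename prefix so sort sees the timestamps first):
grep -h '4f7ac8c2' /tmp/ost-logs/engine.log /tmp/ost-logs/vdsm.log | sort
```

This only finds lines that carry the id; tying in id-less lines (e.g. kernel messages around the same time) is where a tool like ovirt-log-analyzer would still add value.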
From piotr.kliczewski at gmail.com Wed Apr 4 11:01:56 2018 From: piotr.kliczewski at gmail.com (Piotr Kliczewski) Date: Wed, 4 Apr 2018 13:01:56 +0200 Subject: [ovirt-devel] oVirt log analyzer In-Reply-To: <87bmezcq8p.fsf@redhat.com> References: <87bmf7rrmr.fsf@redhat.com> <87d0zg4nsn.fsf@redhat.com> <87bmezcq8p.fsf@redhat.com> Message-ID: On Wed, Apr 4, 2018 at 10:47 AM, Milan Zamazal wrote: > Barak Korren writes: > >> Right now, when a test fails in OST - all you get is the traceback from >> nose of the oVirt API call you've made. For some tests additional >> information is provided by having nose collect it from STDOUT. >> >> It could be very useful if we could use some information about the API >> call that had been made, and use it to extract relevant information >> about the call from the vdsm and engine logs, >> >> Can ovirt-log-analyzer be used for that? > > Maybe, but not out of the box. It can parse logs and find some > relations between Engine and Vdsm logs, so it might be extended to do > that. However it's more focused on guesswork. Depending on the kind of > information to be provided, there could be simpler ways to get it. Is the correlation id not enough to find the relations? > _______________________________________________ > Devel mailing list > Devel at ovirt.org > http://lists.ovirt.org/mailman/listinfo/devel From mzamazal at redhat.com Wed Apr 4 11:17:09 2018 From: mzamazal at redhat.com (Milan Zamazal) Date: Wed, 04 Apr 2018 13:17:09 +0200 Subject: [ovirt-devel] oVirt log analyzer In-Reply-To: (Piotr Kliczewski's message of "Wed, 4 Apr 2018 13:01:56 +0200") References: <87bmf7rrmr.fsf@redhat.com> <87d0zg4nsn.fsf@redhat.com> <87bmezcq8p.fsf@redhat.com> Message-ID: <87y3i3b4qy.fsf@redhat.com> Piotr Kliczewski writes: > On Wed, Apr 4, 2018 at 10:47 AM, Milan Zamazal wrote: >> Barak Korren writes: >> > >>> Right now, when a test fails in OST - all you get is the traceback from >>> nose of the oVirt API call you've made.
For some tests additional >>> information is provided by having nose collect it from STDOUT. >>> >>> It could be very useful if we could use some information about the API >>> call that had been made, and use it to extract relevant information >>> about the call from the vdsm and engine logs, >>> >>> Can ovirt-log-analyzer be used for that? >> >> Maybe, but not out of the box. It can parse logs and find some >> relations between Engine and Vdsm logs, so it might be extended to do >> that. However it's more focused on guesswork. Depending on the kind of >> information to be provided, there could be simpler ways to get it. > > Is the correlation id not enough to find the relations? It is enough to find relations between API calls, and something like grep may be easier to use for that than ovirt-log-analyzer. Again, it depends on what kind of information should be provided. From fromani at redhat.com Wed Apr 4 12:35:20 2018 From: fromani at redhat.com (Francesco Romani) Date: Wed, 4 Apr 2018 14:35:20 +0200 Subject: [ovirt-devel] [vdsm][ci] EL7 failing for timeout lately Message-ID: <270cf9fe-6b18-ca73-79d5-6b855d7fab5e@redhat.com> Hi all, in the last few days quite a lot of CI tests on EL7 workers failed with a timeout. Random example: http://jenkins.ovirt.org/job/vdsm_4.2_check-patch-el7-x86_64/246/consoleFull This happened both on the 4.2 and on the master test suite, quite often but not always. I wonder if we should just increase the timeout or if something else is going on. Thoughts?
-- Francesco Romani Senior SW Eng., Virtualization R&D Red Hat IRC: fromani github: @fromanirh From eedri at redhat.com Wed Apr 4 12:52:21 2018 From: eedri at redhat.com (Eyal Edri) Date: Wed, 4 Apr 2018 15:52:21 +0300 Subject: [ovirt-devel] [vdsm][ci] EL7 failing for timeout lately In-Reply-To: <270cf9fe-6b18-ca73-79d5-6b855d7fab5e@redhat.com> References: <270cf9fe-6b18-ca73-79d5-6b855d7fab5e@redhat.com> Message-ID: Adding infra; from the infra side I'm not aware of any major changes that could cause this to happen. Have you considered introducing performance testing, so you'll know how much time each test takes and can fail/warn if it starts to take much longer? On Wed, Apr 4, 2018 at 3:35 PM, Francesco Romani wrote: > Hi all, > > > in the last few days quite a lot of CI tests on EL7 workers failed with a > timeout. Random example: > http://jenkins.ovirt.org/job/vdsm_4.2_check-patch-el7-x86_ > 64/246/consoleFull > > > This happened both on the 4.2 and on the master test suite, quite often but not > always. > > > I wonder if we should just increase the timeout or if something else is > going on. > > > Thoughts? > > -- > Francesco Romani > Senior SW Eng., Virtualization R&D > Red Hat > IRC: fromani github: @fromanirh > > _______________________________________________ > Devel mailing list > Devel at ovirt.org > http://lists.ovirt.org/mailman/listinfo/devel > -- Eyal edri MANAGER RHV DevOps EMEA VIRTUALIZATION R&D Red Hat EMEA TRIED. TESTED. TRUSTED. phone: +972-9-7692018 irc: eedri (on #tlv #rhev-dev #rhev-integ) -------------- next part -------------- An HTML attachment was scrubbed...
URL: From bkorren at redhat.com Wed Apr 4 13:59:40 2018 From: bkorren at redhat.com (Barak Korren) Date: Wed, 4 Apr 2018 16:59:40 +0300 Subject: [ovirt-devel] [ OST Failure Report ] [ oVirt 4.2 ] [ 2018-04-04 ] [006_migrations.prepare_migration_attachments_ipv6] Message-ID: Test failed: [ 006_migrations.prepare_migration_attachments_ipv6 ] Link to suspected patches: (Probably unrelated) https://gerrit.ovirt.org/#/c/89812/1 (ovirt-engine-sdk) - examples: export template to an export domain This seems to happen multiple times sporadically, I thought this would be solved by https://gerrit.ovirt.org/#/c/89781/ but it isn't. Link to Job: http://jenkins.ovirt.org/job/ovirt-4.2_change-queue-tester/1537/ Link to all logs: http://jenkins.ovirt.org/job/ovirt-4.2_change-queue-tester/1537/artifact/exported-artifacts/basic-suit-4.2-el7/test_logs/basic-suite-4.2/post-006_migrations.py/ Error snippet from log: Traceback (most recent call last): File "/usr/lib64/python2.7/unittest/case.py", line 369, in run testMethod() File "/usr/lib/python2.7/site-packages/nose/case.py", line 197, in runTest self.test(*self.arg) File "/usr/lib/python2.7/site-packages/ovirtlago/testlib.py", line 129, in wrapped_test test() File "/usr/lib/python2.7/site-packages/ovirtlago/testlib.py", line 59, in wrapper return func(get_test_prefix(), *args, **kwargs) File "/usr/lib/python2.7/site-packages/ovirtlago/testlib.py", line 78, in wrapper prefix.virt_env.engine_vm().get_api(api_ver=4), *args, **kwargs File "/home/jenkins/workspace/ovirt-4.2_change-queue-tester/ovirt-system-tests/basic-suite-4.2/test-scenarios/006_migrations.py", line 139, in prepare_migration_attachments_ipv6 engine, host_service, MIGRATION_NETWORK, ip_configuration) File "/home/jenkins/workspace/ovirt-4.2_change-queue-tester/ovirt-system-tests/basic-suite-4.2/test_utils/network_utils_v4.py", line 71, in modify_ip_config check_connectivity=True) File "/usr/lib64/python2.7/site-packages/ovirtsdk4/services.py", line 36729, in 
setup_networks return self._internal_action(action, 'setupnetworks', None, headers, query, wait) File "/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py", line 299, in _internal_action return future.wait() if wait else future File "/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py", line 55, in wait return self._code(response) File "/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py", line 296, in callback self._check_fault(response) File "/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py", line 132, in _check_fault self._raise_error(response, body) File "/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py", line 118, in _raise_error raise error Error: Fault reason is "Operation Failed". Fault detail is "[Network error during communication with the Host.]". HTTP response code is 400. -- Barak Korren RHV DevOps team , RHCE, RHCi Red Hat EMEA redhat.com | TRIED. TESTED. TRUSTED. | redhat.com/trusted From sabose at redhat.com Wed Apr 4 14:09:09 2018 From: sabose at redhat.com (Sahina Bose) Date: Wed, 4 Apr 2018 19:39:09 +0530 Subject: [ovirt-devel] [OST][HC] HE fails to deploy In-Reply-To: References: Message-ID: On Tue, Apr 3, 2018 at 1:50 PM, Simone Tiraboschi wrote: > > > On Tue, Apr 3, 2018 at 10:14 AM, Simone Tiraboschi > wrote: > >> >> >> On Mon, Apr 2, 2018 at 4:44 PM, Sahina Bose wrote: >> >>> HE fails to deploy while waiting for the host to be up in the local HE VM. >>> The setup logs do not indicate why it failed - at least I couldn't find >>> anything >>> >> >> I see: >> >> "status": "install_failed" >> >> So I think that something went wrong with host-deploy on that host, but we >> definitely need the host-deploy logs for that and they are just on the engine >> VM. >> > > According to the timestamps it could be related to: > Apr 2 09:58:13 lago-hc-basic-suite-master-host-0 systemd: Starting Open > vSwitch Database Unit...
> Apr 2 09:58:14 lago-hc-basic-suite-master-host-0 ovs-ctl: runuser: > System error > Apr 2 09:58:14 lago-hc-basic-suite-master-host-0 ovs-ctl: > /etc/openvswitch/conf.db does not exist ... (warning). > Apr 2 09:58:14 lago-hc-basic-suite-master-host-0 ovs-ctl: Creating empty > database /etc/openvswitch/conf.db runuser: System error > Apr 2 09:58:14 lago-hc-basic-suite-master-host-0 ovs-ctl: [FAILED] > Apr 2 09:58:14 lago-hc-basic-suite-master-host-0 systemd: > ovsdb-server.service: control process exited, code=exited status=1 > Apr 2 09:58:14 lago-hc-basic-suite-master-host-0 systemd: Failed to > start Open vSwitch Database Unit. > Apr 2 09:58:14 lago-hc-basic-suite-master-host-0 systemd: Unit > ovsdb-server.service entered failed state. > Apr 2 09:58:14 lago-hc-basic-suite-master-host-0 systemd: > ovsdb-server.service failed. > Apr 2 09:58:14 lago-hc-basic-suite-master-host-0 systemd: Cannot add > dependency job for unit lvm2-lvmetad.socket, ignoring: Invalid request > descriptor > Apr 2 09:58:14 lago-hc-basic-suite-master-host-0 systemd: Assertion > failed for Open vSwitch Delete Transient Ports. > Apr 2 09:58:14 lago-hc-basic-suite-master-host-0 systemd: > ovsdb-server.service holdoff time over, scheduling restart. > Apr 2 09:58:14 lago-hc-basic-suite-master-host-0 systemd: Cannot add > dependency job for unit lvm2-lvmetad.socket, ignoring: Unit is masked. > Apr 2 09:58:14 lago-hc-basic-suite-master-host-0 systemd: start request > repeated too quickly for ovsdb-server.service > Apr 2 09:58:14 lago-hc-basic-suite-master-host-0 systemd: Failed to > start Open vSwitch Database Unit. > Apr 2 09:58:14 lago-hc-basic-suite-master-host-0 systemd: Unit > ovsdb-server.service entered failed state. > Apr 2 09:58:14 lago-hc-basic-suite-master-host-0 systemd: > ovsdb-server.service failed. > Does this require an update to openvswitch rpms used in suite? Are the HE suites passing? 
> >> >> >>> >>> ---------- Forwarded message ---------- >>> From: >>> Date: Mon, Apr 2, 2018 at 7:50 PM >>> Subject: [oVirt Jenkins] ovirt-system-tests_hc-basic-suite-master - >>> Build # 276 - Still Failing! >>> To: infra at ovirt.org, sabose at redhat.com >>> >>> >>> Project: http://jenkins.ovirt.org/job/ovirt-system-tests_hc-basic-sui >>> te-master/ >>> Build: http://jenkins.ovirt.org/job/ovirt-system-tests_hc-basic-sui >>> te-master/276/ >>> Build Number: 276 >>> Build Status: Still Failing >>> Triggered By: Started by timer >>> >>> ------------------------------------- >>> Changes Since Last Success: >>> ------------------------------------- >>> Changes for Build #265 >>> [Gal Ben Haim] Check if the prefix exists before printing its size >>> >>> [Sandro Bonazzola] ovirt-engine: add jobs for 4.1.10 async >>> >>> >>> Changes for Build #266 >>> [Gal Ben Haim] Check if the prefix exists before printing its size >>> >>> >>> Changes for Build #267 >>> [Gal Ben Haim] Check if the prefix exists before printing its size >>> >>> [Daniel Belenky] ppc repos: Use qemu EV release instead of test >>> >>> [Daniel Belenky] global_setup: Add generic package remove function >>> >>> [Daniel Belenky] Fix package verification in verify_packages >>> >>> >>> Changes for Build #268 >>> [Gal Ben Haim] Check if the prefix exists before printing its size >>> >>> >>> Changes for Build #269 >>> [Gal Ben Haim] Check if the prefix exists before printing its size >>> >>> >>> Changes for Build #270 >>> [Gal Ben Haim] Check if the prefix exists before printing its size >>> >>> >>> Changes for Build #271 >>> [Gal Ben Haim] Check if the prefix exists before printing its size >>> >>> >>> Changes for Build #272 >>> [Gal Ben Haim] Check if the prefix exists before printing its size >>> >>> >>> Changes for Build #273 >>> [Eitan Raviv] network: macpool: test disallowing dups while dups exist >>> >>> [Daniel Belenky] docker cleanup:Fix edge case for unamed containers >>> >>> [Daniel Belenky] 
nested_config: Count nesting level of options >>> >>> [Daniel Belenky] Introduce conditional execution in STDCI DSL >>> >>> [Daniel Belenky] Add OST STDCI V2 jobs >>> >>> >>> Changes for Build #274 >>> [Gal Ben Haim] he-iscsi-master: Temporarily exclude in check-patch >>> >>> >>> Changes for Build #275 >>> [Gal Ben Haim] he-iscsi-master: Temporarily exclude in check-patch >>> >>> >>> Changes for Build #276 >>> [Barak Korren] Force STDCI V2 job to use physical host >>> >>> [Daniel Belenky] Build container on changes to docker_cleanup >>> >>> >>> >>> >>> ----------------- >>> Failed Tests: >>> ----------------- >>> No tests ran. >>> >>> >>> _______________________________________________ >>> Devel mailing list >>> Devel at ovirt.org >>> http://lists.ovirt.org/mailman/listinfo/devel >>> >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From danken at redhat.com Wed Apr 4 14:27:04 2018 From: danken at redhat.com (Dan Kenigsberg) Date: Wed, 4 Apr 2018 17:27:04 +0300 Subject: [ovirt-devel] [ OST Failure Report ] [ oVirt 4.2 ] [ 2018-04-04 ] [006_migrations.prepare_migration_attachments_ipv6] In-Reply-To: References: Message-ID: On Wed, Apr 4, 2018 at 4:59 PM, Barak Korren wrote: > Test failed: [ 006_migrations.prepare_migration_attachments_ipv6 ] > > Link to suspected patches: > (Probably unrelated) > https://gerrit.ovirt.org/#/c/89812/1 (ovirt-engine-sdk) - examples: > export template to an export domain > > This seems to happen multiple times sporadically, I thought this would > be solved by > https://gerrit.ovirt.org/#/c/89781/ but it isn't. right, it is a completely unrelated issue there (with external networks). here, however, the host dies while setting setupNetworks of an ipv6 address. 
Setup network waits for Engine's confirmation at 08:33:00,711 http://jenkins.ovirt.org/job/ovirt-4.2_change-queue-tester/1537/artifact/exported-artifacts/basic-suit-4.2-el7/test_logs/basic-suite-4.2/post-006_migrations.py/lago-basic-suite-4-2-host-0/_var_log/vdsm/supervdsm.log but kernel messages stop at 08:33:23 http://jenkins.ovirt.org/job/ovirt-4.2_change-queue-tester/1537/artifact/exported-artifacts/basic-suit-4.2-el7/test_logs/basic-suite-4.2/post-006_migrations.py/lago-basic-suite-4-2-host-0/_var_log/messages/*view*/ Does the lago VM of this host crash? pause? > > Link to Job: > http://jenkins.ovirt.org/job/ovirt-4.2_change-queue-tester/1537/ > > Link to all logs: > http://jenkins.ovirt.org/job/ovirt-4.2_change-queue-tester/1537/artifact/exported-artifacts/basic-suit-4.2-el7/test_logs/basic-suite-4.2/post-006_migrations.py/ > > Error snippet from log: > > > > Traceback (most recent call last): > File "/usr/lib64/python2.7/unittest/case.py", line 369, in run > testMethod() > File "/usr/lib/python2.7/site-packages/nose/case.py", line 197, in runTest > self.test(*self.arg) > File "/usr/lib/python2.7/site-packages/ovirtlago/testlib.py", line > 129, in wrapped_test > test() > File "/usr/lib/python2.7/site-packages/ovirtlago/testlib.py", line > 59, in wrapper > return func(get_test_prefix(), *args, **kwargs) > File "/usr/lib/python2.7/site-packages/ovirtlago/testlib.py", line > 78, in wrapper > prefix.virt_env.engine_vm().get_api(api_ver=4), *args, **kwargs > File "/home/jenkins/workspace/ovirt-4.2_change-queue-tester/ovirt-system-tests/basic-suite-4.2/test-scenarios/006_migrations.py", > line 139, in prepare_migration_attachments_ipv6 > engine, host_service, MIGRATION_NETWORK, ip_configuration) > File "/home/jenkins/workspace/ovirt-4.2_change-queue-tester/ovirt-system-tests/basic-suite-4.2/test_utils/network_utils_v4.py", > line 71, in modify_ip_config > check_connectivity=True) > File "/usr/lib64/python2.7/site-packages/ovirtsdk4/services.py", > line 36729, in 
setup_networks > return self._internal_action(action, 'setupnetworks', None, > headers, query, wait) > File "/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py", line > 299, in _internal_action > return future.wait() if wait else future > File "/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py", line > 55, in wait > return self._code(response) > File "/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py", line > 296, in callback > self._check_fault(response) > File "/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py", line > 132, in _check_fault > self._raise_error(response, body) > File "/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py", line > 118, in _raise_error > raise error > Error: Fault reason is "Operation Failed". Fault detail is "[Network > error during communication with the Host.]". HTTP response code is > 400. > > > > > > > > -- > Barak Korren > RHV DevOps team , RHCE, RHCi > Red Hat EMEA > redhat.com | TRIED. TESTED. TRUSTED. | redhat.com/trusted > _______________________________________________ > Devel mailing list > Devel at ovirt.org > http://lists.ovirt.org/mailman/listinfo/devel > > From gbenhaim at redhat.com Wed Apr 4 16:01:51 2018 From: gbenhaim at redhat.com (Gal Ben Haim) Date: Wed, 4 Apr 2018 19:01:51 +0300 Subject: [ovirt-devel] [ OST Failure Report ] [ oVirt 4.2 ] [ 2018-04-04 ] [006_migrations.prepare_migration_attachments_ipv6] In-Reply-To: References: Message-ID: From lago's log, I see that lago collected the logs from the VMs using ssh (after the test failed), which means that the VM didn't crash.
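The crashed-versus-paused question can also be settled from the Jenkins slave itself, assuming the lago VMs are ordinary libvirt domains. A sketch of the state handling only; the domain name is illustrative, and the sample string stands in for real `virsh domstate --reason` output:

```shell
# On a real slave the state would come from libvirt, e.g.:
#   state=$(virsh -c qemu:///system domstate --reason lago-basic-suite-4-2-host-0)
# Here a sample value stands in for that output.
state='paused (I/O error)'

case "$state" in
  running*)  echo "domain is running - the failure is inside the guest" ;;
  paused*)   echo "domain was paused by the hypervisor: $state" ;;
  *)         echo "domain is not up: $state" ;;
esac
```

A "running" verdict here agrees with the ssh evidence above: the guest is alive and the fault has to be looked for inside it.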
On Wed, Apr 4, 2018 at 5:27 PM, Dan Kenigsberg wrote: > On Wed, Apr 4, 2018 at 4:59 PM, Barak Korren wrote: > > Test failed: [ 006_migrations.prepare_migration_attachments_ipv6 ] > > > > Link to suspected patches: > > (Probably unrelated) > > https://gerrit.ovirt.org/#/c/89812/1 (ovirt-engine-sdk) - examples: > > export template to an export domain > > > > This seems to happen multiple times sporadically, I thought this would > > be solved by > > https://gerrit.ovirt.org/#/c/89781/ but it isn't. > > right, it is a completely unrelated issue there (with external networks). > here, however, the host dies while setting setupNetworks of an ipv6 > address. Setup network waits for Engine's confirmation at 08:33:00,711 > http://jenkins.ovirt.org/job/ovirt-4.2_change-queue-tester/ > 1537/artifact/exported-artifacts/basic-suit-4.2-el7/ > test_logs/basic-suite-4.2/post-006_migrations.py/lago- > basic-suite-4-2-host-0/_var_log/vdsm/supervdsm.log > but kernel messages stop at 08:33:23 > http://jenkins.ovirt.org/job/ovirt-4.2_change-queue-tester/ > 1537/artifact/exported-artifacts/basic-suit-4.2-el7/ > test_logs/basic-suite-4.2/post-006_migrations.py/lago- > basic-suite-4-2-host-0/_var_log/messages/*view*/ > > Does the lago VM of this host crash? pause? 
> > > > > > Link to Job: > > http://jenkins.ovirt.org/job/ovirt-4.2_change-queue-tester/1537/ > > > > Link to all logs: > > http://jenkins.ovirt.org/job/ovirt-4.2_change-queue-tester/ > 1537/artifact/exported-artifacts/basic-suit-4.2-el7/ > test_logs/basic-suite-4.2/post-006_migrations.py/ > > > > Error snippet from log: > > > > > > > > Traceback (most recent call last): > > File "/usr/lib64/python2.7/unittest/case.py", line 369, in run > > testMethod() > > File "/usr/lib/python2.7/site-packages/nose/case.py", line 197, in > runTest > > self.test(*self.arg) > > File "/usr/lib/python2.7/site-packages/ovirtlago/testlib.py", line > > 129, in wrapped_test > > test() > > File "/usr/lib/python2.7/site-packages/ovirtlago/testlib.py", line > > 59, in wrapper > > return func(get_test_prefix(), *args, **kwargs) > > File "/usr/lib/python2.7/site-packages/ovirtlago/testlib.py", line > > 78, in wrapper > > prefix.virt_env.engine_vm().get_api(api_ver=4), *args, **kwargs > > File "/home/jenkins/workspace/ovirt-4.2_change-queue-tester/ > ovirt-system-tests/basic-suite-4.2/test-scenarios/006_migrations.py", > > line 139, in prepare_migration_attachments_ipv6 > > engine, host_service, MIGRATION_NETWORK, ip_configuration) > > File "/home/jenkins/workspace/ovirt-4.2_change-queue-tester/ > ovirt-system-tests/basic-suite-4.2/test_utils/network_utils_v4.py", > > line 71, in modify_ip_config > > check_connectivity=True) > > File "/usr/lib64/python2.7/site-packages/ovirtsdk4/services.py", > > line 36729, in setup_networks > > return self._internal_action(action, 'setupnetworks', None, > > headers, query, wait) > > File "/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py", line > > 299, in _internal_action > > return future.wait() if wait else future > > File "/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py", line > > 55, in wait > > return self._code(response) > > File "/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py", line > > 296, in callback > > 
self._check_fault(response) > > File "/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py", line > > 132, in _check_fault > > self._raise_error(response, body) > > File "/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py", line > > 118, in _raise_error > > raise error > > Error: Fault reason is "Operation Failed". Fault detail is "[Network > > error during communication with the Host.]". HTTP response code is > > 400. > > > > > > > > > > > > > > > > -- > > Barak Korren > > RHV DevOps team , RHCE, RHCi > > Red Hat EMEA > > redhat.com | TRIED. TESTED. TRUSTED. | redhat.com/trusted > > _______________________________________________ > > Devel mailing list > > Devel at ovirt.org > > http://lists.ovirt.org/mailman/listinfo/devel > > > > > _______________________________________________ > Devel mailing list > Devel at ovirt.org > http://lists.ovirt.org/mailman/listinfo/devel > -- *GAL bEN HAIM* RHV DEVOPS -------------- next part -------------- An HTML attachment was scrubbed... URL: From ylavi at redhat.com Wed Apr 4 16:07:17 2018 From: ylavi at redhat.com (Yaniv Lavi) Date: Wed, 4 Apr 2018 19:07:17 +0300 Subject: [ovirt-devel] [kubevirt-dev] Re: [virt-tools-list] Project for profiles and defaults for libvirt domains In-Reply-To: <20180322171753.GU3583@redhat.com> References: <20180320142031.GB23007@wheatley> <20180320151012.GU4530@redhat.com> <20180322145401.GD19999@wheatley> <20180322171753.GU3583@redhat.com> Message-ID: Hi, I'd like to go one step back and discuss why we should try to do this on the high level. For the last 5-10 years of KVM development, we are pragmatically providing the Linux host level APIs via project specific host agents/integration code (Nova agent, oVirt host agent, virt-manager). 
In recent times we have seen new projects with similar requirements (Cockpit, various automation tools, KubeVirt), which means that all consumers of the Linux virt stack are reinventing the wheel, each using a very different path to consume the partial solutions provided today.

The use of the Linux virt stack is well defined by the scope of the existing projects, and it makes a lot of sense to try to provide the common patterns via the virt stack directly, as a host-level API that the different clients or management applications consume. The main goal is to improve the developer experience for virtualization management applications with an API set that is useful to the entire set of tools (OSP, oVirt, KubeVirt, Cockpit and so on).

The Linux virt developer community currently cannot provide best practices and optimizations from single-node knowledge alone. That know-how is, in the best case, locked into a specific project's integration, and in the worst case not provided at all, so the projects as a whole lose out. And when testing the Linux virt stack itself, because each project has a different usage pattern, we lose the ability to test capabilities at the lower level, making the entire stack less stable and complete for new features. This also limits the projects' ability to contribute back to the Linux stack, based on their user and market experience, for others in open source to gain from.

I understand this shift is technically challenging for existing projects, but I see value in doing it even for new implementations like Cockpit and KubeVirt. I also believe the end result could be appealing enough for projects like OSP, virt-manager and oVirt to consider reducing their existing host-side integrations/agents to shims on the host level, reusing the common, better-tested patterns as clients developed against the combined experience of the different projects.
I call us all to collaborate and try to converge on a solution that will help all in the long term in the value you get from the common base. Thanks, YANIV LAVI SENIOR TECHNICAL PRODUCT MANAGER Red Hat Israel Ltd. 34 Jerusalem Road, Building A, 1st floor Ra'anana, Israel 4350109 ylavi at redhat.com T: +972-9-7692306/8272306 F: +972-9-7692223 IM: ylavi TRIED. TESTED. TRUSTED. @redhatnews Red Hat Red Hat On Thu, Mar 22, 2018 at 7:18 PM, Daniel P. Berrang? wrote: > On Thu, Mar 22, 2018 at 03:54:01PM +0100, Martin Kletzander wrote: > > > > > > > One more thing could be automatically figuring out best values based > on > > > > libosinfo-provided data. > > > > > > > > 2) Policies > > > > > > > > Lot of the time there are parts of the domain definition that need > to be > > > > added, but nobody really cares about them. Sometimes it's enough to > > > > have few templates, another time you might want to have a policy > > > > per-scenario and want to combine them in various ways. For example > with > > > > the data provided by point 1). > > > > > > > > For example if you want PCI-Express, you need the q35 machine type, > but > > > > you don't really want to care about the machine type. Or you want to > > > > use SPICE, but you don't want to care about adding QXL. > > > > > > > > What if some of these policies could be specified once (using some > DSL > > > > for example), and used by virtuned to merge them in a unified and > > > > predictable way? > > > > > > > > 3) Abstracting the XML > > > > > > > > This is probably just usable for stateless apps, but it might happen > > > > that some apps don't really want to care about the XML at all. They > > > > just want an abstract view of the domain, possibly add/remove a > device > > > > and that's it. We could do that as well. I can't really tell how > much > > > > of a demand there is for it, though. > > > > > > It is safe to say that applications do not want to touch XML at all. 
> > > Any non-trivial application has created an abstraction around XML, > > > so that they have an API to express what they want, rather than > > > manipulating of strings to format/parse XML. > > > > > > > Sure, this was just meant to be a question as to whether it's worth > > pursuing or not. You make a good point on why it is not (at least for > > existing apps). > > > > However, since this was optional, the way this would look without the > > XML abstraction is that both input and output would be valid domain > > definitions, ultimately resulting in something similar to virt-xml with > > the added benefit of applying a policy from a file/string either > > supplied by the application itself. Whether that policy was taken from > > a common repository of such knowledge is orthogonal to this idea. Since > > you would work with the same data, the upgrade could be incremental as > > you'd only let virtuned fill in values for new options and could slowly > > move on to using it for some pre-existing ones. None of the previous > > approaches did this, if I'm not mistaken. Of course it gets more > > difficult when you need to expose all the bits libvirt does and keep > > them in sync (as you write below). > > That has implications for how mgmt app deals with XML. Nova has object > models for representing XML in memory, but it doesn't aim to have > loss-less roundtrip from parse -> object -> format. So if Nova gets > basic XML from virttuned, parses it into its object to let it set > more fields and then formats it again, chances are it will have lost > a bunch of stuff from virttuned. Of course if you know about this > need upfront you can design the application such that it can safely > round-trip, but this is just example of problem with integrating to > existing apps. > > The other thing that concerns is that there are dependancies between > different bits of XML for a given device. ie if feature X is set to > a certain value, that prevents use of feature Y. 
So if virttuned > sets feature X, but the downstream application uses feature Y, the > final result can be incompatible. The application won't know this > because it doesn't control what stuff virttuned would be setting. > This can in turn cause ordering constraints. > > eg the application needs to say that virtio-net is being used, then > virttuned can set some defaults like enabling vhost-net, and then > the application can fill in more bits that it cares about. Or if > we let virttuned go first, setting virtio-net model + vhost-net, > then application wants to change model to e1000e, it has to be > aware that it must now delete the vhost-net bit that virtuned > added. This ends up being more complicated that just ignoring > virttuned and coding up use of vhost-net in application code. > > > > > This is the same kind of problem we faced wrt libvirt-gconfig and > > > libvirt-gobject usage from virt-manager - it has an extensive code > > > base that already works, and rewriting it to use something new > > > is alot of work for no short-term benefit. libvirt-gconfig/gobject > > > were supposed to be the "easy" bits for virt-manager to adopt, as > > > they don't really include much logic that would step on virt-manager's > > > toes. libvirt-designer was going to be a very opinionated library > > > and in retrospective that makes it even harder to consider adopting > > > it for usage in virt-manager, as it'll have signficant liklihood > > > of making functionally significant changes in behaviour. > > > > > > > The initial idea (which I forgot to mention) was that all the decisions > > libvirt currently does (so that it keeps the guest ABI stable) would be > > moved into data (let's say some DSL) and it could then be switched or > > adjusted if that's not what the mgmt app wants (on a per-definition > > basis, of course). 
I didn't feel very optimistic about the upstream > > acceptance for that idea, so I figured that there could be something > > that lives beside libvirt, helps with some policies if requested and > > then the resulting XML could be fed into libvirt for determining the > > rest. > > I can't even imagine how we would go about encoding the stable guest > ABI logic libvirt does today in data ! > > > > > > There's also the problem with use of native libraries that would > > > impact many apps. We only got OpenStack to grudgingly allow the > > > > By native you mean actual binary libraries or native to the OpenStack > > code as in python module? Because what I had in mind for this project > > was a python module with optional wrapper for REST API. > > I meant native binary libraries. ie openstack is not happy in general > with adding dependancies on new OS services, because there's a big > time lag for getting them into all distros. By comparison a pure > python library, they can just handle automatically in their deployment > tools, just pip installing on any OS distro straight from pypi. This > is what made use of libosinfo a hard sell in Nova. > > The same thing is seen with Go / Rust where some applications have > decided they're better of actually re-implementing the libvirt RPC > protocol in Go / Rust rather than use the libvirt.so client. I think > this is a bad tradeoff in general, but I can see why they like it > > Regards, > Daniel > -- > |: https://berrange.com -o- https://www.flickr.com/photos/ > dberrange :| > |: https://libvirt.org -o- > https://fstop138.berrange.com :| > |: https://entangle-photo.org -o- https://www.instagram.com/ > dberrange :| > _______________________________________________ > Devel mailing list > Devel at ovirt.org > http://lists.ovirt.org/mailman/listinfo/devel > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ylavi at redhat.com Wed Apr 4 16:23:01 2018 From: ylavi at redhat.com (Yaniv Lavi) Date: Wed, 4 Apr 2018 19:23:01 +0300 Subject: [ovirt-devel] [kubevirt-dev] Re: [virt-tools-list] Project for profiles and defaults for libvirt domains In-Reply-To: References: <20180320142031.GB23007@wheatley> <20180320151012.GU4530@redhat.com> <20180322145401.GD19999@wheatley> <20180322171753.GU3583@redhat.com> Message-ID: [resending to include KubeVirt devs ] YANIV LAVI SENIOR TECHNICAL PRODUCT MANAGER Red Hat Israel Ltd. 34 Jerusalem Road, Building A, 1st floor Ra'anana, Israel 4350109 ylavi at redhat.com T: +972-9-7692306/8272306 F: +972-9-7692223 IM: ylavi TRIED. TESTED. TRUSTED. @redhatnews Red Hat Red Hat On Wed, Apr 4, 2018 at 7:07 PM, Yaniv Lavi wrote: > Hi, > I'd like to go one step back and discuss why we should try to do this on > the high level. > > For the last 5-10 years of KVM development, we are pragmatically providing > the Linux host level APIs via project specific host agents/integration code > (Nova agent, oVirt host agent, virt-manager). > In recent time we see new projects that have similar requirements > (Cockpit, different automation tool, KubeVirt), this means that all of the > Linux virt stack consumers are reinventing the wheel and using very > different paths to consume the partial solutions that are provided today. > > The use of the Linux virt stack is well defined by the existing projects > scope and it makes a lot of sense to try to provide the common patterns via > the virt stack directly as a host level API that different client or > management consume. > The main goal is to improve the developer experience for virtualization > management applications with an API set that is useful to the entire set of > tools (OSP, oVirt, KubeVirt, Cockpit and so on). > > The Linux virt developer community currently is not able to provide best > practices and optimizations from single node knowledge. 
This means that all > of that smarts is locked to the specific project integration in the good > case or not provided at all and the projects as a whole lose from that. > When testing the Linux virt stack itself and since each project has > different usage pattern, we lose the ability to test abilities on the lower > level making the entire stack less stable and complete for new features. > > This also limits the different projects ability to contribute back to the > Linux stack based on their user and market experience for others in open > source to gain. > > I understand this shift is technically challenging for existing projects, > but I do see value in doing this even for new implementation like Cockpit > and KubeVirt. > I also believe that the end result could be appealing enough to cause > project like OSP, virt-manager and oVirt to consider to reduce the existing > capabilities of their host side integrations/agents to shims on the host > level and reuse the common/better-tested pattern as clients that was > developed against the experience of the different projects. > > I call us all to collaborate and try to converge on a solution that will > help all in the long term in the value you get from the common base. > > > Thanks, > > YANIV LAVI > > SENIOR TECHNICAL PRODUCT MANAGER > > Red Hat Israel Ltd. > > 34 Jerusalem Road, Building A, 1st floor > > Ra'anana, Israel 4350109 > > ylavi at redhat.com T: +972-9-7692306/8272306 F: +972-9-7692223 IM: ylavi > TRIED. TESTED. TRUSTED. > @redhatnews Red Hat Red Hat > > > On Thu, Mar 22, 2018 at 7:18 PM, Daniel P. Berrang? > wrote: > >> On Thu, Mar 22, 2018 at 03:54:01PM +0100, Martin Kletzander wrote: >> > > >> > > > One more thing could be automatically figuring out best values >> based on >> > > > libosinfo-provided data. >> > > > >> > > > 2) Policies >> > > > >> > > > Lot of the time there are parts of the domain definition that need >> to be >> > > > added, but nobody really cares about them. 
Sometimes it's enough to >> > > > have few templates, another time you might want to have a policy >> > > > per-scenario and want to combine them in various ways. For example >> with >> > > > the data provided by point 1). >> > > > >> > > > For example if you want PCI-Express, you need the q35 machine type, >> but >> > > > you don't really want to care about the machine type. Or you want >> to >> > > > use SPICE, but you don't want to care about adding QXL. >> > > > >> > > > What if some of these policies could be specified once (using some >> DSL >> > > > for example), and used by virtuned to merge them in a unified and >> > > > predictable way? >> > > > >> > > > 3) Abstracting the XML >> > > > >> > > > This is probably just usable for stateless apps, but it might happen >> > > > that some apps don't really want to care about the XML at all. They >> > > > just want an abstract view of the domain, possibly add/remove a >> device >> > > > and that's it. We could do that as well. I can't really tell how >> much >> > > > of a demand there is for it, though. >> > > >> > > It is safe to say that applications do not want to touch XML at all. >> > > Any non-trivial application has created an abstraction around XML, >> > > so that they have an API to express what they want, rather than >> > > manipulating of strings to format/parse XML. >> > > >> > >> > Sure, this was just meant to be a question as to whether it's worth >> > pursuing or not. You make a good point on why it is not (at least for >> > existing apps). >> > >> > However, since this was optional, the way this would look without the >> > XML abstraction is that both input and output would be valid domain >> > definitions, ultimately resulting in something similar to virt-xml with >> > the added benefit of applying a policy from a file/string either >> > supplied by the application itself. Whether that policy was taken from >> > a common repository of such knowledge is orthogonal to this idea. 
Since >> > you would work with the same data, the upgrade could be incremental as >> > you'd only let virtuned fill in values for new options and could slowly >> > move on to using it for some pre-existing ones. None of the previous >> > approaches did this, if I'm not mistaken. Of course it gets more >> > difficult when you need to expose all the bits libvirt does and keep >> > them in sync (as you write below). >> >> That has implications for how mgmt app deals with XML. Nova has object >> models for representing XML in memory, but it doesn't aim to have >> loss-less roundtrip from parse -> object -> format. So if Nova gets >> basic XML from virttuned, parses it into its object to let it set >> more fields and then formats it again, chances are it will have lost >> a bunch of stuff from virttuned. Of course if you know about this >> need upfront you can design the application such that it can safely >> round-trip, but this is just example of problem with integrating to >> existing apps. >> >> The other thing that concerns is that there are dependancies between >> different bits of XML for a given device. ie if feature X is set to >> a certain value, that prevents use of feature Y. So if virttuned >> sets feature X, but the downstream application uses feature Y, the >> final result can be incompatible. The application won't know this >> because it doesn't control what stuff virttuned would be setting. >> This can in turn cause ordering constraints. >> >> eg the application needs to say that virtio-net is being used, then >> virttuned can set some defaults like enabling vhost-net, and then >> the application can fill in more bits that it cares about. Or if >> we let virttuned go first, setting virtio-net model + vhost-net, >> then application wants to change model to e1000e, it has to be >> aware that it must now delete the vhost-net bit that virtuned >> added. 
This ends up being more complicated that just ignoring >> virttuned and coding up use of vhost-net in application code. >> >> >> > > This is the same kind of problem we faced wrt libvirt-gconfig and >> > > libvirt-gobject usage from virt-manager - it has an extensive code >> > > base that already works, and rewriting it to use something new >> > > is alot of work for no short-term benefit. libvirt-gconfig/gobject >> > > were supposed to be the "easy" bits for virt-manager to adopt, as >> > > they don't really include much logic that would step on virt-manager's >> > > toes. libvirt-designer was going to be a very opinionated library >> > > and in retrospective that makes it even harder to consider adopting >> > > it for usage in virt-manager, as it'll have signficant liklihood >> > > of making functionally significant changes in behaviour. >> > > >> > >> > The initial idea (which I forgot to mention) was that all the decisions >> > libvirt currently does (so that it keeps the guest ABI stable) would be >> > moved into data (let's say some DSL) and it could then be switched or >> > adjusted if that's not what the mgmt app wants (on a per-definition >> > basis, of course). I didn't feel very optimistic about the upstream >> > acceptance for that idea, so I figured that there could be something >> > that lives beside libvirt, helps with some policies if requested and >> > then the resulting XML could be fed into libvirt for determining the >> > rest. >> >> I can't even imagine how we would go about encoding the stable guest >> ABI logic libvirt does today in data ! >> >> > >> > > There's also the problem with use of native libraries that would >> > > impact many apps. We only got OpenStack to grudgingly allow the >> > >> > By native you mean actual binary libraries or native to the OpenStack >> > code as in python module? Because what I had in mind for this project >> > was a python module with optional wrapper for REST API. >> >> I meant native binary libraries. 
ie openstack is not happy in general >> with adding dependancies on new OS services, because there's a big >> time lag for getting them into all distros. By comparison a pure >> python library, they can just handle automatically in their deployment >> tools, just pip installing on any OS distro straight from pypi. This >> is what made use of libosinfo a hard sell in Nova. >> >> The same thing is seen with Go / Rust where some applications have >> decided they're better of actually re-implementing the libvirt RPC >> protocol in Go / Rust rather than use the libvirt.so client. I think >> this is a bad tradeoff in general, but I can see why they like it >> >> Regards, >> Daniel >> -- >> |: https://berrange.com -o- https://www.flickr.com/photos/ >> dberrange :| >> |: https://libvirt.org -o- >> https://fstop138.berrange.com :| >> |: https://entangle-photo.org -o- https://www.instagram.com/dber >> range :| >> _______________________________________________ >> Devel mailing list >> Devel at ovirt.org >> http://lists.ovirt.org/mailman/listinfo/devel >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From pkotas at redhat.com Thu Apr 5 11:53:07 2018 From: pkotas at redhat.com (Petr Kotas) Date: Thu, 5 Apr 2018 13:53:07 +0200 Subject: [ovirt-devel] How To Construct An Local Developing Enviroment? In-Reply-To: <201804031856386663754@coretek.com.cn> References: <201804031856386663754@coretek.com.cn> Message-ID: Hi, you can take a look at my env here https://github.com/petrkotas/ovirt-dev-env it is not complete and there is still some work left. Feel free to contact me if you have any questions. Best, Petr On Tue, Apr 3, 2018 at 12:56 PM, sundw wrote: > Hi Guys! > I plan to construct an local developing environment for oVirt. > But I can not find any relative pages from the web site(www.ovirt.org). > Maybe I am not patient :). > > Could you please give me some advice or some links about > How To Construct An Local Developing Enviroment? 
> > Thanks! > > ------------------------------ > *???* > ????????????/?????? > 13378105625 > > _______________________________________________ > Devel mailing list > Devel at ovirt.org > http://lists.ovirt.org/mailman/listinfo/devel > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sbonazzo at redhat.com Thu Apr 5 13:50:32 2018 From: sbonazzo at redhat.com (Sandro Bonazzola) Date: Thu, 5 Apr 2018 15:50:32 +0200 Subject: [ovirt-devel] [oVirt Jenkins] ovirt-system-tests_he-basic-ansible-suite-master - Build # 143 - Still Failing! In-Reply-To: References: <10152330.1097.1522918439558.JavaMail.jenkins@jenkins.phx.ovirt.org> <270731872.1104.1522935461316.JavaMail.jenkins@jenkins.phx.ovirt.org> Message-ID: Adding devel mailing list and some relevant people. 2018-04-05 15:41 GMT+02:00 Simone Tiraboschi : > The first error is here: > > > *13:25:37* [ INFO ] TASK [Wait for the engine to come up on the target VM]*13:37:01* [ ERROR ] fatal: [localhost]: FAILED! 
=> {"attempts": 120, "changed": true, "cmd": ["hosted-engine", "--vm-status", "--json"], "delta": "0:00:00.316574", "end": "2018-04-05 09:37:00.826888", "rc": 0, "start": "2018-04-05 09:37:00.510314", "stderr": "", "stderr_lines": [], "stdout": "{\"1\": {\"conf_on_shared_storage\": true, \"live-data\": true, \"extra\": \"metadata_parse_version=1\\nmetadata_feature_version=1\\ntimestamp=1838 (Thu Apr 5 09:36:51 2018)\\nhost-id=1\\nscore=0\\nvm_conf_refresh_time=1838 (Thu Apr 5 09:36:51 2018)\\nconf_on_shared_storage=True\\nmaintenance=False\\nstate=EngineUnexpectedlyDown\\nstopped=False\\ntimeout=Wed Dec 31 19:31:35 1969\\n\", \"hostname\": \"lago-he-basic-ansible-suite-master-host0\", \"host-id\": 1, \"engine-status\": {\"reason\": \"bad vm status\", \"health\": \"bad\", \"vm\": \"down_unexpected\", \"detail\": \"Down\"}, \"score\": 0, \"stopped\": false, \"maintenance\": false, \"crc32\": \"72e3e0e9\", \"local_conf_timestamp\": 1838, \"host-ts\": 1838}, \"global_maintenance\": false}", "stdout_lines": ["{\"1\": {\"conf_on_shared_storage\": true, \"live-data\": true, \"extra\": \"metadata_parse_version=1\\nmetadata_feature_version=1\\ntimestamp=1838 (Thu Apr 5 09:36:51 2018)\\nhost-id=1\\nscore=0\\nvm_conf_refresh_time=1838 (Thu Apr 5 09:36:51 2018)\\nconf_on_shared_storage=True\\nmaintenance=False\\nstate=EngineUnexpectedlyDown\\nstopped=False\\ntimeout=Wed Dec 31 19:31:35 1969\\n\", \"hostname\": \"lago-he-basic-ansible-suite-master-host0\", \"host-id\": 1, \"engine-status\": {\"reason\": \"bad vm status\", \"health\": \"bad\", \"vm\": \"down_unexpected\", \"detail\": \"Down\"}, \"score\": 0, \"stopped\": false, \"maintenance\": false, \"crc32\": \"72e3e0e9\", \"local_conf_timestamp\": 1838, \"host-ts\": 1838}, \"global_maintenance\": false}"]}*13:37:01* [ ERROR ] Failed to execute stage 'Closing up': Failed executing ansible-playbook > > > Anso so we have to check VDSM logs where we can find: > > 2018-04-05 09:27:01,083-0400 ERROR (vm/9bb2deab) [virt.vm] 
(vmId='9bb2deab-f686-4629-acee-90e2774a66b6') The vm start process failed (vm:945) > Traceback (most recent call last): > File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 874, in _startUnderlyingVm > self._run() > File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 2832, in _run > domxml = hooks.before_vm_start(self._buildDomainXML(), > File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 2258, in _buildDomainXML > return vmxml.format_xml(dom, pretty=True) > File "/usr/lib/python2.7/site-packages/vdsm/virt/vmxml.py", line 71, in format_xml > stream, encoding='utf-8', xml_declaration=True) > File "/usr/lib64/python2.7/xml/etree/ElementTree.py", line 820, in write > serialize(write, self._root, encoding, qnames, namespaces) > File "/usr/lib64/python2.7/xml/etree/ElementTree.py", line 939, in _serialize_xml > _serialize_xml(write, e, encoding, qnames, None) > File "/usr/lib64/python2.7/xml/etree/ElementTree.py", line 939, in _serialize_xml > _serialize_xml(write, e, encoding, qnames, None) > File "/usr/lib64/python2.7/xml/etree/ElementTree.py", line 939, in _serialize_xml > _serialize_xml(write, e, encoding, qnames, None) > File "/usr/lib64/python2.7/xml/etree/ElementTree.py", line 932, in _serialize_xml > v = _escape_attrib(v, encoding) > File "/usr/lib64/python2.7/xml/etree/ElementTree.py", line 1092, in _escape_attrib > _raise_serialization_error(text) > File "/usr/lib64/python2.7/xml/etree/ElementTree.py", line 1052, in _raise_serialization_error > "cannot serialize %r (type %s)" % (text, type(text).__name__) > TypeError: cannot serialize 0 (type int) > 2018-04-05 09:27:01,084-0400 INFO (vm/9bb2deab) [virt.vm] (vmId='9bb2deab-f686-4629-acee-90e2774a66b6') Changed state to Down: cannot serialize 0 (type int) (code=1) (vm:1685) > > > It seams another regression with the libvirt XML generated by the engine. 
> > > > On Thu, Apr 5, 2018 at 3:37 PM, wrote: > >> Project: http://jenkins.ovirt.org/job/ovirt-system-tests_he-basic-ans >> ible-suite-master/ >> Build: http://jenkins.ovirt.org/job/ovirt-system-tests_he-basic-ans >> ible-suite-master/143/ >> Build Number: 143 >> Build Status: Still Failing >> Triggered By: Started by user Simone Tiraboschi >> >> ------------------------------------- >> Changes Since Last Success: >> ------------------------------------- >> Changes for Build #139 >> [Gal Ben Haim] he: Share control.sh between all the non-ansible suites >> >> [Daniel Belenky] Workaround for a race in gerrit-trigger plugin >> >> >> Changes for Build #140 >> [Gal Ben Haim] he: Share control.sh between all the non-ansible suites >> >> >> Changes for Build #141 >> [Gal Ben Haim] he: Share control.sh between all the non-ansible suites >> >> >> Changes for Build #142 >> [Yaniv Kaul] Disable 006_migrations.prepare_migration_attachments_ipv6 >> >> >> Changes for Build #143 >> [Yaniv Kaul] Disable 006_migrations.prepare_migration_attachments_ipv6 >> >> >> >> >> ----------------- >> Failed Tests: >> ----------------- >> No tests ran. > > > -- SANDRO BONAZZOLA ASSOCIATE MANAGER, SOFTWARE ENGINEERING, EMEA ENG VIRTUALIZATION R&D Red Hat EMEA sbonazzo at redhat.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From mskrivan at redhat.com Thu Apr 5 14:20:36 2018 From: mskrivan at redhat.com (Michal Skrivanek) Date: Thu, 5 Apr 2018 16:20:36 +0200 Subject: [ovirt-devel] [oVirt Jenkins] ovirt-system-tests_he-basic-ansible-suite-master - Build # 143 - Still Failing! In-Reply-To: References: <10152330.1097.1522918439558.JavaMail.jenkins@jenkins.phx.ovirt.org> <270731872.1104.1522935461316.JavaMail.jenkins@jenkins.phx.ovirt.org> Message-ID: > On 5 Apr 2018, at 15:50, Sandro Bonazzola wrote: > > Adding devel mailing list and some relevant people. 
should pass now with https://gerrit.ovirt.org/#/c/89873/ (under verification) Thanks, michal > > 2018-04-05 15:41 GMT+02:00 Simone Tiraboschi >: > The first error is here: > > > 13:25:37 [ INFO ] TASK [Wait for the engine to come up on the target VM] > 13:37:01 [ ERROR ] fatal: [localhost]: FAILED! => {"attempts": 120, "changed": true, "cmd": ["hosted-engine", "--vm-status", "--json"], "delta": "0:00:00.316574", "end": "2018-04-05 09:37:00.826888", "rc": 0, "start": "2018-04-05 09:37:00.510314", "stderr": "", "stderr_lines": [], "stdout": "{\"1\": {\"conf_on_shared_storage\": true, \"live-data\": true, \"extra\": \"metadata_parse_version=1\\nmetadata_feature_version=1\\ntimestamp=1838 (Thu Apr 5 09:36:51 2018)\\nhost-id=1\\nscore=0\\nvm_conf_refresh_time=1838 (Thu Apr 5 09:36:51 2018)\\nconf_on_shared_storage=True\\nmaintenance=False\\nstate=EngineUnexpectedlyDown\\nstopped=False\\ntimeout=Wed Dec 31 19:31:35 1969\\n\", \"hostname\": \"lago-he-basic-ansible-suite-master-host0\", \"host-id\": 1, \"engine-status\": {\"reason\": \"bad vm status\", \"health\": \"bad\", \"vm\": \"down_unexpected\", \"detail\": \"Down\"}, \"score\": 0, \"stopped\": false, \"maintenance\": false, \"crc32\": \"72e3e0e9\", \"local_conf_timestamp\": 1838, \"host-ts\": 1838}, \"global_maintenance\": false}", "stdout_lines": ["{\"1\": {\"conf_on_shared_storage\": true, \"live-data\": true, \"extra\": \"metadata_parse_version=1\\nmetadata_feature_version=1\\ntimestamp=1838 (Thu Apr 5 09:36:51 2018)\\nhost-id=1\\nscore=0\\nvm_conf_refresh_time=1838 (Thu Apr 5 09:36:51 2018)\\nconf_on_shared_storage=True\\nmaintenance=False\\nstate=EngineUnexpectedlyDown\\nstopped=False\\ntimeout=Wed Dec 31 19:31:35 1969\\n\", \"hostname\": \"lago-he-basic-ansible-suite-master-host0\", \"host-id\": 1, \"engine-status\": {\"reason\": \"bad vm status\", \"health\": \"bad\", \"vm\": \"down_unexpected\", \"detail\": \"Down\"}, \"score\": 0, \"stopped\": false, \"maintenance\": false, \"crc32\": \"72e3e0e9\", 
\"local_conf_timestamp\": 1838, \"host-ts\": 1838}, \"global_maintenance\": false}"]} > 13:37:01 [ ERROR ] Failed to execute stage 'Closing up': Failed executing ansible-playbook > > And so we have to check VDSM logs where we can find: > 2018-04-05 09:27:01,083-0400 ERROR (vm/9bb2deab) [virt.vm] (vmId='9bb2deab-f686-4629-acee-90e2774a66b6') The vm start process failed (vm:945) > Traceback (most recent call last): > File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 874, in _startUnderlyingVm > self._run() > File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 2832, in _run > domxml = hooks.before_vm_start(self._buildDomainXML(), > File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 2258, in _buildDomainXML > return vmxml.format_xml(dom, pretty=True) > File "/usr/lib/python2.7/site-packages/vdsm/virt/vmxml.py", line 71, in format_xml > stream, encoding='utf-8', xml_declaration=True) > File "/usr/lib64/python2.7/xml/etree/ElementTree.py", line 820, in write > serialize(write, self._root, encoding, qnames, namespaces) > File "/usr/lib64/python2.7/xml/etree/ElementTree.py", line 939, in _serialize_xml > _serialize_xml(write, e, encoding, qnames, None) > File "/usr/lib64/python2.7/xml/etree/ElementTree.py", line 939, in _serialize_xml > _serialize_xml(write, e, encoding, qnames, None) > File "/usr/lib64/python2.7/xml/etree/ElementTree.py", line 939, in _serialize_xml > _serialize_xml(write, e, encoding, qnames, None) > File "/usr/lib64/python2.7/xml/etree/ElementTree.py", line 932, in _serialize_xml > v = _escape_attrib(v, encoding) > File "/usr/lib64/python2.7/xml/etree/ElementTree.py", line 1092, in _escape_attrib > _raise_serialization_error(text) > File "/usr/lib64/python2.7/xml/etree/ElementTree.py", line 1052, in _raise_serialization_error > "cannot serialize %r (type %s)" % (text, type(text).__name__) > TypeError: cannot serialize 0 (type int) > 2018-04-05 09:27:01,084-0400 INFO (vm/9bb2deab) [virt.vm] 
(vmId='9bb2deab-f686-4629-acee-90e2774a66b6') Changed state to Down: cannot serialize 0 (type int) (code=1) (vm:1685) > > It seems another regression with the libvirt XML generated by the engine. > > > > On Thu, Apr 5, 2018 at 3:37 PM, > wrote: > Project: http://jenkins.ovirt.org/job/ovirt-system-tests_he-basic-ansible-suite-master/ > Build: http://jenkins.ovirt.org/job/ovirt-system-tests_he-basic-ansible-suite-master/143/ > Build Number: 143 > Build Status: Still Failing > Triggered By: Started by user Simone Tiraboschi > > ------------------------------------- > Changes Since Last Success: > ------------------------------------- > Changes for Build #139 > [Gal Ben Haim] he: Share control.sh between all the non-ansible suites > > [Daniel Belenky] Workaround for a race in gerrit-trigger plugin > > > Changes for Build #140 > [Gal Ben Haim] he: Share control.sh between all the non-ansible suites > > > Changes for Build #141 > [Gal Ben Haim] he: Share control.sh between all the non-ansible suites > > > Changes for Build #142 > [Yaniv Kaul] Disable 006_migrations.prepare_migration_attachments_ipv6 > > > Changes for Build #143 > [Yaniv Kaul] Disable 006_migrations.prepare_migration_attachments_ipv6 > > > > > ----------------- > Failed Tests: > ----------------- > No tests ran. > > > > > -- > SANDRO BONAZZOLA > ASSOCIATE MANAGER, SOFTWARE ENGINEERING, EMEA ENG VIRTUALIZATION R&D > Red Hat EMEA > sbonazzo at redhat.com > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From vszocs at redhat.com Thu Apr 5 15:39:10 2018 From: vszocs at redhat.com (Vojtech Szocs) Date: Thu, 5 Apr 2018 17:39:10 +0200 Subject: [ovirt-devel] How To Construct An Local Developing Enviroment? In-Reply-To: References: <201804031856386663754@coretek.com.cn> Message-ID: On Thu, Apr 5, 2018 at 1:53 PM, Petr Kotas wrote: > Hi, > > you can take a look at my env here https://github.com/petrkotas/ovirt-dev-env > Thanks Petr for sharing this! 
> > > it is not complete and there is still some work left. > > Feel free to contact me if you have any questions. > > Best, > Petr > > On Tue, Apr 3, 2018 at 12:56 PM, sundw > wrote: > >> Hi Guys! >> I plan to construct an local developing environment for oVirt. >> But I can not find any relative pages from the web site(www.ovirt.org). >> Maybe I am not patient :). >> >> Could you please give me some advice or some links about >> How To Construct An Local Developing Enviroment? >> >> Thanks! >> >> ------------------------------ >> *???* >> ????????????/?????? >> 13378105625 >> >> _______________________________________________ >> Devel mailing list >> Devel at ovirt.org >> http://lists.ovirt.org/mailman/listinfo/devel >> > > > _______________________________________________ > Devel mailing list > Devel at ovirt.org > http://lists.ovirt.org/mailman/listinfo/devel > -------------- next part -------------- An HTML attachment was scrubbed... URL: From gshereme at redhat.com Thu Apr 5 15:46:08 2018 From: gshereme at redhat.com (Greg Sheremeta) Date: Thu, 05 Apr 2018 15:46:08 +0000 Subject: [ovirt-devel] How To Construct An Local Developing Enviroment? In-Reply-To: References: <201804031856386663754@coretek.com.cn> Message-ID: +1. Petr, can we make this more official? Anyone else working on it? I also had the thought of using ansible to stand up the environment on bare metal. Greg On Thu, Apr 5, 2018, 11:41 AM Vojtech Szocs wrote: > > > On Thu, Apr 5, 2018 at 1:53 PM, Petr Kotas wrote: > >> Hi, >> >> you can take a look at my env here >> https://github.com/petrkotas/ovirt-dev-env >> > > ?Thanks Petr for sharing this! > > >> >> >> it is not complete and there is still some work left. >> >> Feel free to contact me if you have any questions. >> >> Best, >> Petr >> >> On Tue, Apr 3, 2018 at 12:56 PM, sundw >> wrote: >> >>> Hi Guys! >>> I plan to construct an local developing environment for oVirt. >>> But I can not find any relative pages from the web site(www.ovirt.org). 
>>> Maybe I am not patient :). >>> >>> Could you please give me some advice or some links about >>> How To Construct An Local Developing Enviroment? >>> >>> Thanks! >>> >>> ------------------------------ >>> *???* >>> ????????????/?????? >>> 13378105625 >>> >>> _______________________________________________ >>> Devel mailing list >>> Devel at ovirt.org >>> http://lists.ovirt.org/mailman/listinfo/devel >>> >> >> >> _______________________________________________ >> Devel mailing list >> Devel at ovirt.org >> http://lists.ovirt.org/mailman/listinfo/devel >> > > _______________________________________________ > Devel mailing list > Devel at ovirt.org > http://lists.ovirt.org/mailman/listinfo/devel -------------- next part -------------- An HTML attachment was scrubbed... URL: From sabose at redhat.com Fri Apr 6 07:28:22 2018 From: sabose at redhat.com (Sahina Bose) Date: Fri, 6 Apr 2018 12:58:22 +0530 Subject: [ovirt-devel] [OST][HC] HE fails to deploy In-Reply-To: References: Message-ID: 2018-04-05 20:46:52,773-0400 INFO otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:100 TASK [Get local VM IP] 2018-04-05 20:55:28,217-0400 DEBUG otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:94 {u'_ansible_parsed': True, u'stderr_lines': [], u'cmd': u"virsh -r net-dhcp-leases default | grep -i 00:16:3e:24:d3:63 | awk '{ print $5 }' | cut -f1 -d'/'", u'end': u'2018-04-05 20:55:28.046320', u'_ansible_no_log': False, u'stdout': u'', u'changed': True, u'invocation': {u'module_args': {u'warn': True, u'executable': None, u'_uses_shell': True, u'_raw_params': u"virsh -r net-dhcp-leases default | grep -i 00:16:3e:24:d3:63 | awk '{ print $5 }' | cut -f1 -d'/'", u'removes': None, u'creates': None, u'chdir': None, u'stdin': None}}, u'start': u'2018-04-05 20:55:28.000470', u'attempts': 50, u'stderr': u'', u'rc': 0, u'delta': u'0:00:00.045850', u'stdout_lines': []} 2018-04-05 20:55:28,318-0400 ERROR 
otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:98 fatal: [localhost]: FAILED! => {"attempts": 50, "changed": true, "cmd": "virsh -r net-dhcp-leases default | grep -i 00:16:3e:24:d3:63 | awk '{ print $5 }' | cut -f1 -d'/'", "delta": "0:00:00.045850", "end": "2018-04-05 20:55:28.046320", "rc": 0, "start": "2018-04-05 20:55:28.000470", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []} Both the 4.2 and master suites are failing on getting local VM IP. Any idea what changed or if I have to change the test? thanks! -------------- next part -------------- An HTML attachment was scrubbed... URL: From stirabos at redhat.com Fri Apr 6 07:40:12 2018 From: stirabos at redhat.com (Simone Tiraboschi) Date: Fri, 6 Apr 2018 09:40:12 +0200 Subject: [ovirt-devel] [OST][HC] HE fails to deploy In-Reply-To: References: Message-ID: On Fri, Apr 6, 2018 at 9:28 AM, Sahina Bose wrote: > 2018-04-05 20:46:52,773-0400 INFO otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:100 TASK [Get local VM IP] > 2018-04-05 20:55:28,217-0400 DEBUG otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:94 {u'_ansible_parsed': True, u'stderr_lines': [], u'cmd': u"virsh -r net-dhcp-leases default | grep -i 00:16:3e:24:d3:63 | awk '{ print $5 }' | cut -f1 -d'/'", u'end': u'2018-04-05 20:55:28.046320', u'_ansible_no_log': False, u'stdout': u'', u'changed': True, u'invocation': {u'module_args': {u'warn': True, u'executable': None, u'_uses_shell': True, u'_raw_params': u"virsh -r net-dhcp-leases default | grep -i 00:16:3e:24:d3:63 | awk '{ print $5 }' | cut -f1 -d'/'", u'removes': None, u'creates': None, u'chdir': None, u'stdin': None}}, u'start': u'2018-04-05 20:55:28.000470', u'attempts': 50, u'stderr': u'', u'rc': 0, u'delta': u'0:00:00.045850', u'stdout_lines': []} > 2018-04-05 20:55:28,318-0400 ERROR otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:98 fatal: [localhost]: FAILED! 
=> {"attempts": 50, "changed": true, "cmd": "virsh -r net-dhcp-leases default | grep -i 00:16:3e:24:d3:63 | awk '{ print $5 }' | cut -f1 -d'/'", "delta": "0:00:00.045850", "end": "2018-04-05 20:55:28.046320", "rc": 0, "start": "2018-04-05 20:55:28.000470", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []} > > Both the 4.2 and master suites are failing on getting local VM IP. > Any idea what changed or if I have to change the test? > > thanks! > Hi Sahina, 4.2 and master suite non HC are correctly running this morning. http://jenkins.ovirt.org/view/oVirt%20system%20tests/job/ovirt-system-tests_he-basic-ansible-suite-master/146/ http://jenkins.ovirt.org/view/oVirt%20system%20tests/job/ovirt-system-tests_he-basic-ansible-suite-4.2/76/ I'll try to check the difference with HC suites. Are you using more than one subnet in the HC suites? -------------- next part -------------- An HTML attachment was scrubbed... URL: From dron at redhat.com Fri Apr 6 11:20:11 2018 From: dron at redhat.com (Dafna Ron) Date: Fri, 6 Apr 2018 12:20:11 +0100 Subject: [ovirt-devel] [ OST Failure Report ] [ oVirt 4.2 ] [ 2018-04-04 ] [006_migrations.prepare_migration_attachments_ipv6] In-Reply-To: References: Message-ID: Dan, was there a fix for the issues? can I have a link to the fix if there was? Thanks, Dafna On Wed, Apr 4, 2018 at 5:01 PM, Gal Ben Haim wrote: > From lago's log, I see that lago collected the logs from the VMs using ssh > (after the test failed), which means > that the VM didn't crash. 
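A side note on the failed [Get local VM IP] task quoted earlier: every one of the 50 attempts ended with rc 0 and empty stdout. That is expected shell behaviour, since a pipeline's exit status is the status of its last command, so the trailing awk/cut mask a grep that found no lease. A small sketch of just that effect (the MAC address is the one from the log; this is not the deployment code itself):

```python
import subprocess

# Same pipeline shape as the ansible task, fed input with no matching lease.
pipeline = (
    "printf 'no dhcp leases yet\\n'"
    " | grep -i 00:16:3e:24:d3:63"
    " | awk '{ print $5 }'"
    " | cut -f1 -d'/'"
)
res = subprocess.run(pipeline, shell=True, capture_output=True, text=True)

# grep matched nothing (on its own it would exit 1), yet the pipeline
# reports success because the exit status comes from the last command, cut.
print(res.returncode, repr(res.stdout))
```

This is why a retry loop around such a pipeline has to key off stdout being non-empty rather than the return code (or run the shell with pipefail enabled).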
> > On Wed, Apr 4, 2018 at 5:27 PM, Dan Kenigsberg wrote: > >> On Wed, Apr 4, 2018 at 4:59 PM, Barak Korren wrote: >> > Test failed: [ 006_migrations.prepare_migration_attachments_ipv6 ] >> > >> > Link to suspected patches: >> > (Probably unrelated) >> > https://gerrit.ovirt.org/#/c/89812/1 (ovirt-engine-sdk) - examples: >> > export template to an export domain >> > >> > This seems to happen multiple times sporadically, I thought this would >> > be solved by >> > https://gerrit.ovirt.org/#/c/89781/ but it isn't. >> >> right, it is a completely unrelated issue there (with external networks). >> here, however, the host dies while setting setupNetworks of an ipv6 >> address. Setup network waits for Engine's confirmation at 08:33:00,711 >> http://jenkins.ovirt.org/job/ovirt-4.2_change-queue-tester/1 >> 537/artifact/exported-artifacts/basic-suit-4.2-el7/test_ >> logs/basic-suite-4.2/post-006_migrations.py/lago-basic- >> suite-4-2-host-0/_var_log/vdsm/supervdsm.log >> but kernel messages stop at 08:33:23 >> http://jenkins.ovirt.org/job/ovirt-4.2_change-queue-tester/1 >> 537/artifact/exported-artifacts/basic-suit-4.2-el7/test_ >> logs/basic-suite-4.2/post-006_migrations.py/lago-basic- >> suite-4-2-host-0/_var_log/messages/*view*/ >> >> Does the lago VM of this host crash? pause? 
>> >> >> > >> > Link to Job: >> > http://jenkins.ovirt.org/job/ovirt-4.2_change-queue-tester/1537/ >> > >> > Link to all logs: >> > http://jenkins.ovirt.org/job/ovirt-4.2_change-queue-tester/1 >> 537/artifact/exported-artifacts/basic-suit-4.2-el7/test_ >> logs/basic-suite-4.2/post-006_migrations.py/ >> > >> > Error snippet from log: >> > >> > >> > >> > Traceback (most recent call last): >> > File "/usr/lib64/python2.7/unittest/case.py", line 369, in run >> > testMethod() >> > File "/usr/lib/python2.7/site-packages/nose/case.py", line 197, in >> runTest >> > self.test(*self.arg) >> > File "/usr/lib/python2.7/site-packages/ovirtlago/testlib.py", line >> > 129, in wrapped_test >> > test() >> > File "/usr/lib/python2.7/site-packages/ovirtlago/testlib.py", line >> > 59, in wrapper >> > return func(get_test_prefix(), *args, **kwargs) >> > File "/usr/lib/python2.7/site-packages/ovirtlago/testlib.py", line >> > 78, in wrapper >> > prefix.virt_env.engine_vm().get_api(api_ver=4), *args, **kwargs >> > File "/home/jenkins/workspace/ovirt-4.2_change-queue-tester/ovirt >> -system-tests/basic-suite-4.2/test-scenarios/006_migrations.py", >> > line 139, in prepare_migration_attachments_ipv6 >> > engine, host_service, MIGRATION_NETWORK, ip_configuration) >> > File "/home/jenkins/workspace/ovirt-4.2_change-queue-tester/ovirt >> -system-tests/basic-suite-4.2/test_utils/network_utils_v4.py", >> > line 71, in modify_ip_config >> > check_connectivity=True) >> > File "/usr/lib64/python2.7/site-packages/ovirtsdk4/services.py", >> > line 36729, in setup_networks >> > return self._internal_action(action, 'setupnetworks', None, >> > headers, query, wait) >> > File "/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py", line >> > 299, in _internal_action >> > return future.wait() if wait else future >> > File "/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py", line >> > 55, in wait >> > return self._code(response) >> > File "/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py", 
line >> > 296, in callback >> > self._check_fault(response) >> > File "/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py", line >> > 132, in _check_fault >> > self._raise_error(response, body) >> > File "/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py", line >> > 118, in _raise_error >> > raise error >> > Error: Fault reason is "Operation Failed". Fault detail is "[Network >> > error during communication with the Host.]". HTTP response code is >> > 400. >> > >> > >> > >> > >> > >> > >> > >> > -- >> > Barak Korren >> > RHV DevOps team , RHCE, RHCi >> > Red Hat EMEA >> > redhat.com | TRIED. TESTED. TRUSTED. | redhat.com/trusted >> > _______________________________________________ >> > Devel mailing list >> > Devel at ovirt.org >> > http://lists.ovirt.org/mailman/listinfo/devel >> > >> > >> _______________________________________________ >> Devel mailing list >> Devel at ovirt.org >> http://lists.ovirt.org/mailman/listinfo/devel >> > > > > -- > *GAL bEN HAIM* > RHV DEVOPS > > _______________________________________________ > Devel mailing list > Devel at ovirt.org > http://lists.ovirt.org/mailman/listinfo/devel > -------------- next part -------------- An HTML attachment was scrubbed... URL: From dron at redhat.com Fri Apr 6 12:20:45 2018 From: dron at redhat.com (Dafna Ron) Date: Fri, 6 Apr 2018 13:20:45 +0100 Subject: [ovirt-devel] [ OST Failure Report ] [ oVirt Master (ovirt-engine) ] [ 06-04-2018 ] [ 006_migrations.migrate_vm ] Message-ID: hi, 006_migrations.migrate_vm failure points to the below patch. There seems to be an issue to migrate vms but I am not sure its related to this patch. can you please check? 
*Link and headline of suspected patches: aaa: can't switch user when accessing the engine with an active kerberos ticket - https://gerrit.ovirt.org/#/c/89872/ Link to Job:* * http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/6704/ Link to all logs:* http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/6704/artifact/exported-artifacts/basic-suit-master-el7/test_logs/basic-suite-master/post-006_migrations.py/ *(Relevant) error snippet from the log: 2018-04-06 04:07:56,517-0400 ERROR (qgapoller/2) [Executor] Unhandled exception in timeout=10, duration=0.00 at 0x7fc4541193d0> (executor:317)Traceback (most recent call last): File "/usr/lib/python2.7/site-packages/vdsm/executor.py", line 315, in _execute_task task() File "/usr/lib/python2.7/site-packages/vdsm/executor.py", line 391, in __call__ self._callable() File "/usr/lib/python2.7/site-packages/vdsm/virt/periodic.py", line 314, in __call__ self._execute() File "/usr/lib/python2.7/site-packages/vdsm/virt/qemuguestagent.py", line 388, in _execute for ifname, ifparams in six.iteritems(interfaces): File "/usr/lib/python2.7/site-packages/six.py", line 599, in iteritems return d.iteritems(**kw)AttributeError: 'list' object has no attribute 'iteritems'*2018-04-06 04:23:10,165-0400 ERROR (migsrc/aad106a1) [virt.vm] (vmId='aad106a1-9043-4661-bb9b-9894f66b506b') migration destination error: Virtual machine already exists (migration:290) ** -------------- next part -------------- An HTML attachment was scrubbed... URL: From dron at redhat.com Fri Apr 6 12:46:32 2018 From: dron at redhat.com (Dafna Ron) Date: Fri, 6 Apr 2018 13:46:32 +0100 Subject: [ovirt-devel] [ OST Failure Report ] [ oVirt 4.2 (vdsm) ] [ 05-04-2018 ] [ 004_basic_sanity.hotplug_cpu ] Message-ID: Hi, We had a failure on 004_basic_sanity.hotplug_cpu. looking at the logs, I do not see an issue on the cpu hotplug but it seems that the host had an issue with network config. can you please have a look? 
*Link and headline of suspected patches: Failed patch: * *virt: storage: minimal changes to the drive XML - https://gerrit.ovirt.org/#/c/89508/ * *Reported as failure cause: * *virt: extract local adjustments to XML - https://gerrit.ovirt.org/#/c/89506/ * *Link to Job:http://jenkins.ovirt.org/job/ovirt-4.2_change-queue-tester/1565/ Link to all logs:http://jenkins.ovirt.org/job/ovirt-4.2_change-queue-tester/1565/artifact/exported-artifacts/basic-suit-4.2-el7/test_logs/basic-suite-4.2/post-004_basic_sanity.py/ (Relevant) error snippet from the log: MainProcess|jsonrpc/2::DEBUG::2018-04-05 18:58:17,451::cmdutils::150::root::(exec_cmd) /sbin/ifdown eth2 (cwd None)MainProcess|jsonrpc/2::DEBUG::2018-04-05 18:58:17,719::cmdutils::158::root::(exec_cmd) SUCCESS: = ''; = 0MainProcess|jsonrpc/2::DEBUG::2018-04-05 18:58:17,720::cmdutils::150::root::(exec_cmd) /sbin/ifdown eth3 (cwd None)MainProcess|jsonrpc/2::DEBUG::2018-04-05 18:58:17,982::cmdutils::158::root::(exec_cmd) SUCCESS: = ''; = 0ifup/onc5bbb5eed6a84::DEBUG::2018-04-05 18:58:18,165::cmdutils::158::root::(exec_cmd) FAILED: = 'Running scope as unit b32c8477-857f-476c-96c8-580daf0d4217.scope.\n/etc/sysconfig/network-scripts/ifup-eth: line 304: 12145 Terminated /sbin/dhclient ${DHCLIENTARGS} ${DEVICE}\nDevice "onc5bbb5eed6a84" does not exist.\nCannot find device "onc5bbb5eed6a84"\nDevice "onc5bbb5eed6a84" does not exist.\nDevice "onc5bbb5eed6a84" does not exist.\nDevice "onc5bbb5eed6a84" does not exist.\nDevice "onc5bbb5eed6a84" does not exist.\nDevice "onc5bbb5eed6a84" does not exist.\nDevice "onc5bbb5eed6a84" does not exist.\nDevice "onc5bbb5eed6a84" does not exist.\nDevice "onc5bbb5eed6a84" does not exist.\nDevice "onc5bbb5eed6a84" does not exist.\nDevice "onc5bbb5eed6a84" does not exist.\nDevice "onc5bbb5eed6a84" does not exist.\nDevice "onc5bbb5eed6a84" does not exist.\nDevice "onc5bbb5eed6a84" does not exist.\nDevice "onc5bbb5eed6a84" does not exist.\nDevice "onc5bbb5eed6a84" does not exist.\nDevice 
"onc5bbb5eed6a84" does not exist.\nDevice "onc5bbb5eed6a84" does not exist.\nDevice "onc5bbb5eed6a84" does not exist.\nDevice "onc5bbb5eed6a84" does not exist.\nDevice "onc5bbb5eed6a84" does not exist.\nDevice "onc5bbb5eed6a84" does not exist.\nDevice "onc5bbb5eed6a84" does not exist.\nDevice "onc5bbb5eed6a84" does not exist.\nDevice "onc5bbb5eed6a84" does not exist.\nDevice "onc5bbb5eed6a84" does not exist.\nDevice "onc5bbb5eed6a84" does not exist.\nDevice "onc5bbb5eed6a84" does not exist.\nDevice "onc5bbb5eed6a84" does not exist.\nDevice "onc5bbb5eed6a84" does not exist.\nDevice "onc5bbb5eed6a84" does not exist.\nDevice "onc5bbb5eed6a84" does not exist.\nDevice "onc5bbb5eed6a84" does not exist.\nDevice "onc5bbb5eed6a84" does not exist.\nDevice "onc5bbb5eed6a84" does not exist.\nDevice "onc5bbb5eed6a84" does not exist.\nDevice "onc5bbb5eed6a84" does not exist.\nDevice "onc5bbb5eed6a84" does not exist.\nDevice "onc5bbb5eed6a84" does not exist.\nDevice "onc5bbb5eed6a84" does not exist.\nDevice "onc5bbb5eed6a84" does not exist.\nDevice "onc5bbb5eed6a84" does not exist.\nDevice "onc5bbb5eed6a84" does not exist.\nDevice "onc5bbb5eed6a84" does not exist.\nDevice "onc5bbb5eed6a84" does not exist.\nDevice "onc5bbb5eed6a84" does not exist.\nDevice "onc5bbb5eed6a84" does not exist.\nDevice "onc5bbb5eed6a84" does not exist.\nDevice "onc5bbb5eed6a84" does not exist.\nDevice "onc5bbb5eed6a84" does not exist.\nDevice "onc5bbb5eed6a84" does not exist.\nDevice "onc5bbb5eed6a84" does not exist.\n'; = 1ifup/onc5bbb5eed6a84::ERROR::2018-04-05 18:58:18,165::concurrent::201::root::(run) FINISH thread failedTraceback (most recent call last): File "/usr/lib/python2.7/site-packages/vdsm/common/concurrent.py", line 194, in run ret = func(*args, **kwargs) File "/usr/lib/python2.7/site-packages/vdsm/network/configurators/ifcfg.py", line 942, in _exec_ifup _exec_ifup_by_name(iface.name , cgroup) File "/usr/lib/python2.7/site-packages/vdsm/network/configurators/ifcfg.py", line 928, in 
_exec_ifup_by_name raise ConfigNetworkError(ERR_FAILED_IFUP, out[-1] if out else '')ConfigNetworkError: (29, '\n')MainProcess|jsonrpc/2::DEBUG::2018-04-05 18:58:20,075::cmdutils::150::root::(exec_cmd) /sbin/ip addr flush dev eth2 scope global (cwd None)MainProcess|jsonrpc/2::DEBUG::2018-04-05 18:58:20,081::cmdutils::158::root::(exec_cmd) SUCCESS: = ''; = 0MainProcess|jsonrpc/2::DEBUG::2018-04-05 18:58:20,082::ifcfg::479::root::(_atomicBackup) Backed up /etc/sysconfig/network-scripts/ifcfg-eth2MainProcess|jsonrpc/2::DEBUG::2018-04-05 18:58:20,084::ifcfg::512::root::(_persistentBackup) backing up ifcfg-eth2: HWADDR="54:52:c0:a8:c9:02" BOOTPROTO="dhcp" ONBOOT="yes" TYPE="Ethernet" NAME="eth2" * -------------- next part -------------- An HTML attachment was scrubbed... URL: From pkotas at redhat.com Fri Apr 6 15:05:33 2018 From: pkotas at redhat.com (Petr Kotas) Date: Fri, 6 Apr 2018 17:05:33 +0200 Subject: [ovirt-devel] How To Construct An Local Developing Enviroment? In-Reply-To: References: <201804031856386663754@coretek.com.cn> Message-ID: Hi everyone, first of all, thanks for having an interest in my dev setup :). Second, I am open to making it more official. The first step is to create list of missing features. Please if you can, take a look at the setup and create an issue for any feature you would like to see in there. Thanks!! Best, Petr On Thu, Apr 5, 2018 at 5:46 PM, Greg Sheremeta wrote: > +1. Petr, can we make this more official? Anyone else working on it? > > I also had the thought of using ansible to stand up the environment on > bare metal. > > Greg > > > On Thu, Apr 5, 2018, 11:41 AM Vojtech Szocs wrote: > >> >> >> On Thu, Apr 5, 2018 at 1:53 PM, Petr Kotas wrote: >> >>> Hi, >>> >>> you can take a look at my env here https://github.com/petrkotas/ >>> ovirt-dev-env >>> >> >> ?Thanks Petr for sharing this! >> >> >>> >>> >>> it is not complete and there is still some work left. >>> >>> Feel free to contact me if you have any questions. 
>>> >>> Best, >>> Petr >>> >>> On Tue, Apr 3, 2018 at 12:56 PM, sundw >>> wrote: >>> >>>> Hi Guys! >>>> I plan to construct an local developing environment for oVirt. >>>> But I can not find any relative pages from the web site(www.ovirt.org). >>>> Maybe I am not patient :). >>>> >>>> Could you please give me some advice or some links about >>>> How To Construct An Local Developing Enviroment? >>>> >>>> Thanks! >>>> >>>> ------------------------------ >>>> *???* >>>> ????????????/?????? >>>> 13378105625 >>>> >>>> _______________________________________________ >>>> Devel mailing list >>>> Devel at ovirt.org >>>> http://lists.ovirt.org/mailman/listinfo/devel >>>> >>> >>> >>> _______________________________________________ >>> Devel mailing list >>> Devel at ovirt.org >>> http://lists.ovirt.org/mailman/listinfo/devel >>> >> >> _______________________________________________ >> Devel mailing list >> Devel at ovirt.org >> http://lists.ovirt.org/mailman/listinfo/devel > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mperina at redhat.com Fri Apr 6 16:07:10 2018 From: mperina at redhat.com (Martin Perina) Date: Fri, 6 Apr 2018 18:07:10 +0200 Subject: [ovirt-devel] [ OST Failure Report ] [ oVirt Master (ovirt-engine) ] [ 06-04-2018 ] [ 006_migrations.migrate_vm ] In-Reply-To: References: Message-ID: Adding Ravi On Fri, Apr 6, 2018 at 2:20 PM, Dafna Ron wrote: > hi, > > 006_migrations.migrate_vm failure points to the below patch. There seems > to be an issue to migrate vms but I am not sure its related to this patch. > > can you please check? 
> > > > *Link and headline of suspected patches: aaa: can't switch user when > accessing the engine with an active kerberos ticket - > https://gerrit.ovirt.org/#/c/89872/ > Link to Job:* > > * http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/6704/ > Link > to all logs:* http://jenkins.ovirt.org/job/ovirt-master_change-queue- > tester/6704/artifact/exported-artifacts/basic-suit-master- > el7/test_logs/basic-suite-master/post-006_migrations.py/ > > > > > > > > > > > > > > > > > > > > *(Relevant) error snippet from the log: 2018-04-06 > 04:07:56,517-0400 ERROR (qgapoller/2) [Executor] Unhandled exception in > vm=aad106a1-9043-4661-bb9b-9894f66b506b at 0x7fc4681a2590> timeout=10, > duration=0.00 at 0x7fc4541193d0> (executor:317)Traceback (most recent call > last): File "/usr/lib/python2.7/site-packages/vdsm/executor.py", line 315, > in _execute_task task() File > "/usr/lib/python2.7/site-packages/vdsm/executor.py", line 391, in > __call__ self._callable() File > "/usr/lib/python2.7/site-packages/vdsm/virt/periodic.py", line 314, in > __call__ self._execute() File > "/usr/lib/python2.7/site-packages/vdsm/virt/qemuguestagent.py", line 388, > in _execute for ifname, ifparams in six.iteritems(interfaces): File > "/usr/lib/python2.7/site-packages/six.py", line 599, in iteritems return > d.iteritems(**kw)AttributeError: 'list' object has no attribute 'iteritems'*2018-04-06 > 04:23:10,165-0400 ERROR (migsrc/aad106a1) [virt.vm] > (vmId='aad106a1-9043-4661-bb9b-9894f66b506b') migration destination > error: Virtual machine already exists (migration:290) > > > ** > -- Martin Perina Associate Manager, Software Engineering Red Hat Czech s.r.o. -------------- next part -------------- An HTML attachment was scrubbed... 
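A side note on the AttributeError in the quoted vdsm log: six.iteritems() only accepts a mapping, and here the guest agent apparently reported the interfaces as a list of dicts, so dict-style iteration fails. A defensive sketch of the failure mode and a normalization in plain Python (the helper and key names are invented; this is not the VDSM code):

```python
def iter_interfaces(interfaces):
    # Normalize both shapes the agent might report into (name, params)
    # pairs: a {name: params} mapping, or a list of per-interface dicts.
    if isinstance(interfaces, dict):
        return list(interfaces.items())
    return [(params.get('name'), params) for params in interfaces]

as_mapping = {'eth0': {'hardware-address': '00:16:3e:aa:bb:cc'}}
as_list = [{'name': 'eth0', 'hardware-address': '00:16:3e:aa:bb:cc'}]
print(iter_interfaces(as_mapping))
print(iter_interfaces(as_list))

# The failing code did the moral equivalent of this on the list form,
# which is exactly the error from the log:
try:
    as_list.iteritems()
except AttributeError as exc:
    print(exc)  # 'list' object has no attribute 'iteritems'
```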
URL: From dron at redhat.com Fri Apr 6 16:10:11 2018 From: dron at redhat.com (Dafna Ron) Date: Fri, 6 Apr 2018 17:10:11 +0100 Subject: [ovirt-devel] OST Failure - Weekly update [30/03/2018-06/04/2018] Message-ID: Hello, I would like to update on this week's failures and OST current status. We had a few failures this week and some are still on-going issues. We had two unrelated issues failing the test: 006_migrations.prepare_migration_attachments_ipv6 This failure was periodical and is causing failure for several changes. This issue is still on-going and seems to be pointing to this change as the root cause: https://gerrit.ovirt.org/#/c/89812/1 - examples: export template to an export domain Issue is still under investigation. On Thursday evening and Friday we seem to have had failures which indicate issues with migration and ipv6 configuration. The tests that failed were 006_migrations.migrate_vm and 004_basic_sanity.hotplug_cpu. The cpu hotplug failure was reporting these changes: Failed patch: virt: storage: minimal changes to the drive XML - https://gerrit.ovirt.org/#/c/89508/ Reported as failure cause: virt: extract local adjustments to XML - https://gerrit.ovirt.org/#/c/89506/ and migration failure was reporting this patch: *aaa: can't switch user when accessing the engine with an active kerberos ticket - https://gerrit.ovirt.org/#/c/89872/ * *These three issues are still on-going and you can see the cases resolved vs the backlog which is going forward to next week: * *Infra/OST tests related issues: * We also had a failure which seems to be related to the OST test and was fixed in this patch: https://gerrit.ovirt.org/#/c/89781/ - Prevent network to be imported by auto_sync Further discussion on this can be found in the google doc opened by Dominik: https://docs.google.com/document/d/1VeKfI7luw-HHTCCmfogLNiYLTtIG_3maENVQlDwx0zk/edit#heading=h.peefxb3julf3 *Below you can see the chart for this week's resolved issues by cause of failure:* *Code* = regression of 
working components/functionalities *Infra* = infrastructure/OST Infrastructure/Lago related issues/Power outages *OST Tests* - package related issues, failed build artifacts *Below is a chart of resolved failures based on ovirt version* *Below is a chart showing failures by suite type: * Thanks, Dafna -------------- next part -------------- An HTML attachment was scrubbed... URL: [The eight image.png chart attachments referenced above were scrubbed from the archive.] From mperina at redhat.com Fri Apr 6 16:15:35 2018 From: mperina at redhat.com (Martin Perina) Date: Fri, 6 Apr 2018 18:15:35 +0200 Subject: [ovirt-devel] [ OST Failure Report ] [ oVirt Master (ovirt-engine) ] [ 06-04-2018 ] [ 006_migrations.migrate_vm ] In-Reply-To: References: Message-ID: On Fri, Apr 6, 2018 at 6:07 PM, Martin Perina wrote: > Adding Ravi > > On Fri, Apr 6, 2018 at 2:20 PM, Dafna Ron wrote: > >> hi, >> >> 006_migrations.migrate_vm failure points to the below patch. There seems >> to be an issue to migrate vms but I am not sure its related to this patch. >> >> can you please check? >> >> >> *Link and headline of suspected patches: aaa: can't switch user when >> accessing the engine with an active kerberos ticket - >> https://gerrit.ovirt.org/#/c/89872/ * >> > The issue is definitely not related to the above patch; according to the logs, the admin at internal user was successfully logged into the engine and performed several tasks, like adding hosts. Adding Michal as the issue is around VM migrations. 
> >> *Link to Job:* >> >> * http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/6704/ >> Link >> to all logs:* http://jenkins.ovirt.org/job/o >> virt-master_change-queue-tester/6704/artifact/exported-artif >> acts/basic-suit-master-el7/test_logs/basic-suite-master/ >> post-006_migrations.py/ >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> *(Relevant) error snippet from the log: 2018-04-06 >> 04:07:56,517-0400 ERROR (qgapoller/2) [Executor] Unhandled exception in >> > vm=aad106a1-9043-4661-bb9b-9894f66b506b at 0x7fc4681a2590> timeout=10, >> duration=0.00 at 0x7fc4541193d0> (executor:317)Traceback (most recent call >> last): File "/usr/lib/python2.7/site-packages/vdsm/executor.py", line 315, >> in _execute_task task() File >> "/usr/lib/python2.7/site-packages/vdsm/executor.py", line 391, in >> __call__ self._callable() File >> "/usr/lib/python2.7/site-packages/vdsm/virt/periodic.py", line 314, in >> __call__ self._execute() File >> "/usr/lib/python2.7/site-packages/vdsm/virt/qemuguestagent.py", line 388, >> in _execute for ifname, ifparams in six.iteritems(interfaces): File >> "/usr/lib/python2.7/site-packages/six.py", line 599, in iteritems return >> d.iteritems(**kw)AttributeError: 'list' object has no attribute 'iteritems'*2018-04-06 >> 04:23:10,165-0400 ERROR (migsrc/aad106a1) [virt.vm] >> (vmId='aad106a1-9043-4661-bb9b-9894f66b506b') migration destination >> error: Virtual machine already exists (migration:290) >> >> >> ** >> > > > > -- > Martin Perina > Associate Manager, Software Engineering > Red Hat Czech s.r.o. > -- Martin Perina Associate Manager, Software Engineering Red Hat Czech s.r.o. -------------- next part -------------- An HTML attachment was scrubbed... 
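The periodic-poller traceback quoted above fails in vdsm's qemuguestagent.py because `six.iteritems(interfaces)` assumes a mapping: `six.iteritems(d)` just dispatches to `d.iteritems()` on Python 2 (`d.items()` on Python 3), so a guest-agent reply that deserializes `interfaces` as a *list* of per-interface entries instead of a name-to-params dict raises exactly the `AttributeError` in the log. A minimal, self-contained illustration of the failure mode and a defensive normalization (hypothetical, not vdsm's actual fix):

```python
def iteritems(d, **kw):
    """Mimic six.iteritems: dispatch to the dict protocol (Python 3 spelling)."""
    return iter(d.items(**kw))  # a list has no .items(), so this raises AttributeError


def normalize_interfaces(interfaces):
    """Accept either {name: params} or a list of per-interface dicts."""
    if isinstance(interfaces, dict):
        return interfaces
    # Some replies report a list of entries; key them by interface name.
    return {entry["name"]: entry for entry in interfaces}


reply = [{"name": "eth0", "inet": ["192.0.2.10"]}]  # a list, not a dict

try:
    iteritems(reply)
except AttributeError as exc:
    print("unguarded: %s" % exc)  # the same class of error as in the vdsm log

for ifname, ifparams in iteritems(normalize_interfaces(reply)):
    print(ifname, ifparams["inet"])
```

The guard only papers over the symptom; the real question in the thread is why the agent reply changed shape in the first place.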
From ykaul at redhat.com Fri Apr 6 17:41:41 2018
From: ykaul at redhat.com (Yaniv Kaul)
Date: Fri, 06 Apr 2018 17:41:41 +0000
Subject: [ovirt-devel] OST Failure - Weekly update [30/03/2018-06/04/2018]
In-Reply-To: References: Message-ID:

On Fri, Apr 6, 2018, 7:11 PM Dafna Ron wrote:

> Hello,
>
> I would like to update on this week's failures and OST current status.
>
> We had a few failures this week and some are still ongoing issues.
>
> We had two unrelated issues failing the test:
> 006_migrations.prepare_migration_attachments_ipv6
> This failure was periodic and is causing failures for several changes.
> This issue is still ongoing and seems to be pointing to this change as
> the root cause:
> https://gerrit.ovirt.org/#/c/89812/1 - examples: export template to an
> export domain

There is no way the above is the root cause for anything. It's an example script.

> Issue is still under investigation.

I've disabled that test.
Y.

> On Thursday evening and Friday we seem to have had failures which indicate
> issues with migration and ipv6 configuration.
> The tests that failed were 006_migrations.migrate_vm and
> 004_basic_sanity.hotplug_cpu
>
> cpu hotplug failure was reporting these changes:
>
> Failed patch:
> virt: storage: minimal changes to the drive XML -
> https://gerrit.ovirt.org/#/c/89508/
>
> Reported as failure cause:
> virt: extract local adjustments to XML -
> https://gerrit.ovirt.org/#/c/89506/
>
> and migration failure was reporting this patch:
>
> *aaa: can't switch user when accessing the engine with an active kerberos
> ticket - https://gerrit.ovirt.org/#/c/89872/ *
>
> *These three issues are still ongoing and you can see the cases resolved
> vs the backlog which is carried forward to next week.*
>
> *Infra/OST tests related issues:*
>
> We also had a failure which seems to be related to the OST test and was
> fixed in this patch:
> https://gerrit.ovirt.org/#/c/89781/ - Prevent network to be imported by
> auto_sync
>
> Further discussion on this can be found in the Google doc opened by
> Dominik:
> https://docs.google.com/document/d/1VeKfI7luw-HHTCCmfogLNiYLTtIG_3maENVQlDwx0zk/edit#heading=h.peefxb3julf3
>
> *Below you can see the chart for this week's resolved issues by cause of
> failure:*
> *Code* = regression of working components/functionalities
> *Infra* = infrastructure/OST Infrastructure/Lago related issues/Power
> outages
> *OST Tests* = package-related issues, failed build artifacts
>
> *Below is a chart of resolved failures based on ovirt version*
>
> *Below is a chart showing failures by suite type:*
>
> Thanks,
> Dafna
> _______________________________________________
> Devel mailing list
> Devel at ovirt.org
> http://lists.ovirt.org/mailman/listinfo/devel
-------------- next part --------------
An HTML attachment was scrubbed...
Name: image.png Type: image/png Size: 26005 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 30468 bytes Desc: not available URL: From danken at redhat.com Fri Apr 6 21:30:31 2018 From: danken at redhat.com (Dan Kenigsberg) Date: Sat, 7 Apr 2018 00:30:31 +0300 Subject: [ovirt-devel] [ OST Failure Report ] [ oVirt 4.2 ] [ 2018-04-04 ] [006_migrations.prepare_migration_attachments_ipv6] In-Reply-To: References: Message-ID: No, I am afraid that we have not managed to understand why setting and ipv6 address too the host off the grid. We shall continue researching this next week. Edy, https://gerrit.ovirt.org/#/c/88637/ is already 4 weeks old, but could it possibly be related (I really doubt that)? On Fri, Apr 6, 2018 at 2:20 PM, Dafna Ron wrote: > Dan, was there a fix for the issues? > can I have a link to the fix if there was? > > Thanks, > Dafna > > > On Wed, Apr 4, 2018 at 5:01 PM, Gal Ben Haim wrote: >> >> From lago's log, I see that lago collected the logs from the VMs using ssh >> (after the test failed), which means >> that the VM didn't crash. >> >> On Wed, Apr 4, 2018 at 5:27 PM, Dan Kenigsberg wrote: >>> >>> On Wed, Apr 4, 2018 at 4:59 PM, Barak Korren wrote: >>> > Test failed: [ 006_migrations.prepare_migration_attachments_ipv6 ] >>> > >>> > Link to suspected patches: >>> > (Probably unrelated) >>> > https://gerrit.ovirt.org/#/c/89812/1 (ovirt-engine-sdk) - examples: >>> > export template to an export domain >>> > >>> > This seems to happen multiple times sporadically, I thought this would >>> > be solved by >>> > https://gerrit.ovirt.org/#/c/89781/ but it isn't. >>> >>> right, it is a completely unrelated issue there (with external networks). >>> here, however, the host dies while setting setupNetworks of an ipv6 >>> address. 
Setup network waits for Engine's confirmation at 08:33:00,711 >>> >>> http://jenkins.ovirt.org/job/ovirt-4.2_change-queue-tester/1537/artifact/exported-artifacts/basic-suit-4.2-el7/test_logs/basic-suite-4.2/post-006_migrations.py/lago-basic-suite-4-2-host-0/_var_log/vdsm/supervdsm.log >>> but kernel messages stop at 08:33:23 >>> >>> http://jenkins.ovirt.org/job/ovirt-4.2_change-queue-tester/1537/artifact/exported-artifacts/basic-suit-4.2-el7/test_logs/basic-suite-4.2/post-006_migrations.py/lago-basic-suite-4-2-host-0/_var_log/messages/*view*/ >>> >>> Does the lago VM of this host crash? pause? >>> >>> >>> > >>> > Link to Job: >>> > http://jenkins.ovirt.org/job/ovirt-4.2_change-queue-tester/1537/ >>> > >>> > Link to all logs: >>> > >>> > http://jenkins.ovirt.org/job/ovirt-4.2_change-queue-tester/1537/artifact/exported-artifacts/basic-suit-4.2-el7/test_logs/basic-suite-4.2/post-006_migrations.py/ >>> > >>> > Error snippet from log: >>> > >>> > >>> > >>> > Traceback (most recent call last): >>> > File "/usr/lib64/python2.7/unittest/case.py", line 369, in run >>> > testMethod() >>> > File "/usr/lib/python2.7/site-packages/nose/case.py", line 197, in >>> > runTest >>> > self.test(*self.arg) >>> > File "/usr/lib/python2.7/site-packages/ovirtlago/testlib.py", line >>> > 129, in wrapped_test >>> > test() >>> > File "/usr/lib/python2.7/site-packages/ovirtlago/testlib.py", line >>> > 59, in wrapper >>> > return func(get_test_prefix(), *args, **kwargs) >>> > File "/usr/lib/python2.7/site-packages/ovirtlago/testlib.py", line >>> > 78, in wrapper >>> > prefix.virt_env.engine_vm().get_api(api_ver=4), *args, **kwargs >>> > File >>> > "/home/jenkins/workspace/ovirt-4.2_change-queue-tester/ovirt-system-tests/basic-suite-4.2/test-scenarios/006_migrations.py", >>> > line 139, in prepare_migration_attachments_ipv6 >>> > engine, host_service, MIGRATION_NETWORK, ip_configuration) >>> > File >>> > 
"/home/jenkins/workspace/ovirt-4.2_change-queue-tester/ovirt-system-tests/basic-suite-4.2/test_utils/network_utils_v4.py", >>> > line 71, in modify_ip_config >>> > check_connectivity=True) >>> > File "/usr/lib64/python2.7/site-packages/ovirtsdk4/services.py", >>> > line 36729, in setup_networks >>> > return self._internal_action(action, 'setupnetworks', None, >>> > headers, query, wait) >>> > File "/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py", line >>> > 299, in _internal_action >>> > return future.wait() if wait else future >>> > File "/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py", line >>> > 55, in wait >>> > return self._code(response) >>> > File "/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py", line >>> > 296, in callback >>> > self._check_fault(response) >>> > File "/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py", line >>> > 132, in _check_fault >>> > self._raise_error(response, body) >>> > File "/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py", line >>> > 118, in _raise_error >>> > raise error >>> > Error: Fault reason is "Operation Failed". Fault detail is "[Network >>> > error during communication with the Host.]". HTTP response code is >>> > 400. >>> > >>> > >>> > >>> > >>> > >>> > >>> > >>> > -- >>> > Barak Korren >>> > RHV DevOps team , RHCE, RHCi >>> > Red Hat EMEA >>> > redhat.com | TRIED. TESTED. TRUSTED. 
| redhat.com/trusted >>> > _______________________________________________ >>> > Devel mailing list >>> > Devel at ovirt.org >>> > http://lists.ovirt.org/mailman/listinfo/devel >>> > >>> > >>> _______________________________________________ >>> Devel mailing list >>> Devel at ovirt.org >>> http://lists.ovirt.org/mailman/listinfo/devel >> >> >> >> >> -- >> GAL bEN HAIM >> RHV DEVOPS >> >> _______________________________________________ >> Devel mailing list >> Devel at ovirt.org >> http://lists.ovirt.org/mailman/listinfo/devel > > From sabose at redhat.com Sat Apr 7 04:19:50 2018 From: sabose at redhat.com (Sahina Bose) Date: Sat, 7 Apr 2018 09:49:50 +0530 Subject: [ovirt-devel] [OST][HC] HE fails to deploy In-Reply-To: References: Message-ID: On Fri, Apr 6, 2018 at 1:10 PM, Simone Tiraboschi wrote: > > > On Fri, Apr 6, 2018 at 9:28 AM, Sahina Bose wrote: > >> 2018-04-05 20:46:52,773-0400 INFO otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:100 TASK [Get local VM IP] >> 2018-04-05 20:55:28,217-0400 DEBUG otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:94 {u'_ansible_parsed': True, u'stderr_lines': [], u'cmd': u"virsh -r net-dhcp-leases default | grep -i 00:16:3e:24:d3:63 | awk '{ print $5 }' | cut -f1 -d'/'", u'end': u'2018-04-05 20:55:28.046320', u'_ansible_no_log': False, u'stdout': u'', u'changed': True, u'invocation': {u'module_args': {u'warn': True, u'executable': None, u'_uses_shell': True, u'_raw_params': u"virsh -r net-dhcp-leases default | grep -i 00:16:3e:24:d3:63 | awk '{ print $5 }' | cut -f1 -d'/'", u'removes': None, u'creates': None, u'chdir': None, u'stdin': None}}, u'start': u'2018-04-05 20:55:28.000470', u'attempts': 50, u'stderr': u'', u'rc': 0, u'delta': u'0:00:00.045850', u'stdout_lines': []} >> 2018-04-05 20:55:28,318-0400 ERROR otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:98 fatal: [localhost]: FAILED! 
=> {"attempts": 50, "changed": true, "cmd": "virsh -r net-dhcp-leases default | grep -i 00:16:3e:24:d3:63 | awk '{ print $5 }' | cut -f1 -d'/'", "delta": "0:00:00.045850", "end": "2018-04-05 20:55:28.046320", "rc": 0, "start": "2018-04-05 20:55:28.000470", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}
>>
>> Both the 4.2 and master suites are failing on getting the local VM IP.
>> Any idea what changed or if I have to change the test?
>>
>> thanks!
>
> Hi Sahina,
> The 4.2 and master non-HC suites are running correctly this morning:
> http://jenkins.ovirt.org/view/oVirt%20system%20tests/job/ovirt-system-tests_he-basic-ansible-suite-master/146/
> http://jenkins.ovirt.org/view/oVirt%20system%20tests/job/ovirt-system-tests_he-basic-ansible-suite-4.2/76/
>
> I'll try to check the difference with the HC suites.
>
> Are you using more than one subnet in the HC suites?

No, I'm not. And we haven't changed anything related to network in the test suite.

From bkorren at redhat.com Sun Apr 8 05:59:21 2018
From: bkorren at redhat.com (Barak Korren)
Date: Sun, 8 Apr 2018 08:59:21 +0300
Subject: [ovirt-devel] [ OST Failure Report ] [ oVirt 4.2 ] [ 2018-04-04 ] [006_migrations.prepare_migration_attachments_ipv6]
In-Reply-To: References: Message-ID:

On 7 April 2018 at 00:30, Dan Kenigsberg wrote:
> No, I am afraid that we have not managed to understand why setting an
> ipv6 address took the host off the grid. We shall continue researching
> this next week.
>
> Edy, https://gerrit.ovirt.org/#/c/88637/ is already 4 weeks old, but
> could it possibly be related (I really doubt that)?

At this point I think we should seriously consider disabling the relevant test, as it's impacting a large number of changes.

> On Fri, Apr 6, 2018 at 2:20 PM, Dafna Ron wrote:
>> Dan, was there a fix for the issues?
>> can I have a link to the fix if there was?
>> >> Thanks, >> Dafna >> >> >> On Wed, Apr 4, 2018 at 5:01 PM, Gal Ben Haim wrote: >>> >>> From lago's log, I see that lago collected the logs from the VMs using ssh >>> (after the test failed), which means >>> that the VM didn't crash. >>> >>> On Wed, Apr 4, 2018 at 5:27 PM, Dan Kenigsberg wrote: >>>> >>>> On Wed, Apr 4, 2018 at 4:59 PM, Barak Korren wrote: >>>> > Test failed: [ 006_migrations.prepare_migration_attachments_ipv6 ] >>>> > >>>> > Link to suspected patches: >>>> > (Probably unrelated) >>>> > https://gerrit.ovirt.org/#/c/89812/1 (ovirt-engine-sdk) - examples: >>>> > export template to an export domain >>>> > >>>> > This seems to happen multiple times sporadically, I thought this would >>>> > be solved by >>>> > https://gerrit.ovirt.org/#/c/89781/ but it isn't. >>>> >>>> right, it is a completely unrelated issue there (with external networks). >>>> here, however, the host dies while setting setupNetworks of an ipv6 >>>> address. Setup network waits for Engine's confirmation at 08:33:00,711 >>>> >>>> http://jenkins.ovirt.org/job/ovirt-4.2_change-queue-tester/1537/artifact/exported-artifacts/basic-suit-4.2-el7/test_logs/basic-suite-4.2/post-006_migrations.py/lago-basic-suite-4-2-host-0/_var_log/vdsm/supervdsm.log >>>> but kernel messages stop at 08:33:23 >>>> >>>> http://jenkins.ovirt.org/job/ovirt-4.2_change-queue-tester/1537/artifact/exported-artifacts/basic-suit-4.2-el7/test_logs/basic-suite-4.2/post-006_migrations.py/lago-basic-suite-4-2-host-0/_var_log/messages/*view*/ >>>> >>>> Does the lago VM of this host crash? pause? 
>>>> >>>> >>>> > >>>> > Link to Job: >>>> > http://jenkins.ovirt.org/job/ovirt-4.2_change-queue-tester/1537/ >>>> > >>>> > Link to all logs: >>>> > >>>> > http://jenkins.ovirt.org/job/ovirt-4.2_change-queue-tester/1537/artifact/exported-artifacts/basic-suit-4.2-el7/test_logs/basic-suite-4.2/post-006_migrations.py/ >>>> > >>>> > Error snippet from log: >>>> > >>>> > >>>> > >>>> > Traceback (most recent call last): >>>> > File "/usr/lib64/python2.7/unittest/case.py", line 369, in run >>>> > testMethod() >>>> > File "/usr/lib/python2.7/site-packages/nose/case.py", line 197, in >>>> > runTest >>>> > self.test(*self.arg) >>>> > File "/usr/lib/python2.7/site-packages/ovirtlago/testlib.py", line >>>> > 129, in wrapped_test >>>> > test() >>>> > File "/usr/lib/python2.7/site-packages/ovirtlago/testlib.py", line >>>> > 59, in wrapper >>>> > return func(get_test_prefix(), *args, **kwargs) >>>> > File "/usr/lib/python2.7/site-packages/ovirtlago/testlib.py", line >>>> > 78, in wrapper >>>> > prefix.virt_env.engine_vm().get_api(api_ver=4), *args, **kwargs >>>> > File >>>> > "/home/jenkins/workspace/ovirt-4.2_change-queue-tester/ovirt-system-tests/basic-suite-4.2/test-scenarios/006_migrations.py", >>>> > line 139, in prepare_migration_attachments_ipv6 >>>> > engine, host_service, MIGRATION_NETWORK, ip_configuration) >>>> > File >>>> > "/home/jenkins/workspace/ovirt-4.2_change-queue-tester/ovirt-system-tests/basic-suite-4.2/test_utils/network_utils_v4.py", >>>> > line 71, in modify_ip_config >>>> > check_connectivity=True) >>>> > File "/usr/lib64/python2.7/site-packages/ovirtsdk4/services.py", >>>> > line 36729, in setup_networks >>>> > return self._internal_action(action, 'setupnetworks', None, >>>> > headers, query, wait) >>>> > File "/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py", line >>>> > 299, in _internal_action >>>> > return future.wait() if wait else future >>>> > File "/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py", line >>>> > 55, in wait >>>> > 
return self._code(response) >>>> > File "/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py", line >>>> > 296, in callback >>>> > self._check_fault(response) >>>> > File "/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py", line >>>> > 132, in _check_fault >>>> > self._raise_error(response, body) >>>> > File "/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py", line >>>> > 118, in _raise_error >>>> > raise error >>>> > Error: Fault reason is "Operation Failed". Fault detail is "[Network >>>> > error during communication with the Host.]". HTTP response code is >>>> > 400. >>>> > >>>> > >>>> > >>>> > >>>> > >>>> > >>>> > >>>> > -- >>>> > Barak Korren >>>> > RHV DevOps team , RHCE, RHCi >>>> > Red Hat EMEA >>>> > redhat.com | TRIED. TESTED. TRUSTED. | redhat.com/trusted >>>> > _______________________________________________ >>>> > Devel mailing list >>>> > Devel at ovirt.org >>>> > http://lists.ovirt.org/mailman/listinfo/devel >>>> > >>>> > >>>> _______________________________________________ >>>> Devel mailing list >>>> Devel at ovirt.org >>>> http://lists.ovirt.org/mailman/listinfo/devel >>> >>> >>> >>> >>> -- >>> GAL bEN HAIM >>> RHV DEVOPS >>> >>> _______________________________________________ >>> Devel mailing list >>> Devel at ovirt.org >>> http://lists.ovirt.org/mailman/listinfo/devel >> >> > _______________________________________________ > Devel mailing list > Devel at ovirt.org > http://lists.ovirt.org/mailman/listinfo/devel -- Barak Korren RHV DevOps team , RHCE, RHCi Red Hat EMEA redhat.com | TRIED. TESTED. TRUSTED. | redhat.com/trusted From eedri at redhat.com Sun Apr 8 06:15:12 2018 From: eedri at redhat.com (Eyal Edri) Date: Sun, 8 Apr 2018 09:15:12 +0300 Subject: [ovirt-devel] [ OST Failure Report ] [ oVirt 4.2 ] [ 2018-04-04 ] [006_migrations.prepare_migration_attachments_ipv6] In-Reply-To: References: Message-ID: Was already done by Yaniv - https://gerrit.ovirt.org/#/c/89851. Is it still failing? 
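Sporadic failures like this one are why OST wraps test steps in retry helpers (the testlib.py wrappers visible in the tracebacks serve a similar role). A generic sketch of such a guard, retrying a flaky callable with exponential backoff before giving up; this is illustrative only, not OST's actual implementation:

```python
import time


def with_retries(func, attempts=3, delay=0.01, backoff=2.0, exceptions=(Exception,)):
    """Call func(); on failure, sleep and retry with exponential backoff."""
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except exceptions:
            if attempt == attempts:
                raise  # out of retries: surface the real error
            time.sleep(delay)
            delay *= backoff


calls = {"n": 0}


def flaky_setup_networks():
    """Hypothetical stand-in for an operation that fails intermittently."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("Network error during communication with the Host.")
    return "ok"


print(with_retries(flaky_setup_networks))  # succeeds on the third attempt: ok
```

Retries hide flakiness rather than fix it, which is the tension in this thread: disabling or retrying the test unblocks the change queue, but the underlying IPv6 setupNetworks problem still needs a root cause.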
On Sun, Apr 8, 2018 at 8:59 AM, Barak Korren wrote: > On 7 April 2018 at 00:30, Dan Kenigsberg wrote: > > No, I am afraid that we have not managed to understand why setting and > > ipv6 address too the host off the grid. We shall continue researching > > this next week. > > > > Edy, https://gerrit.ovirt.org/#/c/88637/ is already 4 weeks old, but > > could it possibly be related (I really doubt that)? > > > > at this point I think we should seriously consider disabling the > relevant test, as its impacting a large number of changes. > > > On Fri, Apr 6, 2018 at 2:20 PM, Dafna Ron wrote: > >> Dan, was there a fix for the issues? > >> can I have a link to the fix if there was? > >> > >> Thanks, > >> Dafna > >> > >> > >> On Wed, Apr 4, 2018 at 5:01 PM, Gal Ben Haim > wrote: > >>> > >>> From lago's log, I see that lago collected the logs from the VMs using > ssh > >>> (after the test failed), which means > >>> that the VM didn't crash. > >>> > >>> On Wed, Apr 4, 2018 at 5:27 PM, Dan Kenigsberg > wrote: > >>>> > >>>> On Wed, Apr 4, 2018 at 4:59 PM, Barak Korren > wrote: > >>>> > Test failed: [ 006_migrations.prepare_migration_attachments_ipv6 ] > >>>> > > >>>> > Link to suspected patches: > >>>> > (Probably unrelated) > >>>> > https://gerrit.ovirt.org/#/c/89812/1 (ovirt-engine-sdk) - examples: > >>>> > export template to an export domain > >>>> > > >>>> > This seems to happen multiple times sporadically, I thought this > would > >>>> > be solved by > >>>> > https://gerrit.ovirt.org/#/c/89781/ but it isn't. > >>>> > >>>> right, it is a completely unrelated issue there (with external > networks). > >>>> here, however, the host dies while setting setupNetworks of an ipv6 > >>>> address. 
Setup network waits for Engine's confirmation at 08:33:00,711 > >>>> > >>>> http://jenkins.ovirt.org/job/ovirt-4.2_change-queue-tester/ > 1537/artifact/exported-artifacts/basic-suit-4.2-el7/ > test_logs/basic-suite-4.2/post-006_migrations.py/lago- > basic-suite-4-2-host-0/_var_log/vdsm/supervdsm.log > >>>> but kernel messages stop at 08:33:23 > >>>> > >>>> http://jenkins.ovirt.org/job/ovirt-4.2_change-queue-tester/ > 1537/artifact/exported-artifacts/basic-suit-4.2-el7/ > test_logs/basic-suite-4.2/post-006_migrations.py/lago- > basic-suite-4-2-host-0/_var_log/messages/*view*/ > >>>> > >>>> Does the lago VM of this host crash? pause? > >>>> > >>>> > >>>> > > >>>> > Link to Job: > >>>> > http://jenkins.ovirt.org/job/ovirt-4.2_change-queue-tester/1537/ > >>>> > > >>>> > Link to all logs: > >>>> > > >>>> > http://jenkins.ovirt.org/job/ovirt-4.2_change-queue-tester/ > 1537/artifact/exported-artifacts/basic-suit-4.2-el7/ > test_logs/basic-suite-4.2/post-006_migrations.py/ > >>>> > > >>>> > Error snippet from log: > >>>> > > >>>> > > >>>> > > >>>> > Traceback (most recent call last): > >>>> > File "/usr/lib64/python2.7/unittest/case.py", line 369, in run > >>>> > testMethod() > >>>> > File "/usr/lib/python2.7/site-packages/nose/case.py", line 197, > in > >>>> > runTest > >>>> > self.test(*self.arg) > >>>> > File "/usr/lib/python2.7/site-packages/ovirtlago/testlib.py", > line > >>>> > 129, in wrapped_test > >>>> > test() > >>>> > File "/usr/lib/python2.7/site-packages/ovirtlago/testlib.py", > line > >>>> > 59, in wrapper > >>>> > return func(get_test_prefix(), *args, **kwargs) > >>>> > File "/usr/lib/python2.7/site-packages/ovirtlago/testlib.py", > line > >>>> > 78, in wrapper > >>>> > prefix.virt_env.engine_vm().get_api(api_ver=4), *args, **kwargs > >>>> > File > >>>> > "/home/jenkins/workspace/ovirt-4.2_change-queue-tester/ > ovirt-system-tests/basic-suite-4.2/test-scenarios/006_migrations.py", > >>>> > line 139, in prepare_migration_attachments_ipv6 > >>>> > engine, 
host_service, MIGRATION_NETWORK, ip_configuration) > >>>> > File > >>>> > "/home/jenkins/workspace/ovirt-4.2_change-queue-tester/ > ovirt-system-tests/basic-suite-4.2/test_utils/network_utils_v4.py", > >>>> > line 71, in modify_ip_config > >>>> > check_connectivity=True) > >>>> > File "/usr/lib64/python2.7/site-packages/ovirtsdk4/services.py", > >>>> > line 36729, in setup_networks > >>>> > return self._internal_action(action, 'setupnetworks', None, > >>>> > headers, query, wait) > >>>> > File "/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py", > line > >>>> > 299, in _internal_action > >>>> > return future.wait() if wait else future > >>>> > File "/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py", > line > >>>> > 55, in wait > >>>> > return self._code(response) > >>>> > File "/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py", > line > >>>> > 296, in callback > >>>> > self._check_fault(response) > >>>> > File "/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py", > line > >>>> > 132, in _check_fault > >>>> > self._raise_error(response, body) > >>>> > File "/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py", > line > >>>> > 118, in _raise_error > >>>> > raise error > >>>> > Error: Fault reason is "Operation Failed". Fault detail is "[Network > >>>> > error during communication with the Host.]". HTTP response code is > >>>> > 400. > >>>> > > >>>> > > >>>> > > >>>> > > >>>> > > >>>> > > >>>> > > >>>> > -- > >>>> > Barak Korren > >>>> > RHV DevOps team , RHCE, RHCi > >>>> > Red Hat EMEA > >>>> > redhat.com | TRIED. TESTED. TRUSTED. 
| redhat.com/trusted > >>>> > _______________________________________________ > >>>> > Devel mailing list > >>>> > Devel at ovirt.org > >>>> > http://lists.ovirt.org/mailman/listinfo/devel > >>>> > > >>>> > > >>>> _______________________________________________ > >>>> Devel mailing list > >>>> Devel at ovirt.org > >>>> http://lists.ovirt.org/mailman/listinfo/devel > >>> > >>> > >>> > >>> > >>> -- > >>> GAL bEN HAIM > >>> RHV DEVOPS > >>> > >>> _______________________________________________ > >>> Devel mailing list > >>> Devel at ovirt.org > >>> http://lists.ovirt.org/mailman/listinfo/devel > >> > >> > > _______________________________________________ > > Devel mailing list > > Devel at ovirt.org > > http://lists.ovirt.org/mailman/listinfo/devel > > > > -- > Barak Korren > RHV DevOps team , RHCE, RHCi > Red Hat EMEA > redhat.com | TRIED. TESTED. TRUSTED. | redhat.com/trusted > _______________________________________________ > Devel mailing list > Devel at ovirt.org > http://lists.ovirt.org/mailman/listinfo/devel > -- Eyal edri MANAGER RHV DevOps EMEA VIRTUALIZATION R&D Red Hat EMEA TRIED. TESTED. TRUSTED. phone: +972-9-7692018 irc: eedri (on #tlv #rhev-dev #rhev-integ) -------------- next part -------------- An HTML attachment was scrubbed... URL: From ehaas at redhat.com Sun Apr 8 06:21:34 2018 From: ehaas at redhat.com (Edward Haas) Date: Sun, 8 Apr 2018 09:21:34 +0300 Subject: [ovirt-devel] [ OST Failure Report ] [ oVirt 4.2 ] [ 2018-04-04 ] [006_migrations.prepare_migration_attachments_ipv6] In-Reply-To: References: Message-ID: On Sun, Apr 8, 2018 at 9:15 AM, Eyal Edri wrote: > Was already done by Yaniv - https://gerrit.ovirt.org/#/c/89851. > Is it still failing? > > On Sun, Apr 8, 2018 at 8:59 AM, Barak Korren wrote: > >> On 7 April 2018 at 00:30, Dan Kenigsberg wrote: >> > No, I am afraid that we have not managed to understand why setting and >> > ipv6 address too the host off the grid. We shall continue researching >> > this next week. 
>> > >> > Edy, https://gerrit.ovirt.org/#/c/88637/ is already 4 weeks old, but >> > could it possibly be related (I really doubt that)? >> > >> > Sorry, but I do not see how this problem is related to VDSM. There is nothing that indicates that there is a VDSM problem. Has the RPC connection between Engine and VDSM failed? > >> at this point I think we should seriously consider disabling the >> relevant test, as its impacting a large number of changes. >> >> > On Fri, Apr 6, 2018 at 2:20 PM, Dafna Ron wrote: >> >> Dan, was there a fix for the issues? >> >> can I have a link to the fix if there was? >> >> >> >> Thanks, >> >> Dafna >> >> >> >> >> >> On Wed, Apr 4, 2018 at 5:01 PM, Gal Ben Haim >> wrote: >> >>> >> >>> From lago's log, I see that lago collected the logs from the VMs >> using ssh >> >>> (after the test failed), which means >> >>> that the VM didn't crash. >> >>> >> >>> On Wed, Apr 4, 2018 at 5:27 PM, Dan Kenigsberg >> wrote: >> >>>> >> >>>> On Wed, Apr 4, 2018 at 4:59 PM, Barak Korren >> wrote: >> >>>> > Test failed: [ 006_migrations.prepare_migration_attachments_ipv6 ] >> >>>> > >> >>>> > Link to suspected patches: >> >>>> > (Probably unrelated) >> >>>> > https://gerrit.ovirt.org/#/c/89812/1 (ovirt-engine-sdk) - >> examples: >> >>>> > export template to an export domain >> >>>> > >> >>>> > This seems to happen multiple times sporadically, I thought this >> would >> >>>> > be solved by >> >>>> > https://gerrit.ovirt.org/#/c/89781/ but it isn't. >> >>>> >> >>>> right, it is a completely unrelated issue there (with external >> networks). >> >>>> here, however, the host dies while setting setupNetworks of an ipv6 >> >>>> address. 
Setup network waits for Engine's confirmation at >> 08:33:00,711 >> >>>> >> >>>> http://jenkins.ovirt.org/job/ovirt-4.2_change-queue-tester/1 >> 537/artifact/exported-artifacts/basic-suit-4.2-el7/test_ >> logs/basic-suite-4.2/post-006_migrations.py/lago-basic- >> suite-4-2-host-0/_var_log/vdsm/supervdsm.log >> >>>> but kernel messages stop at 08:33:23 >> >>>> >> >>>> http://jenkins.ovirt.org/job/ovirt-4.2_change-queue-tester/1 >> 537/artifact/exported-artifacts/basic-suit-4.2-el7/test_ >> logs/basic-suite-4.2/post-006_migrations.py/lago-basic- >> suite-4-2-host-0/_var_log/messages/*view*/ >> >>>> >> >>>> Does the lago VM of this host crash? pause? >> >>>> >> >>>> >> >>>> > >> >>>> > Link to Job: >> >>>> > http://jenkins.ovirt.org/job/ovirt-4.2_change-queue-tester/1537/ >> >>>> > >> >>>> > Link to all logs: >> >>>> > >> >>>> > http://jenkins.ovirt.org/job/ovirt-4.2_change-queue-tester/1 >> 537/artifact/exported-artifacts/basic-suit-4.2-el7/test_ >> logs/basic-suite-4.2/post-006_migrations.py/ >> >>>> > >> >>>> > Error snippet from log: >> >>>> > >> >>>> > >> >>>> > >> >>>> > Traceback (most recent call last): >> >>>> > File "/usr/lib64/python2.7/unittest/case.py", line 369, in run >> >>>> > testMethod() >> >>>> > File "/usr/lib/python2.7/site-packages/nose/case.py", line 197, >> in >> >>>> > runTest >> >>>> > self.test(*self.arg) >> >>>> > File "/usr/lib/python2.7/site-packages/ovirtlago/testlib.py", >> line >> >>>> > 129, in wrapped_test >> >>>> > test() >> >>>> > File "/usr/lib/python2.7/site-packages/ovirtlago/testlib.py", >> line >> >>>> > 59, in wrapper >> >>>> > return func(get_test_prefix(), *args, **kwargs) >> >>>> > File "/usr/lib/python2.7/site-packages/ovirtlago/testlib.py", >> line >> >>>> > 78, in wrapper >> >>>> > prefix.virt_env.engine_vm().get_api(api_ver=4), *args, >> **kwargs >> >>>> > File >> >>>> > "/home/jenkins/workspace/ovirt-4.2_change-queue-tester/ovirt >> -system-tests/basic-suite-4.2/test-scenarios/006_migrations.py", >> >>>> > line 139, 
in prepare_migration_attachments_ipv6 >> >>>> > engine, host_service, MIGRATION_NETWORK, ip_configuration) >> >>>> > File >> >>>> > "/home/jenkins/workspace/ovirt-4.2_change-queue-tester/ovirt-system-tests/basic-suite-4.2/test_utils/network_utils_v4.py", >> >>>> > line 71, in modify_ip_config >> >>>> > check_connectivity=True) >> >>>> > File "/usr/lib64/python2.7/site-packages/ovirtsdk4/services.py", >> >>>> > line 36729, in setup_networks >> >>>> > return self._internal_action(action, 'setupnetworks', None, >> >>>> > headers, query, wait) >> >>>> > File "/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py", >> line >> >>>> > 299, in _internal_action >> >>>> > return future.wait() if wait else future >> >>>> > File "/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py", >> line >> >>>> > 55, in wait >> >>>> > return self._code(response) >> >>>> > File "/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py", >> line >> >>>> > 296, in callback >> >>>> > self._check_fault(response) >> >>>> > File "/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py", >> line >> >>>> > 132, in _check_fault >> >>>> > self._raise_error(response, body) >> >>>> > File "/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py", >> line >> >>>> > 118, in _raise_error >> >>>> > raise error >> >>>> > Error: Fault reason is "Operation Failed". Fault detail is >> "[Network >> >>>> > error during communication with the Host.]". HTTP response code is >> >>>> > 400. >> >>>> > >> >>>> > >> >>>> > >> >>>> > >> >>>> > >> >>>> > >> >>>> > >> >>>> > -- >> >>>> > Barak Korren >> >>>> > RHV DevOps team , RHCE, RHCi >> >>>> > Red Hat EMEA >> >>>> > redhat.com | TRIED. TESTED. TRUSTED.
| redhat.com/trusted >> >>>> > _______________________________________________ >> >>>> > Devel mailing list >> >>>> > Devel at ovirt.org >> >>>> > http://lists.ovirt.org/mailman/listinfo/devel >> >>>> > >> >>>> > >> >>>> _______________________________________________ >> >>>> Devel mailing list >> >>>> Devel at ovirt.org >> >>>> http://lists.ovirt.org/mailman/listinfo/devel >> >>> >> >>> >> >>> >> >>> >> >>> -- >> >>> GAL bEN HAIM >> >>> RHV DEVOPS >> >>> >> >>> _______________________________________________ >> >>> Devel mailing list >> >>> Devel at ovirt.org >> >>> http://lists.ovirt.org/mailman/listinfo/devel >> >> >> >> >> > _______________________________________________ >> > Devel mailing list >> > Devel at ovirt.org >> > http://lists.ovirt.org/mailman/listinfo/devel >> >> >> >> -- >> Barak Korren >> RHV DevOps team , RHCE, RHCi >> Red Hat EMEA >> redhat.com | TRIED. TESTED. TRUSTED. | redhat.com/trusted >> _______________________________________________ >> Devel mailing list >> Devel at ovirt.org >> http://lists.ovirt.org/mailman/listinfo/devel >> > > > > -- > > Eyal edri > > > MANAGER > > RHV DevOps > > EMEA VIRTUALIZATION R&D > > > Red Hat EMEA > TRIED. TESTED. TRUSTED. > phone: +972-9-7692018 > irc: eedri (on #tlv #rhev-dev #rhev-integ) > > _______________________________________________ > Devel mailing list > Devel at ovirt.org > http://lists.ovirt.org/mailman/listinfo/devel > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ykaul at redhat.com Sun Apr 8 08:15:31 2018 From: ykaul at redhat.com (Yaniv Kaul) Date: Sun, 8 Apr 2018 11:15:31 +0300 Subject: [ovirt-devel] [ OST Failure Report ] [ oVirt 4.2 (vdsm) ] [ 05-04-2018 ] [ 004_basic_sanity.hotplug_cpu ] In-Reply-To: References: Message-ID: I've seen the exact(?) issue on 003_basic_networking.py : verify_interhost_connectivity_ipv4 test. Y. On Fri, Apr 6, 2018 at 3:46 PM, Dafna Ron wrote: > Hi, > > We had a failure on 004_basic_sanity.hotplug_cpu. 
looking at the logs, I > do not see an issue on the cpu hotplug but it seems that the host had an > issue with network config. > > can you please have a look? > > > *Link and headline of suspected patches: Failed patch: * > *virt: storage: minimal changes to the drive XML - > https://gerrit.ovirt.org/#/c/89508/ * > > *Reported as failure cause: * > *virt: extract local adjustments to XML - > https://gerrit.ovirt.org/#/c/89506/ * > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > *Link to > Job:http://jenkins.ovirt.org/job/ovirt-4.2_change-queue-tester/1565/ > Link to > all > logs:http://jenkins.ovirt.org/job/ovirt-4.2_change-queue-tester/1565/artifact/exported-artifacts/basic-suit-4.2-el7/test_logs/basic-suite-4.2/post-004_basic_sanity.py/ > (Relevant) > error snippet from the log: MainProcess|jsonrpc/2::DEBUG::2018-04-05 > 18:58:17,451::cmdutils::150::root::(exec_cmd) /sbin/ifdown eth2 (cwd > None)MainProcess|jsonrpc/2::DEBUG::2018-04-05 > 18:58:17,719::cmdutils::158::root::(exec_cmd) SUCCESS: = ''; = > 0MainProcess|jsonrpc/2::DEBUG::2018-04-05 > 18:58:17,720::cmdutils::150::root::(exec_cmd) /sbin/ifdown eth3 (cwd > None)MainProcess|jsonrpc/2::DEBUG::2018-04-05 > 18:58:17,982::cmdutils::158::root::(exec_cmd) SUCCESS: = ''; = > 0ifup/onc5bbb5eed6a84::DEBUG::2018-04-05 > 18:58:18,165::cmdutils::158::root::(exec_cmd) FAILED: = 'Running > scope as unit > b32c8477-857f-476c-96c8-580daf0d4217.scope.\n/etc/sysconfig/network-scripts/ifup-eth: > line 304: 12145 Terminated /sbin/dhclient ${DHCLIENTARGS} > ${DEVICE}\nDevice "onc5bbb5eed6a84" does not exist.\nCannot find device > "onc5bbb5eed6a84"\nDevice "onc5bbb5eed6a84" does not exist.\nDevice > "onc5bbb5eed6a84" does not exist.\nDevice "onc5bbb5eed6a84" does not > exist.\nDevice "onc5bbb5eed6a84" does not exist.\nDevice "onc5bbb5eed6a84" > does not exist.\nDevice "onc5bbb5eed6a84" does not exist.\nDevice > "onc5bbb5eed6a84" does not exist.\nDevice "onc5bbb5eed6a84" does not > 
exist.\nDevice "onc5bbb5eed6a84" does not exist.\nDevice "onc5bbb5eed6a84" > does not exist.\nDevice "onc5bbb5eed6a84" does not exist.\nDevice > "onc5bbb5eed6a84" does not exist.\nDevice "onc5bbb5eed6a84" does not > exist.\nDevice "onc5bbb5eed6a84" does not exist.\nDevice "onc5bbb5eed6a84" > does not exist.\nDevice "onc5bbb5eed6a84" does not exist.\nDevice > "onc5bbb5eed6a84" does not exist.\nDevice "onc5bbb5eed6a84" does not > exist.\nDevice "onc5bbb5eed6a84" does not exist.\nDevice "onc5bbb5eed6a84" > does not exist.\nDevice "onc5bbb5eed6a84" does not exist.\nDevice > "onc5bbb5eed6a84" does not exist.\nDevice "onc5bbb5eed6a84" does not > exist.\nDevice "onc5bbb5eed6a84" does not exist.\nDevice "onc5bbb5eed6a84" > does not exist.\nDevice "onc5bbb5eed6a84" does not exist.\nDevice > "onc5bbb5eed6a84" does not exist.\nDevice "onc5bbb5eed6a84" does not > exist.\nDevice "onc5bbb5eed6a84" does not exist.\nDevice "onc5bbb5eed6a84" > does not exist.\nDevice "onc5bbb5eed6a84" does not exist.\nDevice > "onc5bbb5eed6a84" does not exist.\nDevice "onc5bbb5eed6a84" does not > exist.\nDevice "onc5bbb5eed6a84" does not exist.\nDevice "onc5bbb5eed6a84" > does not exist.\nDevice "onc5bbb5eed6a84" does not exist.\nDevice > "onc5bbb5eed6a84" does not exist.\nDevice "onc5bbb5eed6a84" does not > exist.\nDevice "onc5bbb5eed6a84" does not exist.\nDevice "onc5bbb5eed6a84" > does not exist.\nDevice "onc5bbb5eed6a84" does not exist.\nDevice > "onc5bbb5eed6a84" does not exist.\nDevice "onc5bbb5eed6a84" does not > exist.\nDevice "onc5bbb5eed6a84" does not exist.\nDevice "onc5bbb5eed6a84" > does not exist.\nDevice "onc5bbb5eed6a84" does not exist.\nDevice > "onc5bbb5eed6a84" does not exist.\nDevice "onc5bbb5eed6a84" does > not exist.\nDevice "onc5bbb5eed6a84" does not exist.\nDevice > "onc5bbb5eed6a84" does not exist.\nDevice "onc5bbb5eed6a84" does not > exist.\n'; = 1ifup/onc5bbb5eed6a84::ERROR::2018-04-05 > 18:58:18,165::concurrent::201::root::(run) FINISH thread > > failedTraceback (most 
recent call last): File > "/usr/lib/python2.7/site-packages/vdsm/common/concurrent.py", line 194, in > run ret = func(*args, **kwargs) File > "/usr/lib/python2.7/site-packages/vdsm/network/configurators/ifcfg.py", > line 942, in _exec_ifup _exec_ifup_by_name(iface.name > , cgroup) File > "/usr/lib/python2.7/site-packages/vdsm/network/configurators/ifcfg.py", > line 928, in _exec_ifup_by_name raise > ConfigNetworkError(ERR_FAILED_IFUP, out[-1] if out else > '')ConfigNetworkError: (29, '\n')MainProcess|jsonrpc/2::DEBUG::2018-04-05 > 18:58:20,075::cmdutils::150::root::(exec_cmd) /sbin/ip addr flush dev eth2 > scope global (cwd None)MainProcess|jsonrpc/2::DEBUG::2018-04-05 > 18:58:20,081::cmdutils::158::root::(exec_cmd) SUCCESS: = ''; = > 0MainProcess|jsonrpc/2::DEBUG::2018-04-05 > 18:58:20,082::ifcfg::479::root::(_atomicBackup) Backed up > /etc/sysconfig/network-scripts/ifcfg-eth2MainProcess|jsonrpc/2::DEBUG::2018-04-05 > 18:58:20,084::ifcfg::512::root::(_persistentBackup) backing up ifcfg-eth2: > HWADDR="54:52:c0:a8:c9:02" BOOTPROTO="dhcp" ONBOOT="yes" TYPE="Ethernet" > NAME="eth2" * > > _______________________________________________ > Devel mailing list > Devel at ovirt.org > http://lists.ovirt.org/mailman/listinfo/devel > -------------- next part -------------- An HTML attachment was scrubbed... URL: From danken at redhat.com Sun Apr 8 10:21:41 2018 From: danken at redhat.com (Dan Kenigsberg) Date: Sun, 8 Apr 2018 13:21:41 +0300 Subject: [ovirt-devel] [ OST Failure Report ] [ oVirt 4.2 ] [ 2018-04-04 ] [006_migrations.prepare_migration_attachments_ipv6] In-Reply-To: References: Message-ID: On Sun, Apr 8, 2018 at 9:21 AM, Edward Haas wrote: > > > On Sun, Apr 8, 2018 at 9:15 AM, Eyal Edri wrote: > >> Was already done by Yaniv - https://gerrit.ovirt.org/#/c/89851. >> Is it still failing? 
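Most of the failures in this thread are sporadic, so the usual way the OST suites distinguish a real regression from slow convergence is to poll engine state until a deadline rather than assert once. A minimal poll loop of that shape (a generic sketch for illustration only, not ovirtlago's actual `testlib` API, whose helper names and signatures differ):

```python
import time

def wait_for(condition, timeout=120, interval=3):
    """Poll `condition` until it returns a truthy value; raise
    AssertionError once `timeout` seconds have elapsed."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise AssertionError(
                'condition not met within %s seconds' % timeout)
        time.sleep(interval)
```

A hypothetical call site would be something like `wait_for(lambda: host_service.get().status == types.HostStatus.UP)` instead of checking the host status a single time.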
>> >> On Sun, Apr 8, 2018 at 8:59 AM, Barak Korren wrote: >>> On 7 April 2018 at 00:30, Dan Kenigsberg wrote: >>> > No, I am afraid that we have not managed to understand why setting an >>> > ipv6 address took the host off the grid. We shall continue researching >>> > this next week. >>> > >>> > Edy, https://gerrit.ovirt.org/#/c/88637/ is already 4 weeks old, but >>> > could it possibly be related (I really doubt that)? >>> > >>> >> > Sorry, but I do not see how this problem is related to VDSM. > There is nothing that indicates that there is a VDSM problem. > > Has the RPC connection between Engine and VDSM failed? > > Further up the thread, Piotr noticed that (at least on one failure of this test) the Vdsm host lost connectivity to its storage, and the Vdsm process was restarted. However, this does not seem to happen in all cases where this test fails. -------------- next part -------------- An HTML attachment was scrubbed... URL: From bkorren at redhat.com Sun Apr 8 10:40:32 2018 From: bkorren at redhat.com (Barak Korren) Date: Sun, 8 Apr 2018 13:40:32 +0300 Subject: [ovirt-devel] [ OST Failure Report ] [ oVirt master ] [ 2018-04-08 ] [098_ovirt_provider_ovn.use_ovn_provider] Message-ID: Test failed: 098_ovirt_provider_ovn.use_ovn_provider Link to suspected patches: https://gerrit.ovirt.org/#/c/89581/3 Link to Job: https://gerrit.ovirt.org/#/c/89581/3 Link to all logs: http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/6714/artifact/exported-artifacts/basic-suit-master-el7/test_logs/basic-suite-master/post-098_ovirt_provider_ovn.py/ Error snippet from log: 'name' -------------------- >> begin captured logging << -------------------- lago.providers.libvirt.cpu: DEBUG: numa : cpus_per_cell: 1, total_cells: 2 lago.providers.libvirt.cpu: DEBUG: numa: lago.providers.libvirt.cpu: DEBUG: numa : cpus_per_cell: 1, total_cells: 2 lago.providers.libvirt.cpu: DEBUG: numa: lago.providers.libvirt.cpu: DEBUG: numa : cpus_per_cell: 1, total_cells: 2
lago.providers.libvirt.cpu: DEBUG: numa: requests.packages.urllib3.connectionpool: INFO: * Starting new HTTPS connection (1): 192.168.201.4 py.warnings: WARNING: * Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.org/en/latest/security.html requests.packages.urllib3.connectionpool: DEBUG: "POST /v2.0/tokens/ HTTP/1.1" 200 None requests.packages.urllib3.connectionpool: INFO: * Starting new HTTPS connection (1): 192.168.201.4 requests.packages.urllib3.connectionpool: DEBUG: "GET /v2.0/networks/ HTTP/1.1" 200 None requests.packages.urllib3.connectionpool: INFO: * Starting new HTTPS connection (1): 192.168.201.4 requests.packages.urllib3.connectionpool: DEBUG: "GET /v2.0/ports/ HTTP/1.1" 200 None requests.packages.urllib3.connectionpool: INFO: * Starting new HTTPS connection (1): 192.168.201.4 requests.packages.urllib3.connectionpool: DEBUG: "GET /v2.0/subnets/ HTTP/1.1" 200 None requests.packages.urllib3.connectionpool: INFO: * Starting new HTTPS connection (1): 192.168.201.4 requests.packages.urllib3.connectionpool: DEBUG: "POST /v2.0/networks/ HTTP/1.1" 201 None requests.packages.urllib3.connectionpool: INFO: * Starting new HTTPS connection (1): 192.168.201.4 requests.packages.urllib3.connectionpool: DEBUG: "POST /v2.0/subnets/ HTTP/1.1" 201 None requests.packages.urllib3.connectionpool: INFO: * Starting new HTTPS connection (1): 192.168.201.4 requests.packages.urllib3.connectionpool: DEBUG: "POST /v2.0/ports/ HTTP/1.1" 201 None requests.packages.urllib3.connectionpool: INFO: * Starting new HTTPS connection (1): 192.168.201.4 requests.packages.urllib3.connectionpool: DEBUG: "GET /v2.0/networks/ HTTP/1.1" 200 None requests.packages.urllib3.connectionpool: INFO: * Starting new HTTPS connection (1): 192.168.201.4 requests.packages.urllib3.connectionpool: DEBUG: "GET /v2.0/ports/ HTTP/1.1" 200 None requests.packages.urllib3.connectionpool: INFO: * Starting new HTTPS connection (1): 
192.168.201.4 requests.packages.urllib3.connectionpool: DEBUG: "GET /v2.0/subnets/ HTTP/1.1" 200 None --------------------- >> end captured logging << --------------------- Note: we're seeing similar issues on the same patches in both the 'master' and the 4.2 change queues. -- Barak Korren RHV DevOps team , RHCE, RHCi Red Hat EMEA redhat.com | TRIED. TESTED. TRUSTED. | redhat.com/trusted From danken at redhat.com Sun Apr 8 11:03:11 2018 From: danken at redhat.com (Dan Kenigsberg) Date: Sun, 8 Apr 2018 14:03:11 +0300 Subject: [ovirt-devel] [ OST Failure Report ] [ oVirt master ] [ 2018-04-08 ] [098_ovirt_provider_ovn.use_ovn_provider] In-Reply-To: References: Message-ID: On Sun, Apr 8, 2018 at 1:40 PM, Barak Korren wrote: > Test failed: 098_ovirt_provider_ovn.use_ovn_provider > > Link to suspected patches: > https://gerrit.ovirt.org/#/c/89581/3 > > Link to Job: > https://gerrit.ovirt.org/#/c/89581/3 > > Link to all logs: > http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/6714/artifact/exported-artifacts/basic-suit-master-el7/test_logs/basic-suite-master/post-098_ovirt_provider_ovn.py/ > > Error snippet from log: > > > > 'name' > -------------------- >> begin captured logging << -------------------- > lago.providers.libvirt.cpu: DEBUG: numa > : cpus_per_cell: 1, total_cells: 2 > lago.providers.libvirt.cpu: DEBUG: numa: > > > > > > lago.providers.libvirt.cpu: DEBUG: numa > : cpus_per_cell: 1, total_cells: 2 > lago.providers.libvirt.cpu: DEBUG: numa: > > > > > > lago.providers.libvirt.cpu: DEBUG: numa > : cpus_per_cell: 1, total_cells: 2 > lago.providers.libvirt.cpu: DEBUG: numa: > > > > > > requests.packages.urllib3.connectionpool: INFO: * Starting new > HTTPS connection (1): 192.168.201.4 > py.warnings: WARNING: * Unverified HTTPS request is being made. > Adding certificate verification is strongly advised. 
See: > https://urllib3.readthedocs.org/en/latest/security.html > requests.packages.urllib3.connectionpool: DEBUG: "POST /v2.0/tokens/ > HTTP/1.1" 200 None > requests.packages.urllib3.connectionpool: INFO: * Starting new > HTTPS connection (1): 192.168.201.4 > requests.packages.urllib3.connectionpool: DEBUG: "GET /v2.0/networks/ > HTTP/1.1" 200 None > requests.packages.urllib3.connectionpool: INFO: * Starting new > HTTPS connection (1): 192.168.201.4 > requests.packages.urllib3.connectionpool: DEBUG: "GET /v2.0/ports/ > HTTP/1.1" 200 None > requests.packages.urllib3.connectionpool: INFO: * Starting new > HTTPS connection (1): 192.168.201.4 > requests.packages.urllib3.connectionpool: DEBUG: "GET /v2.0/subnets/ > HTTP/1.1" 200 None > requests.packages.urllib3.connectionpool: INFO: * Starting new > HTTPS connection (1): 192.168.201.4 > requests.packages.urllib3.connectionpool: DEBUG: "POST /v2.0/networks/ > HTTP/1.1" 201 None > requests.packages.urllib3.connectionpool: INFO: * Starting new > HTTPS connection (1): 192.168.201.4 > requests.packages.urllib3.connectionpool: DEBUG: "POST /v2.0/subnets/ > HTTP/1.1" 201 None > requests.packages.urllib3.connectionpool: INFO: * Starting new > HTTPS connection (1): 192.168.201.4 > requests.packages.urllib3.connectionpool: DEBUG: "POST /v2.0/ports/ > HTTP/1.1" 201 None > requests.packages.urllib3.connectionpool: INFO: * Starting new > HTTPS connection (1): 192.168.201.4 > requests.packages.urllib3.connectionpool: DEBUG: "GET /v2.0/networks/ > HTTP/1.1" 200 None > requests.packages.urllib3.connectionpool: INFO: * Starting new > HTTPS connection (1): 192.168.201.4 > requests.packages.urllib3.connectionpool: DEBUG: "GET /v2.0/ports/ > HTTP/1.1" 200 None > requests.packages.urllib3.connectionpool: INFO: * Starting new > HTTPS connection (1): 192.168.201.4 > requests.packages.urllib3.connectionpool: DEBUG: "GET /v2.0/subnets/ > HTTP/1.1" 200 None > --------------------- >> end captured logging << --------------------- > > > > Note: 
we're seeing similar issues on the same patches in both the > 'master' and the 4.2 change queues. I've tried to revert the suspected patch. Let us see if it makes OST happy again http://jenkins.ovirt.org/job/ovirt-system-tests_manual/2525/ From ylavi at redhat.com Sun Apr 8 12:35:12 2018 From: ylavi at redhat.com (Yaniv Lavi) Date: Sun, 8 Apr 2018 15:35:12 +0300 Subject: [ovirt-devel] [kubevirt-dev] Re: [virt-tools-list] Project for profiles and defaults for libvirt domains In-Reply-To: References: <20180320142031.GB23007@wheatley> <20180320151012.GU4530@redhat.com> <20180322145401.GD19999@wheatley> <20180322171753.GU3583@redhat.com> Message-ID: [resending to include OSP devs ] YANIV LAVI SENIOR TECHNICAL PRODUCT MANAGER Red Hat Israel Ltd. 34 Jerusalem Road, Building A, 1st floor Ra'anana, Israel 4350109 ylavi at redhat.com T: +972-9-7692306/8272306 F: +972-9-7692223 IM: ylavi TRIED. TESTED. TRUSTED. @redhatnews Red Hat Red Hat On Wed, Apr 4, 2018 at 7:23 PM, Yaniv Lavi wrote: > [resending to include KubeVirt devs ] > > YANIV LAVI > > SENIOR TECHNICAL PRODUCT MANAGER > > Red Hat Israel Ltd. > > 34 Jerusalem Road, Building A, 1st floor > > Ra'anana, Israel 4350109 > > ylavi at redhat.com T: +972-9-7692306/8272306 F: +972-9-7692223 IM: ylavi > TRIED. TESTED. TRUSTED. > @redhatnews Red Hat Red Hat > > > On Wed, Apr 4, 2018 at 7:07 PM, Yaniv Lavi wrote: > >> Hi, >> I'd like to go one step back and discuss why we should try to do this at >> a high level. >> >> For the last 5-10 years of KVM development, we have pragmatically >> provided the Linux host level APIs via project-specific host >> agents/integration code (Nova agent, oVirt host agent, virt-manager). >> In recent times we have seen new projects with similar requirements >> (Cockpit, different automation tools, KubeVirt), which means that all of the >> Linux virt stack consumers are reinventing the wheel and using very >> different paths to consume the partial solutions that are provided today.
>> >> The use of the Linux virt stack is well defined by the existing projects' >> scope, and it makes a lot of sense to try to provide the common patterns via >> the virt stack directly, as a host level API that the different clients or >> management applications consume. >> The main goal is to improve the developer experience for virtualization >> management applications with an API set that is useful to the entire set of >> tools (OSP, oVirt, KubeVirt, Cockpit and so on). >> >> The Linux virt developer community currently is not able to provide best >> practices and optimizations from single-node knowledge. This means that all >> of that smarts is locked into the specific project integration in the good >> case, or not provided at all, and the projects as a whole lose from that. >> When testing the Linux virt stack itself, since each project has a >> different usage pattern, we lose the ability to test capabilities at the lower >> level, making the entire stack less stable and complete for new features. >> >> This also limits the different projects' ability to contribute back to the >> Linux stack based on their user and market experience, for others in open >> source to gain. >> >> I understand this shift is technically challenging for existing projects, >> but I do see value in doing this even for new implementations like Cockpit >> and KubeVirt. >> I also believe that the end result could be appealing enough to cause >> projects like OSP, virt-manager and oVirt to consider reducing the existing >> capabilities of their host-side integrations/agents to shims on the host >> level and reuse the common/better-tested patterns as clients that were >> developed against the experience of the different projects. >> >> I call on us all to collaborate and try to converge on a solution that will >> help everyone in the long term through the value we get from the common base. >> >> >> Thanks, >> >> YANIV LAVI >> >> SENIOR TECHNICAL PRODUCT MANAGER >> >> Red Hat Israel Ltd.
>> >> 34 Jerusalem Road, Building A, 1st floor >> >> Ra'anana, Israel 4350109 >> >> ylavi at redhat.com T: +972-9-7692306/8272306 F: +972-9-7692223 IM: ylavi >> TRIED. TESTED. TRUSTED. >> @redhatnews Red Hat Red Hat >> >> >> On Thu, Mar 22, 2018 at 7:18 PM, Daniel P. Berrangé >> wrote: >> >>> On Thu, Mar 22, 2018 at 03:54:01PM +0100, Martin Kletzander wrote: >>> > > >>> > > > One more thing could be automatically figuring out best values >>> based on >>> > > > libosinfo-provided data. >>> > > > >>> > > > 2) Policies >>> > > > >>> > > > A lot of the time there are parts of the domain definition that need >>> to be >>> > > > added, but nobody really cares about them. Sometimes it's enough >>> to >>> > > > have a few templates, another time you might want to have a policy >>> > > > per-scenario and want to combine them in various ways. For >>> example with >>> > > > the data provided by point 1). >>> > > > >>> > > > For example if you want PCI-Express, you need the q35 machine >>> type, but >>> > > > you don't really want to care about the machine type. Or you want >>> to >>> > > > use SPICE, but you don't want to care about adding QXL. >>> > > > >>> > > > What if some of these policies could be specified once (using some >>> DSL >>> > > > for example), and used by virtuned to merge them in a unified and >>> > > > predictable way? >>> > > > >>> > > > 3) Abstracting the XML >>> > > > >>> > > > This is probably just usable for stateless apps, but it might >>> happen >>> > > > that some apps don't really want to care about the XML at all. >>> They >>> > > > just want an abstract view of the domain, possibly add/remove a >>> device >>> > > > and that's it. We could do that as well. I can't really tell how >>> much >>> > > > of a demand there is for it, though. >>> > > >>> > > It is safe to say that applications do not want to touch XML at all.
>>> > > Any non-trivial application has created an abstraction around XML, >>> > > so that they have an API to express what they want, rather than >>> > > manipulating strings to format/parse XML. >>> > > >>> > >>> > Sure, this was just meant to be a question as to whether it's worth >>> > pursuing or not. You make a good point on why it is not (at least for >>> > existing apps). >>> > >>> > However, since this was optional, the way this would look without the >>> > XML abstraction is that both input and output would be valid domain >>> > definitions, ultimately resulting in something similar to virt-xml with >>> > the added benefit of applying a policy from a file/string either >>> > supplied by the application itself. Whether that policy was taken from >>> > a common repository of such knowledge is orthogonal to this idea. >>> Since >>> > you would work with the same data, the upgrade could be incremental as >>> > you'd only let virtuned fill in values for new options and could slowly >>> > move on to using it for some pre-existing ones. None of the previous >>> > approaches did this, if I'm not mistaken. Of course it gets more >>> > difficult when you need to expose all the bits libvirt does and keep >>> > them in sync (as you write below). >>> >>> That has implications for how a mgmt app deals with XML. Nova has object >>> models for representing XML in memory, but it doesn't aim to have >>> loss-less roundtrip from parse -> object -> format. So if Nova gets >>> basic XML from virttuned, parses it into its object to let it set >>> more fields and then formats it again, chances are it will have lost >>> a bunch of stuff from virttuned. Of course if you know about this >>> need upfront you can design the application such that it can safely >>> round-trip, but this is just an example of the problems with integrating >>> into existing apps. >>> >>> The other thing that concerns me is that there are dependencies between >>> different bits of XML for a given device. ie if feature X is set to
ie if feature X is set to >>> a certain value, that prevents use of feature Y. So if virttuned >>> sets feature X, but the downstream application uses feature Y, the >>> final result can be incompatible. The application won't know this >>> because it doesn't control what stuff virttuned would be setting. >>> This can in turn cause ordering constraints. >>> >>> eg the application needs to say that virtio-net is being used, then >>> virttuned can set some defaults like enabling vhost-net, and then >>> the application can fill in more bits that it cares about. Or if >>> we let virttuned go first, setting virtio-net model + vhost-net, >>> then application wants to change model to e1000e, it has to be >>> aware that it must now delete the vhost-net bit that virtuned >>> added. This ends up being more complicated that just ignoring >>> virttuned and coding up use of vhost-net in application code. >>> >>> >>> > > This is the same kind of problem we faced wrt libvirt-gconfig and >>> > > libvirt-gobject usage from virt-manager - it has an extensive code >>> > > base that already works, and rewriting it to use something new >>> > > is alot of work for no short-term benefit. libvirt-gconfig/gobject >>> > > were supposed to be the "easy" bits for virt-manager to adopt, as >>> > > they don't really include much logic that would step on >>> virt-manager's >>> > > toes. libvirt-designer was going to be a very opinionated library >>> > > and in retrospective that makes it even harder to consider adopting >>> > > it for usage in virt-manager, as it'll have signficant liklihood >>> > > of making functionally significant changes in behaviour. >>> > > >>> > >>> > The initial idea (which I forgot to mention) was that all the decisions >>> > libvirt currently does (so that it keeps the guest ABI stable) would be >>> > moved into data (let's say some DSL) and it could then be switched or >>> > adjusted if that's not what the mgmt app wants (on a per-definition >>> > basis, of course). 
I didn't feel very optimistic about the upstream >>> > acceptance for that idea, so I figured that there could be something >>> > that lives beside libvirt, helps with some policies if requested and >>> > then the resulting XML could be fed into libvirt for determining the >>> > rest. >>> >>> I can't even imagine how we would go about encoding the stable guest >>> ABI logic libvirt does today in data ! >>> >>> > >>> > > There's also the problem with use of native libraries that would >>> > > impact many apps. We only got OpenStack to grudgingly allow the >>> > >>> > By native you mean actual binary libraries or native to the OpenStack >>> > code as in python module? Because what I had in mind for this project >>> > was a python module with optional wrapper for REST API. >>> >>> I meant native binary libraries. ie openstack is not happy in general >>> with adding dependencies on new OS services, because there's a big >>> time lag for getting them into all distros. By comparison a pure >>> python library, they can just handle automatically in their deployment >>> tools, just pip installing on any OS distro straight from pypi. This >>> is what made use of libosinfo a hard sell in Nova. >>> >>> The same thing is seen with Go / Rust where some applications have >>> decided they're better off actually re-implementing the libvirt RPC >>> protocol in Go / Rust rather than use the libvirt.so client. I think >>> this is a bad tradeoff in general, but I can see why they like it >>> >>> Regards, >>> Daniel >>> -- >>> |: https://berrange.com -o- https://www.flickr.com/photos/ >>> dberrange :| >>> |: https://libvirt.org -o- >>> https://fstop138.berrange.com :| >>> |: https://entangle-photo.org -o- https://www.instagram.com/dber >>> range :| >>> _______________________________________________ >>> Devel mailing list >>> Devel at ovirt.org >>> http://lists.ovirt.org/mailman/listinfo/devel >>> >> >> > -------------- next part -------------- An HTML attachment was scrubbed...
URL: From danken at redhat.com Sun Apr 8 16:05:07 2018 From: danken at redhat.com (Dan Kenigsberg) Date: Sun, 8 Apr 2018 19:05:07 +0300 Subject: [ovirt-devel] [ OST Failure Report ] [ oVirt master ] [ 2018-04-08 ] [098_ovirt_provider_ovn.use_ovn_provider] In-Reply-To: References: Message-ID: On Sun, Apr 8, 2018 at 2:03 PM, Dan Kenigsberg wrote: > On Sun, Apr 8, 2018 at 1:40 PM, Barak Korren wrote: >> Test failed: 098_ovirt_provider_ovn.use_ovn_provider >> >> Link to suspected patches: >> https://gerrit.ovirt.org/#/c/89581/3 >> >> Link to Job: >> https://gerrit.ovirt.org/#/c/89581/3 >> >> Link to all logs: >> http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/6714/artifact/exported-artifacts/basic-suit-master-el7/test_logs/basic-suite-master/post-098_ovirt_provider_ovn.py/ >> >> Error snippet from log: >> >> >> >> 'name' >> -------------------- >> begin captured logging << -------------------- >> lago.providers.libvirt.cpu: DEBUG: numa >> : cpus_per_cell: 1, total_cells: 2 >> lago.providers.libvirt.cpu: DEBUG: numa: >> >> >> >> >> >> lago.providers.libvirt.cpu: DEBUG: numa >> : cpus_per_cell: 1, total_cells: 2 >> lago.providers.libvirt.cpu: DEBUG: numa: >> >> >> >> >> >> lago.providers.libvirt.cpu: DEBUG: numa >> : cpus_per_cell: 1, total_cells: 2 >> lago.providers.libvirt.cpu: DEBUG: numa: >> >> >> >> >> >> requests.packages.urllib3.connectionpool: INFO: * Starting new >> HTTPS connection (1): 192.168.201.4 >> py.warnings: WARNING: * Unverified HTTPS request is being made. >> Adding certificate verification is strongly advised. 
See: >> https://urllib3.readthedocs.org/en/latest/security.html >> requests.packages.urllib3.connectionpool: DEBUG: "POST /v2.0/tokens/ >> HTTP/1.1" 200 None >> requests.packages.urllib3.connectionpool: INFO: * Starting new >> HTTPS connection (1): 192.168.201.4 >> requests.packages.urllib3.connectionpool: DEBUG: "GET /v2.0/networks/ >> HTTP/1.1" 200 None >> requests.packages.urllib3.connectionpool: INFO: * Starting new >> HTTPS connection (1): 192.168.201.4 >> requests.packages.urllib3.connectionpool: DEBUG: "GET /v2.0/ports/ >> HTTP/1.1" 200 None >> requests.packages.urllib3.connectionpool: INFO: * Starting new >> HTTPS connection (1): 192.168.201.4 >> requests.packages.urllib3.connectionpool: DEBUG: "GET /v2.0/subnets/ >> HTTP/1.1" 200 None >> requests.packages.urllib3.connectionpool: INFO: * Starting new >> HTTPS connection (1): 192.168.201.4 >> requests.packages.urllib3.connectionpool: DEBUG: "POST /v2.0/networks/ >> HTTP/1.1" 201 None >> requests.packages.urllib3.connectionpool: INFO: * Starting new >> HTTPS connection (1): 192.168.201.4 >> requests.packages.urllib3.connectionpool: DEBUG: "POST /v2.0/subnets/ >> HTTP/1.1" 201 None >> requests.packages.urllib3.connectionpool: INFO: * Starting new >> HTTPS connection (1): 192.168.201.4 >> requests.packages.urllib3.connectionpool: DEBUG: "POST /v2.0/ports/ >> HTTP/1.1" 201 None >> requests.packages.urllib3.connectionpool: INFO: * Starting new >> HTTPS connection (1): 192.168.201.4 >> requests.packages.urllib3.connectionpool: DEBUG: "GET /v2.0/networks/ >> HTTP/1.1" 200 None >> requests.packages.urllib3.connectionpool: INFO: * Starting new >> HTTPS connection (1): 192.168.201.4 >> requests.packages.urllib3.connectionpool: DEBUG: "GET /v2.0/ports/ >> HTTP/1.1" 200 None >> requests.packages.urllib3.connectionpool: INFO: * Starting new >> HTTPS connection (1): 192.168.201.4 >> requests.packages.urllib3.connectionpool: DEBUG: "GET /v2.0/subnets/ >> HTTP/1.1" 200 None >> --------------------- >> end captured logging 
<< --------------------- >> >> >> >> Note: we're seeing similar issues on the same patches in both the >> 'master' and the 4.2 change queues. > > > I've tried to revert the suspected patch. Let us see if it makes OST happy again > http://jenkins.ovirt.org/job/ovirt-system-tests_manual/2525/ Indeed, the revert fixes OST. However, we can wait for Marcin to properly fix it the problem tomorrow. I don't think anybody but him needs green OST for ovirt-provider-ovn, so merging the revert can wait. From mmirecki at redhat.com Sun Apr 8 21:47:29 2018 From: mmirecki at redhat.com (Marcin Mirecki) Date: Sun, 8 Apr 2018 23:47:29 +0200 Subject: [ovirt-devel] [ OST Failure Report ] [ oVirt master ] [ 2018-04-08 ] [098_ovirt_provider_ovn.use_ovn_provider] In-Reply-To: References: Message-ID: Fix pending: https://gerrit.ovirt.org/#/c/89980/ On Sun, Apr 8, 2018 at 6:05 PM, Dan Kenigsberg wrote: > On Sun, Apr 8, 2018 at 2:03 PM, Dan Kenigsberg wrote: > > On Sun, Apr 8, 2018 at 1:40 PM, Barak Korren wrote: > >> Test failed: 098_ovirt_provider_ovn.use_ovn_provider > >> > >> Link to suspected patches: > >> https://gerrit.ovirt.org/#/c/89581/3 > >> > >> Link to Job: > >> https://gerrit.ovirt.org/#/c/89581/3 > >> > >> Link to all logs: > >> http://jenkins.ovirt.org/job/ovirt-master_change-queue- > tester/6714/artifact/exported-artifacts/basic-suit-master- > el7/test_logs/basic-suite-master/post-098_ovirt_provider_ovn.py/ > >> > >> Error snippet from log: > >> > >> > >> > >> 'name' > >> -------------------- >> begin captured logging << -------------------- > >> lago.providers.libvirt.cpu: DEBUG: numa > >> : cpus_per_cell: 1, total_cells: 2 > >> lago.providers.libvirt.cpu: DEBUG: numa: > >> > >> > >> > >> > >> > >> lago.providers.libvirt.cpu: DEBUG: numa > >> : cpus_per_cell: 1, total_cells: 2 > >> lago.providers.libvirt.cpu: DEBUG: numa: > >> > >> > >> > >> > >> > >> lago.providers.libvirt.cpu: DEBUG: numa > >> : cpus_per_cell: 1, total_cells: 2 > >> lago.providers.libvirt.cpu: DEBUG: 
numa: > >> > >> > >> > >> > >> > >> requests.packages.urllib3.connectionpool: INFO: * Starting new > >> HTTPS connection (1): 192.168.201.4 > >> py.warnings: WARNING: * Unverified HTTPS request is being made. > >> Adding certificate verification is strongly advised. See: > >> https://urllib3.readthedocs.org/en/latest/security.html > >> requests.packages.urllib3.connectionpool: DEBUG: "POST /v2.0/tokens/ > >> HTTP/1.1" 200 None > >> requests.packages.urllib3.connectionpool: INFO: * Starting new > >> HTTPS connection (1): 192.168.201.4 > >> requests.packages.urllib3.connectionpool: DEBUG: "GET /v2.0/networks/ > >> HTTP/1.1" 200 None > >> requests.packages.urllib3.connectionpool: INFO: * Starting new > >> HTTPS connection (1): 192.168.201.4 > >> requests.packages.urllib3.connectionpool: DEBUG: "GET /v2.0/ports/ > >> HTTP/1.1" 200 None > >> requests.packages.urllib3.connectionpool: INFO: * Starting new > >> HTTPS connection (1): 192.168.201.4 > >> requests.packages.urllib3.connectionpool: DEBUG: "GET /v2.0/subnets/ > >> HTTP/1.1" 200 None > >> requests.packages.urllib3.connectionpool: INFO: * Starting new > >> HTTPS connection (1): 192.168.201.4 > >> requests.packages.urllib3.connectionpool: DEBUG: "POST /v2.0/networks/ > >> HTTP/1.1" 201 None > >> requests.packages.urllib3.connectionpool: INFO: * Starting new > >> HTTPS connection (1): 192.168.201.4 > >> requests.packages.urllib3.connectionpool: DEBUG: "POST /v2.0/subnets/ > >> HTTP/1.1" 201 None > >> requests.packages.urllib3.connectionpool: INFO: * Starting new > >> HTTPS connection (1): 192.168.201.4 > >> requests.packages.urllib3.connectionpool: DEBUG: "POST /v2.0/ports/ > >> HTTP/1.1" 201 None > >> requests.packages.urllib3.connectionpool: INFO: * Starting new > >> HTTPS connection (1): 192.168.201.4 > >> requests.packages.urllib3.connectionpool: DEBUG: "GET /v2.0/networks/ > >> HTTP/1.1" 200 None > >> requests.packages.urllib3.connectionpool: INFO: * Starting new > >> HTTPS connection (1): 192.168.201.4 > >> 
requests.packages.urllib3.connectionpool: DEBUG: "GET /v2.0/ports/ > >> HTTP/1.1" 200 None > >> requests.packages.urllib3.connectionpool: INFO: * Starting new > >> HTTPS connection (1): 192.168.201.4 > >> requests.packages.urllib3.connectionpool: DEBUG: "GET /v2.0/subnets/ > >> HTTP/1.1" 200 None > >> --------------------- >> end captured logging << --------------------- > >> > >> > >> > >> Note: we're seeing similar issues on the same patches in both the > >> 'master' and the 4.2 change queues. > > > > > > I've tried to revert the suspected patch. Let us see if it makes OST > happy again > > http://jenkins.ovirt.org/job/ovirt-system-tests_manual/2525/ > > Indeed, the revert fixes OST. > However, we can wait for Marcin to properly fix it the problem > tomorrow. I don't think anybody but him needs green OST for > ovirt-provider-ovn, so merging the revert can wait. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sleviim at redhat.com Tue Apr 10 11:51:39 2018 From: sleviim at redhat.com (Shani Leviim) Date: Tue, 10 Apr 2018 14:51:39 +0300 Subject: [ovirt-devel] make check on master fails due to UnicodeDecodeError Message-ID: Hi there, I'm trying to run make check, and I have ~13 tests on vdsm/tests which failes due to the following: File "/home/sleviim/git/vdsm/lib/vdsm/api/vdsmapi.py", line 212, in __init__ loaded_schema = pickle.load(f) File "/usr/lib64/python3.6/encodings/ascii.py", line 26, in decode return codecs.ascii_decode(input, self.errors)[0] UnicodeDecodeError: 'ascii' codec can't decode byte 0x80 in position 0: ordinal not in range(128) (Those lines are common to all failures) Here is an example: ====================================================================== ERROR: test_ok_response (vdsmapi_test.DataVerificationTests) ---------------------------------------------------------------------- Traceback (most recent call last): File "/home/sleviim/git/vdsm/tests/vdsmapi_test.py", line 96, in test_ok_response 
_schema.schema().verify_retval( File "/home/sleviim/git/vdsm/tests/vdsmapi_test.py", line 67, in schema self._schema = vdsmapi.Schema(paths, True) File "/home/sleviim/git/vdsm/lib/vdsm/api/vdsmapi.py", line 212, in __init__ loaded_schema = pickle.load(f) File "/usr/lib64/python3.6/encodings/ascii.py", line 26, in decode return codecs.ascii_decode(input, self.errors)[0] UnicodeDecodeError: 'ascii' codec can't decode byte 0x80 in position 0: ordinal not in range(128) I've also tried to git clean -dxf && ./autogen.sh --system but it didn't help. Can you please assist? Thanks! *Regards,* *Shani Leviim* -------------- next part -------------- An HTML attachment was scrubbed... URL: From mpolednik at redhat.com Tue Apr 10 13:09:19 2018 From: mpolednik at redhat.com (Martin Polednik) Date: Tue, 10 Apr 2018 15:09:19 +0200 Subject: [ovirt-devel] dynamic ownership changes Message-ID: <20180410130916.GB37403@Alexandra.local> Hey, I've created a patch[0] that is finally able to activate libvirt's dynamic_ownership for VDSM while not negatively affecting functionality of our storage code. That of course comes with quite a bit of code removal, mostly in the area of host devices, hwrng and anything that touches devices; bunch of test changes and one XML generation caveat (storage is handled by VDSM, therefore disk relabelling needs to be disabled on the VDSM level). Because of the scope of the patch, I welcome storage/virt/network people to review the code and consider the implication this change has on current/future features. [0] https://gerrit.ovirt.org/#/c/89830/ mpolednik From alkaplan at redhat.com Tue Apr 10 13:14:16 2018 From: alkaplan at redhat.com (Alona Kaplan) Date: Tue, 10 Apr 2018 16:14:16 +0300 Subject: [ovirt-devel] [ OST Failure Report ] [ oVirt 4.2 ] [ 2018-04-04 ] [006_migrations.prepare_migration_attachments_ipv6] In-Reply-To: References: Message-ID: Hi all, Looking at the log it seems that the new GetCapabilitiesAsync is responsible for the mess. 
- *08:29:47 - the engine loses connectivity to host 'lago-basic-suite-4-2-host-0'.*

*- Every 3 seconds a getCapabilitiesAsync request is sent to the host (unsuccessfully).*

 * Before each "getCapabilitiesAsync", the monitoring lock is taken (VdsManager.refreshImpl).

 * "getCapabilitiesAsync" immediately fails and throws 'VDSNetworkException: java.net.ConnectException: Connection refused'. The exception is caught by 'GetCapabilitiesAsyncVDSCommand.executeVdsBrokerCommand', which calls 'onFailure' of the callback and re-throws the exception.

     catch (Throwable t) {
         getParameters().getCallback().onFailure(t);
         throw t;
     }

 * The 'onFailure' of the callback releases the "monitoringLock" ('postProcessRefresh()->afterRefreshTreatment()-> if (!succeeded) lockManager.releaseLock(monitoringLock);').

 * 'VdsManager.refreshImpl' catches the network exception, marks 'releaseLock = true' and *tries to release the already released lock*.

   The following warning is printed to the log -

   WARN [org.ovirt.engine.core.bll.lock.InMemoryLockManager] (EE-ManagedThreadFactory-engineScheduled-Thread-53) [] Trying to release exclusive lock which does not exist, lock key: 'ecf53d69-eb68-4b11-8df2-c4aa4e19bd93VDS_INIT'

*- 08:30:51 - a successful getCapabilitiesAsync is sent.*

*- 08:32:55 - The failing test starts (Setup Networks for setting ipv6).*

 * SetupNetworks takes the monitoring lock.

*- 08:33:00 - ResponseTracker cleans the getCapabilitiesAsync requests from 4 minutes ago from its queue and prints a VDSNetworkException: Vds timeout occured.*

 * When the first request is removed from the queue ('ResponseTracker.remove()'), *'Callback.onFailure' is invoked (for the second time) -> the monitoring lock is released (the lock taken by SetupNetworks!).*

 * *The other requests removed from the queue also try to release the monitoring lock*, but there is nothing to release.
 * The following warning log is printed -

   WARN [org.ovirt.engine.core.bll.lock.InMemoryLockManager] (EE-ManagedThreadFactory-engineScheduled-Thread-14) [] Trying to release exclusive lock which does not exist, lock key: 'ecf53d69-eb68-4b11-8df2-c4aa4e19bd93VDS_INIT'

- *08:33:00 - SetupNetwork fails on a timeout ~4 seconds after it started*. Why? I'm not 100% sure, but I guess the root cause is the late processing of 'getCapabilitiesAsync', which loses the monitoring lock, combined with the late and multiple processing of the failure.

Ravi, the 'getCapabilitiesAsync' failure is handled twice, and there are three attempts to release the lock. Please share your opinion regarding how it should be fixed.

Thanks,
Alona.

On Sun, Apr 8, 2018 at 1:21 PM, Dan Kenigsberg wrote:

> On Sun, Apr 8, 2018 at 9:21 AM, Edward Haas wrote:
>
>>
>>
>> On Sun, Apr 8, 2018 at 9:15 AM, Eyal Edri wrote:
>>
>>> Was already done by Yaniv - https://gerrit.ovirt.org/#/c/89851.
>>> Is it still failing?
>>>
>>> On Sun, Apr 8, 2018 at 8:59 AM, Barak Korren wrote:
>>>
>>>> On 7 April 2018 at 00:30, Dan Kenigsberg wrote:
>>>> > No, I am afraid that we have not managed to understand why setting an
>>>> > ipv6 address took the host off the grid. We shall continue researching
>>>> > this next week.
>>>> >
>>>> > Edy, https://gerrit.ovirt.org/#/c/88637/ is already 4 weeks old, but
>>>> > could it possibly be related (I really doubt that)?
>>>> >
>>>>
>>>
>> Sorry, but I do not see how this problem is related to VDSM.
>> There is nothing that indicates that there is a VDSM problem.
>>
>> Has the RPC connection between Engine and VDSM failed?
>>
>>
> Further up the thread, Piotr noticed (at least on one failure of this
> test) that the Vdsm host lost connectivity to its storage, and the Vdsm
> process was restarted. However, this does not seem to happen in all cases
> where this test fails.
> > _______________________________________________ > Devel mailing list > Devel at ovirt.org > http://lists.ovirt.org/mailman/listinfo/devel > -------------- next part -------------- An HTML attachment was scrubbed... URL: From nsoffer at redhat.com Tue Apr 10 13:19:32 2018 From: nsoffer at redhat.com (Nir Soffer) Date: Tue, 10 Apr 2018 13:19:32 +0000 Subject: [ovirt-devel] make check on master fails due to UnicodeDecodeError In-Reply-To: References: Message-ID: On Tue, Apr 10, 2018 at 2:52 PM Shani Leviim wrote: > Hi there, > I'm trying to run make check, and I have ~13 tests on vdsm/tests which > failes due to the following: > > File "/home/sleviim/git/vdsm/lib/vdsm/api/vdsmapi.py", line 212, in > __init__ > loaded_schema = pickle.load(f) > File "/usr/lib64/python3.6/encodings/ascii.py", line 26, in decode > return codecs.ascii_decode(input, self.errors)[0] > UnicodeDecodeError: 'ascii' codec can't decode byte 0x80 in position 0: > ordinal not in range(128) > > (Those lines are common to all failures) > > Here is an example: > > ====================================================================== > ERROR: test_ok_response (vdsmapi_test.DataVerificationTests) > ---------------------------------------------------------------------- > Traceback (most recent call last): > File "/home/sleviim/git/vdsm/tests/vdsmapi_test.py", line 96, in > test_ok_response > _schema.schema().verify_retval( > File "/home/sleviim/git/vdsm/tests/vdsmapi_test.py", line 67, in schema > self._schema = vdsmapi.Schema(paths, True) > File "/home/sleviim/git/vdsm/lib/vdsm/api/vdsmapi.py", line 212, in > __init__ > loaded_schema = pickle.load(f) > File "/usr/lib64/python3.6/encodings/ascii.py", line 26, in decode > return codecs.ascii_decode(input, self.errors)[0] > UnicodeDecodeError: 'ascii' codec can't decode byte 0x80 in position 0: > ordinal not in range(128) > > I've also tried to git clean -dxf && ./autogen.sh --system but it didn't > help. > Did you clean in the root directory? 
cd vdsm-checkout-dir
git clean -dxf
./autogen.sh --system
make
make check

Also, on which system do you run the tests? Fedora 27? CentOS? RHEL?

Nir
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From sleviim at redhat.com Tue Apr 10 14:15:28 2018
From: sleviim at redhat.com (Shani Leviim)
Date: Tue, 10 Apr 2018 17:15:28 +0300
Subject: [ovirt-devel] make check on master fails due to UnicodeDecodeError
In-Reply-To: References: Message-ID:

Hi,

Yes, I did clean the root directory but it didn't solve the issue.
I'm currently running the tests on fedora27, using python version 2.7.14.

Thanks to Dan's help, it seems that we found the root cause:

I had 2 pickle files under /var/cache/vdsm/schema: vdsm-api.pickle and
vdsm-events.pickle.
Removing them and re-running the tests using make check completed
successfully.

It was probably caused by a different encoding between python 2 and 3
while loading the schema file.

*Regards,*
*Shani Leviim*

On Tue, Apr 10, 2018 at 4:19 PM, Nir Soffer wrote:

> On Tue, Apr 10, 2018 at 2:52 PM Shani Leviim wrote:
>
>> Hi there,
>> I'm trying to run make check, and I have ~13 tests on vdsm/tests which
>> failes due to the following:
>>
>> File "/home/sleviim/git/vdsm/lib/vdsm/api/vdsmapi.py", line 212, in
>> __init__
>> loaded_schema = pickle.load(f)
>> File "/usr/lib64/python3.6/encodings/ascii.py", line 26, in decode
>> return codecs.ascii_decode(input, self.errors)[0]
>> UnicodeDecodeError: 'ascii' codec can't decode byte 0x80 in position 0:
>> ordinal not in range(128)
>>
>> (Those lines are common to all failures)
>>
>> Here is an example:
>>
>> ======================================================================
>> ERROR: test_ok_response (vdsmapi_test.DataVerificationTests)
>> ----------------------------------------------------------------------
>> Traceback (most recent call last):
>> File "/home/sleviim/git/vdsm/tests/vdsmapi_test.py", line 96, in
>> test_ok_response
>>
_schema.schema().verify_retval( >> File "/home/sleviim/git/vdsm/tests/vdsmapi_test.py", line 67, in schema >> self._schema = vdsmapi.Schema(paths, True) >> File "/home/sleviim/git/vdsm/lib/vdsm/api/vdsmapi.py", line 212, in >> __init__ >> loaded_schema = pickle.load(f) >> File "/usr/lib64/python3.6/encodings/ascii.py", line 26, in decode >> return codecs.ascii_decode(input, self.errors)[0] >> UnicodeDecodeError: 'ascii' codec can't decode byte 0x80 in position 0: >> ordinal not in range(128) >> >> I've also tried to git clean -dxf && ./autogen.sh --system but it didn't >> help. >> > > Did you clean in the root directory? > > cd vdsm-checkout-dir > git clean -dxf > ./autogen.sh --system > make > make check > > Also, on which system do you run the tests? Fedora 27? CentOS? RHEL? > > Nir > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From nsoffer at redhat.com Tue Apr 10 14:32:34 2018 From: nsoffer at redhat.com (Nir Soffer) Date: Tue, 10 Apr 2018 14:32:34 +0000 Subject: [ovirt-devel] make check on master fails due to UnicodeDecodeError In-Reply-To: References: Message-ID: On Tue, Apr 10, 2018 at 5:21 PM Shani Leviim wrote: > Hi, > > Yes, I did clean the root directory but it didn't solve the issue. > I'm currently running the tests on fedora27, using python version 2.1.14. > > Thanks to Dan's help, it seems that we found the root cause: > > I had 2 pickle files under /var/cache/vdsm/schema: vdsm-api.pickle and > vdsm-events.pickle. > Removing them and re-running the tests using make check was successfully > completed. > How did you have cached schema under /var/run? This directory is owned by root. Are you running the tests as root? This sounds like a bug in the code using the pickled schema. The pickled should not be used if the timestamp of the pickle do not match the timestamp of the source. Also in make check, we should not use host schema cache, but local schema cache generated by running "make". 
Nir -------------- next part -------------- An HTML attachment was scrubbed... URL: From gbenhaim at redhat.com Tue Apr 10 15:52:40 2018 From: gbenhaim at redhat.com (Gal Ben Haim) Date: Tue, 10 Apr 2018 18:52:40 +0300 Subject: [ovirt-devel] [ OST Failure Report ] [ oVirt 4.2 ] [ 2018-04-04 ] [006_migrations.prepare_migration_attachments_ipv6] In-Reply-To: References: Message-ID: I'm seeing the same error in [1], during 006_migrations.migrate_vm. [1] http://jenkins.ovirt.org/job/ovirt-4.2_change-queue-tester/1650/ On Tue, Apr 10, 2018 at 4:14 PM, Alona Kaplan wrote: > Hi all, > > Looking at the log it seems that the new GetCapabilitiesAsync is > responsible for the mess. > > - > * 08:29:47 - engine loses connectivity to host 'lago-basic-suite-4-2-host-0'.* > > > > *- Every 3 seconds a getCapabalititiesAsync request is sent to the host (unsuccessfully).* > > * before each "getCapabilitiesAsync" the monitoring lock is taken (VdsManager,refreshImpl) > > * "getCapabilitiesAsync" immediately fails and throws 'VDSNetworkException: java.net.ConnectException: Connection refused'. The exception is caught by 'GetCapabilitiesAsyncVDSCommand.executeVdsBrokerCommand' which calls 'onFailure' of the callback and re-throws the exception. > > catch (Throwable t) { > getParameters().getCallback().onFailure(t); > throw t; > } > > * The 'onFailure' of the callback releases the "monitoringLock" ('postProcessRefresh()->afterRefreshTreatment()-> if (!succeeded) lockManager.releaseLock(monitoringLock);') > > * 'VdsManager,refreshImpl' catches the network exception, marks 'releaseLock = true' and *tries to release the already released lock*. 
> > The following warning is printed to the log - > > WARN [org.ovirt.engine.core.bll.lock.InMemoryLockManager] (EE-ManagedThreadFactory-engineScheduled-Thread-53) [] Trying to release exclusive lock which does not exist, lock key: 'ecf53d69-eb68-4b11-8df2-c4aa4e19bd93VDS_INIT' > > > > > *- 08:30:51 a successful getCapabilitiesAsync is sent.* > > > *- 08:32:55 - The failing test starts (Setup Networks for setting ipv6). * > > * SetupNetworks takes the monitoring lock. > > *- 08:33:00 - ResponseTracker cleans the getCapabilitiesAsync requests from 4 minutes ago from its queue and prints a VDSNetworkException: Vds timeout occured.* > > * When the first request is removed from the queue ('ResponseTracker.remove()'), the > *'Callback.onFailure' is invoked (for the second time) -> monitoring lock is released (the lock taken by the SetupNetworks!).* > > * *The other requests removed from the queue also try to release the monitoring lock*, but there is nothing to release. > > * The following warning log is printed - > WARN [org.ovirt.engine.core.bll.lock.InMemoryLockManager] (EE-ManagedThreadFactory-engineScheduled-Thread-14) [] Trying to release exclusive lock which does not exist, lock key: 'ecf53d69-eb68-4b11-8df2-c4aa4e19bd93VDS_INIT' > > - *08:33:00 - SetupNetwork fails on Timeout ~4 seconds after is started*. Why? I'm not 100% sure but I guess the late processing of the 'getCapabilitiesAsync' that causes losing of the monitoring lock and the late + mupltiple processing of failure is root cause. > > > Ravi, 'getCapabilitiesAsync' failure is treated twice and the lock is trying to be released three times. Please share your opinion regarding how it should be fixed. > > > Thanks, > > Alona. > > > > > > > On Sun, Apr 8, 2018 at 1:21 PM, Dan Kenigsberg wrote: > >> On Sun, Apr 8, 2018 at 9:21 AM, Edward Haas wrote: >> >>> >>> >>> On Sun, Apr 8, 2018 at 9:15 AM, Eyal Edri wrote: >>> >>>> Was already done by Yaniv - https://gerrit.ovirt.org/#/c/89851. 
>>>> Is it still failing? >>>> >>>> On Sun, Apr 8, 2018 at 8:59 AM, Barak Korren >>>> wrote: >>>> >>>>> On 7 April 2018 at 00:30, Dan Kenigsberg wrote: >>>>> > No, I am afraid that we have not managed to understand why setting >>>>> and >>>>> > ipv6 address too the host off the grid. We shall continue researching >>>>> > this next week. >>>>> > >>>>> > Edy, https://gerrit.ovirt.org/#/c/88637/ is already 4 weeks old, but >>>>> > could it possibly be related (I really doubt that)? >>>>> > >>>>> >>>> >>> Sorry, but I do not see how this problem is related to VDSM. >>> There is nothing that indicates that there is a VDSM problem. >>> >>> Has the RPC connection between Engine and VDSM failed? >>> >>> >> Further up the thread, Piotr noticed that (at least on one failure of >> this test) that the Vdsm host lost connectivity to its storage, and Vdsm >> process was restarted. However, this does not seems to happen in all cases >> where this test fails. >> >> _______________________________________________ >> Devel mailing list >> Devel at ovirt.org >> http://lists.ovirt.org/mailman/listinfo/devel >> > > > _______________________________________________ > Devel mailing list > Devel at ovirt.org > http://lists.ovirt.org/mailman/listinfo/devel > -- *GAL bEN HAIM* RHV DEVOPS -------------- next part -------------- An HTML attachment was scrubbed... URL: From rnori at redhat.com Tue Apr 10 19:53:50 2018 From: rnori at redhat.com (Ravi Shankar Nori) Date: Tue, 10 Apr 2018 15:53:50 -0400 Subject: [ovirt-devel] [ OST Failure Report ] [ oVirt 4.2 ] [ 2018-04-04 ] [006_migrations.prepare_migration_attachments_ipv6] In-Reply-To: References: Message-ID: Working on a patch will post a fix Thanks Ravi On Tue, Apr 10, 2018 at 9:14 AM, Alona Kaplan wrote: > Hi all, > > Looking at the log it seems that the new GetCapabilitiesAsync is > responsible for the mess. 
> > - > * 08:29:47 - engine loses connectivity to host 'lago-basic-suite-4-2-host-0'.* > > > > *- Every 3 seconds a getCapabalititiesAsync request is sent to the host (unsuccessfully).* > > * before each "getCapabilitiesAsync" the monitoring lock is taken (VdsManager,refreshImpl) > > * "getCapabilitiesAsync" immediately fails and throws 'VDSNetworkException: java.net.ConnectException: Connection refused'. The exception is caught by 'GetCapabilitiesAsyncVDSCommand.executeVdsBrokerCommand' which calls 'onFailure' of the callback and re-throws the exception. > > catch (Throwable t) { > getParameters().getCallback().onFailure(t); > throw t; > } > > * The 'onFailure' of the callback releases the "monitoringLock" ('postProcessRefresh()->afterRefreshTreatment()-> if (!succeeded) lockManager.releaseLock(monitoringLock);') > > * 'VdsManager,refreshImpl' catches the network exception, marks 'releaseLock = true' and *tries to release the already released lock*. > > The following warning is printed to the log - > > WARN [org.ovirt.engine.core.bll.lock.InMemoryLockManager] (EE-ManagedThreadFactory-engineScheduled-Thread-53) [] Trying to release exclusive lock which does not exist, lock key: 'ecf53d69-eb68-4b11-8df2-c4aa4e19bd93VDS_INIT' > > > > > *- 08:30:51 a successful getCapabilitiesAsync is sent.* > > > *- 08:32:55 - The failing test starts (Setup Networks for setting ipv6). * > > * SetupNetworks takes the monitoring lock. > > *- 08:33:00 - ResponseTracker cleans the getCapabilitiesAsync requests from 4 minutes ago from its queue and prints a VDSNetworkException: Vds timeout occured.* > > * When the first request is removed from the queue ('ResponseTracker.remove()'), the > *'Callback.onFailure' is invoked (for the second time) -> monitoring lock is released (the lock taken by the SetupNetworks!).* > > * *The other requests removed from the queue also try to release the monitoring lock*, but there is nothing to release. 
> > * The following warning log is printed - > WARN [org.ovirt.engine.core.bll.lock.InMemoryLockManager] (EE-ManagedThreadFactory-engineScheduled-Thread-14) [] Trying to release exclusive lock which does not exist, lock key: 'ecf53d69-eb68-4b11-8df2-c4aa4e19bd93VDS_INIT' > > - *08:33:00 - SetupNetwork fails on Timeout ~4 seconds after is started*. Why? I'm not 100% sure but I guess the late processing of the 'getCapabilitiesAsync' that causes losing of the monitoring lock and the late + mupltiple processing of failure is root cause. > > > Ravi, 'getCapabilitiesAsync' failure is treated twice and the lock is trying to be released three times. Please share your opinion regarding how it should be fixed. > > > Thanks, > > Alona. > > > > > > > On Sun, Apr 8, 2018 at 1:21 PM, Dan Kenigsberg wrote: > >> On Sun, Apr 8, 2018 at 9:21 AM, Edward Haas wrote: >> >>> >>> >>> On Sun, Apr 8, 2018 at 9:15 AM, Eyal Edri wrote: >>> >>>> Was already done by Yaniv - https://gerrit.ovirt.org/#/c/89851. >>>> Is it still failing? >>>> >>>> On Sun, Apr 8, 2018 at 8:59 AM, Barak Korren >>>> wrote: >>>> >>>>> On 7 April 2018 at 00:30, Dan Kenigsberg wrote: >>>>> > No, I am afraid that we have not managed to understand why setting >>>>> and >>>>> > ipv6 address too the host off the grid. We shall continue researching >>>>> > this next week. >>>>> > >>>>> > Edy, https://gerrit.ovirt.org/#/c/88637/ is already 4 weeks old, but >>>>> > could it possibly be related (I really doubt that)? >>>>> > >>>>> >>>> >>> Sorry, but I do not see how this problem is related to VDSM. >>> There is nothing that indicates that there is a VDSM problem. >>> >>> Has the RPC connection between Engine and VDSM failed? >>> >>> >> Further up the thread, Piotr noticed that (at least on one failure of >> this test) that the Vdsm host lost connectivity to its storage, and Vdsm >> process was restarted. However, this does not seems to happen in all cases >> where this test fails. 
>> >> _______________________________________________ >> Devel mailing list >> Devel at ovirt.org >> http://lists.ovirt.org/mailman/listinfo/devel >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From rnori at redhat.com Tue Apr 10 22:04:40 2018 From: rnori at redhat.com (Ravi Shankar Nori) Date: Tue, 10 Apr 2018 18:04:40 -0400 Subject: [ovirt-devel] [ OST Failure Report ] [ oVirt 4.2 ] [ 2018-04-04 ] [006_migrations.prepare_migration_attachments_ipv6] In-Reply-To: References: Message-ID: This [1] should fix the multiple release lock issue [1] https://gerrit.ovirt.org/#/c/90077/ On Tue, Apr 10, 2018 at 3:53 PM, Ravi Shankar Nori wrote: > Working on a patch will post a fix > > Thanks > > Ravi > > On Tue, Apr 10, 2018 at 9:14 AM, Alona Kaplan wrote: > >> Hi all, >> >> Looking at the log it seems that the new GetCapabilitiesAsync is >> responsible for the mess. >> >> - >> * 08:29:47 - engine loses connectivity to host 'lago-basic-suite-4-2-host-0'.* >> >> >> >> *- Every 3 seconds a getCapabalititiesAsync request is sent to the host (unsuccessfully).* >> >> * before each "getCapabilitiesAsync" the monitoring lock is taken (VdsManager,refreshImpl) >> >> * "getCapabilitiesAsync" immediately fails and throws 'VDSNetworkException: java.net.ConnectException: Connection refused'. The exception is caught by 'GetCapabilitiesAsyncVDSCommand.executeVdsBrokerCommand' which calls 'onFailure' of the callback and re-throws the exception. >> >> catch (Throwable t) { >> getParameters().getCallback().onFailure(t); >> throw t; >> } >> >> * The 'onFailure' of the callback releases the "monitoringLock" ('postProcessRefresh()->afterRefreshTreatment()-> if (!succeeded) lockManager.releaseLock(monitoringLock);') >> >> * 'VdsManager,refreshImpl' catches the network exception, marks 'releaseLock = true' and *tries to release the already released lock*. 
>> >> The following warning is printed to the log - >> >> WARN [org.ovirt.engine.core.bll.lock.InMemoryLockManager] (EE-ManagedThreadFactory-engineScheduled-Thread-53) [] Trying to release exclusive lock which does not exist, lock key: 'ecf53d69-eb68-4b11-8df2-c4aa4e19bd93VDS_INIT' >> >> >> >> >> *- 08:30:51 a successful getCapabilitiesAsync is sent.* >> >> >> *- 08:32:55 - The failing test starts (Setup Networks for setting ipv6). * >> >> * SetupNetworks takes the monitoring lock. >> >> *- 08:33:00 - ResponseTracker cleans the getCapabilitiesAsync requests from 4 minutes ago from its queue and prints a VDSNetworkException: Vds timeout occured.* >> >> * When the first request is removed from the queue ('ResponseTracker.remove()'), the >> *'Callback.onFailure' is invoked (for the second time) -> monitoring lock is released (the lock taken by the SetupNetworks!).* >> >> * *The other requests removed from the queue also try to release the monitoring lock*, but there is nothing to release. >> >> * The following warning log is printed - >> WARN [org.ovirt.engine.core.bll.lock.InMemoryLockManager] (EE-ManagedThreadFactory-engineScheduled-Thread-14) [] Trying to release exclusive lock which does not exist, lock key: 'ecf53d69-eb68-4b11-8df2-c4aa4e19bd93VDS_INIT' >> >> - *08:33:00 - SetupNetwork fails on Timeout ~4 seconds after is started*. Why? I'm not 100% sure but I guess the late processing of the 'getCapabilitiesAsync' that causes losing of the monitoring lock and the late + mupltiple processing of failure is root cause. >> >> >> Ravi, 'getCapabilitiesAsync' failure is treated twice and the lock is trying to be released three times. Please share your opinion regarding how it should be fixed. >> >> >> Thanks, >> >> Alona. 
>> >> >> >> >> >> >> On Sun, Apr 8, 2018 at 1:21 PM, Dan Kenigsberg wrote: >> >>> On Sun, Apr 8, 2018 at 9:21 AM, Edward Haas wrote: >>> >>>> >>>> >>>> On Sun, Apr 8, 2018 at 9:15 AM, Eyal Edri wrote: >>>> >>>>> Was already done by Yaniv - https://gerrit.ovirt.org/#/c/89851. >>>>> Is it still failing? >>>>> >>>>> On Sun, Apr 8, 2018 at 8:59 AM, Barak Korren >>>>> wrote: >>>>> >>>>>> On 7 April 2018 at 00:30, Dan Kenigsberg wrote: >>>>>> > No, I am afraid that we have not managed to understand why setting >>>>>> and >>>>>> > ipv6 address too the host off the grid. We shall continue >>>>>> researching >>>>>> > this next week. >>>>>> > >>>>>> > Edy, https://gerrit.ovirt.org/#/c/88637/ is already 4 weeks old, >>>>>> but >>>>>> > could it possibly be related (I really doubt that)? >>>>>> > >>>>>> >>>>> >>>> Sorry, but I do not see how this problem is related to VDSM. >>>> There is nothing that indicates that there is a VDSM problem. >>>> >>>> Has the RPC connection between Engine and VDSM failed? >>>> >>>> >>> Further up the thread, Piotr noticed that (at least on one failure of >>> this test) that the Vdsm host lost connectivity to its storage, and Vdsm >>> process was restarted. However, this does not seems to happen in all cases >>> where this test fails. >>> >>> _______________________________________________ >>> Devel mailing list >>> Devel at ovirt.org >>> http://lists.ovirt.org/mailman/listinfo/devel >>> >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ederevea at redhat.com Wed Apr 11 02:11:14 2018 From: ederevea at redhat.com (Evgheni Dereveanchin) Date: Wed, 11 Apr 2018 04:11:14 +0200 Subject: [ovirt-devel] planned Jenkins restart Message-ID: Hi everyone, I'll be performing a planned Jenkins restart within the next hour. No new jobs will be scheduled during this maintenance period. I will inform you once it is over. 
Regards, Evgheni Dereveanchin -------------- next part -------------- An HTML attachment was scrubbed... URL: From ederevea at redhat.com Wed Apr 11 02:56:17 2018 From: ederevea at redhat.com (Evgheni Dereveanchin) Date: Wed, 11 Apr 2018 04:56:17 +0200 Subject: [ovirt-devel] planned Jenkins restart In-Reply-To: References: Message-ID: Maintenance completed, Jenkins back up and running. The OS, Jenkins core and all plugins were updated: *https://ovirt-jira.atlassian.net/browse/OVIRT-1937 * As always - if you see any issues please report them to Jira. Regards, Evgheni Dereveanchin On Wed, Apr 11, 2018 at 4:11 AM, Evgheni Dereveanchin wrote: > Hi everyone, > > I'll be performing a planned Jenkins restart within the next hour. > No new jobs will be scheduled during this maintenance period. > I will inform you once it is over. > > Regards, > Evgheni Dereveanchin > -- Regards, Evgheni Dereveanchin -------------- next part -------------- An HTML attachment was scrubbed... URL: From sleviim at redhat.com Wed Apr 11 06:59:34 2018 From: sleviim at redhat.com (Shani Leviim) Date: Wed, 11 Apr 2018 09:59:34 +0300 Subject: [ovirt-devel] make check on master fails due to UnicodeDecodeError In-Reply-To: References: Message-ID: Hi, *Regards,* *Shani Leviim* On Tue, Apr 10, 2018 at 5:32 PM, Nir Soffer wrote: > On Tue, Apr 10, 2018 at 5:21 PM Shani Leviim wrote: > >> Hi, >> >> Yes, I did clean the root directory but it didn't solve the issue. >> I'm currently running the tests on fedora27, using python version 2.7.14. >> >> Thanks to Dan's help, it seems that we found the root cause: >> >> I had 2 pickle files under /var/cache/vdsm/schema: vdsm-api.pickle and >> vdsm-events.pickle. >> Removing them and re-running the tests with make check completed >> successfully. >> > > How did you have cached schema under /var/run? This directory is owned by > root. > Are you running the tests as root? > No, I'm running the tests on my laptop with my own user. 
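For what it's worth, the Python 2/3 pickle difference discussed in this thread can be reproduced in isolation. A small sketch (the file name below is illustrative, not the real vdsm cache path) - opening the pickle in binary mode is what makes it load on both interpreters:

```python
import os
import pickle
import sys
import tempfile

def load_cached_schema(pickle_path):
    # 'rb' is required: on Python 3 a text-mode open decodes the raw
    # pickle bytes as UTF-8 and can raise UnicodeDecodeError.
    with open(pickle_path, 'rb') as f:
        if sys.version_info[0] >= 3:
            # protocol-2 pickles written by Python 2 may need an explicit
            # encoding so old str payloads load cleanly under Python 3
            return pickle.load(f, encoding='latin-1')
        return pickle.load(f)

# round-trip a tiny fake schema through a protocol-2 pickle
path = os.path.join(tempfile.mkdtemp(), 'vdsm-api.pickle')
with open(path, 'wb') as f:
    pickle.dump({'Host.getCapabilities': {'params': []}}, f, protocol=2)

schema = load_cached_schema(path)
```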
> > This sounds like a bug in the code using the pickled schema. The pickled > should not > be used if the timestamp of the pickle does not match the timestamp of the > source. > I suspect that the encoding differs between Python 2 and Python 3. When I changed "with open(pickle_path) as f:" to "with open(pickle_path,'rb') as f: " (I was inspired by [1]), the make check seems to complete successfully. [1] https://stackoverflow.com/questions/28218466/unpickling-a-python-2-object-with-python-3 > Also in make check, we should not use host schema cache, but local schema > cache > generated by running "make". > > Nir > -------------- next part -------------- An HTML attachment was scrubbed... URL: From alkaplan at redhat.com Wed Apr 11 08:41:57 2018 From: alkaplan at redhat.com (Alona Kaplan) Date: Wed, 11 Apr 2018 11:41:57 +0300 Subject: [ovirt-devel] [ OST Failure Report ] [ oVirt 4.2 ] [ 2018-04-04 ] [006_migrations.prepare_migration_attachments_ipv6] In-Reply-To: References: Message-ID: Hi Ravi, Added comments to the patch. Regarding the lock - the lock shouldn't be released until the command and its callbacks have finished. The treatment of a delayed failure should be under lock, since it doesn't make sense to handle the failure while another monitoring process is possibly running. Besides the locking issue, IMO the main problem is the delayed failures. If vdsm is down, why is there an immediate exception and a delayed one? The delayed one is redundant. Anyway, 'callback.onFailure' shouldn't be executed twice. Thanks, Alona. 
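Alona's point about the delayed failure being redundant can be illustrated with a toy tracker (hypothetical names, not the engine's actual ResponseTracker): once a request's failure has been reported immediately, it is dropped from the pending queue, so the periodic timeout sweep can no longer report it a second time.

```python
class ToyResponseTracker:
    def __init__(self):
        self._pending = {}  # request id -> failure callback

    def register(self, req_id, on_failure):
        self._pending[req_id] = on_failure

    def fail_immediately(self, req_id, error):
        # Immediate failure path (e.g. connection refused): report once
        # and forget the request, so no delayed duplicate is possible.
        callback = self._pending.pop(req_id, None)
        if callback is not None:
            callback(error)

    def sweep_timed_out(self):
        # Delayed failure path: only requests that were never reported
        # before are still pending here.
        for req_id in list(self._pending):
            callback = self._pending.pop(req_id)
            callback(TimeoutError('Vds timeout'))
```

Each registered request fires its failure callback exactly once, whichever path reports it first.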
On Wed, Apr 11, 2018 at 1:04 AM, Ravi Shankar Nori wrote: > This [1] should fix the multiple release lock issue > > [1] https://gerrit.ovirt.org/#/c/90077/ > > On Tue, Apr 10, 2018 at 3:53 PM, Ravi Shankar Nori > wrote: > >> Working on a patch will post a fix >> >> Thanks >> >> Ravi >> >> On Tue, Apr 10, 2018 at 9:14 AM, Alona Kaplan >> wrote: >> >>> Hi all, >>> >>> Looking at the log it seems that the new GetCapabilitiesAsync is >>> responsible for the mess. >>> >>> - >>> * 08:29:47 - engine loses connectivity to host 'lago-basic-suite-4-2-host-0'.* >>> >>> >>> >>> *- Every 3 seconds a getCapabalititiesAsync request is sent to the host (unsuccessfully).* >>> >>> * before each "getCapabilitiesAsync" the monitoring lock is taken (VdsManager,refreshImpl) >>> >>> * "getCapabilitiesAsync" immediately fails and throws 'VDSNetworkException: java.net.ConnectException: Connection refused'. The exception is caught by 'GetCapabilitiesAsyncVDSCommand.executeVdsBrokerCommand' which calls 'onFailure' of the callback and re-throws the exception. >>> >>> catch (Throwable t) { >>> getParameters().getCallback().onFailure(t); >>> throw t; >>> } >>> >>> * The 'onFailure' of the callback releases the "monitoringLock" ('postProcessRefresh()->afterRefreshTreatment()-> if (!succeeded) lockManager.releaseLock(monitoringLock);') >>> >>> * 'VdsManager,refreshImpl' catches the network exception, marks 'releaseLock = true' and *tries to release the already released lock*. >>> >>> The following warning is printed to the log - >>> >>> WARN [org.ovirt.engine.core.bll.lock.InMemoryLockManager] (EE-ManagedThreadFactory-engineScheduled-Thread-53) [] Trying to release exclusive lock which does not exist, lock key: 'ecf53d69-eb68-4b11-8df2-c4aa4e19bd93VDS_INIT' >>> >>> >>> >>> >>> *- 08:30:51 a successful getCapabilitiesAsync is sent.* >>> >>> >>> *- 08:32:55 - The failing test starts (Setup Networks for setting ipv6). * >>> >>> * SetupNetworks takes the monitoring lock. 
>>> >>> *- 08:33:00 - ResponseTracker cleans the getCapabilitiesAsync requests from 4 minutes ago from its queue and prints a VDSNetworkException: Vds timeout occured.* >>> >>> * When the first request is removed from the queue ('ResponseTracker.remove()'), the >>> *'Callback.onFailure' is invoked (for the second time) -> monitoring lock is released (the lock taken by the SetupNetworks!).* >>> >>> * *The other requests removed from the queue also try to release the monitoring lock*, but there is nothing to release. >>> >>> * The following warning log is printed - >>> WARN [org.ovirt.engine.core.bll.lock.InMemoryLockManager] (EE-ManagedThreadFactory-engineScheduled-Thread-14) [] Trying to release exclusive lock which does not exist, lock key: 'ecf53d69-eb68-4b11-8df2-c4aa4e19bd93VDS_INIT' >>> >>> - *08:33:00 - SetupNetwork fails on Timeout ~4 seconds after is started*. Why? I'm not 100% sure but I guess the late processing of the 'getCapabilitiesAsync' that causes losing of the monitoring lock and the late + mupltiple processing of failure is root cause. >>> >>> >>> Ravi, 'getCapabilitiesAsync' failure is treated twice and the lock is trying to be released three times. Please share your opinion regarding how it should be fixed. >>> >>> >>> Thanks, >>> >>> Alona. >>> >>> >>> >>> >>> >>> >>> On Sun, Apr 8, 2018 at 1:21 PM, Dan Kenigsberg >>> wrote: >>> >>>> On Sun, Apr 8, 2018 at 9:21 AM, Edward Haas wrote: >>>> >>>>> >>>>> >>>>> On Sun, Apr 8, 2018 at 9:15 AM, Eyal Edri wrote: >>>>> >>>>>> Was already done by Yaniv - https://gerrit.ovirt.org/#/c/89851. >>>>>> Is it still failing? >>>>>> >>>>>> On Sun, Apr 8, 2018 at 8:59 AM, Barak Korren >>>>>> wrote: >>>>>> >>>>>>> On 7 April 2018 at 00:30, Dan Kenigsberg wrote: >>>>>>> > No, I am afraid that we have not managed to understand why setting >>>>>>> and >>>>>>> > ipv6 address too the host off the grid. We shall continue >>>>>>> researching >>>>>>> > this next week. 
>>>>>>> > >>>>>>> > Edy, https://gerrit.ovirt.org/#/c/88637/ is already 4 weeks old, >>>>>>> but >>>>>>> > could it possibly be related (I really doubt that)? >>>>>>> > >>>>>>> >>>>>> >>>>> Sorry, but I do not see how this problem is related to VDSM. >>>>> There is nothing that indicates that there is a VDSM problem. >>>>> >>>>> Has the RPC connection between Engine and VDSM failed? >>>>> >>>>> >>>> Further up the thread, Piotr noticed that (at least on one failure of >>>> this test) the Vdsm host lost connectivity to its storage, and the Vdsm >>>> process was restarted. However, this does not seem to happen in all cases >>>> where this test fails. >>>> >>>> _______________________________________________ >>>> Devel mailing list >>>> Devel at ovirt.org >>>> http://lists.ovirt.org/mailman/listinfo/devel >>>> >>> >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From eedri at redhat.com Wed Apr 11 09:30:25 2018 From: eedri at redhat.com (Eyal Edri) Date: Wed, 11 Apr 2018 12:30:25 +0300 Subject: [ovirt-devel] dynamic ownership changes In-Reply-To: <20180410130916.GB37403@Alexandra.local> References: <20180410130916.GB37403@Alexandra.local> Message-ID: Please make sure to run as many OST suites on this patch as possible before merging ( using 'ci please build' ) On Tue, Apr 10, 2018 at 4:09 PM, Martin Polednik wrote: > Hey, > > I've created a patch[0] that is finally able to activate libvirt's > dynamic_ownership for VDSM while not negatively affecting > functionality of our storage code. > > That of course comes with quite a bit of code removal, mostly in the > area of host devices, hwrng and anything that touches devices; bunch > of test changes and one XML generation caveat (storage is handled by > VDSM, therefore disk relabelling needs to be disabled on the VDSM > level). 
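The "disk relabelling needs to be disabled" caveat maps to libvirt's per-device seclabel override. Roughly, the generated domain XML would carry something like the following fragment (an illustration of the mechanism, not copied from the patch):

```xml
<disk type='file' device='disk'>
  <source file='/path/to/vdsm-managed/disk-image'>
    <!-- tell libvirt's dynamic_ownership not to chown/relabel this
         image; VDSM keeps managing its ownership itself -->
    <seclabel model='dac' relabel='no'/>
  </source>
  <target dev='vda' bus='virtio'/>
</disk>
```

With relabel='no' on VDSM-managed storage, dynamic ownership can stay enabled for everything else (host devices, hwrng, etc.) without libvirt fighting VDSM over image-file ownership.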
> > Because of the scope of the patch, I welcome storage/virt/network > people to review the code and consider the implication this change has > on current/future features. > > [0] https://gerrit.ovirt.org/#/c/89830/ > > mpolednik > _______________________________________________ > Devel mailing list > Devel at ovirt.org > http://lists.ovirt.org/mailman/listinfo/devel > -- Eyal edri MANAGER RHV DevOps EMEA VIRTUALIZATION R&D Red Hat EMEA TRIED. TESTED. TRUSTED. phone: +972-9-7692018 irc: eedri (on #tlv #rhev-dev #rhev-integ) -------------- next part -------------- An HTML attachment was scrubbed... URL: From nsoffer at redhat.com Wed Apr 11 09:34:52 2018 From: nsoffer at redhat.com (Nir Soffer) Date: Wed, 11 Apr 2018 09:34:52 +0000 Subject: [ovirt-devel] dynamic ownership changes In-Reply-To: References: <20180410130916.GB37403@Alexandra.local> Message-ID: On Wed, Apr 11, 2018 at 12:31 PM Eyal Edri wrote: > Please make sure to run as much OST suites on this patch as possible > before merging ( using 'ci please build' ) > But note that OST is not a way to verify the patch. Such changes require testing with all storage types we support. Nir On Tue, Apr 10, 2018 at 4:09 PM, Martin Polednik > wrote: > >> Hey, >> >> I've created a patch[0] that is finally able to activate libvirt's >> dynamic_ownership for VDSM while not negatively affecting >> functionality of our storage code. >> >> That of course comes with quite a bit of code removal, mostly in the >> area of host devices, hwrng and anything that touches devices; bunch >> of test changes and one XML generation caveat (storage is handled by >> VDSM, therefore disk relabelling needs to be disabled on the VDSM >> level). >> >> Because of the scope of the patch, I welcome storage/virt/network >> people to review the code and consider the implication this change has >> on current/future features. 
>> >> [0] https://gerrit.ovirt.org/#/c/89830/ >> >> mpolednik >> _______________________________________________ >> Devel mailing list >> Devel at ovirt.org >> http://lists.ovirt.org/mailman/listinfo/devel >> > > > > -- > > Eyal edri > > > MANAGER > > RHV DevOps > > EMEA VIRTUALIZATION R&D > > > Red Hat EMEA > TRIED. TESTED. TRUSTED. > phone: +972-9-7692018 <+972%209-769-2018> > irc: eedri (on #tlv #rhev-dev #rhev-integ) > _______________________________________________ > Devel mailing list > Devel at ovirt.org > http://lists.ovirt.org/mailman/listinfo/devel -------------- next part -------------- An HTML attachment was scrubbed... URL: From eedri at redhat.com Wed Apr 11 09:37:35 2018 From: eedri at redhat.com (Eyal Edri) Date: Wed, 11 Apr 2018 12:37:35 +0300 Subject: [ovirt-devel] dynamic ownership changes In-Reply-To: References: <20180410130916.GB37403@Alexandra.local> Message-ID: On Wed, Apr 11, 2018 at 12:34 PM, Nir Soffer wrote: > On Wed, Apr 11, 2018 at 12:31 PM Eyal Edri wrote: > >> Please make sure to run as much OST suites on this patch as possible >> before merging ( using 'ci please build' ) >> > > But note that OST is not a way to verify the patch. > > Such changes require testing with all storage types we support. > Well, we already have HE suite that runs on ISCSI, so at least we have NFS+ISCSI on nested, for real storage testing, you'll have to do it manually > > Nir > > On Tue, Apr 10, 2018 at 4:09 PM, Martin Polednik >> wrote: >> >>> Hey, >>> >>> I've created a patch[0] that is finally able to activate libvirt's >>> dynamic_ownership for VDSM while not negatively affecting >>> functionality of our storage code. >>> >>> That of course comes with quite a bit of code removal, mostly in the >>> area of host devices, hwrng and anything that touches devices; bunch >>> of test changes and one XML generation caveat (storage is handled by >>> VDSM, therefore disk relabelling needs to be disabled on the VDSM >>> level). 
>>> >>> Because of the scope of the patch, I welcome storage/virt/network >>> people to review the code and consider the implication this change has >>> on current/future features. >>> >>> [0] https://gerrit.ovirt.org/#/c/89830/ >>> >>> mpolednik >>> _______________________________________________ >>> Devel mailing list >>> Devel at ovirt.org >>> http://lists.ovirt.org/mailman/listinfo/devel >>> >> >> >> >> -- >> >> Eyal edri >> >> >> MANAGER >> >> RHV DevOps >> >> EMEA VIRTUALIZATION R&D >> >> >> Red Hat EMEA >> TRIED. TESTED. TRUSTED. >> phone: +972-9-7692018 <+972%209-769-2018> >> irc: eedri (on #tlv #rhev-dev #rhev-integ) >> _______________________________________________ >> Devel mailing list >> Devel at ovirt.org >> http://lists.ovirt.org/mailman/listinfo/devel > > -- Eyal edri MANAGER RHV DevOps EMEA VIRTUALIZATION R&D Red Hat EMEA TRIED. TESTED. TRUSTED. phone: +972-9-7692018 irc: eedri (on #tlv #rhev-dev #rhev-integ) -------------- next part -------------- An HTML attachment was scrubbed... URL: From alkaplan at redhat.com Wed Apr 11 09:45:21 2018 From: alkaplan at redhat.com (Alona Kaplan) Date: Wed, 11 Apr 2018 12:45:21 +0300 Subject: [ovirt-devel] [ OST Failure Report ] [ oVirt 4.2 ] [ 2018-04-04 ] [006_migrations.prepare_migration_attachments_ipv6] In-Reply-To: References: Message-ID: On Tue, Apr 10, 2018 at 6:52 PM, Gal Ben Haim wrote: > I'm seeing the same error in [1], during 006_migrations.migrate_vm. > > [1] http://jenkins.ovirt.org/job/ovirt-4.2_change-queue-tester/1650/ > Seems like another bug. The migration failed since for some reason the vm is already defined on the destination host. 
2018-04-10 11:08:08,685-0400 ERROR (jsonrpc/0) [api] FINISH create error=Virtual machine already exists (api:129) Traceback (most recent call last): File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 122, in method ret = func(*args, **kwargs) File "/usr/lib/python2.7/site-packages/vdsm/API.py", line 191, in create raise exception.VMExists() VMExists: Virtual machine already exists > > > On Tue, Apr 10, 2018 at 4:14 PM, Alona Kaplan wrote: > >> Hi all, >> >> Looking at the log it seems that the new GetCapabilitiesAsync is >> responsible for the mess. >> >> - >> * 08:29:47 - engine loses connectivity to host 'lago-basic-suite-4-2-host-0'.* >> >> >> >> *- Every 3 seconds a getCapabalititiesAsync request is sent to the host (unsuccessfully).* >> >> * before each "getCapabilitiesAsync" the monitoring lock is taken (VdsManager,refreshImpl) >> >> * "getCapabilitiesAsync" immediately fails and throws 'VDSNetworkException: java.net.ConnectException: Connection refused'. The exception is caught by 'GetCapabilitiesAsyncVDSCommand.executeVdsBrokerCommand' which calls 'onFailure' of the callback and re-throws the exception. >> >> catch (Throwable t) { >> getParameters().getCallback().onFailure(t); >> throw t; >> } >> >> * The 'onFailure' of the callback releases the "monitoringLock" ('postProcessRefresh()->afterRefreshTreatment()-> if (!succeeded) lockManager.releaseLock(monitoringLock);') >> >> * 'VdsManager,refreshImpl' catches the network exception, marks 'releaseLock = true' and *tries to release the already released lock*. >> >> The following warning is printed to the log - >> >> WARN [org.ovirt.engine.core.bll.lock.InMemoryLockManager] (EE-ManagedThreadFactory-engineScheduled-Thread-53) [] Trying to release exclusive lock which does not exist, lock key: 'ecf53d69-eb68-4b11-8df2-c4aa4e19bd93VDS_INIT' >> >> >> >> >> *- 08:30:51 a successful getCapabilitiesAsync is sent.* >> >> >> *- 08:32:55 - The failing test starts (Setup Networks for setting ipv6). 
* >> >> * SetupNetworks takes the monitoring lock. >> >> *- 08:33:00 - ResponseTracker cleans the getCapabilitiesAsync requests from 4 minutes ago from its queue and prints a VDSNetworkException: Vds timeout occured.* >> >> * When the first request is removed from the queue ('ResponseTracker.remove()'), the >> *'Callback.onFailure' is invoked (for the second time) -> monitoring lock is released (the lock taken by the SetupNetworks!).* >> >> * *The other requests removed from the queue also try to release the monitoring lock*, but there is nothing to release. >> >> * The following warning log is printed - >> WARN [org.ovirt.engine.core.bll.lock.InMemoryLockManager] (EE-ManagedThreadFactory-engineScheduled-Thread-14) [] Trying to release exclusive lock which does not exist, lock key: 'ecf53d69-eb68-4b11-8df2-c4aa4e19bd93VDS_INIT' >> >> - *08:33:00 - SetupNetwork fails on Timeout ~4 seconds after is started*. Why? I'm not 100% sure but I guess the late processing of the 'getCapabilitiesAsync' that causes losing of the monitoring lock and the late + mupltiple processing of failure is root cause. >> >> >> Ravi, 'getCapabilitiesAsync' failure is treated twice and the lock is trying to be released three times. Please share your opinion regarding how it should be fixed. >> >> >> Thanks, >> >> Alona. >> >> >> >> >> >> >> On Sun, Apr 8, 2018 at 1:21 PM, Dan Kenigsberg wrote: >> >>> On Sun, Apr 8, 2018 at 9:21 AM, Edward Haas wrote: >>> >>>> >>>> >>>> On Sun, Apr 8, 2018 at 9:15 AM, Eyal Edri wrote: >>>> >>>>> Was already done by Yaniv - https://gerrit.ovirt.org/#/c/89851. >>>>> Is it still failing? >>>>> >>>>> On Sun, Apr 8, 2018 at 8:59 AM, Barak Korren >>>>> wrote: >>>>> >>>>>> On 7 April 2018 at 00:30, Dan Kenigsberg wrote: >>>>>> > No, I am afraid that we have not managed to understand why setting >>>>>> and >>>>>> > ipv6 address too the host off the grid. We shall continue >>>>>> researching >>>>>> > this next week. 
>>>>>> > >>>>>> > Edy, https://gerrit.ovirt.org/#/c/88637/ is already 4 weeks old, >>>>>> but >>>>>> > could it possibly be related (I really doubt that)? >>>>>> > >>>>>> >>>>> >>>> Sorry, but I do not see how this problem is related to VDSM. >>>> There is nothing that indicates that there is a VDSM problem. >>>> >>>> Has the RPC connection between Engine and VDSM failed? >>>> >>>> >>> Further up the thread, Piotr noticed that (at least on one failure of >>> this test) that the Vdsm host lost connectivity to its storage, and Vdsm >>> process was restarted. However, this does not seems to happen in all cases >>> where this test fails. >>> >>> _______________________________________________ >>> Devel mailing list >>> Devel at ovirt.org >>> http://lists.ovirt.org/mailman/listinfo/devel >>> >> >> >> _______________________________________________ >> Devel mailing list >> Devel at ovirt.org >> http://lists.ovirt.org/mailman/listinfo/devel >> > > > > -- > *GAL bEN HAIM* > RHV DEVOPS > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ahadas at redhat.com Wed Apr 11 10:01:36 2018 From: ahadas at redhat.com (Arik Hadas) Date: Wed, 11 Apr 2018 13:01:36 +0300 Subject: [ovirt-devel] [ OST Failure Report ] [ oVirt 4.2 ] [ 2018-04-04 ] [006_migrations.prepare_migration_attachments_ipv6] In-Reply-To: References: Message-ID: On Wed, Apr 11, 2018 at 12:45 PM, Alona Kaplan wrote: > > > On Tue, Apr 10, 2018 at 6:52 PM, Gal Ben Haim wrote: > >> I'm seeing the same error in [1], during 006_migrations.migrate_vm. >> >> [1] http://jenkins.ovirt.org/job/ovirt-4.2_change-queue-tester/1650/ >> > > Seems like another bug. The migration failed since for some reason the vm > is already defined on the destination host. 
> > 2018-04-10 11:08:08,685-0400 ERROR (jsonrpc/0) [api] FINISH create > error=Virtual machine already exists (api:129) > Traceback (most recent call last): > File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 122, in > method > ret = func(*args, **kwargs) > File "/usr/lib/python2.7/site-packages/vdsm/API.py", line 191, in create > raise exception.VMExists() > VMExists: Virtual machine already exists > > Milan, Francesco, could it be that because of [1] that appears on the destination host right after shutting down the VM, it remained defined on that host? [1] 2018-04-10 11:01:40,005-0400 ERROR (libvirt/events) [vds] Error running VM callback (clientIF:683) Traceback (most recent call last): File "/usr/lib/python2.7/site-packages/vdsm/clientIF.py", line 646, in dispatchLibvirtEvents v.onLibvirtLifecycleEvent(event, detail, None) AttributeError: 'NoneType' object has no attribute 'onLibvirtLifecycleEvent' > > >> >> >> On Tue, Apr 10, 2018 at 4:14 PM, Alona Kaplan >> wrote: >> >>> Hi all, >>> >>> Looking at the log it seems that the new GetCapabilitiesAsync is >>> responsible for the mess. >>> >>> - >>> * 08:29:47 - engine loses connectivity to host 'lago-basic-suite-4-2-host-0'.* >>> >>> >>> >>> *- Every 3 seconds a getCapabalititiesAsync request is sent to the host (unsuccessfully).* >>> >>> * before each "getCapabilitiesAsync" the monitoring lock is taken (VdsManager,refreshImpl) >>> >>> * "getCapabilitiesAsync" immediately fails and throws 'VDSNetworkException: java.net.ConnectException: Connection refused'. The exception is caught by 'GetCapabilitiesAsyncVDSCommand.executeVdsBrokerCommand' which calls 'onFailure' of the callback and re-throws the exception. 
>>> >>> catch (Throwable t) { >>> getParameters().getCallback().onFailure(t); >>> throw t; >>> } >>> >>> * The 'onFailure' of the callback releases the "monitoringLock" ('postProcessRefresh()->afterRefreshTreatment()-> if (!succeeded) lockManager.releaseLock(monitoringLock);') >>> >>> * 'VdsManager,refreshImpl' catches the network exception, marks 'releaseLock = true' and *tries to release the already released lock*. >>> >>> The following warning is printed to the log - >>> >>> WARN [org.ovirt.engine.core.bll.lock.InMemoryLockManager] (EE-ManagedThreadFactory-engineScheduled-Thread-53) [] Trying to release exclusive lock which does not exist, lock key: 'ecf53d69-eb68-4b11-8df2-c4aa4e19bd93VDS_INIT' >>> >>> >>> >>> >>> *- 08:30:51 a successful getCapabilitiesAsync is sent.* >>> >>> >>> *- 08:32:55 - The failing test starts (Setup Networks for setting ipv6). * >>> >>> * SetupNetworks takes the monitoring lock. >>> >>> *- 08:33:00 - ResponseTracker cleans the getCapabilitiesAsync requests from 4 minutes ago from its queue and prints a VDSNetworkException: Vds timeout occured.* >>> >>> * When the first request is removed from the queue ('ResponseTracker.remove()'), the >>> *'Callback.onFailure' is invoked (for the second time) -> monitoring lock is released (the lock taken by the SetupNetworks!).* >>> >>> * *The other requests removed from the queue also try to release the monitoring lock*, but there is nothing to release. >>> >>> * The following warning log is printed - >>> WARN [org.ovirt.engine.core.bll.lock.InMemoryLockManager] (EE-ManagedThreadFactory-engineScheduled-Thread-14) [] Trying to release exclusive lock which does not exist, lock key: 'ecf53d69-eb68-4b11-8df2-c4aa4e19bd93VDS_INIT' >>> >>> - *08:33:00 - SetupNetwork fails on Timeout ~4 seconds after is started*. Why? I'm not 100% sure but I guess the late processing of the 'getCapabilitiesAsync' that causes losing of the monitoring lock and the late + mupltiple processing of failure is root cause. 
>>> >>> >>> Ravi, 'getCapabilitiesAsync' failure is treated twice and the lock is trying to be released three times. Please share your opinion regarding how it should be fixed. >>> >>> >>> Thanks, >>> >>> Alona. >>> >>> >>> >>> >>> >>> >>> On Sun, Apr 8, 2018 at 1:21 PM, Dan Kenigsberg >>> wrote: >>> >>>> On Sun, Apr 8, 2018 at 9:21 AM, Edward Haas wrote: >>>> >>>>> >>>>> >>>>> On Sun, Apr 8, 2018 at 9:15 AM, Eyal Edri wrote: >>>>> >>>>>> Was already done by Yaniv - https://gerrit.ovirt.org/#/c/89851. >>>>>> Is it still failing? >>>>>> >>>>>> On Sun, Apr 8, 2018 at 8:59 AM, Barak Korren >>>>>> wrote: >>>>>> >>>>>>> On 7 April 2018 at 00:30, Dan Kenigsberg wrote: >>>>>>> > No, I am afraid that we have not managed to understand why setting >>>>>>> and >>>>>>> > ipv6 address too the host off the grid. We shall continue >>>>>>> researching >>>>>>> > this next week. >>>>>>> > >>>>>>> > Edy, https://gerrit.ovirt.org/#/c/88637/ is already 4 weeks old, >>>>>>> but >>>>>>> > could it possibly be related (I really doubt that)? >>>>>>> > >>>>>>> >>>>>> >>>>> Sorry, but I do not see how this problem is related to VDSM. >>>>> There is nothing that indicates that there is a VDSM problem. >>>>> >>>>> Has the RPC connection between Engine and VDSM failed? >>>>> >>>>> >>>> Further up the thread, Piotr noticed that (at least on one failure of >>>> this test) that the Vdsm host lost connectivity to its storage, and Vdsm >>>> process was restarted. However, this does not seems to happen in all cases >>>> where this test fails. 
>>>> >>>> _______________________________________________ >>>> Devel mailing list >>>> Devel at ovirt.org >>>> http://lists.ovirt.org/mailman/listinfo/devel >>>> >>> >>> >>> _______________________________________________ >>> Devel mailing list >>> Devel at ovirt.org >>> http://lists.ovirt.org/mailman/listinfo/devel >>> >> >> >> >> -- >> *GAL bEN HAIM* >> RHV DEVOPS >> > > > _______________________________________________ > Devel mailing list > Devel at ovirt.org > http://lists.ovirt.org/mailman/listinfo/devel > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sleviim at redhat.com Wed Apr 11 10:55:48 2018 From: sleviim at redhat.com (Shani Leviim) Date: Wed, 11 Apr 2018 13:55:48 +0300 Subject: [ovirt-devel] make check on master fails due to UnicodeDecodeError In-Reply-To: References: Message-ID: A patch was uploaded: https://gerrit.ovirt.org/#/c/90093/ *Regards,* *Shani Leviim* On Wed, Apr 11, 2018 at 9:59 AM, Shani Leviim wrote: > HI, > > > > *Regards,* > > *Shani Leviim* > > On Tue, Apr 10, 2018 at 5:32 PM, Nir Soffer wrote: > >> On Tue, Apr 10, 2018 at 5:21 PM Shani Leviim wrote: >> >>> Hi, >>> >>> Yes, I did clean the root directory but it didn't solve the issue. >>> I'm currently running the tests on fedora27, using python version 2.1.14. >>> >>> Thanks to Dan's help, it seems that we found the root cause: >>> >>> I had 2 pickle files under /var/cache/vdsm/schema: vdsm-api.pickle and >>> vdsm-events.pickle. >>> Removing them and re-running the tests using make check was successfully >>> completed. >>> >> >> How did you have cached schema under /var/run? This directory is owned by >> root. >> Are you running the tests as root? >> > ?No, I'm running the tests over my laptop using my user.? > >> >> This sounds like a bug in the code using the pickled schema. The pickled >> should not >> be used if the timestamp of the pickle do not match the timestamp of the >> source. 
>> > > ?There's a suspect that there's a different encoding for python 2 and > python 3. > While I checnged "with open(pickle_path) as f:" to "with > open(pickle_path,'rb') as f: " (I was inspiered by [1]), > the make check seems to complete successfully. > ? > > ?[1] https://stackoverflow.com/questions/28218466/unpickling- > a-python-2-object-with-python-3? > > >> Also in make check, we should not use host schema cache, but local schema >> cache >> generated by running "make". >> >> Nir >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From nsoffer at redhat.com Wed Apr 11 12:27:52 2018 From: nsoffer at redhat.com (Nir Soffer) Date: Wed, 11 Apr 2018 12:27:52 +0000 Subject: [ovirt-devel] dynamic ownership changes In-Reply-To: References: <20180410130916.GB37403@Alexandra.local> Message-ID: On Wed, Apr 11, 2018 at 12:38 PM Eyal Edri wrote: > On Wed, Apr 11, 2018 at 12:34 PM, Nir Soffer wrote: > >> On Wed, Apr 11, 2018 at 12:31 PM Eyal Edri wrote: >> >>> Please make sure to run as much OST suites on this patch as possible >>> before merging ( using 'ci please build' ) >>> >> >> But note that OST is not a way to verify the patch. >> >> Such changes require testing with all storage types we support. >> > > Well, we already have HE suite that runs on ISCSI, so at least we have > NFS+ISCSI on nested, > for real storage testing, you'll have to do it manually > We need glusterfs (both native and fuse based), and cinder/ceph storage. But we cannot practically test all flows with all types of storage for every patch. Nir > > >> >> Nir >> >> On Tue, Apr 10, 2018 at 4:09 PM, Martin Polednik >>> wrote: >>> >>>> Hey, >>>> >>>> I've created a patch[0] that is finally able to activate libvirt's >>>> dynamic_ownership for VDSM while not negatively affecting >>>> functionality of our storage code. 
>>>> >>>> That of course comes with quite a bit of code removal, mostly in the >>>> area of host devices, hwrng and anything that touches devices; bunch >>>> of test changes and one XML generation caveat (storage is handled by >>>> VDSM, therefore disk relabelling needs to be disabled on the VDSM >>>> level). >>>> >>>> Because of the scope of the patch, I welcome storage/virt/network >>>> people to review the code and consider the implication this change has >>>> on current/future features. >>>> >>>> [0] https://gerrit.ovirt.org/#/c/89830/ >>>> >>>> mpolednik >>>> _______________________________________________ >>>> Devel mailing list >>>> Devel at ovirt.org >>>> http://lists.ovirt.org/mailman/listinfo/devel >>>> >>> >>> >>> >>> -- >>> >>> Eyal edri >>> >>> >>> MANAGER >>> >>> RHV DevOps >>> >>> EMEA VIRTUALIZATION R&D >>> >>> >>> Red Hat EMEA >>> TRIED. TESTED. TRUSTED. >>> >>> phone: +972-9-7692018 <+972%209-769-2018> >>> irc: eedri (on #tlv #rhev-dev #rhev-integ) >>> _______________________________________________ >>> Devel mailing list >>> Devel at ovirt.org >>> http://lists.ovirt.org/mailman/listinfo/devel >> >> > > > -- > > Eyal edri > > > MANAGER > > RHV DevOps > > EMEA VIRTUALIZATION R&D > > > Red Hat EMEA > TRIED. TESTED. TRUSTED. > phone: +972-9-7692018 <+972%209-769-2018> > irc: eedri (on #tlv #rhev-dev #rhev-integ) > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From mpolednik at redhat.com Wed Apr 11 12:30:02 2018 From: mpolednik at redhat.com (Martin Polednik) Date: Wed, 11 Apr 2018 14:30:02 +0200 Subject: [ovirt-devel] dynamic ownership changes In-Reply-To: References: <20180410130916.GB37403@Alexandra.local> Message-ID: <20180411123000.GA43553@Alexandra.local> On 11/04/18 12:27 +0000, Nir Soffer wrote: >On Wed, Apr 11, 2018 at 12:38 PM Eyal Edri wrote: > >> On Wed, Apr 11, 2018 at 12:34 PM, Nir Soffer wrote: >> >>> On Wed, Apr 11, 2018 at 12:31 PM Eyal Edri wrote: >>> >>>> Please make sure to run as much OST suites on this patch as possible >>>> before merging ( using 'ci please build' ) >>>> >>> >>> But note that OST is not a way to verify the patch. >>> >>> Such changes require testing with all storage types we support. >>> >> >> Well, we already have HE suite that runs on ISCSI, so at least we have >> NFS+ISCSI on nested, >> for real storage testing, you'll have to do it manually >> > >We need glusterfs (both native and fuse based), and cinder/ceph storage. > >But we cannot practically test all flows with all types of storage for >every patch. That leads to a question - how do I go around verifying such patch without sufficient environment? Is there someone from storage QA that could assist with this? >Nir > > >> >> >>> >>> Nir >>> >>> On Tue, Apr 10, 2018 at 4:09 PM, Martin Polednik >>>> wrote: >>>> >>>>> Hey, >>>>> >>>>> I've created a patch[0] that is finally able to activate libvirt's >>>>> dynamic_ownership for VDSM while not negatively affecting >>>>> functionality of our storage code. >>>>> >>>>> That of course comes with quite a bit of code removal, mostly in the >>>>> area of host devices, hwrng and anything that touches devices; bunch >>>>> of test changes and one XML generation caveat (storage is handled by >>>>> VDSM, therefore disk relabelling needs to be disabled on the VDSM >>>>> level). 
>>>>> >>>>> Because of the scope of the patch, I welcome storage/virt/network >>>>> people to review the code and consider the implication this change has >>>>> on current/future features. >>>>> >>>>> [0] https://gerrit.ovirt.org/#/c/89830/ >>>>> >>>>> mpolednik >>>>> _______________________________________________ >>>>> Devel mailing list >>>>> Devel at ovirt.org >>>>> http://lists.ovirt.org/mailman/listinfo/devel >>>>> >>>> >>>> >>>> >>>> -- >>>> >>>> Eyal edri >>>> >>>> >>>> MANAGER >>>> >>>> RHV DevOps >>>> >>>> EMEA VIRTUALIZATION R&D >>>> >>>> >>>> Red Hat EMEA >>>> TRIED. TESTED. TRUSTED. >>>> >>>> phone: +972-9-7692018 <+972%209-769-2018> >>>> irc: eedri (on #tlv #rhev-dev #rhev-integ) >>>> _______________________________________________ >>>> Devel mailing list >>>> Devel at ovirt.org >>>> http://lists.ovirt.org/mailman/listinfo/devel >>> >>> >> >> >> -- >> >> Eyal edri >> >> >> MANAGER >> >> RHV DevOps >> >> EMEA VIRTUALIZATION R&D >> >> >> Red Hat EMEA >> TRIED. TESTED. TRUSTED. >> phone: +972-9-7692018 <+972%209-769-2018> >> irc: eedri (on #tlv #rhev-dev #rhev-integ) >> From nsoffer at redhat.com Wed Apr 11 12:35:39 2018 From: nsoffer at redhat.com (Nir Soffer) Date: Wed, 11 Apr 2018 12:35:39 +0000 Subject: [ovirt-devel] dynamic ownership changes In-Reply-To: <20180411123000.GA43553@Alexandra.local> References: <20180410130916.GB37403@Alexandra.local> <20180411123000.GA43553@Alexandra.local> Message-ID: On Wed, Apr 11, 2018 at 3:30 PM Martin Polednik wrote: > On 11/04/18 12:27 +0000, Nir Soffer wrote: > >On Wed, Apr 11, 2018 at 12:38 PM Eyal Edri wrote: > > > >> On Wed, Apr 11, 2018 at 12:34 PM, Nir Soffer > wrote: > >> > >>> On Wed, Apr 11, 2018 at 12:31 PM Eyal Edri wrote: > >>> > >>>> Please make sure to run as much OST suites on this patch as possible > >>>> before merging ( using 'ci please build' ) > >>>> > >>> > >>> But note that OST is not a way to verify the patch. > >>> > >>> Such changes require testing with all storage types we support. 
> >>> > >> > >> Well, we already have HE suite that runs on ISCSI, so at least we have > >> NFS+ISCSI on nested, > >> for real storage testing, you'll have to do it manually > >> > > > >We need glusterfs (both native and fuse based), and cinder/ceph storage. > > > >But we cannot practically test all flows with all types of storage for > >every patch. > > That leads to a question - how do I go around verifying such patch > without sufficient environment? Is there someone from storage QA that > could assist with this? > Good question! I hope Denis can help with verifying the glusterfs changes. With cinder/ceph, maybe Elad can provide a setup for testing, or run some automation tests on the patch? Elad also have other automated tests for NFS/iSCSI that are worth running before we merge such changes. Nir > > >Nir > > > > > >> > >> > >>> > >>> Nir > >>> > >>> On Tue, Apr 10, 2018 at 4:09 PM, Martin Polednik > > >>>> wrote: > >>>> > >>>>> Hey, > >>>>> > >>>>> I've created a patch[0] that is finally able to activate libvirt's > >>>>> dynamic_ownership for VDSM while not negatively affecting > >>>>> functionality of our storage code. > >>>>> > >>>>> That of course comes with quite a bit of code removal, mostly in the > >>>>> area of host devices, hwrng and anything that touches devices; bunch > >>>>> of test changes and one XML generation caveat (storage is handled by > >>>>> VDSM, therefore disk relabelling needs to be disabled on the VDSM > >>>>> level). > >>>>> > >>>>> Because of the scope of the patch, I welcome storage/virt/network > >>>>> people to review the code and consider the implication this change > has > >>>>> on current/future features. 
> >>>>> > >>>>> [0] https://gerrit.ovirt.org/#/c/89830/ > >>>>> > >>>>> mpolednik > >>>>> _______________________________________________ > >>>>> Devel mailing list > >>>>> Devel at ovirt.org > >>>>> http://lists.ovirt.org/mailman/listinfo/devel > >>>>> > >>>> > >>>> > >>>> > >>>> -- > >>>> > >>>> Eyal edri > >>>> > >>>> > >>>> MANAGER > >>>> > >>>> RHV DevOps > >>>> > >>>> EMEA VIRTUALIZATION R&D > >>>> > >>>> > >>>> Red Hat EMEA > >>>> TRIED. TESTED. TRUSTED. > >>>> > >>>> phone: +972-9-7692018 <+972%209-769-2018> <+972%209-769-2018> > >>>> irc: eedri (on #tlv #rhev-dev #rhev-integ) > >>>> _______________________________________________ > >>>> Devel mailing list > >>>> Devel at ovirt.org > >>>> http://lists.ovirt.org/mailman/listinfo/devel > >>> > >>> > >> > >> > >> -- > >> > >> Eyal edri > >> > >> > >> MANAGER > >> > >> RHV DevOps > >> > >> EMEA VIRTUALIZATION R&D > >> > >> > >> Red Hat EMEA > >> TRIED. TESTED. TRUSTED. < > https://redhat.com/trusted> > >> phone: +972-9-7692018 <+972%209-769-2018> <+972%209-769-2018> > >> irc: eedri (on #tlv #rhev-dev #rhev-integ) > >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mzamazal at redhat.com Wed Apr 11 12:52:59 2018 From: mzamazal at redhat.com (Milan Zamazal) Date: Wed, 11 Apr 2018 14:52:59 +0200 Subject: [ovirt-devel] [ OST Failure Report ] [ oVirt 4.2 ] [ 2018-04-04 ] [006_migrations.prepare_migration_attachments_ipv6] In-Reply-To: (Arik Hadas's message of "Wed, 11 Apr 2018 13:01:36 +0300") References: Message-ID: <87r2nl52hg.fsf@redhat.com> Arik Hadas writes: > On Wed, Apr 11, 2018 at 12:45 PM, Alona Kaplan wrote: > >> >> >> On Tue, Apr 10, 2018 at 6:52 PM, Gal Ben Haim wrote: >> >>> I'm seeing the same error in [1], during 006_migrations.migrate_vm. >>> >>> [1] http://jenkins.ovirt.org/job/ovirt-4.2_change-queue-tester/1650/ >>> >> >> Seems like another bug. The migration failed since for some reason the vm >> is already defined on the destination host. 
>> >> 2018-04-10 11:08:08,685-0400 ERROR (jsonrpc/0) [api] FINISH create >> error=Virtual machine already exists (api:129) >> Traceback (most recent call last): >> File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 122, in >> method >> ret = func(*args, **kwargs) >> File "/usr/lib/python2.7/site-packages/vdsm/API.py", line 191, in create >> raise exception.VMExists() >> VMExists: Virtual machine already exists >> >> > Milan, Francesco, could it be that because of [1] that appears on the > destination host right after shutting down the VM, it remained defined on > that host? I can't see any destroy call in the logs after the successful preceding migration from the given host. That would explain ?VMExists? error. > [1] 2018-04-10 11:01:40,005-0400 ERROR (libvirt/events) [vds] Error running > VM callback (clientIF:683) > > Traceback (most recent call last): > > File "/usr/lib/python2.7/site-packages/vdsm/clientIF.py", line 646, in > dispatchLibvirtEvents > > v.onLibvirtLifecycleEvent(event, detail, None) > > AttributeError: 'NoneType' object has no attribute 'onLibvirtLifecycleEvent' That means that a life cycle event on an unknown VM has arrived, in this case apparently destroy event, following the destroy call after the failed incoming migration. The reported AttributeError is a minor bug, already fixed in master. So it's most likely unrelated to the discussed problem. >>> On Tue, Apr 10, 2018 at 4:14 PM, Alona Kaplan >>> wrote: >>> >>>> Hi all, >>>> >>>> Looking at the log it seems that the new GetCapabilitiesAsync is >>>> responsible for the mess. 
>>>> >>>> - >>>> * 08:29:47 - engine loses connectivity to host 'lago-basic-suite-4-2-host-0'.* >>>> >>>> >>>> >>>> *- Every 3 seconds a getCapabalititiesAsync request is sent to the host (unsuccessfully).* >>>> >>>> * before each "getCapabilitiesAsync" the monitoring lock is taken (VdsManager,refreshImpl) >>>> >>>> * "getCapabilitiesAsync" immediately fails and throws 'VDSNetworkException: java.net.ConnectException: Connection refused'. The exception is caught by 'GetCapabilitiesAsyncVDSCommand.executeVdsBrokerCommand' which calls 'onFailure' of the callback and re-throws the exception. >>>> >>>> catch (Throwable t) { >>>> getParameters().getCallback().onFailure(t); >>>> throw t; >>>> } >>>> >>>> * The 'onFailure' of the callback releases the "monitoringLock" ('postProcessRefresh()->afterRefreshTreatment()-> if (!succeeded) lockManager.releaseLock(monitoringLock);') >>>> >>>> * 'VdsManager,refreshImpl' catches the network exception, marks 'releaseLock = true' and *tries to release the already released lock*. >>>> >>>> The following warning is printed to the log - >>>> >>>> WARN [org.ovirt.engine.core.bll.lock.InMemoryLockManager] (EE-ManagedThreadFactory-engineScheduled-Thread-53) [] Trying to release exclusive lock which does not exist, lock key: 'ecf53d69-eb68-4b11-8df2-c4aa4e19bd93VDS_INIT' >>>> >>>> >>>> >>>> >>>> *- 08:30:51 a successful getCapabilitiesAsync is sent.* >>>> >>>> >>>> *- 08:32:55 - The failing test starts (Setup Networks for setting ipv6). * >>>> >>>> * SetupNetworks takes the monitoring lock. 
>>>> >>>> *- 08:33:00 - ResponseTracker cleans the getCapabilitiesAsync requests from 4 minutes ago from its queue and prints a VDSNetworkException: Vds timeout occured.* >>>> >>>> * When the first request is removed from the queue ('ResponseTracker.remove()'), the >>>> *'Callback.onFailure' is invoked (for the second time) -> monitoring lock is released (the lock taken by the SetupNetworks!).* >>>> >>>> * *The other requests removed from the queue also try to release the monitoring lock*, but there is nothing to release. >>>> >>>> * The following warning log is printed - >>>> WARN [org.ovirt.engine.core.bll.lock.InMemoryLockManager] (EE-ManagedThreadFactory-engineScheduled-Thread-14) [] Trying to release exclusive lock which does not exist, lock key: 'ecf53d69-eb68-4b11-8df2-c4aa4e19bd93VDS_INIT' >>>> >>>> - *08:33:00 - SetupNetwork fails on Timeout ~4 seconds after is started*. Why? I'm not 100% sure but I guess the late processing of the 'getCapabilitiesAsync' that causes losing of the monitoring lock and the late + mupltiple processing of failure is root cause. >>>> >>>> >>>> Ravi, 'getCapabilitiesAsync' failure is treated twice and the lock is trying to be released three times. Please share your opinion regarding how it should be fixed. >>>> >>>> >>>> Thanks, >>>> >>>> Alona. >>>> >>>> >>>> >>>> >>>> >>>> >>>> On Sun, Apr 8, 2018 at 1:21 PM, Dan Kenigsberg >>>> wrote: >>>> >>>>> On Sun, Apr 8, 2018 at 9:21 AM, Edward Haas wrote: >>>>> >>>>>> >>>>>> >>>>>> On Sun, Apr 8, 2018 at 9:15 AM, Eyal Edri wrote: >>>>>> >>>>>>> Was already done by Yaniv - https://gerrit.ovirt.org/#/c/89851. >>>>>>> Is it still failing? >>>>>>> >>>>>>> On Sun, Apr 8, 2018 at 8:59 AM, Barak Korren >>>>>>> wrote: >>>>>>> >>>>>>>> On 7 April 2018 at 00:30, Dan Kenigsberg wrote: >>>>>>>> > No, I am afraid that we have not managed to understand why setting >>>>>>>> and >>>>>>>> > ipv6 address too the host off the grid. We shall continue >>>>>>>> researching >>>>>>>> > this next week. 
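Alona's analysis boils down to a lock whose release path runs more than once: the onFailure callback fires again when stale requests are purged, freeing a monitoring lock that by then belongs to SetupNetworks. One common fix is an owner-checked, idempotent release; a sketch in Python (the engine code in question is Java, so the shape is illustrative, not InMemoryLockManager's actual API):

```python
import threading


class MonitoringLock:
    """Owner-checked lock: releasing it from a late or duplicate
    callback becomes a no-op instead of stealing a lock that was
    since acquired by another flow (e.g. SetupNetworks)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._holder = None

    def acquire(self, owner):
        self._lock.acquire()
        self._holder = owner

    def release(self, owner):
        # Only the current holder may release; anyone else gets the
        # equivalent of the "Trying to release exclusive lock which
        # does not exist" warning and nothing happens.
        if self._holder != owner:
            return False
        self._holder = None
        self._lock.release()
        return True
```

With this shape, the late ResponseTracker callbacks would log and return instead of freeing the lock SetupNetworks is holding.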
>>>>>>>> > >>>>>>>> > Edy, https://gerrit.ovirt.org/#/c/88637/ is already 4 weeks old, >>>>>>>> but >>>>>>>> > could it possibly be related (I really doubt that)? >>>>>>>> > >>>>>>>> >>>>>>> >>>>>> Sorry, but I do not see how this problem is related to VDSM. >>>>>> There is nothing that indicates that there is a VDSM problem. >>>>>> >>>>>> Has the RPC connection between Engine and VDSM failed? >>>>>> >>>>>> >>>>> Further up the thread, Piotr noticed that (at least on one failure of >>>>> this test) that the Vdsm host lost connectivity to its storage, and Vdsm >>>>> process was restarted. However, this does not seems to happen in all cases >>>>> where this test fails. >>>>> >>>>> _______________________________________________ >>>>> Devel mailing list >>>>> Devel at ovirt.org >>>>> http://lists.ovirt.org/mailman/listinfo/devel >>>>> >>>> >>>> >>>> _______________________________________________ >>>> Devel mailing list >>>> Devel at ovirt.org >>>> http://lists.ovirt.org/mailman/listinfo/devel >>>> >>> >>> >>> >>> -- >>> *GAL bEN HAIM* >>> RHV DEVOPS >>> >> >> >> _______________________________________________ >> Devel mailing list >> Devel at ovirt.org >> http://lists.ovirt.org/mailman/listinfo/devel >> From ykaul at redhat.com Wed Apr 11 13:12:12 2018 From: ykaul at redhat.com (Yaniv Kaul) Date: Wed, 11 Apr 2018 16:12:12 +0300 Subject: [ovirt-devel] dynamic ownership changes In-Reply-To: References: <20180410130916.GB37403@Alexandra.local> Message-ID: On Wed, Apr 11, 2018 at 3:27 PM, Nir Soffer wrote: > On Wed, Apr 11, 2018 at 12:38 PM Eyal Edri wrote: > >> On Wed, Apr 11, 2018 at 12:34 PM, Nir Soffer wrote: >> >>> On Wed, Apr 11, 2018 at 12:31 PM Eyal Edri wrote: >>> >>>> Please make sure to run as much OST suites on this patch as possible >>>> before merging ( using 'ci please build' ) >>>> >>> >>> But note that OST is not a way to verify the patch. >>> >>> Such changes require testing with all storage types we support. 
>>> >> >> Well, we already have HE suite that runs on ISCSI, so at least we have >> NFS+ISCSI on nested, >> for real storage testing, you'll have to do it manually >> > > We need glusterfs (both native and fuse based), and cinder/ceph storage. > We have Gluster in o-s-t as well, as part of the HC suite. It doesn't use Fuse though. > > But we cannot practically test all flows with all types of storage for > every patch. > Indeed. But we could add easily do some, and we should at least execute the minimal set that we are able to easily via o-s-t. Y. > > Nir > > >> >> >>> >>> Nir >>> >>> On Tue, Apr 10, 2018 at 4:09 PM, Martin Polednik >>>> wrote: >>>> >>>>> Hey, >>>>> >>>>> I've created a patch[0] that is finally able to activate libvirt's >>>>> dynamic_ownership for VDSM while not negatively affecting >>>>> functionality of our storage code. >>>>> >>>>> That of course comes with quite a bit of code removal, mostly in the >>>>> area of host devices, hwrng and anything that touches devices; bunch >>>>> of test changes and one XML generation caveat (storage is handled by >>>>> VDSM, therefore disk relabelling needs to be disabled on the VDSM >>>>> level). >>>>> >>>>> Because of the scope of the patch, I welcome storage/virt/network >>>>> people to review the code and consider the implication this change has >>>>> on current/future features. >>>>> >>>>> [0] https://gerrit.ovirt.org/#/c/89830/ >>>>> >>>>> mpolednik >>>>> _______________________________________________ >>>>> Devel mailing list >>>>> Devel at ovirt.org >>>>> http://lists.ovirt.org/mailman/listinfo/devel >>>>> >>>> >>>> >>>> >>>> -- >>>> >>>> Eyal edri >>>> >>>> >>>> MANAGER >>>> >>>> RHV DevOps >>>> >>>> EMEA VIRTUALIZATION R&D >>>> >>>> >>>> Red Hat EMEA >>>> TRIED. TESTED. TRUSTED. 
>>>> >>>> phone: +972-9-7692018 <+972%209-769-2018> >>>> irc: eedri (on #tlv #rhev-dev #rhev-integ) >>>> _______________________________________________ >>>> Devel mailing list >>>> Devel at ovirt.org >>>> http://lists.ovirt.org/mailman/listinfo/devel >>> >>> >> >> >> -- >> >> Eyal edri >> >> >> MANAGER >> >> RHV DevOps >> >> EMEA VIRTUALIZATION R&D >> >> >> Red Hat EMEA >> TRIED. TESTED. TRUSTED. >> phone: +972-9-7692018 <+972%209-769-2018> >> irc: eedri (on #tlv #rhev-dev #rhev-integ) >> > > _______________________________________________ > Devel mailing list > Devel at ovirt.org > http://lists.ovirt.org/mailman/listinfo/devel > -------------- next part -------------- An HTML attachment was scrubbed... URL: From danken at redhat.com Wed Apr 11 13:28:53 2018 From: danken at redhat.com (Dan Kenigsberg) Date: Wed, 11 Apr 2018 16:28:53 +0300 Subject: [ovirt-devel] dynamic ownership changes In-Reply-To: References: <20180410130916.GB37403@Alexandra.local> Message-ID: On Wed, Apr 11, 2018 at 12:34 PM, Nir Soffer wrote: > On Wed, Apr 11, 2018 at 12:31 PM Eyal Edri wrote: > >> Please make sure to run as much OST suites on this patch as possible >> before merging ( using 'ci please build' ) >> > > But note that OST is not a way to verify the patch. > > Such changes require testing with all storage types we support. > > Nir > > On Tue, Apr 10, 2018 at 4:09 PM, Martin Polednik >> wrote: >> >>> Hey, >>> >>> I've created a patch[0] that is finally able to activate libvirt's >>> dynamic_ownership for VDSM while not negatively affecting >>> functionality of our storage code. >>> >>> That of course comes with quite a bit of code removal, mostly in the >>> area of host devices, hwrng and anything that touches devices; bunch >>> of test changes and one XML generation caveat (storage is handled by >>> VDSM, therefore disk relabelling needs to be disabled on the VDSM >>> level). 
>>> >>> Because of the scope of the patch, I welcome storage/virt/network >>> people to review the code and consider the implication this change has >>> on current/future features. >>> >>> [0] https://gerrit.ovirt.org/#/c/89830/ >>> >> In particular: dynamic_ownership was set to 0 prehistorically (as part of https://bugzilla.redhat.com/show_bug.cgi?id=554961 ) because libvirt, running as root, was not able to play properly with root-squash nfs mounts. Have you attempted this use case? I join to Nir's request to run this with storage QE. -------------- next part -------------- An HTML attachment was scrubbed... URL: From mpolednik at redhat.com Wed Apr 11 13:43:05 2018 From: mpolednik at redhat.com (Martin Polednik) Date: Wed, 11 Apr 2018 15:43:05 +0200 Subject: [ovirt-devel] dynamic ownership changes In-Reply-To: References: <20180410130916.GB37403@Alexandra.local> Message-ID: <20180411134243.GB43553@Alexandra.local> On 11/04/18 16:28 +0300, Dan Kenigsberg wrote: >On Wed, Apr 11, 2018 at 12:34 PM, Nir Soffer wrote: > >> On Wed, Apr 11, 2018 at 12:31 PM Eyal Edri wrote: >> >>> Please make sure to run as much OST suites on this patch as possible >>> before merging ( using 'ci please build' ) >>> >> >> But note that OST is not a way to verify the patch. >> >> Such changes require testing with all storage types we support. >> >> Nir >> >> On Tue, Apr 10, 2018 at 4:09 PM, Martin Polednik >>> wrote: >>> >>>> Hey, >>>> >>>> I've created a patch[0] that is finally able to activate libvirt's >>>> dynamic_ownership for VDSM while not negatively affecting >>>> functionality of our storage code. >>>> >>>> That of course comes with quite a bit of code removal, mostly in the >>>> area of host devices, hwrng and anything that touches devices; bunch >>>> of test changes and one XML generation caveat (storage is handled by >>>> VDSM, therefore disk relabelling needs to be disabled on the VDSM >>>> level). 
>>>> >>>> Because of the scope of the patch, I welcome storage/virt/network >>>> people to review the code and consider the implication this change has >>>> on current/future features. >>>> >>>> [0] https://gerrit.ovirt.org/#/c/89830/ >>>> >>> >In particular: dynamic_ownership was set to 0 prehistorically (as part of >https://bugzilla.redhat.com/show_bug.cgi?id=554961 ) because libvirt, >running as root, was not able to play properly with root-squash nfs mounts. > >Have you attempted this use case? I have not. Added this to my to-do list. The important part to note about this patch (compared to my previous attempts in the past) is that it explicitly disables dynamic_ownership for FILE/BLOCK-backed disks. That means, unless `seclabel` is broken on libivrt side, the behavior would be unchanged for storage. >I join to Nir's request to run this with storage QE. From ebenahar at redhat.com Wed Apr 11 13:52:20 2018 From: ebenahar at redhat.com (Elad Ben Aharon) Date: Wed, 11 Apr 2018 16:52:20 +0300 Subject: [ovirt-devel] dynamic ownership changes In-Reply-To: References: <20180410130916.GB37403@Alexandra.local> Message-ID: We can test this on iSCSI, NFS and GlusterFS. As for ceph and cinder, will have to check, since usually, we don't execute our automation on them. On Wed, Apr 11, 2018 at 4:38 PM, Raz Tamir wrote: > +Elad > > On Wed, Apr 11, 2018 at 4:28 PM, Dan Kenigsberg wrote: > >> On Wed, Apr 11, 2018 at 12:34 PM, Nir Soffer wrote: >> >>> On Wed, Apr 11, 2018 at 12:31 PM Eyal Edri wrote: >>> >>>> Please make sure to run as much OST suites on this patch as possible >>>> before merging ( using 'ci please build' ) >>>> >>> >>> But note that OST is not a way to verify the patch. >>> >>> Such changes require testing with all storage types we support. 
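Martin's point about relabelling refers to libvirt's per-device `seclabel` override: with dynamic_ownership enabled, libvirt adjusts ownership of device paths unless the device opts out, and VDSM manages storage ownership itself. A sketch of the disk XML involved, built with ElementTree; the exact element placement VDSM uses is an assumption based on libvirt's domain XML format:

```python
import xml.etree.ElementTree as ET


def disk_with_relabel_disabled(path):
    # With libvirt's dynamic_ownership=1, libvirt chowns/relabels
    # device paths unless the device opts out; VDSM handles storage
    # ownership itself, so relabelling is disabled per disk.
    disk = ET.Element('disk', type='file', device='disk')
    source = ET.SubElement(disk, 'source', file=path)
    # Per-device override from libvirt's domain XML format; placing
    # <seclabel> inside <source> is the documented per-device form.
    ET.SubElement(source, 'seclabel', model='dac', relabel='no')
    return ET.tostring(disk, encoding='unicode')
```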
>>> >>> Nir >>> >>> On Tue, Apr 10, 2018 at 4:09 PM, Martin Polednik >>>> wrote: >>>> >>>>> Hey, >>>>> >>>>> I've created a patch[0] that is finally able to activate libvirt's >>>>> dynamic_ownership for VDSM while not negatively affecting >>>>> functionality of our storage code. >>>>> >>>>> That of course comes with quite a bit of code removal, mostly in the >>>>> area of host devices, hwrng and anything that touches devices; bunch >>>>> of test changes and one XML generation caveat (storage is handled by >>>>> VDSM, therefore disk relabelling needs to be disabled on the VDSM >>>>> level). >>>>> >>>>> Because of the scope of the patch, I welcome storage/virt/network >>>>> people to review the code and consider the implication this change has >>>>> on current/future features. >>>>> >>>>> [0] https://gerrit.ovirt.org/#/c/89830/ >>>>> >>>> >> In particular: dynamic_ownership was set to 0 prehistorically (as part >> of https://bugzilla.redhat.com/show_bug.cgi?id=554961 ) because libvirt, >> running as root, was not able to play properly with root-squash nfs mounts. >> >> Have you attempted this use case? >> >> I join to Nir's request to run this with storage QE. >> > > > > -- > > > Raz Tamir > Manager, RHV QE > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mzamazal at redhat.com Thu Apr 12 13:17:19 2018 From: mzamazal at redhat.com (Milan Zamazal) Date: Thu, 12 Apr 2018 15:17:19 +0200 Subject: [ovirt-devel] OST: Enabling DEBUG logging level in Vdsm? Message-ID: <87efjk7ee8.fsf@redhat.com> Hi, it's quite inconvenient that DEBUG messages are missing in OST Vdsm logs. Would it be possible to enable them some way? 
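Vdsm's log levels come from a fileConfig-style configuration, so Milan's request likely amounts to flipping the relevant level= entries to DEBUG and restarting vdsmd; the file path (/etc/vdsm/logger.conf) and section layout below are assumptions, reduced to a runnable miniature:

```python
import logging
import logging.config
import os
import tempfile

# Miniature fileConfig-style configuration mirroring the shape of a
# vdsm logger.conf; on a real OST host one would edit the assumed
# /etc/vdsm/logger.conf and restart vdsmd instead.
LOGGER_CONF = """\
[loggers]
keys=root

[handlers]
keys=console

[formatters]
keys=simple

[logger_root]
level=DEBUG
handlers=console

[handler_console]
class=StreamHandler
level=DEBUG
formatter=simple
args=(sys.stderr,)

[formatter_simple]
format=%(asctime)s %(levelname)s %(message)s
"""

with tempfile.NamedTemporaryFile('w', suffix='.conf', delete=False) as f:
    f.write(LOGGER_CONF)
    conf_path = f.name

# disable_existing_loggers=False keeps already-created loggers alive,
# which matters for a long-running daemon like vdsmd.
logging.config.fileConfig(conf_path, disable_existing_loggers=False)
os.unlink(conf_path)
```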
Thanks, Milan From dron at redhat.com Thu Apr 12 15:15:23 2018 From: dron at redhat.com (Dafna Ron) Date: Thu, 12 Apr 2018 16:15:23 +0100 Subject: [ovirt-devel] [ OST Failure Report ] [ oVirt 4.2 ] [ 2018-04-04 ] [006_migrations.prepare_migration_attachments_ipv6] In-Reply-To: <87r2nl52hg.fsf@redhat.com> References: <87r2nl52hg.fsf@redhat.com> Message-ID: hi, we are failing randomly on test 006_migrations.migrate_vm with what seems to be the same issue. the vm seems to be migrated successfully but engine thinks that it failed and re-calls migration getting a response of vm already exists. I don't think this is an issue with the test but rather a regression so I opened a bug: https://bugzilla.redhat.com/show_bug.cgi?id=1566594 Thanks, Dafna On Wed, Apr 11, 2018 at 1:52 PM, Milan Zamazal wrote: > Arik Hadas writes: > > > On Wed, Apr 11, 2018 at 12:45 PM, Alona Kaplan > wrote: > > > >> > >> > >> On Tue, Apr 10, 2018 at 6:52 PM, Gal Ben Haim > wrote: > >> > >>> I'm seeing the same error in [1], during 006_migrations.migrate_vm. > >>> > >>> [1] http://jenkins.ovirt.org/job/ovirt-4.2_change-queue-tester/1650/ > >>> > >> > >> Seems like another bug. The migration failed since for some reason the > vm > >> is already defined on the destination host. > >> > >> 2018-04-10 11:08:08,685-0400 ERROR (jsonrpc/0) [api] FINISH create > >> error=Virtual machine already exists (api:129) > >> Traceback (most recent call last): > >> File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 122, > in > >> method > >> ret = func(*args, **kwargs) > >> File "/usr/lib/python2.7/site-packages/vdsm/API.py", line 191, in > create > >> raise exception.VMExists() > >> VMExists: Virtual machine already exists > >> > >> > > Milan, Francesco, could it be that because of [1] that appears on the > > destination host right after shutting down the VM, it remained defined on > > that host? > > I can't see any destroy call in the logs after the successful preceding > migration from the given host. 
That would explain ?VMExists? error. > > > [1] 2018-04-10 11:01:40,005-0400 ERROR (libvirt/events) [vds] Error > running > > VM callback (clientIF:683) > > > > Traceback (most recent call last): > > > > File "/usr/lib/python2.7/site-packages/vdsm/clientIF.py", line 646, in > > dispatchLibvirtEvents > > > > v.onLibvirtLifecycleEvent(event, detail, None) > > > > AttributeError: 'NoneType' object has no attribute > 'onLibvirtLifecycleEvent' > > That means that a life cycle event on an unknown VM has arrived, in this > case apparently destroy event, following the destroy call after the > failed incoming migration. The reported AttributeError is a minor bug, > already fixed in master. So it's most likely unrelated to the discussed > problem. > > >>> On Tue, Apr 10, 2018 at 4:14 PM, Alona Kaplan > >>> wrote: > >>> > >>>> Hi all, > >>>> > >>>> Looking at the log it seems that the new GetCapabilitiesAsync is > >>>> responsible for the mess. > >>>> > >>>> - > >>>> * 08:29:47 - engine loses connectivity to host > 'lago-basic-suite-4-2-host-0'.* > >>>> > >>>> > >>>> > >>>> *- Every 3 seconds a getCapabalititiesAsync request is sent to the > host (unsuccessfully).* > >>>> > >>>> * before each "getCapabilitiesAsync" the monitoring lock is > taken (VdsManager,refreshImpl) > >>>> > >>>> * "getCapabilitiesAsync" immediately fails and throws > 'VDSNetworkException: java.net.ConnectException: Connection refused'. The > exception is caught by 'GetCapabilitiesAsyncVDSCommand.executeVdsBrokerCommand' > which calls 'onFailure' of the callback and re-throws the exception. 
> >>>> > >>>> catch (Throwable t) { > >>>> getParameters().getCallback().onFailure(t); > >>>> throw t; > >>>> } > >>>> > >>>> * The 'onFailure' of the callback releases the "monitoringLock" > ('postProcessRefresh()->afterRefreshTreatment()-> if (!succeeded) > lockManager.releaseLock(monitoringLock);') > >>>> > >>>> * 'VdsManager,refreshImpl' catches the network exception, marks > 'releaseLock = true' and *tries to release the already released lock*. > >>>> > >>>> The following warning is printed to the log - > >>>> > >>>> WARN [org.ovirt.engine.core.bll.lock.InMemoryLockManager] > (EE-ManagedThreadFactory-engineScheduled-Thread-53) [] Trying to release > exclusive lock which does not exist, lock key: 'ecf53d69-eb68-4b11-8df2- > c4aa4e19bd93VDS_INIT' > >>>> > >>>> > >>>> > >>>> > >>>> *- 08:30:51 a successful getCapabilitiesAsync is sent.* > >>>> > >>>> > >>>> *- 08:32:55 - The failing test starts (Setup Networks for setting > ipv6). * > >>>> > >>>> * SetupNetworks takes the monitoring lock. > >>>> > >>>> *- 08:33:00 - ResponseTracker cleans the getCapabilitiesAsync > requests from 4 minutes ago from its queue and prints a > VDSNetworkException: Vds timeout occured.* > >>>> > >>>> * When the first request is removed from the queue > ('ResponseTracker.remove()'), the > >>>> *'Callback.onFailure' is invoked (for the second time) -> monitoring > lock is released (the lock taken by the SetupNetworks!).* > >>>> > >>>> * *The other requests removed from the queue also try to > release the monitoring lock*, but there is nothing to release. > >>>> > >>>> * The following warning log is printed - > >>>> WARN [org.ovirt.engine.core.bll.lock.InMemoryLockManager] > (EE-ManagedThreadFactory-engineScheduled-Thread-14) [] Trying to release > exclusive lock which does not exist, lock key: 'ecf53d69-eb68-4b11-8df2- > c4aa4e19bd93VDS_INIT' > >>>> > >>>> - *08:33:00 - SetupNetwork fails on Timeout ~4 seconds after is > started*. Why? 
I'm not 100% sure but I guess the late processing of the > 'getCapabilitiesAsync' that causes losing of the monitoring lock and the > late + mupltiple processing of failure is root cause. > >>>> > >>>> > >>>> Ravi, 'getCapabilitiesAsync' failure is treated twice and the lock is > trying to be released three times. Please share your opinion regarding how > it should be fixed. > >>>> > >>>> > >>>> Thanks, > >>>> > >>>> Alona. > >>>> > >>>> > >>>> > >>>> > >>>> > >>>> > >>>> On Sun, Apr 8, 2018 at 1:21 PM, Dan Kenigsberg > >>>> wrote: > >>>> > >>>>> On Sun, Apr 8, 2018 at 9:21 AM, Edward Haas > wrote: > >>>>> > >>>>>> > >>>>>> > >>>>>> On Sun, Apr 8, 2018 at 9:15 AM, Eyal Edri wrote: > >>>>>> > >>>>>>> Was already done by Yaniv - https://gerrit.ovirt.org/#/c/89851. > >>>>>>> Is it still failing? > >>>>>>> > >>>>>>> On Sun, Apr 8, 2018 at 8:59 AM, Barak Korren > >>>>>>> wrote: > >>>>>>> > >>>>>>>> On 7 April 2018 at 00:30, Dan Kenigsberg > wrote: > >>>>>>>> > No, I am afraid that we have not managed to understand why > setting > >>>>>>>> and > >>>>>>>> > ipv6 address too the host off the grid. We shall continue > >>>>>>>> researching > >>>>>>>> > this next week. > >>>>>>>> > > >>>>>>>> > Edy, https://gerrit.ovirt.org/#/c/88637/ is already 4 weeks > old, > >>>>>>>> but > >>>>>>>> > could it possibly be related (I really doubt that)? > >>>>>>>> > > >>>>>>>> > >>>>>>> > >>>>>> Sorry, but I do not see how this problem is related to VDSM. > >>>>>> There is nothing that indicates that there is a VDSM problem. > >>>>>> > >>>>>> Has the RPC connection between Engine and VDSM failed? > >>>>>> > >>>>>> > >>>>> Further up the thread, Piotr noticed that (at least on one failure of > >>>>> this test) that the Vdsm host lost connectivity to its storage, and > Vdsm > >>>>> process was restarted. However, this does not seems to happen in all > cases > >>>>> where this test fails. 
> >>>>> > >>>>> _______________________________________________ > >>>>> Devel mailing list > >>>>> Devel at ovirt.org > >>>>> http://lists.ovirt.org/mailman/listinfo/devel > >>>>> > >>>> > >>>> > >>>> _______________________________________________ > >>>> Devel mailing list > >>>> Devel at ovirt.org > >>>> http://lists.ovirt.org/mailman/listinfo/devel > >>>> > >>> > >>> > >>> > >>> -- > >>> *GAL bEN HAIM* > >>> RHV DEVOPS > >>> > >> > >> > >> _______________________________________________ > >> Devel mailing list > >> Devel at ovirt.org > >> http://lists.ovirt.org/mailman/listinfo/devel > >> > _______________________________________________ > Devel mailing list > Devel at ovirt.org > http://lists.ovirt.org/mailman/listinfo/devel > -------------- next part -------------- An HTML attachment was scrubbed... URL: From michal.skrivanek at redhat.com Thu Apr 12 16:30:12 2018 From: michal.skrivanek at redhat.com (Michal Skrivanek) Date: Thu, 12 Apr 2018 18:30:12 +0200 Subject: [ovirt-devel] [ OST Failure Report ] [ oVirt 4.2 ] [ 2018-04-04 ] [006_migrations.prepare_migration_attachments_ipv6] In-Reply-To: References: <87r2nl52hg.fsf@redhat.com> Message-ID: <16AA5304-6E97-44A5-BDA7-E0B68DAC9632@redhat.com> > On 12 Apr 2018, at 17:15, Dafna Ron wrote: > > hi, > > we are failing randomly on test 006_migrations.migrate_vm with what seems to be the same issue. > > the vm seems to be migrated successfully but engine thinks that it failed and re-calls migration getting a response of vm already exists. > > I don't think this is an issue with the test but rather a regression so I opened a bug: I do not think so, I've heard someone removed a test in between migrating A->B and B->A? If that's the case that is the real issue.
You can't migrate back to A without waiting for A to be cleared out properly. https://gerrit.ovirt.org/#/c/90166/ should fix it. > > https://bugzilla.redhat.com/show_bug.cgi?id=1566594 > > Thanks, > Dafna > > > On Wed, Apr 11, 2018 at 1:52 PM, Milan Zamazal > wrote: > Arik Hadas > writes: > > > On Wed, Apr 11, 2018 at 12:45 PM, Alona Kaplan > wrote: > > > >> > >> > >> On Tue, Apr 10, 2018 at 6:52 PM, Gal Ben Haim > wrote: > >> > >>> I'm seeing the same error in [1], during 006_migrations.migrate_vm. > >>> > >>> [1] http://jenkins.ovirt.org/job/ovirt-4.2_change-queue-tester/1650/ > >>> > >> > >> Seems like another bug. The migration failed since for some reason the vm > >> is already defined on the destination host. > >> > >> 2018-04-10 11:08:08,685-0400 ERROR (jsonrpc/0) [api] FINISH create > >> error=Virtual machine already exists (api:129) > >> Traceback (most recent call last): > >> File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 122, in > >> method > >> ret = func(*args, **kwargs) > >> File "/usr/lib/python2.7/site-packages/vdsm/API.py", line 191, in create > >> raise exception.VMExists() > >> VMExists: Virtual machine already exists > >> > >> > > Milan, Francesco, could it be that because of [1] that appears on the > > destination host right after shutting down the VM, it remained defined on > > that host? > > I can't see any destroy call in the logs after the successful preceding > migration from the given host. That would explain 'VMExists' error.
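Michal's rule, that a B->A migration is only safe once host A has dropped its old VM object, can be expressed in the test as a polling wait. A sketch with a hypothetical host helper (list_vm_ids is not a real OST or ovirtsdk4 API; whether the gerrit change above does exactly this is not shown here):

```python
import time


def wait_until_vm_cleared(host, vm_id, timeout=120, interval=3):
    """Poll the source host until the VM object left over from the
    previous migration is gone; only then is migrating back safe.

    host.list_vm_ids() is a stand-in for "ask Vdsm which VMs it
    still has defined" and is a hypothetical helper.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if vm_id not in host.list_vm_ids():
            return True
        time.sleep(interval)
    return False
```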
> > > [1] 2018-04-10 11:01:40,005-0400 ERROR (libvirt/events) [vds] Error running > > VM callback (clientIF:683) > > > > Traceback (most recent call last): > > > > File "/usr/lib/python2.7/site-packages/vdsm/clientIF.py", line 646, in > > dispatchLibvirtEvents > > > > v.onLibvirtLifecycleEvent(event, detail, None) > > > > AttributeError: 'NoneType' object has no attribute 'onLibvirtLifecycleEvent' > > That means that a life cycle event on an unknown VM has arrived, in this > case apparently destroy event, following the destroy call after the > failed incoming migration. The reported AttributeError is a minor bug, > already fixed in master. So it's most likely unrelated to the discussed > problem. > > >>> On Tue, Apr 10, 2018 at 4:14 PM, Alona Kaplan > > >>> wrote: > >>> > >>>> Hi all, > >>>> > >>>> Looking at the log it seems that the new GetCapabilitiesAsync is > >>>> responsible for the mess. > >>>> > >>>> - > >>>> * 08:29:47 - engine loses connectivity to host 'lago-basic-suite-4-2-host-0'.* > >>>> > >>>> > >>>> > >>>> *- Every 3 seconds a getCapabalititiesAsync request is sent to the host (unsuccessfully).* > >>>> > >>>> * before each "getCapabilitiesAsync" the monitoring lock is taken (VdsManager,refreshImpl) > >>>> > >>>> * "getCapabilitiesAsync" immediately fails and throws 'VDSNetworkException: java.net.ConnectException: Connection refused'. The exception is caught by 'GetCapabilitiesAsyncVDSCommand.executeVdsBrokerCommand' which calls 'onFailure' of the callback and re-throws the exception. > >>>> > >>>> catch (Throwable t) { > >>>> getParameters().getCallback().onFailure(t); > >>>> throw t; > >>>> } > >>>> > >>>> * The 'onFailure' of the callback releases the "monitoringLock" ('postProcessRefresh()->afterRefreshTreatment()-> if (!succeeded) lockManager.releaseLock(monitoringLock);') > >>>> > >>>> * 'VdsManager,refreshImpl' catches the network exception, marks 'releaseLock = true' and *tries to release the already released lock*. 
> >>>> > >>>> The following warning is printed to the log - > >>>> > >>>> WARN [org.ovirt.engine.core.bll.lock.InMemoryLockManager] (EE-ManagedThreadFactory-engineScheduled-Thread-53) [] Trying to release exclusive lock which does not exist, lock key: 'ecf53d69-eb68-4b11-8df2-c4aa4e19bd93VDS_INIT' > >>>> > >>>> > >>>> > >>>> > >>>> *- 08:30:51 a successful getCapabilitiesAsync is sent.* > >>>> > >>>> > >>>> *- 08:32:55 - The failing test starts (Setup Networks for setting ipv6). * > >>>> > >>>> * SetupNetworks takes the monitoring lock. > >>>> > >>>> *- 08:33:00 - ResponseTracker cleans the getCapabilitiesAsync requests from 4 minutes ago from its queue and prints a VDSNetworkException: Vds timeout occured.* > >>>> > >>>> * When the first request is removed from the queue ('ResponseTracker.remove()'), the > >>>> *'Callback.onFailure' is invoked (for the second time) -> monitoring lock is released (the lock taken by the SetupNetworks!).* > >>>> > >>>> * *The other requests removed from the queue also try to release the monitoring lock*, but there is nothing to release. > >>>> > >>>> * The following warning log is printed - > >>>> WARN [org.ovirt.engine.core.bll.lock.InMemoryLockManager] (EE-ManagedThreadFactory-engineScheduled-Thread-14) [] Trying to release exclusive lock which does not exist, lock key: 'ecf53d69-eb68-4b11-8df2-c4aa4e19bd93VDS_INIT' > >>>> > >>>> - *08:33:00 - SetupNetwork fails on Timeout ~4 seconds after it started*. Why? I'm not 100% sure, but I guess the root cause is the late processing of 'getCapabilitiesAsync', which causes the loss of the monitoring lock, together with the late and multiple processing of the failure. > >>>> > >>>> > >>>> Ravi, the 'getCapabilitiesAsync' failure is handled twice and there are three attempts to release the lock. Please share your opinion regarding how it should be fixed. > >>>> > >>>> > >>>> Thanks, > >>>> > >>>> Alona.
> >>>> > >>>> > >>>> > >>>> > >>>> > >>>> On Sun, Apr 8, 2018 at 1:21 PM, Dan Kenigsberg > > >>>> wrote: > >>>>> On Sun, Apr 8, 2018 at 9:21 AM, Edward Haas > wrote: > >>>>> > >>>>>> > >>>>>> > >>>>>> On Sun, Apr 8, 2018 at 9:15 AM, Eyal Edri > wrote: > >>>>>> > >>>>>>> Was already done by Yaniv - https://gerrit.ovirt.org/#/c/89851 . > >>>>>>> Is it still failing? > >>>>>>> > >>>>>>> On Sun, Apr 8, 2018 at 8:59 AM, Barak Korren > > >>>>>>> wrote: > >>>>>>> > >>>>>>>> On 7 April 2018 at 00:30, Dan Kenigsberg > wrote: > >>>>>>>> > No, I am afraid that we have not managed to understand why setting > >>>>>>>> an > >>>>>>>> > ipv6 address took the host off the grid. We shall continue > >>>>>>>> researching > >>>>>>>> > this next week. > >>>>>>>> > > >>>>>>>> > Edy, https://gerrit.ovirt.org/#/c/88637/ is already 4 weeks old, > >>>>>>>> but > >>>>>>>> > could it possibly be related (I really doubt that)? > >>>>>>>> > > >>>>>>>> > >>>>>>> > >>>>>> Sorry, but I do not see how this problem is related to VDSM. > >>>>>> There is nothing that indicates that there is a VDSM problem. > >>>>>> > >>>>>> Has the RPC connection between Engine and VDSM failed? > >>>>>> > >>>>>> > >>>>> Further up the thread, Piotr noticed (at least on one failure of > >>>>> this test) that the Vdsm host lost connectivity to its storage, and the Vdsm > >>>>> process was restarted. However, this does not seem to happen in all cases > >>>>> where this test fails.
From fromani at redhat.com Fri Apr 13 07:44:40 2018 From: fromani at redhat.com (Francesco Romani) Date: Fri, 13 Apr 2018 09:44:40 +0200 Subject: [ovirt-devel] OST: Enabling DEBUG logging level in Vdsm? In-Reply-To: <87efjk7ee8.fsf@redhat.com> References: <87efjk7ee8.fsf@redhat.com> Message-ID: <94bb637b-9275-92a0-a7ea-15014b1f1603@redhat.com> On 04/12/2018 03:17 PM, Milan Zamazal wrote: > Hi, > > it's quite inconvenient that DEBUG messages are missing in OST Vdsm > logs. Would it be possible to enable them some way? Maybe just call vdsm-client once Vdsm is up? We can toggle the log verbosity at runtime. I strongly support this idea - OST is probably one of the few places outside developers' environments where DEBUG level by default fully makes sense. However, I'd like to remind you that since the addition of the metadata.py module, virt logs can be quite verbose. I'm raising this just because it could eat some storage space on OST workers, so let's keep this item on the watchlist.
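For reference, the runtime toggle Francesco mentions would look roughly like this (a sketch only: the `Host setLogLevel` verb name is taken from the vdsm-client schema and should be verified against the Vdsm version in use):

```shell
# Hypothetical invocation: raise Vdsm's logger to DEBUG on a running host,
# without restarting the vdsmd service
vdsm-client Host setLogLevel level=DEBUG
```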
-- Francesco Romani Senior SW Eng., Virtualization R&D Red Hat IRC: fromani github: @fromanirh From mzamazal at redhat.com Fri Apr 13 13:36:58 2018 From: mzamazal at redhat.com (Milan Zamazal) Date: Fri, 13 Apr 2018 15:36:58 +0200 Subject: [ovirt-devel] OST: Enabling DEBUG logging level in Vdsm? In-Reply-To: <94bb637b-9275-92a0-a7ea-15014b1f1603@redhat.com> (Francesco Romani's message of "Fri, 13 Apr 2018 09:44:40 +0200") References: <87efjk7ee8.fsf@redhat.com> <94bb637b-9275-92a0-a7ea-15014b1f1603@redhat.com> Message-ID: <87vacv1b45.fsf@redhat.com> Francesco Romani writes: > On 04/12/2018 03:17 PM, Milan Zamazal wrote: >> Hi, >> >> it's quite inconvenient that DEBUG messages are missing in OST Vdsm >> logs. Would it be possible to enable them some way? > > Maybe just call vdsm-client once Vdsm is up? we can toggle the log > verbosiness at runtime. A good idea since it doesn't require Vdsm restart. We just need to track available loggers. > I strongly support this idea - OST is probably one of the few places > outside developers' environments where DEBUG level by default fully > makes sense > > However, I'd like to reminder that since the addition of the metadata.py > module, virt logs can be quite verbose. Raising this just because it could > eat some storage space on OST workers, so let's keep this item on the > watchlist. Hopefully not a big issue, we can discuss it in gerrit: https://gerrit.ovirt.org/90262 Thanks, Milan From dron at redhat.com Fri Apr 13 14:51:38 2018 From: dron at redhat.com (Dafna Ron) Date: Fri, 13 Apr 2018 15:51:38 +0100 Subject: [ovirt-devel] OST Failure - Weekly update [07/04/2018-13/04/2018] Message-ID: On Fri, Apr 13, 2018 at 3:49 PM, Dafna Ron wrote: > Hello, > > I would like to update on this week's failures and OST current status. > > We had some issues from previous week. 
One is still sporadically failing, but since a few patches were merged and another one is in process, I decided we can close this sprint with no backlog. > > This week's failures were one regression, some sporadic failures and one packaging issue. > > We are still seeing sporadic failures on migration tests, but Gal has a fix which will probably be merged soon. https://gerrit.ovirt.org/#/c/90166/ > - master,4.2: Wait for migration to end > > We found and reported a regression-causing change: > https://gerrit.ovirt.org/#/c/89581/ - Subnet name should be an optional > field > This was fixed quickly by: https://gerrit.ovirt.org/#/c/89980/ - Fix > mistaken mapping for retrieving subnet name from row > > We moved to Ansible 2.5 but the packages were not updated in the repos. > This was fixed by this change: https://gerrit.ovirt.org/#/c/90015/ - 4.2: > Take ansible from epel > > > > *Below you can see the chart for this week's resolved issues by cause of > failure:* *Code* = regression of working components/functionalities > *Infra* = infrastructure/OST infrastructure/Lago related issues/power > outages > *OST Tests* - package related issues, failed build artifacts > > *Below is a chart of resolved failures based on oVirt version:* > > *Below is a chart showing failures by suite type:* > > *Below you can see the issues opened this week vs the backlog issues from > the previous week:* > > Thanks, > Dafna -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Desc: not available URL: From gbenhaim at redhat.com Mon Apr 16 07:56:57 2018 From: gbenhaim at redhat.com (Gal Ben Haim) Date: Mon, 16 Apr 2018 10:56:57 +0300 Subject: [ovirt-devel] [OST][HC] HE fails to deploy In-Reply-To: References: Message-ID: Any update on https://gerrit.ovirt.org/#/c/88887/ ? The HC suites are still failing and it's hard to understand why without the logs from the engine VM.
On Sat, Apr 7, 2018 at 7:19 AM, Sahina Bose wrote: > > > On Fri, Apr 6, 2018 at 1:10 PM, Simone Tiraboschi > wrote: > >> >> >> On Fri, Apr 6, 2018 at 9:28 AM, Sahina Bose wrote: >> >>> 2018-04-05 20:46:52,773-0400 INFO otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:100 TASK [Get local VM IP] >>> 2018-04-05 20:55:28,217-0400 DEBUG otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:94 {u'_ansible_parsed': True, u'stderr_lines': [], u'cmd': u"virsh -r net-dhcp-leases default | grep -i 00:16:3e:24:d3:63 | awk '{ print $5 }' | cut -f1 -d'/'", u'end': u'2018-04-05 20:55:28.046320', u'_ansible_no_log': False, u'stdout': u'', u'changed': True, u'invocation': {u'module_args': {u'warn': True, u'executable': None, u'_uses_shell': True, u'_raw_params': u"virsh -r net-dhcp-leases default | grep -i 00:16:3e:24:d3:63 | awk '{ print $5 }' | cut -f1 -d'/'", u'removes': None, u'creates': None, u'chdir': None, u'stdin': None}}, u'start': u'2018-04-05 20:55:28.000470', u'attempts': 50, u'stderr': u'', u'rc': 0, u'delta': u'0:00:00.045850', u'stdout_lines': []} >>> 2018-04-05 20:55:28,318-0400 ERROR otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:98 fatal: [localhost]: FAILED! => {"attempts": 50, "changed": true, "cmd": "virsh -r net-dhcp-leases default | grep -i 00:16:3e:24:d3:63 | awk '{ print $5 }' | cut -f1 -d'/'", "delta": "0:00:00.045850", "end": "2018-04-05 20:55:28.046320", "rc": 0, "start": "2018-04-05 20:55:28.000470", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []} >>> >>> Both the 4.2 and master suites are failing on getting local VM IP. >>> Any idea what changed or if I have to change the test? >>> >>> thanks! >>> >> >> Hi Sahina, >> 4.2 and master suite non HC are correctly running this morning. 
>> http://jenkins.ovirt.org/view/oVirt%20system%20tests/job/ovi >> rt-system-tests_he-basic-ansible-suite-master/146/ >> http://jenkins.ovirt.org/view/oVirt%20system%20tests/job/ovi >> rt-system-tests_he-basic-ansible-suite-4.2/76/ >> >> I'll try to check the difference with HC suites. >> >> Are you using more than one subnet in the HC suites? >> > > No, I'm not. And we haven't changed anything related to network in the > test suite. -- *GAL bEN HAIM* RHV DEVOPS

From bkorren at redhat.com Mon Apr 16 14:17:52 2018 From: bkorren at redhat.com (Barak Korren) Date: Mon, 16 Apr 2018 17:17:52 +0300 Subject: [ovirt-devel] [ANNOUNCE] Introducing STDCI V2 Message-ID:

The CI team is thrilled to announce the general availability of the second version of the oVirt CI standard. Work on this version included an almost complete rewrite of the CI backend. The major user-visible features are:

- Project maintainers no longer need to maintain YAML in the "jenkins" repository. Details that were specified there, including targeted distributions, architectures and oVirt versions, should now be specified in a YAML file under the project's own repository (in a different syntax).
- We now support "sub stages", which provide the ability to run multiple different scripts in parallel within the same STDCI stage. There is also a conditional syntax which allows controlling which scripts get executed according to which files were changed in the patch being tested.
- The STDCI script file names and locations can now be customized via the above-mentioned YAML file. This means that, e.g., using the same script for different stages can now be done by assigning it to the stages in the YAML file instead of by using symlinks.
Inspecting job results in STDCI V2
----------------------------------

As already mentioned, the work on STDCI V2 consisted of a major rewrite of the CI backend; one of the changes made was a switch from using multiple "FreeStyle" type jobs per project to just two (pre-merge and post-merge) pipeline jobs. This has implications for the way job results are to be inspected. Since all the different parallel tasks now happen within the same job, looking at the job output can be rather confusing as it includes the merged output of all the tasks. Instead, the "Blue Ocean" view should be used.

The "Blue Ocean" view displays a graphical layout of the job execution, allowing one to quickly learn which parts of the job failed. It also allows drilling down and viewing the logs of individual parts of the job.

Apart from using the "Blue Ocean" view, job logs are also stored as artifact files. The "exported-artifacts" directory seen in the job results will now include different subdirectories for the different parts of the job. Assuming we have a "check-patch" stage script running on "el7/x86_64", we can find its output under "exported-artifacts" in:

check-patch.el7.x86_64/mock_logs/script/stdout_stderr.log

Any additional artifacts generated by the script would be present in the "check-patch.el7.x86_64" directory as well.

I have a CI YAML file in my project already, is this really new?
----------------------------------------------------------------

We've been working on this for a while, and occasionally introduced V2 features into individual projects as needed. In particular, our GitHub support was always based on STDCI V2 code, so all GitHub projects (except Lago, which is "special"...) are already using STDCI V2. A few Gerrit-based projects have already been converted to V2 as well, as part of our efforts to test and debug the V2 code. Most notably, the "OST" and "Jenkins" projects have been switched, although they are running the STDCI V1 jobs as well for the time being.
What is the process for switching my project to STDCI V2?
---------------------------------------------------------

The CI team is going to proactively work with project maintainers to switch them to V2. The process for switching is as follows:

- Send a one-line patch to the "jenkins" repo to enable the V2 jobs for the project - at this point the V2 jobs will run side-by-side with the V1 jobs, and will execute the STDCI scripts on el7/x86_64.
- Create an STDCI YAML file to define the target distributions, architectures and oVirt versions for the project. (See below for a sample file that would be equivalent to what many projects have defined in V1 currently.) As soon as a patch with the new YAML file is submitted to the project, the V2 job will parse it and follow the instructions in it. This allows for an easy verification of the file functionality in CI.
- Remove the STDCI V1 job configuration from the "jenkins" repo. This should be the last patch project maintainers have to send to the "jenkins" repo.

What does the new YAML file look like?
--------------------------------------

We defined multiple optional names for the file, so that each project owner can choose which name seems most adequate. The following names can be used:

- stdci.yaml
- automation.yaml
- ovirtci.yaml

A dot (.) can also optionally be added at the beginning of the file name to make the file hidden; the file extension could also be "yml". If multiple matching files exist in the project repo, the first matching file according to the order listed above will be used.

The file conforms to the YAML syntax. The key names in the file are case-agnostic, and hyphens (-), underscores (_) and spaces ( ) in key names are ignored. Additionally, we support multiple forms of the same word, so you don't need to remember whether the key should be "distro", "distros", "distributions", "operating-systems" or "OperatingSystems" - all these forms (and others) will work and mean the same thing.
To create complex test/build matrices, "stage", "distribution", "architecture" and "sub-stage" definitions can be nested within one another. We find this to be more intuitive than having to maintain tedious "exclude" lists as was needed in V1.

Here is an example of an STDCI V2 YAML file that is compatible with the current master branch V1 configuration of many oVirt projects:

---
Architectures:
  - x86_64:
      Distributions: [ "el7", "fc27" ]
  - ppc64le:
      Distribution: el7
  - s390x:
      Distribution: fc27
Release Branches:
  master: ovirt-master

Note: since the file is committed into the project's own repo, having different configuration for different branches can be done by simply having different files in the different branches, so there is no need for a big convoluted file to configure all branches.

Since the above file does not mention stages, any STDCI scripts that exist in the project repo and belong to a particular stage will be run on all specified distribution and architecture combinations. Since it is sometimes desired to run "check-patch.sh" on fewer platforms than build-artifacts, for example, a slightly different file would be needed:

---
Architectures:
  - x86_64:
      Distributions: [ "el7", "fc27" ]
  - ppc64le:
      Distribution: el7
  - s390x:
      Distribution: fc27
Stages:
  - check-patch:
      Architecture: x86_64
      Distribution: el7
  - build-artifacts
Release Branches:
  master: ovirt-master

The above file makes "check-patch" run only on el7/x86_64, while build-artifacts runs on all specified platforms; check-merged would not run at all because it is not listed in the file.

Great efforts have been made to make the file format flexible but intuitive to use. Additionally, there are many defaults in place to allow specifying complex behaviours with very brief YAML code. For further details about the file format, please see the documentation linked below.
About the relation between STDCI V2 and the change-queue
--------------------------------------------------------

In STDCI V1, the change queue that would run the OST tests and release a given patch was determined by looking at the "version" part of the name of the project's build-artifacts jobs that got invoked for the patch. This was confusing, as most people understood "version" to mean the internal version of their own project rather than the oVirt version. In V2 we decided to be more explicit and simply include a map from branches to change queues in the YAML configuration under the "release-branches" option, as can be seen in the examples above.

We also chose to no longer allow specifying the oVirt version as a shorthand for the equivalent queue name (e.g. specifying "4.2" instead of "ovirt-4.2"); this should reduce the chance of confusion between project versions and queue names, and also allows us to create and use change queues for projects that are not oVirt. A project can choose not to include a "release-branches" option, in which case its patches will not get submitted to any queues.

Further information
-------------------

The documentation for STDCI can be found at [1]. The documentation updates for V2 are still in progress and expected to be merged soon. In the meantime, the GitHub-specific documentation [2] already provides a great deal of information which is relevant for V2.

[1]: http://ovirt-infra-docs.readthedocs.io/en/latest/CI/Build_and_test_standards
[2]: http://ovirt-infra-docs.readthedocs.io/en/latest/CI/Using_STDCI_with_GitHub

--- Barak Korren RHV DevOps team, RHCE, RHCi Red Hat EMEA redhat.com | TRIED. TESTED. TRUSTED.
| redhat.com/trusted From gbenhaim at redhat.com Tue Apr 17 08:31:46 2018 From: gbenhaim at redhat.com (Gal Ben Haim) Date: Tue, 17 Apr 2018 11:31:46 +0300 Subject: [ovirt-devel] [ OST Failure Report ] [ oVirt master ] [ 17.04.18 ] [ 002_bootstrap.add_qos ] Message-ID: Hi, >From last night, the Change Queue fails on every run because of "add_qos" test. >From the engine log: 2018-04-16 10:31:51,813-04 ERROR [org.ovirt.engine.core.bll.CommandsFactory] (default task-30) [] An exception has occurred while trying to create a command object for command 'AddCpuQos' with parameters 'QosParametersBase:{commandId='ccb4d6f4-ca86-4706-921c-f87439e0550f', user='null', commandType='Unknown'}': WELD-001407: Cannot declare an injection point with a type variable: [BackedAnnotatedField] @Inject private org.ovirt.engine.core.bll.qos.QosCommandBase.qosDao at org.ovirt.engine.core.bll.qos.QosCommandBase.qosDao(QosCommandBase.java:0) 2018-04-16 10:31:51,819-04 ERROR [org.ovirt.engine.core.bll.CommandsFactory] (default task-30) [] Exception: org.jboss.weld.exceptions.DefinitionException: WELD-001407: Cannot declare an injection point with a type variable: [BackedAnnotatedField] @Inject private org.ovirt.engine.core.bll.qos.QosCommandBase.qosDao at org.ovirt.engine.core.bll.qos.QosCommandBase.qosDao(QosCommandBase.java:0) StackTrace at org.jboss.weld.bootstrap.Validator.validateInjectionPointForDefinitionErrors(Validator.java:295) [weld-core-impl-2.4.3.Final.jar:2.4.3.Final] at org.jboss.weld.bootstrap.Validator.validateInjectionPoint(Validator.java:281) [weld-core-impl-2.4.3.Final.jar:2.4.3.Final] at org.jboss.weld.bootstrap.Validator.validateProducer(Validator.java:400) [weld-core-impl-2.4.3.Final.jar:2.4.3.Final] at org.jboss.weld.injection.producer.InjectionTargetService.validateProducer(InjectionTargetService.java:36) [weld-core-impl-2.4.3.Final.jar:2.4.3.Final] at org.jboss.weld.manager.InjectionTargetFactoryImpl.validate(InjectionTargetFactoryImpl.java:135) 
[weld-core-impl-2.4.3.Final.jar:2.4.3.Final] at org.jboss.weld.manager.InjectionTargetFactoryImpl.createInjectionTarget(InjectionTargetFactoryImpl.java:79) [weld-core-impl-2.4.3.Final.jar:2.4.3.Final] at org.jboss.weld.manager.InjectionTargetFactoryImpl.createInjectionTarget(InjectionTargetFactoryImpl.java:69) [weld-core-impl-2.4.3.Final.jar:2.4.3.Final] at org.jboss.weld.manager.BeanManagerImpl.createInjectionTarget(BeanManagerImpl.java:1140) [weld-core-impl-2.4.3.Final.jar:2.4.3.Final] at org.jboss.weld.util.ForwardingBeanManager.createInjectionTarget(ForwardingBeanManager.java:201) [weld-core-impl-2.4.3.Final.jar:2.4.3.Final] at org.ovirt.engine.core.di.Injector.injectMembers(Injector.java:29) [vdsbroker.jar:] at org.ovirt.engine.core.bll.CommandsFactory.createCommand(CommandsFactory.java:100) [bll.jar:] at org.ovirt.engine.core.bll.Backend.runActionImpl(Backend.java:433) [bll.jar:] at org.ovirt.engine.core.bll.Backend.runAction(Backend.java:387) [bll.jar:] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) [rt.jar:1.8.0_161] Link to all logs: http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/6839/artifact/exported-artifacts/basic-suit-master-el7/test_logs/basic-suite-master/ -- *GAL bEN HAIM* RHV DEVOPS -------------- next part -------------- An HTML attachment was scrubbed... URL: From tnisan at redhat.com Tue Apr 17 10:25:21 2018 From: tnisan at redhat.com (Tal Nisan) Date: Tue, 17 Apr 2018 13:25:21 +0300 Subject: [ovirt-devel] [ OST Failure Report ] [ oVirt master ] [ 17.04.18 ] [ 002_bootstrap.add_qos ] In-Reply-To: References: Message-ID: Looking at it, seems like WELD doesn't allow injections of from generic declared types, will most likely revert the original patch as I can't think of an elegant solution for the problem On Tue, Apr 17, 2018 at 11:31 AM, Gal Ben Haim wrote: > Hi, > > From last night, the Change Queue fails on every run because of "add_qos" > test. 
> > From the engine log: > > 2018-04-16 10:31:51,813-04 ERROR [org.ovirt.engine.core.bll.CommandsFactory] (default task-30) [] An exception has occurred while trying to create a command object for command 'AddCpuQos' with parameters 'QosParametersBase:{commandId='ccb4d6f4-ca86-4706-921c-f87439e0550f', user='null', commandType='Unknown'}': WELD-001407: Cannot declare an injection point with a type variable: [BackedAnnotatedField] @Inject private org.ovirt.engine.core.bll.qos.QosCommandBase.qosDao > at org.ovirt.engine.core.bll.qos.QosCommandBase.qosDao(QosCommandBase.java:0) > > > 2018-04-16 10:31:51,819-04 ERROR [org.ovirt.engine.core.bll.CommandsFactory] (default task-30) [] Exception: org.jboss.weld.exceptions.DefinitionException: WELD-001407: Cannot declare an injection point with a type variable: [BackedAnnotatedField] @Inject private org.ovirt.engine.core.bll.qos.QosCommandBase.qosDao > at org.ovirt.engine.core.bll.qos.QosCommandBase.qosDao(QosCommandBase.java:0) > StackTrace > at org.jboss.weld.bootstrap.Validator.validateInjectionPointForDefinitionErrors(Validator.java:295) [weld-core-impl-2.4.3.Final.jar:2.4.3.Final] > at org.jboss.weld.bootstrap.Validator.validateInjectionPoint(Validator.java:281) [weld-core-impl-2.4.3.Final.jar:2.4.3.Final] > at org.jboss.weld.bootstrap.Validator.validateProducer(Validator.java:400) [weld-core-impl-2.4.3.Final.jar:2.4.3.Final] > at org.jboss.weld.injection.producer.InjectionTargetService.validateProducer(InjectionTargetService.java:36) [weld-core-impl-2.4.3.Final.jar:2.4.3.Final] > at org.jboss.weld.manager.InjectionTargetFactoryImpl.validate(InjectionTargetFactoryImpl.java:135) [weld-core-impl-2.4.3.Final.jar:2.4.3.Final] > at org.jboss.weld.manager.InjectionTargetFactoryImpl.createInjectionTarget(InjectionTargetFactoryImpl.java:79) [weld-core-impl-2.4.3.Final.jar:2.4.3.Final] > at org.jboss.weld.manager.InjectionTargetFactoryImpl.createInjectionTarget(InjectionTargetFactoryImpl.java:69) 
[weld-core-impl-2.4.3.Final.jar:2.4.3.Final] > at org.jboss.weld.manager.BeanManagerImpl.createInjectionTarget(BeanManagerImpl.java:1140) [weld-core-impl-2.4.3.Final.jar:2.4.3.Final] > at org.jboss.weld.util.ForwardingBeanManager.createInjectionTarget(ForwardingBeanManager.java:201) [weld-core-impl-2.4.3.Final.jar:2.4.3.Final] > at org.ovirt.engine.core.di.Injector.injectMembers(Injector.java:29) [vdsbroker.jar:] > at org.ovirt.engine.core.bll.CommandsFactory.createCommand(CommandsFactory.java:100) [bll.jar:] > at org.ovirt.engine.core.bll.Backend.runActionImpl(Backend.java:433) [bll.jar:] > at org.ovirt.engine.core.bll.Backend.runAction(Backend.java:387) [bll.jar:] > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) [rt.jar:1.8.0_161] > > > Link to all logs: > > http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/6839/artifact/exported-artifacts/basic-suit-master-el7/test_logs/basic-suite-master/ > > > > -- > *GAL bEN HAIM* > RHV DEVOPS > > _______________________________________________ > Devel mailing list > Devel at ovirt.org > http://lists.ovirt.org/mailman/listinfo/devel > -------------- next part -------------- An HTML attachment was scrubbed... URL: From tnisan at redhat.com Tue Apr 17 11:09:58 2018 From: tnisan at redhat.com (Tal Nisan) Date: Tue, 17 Apr 2018 14:09:58 +0300 Subject: [ovirt-devel] [ OST Failure Report ] [ oVirt master ] [ 17.04.18 ] [ 002_bootstrap.add_qos ] In-Reply-To: References: Message-ID: Suggested fix: https://gerrit.ovirt.org/#/c/90376/ Waiting for OST result On Tue, Apr 17, 2018 at 1:25 PM, Tal Nisan wrote: > Looking at it, seems like WELD doesn't allow injections of from generic > declared types, will most likely revert the original patch as I can't think > of an elegant solution for the problem > > On Tue, Apr 17, 2018 at 11:31 AM, Gal Ben Haim > wrote: > >> Hi, >> >> From last night, the Change Queue fails on every run because of "add_qos" >> test. 
>> >> From the engine log: >> >> 2018-04-16 10:31:51,813-04 ERROR [org.ovirt.engine.core.bll.CommandsFactory] (default task-30) [] An exception has occurred while trying to create a command object for command 'AddCpuQos' with parameters 'QosParametersBase:{commandId='ccb4d6f4-ca86-4706-921c-f87439e0550f', user='null', commandType='Unknown'}': WELD-001407: Cannot declare an injection point with a type variable: [BackedAnnotatedField] @Inject private org.ovirt.engine.core.bll.qos.QosCommandBase.qosDao >> at org.ovirt.engine.core.bll.qos.QosCommandBase.qosDao(QosCommandBase.java:0) >> >> >> 2018-04-16 10:31:51,819-04 ERROR [org.ovirt.engine.core.bll.CommandsFactory] (default task-30) [] Exception: org.jboss.weld.exceptions.DefinitionException: WELD-001407: Cannot declare an injection point with a type variable: [BackedAnnotatedField] @Inject private org.ovirt.engine.core.bll.qos.QosCommandBase.qosDao >> at org.ovirt.engine.core.bll.qos.QosCommandBase.qosDao(QosCommandBase.java:0) >> StackTrace >> at org.jboss.weld.bootstrap.Validator.validateInjectionPointForDefinitionErrors(Validator.java:295) [weld-core-impl-2.4.3.Final.jar:2.4.3.Final] >> at org.jboss.weld.bootstrap.Validator.validateInjectionPoint(Validator.java:281) [weld-core-impl-2.4.3.Final.jar:2.4.3.Final] >> at org.jboss.weld.bootstrap.Validator.validateProducer(Validator.java:400) [weld-core-impl-2.4.3.Final.jar:2.4.3.Final] >> at org.jboss.weld.injection.producer.InjectionTargetService.validateProducer(InjectionTargetService.java:36) [weld-core-impl-2.4.3.Final.jar:2.4.3.Final] >> at org.jboss.weld.manager.InjectionTargetFactoryImpl.validate(InjectionTargetFactoryImpl.java:135) [weld-core-impl-2.4.3.Final.jar:2.4.3.Final] >> at org.jboss.weld.manager.InjectionTargetFactoryImpl.createInjectionTarget(InjectionTargetFactoryImpl.java:79) [weld-core-impl-2.4.3.Final.jar:2.4.3.Final] >> at org.jboss.weld.manager.InjectionTargetFactoryImpl.createInjectionTarget(InjectionTargetFactoryImpl.java:69) 
[weld-core-impl-2.4.3.Final.jar:2.4.3.Final] >> at org.jboss.weld.manager.BeanManagerImpl.createInjectionTarget(BeanManagerImpl.java:1140) [weld-core-impl-2.4.3.Final.jar:2.4.3.Final] >> at org.jboss.weld.util.ForwardingBeanManager.createInjectionTarget(ForwardingBeanManager.java:201) [weld-core-impl-2.4.3.Final.jar:2.4.3.Final] >> at org.ovirt.engine.core.di.Injector.injectMembers(Injector.java:29) [vdsbroker.jar:] >> at org.ovirt.engine.core.bll.CommandsFactory.createCommand(CommandsFactory.java:100) [bll.jar:] >> at org.ovirt.engine.core.bll.Backend.runActionImpl(Backend.java:433) [bll.jar:] >> at org.ovirt.engine.core.bll.Backend.runAction(Backend.java:387) [bll.jar:] >> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) [rt.jar:1.8.0_161] >> >> >> Link to all logs: >> >> http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/6839/artifact/exported-artifacts/basic-suit-master-el7/test_logs/basic-suite-master/ >> >> >> >> -- >> *GAL bEN HAIM* >> RHV DEVOPS >> >> _______________________________________________ >> Devel mailing list >> Devel at ovirt.org >> http://lists.ovirt.org/mailman/listinfo/devel >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From eedri at redhat.com Wed Apr 18 07:37:26 2018 From: eedri at redhat.com (Eyal Edri) Date: Wed, 18 Apr 2018 10:37:26 +0300 Subject: [ovirt-devel] Update: HC suites failing for 3 weeks ( was: [OST][HC] HE fails to deploy ) Message-ID: FYI, I've disabled the 4.2 and master HC suites nightly run on CI as they are constantly failing for almost 3 weeks and spamming the mailing lists. I think this should get higher priority for a fix if we want it to provide any value, Work can continue using the manual jobs or via check-patch. On Mon, Apr 16, 2018 at 10:56 AM, Gal Ben Haim wrote: > Any update on https://gerrit.ovirt.org/#/c/88887/ ? > The HC suites still failing and it's hard to understand why without the > logs from the engine VM. 
> > On Sat, Apr 7, 2018 at 7:19 AM, Sahina Bose wrote: > >> >> >> On Fri, Apr 6, 2018 at 1:10 PM, Simone Tiraboschi >> wrote: >> >>> >>> >>> On Fri, Apr 6, 2018 at 9:28 AM, Sahina Bose wrote: >>> >>>> 2018-04-05 20:46:52,773-0400 INFO otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:100 TASK [Get local VM IP] >>>> 2018-04-05 20:55:28,217-0400 DEBUG otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:94 {u'_ansible_parsed': True, u'stderr_lines': [], u'cmd': u"virsh -r net-dhcp-leases default | grep -i 00:16:3e:24:d3:63 | awk '{ print $5 }' | cut -f1 -d'/'", u'end': u'2018-04-05 20:55:28.046320', u'_ansible_no_log': False, u'stdout': u'', u'changed': True, u'invocation': {u'module_args': {u'warn': True, u'executable': None, u'_uses_shell': True, u'_raw_params': u"virsh -r net-dhcp-leases default | grep -i 00:16:3e:24:d3:63 | awk '{ print $5 }' | cut -f1 -d'/'", u'removes': None, u'creates': None, u'chdir': None, u'stdin': None}}, u'start': u'2018-04-05 20:55:28.000470', u'attempts': 50, u'stderr': u'', u'rc': 0, u'delta': u'0:00:00.045850', u'stdout_lines': []} >>>> 2018-04-05 20:55:28,318-0400 ERROR otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:98 fatal: [localhost]: FAILED! => {"attempts": 50, "changed": true, "cmd": "virsh -r net-dhcp-leases default | grep -i 00:16:3e:24:d3:63 | awk '{ print $5 }' | cut -f1 -d'/'", "delta": "0:00:00.045850", "end": "2018-04-05 20:55:28.046320", "rc": 0, "start": "2018-04-05 20:55:28.000470", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []} >>>> >>>> Both the 4.2 and master suites are failing on getting local VM IP. >>>> Any idea what changed or if I have to change the test? >>>> >>>> thanks! >>>> >>> >>> Hi Sahina, >>> 4.2 and master suite non HC are correctly running this morning. 
>>> http://jenkins.ovirt.org/view/oVirt%20system%20tests/job/ovi >>> rt-system-tests_he-basic-ansible-suite-master/146/ >>> http://jenkins.ovirt.org/view/oVirt%20system%20tests/job/ovi >>> rt-system-tests_he-basic-ansible-suite-4.2/76/ >>> >>> I'll try to check the difference with HC suites. >>> >>> Are you using more than one subnet in the HC suites? >>> >> >> No, I'm not. And we havent's changed anything related to network in the >> test suite. >> >> >> > > > -- > *GAL bEN HAIM* > RHV DEVOPS > -- Eyal edri MANAGER RHV DevOps EMEA VIRTUALIZATION R&D Red Hat EMEA TRIED. TESTED. TRUSTED. phone: +972-9-7692018 irc: eedri (on #tlv #rhev-dev #rhev-integ) -------------- next part -------------- An HTML attachment was scrubbed... URL: From mpolednik at redhat.com Wed Apr 18 08:16:30 2018 From: mpolednik at redhat.com (Martin Polednik) Date: Wed, 18 Apr 2018 10:16:30 +0200 Subject: [ovirt-devel] dynamic ownership changes In-Reply-To: References: <20180410130916.GB37403@Alexandra.local> Message-ID: <20180418081628.GA1704@Alexandra.local> On 11/04/18 16:52 +0300, Elad Ben Aharon wrote: >We can test this on iSCSI, NFS and GlusterFS. As for ceph and cinder, will >have to check, since usually, we don't execute our automation on them. Any update on this? I believe the gluster tests were successful, OST passes fine and unit tests pass fine, that makes the storage backends test the last required piece. >On Wed, Apr 11, 2018 at 4:38 PM, Raz Tamir wrote: > >> +Elad >> >> On Wed, Apr 11, 2018 at 4:28 PM, Dan Kenigsberg wrote: >> >>> On Wed, Apr 11, 2018 at 12:34 PM, Nir Soffer wrote: >>> >>>> On Wed, Apr 11, 2018 at 12:31 PM Eyal Edri wrote: >>>> >>>>> Please make sure to run as much OST suites on this patch as possible >>>>> before merging ( using 'ci please build' ) >>>>> >>>> >>>> But note that OST is not a way to verify the patch. >>>> >>>> Such changes require testing with all storage types we support. 
>>>> >>>> Nir >>>> >>>> On Tue, Apr 10, 2018 at 4:09 PM, Martin Polednik >>>>> wrote: >>>>> >>>>>> Hey, >>>>>> >>>>>> I've created a patch[0] that is finally able to activate libvirt's >>>>>> dynamic_ownership for VDSM while not negatively affecting >>>>>> functionality of our storage code. >>>>>> >>>>>> That of course comes with quite a bit of code removal, mostly in the >>>>>> area of host devices, hwrng and anything that touches devices; bunch >>>>>> of test changes and one XML generation caveat (storage is handled by >>>>>> VDSM, therefore disk relabelling needs to be disabled on the VDSM >>>>>> level). >>>>>> >>>>>> Because of the scope of the patch, I welcome storage/virt/network >>>>>> people to review the code and consider the implication this change has >>>>>> on current/future features. >>>>>> >>>>>> [0] https://gerrit.ovirt.org/#/c/89830/ >>>>>> >>>>> >>> In particular: dynamic_ownership was set to 0 prehistorically (as part >>> of https://bugzilla.redhat.com/show_bug.cgi?id=554961 ) because libvirt, >>> running as root, was not able to play properly with root-squash nfs mounts. >>> >>> Have you attempted this use case? >>> >>> I join to Nir's request to run this with storage QE. >>> >> >> >> >> -- >> >> >> Raz Tamir >> Manager, RHV QE >> From ebenahar at redhat.com Wed Apr 18 08:37:00 2018 From: ebenahar at redhat.com (Elad Ben Aharon) Date: Wed, 18 Apr 2018 11:37:00 +0300 Subject: [ovirt-devel] dynamic ownership changes In-Reply-To: <20180418081628.GA1704@Alexandra.local> References: <20180410130916.GB37403@Alexandra.local> <20180418081628.GA1704@Alexandra.local> Message-ID: Hi, sorry if I misunderstood, I waited for more input regarding what areas have to be tested here. On Wed, Apr 18, 2018 at 11:16 AM, Martin Polednik wrote: > On 11/04/18 16:52 +0300, Elad Ben Aharon wrote: > >> We can test this on iSCSI, NFS and GlusterFS. As for ceph and cinder, will >> have to check, since usually, we don't execute our automation on them. 
>> > > Any update on this? I believe the gluster tests were successful, OST > passes fine and unit tests pass fine, that makes the storage backends > test the last required piece. > > > On Wed, Apr 11, 2018 at 4:38 PM, Raz Tamir wrote: >> >> +Elad >>> >>> On Wed, Apr 11, 2018 at 4:28 PM, Dan Kenigsberg >>> wrote: >>> >>> On Wed, Apr 11, 2018 at 12:34 PM, Nir Soffer wrote: >>>> >>>> On Wed, Apr 11, 2018 at 12:31 PM Eyal Edri wrote: >>>>> >>>>> Please make sure to run as much OST suites on this patch as possible >>>>>> before merging ( using 'ci please build' ) >>>>>> >>>>>> >>>>> But note that OST is not a way to verify the patch. >>>>> >>>>> Such changes require testing with all storage types we support. >>>>> >>>>> Nir >>>>> >>>>> On Tue, Apr 10, 2018 at 4:09 PM, Martin Polednik >>>> > >>>>> >>>>>> wrote: >>>>>> >>>>>> Hey, >>>>>>> >>>>>>> I've created a patch[0] that is finally able to activate libvirt's >>>>>>> dynamic_ownership for VDSM while not negatively affecting >>>>>>> functionality of our storage code. >>>>>>> >>>>>>> That of course comes with quite a bit of code removal, mostly in the >>>>>>> area of host devices, hwrng and anything that touches devices; bunch >>>>>>> of test changes and one XML generation caveat (storage is handled by >>>>>>> VDSM, therefore disk relabelling needs to be disabled on the VDSM >>>>>>> level). >>>>>>> >>>>>>> Because of the scope of the patch, I welcome storage/virt/network >>>>>>> people to review the code and consider the implication this change >>>>>>> has >>>>>>> on current/future features. >>>>>>> >>>>>>> [0] https://gerrit.ovirt.org/#/c/89830/ >>>>>>> >>>>>>> >>>>>> In particular: dynamic_ownership was set to 0 prehistorically (as >>>> part >>>> of https://bugzilla.redhat.com/show_bug.cgi?id=554961 ) because >>>> libvirt, >>>> running as root, was not able to play properly with root-squash nfs >>>> mounts. >>>> >>>> Have you attempted this use case? >>>> >>>> I join to Nir's request to run this with storage QE. 
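For context on the discussion above: `dynamic_ownership` is a setting in libvirt's `/etc/libvirt/qemu.conf` (`dynamic_ownership = 1`) that lets libvirt chown device and image files to the qemu user when a VM starts and restore ownership afterwards. The storage caveat in the patch description — VDSM keeps managing ownership of storage itself — corresponds to a per-disk relabeling override in the domain XML. The fragment below is illustrative only (the path is made up; this is not the actual patch):

```xml
<!-- Illustrative domain-XML fragment: with dynamic_ownership enabled,
     libvirt would normally relabel/chown the disk path too; since VDSM
     manages ownership of storage itself, relabeling is disabled per disk
     via a DAC seclabel on the source element. -->
<disk type='file' device='disk'>
  <source file='/rhev/data-center/example/disk-image'>
    <seclabel model='dac' relabel='no'/>
  </source>
  <target dev='vda' bus='virtio'/>
</disk>
```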
>>>> >>>> >>> >>> >>> -- >>> >>> >>> Raz Tamir >>> Manager, RHV QE >>> >>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From mpolednik at redhat.com Wed Apr 18 11:17:34 2018 From: mpolednik at redhat.com (Martin Polednik) Date: Wed, 18 Apr 2018 13:17:34 +0200 Subject: [ovirt-devel] dynamic ownership changes In-Reply-To: References: <20180410130916.GB37403@Alexandra.local> <20180418081628.GA1704@Alexandra.local> Message-ID: <20180418111733.GA4702@Alexandra.local> On 18/04/18 11:37 +0300, Elad Ben Aharon wrote: >Hi, sorry if I misunderstood, I waited for more input regarding what areas >have to be tested here. I'd say that you have quite a bit of freedom in this regard. GlusterFS should be covered by Dennis, so iSCSI/NFS/ceph/cinder with some suite that covers basic operations (start & stop VM, migrate it), snapshots and merging them, and whatever else would be important for storage sanity. mpolednik >On Wed, Apr 18, 2018 at 11:16 AM, Martin Polednik >wrote: > >> On 11/04/18 16:52 +0300, Elad Ben Aharon wrote: >> >>> We can test this on iSCSI, NFS and GlusterFS. As for ceph and cinder, will >>> have to check, since usually, we don't execute our automation on them. >>> >> >> Any update on this? I believe the gluster tests were successful, OST >> passes fine and unit tests pass fine, that makes the storage backends >> test the last required piece. >> >> >> On Wed, Apr 11, 2018 at 4:38 PM, Raz Tamir wrote: >>> >>> +Elad >>>> >>>> On Wed, Apr 11, 2018 at 4:28 PM, Dan Kenigsberg >>>> wrote: >>>> >>>> On Wed, Apr 11, 2018 at 12:34 PM, Nir Soffer wrote: >>>>> >>>>> On Wed, Apr 11, 2018 at 12:31 PM Eyal Edri wrote: >>>>>> >>>>>> Please make sure to run as much OST suites on this patch as possible >>>>>>> before merging ( using 'ci please build' ) >>>>>>> >>>>>>> >>>>>> But note that OST is not a way to verify the patch. >>>>>> >>>>>> Such changes require testing with all storage types we support. 
>>>>>> >>>>>> Nir >>>>>> >>>>>> On Tue, Apr 10, 2018 at 4:09 PM, Martin Polednik >>>>> > >>>>>> >>>>>>> wrote: >>>>>>> >>>>>>> Hey, >>>>>>>> >>>>>>>> I've created a patch[0] that is finally able to activate libvirt's >>>>>>>> dynamic_ownership for VDSM while not negatively affecting >>>>>>>> functionality of our storage code. >>>>>>>> >>>>>>>> That of course comes with quite a bit of code removal, mostly in the >>>>>>>> area of host devices, hwrng and anything that touches devices; bunch >>>>>>>> of test changes and one XML generation caveat (storage is handled by >>>>>>>> VDSM, therefore disk relabelling needs to be disabled on the VDSM >>>>>>>> level). >>>>>>>> >>>>>>>> Because of the scope of the patch, I welcome storage/virt/network >>>>>>>> people to review the code and consider the implication this change >>>>>>>> has >>>>>>>> on current/future features. >>>>>>>> >>>>>>>> [0] https://gerrit.ovirt.org/#/c/89830/ >>>>>>>> >>>>>>>> >>>>>>> In particular: dynamic_ownership was set to 0 prehistorically (as >>>>> part >>>>> of https://bugzilla.redhat.com/show_bug.cgi?id=554961 ) because >>>>> libvirt, >>>>> running as root, was not able to play properly with root-squash nfs >>>>> mounts. >>>>> >>>>> Have you attempted this use case? >>>>> >>>>> I join to Nir's request to run this with storage QE. 
>>>>> >>>>> >>>> >>>> >>>> -- >>>> >>>> >>>> Raz Tamir >>>> Manager, RHV QE >>>> >>>> From sbonazzo at redhat.com Wed Apr 18 12:14:36 2018 From: sbonazzo at redhat.com (Sandro Bonazzola) Date: Wed, 18 Apr 2018 14:14:36 +0200 Subject: [ovirt-devel] Broken repoclosure in today 4.2.3 RC2 compose Message-ID: Hi, today oVirt 4.2.3 RC2 release is blocked on repository closure error detected by ovirt-engine-appliance build: 13:49:14 11:49:13,944 INFO program:Error: Package: ovirt-engine-backend-4.2.3.2-1.el7.centos.noarch (ovirt-4.2-pre)13:49:14 11:49:13,946 INFO program:Requires: vdsm-jsonrpc-java >= 1.4.1213:49:14 11:49:13,947 INFO program:Available: vdsm-jsonrpc-java-1.4.11-1.el7.noarch (ovirt-4.2-centos-ovirt42)13:49:14 11:49:13,949 INFO program:vdsm-jsonrpc-java = 1.4.11-1.el713:49:14 11:49:13,950 INFO program:Available: vdsm-jsonrpc-java-1.4.11-1.el7.centos.noarch (ovirt-4.2)13:49:14 11:49:13,951 INFO program:vdsm-jsonrpc-java = 1.4.11-1.el7.centos Looks like vdsm-jsonrpc-java >= 1.4.12 has not been released yet. Piotr please check its status and add it to release configuration file for 4.2.3 RC2. You can see an example in https://gerrit.ovirt.org/#/c/90436/ Thanks, -- SANDRO BONAZZOLA ASSOCIATE MANAGER, SOFTWARE ENGINEERING, EMEA ENG VIRTUALIZATION R&D Red Hat EMEA sbonazzo at redhat.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From sbonazzo at redhat.com Wed Apr 18 12:40:36 2018 From: sbonazzo at redhat.com (Sandro Bonazzola) Date: Wed, 18 Apr 2018 14:40:36 +0200 Subject: [ovirt-devel] Update: HC suites failing for 3 weeks ( was: [OST][HC] HE fails to deploy ) In-Reply-To: References: Message-ID: 2018-04-18 9:37 GMT+02:00 Eyal Edri : > FYI, > > I've disabled the 4.2 and master HC suites nightly run on CI as they are > constantly failing for almost 3 weeks and spamming the mailing lists. > HC uses gdeploy 2.0.6 which was released in December and was based on ansible 2.4. 
ansible-2.5 landed 3 weeks ago in EPEL, my guess is that gdeploy is not supporting ansible-2.5 properly. I had no time to validate my guess with proof, so please Sahina cross check this. > > I think this should get higher priority for a fix if we want it to provide > any value, > Work can continue using the manual jobs or via check-patch. > > > On Mon, Apr 16, 2018 at 10:56 AM, Gal Ben Haim > wrote: > >> Any update on https://gerrit.ovirt.org/#/c/88887/ ? >> The HC suites still failing and it's hard to understand why without the >> logs from the engine VM. >> >> On Sat, Apr 7, 2018 at 7:19 AM, Sahina Bose wrote: >> >>> >>> >>> On Fri, Apr 6, 2018 at 1:10 PM, Simone Tiraboschi >>> wrote: >>> >>>> >>>> >>>> On Fri, Apr 6, 2018 at 9:28 AM, Sahina Bose wrote: >>>> >>>>> 2018-04-05 20:46:52,773-0400 INFO otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:100 TASK [Get local VM IP] >>>>> 2018-04-05 20:55:28,217-0400 DEBUG otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:94 {u'_ansible_parsed': True, u'stderr_lines': [], u'cmd': u"virsh -r net-dhcp-leases default | grep -i 00:16:3e:24:d3:63 | awk '{ print $5 }' | cut -f1 -d'/'", u'end': u'2018-04-05 20:55:28.046320', u'_ansible_no_log': False, u'stdout': u'', u'changed': True, u'invocation': {u'module_args': {u'warn': True, u'executable': None, u'_uses_shell': True, u'_raw_params': u"virsh -r net-dhcp-leases default | grep -i 00:16:3e:24:d3:63 | awk '{ print $5 }' | cut -f1 -d'/'", u'removes': None, u'creates': None, u'chdir': None, u'stdin': None}}, u'start': u'2018-04-05 20:55:28.000470', u'attempts': 50, u'stderr': u'', u'rc': 0, u'delta': u'0:00:00.045850', u'stdout_lines': []} >>>>> 2018-04-05 20:55:28,318-0400 ERROR otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:98 fatal: [localhost]: FAILED! 
=> {"attempts": 50, "changed": true, "cmd": "virsh -r net-dhcp-leases default | grep -i 00:16:3e:24:d3:63 | awk '{ print $5 }' | cut -f1 -d'/'", "delta": "0:00:00.045850", "end": "2018-04-05 20:55:28.046320", "rc": 0, "start": "2018-04-05 20:55:28.000470", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []} >>>>> >>>>> Both the 4.2 and master suites are failing on getting local VM IP. >>>>> Any idea what changed or if I have to change the test? >>>>> >>>>> thanks! >>>>> >>>> >>>> Hi Sahina, >>>> 4.2 and master suite non HC are correctly running this morning. >>>> http://jenkins.ovirt.org/view/oVirt%20system%20tests/job/ovi >>>> rt-system-tests_he-basic-ansible-suite-master/146/ >>>> http://jenkins.ovirt.org/view/oVirt%20system%20tests/job/ovi >>>> rt-system-tests_he-basic-ansible-suite-4.2/76/ >>>> >>>> I'll try to check the difference with HC suites. >>>> >>>> Are you using more than one subnet in the HC suites? >>>> >>> >>> No, I'm not. And we havent's changed anything related to network in the >>> test suite. >>> >>> >>> >> >> >> -- >> *GAL bEN HAIM* >> RHV DEVOPS >> > > > > -- > > Eyal edri > > > MANAGER > > RHV DevOps > > EMEA VIRTUALIZATION R&D > > > Red Hat EMEA > TRIED. TESTED. TRUSTED. > phone: +972-9-7692018 > irc: eedri (on #tlv #rhev-dev #rhev-integ) > > _______________________________________________ > Devel mailing list > Devel at ovirt.org > http://lists.ovirt.org/mailman/listinfo/devel > -- SANDRO BONAZZOLA ASSOCIATE MANAGER, SOFTWARE ENGINEERING, EMEA ENG VIRTUALIZATION R&D Red Hat EMEA sbonazzo at redhat.com -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From sbonazzo at redhat.com Wed Apr 18 13:23:45 2018 From: sbonazzo at redhat.com (Sandro Bonazzola) Date: Wed, 18 Apr 2018 15:23:45 +0200 Subject: [ovirt-devel] Fwd: [CentOS-devel] [SIG Process] Point-Release Freeze of SIG Content while CR is Active In-Reply-To: <20180417133035.nzkvgbmyl5fwcd3n@ender.home.bstinson.com> References: <20180417133035.nzkvgbmyl5fwcd3n@ender.home.bstinson.com> Message-ID: FYI ---------- Forwarded message ---------- From: Brian Stinson Date: 2018-04-17 15:30 GMT+02:00 Subject: [CentOS-devel] [SIG Process] Point-Release Freeze of SIG Content while CR is Active To: centos-devel at centos.org Hi Folks, At the CBS meeting yesterday we discussed a proposal to freeze the Sign and Push-to-mirror processes during the period between when CR is active and a point release. https://lists.centos.org/pipermail/centos-devel/2018-April/016610.html You may recall that we did something similar for 7.1708 We need to Freeze because we make CR content available in CBS for builds, and in CI for testing purposes. But, releasing content built against CR to mirror.centos.org would have had some unintended consequences (e.g. we don't want to require that end-users enable CR to get content from the SIGs). The Point-release Freeze does not affect content moving from the testing tags in CBS to buildlogs.c.o, that will continue to show up as it is built. Do note: since we do want to give testers and developers an opportunity to build and test against an upcoming point-release before it goes GA, content coming from buildlogs.centos.org may require enabling the CR repo. Any new requests for sign+push (adding new tags to mirror.c.o) during Point-release Freeze will be held until the point release goes GA. We approved this as part of the CR checklist during the meeting yesterday, and will include another note about this in the CR release announcement. 
https://www.centos.org/minutes/2018/April/centos-devel.2018-04-16-14.00.html If there are any questions, please let us know and discuss here on the list. Cheers! -- Brian Stinson _______________________________________________ CentOS-devel mailing list CentOS-devel at centos.org https://lists.centos.org/mailman/listinfo/centos-devel -- SANDRO BONAZZOLA ASSOCIATE MANAGER, SOFTWARE ENGINEERING, EMEA ENG VIRTUALIZATION R&D Red Hat EMEA sbonazzo at redhat.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From pkliczew at redhat.com Wed Apr 18 13:28:57 2018 From: pkliczew at redhat.com (Piotr Kliczewski) Date: Wed, 18 Apr 2018 15:28:57 +0200 Subject: [ovirt-devel] Broken repoclosure in today 4.2.3 RC2 compose In-Reply-To: References: Message-ID: Sandro, Patch [1] pushed. Thanks, Piotr [1] https://gerrit.ovirt.org/#/c/90450 On Wed, Apr 18, 2018 at 2:14 PM, Sandro Bonazzola wrote: > Hi, > > today oVirt 4.2.3 RC2 release is blocked on repository closure error detected by ovirt-engine-appliance build: > > > 13:49:14 11:49:13,944 INFO program:Error: Package: ovirt-engine-backend-4.2.3.2-1.el7.centos.noarch (ovirt-4.2-pre)13:49:14 11:49:13,946 INFO program:Requires: vdsm-jsonrpc-java >= 1.4.1213:49:14 11:49:13,947 INFO program:Available: vdsm-jsonrpc-java-1.4.11-1.el7.noarch (ovirt-4.2-centos-ovirt42)13:49:14 11:49:13,949 INFO program:vdsm-jsonrpc-java = 1.4.11-1.el713:49:14 11:49:13,950 INFO program:Available: vdsm-jsonrpc-java-1.4.11-1.el7.centos.noarch (ovirt-4.2)13:49:14 11:49:13,951 INFO program:vdsm-jsonrpc-java = 1.4.11-1.el7.centos > > > Looks like vdsm-jsonrpc-java >= 1.4.12 has not been released yet. > Piotr please check its status and add it to release configuration file for > 4.2.3 RC2. 
You can see an example in https://gerrit.ovirt.org/#/c/90436/ > > Thanks, > -- > > SANDRO BONAZZOLA > > ASSOCIATE MANAGER, SOFTWARE ENGINEERING, EMEA ENG VIRTUALIZATION R&D > > Red Hat EMEA > > sbonazzo at redhat.com > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ebenahar at redhat.com Thu Apr 19 11:54:40 2018 From: ebenahar at redhat.com (Elad Ben Aharon) Date: Thu, 19 Apr 2018 14:54:40 +0300 Subject: [ovirt-devel] dynamic ownership changes In-Reply-To: <20180418111733.GA4702@Alexandra.local> References: <20180410130916.GB37403@Alexandra.local> <20180418081628.GA1704@Alexandra.local> <20180418111733.GA4702@Alexandra.local> Message-ID: Hi Martin, I see [1] requires a rebase, can you please take care? At the moment, our automation is stable only on iSCSI, NFS, Gluster and FC. Ceph is not supported and Cinder will be stabilized soon, AFAIR, it's not stable enough at the moment. [1] https://gerrit.ovirt.org/#/c/89830/ Thanks On Wed, Apr 18, 2018 at 2:17 PM, Martin Polednik wrote: > On 18/04/18 11:37 +0300, Elad Ben Aharon wrote: > >> Hi, sorry if I misunderstood, I waited for more input regarding what areas >> have to be tested here. >> > > I'd say that you have quite a bit of freedom in this regard. GlusterFS > should be covered by Dennis, so iSCSI/NFS/ceph/cinder with some suite > that covers basic operations (start & stop VM, migrate it), snapshots > and merging them, and whatever else would be important for storage > sanity. > > mpolednik > > > On Wed, Apr 18, 2018 at 11:16 AM, Martin Polednik >> wrote: >> >> On 11/04/18 16:52 +0300, Elad Ben Aharon wrote: >>> >>> We can test this on iSCSI, NFS and GlusterFS. As for ceph and cinder, >>>> will >>>> have to check, since usually, we don't execute our automation on them. >>>> >>>> >>> Any update on this? I believe the gluster tests were successful, OST >>> passes fine and unit tests pass fine, that makes the storage backends >>> test the last required piece. 
>>> >>> >>> On Wed, Apr 11, 2018 at 4:38 PM, Raz Tamir wrote: >>> >>>> >>>> +Elad >>>> >>>>> >>>>> On Wed, Apr 11, 2018 at 4:28 PM, Dan Kenigsberg >>>>> wrote: >>>>> >>>>> On Wed, Apr 11, 2018 at 12:34 PM, Nir Soffer >>>>> wrote: >>>>> >>>>>> >>>>>> On Wed, Apr 11, 2018 at 12:31 PM Eyal Edri wrote: >>>>>> >>>>>>> >>>>>>> Please make sure to run as much OST suites on this patch as possible >>>>>>> >>>>>>>> before merging ( using 'ci please build' ) >>>>>>>> >>>>>>>> >>>>>>>> But note that OST is not a way to verify the patch. >>>>>>> >>>>>>> Such changes require testing with all storage types we support. >>>>>>> >>>>>>> Nir >>>>>>> >>>>>>> On Tue, Apr 10, 2018 at 4:09 PM, Martin Polednik < >>>>>>> mpolednik at redhat.com >>>>>>> > >>>>>>> >>>>>>> wrote: >>>>>>>> >>>>>>>> Hey, >>>>>>>> >>>>>>>>> >>>>>>>>> I've created a patch[0] that is finally able to activate libvirt's >>>>>>>>> dynamic_ownership for VDSM while not negatively affecting >>>>>>>>> functionality of our storage code. >>>>>>>>> >>>>>>>>> That of course comes with quite a bit of code removal, mostly in >>>>>>>>> the >>>>>>>>> area of host devices, hwrng and anything that touches devices; >>>>>>>>> bunch >>>>>>>>> of test changes and one XML generation caveat (storage is handled >>>>>>>>> by >>>>>>>>> VDSM, therefore disk relabelling needs to be disabled on the VDSM >>>>>>>>> level). >>>>>>>>> >>>>>>>>> Because of the scope of the patch, I welcome storage/virt/network >>>>>>>>> people to review the code and consider the implication this change >>>>>>>>> has >>>>>>>>> on current/future features. >>>>>>>>> >>>>>>>>> [0] https://gerrit.ovirt.org/#/c/89830/ >>>>>>>>> >>>>>>>>> >>>>>>>>> In particular: dynamic_ownership was set to 0 prehistorically (as >>>>>>>> >>>>>>> part >>>>>> of https://bugzilla.redhat.com/show_bug.cgi?id=554961 ) because >>>>>> libvirt, >>>>>> running as root, was not able to play properly with root-squash nfs >>>>>> mounts. >>>>>> >>>>>> Have you attempted this use case? 
>>>>>> >>>>>> I join to Nir's request to run this with storage QE. >>>>>> >>>>>> >>>>>> >>>>> >>>>> -- >>>>> >>>>> >>>>> Raz Tamir >>>>> Manager, RHV QE >>>>> >>>>> >>>>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From mpolednik at redhat.com Thu Apr 19 12:07:10 2018 From: mpolednik at redhat.com (Martin Polednik) Date: Thu, 19 Apr 2018 14:07:10 +0200 Subject: [ovirt-devel] dynamic ownership changes In-Reply-To: References: <20180410130916.GB37403@Alexandra.local> <20180418081628.GA1704@Alexandra.local> <20180418111733.GA4702@Alexandra.local> Message-ID: <20180419120709.GB9449@Alexandra.local> On 19/04/18 14:54 +0300, Elad Ben Aharon wrote: >Hi Martin, > >I see [1] requires a rebase, can you please take care? Should be rebased. >At the moment, our automation is stable only on iSCSI, NFS, Gluster and FC. >Ceph is not supported and Cinder will be stabilized soon, AFAIR, it's not >stable enough at the moment. That is still pretty good. >[1] https://gerrit.ovirt.org/#/c/89830/ > > >Thanks > >On Wed, Apr 18, 2018 at 2:17 PM, Martin Polednik >wrote: > >> On 18/04/18 11:37 +0300, Elad Ben Aharon wrote: >> >>> Hi, sorry if I misunderstood, I waited for more input regarding what areas >>> have to be tested here. >>> >> >> I'd say that you have quite a bit of freedom in this regard. GlusterFS >> should be covered by Dennis, so iSCSI/NFS/ceph/cinder with some suite >> that covers basic operations (start & stop VM, migrate it), snapshots >> and merging them, and whatever else would be important for storage >> sanity. >> >> mpolednik >> >> >> On Wed, Apr 18, 2018 at 11:16 AM, Martin Polednik >>> wrote: >>> >>> On 11/04/18 16:52 +0300, Elad Ben Aharon wrote: >>>> >>>> We can test this on iSCSI, NFS and GlusterFS. As for ceph and cinder, >>>>> will >>>>> have to check, since usually, we don't execute our automation on them. >>>>> >>>>> >>>> Any update on this? 
I believe the gluster tests were successful, OST >>>> passes fine and unit tests pass fine, that makes the storage backends >>>> test the last required piece. >>>> >>>> >>>> On Wed, Apr 11, 2018 at 4:38 PM, Raz Tamir wrote: >>>> >>>>> >>>>> +Elad >>>>> >>>>>> >>>>>> On Wed, Apr 11, 2018 at 4:28 PM, Dan Kenigsberg >>>>>> wrote: >>>>>> >>>>>> On Wed, Apr 11, 2018 at 12:34 PM, Nir Soffer >>>>>> wrote: >>>>>> >>>>>>> >>>>>>> On Wed, Apr 11, 2018 at 12:31 PM Eyal Edri wrote: >>>>>>> >>>>>>>> >>>>>>>> Please make sure to run as much OST suites on this patch as possible >>>>>>>> >>>>>>>>> before merging ( using 'ci please build' ) >>>>>>>>> >>>>>>>>> >>>>>>>>> But note that OST is not a way to verify the patch. >>>>>>>> >>>>>>>> Such changes require testing with all storage types we support. >>>>>>>> >>>>>>>> Nir >>>>>>>> >>>>>>>> On Tue, Apr 10, 2018 at 4:09 PM, Martin Polednik < >>>>>>>> mpolednik at redhat.com >>>>>>>> > >>>>>>>> >>>>>>>> wrote: >>>>>>>>> >>>>>>>>> Hey, >>>>>>>>> >>>>>>>>>> >>>>>>>>>> I've created a patch[0] that is finally able to activate libvirt's >>>>>>>>>> dynamic_ownership for VDSM while not negatively affecting >>>>>>>>>> functionality of our storage code. >>>>>>>>>> >>>>>>>>>> That of course comes with quite a bit of code removal, mostly in >>>>>>>>>> the >>>>>>>>>> area of host devices, hwrng and anything that touches devices; >>>>>>>>>> bunch >>>>>>>>>> of test changes and one XML generation caveat (storage is handled >>>>>>>>>> by >>>>>>>>>> VDSM, therefore disk relabelling needs to be disabled on the VDSM >>>>>>>>>> level). >>>>>>>>>> >>>>>>>>>> Because of the scope of the patch, I welcome storage/virt/network >>>>>>>>>> people to review the code and consider the implication this change >>>>>>>>>> has >>>>>>>>>> on current/future features. 
>>>>>>>>>> >>>>>>>>>> [0] https://gerrit.ovirt.org/#/c/89830/ >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> In particular: dynamic_ownership was set to 0 prehistorically (as >>>>>>>>> >>>>>>>> part >>>>>>> of https://bugzilla.redhat.com/show_bug.cgi?id=554961 ) because >>>>>>> libvirt, >>>>>>> running as root, was not able to play properly with root-squash nfs >>>>>>> mounts. >>>>>>> >>>>>>> Have you attempted this use case? >>>>>>> >>>>>>> I join to Nir's request to run this with storage QE. >>>>>>> >>>>>>> >>>>>>> >>>>>> >>>>>> -- >>>>>> >>>>>> >>>>>> Raz Tamir >>>>>> Manager, RHV QE >>>>>> >>>>>> >>>>>> From ebenahar at redhat.com Thu Apr 19 14:34:13 2018 From: ebenahar at redhat.com (Elad Ben Aharon) Date: Thu, 19 Apr 2018 17:34:13 +0300 Subject: [ovirt-devel] dynamic ownership changes In-Reply-To: <20180419120709.GB9449@Alexandra.local> References: <20180410130916.GB37403@Alexandra.local> <20180418081628.GA1704@Alexandra.local> <20180418111733.GA4702@Alexandra.local> <20180419120709.GB9449@Alexandra.local> Message-ID: Triggered a sanity tier1 execution [1] using [2], which covers all the requested areas, on iSCSI, NFS and Gluster. I'll update with the results. [1] https://rhv-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/view/4. 2_dev/job/rhv-4.2-ge-flow-storage/1161/ [2] https://gerrit.ovirt.org/#/c/89830/ vdsm-4.30.0-291.git77aef9a.el7.x86_64 On Thu, Apr 19, 2018 at 3:07 PM, Martin Polednik wrote: > On 19/04/18 14:54 +0300, Elad Ben Aharon wrote: > >> Hi Martin, >> >> I see [1] requires a rebase, can you please take care? >> > > Should be rebased. > > At the moment, our automation is stable only on iSCSI, NFS, Gluster and FC. >> Ceph is not supported and Cinder will be stabilized soon, AFAIR, it's not >> stable enough at the moment. >> > > That is still pretty good. 
> > > [1] https://gerrit.ovirt.org/#/c/89830/ >> >> >> Thanks >> >> On Wed, Apr 18, 2018 at 2:17 PM, Martin Polednik >> wrote: >> >> On 18/04/18 11:37 +0300, Elad Ben Aharon wrote: >>> >>> Hi, sorry if I misunderstood, I waited for more input regarding what >>>> areas >>>> have to be tested here. >>>> >>>> >>> I'd say that you have quite a bit of freedom in this regard. GlusterFS >>> should be covered by Dennis, so iSCSI/NFS/ceph/cinder with some suite >>> that covers basic operations (start & stop VM, migrate it), snapshots >>> and merging them, and whatever else would be important for storage >>> sanity. >>> >>> mpolednik >>> >>> >>> On Wed, Apr 18, 2018 at 11:16 AM, Martin Polednik >>> >>>> wrote: >>>> >>>> On 11/04/18 16:52 +0300, Elad Ben Aharon wrote: >>>> >>>>> >>>>> We can test this on iSCSI, NFS and GlusterFS. As for ceph and cinder, >>>>> >>>>>> will >>>>>> have to check, since usually, we don't execute our automation on them. >>>>>> >>>>>> >>>>>> Any update on this? I believe the gluster tests were successful, OST >>>>> passes fine and unit tests pass fine, that makes the storage backends >>>>> test the last required piece. >>>>> >>>>> >>>>> On Wed, Apr 11, 2018 at 4:38 PM, Raz Tamir wrote: >>>>> >>>>> >>>>>> +Elad >>>>>> >>>>>> >>>>>>> On Wed, Apr 11, 2018 at 4:28 PM, Dan Kenigsberg >>>>>>> wrote: >>>>>>> >>>>>>> On Wed, Apr 11, 2018 at 12:34 PM, Nir Soffer >>>>>>> wrote: >>>>>>> >>>>>>> >>>>>>>> On Wed, Apr 11, 2018 at 12:31 PM Eyal Edri >>>>>>>> wrote: >>>>>>>> >>>>>>>> >>>>>>>>> Please make sure to run as much OST suites on this patch as >>>>>>>>> possible >>>>>>>>> >>>>>>>>> before merging ( using 'ci please build' ) >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> But note that OST is not a way to verify the patch. >>>>>>>>>> >>>>>>>>> >>>>>>>>> Such changes require testing with all storage types we support. 
>>>>>>>>> >>>>>>>>> Nir >>>>>>>>> >>>>>>>>> On Tue, Apr 10, 2018 at 4:09 PM, Martin Polednik < >>>>>>>>> mpolednik at redhat.com >>>>>>>>> > >>>>>>>>> >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>>> >>>>>>>>>> Hey, >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> I've created a patch[0] that is finally able to activate >>>>>>>>>>> libvirt's >>>>>>>>>>> dynamic_ownership for VDSM while not negatively affecting >>>>>>>>>>> functionality of our storage code. >>>>>>>>>>> >>>>>>>>>>> That of course comes with quite a bit of code removal, mostly in >>>>>>>>>>> the >>>>>>>>>>> area of host devices, hwrng and anything that touches devices; >>>>>>>>>>> bunch >>>>>>>>>>> of test changes and one XML generation caveat (storage is handled >>>>>>>>>>> by >>>>>>>>>>> VDSM, therefore disk relabelling needs to be disabled on the VDSM >>>>>>>>>>> level). >>>>>>>>>>> >>>>>>>>>>> Because of the scope of the patch, I welcome storage/virt/network >>>>>>>>>>> people to review the code and consider the implication this >>>>>>>>>>> change >>>>>>>>>>> has >>>>>>>>>>> on current/future features. >>>>>>>>>>> >>>>>>>>>>> [0] https://gerrit.ovirt.org/#/c/89830/ >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> In particular: dynamic_ownership was set to 0 prehistorically >>>>>>>>>>> (as >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> part >>>>>>>>> >>>>>>>> of https://bugzilla.redhat.com/show_bug.cgi?id=554961 ) because >>>>>>>> libvirt, >>>>>>>> running as root, was not able to play properly with root-squash nfs >>>>>>>> mounts. >>>>>>>> >>>>>>>> Have you attempted this use case? >>>>>>>> >>>>>>>> I join to Nir's request to run this with storage QE. >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>> -- >>>>>>> >>>>>>> >>>>>>> Raz Tamir >>>>>>> Manager, RHV QE >>>>>>> >>>>>>> >>>>>>> >>>>>>> -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From pkotas at redhat.com Fri Apr 20 08:57:01 2018 From: pkotas at redhat.com (Petr Kotas) Date: Fri, 20 Apr 2018 10:57:01 +0200 Subject: [ovirt-devel] Wrong pep8 library on centos Message-ID: Hi, Sending this to spread information. I have encountered an issue with CentOS 7.4.1708 (Core) when compiling the oVirt engine: the pep8 validation was not passing correctly. It turned out that CentOS ships an old and buggy version of pep8 by default. The solution is to use the same version as used in oVirt CI. Just add http://resources.ovirt.org/repos/ci-tools/el7/ to yum.repos.d and it will start passing again. Best, Petr -------------- next part -------------- An HTML attachment was scrubbed... URL: From mperina at redhat.com Fri Apr 20 09:41:23 2018 From: mperina at redhat.com (Martin Perina) Date: Fri, 20 Apr 2018 11:41:23 +0200 Subject: [ovirt-devel] Wrong pep8 library on centos In-Reply-To: References: Message-ID: On Fri, Apr 20, 2018 at 10:57 AM, Petr Kotas wrote: > Hi, > > Sending this to spread information. > > I have encountered an issue with CentOS 7.4.1708 (Core) when compiling > the oVirt engine: the pep8 validation was not passing correctly. > > It turned out that CentOS ships an old and buggy version of pep8 by > default. > > The solution is to use the same version as used in oVirt CI. > > Just add http://resources.ovirt.org/repos/ci-tools/el7/ to yum.repos.d > and it will start > passing again. > > Every engine developer should install http://plain.resources.ovirt.org/pub/yum-repo/ovirt-release-master.rpm which contains all repos providing required software ... > > Best, > Petr > > _______________________________________________ > Devel mailing list > Devel at ovirt.org > http://lists.ovirt.org/mailman/listinfo/devel > -- Martin Perina Associate Manager, Software Engineering Red Hat Czech s.r.o. -------------- next part -------------- An HTML attachment was scrubbed...
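Petr's workaround amounts to dropping a repo file under /etc/yum.repos.d/. A minimal sketch of such a file follows; the file name and every field except the baseurl are illustrative assumptions, not the official repo definition:

```ini
# /etc/yum.repos.d/ovirt-ci-tools.repo -- hypothetical file name
[ovirt-ci-tools]
name=oVirt CI tools for el7 (assumed repo name)
baseurl=http://resources.ovirt.org/repos/ci-tools/el7/
enabled=1
gpgcheck=0
```

After dropping the file, refreshing the yum metadata and reinstalling the pep8 package should pull in the CI version; as Martin notes, installing the ovirt-release-master.rpm achieves the same through the official repo set.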
URL: From dron at redhat.com Fri Apr 20 15:03:21 2018 From: dron at redhat.com (Dafna Ron) Date: Fri, 20 Apr 2018 16:03:21 +0100 Subject: [ovirt-devel] OST Failure - Weekly update [14/04/2018-20/04/2018] Message-ID: Hi, I wanted to give a short status on this week's failures and OST current status. We had one regression this week which was reported by Gal to the list and was fixed quickly. The regression was caused by change: https://gerrit.ovirt.org/#/c/90376/ - core: @Inject DAOs to QoS CRUD commands Tal debugged the issue and found that WELD doesn't allow injection from generically declared types, and reverted the change until the issue can be fixed ( https://gerrit.ovirt.org/#/c/90376/ - Revert "core: @Inject DAOs to QoS CRUD commands"). We have an issue with CQ alerts ( https://ovirt-jira.atlassian.net/browse/OVIRT-1974) and we have been monitoring CQ manually until the issue is resolved. This is why I am unable to issue this week's statistics, as they are based on CQ reports of failures. I am sure this will be fixed soon and I will be able to issue reports as usual next week. For now, please note that CQ looks good with no failures in the past few hours. Thank you, Dafna -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 78696 bytes Desc: not available URL: From mzamazal at redhat.com Fri Apr 20 18:29:16 2018 From: mzamazal at redhat.com (Milan Zamazal) Date: Fri, 20 Apr 2018 20:29:16 +0200 Subject: [ovirt-devel] OST failure: 002_bootstrap.configure_high_perf_vm2 Message-ID: <874lk520lf.fsf@redhat.com> Hi, I experienced the following OST failure on my OST patch: http://jenkins.ovirt.org/job/ovirt-system-tests_master_check-patch-el7-x86_64/5217/ The failure is unrelated to my patch, since it occurred before reaching its changes. OST passed when I retriggered it later.
The error was: 2018-04-20 05:08:46,748-04 WARN [org.ovirt.engine.core.bll.numa.vm.AddVmNumaNodesCommand] (default task-6) [a8a3d723-655c-43ba-9776-70eb9317e66c] Validation of action 'AddVmNumaNodes' failed for user admin at internal-authz. Reasons: ACTION_TYPE_FAILED_VM_PINNED_TO_MULTIPLE_HOSTS 2018-04-20 05:08:46,753-04 ERROR [org.ovirt.engine.api.restapi.resource.AbstractBackendResource] (default task-6) [] Operation Failed: [Cannot ${action} ${type}. VM must be pinned to a single host.] From ykaul at redhat.com Fri Apr 20 19:23:20 2018 From: ykaul at redhat.com (Yaniv Kaul) Date: Fri, 20 Apr 2018 22:23:20 +0300 Subject: [ovirt-devel] OST failure: 002_bootstrap.configure_high_perf_vm2 In-Reply-To: <874lk520lf.fsf@redhat.com> References: <874lk520lf.fsf@redhat.com> Message-ID: On Fri, Apr 20, 2018 at 9:29 PM, Milan Zamazal wrote: > Hi, I experienced the following OST failure on my OST patch: > > http://jenkins.ovirt.org/job/ovirt-system-tests_master_ > check-patch-el7-x86_64/5217/ > > The failure is unrelated to my patch, since it occurred before reaching > its changes. OST passed when I retriggered it later. > > The error was: > > 2018-04-20 05:08:46,748-04 WARN [org.ovirt.engine.core.bll.numa.vm.AddVmNumaNodesCommand] > (default task-6) [a8a3d723-655c-43ba-9776-70eb9317e66c] Validation of > action 'AddVmNumaNodes' failed for user admin at internal-authz. Reasons: > ACTION_TYPE_FAILED_VM_PINNED_TO_MULTIPLE_HOSTS > 2018-04-20 05:08:46,753-04 ERROR [org.ovirt.engine.api.restapi.resource.AbstractBackendResource] > (default task-6) [] Operation Failed: [Cannot ${action} ${type}. VM must be > pinned to a single host.] > If this is indeed a limitation (not sure why), I indeed assume it's a race - I take all hosts that are up and pin it to them. Sometimes there's only 1 up, sometimes more than 1. I'll post a patch. Y. 
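Yaniv's diagnosis above suggests the fix: pick exactly one "up" host before pinning, instead of pinning to however many hosts happen to be up when the test runs. A hedged sketch of that idea (the dict-based host representation and function name are made up for illustration, not the actual OST code or the contents of the linked patch):

```python
def pick_pinning_host(hosts):
    """Return the name of exactly one 'up' host to pin a VM to.

    Pinning the VM to every up host races with hosts coming up during
    the test run: once more than one host is up, the engine rejects the
    NUMA request with ACTION_TYPE_FAILED_VM_PINNED_TO_MULTIPLE_HOSTS.
    Selecting a single host makes the pinning deterministic.
    """
    up = [h for h in hosts if h["status"] == "up"]
    if not up:
        raise RuntimeError("no host is up; cannot pin the VM")
    # Deterministic choice: always pin to the first up host found.
    return up[0]["name"]
```

The same selection works whether one host or several are up, which is exactly the variation Yaniv describes between runs.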
_______________________________________________ > Devel mailing list > Devel at ovirt.org > http://lists.ovirt.org/mailman/listinfo/devel > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ykaul at redhat.com Sat Apr 21 06:58:30 2018 From: ykaul at redhat.com (Yaniv Kaul) Date: Sat, 21 Apr 2018 09:58:30 +0300 Subject: [ovirt-devel] OST failure: 002_bootstrap.configure_high_perf_vm2 In-Reply-To: References: <874lk520lf.fsf@redhat.com> Message-ID: On Fri, Apr 20, 2018 at 10:23 PM, Yaniv Kaul wrote: > > > On Fri, Apr 20, 2018 at 9:29 PM, Milan Zamazal > wrote: > >> Hi, I experienced the following OST failure on my OST patch: >> >> http://jenkins.ovirt.org/job/ovirt-system-tests_master_check >> -patch-el7-x86_64/5217/ >> >> The failure is unrelated to my patch, since it occurred before reaching >> its changes. OST passed when I retriggered it later. >> >> The error was: >> >> 2018-04-20 05:08:46,748-04 WARN [org.ovirt.engine.core.bll.numa.vm.AddVmNumaNodesCommand] >> (default task-6) [a8a3d723-655c-43ba-9776-70eb9317e66c] Validation of >> action 'AddVmNumaNodes' failed for user admin at internal-authz. Reasons: >> ACTION_TYPE_FAILED_VM_PINNED_TO_MULTIPLE_HOSTS >> 2018-04-20 05:08:46,753-04 ERROR [org.ovirt.engine.api.restapi. >> resource.AbstractBackendResource] (default task-6) [] Operation Failed: >> [Cannot ${action} ${type}. VM must be pinned to a single host.] >> > > If this is indeed a limitation (not sure why), I indeed assume it's a race > - I take all hosts that are up and pin it to them. Sometimes there's only 1 > up, sometimes more than 1. I'll post a patch. > I've sent a patch[1] - now obviously it's failing with ha_recovery test... Y. [1] https://gerrit.ovirt.org/#/c/90500/ > Y. > > _______________________________________________ >> Devel mailing list >> Devel at ovirt.org >> http://lists.ovirt.org/mailman/listinfo/devel >> > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From gshereme at redhat.com Sat Apr 21 20:54:04 2018 From: gshereme at redhat.com (Greg Sheremeta) Date: Sat, 21 Apr 2018 16:54:04 -0400 Subject: [ovirt-devel] [ANNOUNCE] Introducing STDCI V2 In-Reply-To: References: Message-ID: Hey, I just got around to studying this. - Nice clear email! - Everything really makes sense. - Thank you for fixing the -excludes thing in the yaml. That was rough :) - The graph view in Blue Ocean is easy to see and understand. - "We now support 'sub stages' which provide the ability to run multiple different scripts in parallel" -- what kind of races should we watch out for? :) For example in OST, I think I'll have to adapt docker stuff to be aware that another set of containers could be running at the same time -- not positive though. It looks like the substages replace change_resolver in OST. Can you go into that in more detail? How does this impact running mock_runner locally? When I run it locally it doesn't appear to parallelize like it does in jenkins / Blue Ocean. Best wishes, Greg On Mon, Apr 16, 2018 at 10:17 AM, Barak Korren wrote: > The CI team is thrilled to announce the general availability of the second > version of the oVirt CI standard. Work on this version included almost a > complete rewrite of the CI backend. The major user-visible features are: > > - Project maintainers no longer need to maintain YAML in the "jenkins" > repository. Details that were specified there, including targeted > distributions, architectures and oVirt versions, should now be specified > in a > YAML file under the project's own repository (in a different syntax). > > - We now support "sub stages" which provide the ability to run multiple > different scripts in parallel within the same STDCI stage. There is also > a > conditional syntax which allows controlling which scripts get executed > according to which files were changed in the patch being tested.
> > - The STDCI script file names and locations can now be customized via the > above > mentioned YAML file. This means that, e.g., using the same script for > different stages can now be done by assigning it to the stages in the > YAML > file instead of by using symlinks. > > Inspecting job results in STDCI V2 > ---------------------------------- > As already mentioned, the work on STDCI V2 consisted of a major rewrite of > the > CI backend; one of the changes made is switching from using multiple > "FreeStyle" > type jobs per project to just two (pre-merge and post-merge) pipeline > jobs. This > has implications for the way job results are to be inspected. > > Since all the different parallel tasks now happen within the same job, > looking > at the job output can be rather confusing as it includes the merged output > of > all the tasks. Instead, the "Blue Ocean" view should be used. The "Blue > Ocean" > view displays a graphical layout of the job execution, allowing one to > quickly > learn which parts of the job failed. It also allows drilling down and > viewing > the logs of individual parts of the job. > > Apart from using the "Blue Ocean" view, job logs are also stored as > artifact > files. The "exported-artifacts" directory seen in the job results will now > include different subdirectories for the different parts of the job. > Assuming we > have a "check-patch" stage script running on "el7/x86_64", we can find its > output under "exported-artifacts" in: > > check-patch.el7.x86_64/mock_logs/script/stdout_stderr.log > > Any additional artifacts generated by the script would be present in the > "check-patch.el7.x86_64" directory as well. > > I have a CI YAML file in my project already, is this really new? > ---------------------------------------------------------------- > We've been working on this for a while, and occasionally introduced V2 > features > into individual projects as needed.
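The per-stage artifact layout Barak describes maps to predictable paths. A small sketch of building such a path (only the `<stage>.<distro>.<arch>` directory pattern and the log file name come from the email; the helper itself is illustrative):

```python
import os


def stage_log_path(artifacts_dir, stage="check-patch", distro="el7", arch="x86_64"):
    """Build the path to a stage's merged stdout/stderr log under the
    exported-artifacts directory, following the <stage>.<distro>.<arch>
    subdirectory layout described in the announcement."""
    stage_dir = "{}.{}.{}".format(stage, distro, arch)
    return os.path.join(artifacts_dir, stage_dir,
                        "mock_logs", "script", "stdout_stderr.log")
```

Any other artifacts a stage script produces would sit next to `mock_logs` in the same per-stage directory.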
In particular, our GitHub support was > always > based on STDCI V2 code, so all GitHub projects (except Lago, which is > "special"?) are already using STDCI V2. > > A few Gerrit-based projects have already been converted to V2 as well, as > part of > our efforts to test and debug the V2 code. Most notably, the "OST" and > "Jenkins" > projects have been switched, although they are running the STDCI V1 jobs > as well > for the time being. > > What is the process for switching my project to STDCI V2? > --------------------------------------------------------- > The CI team is going to proactively work with project maintainers to > switch them > to V2. The process for switching is as follows: > > - Send a one-line patch to the "jenkins" repo to enable the V2 jobs for the > project - at this point the V2 jobs will run side-by-side with the V1 > jobs, > and will execute the STDCI scripts on el7/x86_64. > > - Create an STDCI YAML file to define the target distributions, > architectures > and oVirt versions for the project. (See below for a sample file that > would be > equivalent to what many projects have defined in V1 currently). As soon > as a > patch with the new YAML file is submitted to the project, the V2 job will > parse it and follow the instructions in it. This allows for an easy > verification of the file's functionality in CI. > > - Remove the STDCI V1 job configuration from the "jenkins" repo. This > should be > the last patch project maintainers have to send to the "jenkins" repo. > > What does the new YAML file look like? > -------------------------------------- > We defined multiple optional names for the file, so that each project > owner can > choose which name seems most adequate. The following names can be used: > > - stdci.yaml > - automation.yaml > - ovirtci.yaml > > A dot (.) can also optionally be added at the beginning of the file name > to make > the file hidden; the file extension can also be "yml".
If multiple > matching > files exist in the project repo, the first matching file according to the > order > listed above will be used. > > The file conforms to the YAML syntax. The key names in the file are > case-agnostic, and hyphens (-), underscores (_) and spaces ( ) in key names > are > ignored. Additionally, we support multiple forms of the same word, so you > don't > need to remember if the key should be "distro", "distros", "distributions", > "operating-systems" or "OperatingSystems", as all these forms (and others) > will > work and mean the same thing. > > To create complex test/build matrices, "stage", "distribution", > "architecture" > and "sub-stage" definitions can be nested within one another. We find this > to be > more intuitive than having to maintain the tedious "exclude" lists that were > needed in > V1. > > Here is an example of an STDCI V2 YAML file that is compatible with the > current > master branch V1 configuration of many oVirt projects: > > --- > Architectures: > - x86_64: > Distributions: [ "el7", "fc27" ] > - ppc64le: > Distribution: el7 > - s390x: > Distribution: fc27 > Release Branches: > master: ovirt-master > > Note: since the file is committed into the project's own repo, having > different > configurations for different branches can be done by simply having different > files in the different branches, so there is no need for a big convoluted > file > to configure all branches. > > Since the above file does not mention stages, any STDCI scripts that > exist in > the project repo and belong to a particular stage will be run on all > specified > distribution and architecture combinations. Since it is sometimes desired > to run > "check-patch.sh" on fewer platforms than build-artifacts, for example, a > slightly > different file would be needed: > > --- > Architectures: > - x86_64: > Distributions: [ "el7", "fc27"
] > - ppc64le: > Distribution: el7 > - s390x: > Distribution: fc27 > Stages: > - check-patch: > Architecture: x86_64 > Distribution: el7 > - build-artifacts > Release Branches: > master: ovirt-master > > The above file makes "check-patch" run only on el7/x86_64, while > build-artifacts > runs on all the platforms specified, and check-merged would not run at all > because it > is not listed in the file. > > Great efforts have been made to make the file format very flexible but > intuitive > to use. Additionally, there are many defaults in place to allow specifying > complex behaviours with very brief YAML code. For further details about > the file > format, please see the documentation linked below. > > About the relation between STDCI V2 and the change-queue > -------------------------------------------------------- > In STDCI V1, the change queue that would run the OST tests and release a > given > patch was determined by looking at the "version" part of the name of the > project's build-artifacts jobs that got invoked for the patch. > > This was confusing, as most people understood "version" to mean > the > internal version of their own project rather than the oVirt version. > > In V2 we decided to be more explicit and simply include a map from > branches to > change queues in the YAML configuration under the "release-branches" > option, as > can be seen in the examples above. > > We also chose to no longer allow specifying the oVirt version as a > shorthand for > the equivalent queue name (e.g. specifying "4.2" instead of "ovirt-4.2"); > this > should reduce the chance of confusion between project versions and queue > names, > and also allows us to create and use change queues for projects that are > not > oVirt. > > A project can choose not to include a "release-branches" option, in which > case > its patches will not get submitted to any queues. > > Further information > ------------------- > The documentation for STDCI can be found at [1].
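The forgiving key handling Barak describes (case-agnostic keys, with "-", "_" and spaces ignored, and several synonyms accepted per key) can be sketched as follows. The canonical names and the synonym table here are illustrative assumptions for this sketch, not the actual STDCI parser:

```python
# Map of squashed key spellings to an assumed canonical key name.
SYNONYMS = {
    "distro": "distributions", "distros": "distributions",
    "distribution": "distributions", "distributions": "distributions",
    "operatingsystems": "distributions",
    "arch": "architectures", "archs": "architectures",
    "architecture": "architectures", "architectures": "architectures",
    "stage": "stages", "stages": "stages",
    "releasebranches": "release_branches",
}


def canonical_key(key):
    """Normalize an STDCI YAML key: lowercase it, drop '-', '_' and
    spaces, then map known synonyms to one canonical name."""
    squashed = key.lower().replace("-", "").replace("_", "").replace(" ", "")
    return SYNONYMS.get(squashed, squashed)
```

Under such a scheme, "Operating-Systems", "distro" and "Distributions" all resolve to the same configuration option, which is why the announcement says you need not remember the exact spelling.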
> > The documentation updates for V2 are still in progress and expected to be > merged > soon. In the meantime, the GitHub-specific documentation [2] already > provides a > great deal of information which is relevant for V2. > > [1]: http://ovirt-infra-docs.readthedocs.io/en/latest/CI/ > Build_and_test_standards > [2]: http://ovirt-infra-docs.readthedocs.io/en/latest/CI/ > Using_STDCI_with_GitHub > > --- > Barak Korren > RHV DevOps team , RHCE, RHCi > Red Hat EMEA > redhat.com | TRIED. TESTED. TRUSTED. | redhat.com/trusted > _______________________________________________ > Devel mailing list > Devel at ovirt.org > http://lists.ovirt.org/mailman/listinfo/devel -- GREG SHEREMETA SENIOR SOFTWARE ENGINEER - TEAM LEAD - RHV UX Red Hat NA gshereme at redhat.com IRC: gshereme -------------- next part -------------- An HTML attachment was scrubbed... URL: From dbelenky at redhat.com Sun Apr 22 07:18:24 2018 From: dbelenky at redhat.com (Daniel Belenky) Date: Sun, 22 Apr 2018 10:18:24 +0300 Subject: [ovirt-devel] [ANNOUNCE] Introducing STDCI V2 In-Reply-To: References: Message-ID: > > Hey, > > I just got around to studying this. > > - Nice clear email! > - Everything really makes sense. > - Thank you for fixing the -excludes thing in the yaml. That was rough :) > - The graph view in Blue Ocean is easy to see and understand. > - "We now support 'sub stages' which provide the ability to run multiple different > scripts in parallel" -- what kind of races should we watch out for? :) For > example in OST, I think I'll have to adapt docker stuff to be aware that > another set of containers could be running at the same time -- not positive > though. > You shouldn't expect any races due to that change. Sub stages are there to allow triggering of more than one task/job on a single CI event, such as check-patch when a patch is created/updated, or check-merged/build-artifacts when a patch is merged. Sub stages run in parallel but on *different slaves*.
With sub-stages you can, for example, run different scripts in parallel and on different slaves to do different tasks, such as running unit tests in parallel with docs generation and build verification. > > It looks like the substages replace change_resolver in OST. Can you go > into that in more detail? How does this impact running mock_runner > locally? When I run it locally it doesn't appear to parallelize like it > does in jenkins / Blue Ocean. > That is true. In STDCI V1 we used to run change_resolver in check-patch to check the commit and resolve the relevant changes. STDCI V2 has this feature integrated in one of its core components, usrc.py. We haven't decided yet how/if we will integrate this tool into OST, or how we will achieve the same behaviour when running OST locally with mock_runner. For now, you can keep using the "old" check-patch.sh with mock_runner, which will call change_resolver. I'd recommend sending a patch and letting Jenkins do the checks for you. It will be faster in many cases where you'd have to run several suites in parallel. We'll send a proper announcement regarding the new (STDCI V2 based) jobs for OST, including instructions for debugging and how this change affects you as an OST developer. Thanks, - DANIEL BELENKY RHV DEVOPS -------------- next part -------------- An HTML attachment was scrubbed... URL: From ratamir at redhat.com Wed Apr 11 13:38:07 2018 From: ratamir at redhat.com (Raz Tamir) Date: Wed, 11 Apr 2018 16:38:07 +0300 Subject: [ovirt-devel] dynamic ownership changes In-Reply-To: References: <20180410130916.GB37403@Alexandra.local> Message-ID: +Elad On Wed, Apr 11, 2018 at 4:28 PM, Dan Kenigsberg wrote: > On Wed, Apr 11, 2018 at 12:34 PM, Nir Soffer wrote: > >> On Wed, Apr 11, 2018 at 12:31 PM Eyal Edri wrote: >> >>> Please make sure to run as much OST suites on this patch as possible >>> before merging ( using 'ci please build' ) >>> >> >> But note that OST is not a way to verify the patch.
>> >> Such changes require testing with all storage types we support. >> >> Nir >> >> On Tue, Apr 10, 2018 at 4:09 PM, Martin Polednik >>> wrote: >>> >>>> Hey, >>>> >>>> I've created a patch[0] that is finally able to activate libvirt's >>>> dynamic_ownership for VDSM while not negatively affecting >>>> functionality of our storage code. >>>> >>>> That of course comes with quite a bit of code removal, mostly in the >>>> area of host devices, hwrng and anything that touches devices; bunch >>>> of test changes and one XML generation caveat (storage is handled by >>>> VDSM, therefore disk relabelling needs to be disabled on the VDSM >>>> level). >>>> >>>> Because of the scope of the patch, I welcome storage/virt/network >>>> people to review the code and consider the implication this change has >>>> on current/future features. >>>> >>>> [0] https://gerrit.ovirt.org/#/c/89830/ >>>> >>> > In particular: dynamic_ownership was set to 0 prehistorically (as part of > https://bugzilla.redhat.com/show_bug.cgi?id=554961 ) because libvirt, > running as root, was not able to play properly with root-squash nfs mounts. > > Have you attempted this use case? > > I join to Nir's request to run this with storage QE. > -- Raz Tamir Manager, RHV QE -------------- next part -------------- An HTML attachment was scrubbed... URL: From dbelenky at redhat.com Sun Apr 22 11:13:19 2018 From: dbelenky at redhat.com (Daniel Belenky) Date: Sun, 22 Apr 2018 14:13:19 +0300 Subject: [ovirt-devel] Disabling STDCI V1 OST jobs Message-ID: Hello, You might have noticed that in the last two weeks, you see a new job reporting when sending a patch to OST called "*ovirt-system-tests_standard-check-patch*". This is an STDCI V2 based check-patch job for OST. We have tested it in the past 2.5 weeks on OST project to ensure the reporting is correct and accurate. We've now come to a point where we will disable the old job in favor of the new one. 
*How it will affect you when sending a patch to OST:* The major advantage of STDCI V2 is the ability to run tasks in *parallel*. We utilize this feature in OST to run all the suites that were affected by your change in parallel. It means that patches that change some common file that triggers 4 suites should finish checking now within ~30-40 min instead of ~2 hours. Another advantage of the STDCI V2 based job is that in case of a failure in a particular suite, we keep running the rest of the suites, so when the job finishes, by looking at the results we already know exactly which suites were affected and which still work (unlike in STDCI V1, where the first failing suite would have failed the entire job). *Note:* suites will run in parallel on different nodes, so they won't affect each other's environment. *How to debug:* *Logs location* All suite logs are still collected as they used to be, and located under *exported-artifacts/check-patch.{suite-name}.el7.x86_64*. *Blue ocean view* You can now see a graphical representation of the suites running in parallel for your change. At the left side menu in your check-patch job, click on "*Open Blue Ocean*". Loading may take a few seconds. Let STDCI compute the relevant suites for your patch, and start running them: If you want to download the *console output* from a certain suite, press the circle above the suite's name in the "Blue Ocean" view, and below the graph view, you can find the following icons: The right rectangle will open a new window with the job's console output, and by pressing the arrow you can download it. If any questions arise, please feel free to contact the OST team directly or open a ticket by sending an email to infra-support at ovirt.org. -- DANIEL BELENKY RHV DEVOPS -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed...
Name: logs-download-icons.png Type: image/png Size: 3382 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: blueoceanviewjobs.png Type: image/png Size: 35091 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: blueoceanicon.png Type: image/png Size: 23777 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: all-logs.png Type: image/png Size: 44481 bytes Desc: not available URL: From ebenahar at redhat.com Sun Apr 22 22:29:43 2018 From: ebenahar at redhat.com (Elad Ben Aharon) Date: Mon, 23 Apr 2018 01:29:43 +0300 Subject: [ovirt-devel] dynamic ownership changes In-Reply-To: References: <20180410130916.GB37403@Alexandra.local> <20180418081628.GA1704@Alexandra.local> <20180418111733.GA4702@Alexandra.local> <20180419120709.GB9449@Alexandra.local> Message-ID: Sorry, this is the new execution link: https://rhv-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/rhv-4.2-ge-runner-storage/1048/testReport/ On Mon, Apr 23, 2018 at 1:23 AM, Elad Ben Aharon wrote: > Hi, I've triggered another execution [1] due to some issues I saw in the > first which are not related to the patch. > > The success rate is 78%, which is low compared to tier1 executions with > code from downstream builds (95-100% success rates) [2]. > > From what I could see so far, there is an issue with move and copy > operations to and from Gluster domains. For example [3]. > > The logs are attached.
> > > [1] > *https://rhv-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/rhv-4.2-ge-runner-tier1-after-upgrade/7/testReport/ > * > > > > [2] > https://rhv-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/rhv- > 4.2-ge-runner-tier1-after-upgrade/7/ > > > > [3] > 2018-04-22 13:06:28,316+0300 INFO (jsonrpc/7) [vdsm.api] FINISH > deleteImage error=Image does not exist in domain: > 'image=cabb8846-7a4b-4244-9835-5f603e682f33, > domain=e5fd29c8-52ba-467e-be09-ca40ff054dd4' from=: > :ffff:10.35.161.182,40936, flow_id=disks_syncAction_ba6b2630-5976-4935, > task_id=3d5f2a8a-881c-409e-93e9-aaa643c10e42 (api:51) > 2018-04-22 13:06:28,317+0300 ERROR (jsonrpc/7) [storage.TaskManager.Task] > (Task='3d5f2a8a-881c-409e-93e9-aaa643c10e42') Unexpected error (task:875) > Traceback (most recent call last): > File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 882, > in _run > return fn(*args, **kargs) > File "", line 2, in deleteImage > File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 49, in > method > ret = func(*args, **kwargs) > File "/usr/lib/python2.7/site-packages/vdsm/storage/hsm.py", line 1503, > in deleteImage > raise se.ImageDoesNotExistInSD(imgUUID, sdUUID) > ImageDoesNotExistInSD: Image does not exist in domain: > 'image=cabb8846-7a4b-4244-9835-5f603e682f33, > domain=e5fd29c8-52ba-467e-be09-ca40ff054dd4' > 2018-04-22 13:06:28,317+0300 INFO (jsonrpc/7) [storage.TaskManager.Task] > (Task='3d5f2a8a-881c-409e-93e9-aaa643c10e42') aborting: Task is aborted: > "Image does not exist in domain: 'image=cabb8846-7a4b-4244-9835- > 5f603e682f33, domain=e5fd29c8-52ba-467e-be09-ca40ff054dd4'" - code 268 > (task:1181) > 2018-04-22 13:06:28,318+0300 ERROR (jsonrpc/7) [storage.Dispatcher] FINISH > deleteImage error=Image does not exist in domain: > 'image=cabb8846-7a4b-4244-9835-5f603e682f33, > domain=e5fd29c8-52ba-467e-be09-ca40ff054d > d4' (dispatcher:82) > > > > On Thu, Apr 19, 2018 at 5:34 PM, Elad Ben Aharon > wrote: > >> Triggered a sanity tier1 execution [1] 
using [2], which covers all the >> requested areas, on iSCSI, NFS and Gluster. >> I'll update with the results. >> >> [1] >> https://rhv-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/view/4.2 >> _dev/job/rhv-4.2-ge-flow-storage/1161/ >> >> [2] >> https://gerrit.ovirt.org/#/c/89830/ >> vdsm-4.30.0-291.git77aef9a.el7.x86_64 >> >> >> >> On Thu, Apr 19, 2018 at 3:07 PM, Martin Polednik >> wrote: >> >>> On 19/04/18 14:54 +0300, Elad Ben Aharon wrote: >>> >>>> Hi Martin, >>>> >>>> I see [1] requires a rebase, can you please take care? >>>> >>> >>> Should be rebased. >>> >>> At the moment, our automation is stable only on iSCSI, NFS, Gluster and >>>> FC. >>>> Ceph is not supported and Cinder will be stabilized soon, AFAIR, it's >>>> not >>>> stable enough at the moment. >>>> >>> >>> That is still pretty good. >>> >>> >>> [1] https://gerrit.ovirt.org/#/c/89830/ >>>> >>>> >>>> Thanks >>>> >>>> On Wed, Apr 18, 2018 at 2:17 PM, Martin Polednik >>>> wrote: >>>> >>>> On 18/04/18 11:37 +0300, Elad Ben Aharon wrote: >>>>> >>>>> Hi, sorry if I misunderstood, I waited for more input regarding what >>>>>> areas >>>>>> have to be tested here. >>>>>> >>>>>> >>>>> I'd say that you have quite a bit of freedom in this regard. GlusterFS >>>>> should be covered by Dennis, so iSCSI/NFS/ceph/cinder with some suite >>>>> that covers basic operations (start & stop VM, migrate it), snapshots >>>>> and merging them, and whatever else would be important for storage >>>>> sanity. >>>>> >>>>> mpolednik >>>>> >>>>> >>>>> On Wed, Apr 18, 2018 at 11:16 AM, Martin Polednik < >>>>> mpolednik at redhat.com> >>>>> >>>>>> wrote: >>>>>> >>>>>> On 11/04/18 16:52 +0300, Elad Ben Aharon wrote: >>>>>> >>>>>>> >>>>>>> We can test this on iSCSI, NFS and GlusterFS. As for ceph and cinder, >>>>>>> >>>>>>>> will >>>>>>>> have to check, since usually, we don't execute our automation on >>>>>>>> them. >>>>>>>> >>>>>>>> >>>>>>>> Any update on this? 
I believe the gluster tests were successful, OST >>>>>>> passes fine and unit tests pass fine, that makes the storage backends >>>>>>> test the last required piece. >>>>>>> >>>>>>> >>>>>>> On Wed, Apr 11, 2018 at 4:38 PM, Raz Tamir >>>>>>> wrote: >>>>>>> >>>>>>> >>>>>>>> +Elad >>>>>>>> >>>>>>>> >>>>>>>>> On Wed, Apr 11, 2018 at 4:28 PM, Dan Kenigsberg >>>>>>>> > >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>> On Wed, Apr 11, 2018 at 12:34 PM, Nir Soffer >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>> >>>>>>>>>> On Wed, Apr 11, 2018 at 12:31 PM Eyal Edri >>>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> Please make sure to run as much OST suites on this patch as >>>>>>>>>>> possible >>>>>>>>>>> >>>>>>>>>>> before merging ( using 'ci please build' ) >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> But note that OST is not a way to verify the patch. >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Such changes require testing with all storage types we support. >>>>>>>>>>> >>>>>>>>>>> Nir >>>>>>>>>>> >>>>>>>>>>> On Tue, Apr 10, 2018 at 4:09 PM, Martin Polednik < >>>>>>>>>>> mpolednik at redhat.com >>>>>>>>>>> > >>>>>>>>>>> >>>>>>>>>>> wrote: >>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Hey, >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>> I've created a patch[0] that is finally able to activate >>>>>>>>>>>>> libvirt's >>>>>>>>>>>>> dynamic_ownership for VDSM while not negatively affecting >>>>>>>>>>>>> functionality of our storage code. >>>>>>>>>>>>> >>>>>>>>>>>>> That of course comes with quite a bit of code removal, mostly >>>>>>>>>>>>> in >>>>>>>>>>>>> the >>>>>>>>>>>>> area of host devices, hwrng and anything that touches devices; >>>>>>>>>>>>> bunch >>>>>>>>>>>>> of test changes and one XML generation caveat (storage is >>>>>>>>>>>>> handled >>>>>>>>>>>>> by >>>>>>>>>>>>> VDSM, therefore disk relabelling needs to be disabled on the >>>>>>>>>>>>> VDSM >>>>>>>>>>>>> level). 
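[Editorial aside: the two mechanisms Martin describes above are worth pinning down. dynamic_ownership is a setting in libvirt's /etc/libvirt/qemu.conf, and the per-disk relabelling opt-out is expressed in the libvirt domain XML. A hedged illustration follows; the disk path is a made-up example and this shows general libvirt syntax, not the actual change in the patch under review:]

```xml
<!-- /etc/libvirt/qemu.conf (illustrative): re-enabling libvirt's own
     chown-ing of device files means setting:  dynamic_ownership = 1  -->

<!-- Domain XML: since storage permissions stay under VDSM's control,
     DAC relabelling is disabled per disk on the <source> element -->
<disk type='file' device='disk'>
  <source file='/path/to/example/disk.img'>
    <seclabel model='dac' relabel='no'/>
  </source>
  <target dev='vda' bus='virtio'/>
</disk>
```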
>>>>>>>>>>>>> >>>>>>>>>>>>> Because of the scope of the patch, I welcome >>>>>>>>>>>>> storage/virt/network >>>>>>>>>>>>> people to review the code and consider the implication this >>>>>>>>>>>>> change >>>>>>>>>>>>> has >>>>>>>>>>>>> on current/future features. >>>>>>>>>>>>> >>>>>>>>>>>>> [0] https://gerrit.ovirt.org/#/c/89830/ >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> In particular: dynamic_ownership was set to 0 prehistorically >>>>>>>>>>>>> (as >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> part >>>>>>>>>>> >>>>>>>>>> of https://bugzilla.redhat.com/show_bug.cgi?id=554961 ) because >>>>>>>>>> libvirt, >>>>>>>>>> running as root, was not able to play properly with root-squash >>>>>>>>>> nfs >>>>>>>>>> mounts. >>>>>>>>>> >>>>>>>>>> Have you attempted this use case? >>>>>>>>>> >>>>>>>>>> I join to Nir's request to run this with storage QE. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>> -- >>>>>>>>> >>>>>>>>> >>>>>>>>> Raz Tamir >>>>>>>>> Manager, RHV QE >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >> > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: 

From ebenahar at redhat.com  Sun Apr 22 22:34:03 2018
From: ebenahar at redhat.com (Elad Ben Aharon)
Date: Mon, 23 Apr 2018 01:34:03 +0300
Subject: [ovirt-devel] dynamic ownership changes
In-Reply-To: 
References: <20180410130916.GB37403@Alexandra.local>
 <20180418081628.GA1704@Alexandra.local>
 <20180418111733.GA4702@Alexandra.local>
 <20180419120709.GB9449@Alexandra.local>
Message-ID: 

Also, snapshot preview failed (2nd snapshot):

2018-04-22 18:01:06,253+0300 INFO (jsonrpc/0) [jsonrpc.JsonRpcServer] RPC call Volume.create succeeded in 0.84 seconds (__init__:311)
2018-04-22 18:01:06,261+0300 INFO (tasks/6) [storage.ThreadPool.WorkerThread] START task 6823d724-cb1b-4706-a58a-83428363cce5 (cmd=>, args=None) (threadPool:208)
2018-04-22 18:01:06,906+0300 WARN (check/loop) [storage.asyncutils] Call > delayed by 0.51 seconds (asyncutils:138)
2018-04-22 18:01:07,082+0300 WARN (tasks/6) [storage.ResourceManager] Resource factory failed to create resource '01_img_7df9d2b2-52b5-4ac2-a9f0-a1d1e93eb6d2.095ad9d6-3154-449c-868c-f975dcdcb729'. Canceling request. (resourceManager:543)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/resourceManager.py", line 539, in registerResource
    obj = namespaceObj.factory.createResource(name, lockType)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/resourceFactories.py", line 193, in createResource
    lockType)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/resourceFactories.py", line 122, in __getResourceCandidatesList
    imgUUID=resourceName)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/image.py", line 198, in getChain
    uuidlist = volclass.getImageVolumes(sdUUID, imgUUID)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/volume.py", line 1537, in getImageVolumes
    return cls.manifestClass.getImageVolumes(sdUUID, imgUUID)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/fileVolume.py", line 337, in getImageVolumes
    if (sd.produceVolume(imgUUID, volid).getImage() == imgUUID):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sd.py", line 438, in produceVolume
    volUUID)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/fileVolume.py", line 69, in __init__
    volUUID)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/volume.py", line 86, in __init__
    self.validate()
  File "/usr/lib/python2.7/site-packages/vdsm/storage/volume.py", line 112, in validate
    self.validateVolumePath()
  File "/usr/lib/python2.7/site-packages/vdsm/storage/fileVolume.py", line 129, in validateVolumePath
    raise se.VolumeDoesNotExist(self.volUUID)
VolumeDoesNotExist: Volume does not exist: (u'a404bfc9-57ef-4dcc-9f1b-458dfb08ad74',)
2018-04-22 18:01:07,083+0300 WARN (tasks/6) [storage.ResourceManager.Request] (ResName='01_img_7df9d2b2-52b5-4ac2-a9f0-a1d1e93eb6d2.095ad9d6-3154-449c-868c-f975dcdcb729', ReqID='79c96e70-7334-4402-a390-dc87f939b7d2') Tried to cancel a processed request (resourceManager:187)
2018-04-22 18:01:07,084+0300 ERROR (tasks/6) [storage.TaskManager.Task] (Task='6823d724-cb1b-4706-a58a-83428363cce5') Unexpected error (task:875)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 882, in _run
    return fn(*args, **kargs)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 336, in run
    return self.cmd(*self.argslist, **self.argsdict)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/securable.py", line 79, in wrapper
    return method(self, *args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sp.py", line 1939, in createVolume
    with rm.acquireResource(img_ns, imgUUID, rm.EXCLUSIVE):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/resourceManager.py", line 1025, in acquireResource
    return _manager.acquireResource(namespace, name, lockType, timeout=timeout)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/resourceManager.py", line 475, in acquireResource
    raise se.ResourceAcqusitionFailed()
ResourceAcqusitionFailed: Could not acquire resource. Probably resource factory threw an exception.: ()
2018-04-22 18:01:07,735+0300 INFO (tasks/6) [storage.ThreadPool.WorkerThread] FINISH task 6823d724-cb1b-4706-a58a-83428363cce5 (threadPool:210)

*Steps from [1]:*

*17:54:41* 2018-04-22 17:54:41,574 INFO Test Setup 2: Creating VM vm_TestCase11660_2217544157
*17:54:55* 2018-04-22 17:54:55,593 INFO 049: storage/rhevmtests.storage.storage_snapshots.test_live_snapshot.TestCase11660.test_live_snapshot[glusterfs]
*17:54:55* 2018-04-22 17:54:55,593 INFO Create a snapshot while VM is running
*17:54:55* 2018-04-22 17:54:55,593 INFO STORAGE: GLUSTERFS
*17:58:04* 2018-04-22 17:58:04,761 INFO Test Step 3: Start writing continuously on VM vm_TestCase11660_2217544157 via dd
*17:58:35* 2018-04-22 17:58:35,334 INFO Test Step 4: Creating live snapshot on a VM vm_TestCase11660_2217544157
*17:58:35* 2018-04-22 17:58:35,334 INFO Test Step 5: Adding new snapshot to VM vm_TestCase11660_2217544157 with all disks
*17:58:35* 2018-04-22 17:58:35,337 INFO Test Step 6: Add snapshot to VM vm_TestCase11660_2217544157 with {'description': 'snap_TestCase11660_2217545559', 'wait': True}
*17:59:26* 2018-04-22 17:59:26,179 INFO Test Step 7: Writing files to VM's vm_TestCase11660_2217544157 disk
*18:00:33* 2018-04-22 18:00:33,117 INFO Test Step 8: Shutdown vm vm_TestCase11660_2217544157 with {'async': 'false'}
*18:01:04* 2018-04-22 18:01:04,038 INFO Test Step 9: Previewing snapshot snap_TestCase11660_2217545559 on VM vm_TestCase11660_2217544157

[1] https://rhv-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/rhv-4.2-ge-runner-storage/1048/consoleFull

On Mon, Apr 23, 2018 at 1:29 AM, Elad Ben Aharon wrote:
> Sorry, this is the new execution link:
> https://rhv-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/rhv-4.2-ge-runner-storage/1048/testReport/
>
> On Mon, Apr 23, 2018 at 1:23 AM, Elad Ben Aharon wrote:
>
>> Hi, I've triggered another execution [1] due to some issues I saw in the
>> first which are not related to the patch.
>>
>> The success rate is 78% which is low comparing to tier1 executions with
>> code from downstream builds (95-100% success rates) [2].
>>
>> From what I could see so far, there is an issue with move and copy
>> operations to and from Gluster domains. For example [3].
>>
>> The logs are attached.
>> >> >> [1] >> *https://rhv-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/rhv-4.2-ge-runner-tier1-after-upgrade/7/testReport/ >> * >> >> >> >> [2] >> https://rhv-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/rhv- >> 4.2-ge-runner-tier1-after-upgrade/7/ >> >> >> >> [3] >> 2018-04-22 13:06:28,316+0300 INFO (jsonrpc/7) [vdsm.api] FINISH >> deleteImage error=Image does not exist in domain: >> 'image=cabb8846-7a4b-4244-9835-5f603e682f33, >> domain=e5fd29c8-52ba-467e-be09-ca40ff054dd4' from=: >> :ffff:10.35.161.182,40936, flow_id=disks_syncAction_ba6b2630-5976-4935, >> task_id=3d5f2a8a-881c-409e-93e9-aaa643c10e42 (api:51) >> 2018-04-22 13:06:28,317+0300 ERROR (jsonrpc/7) [storage.TaskManager.Task] >> (Task='3d5f2a8a-881c-409e-93e9-aaa643c10e42') Unexpected error >> (task:875) >> Traceback (most recent call last): >> File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 882, >> in _run >> return fn(*args, **kargs) >> File "", line 2, in deleteImage >> File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 49, in >> method >> ret = func(*args, **kwargs) >> File "/usr/lib/python2.7/site-packages/vdsm/storage/hsm.py", line 1503, >> in deleteImage >> raise se.ImageDoesNotExistInSD(imgUUID, sdUUID) >> ImageDoesNotExistInSD: Image does not exist in domain: >> 'image=cabb8846-7a4b-4244-9835-5f603e682f33, >> domain=e5fd29c8-52ba-467e-be09-ca40ff054dd4' >> 2018-04-22 13:06:28,317+0300 INFO (jsonrpc/7) [storage.TaskManager.Task] >> (Task='3d5f2a8a-881c-409e-93e9-aaa643c10e42') aborting: Task is aborted: >> "Image does not exist in domain: 'image=cabb8846-7a4b-4244-9835- >> 5f603e682f33, domain=e5fd29c8-52ba-467e-be09-ca40ff054dd4'" - code 268 >> (task:1181) >> 2018-04-22 13:06:28,318+0300 ERROR (jsonrpc/7) [storage.Dispatcher] >> FINISH deleteImage error=Image does not exist in domain: >> 'image=cabb8846-7a4b-4244-9835-5f603e682f33, >> domain=e5fd29c8-52ba-467e-be09-ca40ff054d >> d4' (dispatcher:82) >> >> >> >> On Thu, Apr 19, 2018 at 5:34 PM, Elad Ben Aharon >> 
wrote: >> >>> Triggered a sanity tier1 execution [1] using [2], which covers all the >>> requested areas, on iSCSI, NFS and Gluster. >>> I'll update with the results. >>> >>> [1] >>> https://rhv-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/view/4.2 >>> _dev/job/rhv-4.2-ge-flow-storage/1161/ >>> >>> [2] >>> https://gerrit.ovirt.org/#/c/89830/ >>> vdsm-4.30.0-291.git77aef9a.el7.x86_64 >>> >>> >>> >>> On Thu, Apr 19, 2018 at 3:07 PM, Martin Polednik >>> wrote: >>> >>>> On 19/04/18 14:54 +0300, Elad Ben Aharon wrote: >>>> >>>>> Hi Martin, >>>>> >>>>> I see [1] requires a rebase, can you please take care? >>>>> >>>> >>>> Should be rebased. >>>> >>>> At the moment, our automation is stable only on iSCSI, NFS, Gluster and >>>>> FC. >>>>> Ceph is not supported and Cinder will be stabilized soon, AFAIR, it's >>>>> not >>>>> stable enough at the moment. >>>>> >>>> >>>> That is still pretty good. >>>> >>>> >>>> [1] https://gerrit.ovirt.org/#/c/89830/ >>>>> >>>>> >>>>> Thanks >>>>> >>>>> On Wed, Apr 18, 2018 at 2:17 PM, Martin Polednik >>>> > >>>>> wrote: >>>>> >>>>> On 18/04/18 11:37 +0300, Elad Ben Aharon wrote: >>>>>> >>>>>> Hi, sorry if I misunderstood, I waited for more input regarding what >>>>>>> areas >>>>>>> have to be tested here. >>>>>>> >>>>>>> >>>>>> I'd say that you have quite a bit of freedom in this regard. GlusterFS >>>>>> should be covered by Dennis, so iSCSI/NFS/ceph/cinder with some suite >>>>>> that covers basic operations (start & stop VM, migrate it), snapshots >>>>>> and merging them, and whatever else would be important for storage >>>>>> sanity. >>>>>> >>>>>> mpolednik >>>>>> >>>>>> >>>>>> On Wed, Apr 18, 2018 at 11:16 AM, Martin Polednik < >>>>>> mpolednik at redhat.com> >>>>>> >>>>>>> wrote: >>>>>>> >>>>>>> On 11/04/18 16:52 +0300, Elad Ben Aharon wrote: >>>>>>> >>>>>>>> >>>>>>>> We can test this on iSCSI, NFS and GlusterFS. 
As for ceph and >>>>>>>> cinder, >>>>>>>> >>>>>>>>> will >>>>>>>>> have to check, since usually, we don't execute our automation on >>>>>>>>> them. >>>>>>>>> >>>>>>>>> >>>>>>>>> Any update on this? I believe the gluster tests were successful, >>>>>>>> OST >>>>>>>> passes fine and unit tests pass fine, that makes the storage >>>>>>>> backends >>>>>>>> test the last required piece. >>>>>>>> >>>>>>>> >>>>>>>> On Wed, Apr 11, 2018 at 4:38 PM, Raz Tamir >>>>>>>> wrote: >>>>>>>> >>>>>>>> >>>>>>>>> +Elad >>>>>>>>> >>>>>>>>> >>>>>>>>>> On Wed, Apr 11, 2018 at 4:28 PM, Dan Kenigsberg < >>>>>>>>>> danken at redhat.com> >>>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>> On Wed, Apr 11, 2018 at 12:34 PM, Nir Soffer >>>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> On Wed, Apr 11, 2018 at 12:31 PM Eyal Edri >>>>>>>>>>> wrote: >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> Please make sure to run as much OST suites on this patch as >>>>>>>>>>>> possible >>>>>>>>>>>> >>>>>>>>>>>> before merging ( using 'ci please build' ) >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> But note that OST is not a way to verify the patch. >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Such changes require testing with all storage types we support. >>>>>>>>>>>> >>>>>>>>>>>> Nir >>>>>>>>>>>> >>>>>>>>>>>> On Tue, Apr 10, 2018 at 4:09 PM, Martin Polednik < >>>>>>>>>>>> mpolednik at redhat.com >>>>>>>>>>>> > >>>>>>>>>>>> >>>>>>>>>>>> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> Hey, >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>>> I've created a patch[0] that is finally able to activate >>>>>>>>>>>>>> libvirt's >>>>>>>>>>>>>> dynamic_ownership for VDSM while not negatively affecting >>>>>>>>>>>>>> functionality of our storage code. 
>>>>>>>>>>>>>> >>>>>>>>>>>>>> That of course comes with quite a bit of code removal, mostly >>>>>>>>>>>>>> in >>>>>>>>>>>>>> the >>>>>>>>>>>>>> area of host devices, hwrng and anything that touches devices; >>>>>>>>>>>>>> bunch >>>>>>>>>>>>>> of test changes and one XML generation caveat (storage is >>>>>>>>>>>>>> handled >>>>>>>>>>>>>> by >>>>>>>>>>>>>> VDSM, therefore disk relabelling needs to be disabled on the >>>>>>>>>>>>>> VDSM >>>>>>>>>>>>>> level). >>>>>>>>>>>>>> >>>>>>>>>>>>>> Because of the scope of the patch, I welcome >>>>>>>>>>>>>> storage/virt/network >>>>>>>>>>>>>> people to review the code and consider the implication this >>>>>>>>>>>>>> change >>>>>>>>>>>>>> has >>>>>>>>>>>>>> on current/future features. >>>>>>>>>>>>>> >>>>>>>>>>>>>> [0] https://gerrit.ovirt.org/#/c/89830/ >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> In particular: dynamic_ownership was set to 0 >>>>>>>>>>>>>> prehistorically (as >>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> part >>>>>>>>>>>> >>>>>>>>>>> of https://bugzilla.redhat.com/show_bug.cgi?id=554961 ) because >>>>>>>>>>> libvirt, >>>>>>>>>>> running as root, was not able to play properly with root-squash >>>>>>>>>>> nfs >>>>>>>>>>> mounts. >>>>>>>>>>> >>>>>>>>>>> Have you attempted this use case? >>>>>>>>>>> >>>>>>>>>>> I join to Nir's request to run this with storage QE. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> -- >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Raz Tamir >>>>>>>>>> Manager, RHV QE >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: 

From ebenahar at redhat.com  Sun Apr 22 22:23:41 2018
From: ebenahar at redhat.com (Elad Ben Aharon)
Date: Mon, 23 Apr 2018 01:23:41 +0300
Subject: [ovirt-devel] dynamic ownership changes
In-Reply-To: 
References: <20180410130916.GB37403@Alexandra.local>
 <20180418081628.GA1704@Alexandra.local>
 <20180418111733.GA4702@Alexandra.local>
 <20180419120709.GB9449@Alexandra.local>
Message-ID: 

Hi, I've triggered another execution [1] due to some issues I saw in the
first which are not related to the patch.

The success rate is 78%, which is low compared to tier1 executions with
code from downstream builds (95-100% success rates) [2].

From what I could see so far, there is an issue with move and copy
operations to and from Gluster domains. For example [3].

The logs are attached.

[1]
https://rhv-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/rhv-4.2-ge-runner-tier1-after-upgrade/7/testReport/

[2]
https://rhv-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/rhv-4.2-ge-runner-tier1-after-upgrade/7/

[3]
2018-04-22 13:06:28,316+0300 INFO (jsonrpc/7) [vdsm.api] FINISH deleteImage error=Image does not exist in domain: 'image=cabb8846-7a4b-4244-9835-5f603e682f33, domain=e5fd29c8-52ba-467e-be09-ca40ff054dd4' from=::ffff:10.35.161.182,40936, flow_id=disks_syncAction_ba6b2630-5976-4935, task_id=3d5f2a8a-881c-409e-93e9-aaa643c10e42 (api:51)
2018-04-22 13:06:28,317+0300 ERROR (jsonrpc/7) [storage.TaskManager.Task] (Task='3d5f2a8a-881c-409e-93e9-aaa643c10e42') Unexpected error (task:875)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 882, in _run
    return fn(*args, **kargs)
  File "", line 2, in deleteImage
  File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 49, in method
    ret = func(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/hsm.py", line 1503, in deleteImage
    raise se.ImageDoesNotExistInSD(imgUUID, sdUUID)
ImageDoesNotExistInSD: Image does not exist in domain:
'image=cabb8846-7a4b-4244-9835-5f603e682f33, domain=e5fd29c8-52ba-467e-be09-ca40ff054dd4' 2018-04-22 13:06:28,317+0300 INFO (jsonrpc/7) [storage.TaskManager.Task] (Task='3d5f2a8a-881c-409e-93e9-aaa643c10e42') aborting: Task is aborted: "Image does not exist in domain: 'image=cabb8846-7a4b-4244-9835- 5f603e682f33, domain=e5fd29c8-52ba-467e-be09-ca40ff054dd4'" - code 268 (task:1181) 2018-04-22 13:06:28,318+0300 ERROR (jsonrpc/7) [storage.Dispatcher] FINISH deleteImage error=Image does not exist in domain: 'image=cabb8846-7a4b-4244-9835-5f603e682f33, domain=e5fd29c8-52ba-467e-be09 -ca40ff054d d4' (dispatcher:82) On Thu, Apr 19, 2018 at 5:34 PM, Elad Ben Aharon wrote: > Triggered a sanity tier1 execution [1] using [2], which covers all the > requested areas, on iSCSI, NFS and Gluster. > I'll update with the results. > > [1] > https://rhv-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/view/4.2 > _dev/job/rhv-4.2-ge-flow-storage/1161/ > > [2] > https://gerrit.ovirt.org/#/c/89830/ > vdsm-4.30.0-291.git77aef9a.el7.x86_64 > > > > On Thu, Apr 19, 2018 at 3:07 PM, Martin Polednik > wrote: > >> On 19/04/18 14:54 +0300, Elad Ben Aharon wrote: >> >>> Hi Martin, >>> >>> I see [1] requires a rebase, can you please take care? >>> >> >> Should be rebased. >> >> At the moment, our automation is stable only on iSCSI, NFS, Gluster and >>> FC. >>> Ceph is not supported and Cinder will be stabilized soon, AFAIR, it's not >>> stable enough at the moment. >>> >> >> That is still pretty good. >> >> >> [1] https://gerrit.ovirt.org/#/c/89830/ >>> >>> >>> Thanks >>> >>> On Wed, Apr 18, 2018 at 2:17 PM, Martin Polednik >>> wrote: >>> >>> On 18/04/18 11:37 +0300, Elad Ben Aharon wrote: >>>> >>>> Hi, sorry if I misunderstood, I waited for more input regarding what >>>>> areas >>>>> have to be tested here. >>>>> >>>>> >>>> I'd say that you have quite a bit of freedom in this regard. 
GlusterFS >>>> should be covered by Dennis, so iSCSI/NFS/ceph/cinder with some suite >>>> that covers basic operations (start & stop VM, migrate it), snapshots >>>> and merging them, and whatever else would be important for storage >>>> sanity. >>>> >>>> mpolednik >>>> >>>> >>>> On Wed, Apr 18, 2018 at 11:16 AM, Martin Polednik >>> > >>>> >>>>> wrote: >>>>> >>>>> On 11/04/18 16:52 +0300, Elad Ben Aharon wrote: >>>>> >>>>>> >>>>>> We can test this on iSCSI, NFS and GlusterFS. As for ceph and cinder, >>>>>> >>>>>>> will >>>>>>> have to check, since usually, we don't execute our automation on >>>>>>> them. >>>>>>> >>>>>>> >>>>>>> Any update on this? I believe the gluster tests were successful, OST >>>>>> passes fine and unit tests pass fine, that makes the storage backends >>>>>> test the last required piece. >>>>>> >>>>>> >>>>>> On Wed, Apr 11, 2018 at 4:38 PM, Raz Tamir >>>>>> wrote: >>>>>> >>>>>> >>>>>>> +Elad >>>>>>> >>>>>>> >>>>>>>> On Wed, Apr 11, 2018 at 4:28 PM, Dan Kenigsberg >>>>>>>> wrote: >>>>>>>> >>>>>>>> On Wed, Apr 11, 2018 at 12:34 PM, Nir Soffer >>>>>>>> wrote: >>>>>>>> >>>>>>>> >>>>>>>>> On Wed, Apr 11, 2018 at 12:31 PM Eyal Edri >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>> >>>>>>>>>> Please make sure to run as much OST suites on this patch as >>>>>>>>>> possible >>>>>>>>>> >>>>>>>>>> before merging ( using 'ci please build' ) >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> But note that OST is not a way to verify the patch. >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Such changes require testing with all storage types we support. >>>>>>>>>> >>>>>>>>>> Nir >>>>>>>>>> >>>>>>>>>> On Tue, Apr 10, 2018 at 4:09 PM, Martin Polednik < >>>>>>>>>> mpolednik at redhat.com >>>>>>>>>> > >>>>>>>>>> >>>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Hey, >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> I've created a patch[0] that is finally able to activate >>>>>>>>>>>> libvirt's >>>>>>>>>>>> dynamic_ownership for VDSM while not negatively affecting >>>>>>>>>>>> functionality of our storage code. 
>>>>>>>>>>>> >>>>>>>>>>>> That of course comes with quite a bit of code removal, mostly in >>>>>>>>>>>> the >>>>>>>>>>>> area of host devices, hwrng and anything that touches devices; >>>>>>>>>>>> bunch >>>>>>>>>>>> of test changes and one XML generation caveat (storage is >>>>>>>>>>>> handled >>>>>>>>>>>> by >>>>>>>>>>>> VDSM, therefore disk relabelling needs to be disabled on the >>>>>>>>>>>> VDSM >>>>>>>>>>>> level). >>>>>>>>>>>> >>>>>>>>>>>> Because of the scope of the patch, I welcome >>>>>>>>>>>> storage/virt/network >>>>>>>>>>>> people to review the code and consider the implication this >>>>>>>>>>>> change >>>>>>>>>>>> has >>>>>>>>>>>> on current/future features. >>>>>>>>>>>> >>>>>>>>>>>> [0] https://gerrit.ovirt.org/#/c/89830/ >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> In particular: dynamic_ownership was set to 0 prehistorically >>>>>>>>>>>> (as >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> part >>>>>>>>>> >>>>>>>>> of https://bugzilla.redhat.com/show_bug.cgi?id=554961 ) because >>>>>>>>> libvirt, >>>>>>>>> running as root, was not able to play properly with root-squash nfs >>>>>>>>> mounts. >>>>>>>>> >>>>>>>>> Have you attempted this use case? >>>>>>>>> >>>>>>>>> I join to Nir's request to run this with storage QE. >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>> -- >>>>>>>> >>>>>>>> >>>>>>>> Raz Tamir >>>>>>>> Manager, RHV QE >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: logs.tar.gz Type: application/gzip Size: 7107767 bytes Desc: not available URL: From lostonamountain at gmail.com Mon Apr 23 02:39:48 2018 From: lostonamountain at gmail.com (Andrew DeMaria) Date: Sun, 22 Apr 2018 22:39:48 -0400 Subject: [ovirt-devel] Removing validateQemuReadable check Message-ID: Hi all, I am using ovirt with an nfs server that enforces sec=krb5p. 
To do so, I have created a service account of a sort for ovirt to use
when accessing the kerberized nfs server. Things seem to work pretty well
except for the ISO storage domain. After adding an ISO image, it does not
show in the storage domain. The problem is the following check when
searching for iso files within the nfs mount:

    def validateQemuReadable(self, targetPath):
        """
        Validate that qemu process can read file
        """
        gids = (grp.getgrnam(constants.DISKIMAGE_GROUP).gr_gid,
                grp.getgrnam(constants.METADATA_GROUP).gr_gid)
        st = _IOProcessOs(self._iop).stat(targetPath)
        if not (st.st_gid in gids and st.st_mode & stat.S_IRGRP or
                st.st_mode & stat.S_IROTH):
            raise OSError(errno.EACCES, os.strerror(errno.EACCES))

Although my vdsm and qemu user can read and write to the iso file, this
check fails, as the file is group-owned not by either of them but by my
service account:

    -bash-4.2$ whoami
    vdsm
    -bash-4.2$ sha256sum Fedora-Workstation-netinst-x86_64-27-1.6.iso
    18ef4a6f9f470b40bd0cdf21e6c8f5c43c28e3a2200dcc8578ec9da25a6b376b  Fedora-Workstation-netinst-x86_64-27-1.6.iso
    -bash-4.2$ touch Fedora-Workstation-netinst-x86_64-27-1.6.iso
    -bash-4.2$ ls -alh Fedora-Workstation-netinst-x86_64-27-1.6.iso
    -rw-r-----. 1 autovirt autovirt 508M Apr 22 20:31 Fedora-Workstation-netinst-x86_64-27-1.6.iso
    -bash-4.2$ klist
    Ticket cache: KEYRING:persistent:36:36
    Default principal: autovirt at SOMEDOMAIN.NET
    Valid starting       Expires              Service principal
    04/22/2018 20:03:57  04/23/2018 20:03:57  krbtgt/SOMEDOMAIN.NET at SOMEDOMAIN.NET

After modifying the validateQemuReadable functions (fileUtils.py and
outOfProcess.py) to be a no-op returning True, the ISO file showed up and
I was able to use it in a VM.

Would a patch be considered to remove this validateQemuReadable check? As
shown above, the current implementation causes more harm than good. I'd
rather have non-readable ISO files show up in the list and fail at VM
runtime anyway.
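[Editorial aside: the rejection described above can be reproduced with a few lines of standalone Python. This is only a sketch of the quoted predicate; the gid values stand in for the DISKIMAGE_GROUP/METADATA_GROUP lookups and are illustrative, not the real ones:]

```python
import stat

# Illustrative placeholder gids for constants.DISKIMAGE_GROUP and
# constants.METADATA_GROUP (vdsm resolves the real ones via grp.getgrnam()).
QEMU_GIDS = (36, 107)

def qemu_readable(st_gid, st_mode):
    # Same predicate as validateQemuReadable: accept the file only if it
    # is group-readable by one of the qemu-related groups, or readable by
    # everyone; otherwise vdsm raises OSError(errno.EACCES).
    return bool((st_gid in QEMU_GIDS and st_mode & stat.S_IRGRP)
                or st_mode & stat.S_IROTH)

# The ISO above: mode -rw-r----- (0o640), group-owned by the service
# account, whose gid is not in QEMU_GIDS, so it is rejected even though
# the vdsm/qemu processes can actually read it via the krb5 credentials.
print(qemu_readable(st_gid=1000, st_mode=0o100640))  # False
# A world-readable copy (-rw-r--r--, 0o644) would pass regardless of group:
print(qemu_readable(st_gid=1000, st_mode=0o100644))  # True
```

In other words, the check keys off ownership metadata rather than actual readability, which is why it misfires on a kerberized NFS mount where files are owned by a service account.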
Thanks,
Andrew

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From bkorren at redhat.com  Mon Apr 23 08:36:27 2018
From: bkorren at redhat.com (bkorren at redhat.com)
Date: Mon, 23 Apr 2018 08:36:27 +0000
Subject: [ovirt-devel] Invitation: oVirt STDCI v2 deep dive @ Thu Apr 26, 2018 11:00 - 12:00 (IDT) (devel@ovirt.org)
Message-ID: <0000000000000bdd5f056a7fef2e@google.com>

You have been invited to the following event.

Title: oVirt STDCI v2 deep dive

Introduction to the 2nd version of oVirt's CI standard - What is it, what
can it do, how to use it and how does it work.

BJ link: https://bluejeans.com/8705030462

When: Thu Apr 26, 2018 11:00 - 12:00 Jerusalem
Where: raanana-04-asia-8-p-vc; https://bluejeans.com/8705030462
Calendar: devel at ovirt.org
Who:
  * bkorren at redhat.com - organizer
  * devel at ovirt.org

Event details:
https://www.google.com/calendar/event?action=VIEW&eid=MHBvcmNoYWQ5YmtzNmlmdWw2M25jNzM5djMgZGV2ZWxAb3ZpcnQub3Jn&tok=MTgjYmtvcnJlbkByZWRoYXQuY29tODBiMjViYzFjZmZhYWYzMmJiNmNlNWU3NTA3OGRjOGQwYmJiNTBhOA&ctz=Asia%2FJerusalem&hl=en&es=0

Invitation from Google Calendar: https://www.google.com/calendar/

You are receiving this courtesy email at the account devel at ovirt.org
because you are an attendee of this event.

To stop receiving future updates for this event, decline this event.
Alternatively you can sign up for a Google account at
https://www.google.com/calendar/ and control your notification settings
for your entire calendar.

Forwarding this invitation could allow any recipient to modify your RSVP
response. Learn more at
https://support.google.com/calendar/answer/37135#forwarding

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: text/calendar
Size: 1711 bytes
Desc: not available
URL: 

-------------- next part --------------
A non-text attachment was scrubbed...
Name: invite.ics Type: application/ics Size: 1746 bytes Desc: not available URL: From gshereme at redhat.com Mon Apr 23 11:34:31 2018 From: gshereme at redhat.com (Greg Sheremeta) Date: Mon, 23 Apr 2018 07:34:31 -0400 Subject: [ovirt-devel] OST ha_recovery test failing Message-ID: I'm seeing this fail periodically on a patch I'm working on, and it's not related to my work. Any ideas? gshereme at redhat.com IRC: gshereme -------------- next part -------------- An HTML attachment was scrubbed... URL: From sbonazzo at redhat.com Mon Apr 23 11:36:57 2018 From: sbonazzo at redhat.com (Sandro Bonazzola) Date: Mon, 23 Apr 2018 13:36:57 +0200 Subject: [ovirt-devel] Invitation: oVirt STDCI v2 deep dive @ Thu Apr 26, 2018 11:00 - 12:00 (IDT) (devel@ovirt.org) In-Reply-To: <0000000000000bdd5f056a7fef2e@google.com> References: <0000000000000bdd5f056a7fef2e@google.com> Message-ID: >From past experience, sending calendar invitation to ovirt mailing lists doesn't work well. see https://lists.ovirt.org/pipermail/users/2018-March/087616.html for reference. I would recommend to track this on https://ovirt.org/events/ by adding the event following https://github.com/OSAS/rh-events/wiki/Adding-and-modifying-events I would also recommend to send personal invitation to oVirt team leads to be sure they see it. If you need to track who's going to join, I would recommend a ticketing system like eventbrite. 2018-04-23 10:36 GMT+02:00 : > more details ? > > oVirt STDCI v2 deep dive > > *When* > Thu Apr 26, 2018 11:00 ? 12:00 Jerusalem > *Where* > raanana-04-asia-8-p-vc; https://bluejeans.com/8705030462 (map > > ) > *Calendar* > devel at ovirt.org > *Who* > ? > bkorren at redhat.com - organizer > ? > devel at ovirt.org > > Introduction to the 2nd version of oVirt's CI standard - What is it, what > can it do, how to use it and how does it work. > > BJ link: > https://bluejeans.com/8705030462 > > > Going? *Yes > > - Maybe > > - No > * > more options ? 
> > > Invitation from Google Calendar > > You are receiving this courtesy email at the account devel at ovirt.org > because you are an attendee of this event. > > To stop receiving future updates for this event, decline this event. > Alternatively you can sign up for a Google account at > https://www.google.com/calendar/ and control your notification settings > for your entire calendar. > > Forwarding this invitation could allow any recipient to modify your RSVP > response. Learn More > . > > _______________________________________________ > Devel mailing list > Devel at ovirt.org > http://lists.ovirt.org/mailman/listinfo/devel > -- SANDRO BONAZZOLA ASSOCIATE MANAGER, SOFTWARE ENGINEERING, EMEA ENG VIRTUALIZATION R&D Red Hat EMEA sbonazzo at redhat.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From gshereme at redhat.com Mon Apr 23 11:43:41 2018 From: gshereme at redhat.com (Greg Sheremeta) Date: Mon, 23 Apr 2018 07:43:41 -0400 Subject: [ovirt-devel] Invitation: oVirt STDCI v2 deep dive @ Thu Apr 26, 2018 11:00 - 12:00 (IDT) (devel@ovirt.org) In-Reply-To: References: <0000000000000bdd5f056a7fef2e@google.com> Message-ID: Is this the only meeting planned? 4am US time, I'll have to get up a few mins early :) On Mon, Apr 23, 2018 at 7:36 AM, Sandro Bonazzola wrote: > From past experience, sending calendar invitation to ovirt mailing lists > doesn't work well. see https://lists.ovirt.org/pipermail/users/2018-March/ > 087616.html for reference. > > I would recommend to track this on https://ovirt.org/events/ by adding > the event following https://github.com/OSAS/rh-events/wiki/Adding- > and-modifying-events > > I would also recommend to send personal invitation to oVirt team leads to > be sure they see it. > > If you need to track who's going to join, I would recommend a ticketing > system like eventbrite. > > 2018-04-23 10:36 GMT+02:00 : > >> more details ? >> >> oVirt STDCI v2 deep dive >> >> *When* >> Thu Apr 26, 2018 11:00 ? 
12:00 Jerusalem >> *Where* >> raanana-04-asia-8-p-vc; https://bluejeans.com/8705030462 (map >> >> ) >> *Calendar* >> devel at ovirt.org >> *Who* >> ? >> bkorren at redhat.com - organizer >> ? >> devel at ovirt.org >> >> Introduction to the 2nd version of oVirt's CI standard - What is it, what >> can it do, how to use it and how does it work. >> >> BJ link: >> https://bluejeans.com/8705030462 >> >> >> Going? *Yes >> >> - Maybe >> >> - No >> * >> more options ? >> >> >> Invitation from Google Calendar >> >> You are receiving this courtesy email at the account devel at ovirt.org >> because you are an attendee of this event. >> >> To stop receiving future updates for this event, decline this event. >> Alternatively you can sign up for a Google account at >> https://www.google.com/calendar/ and control your notification settings >> for your entire calendar. >> >> Forwarding this invitation could allow any recipient to modify your RSVP >> response. Learn More >> . >> >> _______________________________________________ >> Devel mailing list >> Devel at ovirt.org >> http://lists.ovirt.org/mailman/listinfo/devel >> > > > > -- > > SANDRO BONAZZOLA > > ASSOCIATE MANAGER, SOFTWARE ENGINEERING, EMEA ENG VIRTUALIZATION R&D > > Red Hat EMEA > > sbonazzo at redhat.com > > > > _______________________________________________ > Devel mailing list > Devel at ovirt.org > http://lists.ovirt.org/mailman/listinfo/devel > -------------- next part -------------- An HTML attachment was scrubbed... URL: From dron at redhat.com Mon Apr 23 12:10:15 2018 From: dron at redhat.com (Dafna Ron) Date: Mon, 23 Apr 2018 13:10:15 +0100 Subject: [ovirt-devel] OST ha_recovery test failing In-Reply-To: References: Message-ID: Tal, can you have a look at the logs? 
http://jenkins.ovirt.org/job/ovirt-system-tests_standard-check-patch/363/artifact/exported-artifacts/check-patch.basic_suite_master.el7.x86_64/test_logs/basic-suite-master/post-004_basic_sanity.py/ I can see in the logs that the VM fails to restart because the VM path does not exist. What I found odd is that we try to start the VM on the same host it ran on before; I think the issue is actually that the HA VM is pinned to a host (which should not be possible, since HA should start the VM on a different host). Can you please have a look? Thanks. Dafna On Mon, Apr 23, 2018 at 12:34 PM, Greg Sheremeta wrote: > I'm seeing this fail periodically on a patch I'm working on, and it's not > related to my work. Any ideas? > > > > in ha_recovery > lambda: > File "/usr/lib/python2.7/site-packages/ovirtlago/testlib.py", line 271, > in assert_true_within_long > assert_equals_within_long(func, True, allowed_exceptions) > File "/usr/lib/python2.7/site-packages/ovirtlago/testlib.py", line 258, > in assert_equals_within_long > func, value, LONG_TIMEOUT, allowed_exceptions=allowed_exceptions > File "/usr/lib/python2.7/site-packages/ovirtlago/testlib.py", line 237, > in assert_equals_within > '%s != %s after %s seconds' % (res, value, timeout) > 'False != True after 600 seconds > > > http://jenkins.ovirt.org/job/ovirt-system-tests_standard- > check-patch/363/artifact/exported-artifacts/check- > patch.basic_suite_master.el7.x86_64/004_basic_sanity.py.junit.xml > > > > -- > > GREG SHEREMETA > > SENIOR SOFTWARE ENGINEER - TEAM LEAD - RHV UX > > Red Hat NA > > > > gshereme at redhat.com IRC: gshereme > > > _______________________________________________ > Devel mailing list > Devel at ovirt.org > http://lists.ovirt.org/mailman/listinfo/devel > -------------- next part -------------- An HTML attachment was scrubbed...
URL: From eedri at redhat.com Mon Apr 23 12:11:28 2018 From: eedri at redhat.com (Eyal Edri) Date: Mon, 23 Apr 2018 15:11:28 +0300 Subject: [ovirt-devel] Update: HC suites failing for 3 weeks ( was: [OST][HC] HE fails to deploy ) In-Reply-To: References: Message-ID: Sahina, Any update on this? On Wed, Apr 18, 2018 at 3:40 PM, Sandro Bonazzola wrote: > > > 2018-04-18 9:37 GMT+02:00 Eyal Edri : > >> FYI, >> >> I've disabled the 4.2 and master HC suites nightly run on CI as they are >> constantly failing for almost 3 weeks and spamming the mailing lists. >> > > > HC uses gdeploy 2.0.6 which was released in December and was based on > ansible 2.4. > ansible-2.5 landed 3 weeks ago in EPEL, my guess is that gdeploy is not > supporting ansible-2.5 properly. > I had no time to validate my guess with proof, so please Sahina cross > check this. > > > >> >> I think this should get higher priority for a fix if we want it to >> provide any value, >> Work can continue using the manual jobs or via check-patch. >> >> >> On Mon, Apr 16, 2018 at 10:56 AM, Gal Ben Haim >> wrote: >> >>> Any update on https://gerrit.ovirt.org/#/c/88887/ ? >>> The HC suites still failing and it's hard to understand why without the >>> logs from the engine VM. 
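The failure being chased in this thread is the ansible "Get local VM IP" task, which pipes `virsh -r net-dhcp-leases default` through a grep/awk/cut chain and retries up to 50 times; as long as the lease table has no entry for the engine VM's MAC, the extracted IP stays empty and the task eventually gives up. A minimal Python sketch of the same extraction (the lease table below is invented sample output, not taken from the failing run):

```python
import re

def lease_ip(virsh_output, mac):
    """Return the IPv4 address leased to `mac`, or None if no lease yet.

    Parses the table printed by `virsh -r net-dhcp-leases default`,
    whose Address column looks like 192.168.122.73/24.
    """
    for line in virsh_output.splitlines():
        if mac.lower() in line.lower():
            match = re.search(r'(\d{1,3}(?:\.\d{1,3}){3})/\d+', line)
            if match:
                return match.group(1)
    return None  # what the failing task saw on every one of its 50 attempts

sample = """\
 Expiry Time          MAC address        Protocol  Address            Hostname
-------------------------------------------------------------------------------
 2018-04-05 21:10:11  00:16:3e:24:d3:63  ipv4      192.168.122.73/24  enginevm
"""

print(lease_ip(sample, '00:16:3E:24:D3:63'))  # -> 192.168.122.73
print(lease_ip('', '00:16:3e:24:d3:63'))      # -> None (no lease yet)
```

An empty result therefore does not distinguish "VM has no address yet" from "VM never came up", which is why the missing engine-VM logs make this hard to debug.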
>>> >>> On Sat, Apr 7, 2018 at 7:19 AM, Sahina Bose wrote: >>> >>>> >>>> >>>> On Fri, Apr 6, 2018 at 1:10 PM, Simone Tiraboschi >>>> wrote: >>>> >>>>> >>>>> >>>>> On Fri, Apr 6, 2018 at 9:28 AM, Sahina Bose wrote: >>>>> >>>>>> 2018-04-05 20:46:52,773-0400 INFO otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:100 TASK [Get local VM IP] >>>>>> 2018-04-05 20:55:28,217-0400 DEBUG otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:94 {u'_ansible_parsed': True, u'stderr_lines': [], u'cmd': u"virsh -r net-dhcp-leases default | grep -i 00:16:3e:24:d3:63 | awk '{ print $5 }' | cut -f1 -d'/'", u'end': u'2018-04-05 20:55:28.046320', u'_ansible_no_log': False, u'stdout': u'', u'changed': True, u'invocation': {u'module_args': {u'warn': True, u'executable': None, u'_uses_shell': True, u'_raw_params': u"virsh -r net-dhcp-leases default | grep -i 00:16:3e:24:d3:63 | awk '{ print $5 }' | cut -f1 -d'/'", u'removes': None, u'creates': None, u'chdir': None, u'stdin': None}}, u'start': u'2018-04-05 20:55:28.000470', u'attempts': 50, u'stderr': u'', u'rc': 0, u'delta': u'0:00:00.045850', u'stdout_lines': []} >>>>>> 2018-04-05 20:55:28,318-0400 ERROR otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:98 fatal: [localhost]: FAILED! => {"attempts": 50, "changed": true, "cmd": "virsh -r net-dhcp-leases default | grep -i 00:16:3e:24:d3:63 | awk '{ print $5 }' | cut -f1 -d'/'", "delta": "0:00:00.045850", "end": "2018-04-05 20:55:28.046320", "rc": 0, "start": "2018-04-05 20:55:28.000470", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []} >>>>>> >>>>>> Both the 4.2 and master suites are failing on getting local VM IP. >>>>>> Any idea what changed or if I have to change the test? >>>>>> >>>>>> thanks! >>>>>> >>>>> >>>>> Hi Sahina, >>>>> 4.2 and master suite non HC are correctly running this morning. 
>>>>> http://jenkins.ovirt.org/view/oVirt%20system%20tests/job/ovi >>>>> rt-system-tests_he-basic-ansible-suite-master/146/ >>>>> http://jenkins.ovirt.org/view/oVirt%20system%20tests/job/ovi >>>>> rt-system-tests_he-basic-ansible-suite-4.2/76/ >>>>> >>>>> I'll try to check the difference with HC suites. >>>>> >>>>> Are you using more than one subnet in the HC suites? >>>>> >>>> >>>> No, I'm not. And we haven't changed anything related to network in the >>>> test suite. >>>> >>>> >>>> >>> >>> >>> -- >>> *GAL bEN HAIM* >>> RHV DEVOPS >>> >> >> >> >> -- >> >> Eyal edri >> >> >> MANAGER >> >> RHV DevOps >> >> EMEA VIRTUALIZATION R&D >> >> >> Red Hat EMEA >> TRIED. TESTED. TRUSTED. >> phone: +972-9-7692018 >> irc: eedri (on #tlv #rhev-dev #rhev-integ) >> >> _______________________________________________ >> Devel mailing list >> Devel at ovirt.org >> http://lists.ovirt.org/mailman/listinfo/devel >> > > > > -- > > SANDRO BONAZZOLA > > ASSOCIATE MANAGER, SOFTWARE ENGINEERING, EMEA ENG VIRTUALIZATION R&D > > Red Hat EMEA > > sbonazzo at redhat.com > > > -- Eyal edri MANAGER RHV DevOps EMEA VIRTUALIZATION R&D Red Hat EMEA TRIED. TESTED. TRUSTED. phone: +972-9-7692018 irc: eedri (on #tlv #rhev-dev #rhev-integ) -------------- next part -------------- An HTML attachment was scrubbed... URL: From mpolednik at redhat.com Mon Apr 23 12:56:54 2018 From: mpolednik at redhat.com (Martin Polednik) Date: Mon, 23 Apr 2018 14:56:54 +0200 Subject: [ovirt-devel] dynamic ownership changes In-Reply-To: References: <20180418081628.GA1704@Alexandra.local> <20180418111733.GA4702@Alexandra.local> <20180419120709.GB9449@Alexandra.local> Message-ID: <20180423125653.GA19987@Alexandra.local> On 23/04/18 01:23 +0300, Elad Ben Aharon wrote: >Hi, I've triggered another execution [1] due to some issues I saw in the >first which are not related to the patch. > >The success rate is 78% which is low compared to tier1 executions with >code from downstream builds (95-100% success rates) [2].
Could you run the current master (without the dynamic_ownership patch) so that we have viable comparision? >From what I could see so far, there is an issue with move and copy >operations to and from Gluster domains. For example [3]. > >The logs are attached. > > >[1] >*https://rhv-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/rhv-4.2-ge-runner-tier1-after-upgrade/7/testReport/ >* > > > >[2] >https://rhv-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/ >rhv-4.2-ge-runner-tier1-after-upgrade/7/ > > > >[3] >2018-04-22 13:06:28,316+0300 INFO (jsonrpc/7) [vdsm.api] FINISH >deleteImage error=Image does not exist in domain: >'image=cabb8846-7a4b-4244-9835-5f603e682f33, >domain=e5fd29c8-52ba-467e-be09-ca40ff054dd4' >from=: >:ffff:10.35.161.182,40936, flow_id=disks_syncAction_ba6b2630-5976-4935, >task_id=3d5f2a8a-881c-409e-93e9-aaa643c10e42 (api:51) >2018-04-22 13:06:28,317+0300 ERROR (jsonrpc/7) [storage.TaskManager.Task] >(Task='3d5f2a8a-881c-409e-93e9-aaa643c10e42') Unexpected error (task:875) >Traceback (most recent call last): > File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 882, in >_run > return fn(*args, **kargs) > File "", line 2, in deleteImage > File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 49, in >method > ret = func(*args, **kwargs) > File "/usr/lib/python2.7/site-packages/vdsm/storage/hsm.py", line 1503, in >deleteImage > raise se.ImageDoesNotExistInSD(imgUUID, sdUUID) >ImageDoesNotExistInSD: Image does not exist in domain: >'image=cabb8846-7a4b-4244-9835-5f603e682f33, >domain=e5fd29c8-52ba-467e-be09-ca40ff054dd4' > >2018-04-22 13:06:28,317+0300 INFO (jsonrpc/7) [storage.TaskManager.Task] >(Task='3d5f2a8a-881c-409e-93e9-aaa643c10e42') aborting: Task is aborted: >"Image does not exist in domain: 'image=cabb8846-7a4b-4244-9835- >5f603e682f33, domain=e5fd29c8-52ba-467e-be09-ca40ff054dd4'" - code 268 >(task:1181) >2018-04-22 13:06:28,318+0300 ERROR (jsonrpc/7) [storage.Dispatcher] FINISH >deleteImage error=Image does not exist in 
domain: >'image=cabb8846-7a4b-4244-9835-5f603e682f33, domain=e5fd29c8-52ba-467e-be09 >-ca40ff054d >d4' (dispatcher:82) > > > >On Thu, Apr 19, 2018 at 5:34 PM, Elad Ben Aharon >wrote: > >> Triggered a sanity tier1 execution [1] using [2], which covers all the >> requested areas, on iSCSI, NFS and Gluster. >> I'll update with the results. >> >> [1] >> https://rhv-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/view/4.2 >> _dev/job/rhv-4.2-ge-flow-storage/1161/ >> >> [2] >> https://gerrit.ovirt.org/#/c/89830/ >> vdsm-4.30.0-291.git77aef9a.el7.x86_64 >> >> >> >> On Thu, Apr 19, 2018 at 3:07 PM, Martin Polednik >> wrote: >> >>> On 19/04/18 14:54 +0300, Elad Ben Aharon wrote: >>> >>>> Hi Martin, >>>> >>>> I see [1] requires a rebase, can you please take care? >>>> >>> >>> Should be rebased. >>> >>> At the moment, our automation is stable only on iSCSI, NFS, Gluster and >>>> FC. >>>> Ceph is not supported and Cinder will be stabilized soon, AFAIR, it's not >>>> stable enough at the moment. >>>> >>> >>> That is still pretty good. >>> >>> >>> [1] https://gerrit.ovirt.org/#/c/89830/ >>>> >>>> >>>> Thanks >>>> >>>> On Wed, Apr 18, 2018 at 2:17 PM, Martin Polednik >>>> wrote: >>>> >>>> On 18/04/18 11:37 +0300, Elad Ben Aharon wrote: >>>>> >>>>> Hi, sorry if I misunderstood, I waited for more input regarding what >>>>>> areas >>>>>> have to be tested here. >>>>>> >>>>>> >>>>> I'd say that you have quite a bit of freedom in this regard. GlusterFS >>>>> should be covered by Dennis, so iSCSI/NFS/ceph/cinder with some suite >>>>> that covers basic operations (start & stop VM, migrate it), snapshots >>>>> and merging them, and whatever else would be important for storage >>>>> sanity. >>>>> >>>>> mpolednik >>>>> >>>>> >>>>> On Wed, Apr 18, 2018 at 11:16 AM, Martin Polednik >>>> > >>>>> >>>>>> wrote: >>>>>> >>>>>> On 11/04/18 16:52 +0300, Elad Ben Aharon wrote: >>>>>> >>>>>>> >>>>>>> We can test this on iSCSI, NFS and GlusterFS. 
As for ceph and cinder, >>>>>>> >>>>>>>> will >>>>>>>> have to check, since usually, we don't execute our automation on >>>>>>>> them. >>>>>>>> >>>>>>>> >>>>>>>> Any update on this? I believe the gluster tests were successful, OST >>>>>>> passes fine and unit tests pass fine, that makes the storage backends >>>>>>> test the last required piece. >>>>>>> >>>>>>> >>>>>>> On Wed, Apr 11, 2018 at 4:38 PM, Raz Tamir >>>>>>> wrote: >>>>>>> >>>>>>> >>>>>>>> +Elad >>>>>>>> >>>>>>>> >>>>>>>>> On Wed, Apr 11, 2018 at 4:28 PM, Dan Kenigsberg >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>> On Wed, Apr 11, 2018 at 12:34 PM, Nir Soffer >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>> >>>>>>>>>> On Wed, Apr 11, 2018 at 12:31 PM Eyal Edri >>>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> Please make sure to run as much OST suites on this patch as >>>>>>>>>>> possible >>>>>>>>>>> >>>>>>>>>>> before merging ( using 'ci please build' ) >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> But note that OST is not a way to verify the patch. >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Such changes require testing with all storage types we support. >>>>>>>>>>> >>>>>>>>>>> Nir >>>>>>>>>>> >>>>>>>>>>> On Tue, Apr 10, 2018 at 4:09 PM, Martin Polednik < >>>>>>>>>>> mpolednik at redhat.com >>>>>>>>>>> > >>>>>>>>>>> >>>>>>>>>>> wrote: >>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Hey, >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>> I've created a patch[0] that is finally able to activate >>>>>>>>>>>>> libvirt's >>>>>>>>>>>>> dynamic_ownership for VDSM while not negatively affecting >>>>>>>>>>>>> functionality of our storage code. 
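For context on how libvirt's dynamic ownership can coexist with VDSM-managed storage: with `dynamic_ownership = 1` in qemu.conf, libvirt chowns and relabels device paths itself, and the domain XML format provides a documented per-device opt-out for sources whose permissions are managed elsewhere. A hedged sketch of that mechanism (the disk path is invented, and whether the patch emits exactly this fragment is not shown in this thread):

```xml
<disk type='file' device='disk'>
  <driver name='qemu' type='qcow2'/>
  <!-- storage stays VDSM-managed: ask libvirt not to relabel this source -->
  <source file='/rhev/data-center/mnt/example.com:_export/disk.qcow2'>
    <seclabel model='dac' relabel='no'/>
  </source>
  <target dev='vda' bus='virtio'/>
</disk>
```

With `relabel='no'` on the source, libvirt leaves the file's ownership and labels alone while still dynamically managing everything else (host devices, hwrng, etc.).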
>>>>>>>>>>>>> >>>>>>>>>>>>> That of course comes with quite a bit of code removal, mostly in >>>>>>>>>>>>> the >>>>>>>>>>>>> area of host devices, hwrng and anything that touches devices; >>>>>>>>>>>>> bunch >>>>>>>>>>>>> of test changes and one XML generation caveat (storage is >>>>>>>>>>>>> handled >>>>>>>>>>>>> by >>>>>>>>>>>>> VDSM, therefore disk relabelling needs to be disabled on the >>>>>>>>>>>>> VDSM >>>>>>>>>>>>> level). >>>>>>>>>>>>> >>>>>>>>>>>>> Because of the scope of the patch, I welcome >>>>>>>>>>>>> storage/virt/network >>>>>>>>>>>>> people to review the code and consider the implication this >>>>>>>>>>>>> change >>>>>>>>>>>>> has >>>>>>>>>>>>> on current/future features. >>>>>>>>>>>>> >>>>>>>>>>>>> [0] https://gerrit.ovirt.org/#/c/89830/ >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> In particular: dynamic_ownership was set to 0 prehistorically >>>>>>>>>>>>> (as >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> part >>>>>>>>>>> >>>>>>>>>> of https://bugzilla.redhat.com/show_bug.cgi?id=554961 ) because >>>>>>>>>> libvirt, >>>>>>>>>> running as root, was not able to play properly with root-squash nfs >>>>>>>>>> mounts. >>>>>>>>>> >>>>>>>>>> Have you attempted this use case? >>>>>>>>>> >>>>>>>>>> I join to Nir's request to run this with storage QE. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>> -- >>>>>>>>> >>>>>>>>> >>>>>>>>> Raz Tamir >>>>>>>>> Manager, RHV QE >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >> From sabose at redhat.com Mon Apr 23 12:58:06 2018 From: sabose at redhat.com (Sahina Bose) Date: Mon, 23 Apr 2018 18:28:06 +0530 Subject: [ovirt-devel] Update: HC suites failing for 3 weeks ( was: [OST][HC] HE fails to deploy ) In-Reply-To: References: Message-ID: On Mon, Apr 23, 2018 at 5:41 PM, Eyal Edri wrote: > Sahina, > Any update on this? > Sorry, haven't been able to spend any time on this. The last I checked the HE install was failing at task - Get Local VM IP. and there were no logs from HE VM to debug. 
Will spend sometime on this tomorrow > On Wed, Apr 18, 2018 at 3:40 PM, Sandro Bonazzola > wrote: > >> >> >> 2018-04-18 9:37 GMT+02:00 Eyal Edri : >> >>> FYI, >>> >>> I've disabled the 4.2 and master HC suites nightly run on CI as they are >>> constantly failing for almost 3 weeks and spamming the mailing lists. >>> >> >> >> HC uses gdeploy 2.0.6 which was released in December and was based on >> ansible 2.4. >> ansible-2.5 landed 3 weeks ago in EPEL, my guess is that gdeploy is not >> supporting ansible-2.5 properly. >> I had no time to validate my guess with proof, so please Sahina cross >> check this. >> >> >> >>> >>> I think this should get higher priority for a fix if we want it to >>> provide any value, >>> Work can continue using the manual jobs or via check-patch. >>> >>> >>> On Mon, Apr 16, 2018 at 10:56 AM, Gal Ben Haim >>> wrote: >>> >>>> Any update on https://gerrit.ovirt.org/#/c/88887/ ? >>>> The HC suites still failing and it's hard to understand why without the >>>> logs from the engine VM. 
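The ansible "Get local VM IP" task in this thread retries up to 50 times, and OST's `assert_true_within_*` helpers (seen in the ha_recovery traceback elsewhere in this digest) follow the same poll-until-true-with-timeout idiom. A minimal illustrative version (names and message format only approximate OST's testlib, this is not its actual API):

```python
import time

def assert_true_within(func, timeout, interval=0.1):
    """Poll `func` until it returns True, or fail after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while True:
        res = func()
        if res is True:
            return
        if time.monotonic() >= deadline:
            # same shape as the reported failure:
            # 'False != True after 600 seconds'
            raise AssertionError('%s != %s after %s seconds'
                                 % (res, True, timeout))
        time.sleep(interval)

# a condition that only becomes true on the third poll
state = {'calls': 0}
def vm_is_up():
    state['calls'] += 1
    return state['calls'] >= 3

assert_true_within(vm_is_up, timeout=5)
print(state['calls'])  # -> 3
```

The downside visible in these reports is that on failure the helper only knows "still False after N seconds"; the real cause has to be dug out of the component logs.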
>>>> >>>> On Sat, Apr 7, 2018 at 7:19 AM, Sahina Bose wrote: >>>> >>>>> >>>>> >>>>> On Fri, Apr 6, 2018 at 1:10 PM, Simone Tiraboschi >>>> > wrote: >>>>> >>>>>> >>>>>> >>>>>> On Fri, Apr 6, 2018 at 9:28 AM, Sahina Bose >>>>>> wrote: >>>>>> >>>>>>> 2018-04-05 20:46:52,773-0400 INFO otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:100 TASK [Get local VM IP] >>>>>>> 2018-04-05 20:55:28,217-0400 DEBUG otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:94 {u'_ansible_parsed': True, u'stderr_lines': [], u'cmd': u"virsh -r net-dhcp-leases default | grep -i 00:16:3e:24:d3:63 | awk '{ print $5 }' | cut -f1 -d'/'", u'end': u'2018-04-05 20:55:28.046320', u'_ansible_no_log': False, u'stdout': u'', u'changed': True, u'invocation': {u'module_args': {u'warn': True, u'executable': None, u'_uses_shell': True, u'_raw_params': u"virsh -r net-dhcp-leases default | grep -i 00:16:3e:24:d3:63 | awk '{ print $5 }' | cut -f1 -d'/'", u'removes': None, u'creates': None, u'chdir': None, u'stdin': None}}, u'start': u'2018-04-05 20:55:28.000470', u'attempts': 50, u'stderr': u'', u'rc': 0, u'delta': u'0:00:00.045850', u'stdout_lines': []} >>>>>>> 2018-04-05 20:55:28,318-0400 ERROR otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:98 fatal: [localhost]: FAILED! => {"attempts": 50, "changed": true, "cmd": "virsh -r net-dhcp-leases default | grep -i 00:16:3e:24:d3:63 | awk '{ print $5 }' | cut -f1 -d'/'", "delta": "0:00:00.045850", "end": "2018-04-05 20:55:28.046320", "rc": 0, "start": "2018-04-05 20:55:28.000470", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []} >>>>>>> >>>>>>> Both the 4.2 and master suites are failing on getting local VM IP. >>>>>>> Any idea what changed or if I have to change the test? >>>>>>> >>>>>>> thanks! >>>>>>> >>>>>> >>>>>> Hi Sahina, >>>>>> 4.2 and master suite non HC are correctly running this morning. 
>>>>>> http://jenkins.ovirt.org/view/oVirt%20system%20tests/job/ovi >>>>>> rt-system-tests_he-basic-ansible-suite-master/146/ >>>>>> http://jenkins.ovirt.org/view/oVirt%20system%20tests/job/ovi >>>>>> rt-system-tests_he-basic-ansible-suite-4.2/76/ >>>>>> >>>>>> I'll try to check the difference with HC suites. >>>>>> >>>>>> Are you using more than one subnet in the HC suites? >>>>>> >>>>> >>>>> No, I'm not. And we havent's changed anything related to network in >>>>> the test suite. >>>>> >>>>> >>>>> >>>> >>>> >>>> -- >>>> *GAL bEN HAIM* >>>> RHV DEVOPS >>>> >>> >>> >>> >>> -- >>> >>> Eyal edri >>> >>> >>> MANAGER >>> >>> RHV DevOps >>> >>> EMEA VIRTUALIZATION R&D >>> >>> >>> Red Hat EMEA >>> TRIED. TESTED. TRUSTED. >>> >>> phone: +972-9-7692018 >>> irc: eedri (on #tlv #rhev-dev #rhev-integ) >>> >>> _______________________________________________ >>> Devel mailing list >>> Devel at ovirt.org >>> http://lists.ovirt.org/mailman/listinfo/devel >>> >> >> >> >> -- >> >> SANDRO BONAZZOLA >> >> ASSOCIATE MANAGER, SOFTWARE ENGINEERING, EMEA ENG VIRTUALIZATION R&D >> >> Red Hat EMEA >> >> sbonazzo at redhat.com >> >> >> > > > > -- > > Eyal edri > > > MANAGER > > RHV DevOps > > EMEA VIRTUALIZATION R&D > > > Red Hat EMEA > TRIED. TESTED. TRUSTED. > phone: +972-9-7692018 > irc: eedri (on #tlv #rhev-dev #rhev-integ) > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ykaul at redhat.com Mon Apr 23 13:59:31 2018 From: ykaul at redhat.com (Yaniv Kaul) Date: Mon, 23 Apr 2018 16:59:31 +0300 Subject: [ovirt-devel] OST ha_recovery test failing In-Reply-To: References: Message-ID: On Mon, Apr 23, 2018 at 3:10 PM, Dafna Ron wrote: > Tal, can you have a look at the logs? http://jenkins.ovirt.org/job/ > ovirt-system-tests_standard-check-patch/363/artifact/ > exported-artifacts/check-patch.basic_suite_master.el7. 
> x86_64/test_logs/basic-suite-master/post-004_basic_sanity.py/ > > I can see in the logs that the vm fails to re-start with vm path does not > exist. > but what I found odd is that we try to start the vm on the same host that > it ran on before and I think that the issue is actually that we set the HA > vm as a pin to host vm (which should not be a possibility since HA should > start the vm on a different host). > > Can you please have a look? > Is it still the issue around the high-perf VM? It has to be (on one hand) pinned to a single host (if we define NUMA topo for it) and on the other hand cannot be used for HA recovery test... Can you revert the whole commit around high perf VM? I assume we'll need to re-think on how to do it. Y. > Thanks. > Dafna > > > On Mon, Apr 23, 2018 at 12:34 PM, Greg Sheremeta > wrote: > >> I'm seeing this fail periodically on a patch I'm working on, and it's not >> related to my work. Any ideas? >> >> >> >> > in ha_recovery >> lambda: >> File "/usr/lib/python2.7/site-packages/ovirtlago/testlib.py", line >> 271, in assert_true_within_long >> assert_equals_within_long(func, True, allowed_exceptions) >> File "/usr/lib/python2.7/site-packages/ovirtlago/testlib.py", line >> 258, in assert_equals_within_long >> func, value, LONG_TIMEOUT, allowed_exceptions=allowed_exceptions >> File "/usr/lib/python2.7/site-packages/ovirtlago/testlib.py", line >> 237, in assert_equals_within >> '%s != %s after %s seconds' % (res, value, timeout) >> 'False != True after 600 seconds >> >> >> http://jenkins.ovirt.org/job/ovirt-system-tests_standard-che >> ck-patch/363/artifact/exported-artifacts/check-patch.basic_ >> suite_master.el7.x86_64/004_basic_sanity.py.junit.xml >> >> >> >> -- >> >> GREG SHEREMETA >> >> SENIOR SOFTWARE ENGINEER - TEAM LEAD - RHV UX >> >> Red Hat NA >> >> >> >> gshereme at redhat.com IRC: gshereme >> >> >> _______________________________________________ >> Devel mailing list >> Devel at ovirt.org >> 
http://lists.ovirt.org/mailman/listinfo/devel >> > > > _______________________________________________ > Devel mailing list > Devel at ovirt.org > http://lists.ovirt.org/mailman/listinfo/devel > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mzamazal at redhat.com Mon Apr 23 14:07:14 2018 From: mzamazal at redhat.com (Milan Zamazal) Date: Mon, 23 Apr 2018 16:07:14 +0200 Subject: [ovirt-devel] Vdsm: Intent to create ovirt-4.2.3 branch Message-ID: <87vaci117s.fsf@redhat.com> Hi, since oVirt 4.2.3 has been stabilizing, we will make ovirt-4.2.3 Vdsm branch in a couple of days, once the next tag is created. This is in accordance with the stable branch policy outlined and proposed by Francesco[1]. Until the next Vdsm tag is created, we accept only Vdsm patches intended for oVirt 4.2.3 to ovirt-4.2 branch. After the next tag is created, we'll make separate ovirt-4.2.3 branch and accept all qualified Vdsm patches to ovirt-4.2 again. Then Vdsm patches for 4.2.3 must be additionally backported to the newly created ovirt-4.2.3 branch. Thanks, Milan Footnotes: [1] http://lists.ovirt.org/pipermail/devel/2018-March/032820.html From ebenahar at redhat.com Mon Apr 23 21:37:37 2018 From: ebenahar at redhat.com (Elad Ben Aharon) Date: Tue, 24 Apr 2018 00:37:37 +0300 Subject: [ovirt-devel] dynamic ownership changes In-Reply-To: <20180423125653.GA19987@Alexandra.local> References: <20180418081628.GA1704@Alexandra.local> <20180418111733.GA4702@Alexandra.local> <20180419120709.GB9449@Alexandra.local> <20180423125653.GA19987@Alexandra.local> Message-ID: I will update with the results of the next tier1 execution on latest 4.2.3 On Mon, Apr 23, 2018 at 3:56 PM, Martin Polednik wrote: > On 23/04/18 01:23 +0300, Elad Ben Aharon wrote: > >> Hi, I've triggered another execution [1] due to some issues I saw in the >> first which are not related to the patch. 
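The stabilization-branch flow Milan describes above can be sketched with plain git; the tag name, file, and commit messages below are invented for illustration:

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email dev@example.org
git config user.name 'oVirt Dev'
echo base > vdsm.spec
git add vdsm.spec && git commit -qm 'initial'
git branch -m ovirt-4.2              # the stable branch
git tag v4.20.27                     # "the next tag" (name invented)
git branch ovirt-4.2.3 v4.20.27      # stabilization branch cut at that tag
# a fix aimed at 4.2.3 lands on ovirt-4.2 first...
echo fix >> vdsm.spec
git commit -qam 'core: fix for 4.2.3'
fix_sha=$(git rev-parse HEAD)
# ...and is then additionally backported to ovirt-4.2.3
git checkout -q ovirt-4.2.3
git cherry-pick -x "$fix_sha" >/dev/null
git log --oneline -1
```

The `-x` flag records the origin commit in the backport's message, which makes it easy to audit which ovirt-4.2 patches made it into the 4.2.3 branch.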
>> >> The success rate is 78% which is low comparing to tier1 executions with >> code from downstream builds (95-100% success rates) [2]. >> > > Could you run the current master (without the dynamic_ownership patch) > so that we have viable comparision? > > From what I could see so far, there is an issue with move and copy >> operations to and from Gluster domains. For example [3]. >> >> The logs are attached. >> >> >> [1] >> *https://rhv-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/rhv >> -4.2-ge-runner-tier1-after-upgrade/7/testReport/ >> > -4.2-ge-runner-tier1-after-upgrade/7/testReport/>* >> >> >> >> [2] >> https://rhv-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/ >> >> rhv-4.2-ge-runner-tier1-after-upgrade/7/ >> >> >> >> [3] >> 2018-04-22 13:06:28,316+0300 INFO (jsonrpc/7) [vdsm.api] FINISH >> deleteImage error=Image does not exist in domain: >> 'image=cabb8846-7a4b-4244-9835-5f603e682f33, >> domain=e5fd29c8-52ba-467e-be09-ca40ff054dd4' >> from=: >> :ffff:10.35.161.182,40936, flow_id=disks_syncAction_ba6b2630-5976-4935, >> task_id=3d5f2a8a-881c-409e-93e9-aaa643c10e42 (api:51) >> 2018-04-22 13:06:28,317+0300 ERROR (jsonrpc/7) [storage.TaskManager.Task] >> (Task='3d5f2a8a-881c-409e-93e9-aaa643c10e42') Unexpected error (task:875) >> Traceback (most recent call last): >> File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 882, >> in >> _run >> return fn(*args, **kargs) >> File "", line 2, in deleteImage >> File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 49, in >> method >> ret = func(*args, **kwargs) >> File "/usr/lib/python2.7/site-packages/vdsm/storage/hsm.py", line 1503, >> in >> deleteImage >> raise se.ImageDoesNotExistInSD(imgUUID, sdUUID) >> ImageDoesNotExistInSD: Image does not exist in domain: >> 'image=cabb8846-7a4b-4244-9835-5f603e682f33, >> domain=e5fd29c8-52ba-467e-be09-ca40ff054dd4' >> >> 2018-04-22 13:06:28,317+0300 INFO (jsonrpc/7) [storage.TaskManager.Task] >> (Task='3d5f2a8a-881c-409e-93e9-aaa643c10e42') aborting: 
Task is aborted: >> "Image does not exist in domain: 'image=cabb8846-7a4b-4244-9835- >> 5f603e682f33, domain=e5fd29c8-52ba-467e-be09-ca40ff054dd4'" - code 268 >> (task:1181) >> 2018-04-22 13:06:28,318+0300 ERROR (jsonrpc/7) [storage.Dispatcher] FINISH >> deleteImage error=Image does not exist in domain: >> 'image=cabb8846-7a4b-4244-9835-5f603e682f33, >> domain=e5fd29c8-52ba-467e-be09 >> -ca40ff054d >> d4' (dispatcher:82) >> >> >> >> On Thu, Apr 19, 2018 at 5:34 PM, Elad Ben Aharon >> wrote: >> >> Triggered a sanity tier1 execution [1] using [2], which covers all the >>> requested areas, on iSCSI, NFS and Gluster. >>> I'll update with the results. >>> >>> [1] >>> https://rhv-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/view/4.2 >>> _dev/job/rhv-4.2-ge-flow-storage/1161/ >>> >>> [2] >>> https://gerrit.ovirt.org/#/c/89830/ >>> vdsm-4.30.0-291.git77aef9a.el7.x86_64 >>> >>> >>> >>> On Thu, Apr 19, 2018 at 3:07 PM, Martin Polednik >>> wrote: >>> >>> On 19/04/18 14:54 +0300, Elad Ben Aharon wrote: >>>> >>>> Hi Martin, >>>>> >>>>> I see [1] requires a rebase, can you please take care? >>>>> >>>>> >>>> Should be rebased. >>>> >>>> At the moment, our automation is stable only on iSCSI, NFS, Gluster and >>>> >>>>> FC. >>>>> Ceph is not supported and Cinder will be stabilized soon, AFAIR, it's >>>>> not >>>>> stable enough at the moment. >>>>> >>>>> >>>> That is still pretty good. >>>> >>>> >>>> [1] https://gerrit.ovirt.org/#/c/89830/ >>>> >>>>> >>>>> >>>>> Thanks >>>>> >>>>> On Wed, Apr 18, 2018 at 2:17 PM, Martin Polednik >>>> > >>>>> wrote: >>>>> >>>>> On 18/04/18 11:37 +0300, Elad Ben Aharon wrote: >>>>> >>>>>> >>>>>> Hi, sorry if I misunderstood, I waited for more input regarding what >>>>>> >>>>>>> areas >>>>>>> have to be tested here. >>>>>>> >>>>>>> >>>>>>> I'd say that you have quite a bit of freedom in this regard. 
>>>>>> GlusterFS >>>>>> should be covered by Dennis, so iSCSI/NFS/ceph/cinder with some suite >>>>>> that covers basic operations (start & stop VM, migrate it), snapshots >>>>>> and merging them, and whatever else would be important for storage >>>>>> sanity. >>>>>> >>>>>> mpolednik >>>>>> >>>>>> >>>>>> On Wed, Apr 18, 2018 at 11:16 AM, Martin Polednik < >>>>>> mpolednik at redhat.com >>>>>> > >>>>>> >>>>>> wrote: >>>>>>> >>>>>>> On 11/04/18 16:52 +0300, Elad Ben Aharon wrote: >>>>>>> >>>>>>> >>>>>>>> We can test this on iSCSI, NFS and GlusterFS. As for ceph and >>>>>>>> cinder, >>>>>>>> >>>>>>>> will >>>>>>>>> have to check, since usually, we don't execute our automation on >>>>>>>>> them. >>>>>>>>> >>>>>>>>> >>>>>>>>> Any update on this? I believe the gluster tests were successful, >>>>>>>>> OST >>>>>>>>> >>>>>>>> passes fine and unit tests pass fine, that makes the storage >>>>>>>> backends >>>>>>>> test the last required piece. >>>>>>>> >>>>>>>> >>>>>>>> On Wed, Apr 11, 2018 at 4:38 PM, Raz Tamir >>>>>>>> wrote: >>>>>>>> >>>>>>>> >>>>>>>> +Elad >>>>>>>>> >>>>>>>>> >>>>>>>>> On Wed, Apr 11, 2018 at 4:28 PM, Dan Kenigsberg >>>>>>>>> > >>>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>> On Wed, Apr 11, 2018 at 12:34 PM, Nir Soffer >>>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Wed, Apr 11, 2018 at 12:31 PM Eyal Edri >>>>>>>>>>> wrote: >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Please make sure to run as much OST suites on this patch as >>>>>>>>>>>> possible >>>>>>>>>>>> >>>>>>>>>>>> before merging ( using 'ci please build' ) >>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> But note that OST is not a way to verify the patch. >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>> Such changes require testing with all storage types we support. 
>>>>>>>>>>>> >>>>>>>>>>>> Nir >>>>>>>>>>>> >>>>>>>>>>>> On Tue, Apr 10, 2018 at 4:09 PM, Martin Polednik < >>>>>>>>>>>> mpolednik at redhat.com >>>>>>>>>>>> > >>>>>>>>>>>> >>>>>>>>>>>> wrote: >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>> Hey, >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> I've created a patch[0] that is finally able to activate >>>>>>>>>>>>>> libvirt's >>>>>>>>>>>>>> dynamic_ownership for VDSM while not negatively affecting >>>>>>>>>>>>>> functionality of our storage code. >>>>>>>>>>>>>> >>>>>>>>>>>>>> That of course comes with quite a bit of code removal, mostly >>>>>>>>>>>>>> in >>>>>>>>>>>>>> the >>>>>>>>>>>>>> area of host devices, hwrng and anything that touches devices; >>>>>>>>>>>>>> bunch >>>>>>>>>>>>>> of test changes and one XML generation caveat (storage is >>>>>>>>>>>>>> handled >>>>>>>>>>>>>> by >>>>>>>>>>>>>> VDSM, therefore disk relabelling needs to be disabled on the >>>>>>>>>>>>>> VDSM >>>>>>>>>>>>>> level). >>>>>>>>>>>>>> >>>>>>>>>>>>>> Because of the scope of the patch, I welcome >>>>>>>>>>>>>> storage/virt/network >>>>>>>>>>>>>> people to review the code and consider the implication this >>>>>>>>>>>>>> change >>>>>>>>>>>>>> has >>>>>>>>>>>>>> on current/future features. >>>>>>>>>>>>>> >>>>>>>>>>>>>> [0] https://gerrit.ovirt.org/#/c/89830/ >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> In particular: dynamic_ownership was set to 0 prehistorically >>>>>>>>>>>>>> (as >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>> part >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> of https://bugzilla.redhat.com/show_bug.cgi?id=554961 ) because >>>>>>>>>>> libvirt, >>>>>>>>>>> running as root, was not able to play properly with root-squash >>>>>>>>>>> nfs >>>>>>>>>>> mounts. >>>>>>>>>>> >>>>>>>>>>> Have you attempted this use case? >>>>>>>>>>> >>>>>>>>>>> I join to Nir's request to run this with storage QE. 
>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> -- >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Raz Tamir >>>>>>>>>> Manager, RHV QE >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bkorren at redhat.com Tue Apr 24 04:11:39 2018 From: bkorren at redhat.com (Barak Korren) Date: Tue, 24 Apr 2018 07:11:39 +0300 Subject: [ovirt-devel] Invitation: oVirt STDCI v2 deep dive @ Thu Apr 26, 2018 11:00 - 12:00 (IDT) (devel@ovirt.org) In-Reply-To: References: <0000000000000bdd5f056a7fef2e@google.com> Message-ID: On 23 April 2018 at 14:36, Sandro Bonazzola wrote: > From past experience, sending calendar invitation to ovirt mailing lists > doesn't work well. see https://lists.ovirt.org/pi > permail/users/2018-March/087616.html for reference. > I agree the experience is less then optional - I do seem to get confirmation emails, so some people manage to use it anyway. > I would recommend to track this on https://ovirt.org/events/ by adding > the event following https://github.com/OSAS/rh-events/wiki/Adding-and- > modifying-events > > Are you sure that is the right place for this? that repo seems to list events at the scale of conferences... > I would also recommend to send personal invitation to oVirt team leads to > be sure they see it. > Sure of you can give me a list... > > If you need to track who's going to join, I would recommend a ticketing > system like eventbrite. > Not sure I want to force people to use yet another 3rd party platform for the benefit of me having some tracking information. Do people prefer this? > > 2018-04-23 10:36 GMT+02:00 : > >> more details ? >> >> oVirt STDCI v2 deep dive >> >> *When* >> Thu Apr 26, 2018 11:00 ? 12:00 Jerusalem >> *Where* >> raanana-04-asia-8-p-vc; https://bluejeans.com/8705030462 (map >> >> ) >> *Calendar* >> devel at ovirt.org >> *Who* >> ? >> bkorren at redhat.com - organizer >> ? 
>> devel at ovirt.org >> >> Introduction to the 2nd version of oVirt's CI standard - What is it, what >> can it do, how to use it and how does it work. >> >> BJ link: >> https://bluejeans.com/8705030462 >> >> >> Going? *Yes >> >> - Maybe >> >> - No >> * >> more options ? >> >> >> Invitation from Google Calendar >> >> You are receiving this courtesy email at the account devel at ovirt.org >> because you are an attendee of this event. >> >> To stop receiving future updates for this event, decline this event. >> Alternatively you can sign up for a Google account at >> https://www.google.com/calendar/ and control your notification settings >> for your entire calendar. >> >> Forwarding this invitation could allow any recipient to modify your RSVP >> response. Learn More >> . >> >> _______________________________________________ >> Devel mailing list >> Devel at ovirt.org >> http://lists.ovirt.org/mailman/listinfo/devel >> > > > > -- > > SANDRO BONAZZOLA > > ASSOCIATE MANAGER, SOFTWARE ENGINEERING, EMEA ENG VIRTUALIZATION R&D > > Red Hat EMEA > > sbonazzo at redhat.com > > > -- Barak Korren RHV DevOps team , RHCE, RHCi Red Hat EMEA redhat.com | TRIED. TESTED. TRUSTED. | redhat.com/trusted -------------- next part -------------- An HTML attachment was scrubbed... URL: From sbonazzo at redhat.com Tue Apr 24 07:27:39 2018 From: sbonazzo at redhat.com (Sandro Bonazzola) Date: Tue, 24 Apr 2018 09:27:39 +0200 Subject: [ovirt-devel] Invitation: oVirt STDCI v2 deep dive @ Thu Apr 26, 2018 11:00 - 12:00 (IDT) (devel@ovirt.org) In-Reply-To: References: <0000000000000bdd5f056a7fef2e@google.com> Message-ID: 2018-04-24 6:11 GMT+02:00 Barak Korren : > > > On 23 April 2018 at 14:36, Sandro Bonazzola wrote: > >> From past experience, sending calendar invitation to ovirt mailing lists >> doesn't work well. see https://lists.ovirt.org/pi >> permail/users/2018-March/087616.html for reference. 
>> > > I agree the experience is less then optional - I do seem to get > confirmation emails, so some people manage to use it anyway. > > >> I would recommend to track this on https://ovirt.org/events/ by adding >> the event following https://github.com/OSAS/rh-events/wiki/Adding-and- >> modifying-events >> >> > Are you sure that is the right place for this? that repo seems to list > events at the scale of conferences... > Well, this is a kind of conference with just 1 talk :-) > > >> I would also recommend to send personal invitation to oVirt team leads to >> be sure they see it. >> > > Sure of you can give me a list... > Sandro Bonazzola - Integration / Release Engineering Ryan Barry - Node Michal Skrivanek - Virtualization Shirly Radco - Metrics / Data Warehouse Tal Nisan - Storage Martin Sivak - SLA Eyal Edri - Project infrastructure Martin Perina - Infra Dan Kenigsberg - Network Sahina Bose - Gluster Tomas Jelinek - UX I'm missing a reference person for other teams listed in https://ovirt.org/develop/#ovirt-teams: Docs I18N Marketing Spice > > >> >> If you need to track who's going to join, I would recommend a ticketing >> system like eventbrite. >> > > Not sure I want to force people to use yet another 3rd party platform for > the benefit of me having some tracking information. Do people prefer this? > > >> >> 2018-04-23 10:36 GMT+02:00 : >> >>> more details ? >>> >>> oVirt STDCI v2 deep dive >>> >>> *When* >>> Thu Apr 26, 2018 11:00 ? 12:00 Jerusalem >>> *Where* >>> raanana-04-asia-8-p-vc; https://bluejeans.com/8705030462 (map >>> >>> ) >>> *Calendar* >>> devel at ovirt.org >>> *Who* >>> ? >>> bkorren at redhat.com - organizer >>> ? >>> devel at ovirt.org >>> >>> Introduction to the 2nd version of oVirt's CI standard - What is it, >>> what can it do, how to use it and how does it work. >>> >>> BJ link: >>> https://bluejeans.com/8705030462 >>> >>> >>> Going? *Yes >>> >>> - Maybe >>> >>> - No >>> * >>> more options ? 
>>> >>> >>> Invitation from Google Calendar >>> >>> You are receiving this courtesy email at the account devel at ovirt.org >>> because you are an attendee of this event. >>> >>> To stop receiving future updates for this event, decline this event. >>> Alternatively you can sign up for a Google account at >>> https://www.google.com/calendar/ and control your notification settings >>> for your entire calendar. >>> >>> Forwarding this invitation could allow any recipient to modify your RSVP >>> response. Learn More >>> . >>> >>> _______________________________________________ >>> Devel mailing list >>> Devel at ovirt.org >>> http://lists.ovirt.org/mailman/listinfo/devel >>> >> >> >> >> -- >> >> SANDRO BONAZZOLA >> >> ASSOCIATE MANAGER, SOFTWARE ENGINEERING, EMEA ENG VIRTUALIZATION R&D >> >> Red Hat EMEA >> >> sbonazzo at redhat.com >> >> >> > > > > -- > Barak Korren > RHV DevOps team , RHCE, RHCi > Red Hat EMEA > redhat.com | TRIED. TESTED. TRUSTED. | redhat.com/trusted > -- SANDRO BONAZZOLA ASSOCIATE MANAGER, SOFTWARE ENGINEERING, EMEA ENG VIRTUALIZATION R&D Red Hat EMEA sbonazzo at redhat.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From bkorren at redhat.com Tue Apr 24 08:15:12 2018 From: bkorren at redhat.com (bkorren at redhat.com) Date: Tue, 24 Apr 2018 08:15:12 +0000 Subject: [ovirt-devel] Updated invitation: oVirt STDCI v2 deep dive @ Thu May 3, 2018 11:00 - 12:00 (IDT) (devel@ovirt.org) Message-ID: <000000000000dea0a9056a93c022@google.com> This event has been changed. Title: oVirt STDCI v2 deep dive Introduction to the 2nd version of oVirt's CI standard - What is it, what can it do, how to use it and how does it work. 
Note: Meeting was moved To join the Meeting: https://bluejeans.com/8705030462 To join via Room System: Video Conferencing System: redhat.bjn.vc -or-199.48.152.18 Meeting ID : 8705030462 To join via phone : 1) Dial: 408-915-6466 (United States) (see all numbers - https://www.redhat.com/en/conference-numbers) 2) Enter Conference ID : 8705030462 RSVP at: https://www.eventbrite.com/e/ovirt-stdci-v2-deep-dive-tickets-45468120372 (changed) When: Thu May 3, 2018 11:00 ? 12:00 Jerusalem (changed) Where: ; https://bluejeans.com/8705030462, raanana-04-asia-8-p-vc (changed) Calendar: devel at ovirt.org Who: * bkorren at redhat.com - organizer * devel at ovirt.org * sbonazzo at redhat.com * dkenigsb at redhat.com * rbarry at redhat.com * mskrivan at redhat.com * eedri at redhat.com * tjelinek at redhat.com * tnisan at redhat.com * mperina at redhat.com * sradco at redhat.com * msivak at redhat.com * sabose at redhat.com Event details: https://www.google.com/calendar/event?action=VIEW&eid=MHBvcmNoYWQ5YmtzNmlmdWw2M25jNzM5djMgZGV2ZWxAb3ZpcnQub3Jn&tok=MTgjYmtvcnJlbkByZWRoYXQuY29tODBiMjViYzFjZmZhYWYzMmJiNmNlNWU3NTA3OGRjOGQwYmJiNTBhOA&ctz=Asia%2FJerusalem&hl=en&es=0 Invitation from Google Calendar: https://www.google.com/calendar/ You are receiving this courtesy email at the account devel at ovirt.org because you are an attendee of this event. To stop receiving future updates for this event, decline this event. Alternatively you can sign up for a Google account at https://www.google.com/calendar/ and control your notification settings for your entire calendar. Forwarding this invitation could allow any recipient to modify your RSVP response. Learn more at https://support.google.com/calendar/answer/37135#forwarding -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: text/calendar Size: 3736 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: invite.ics Type: application/ics Size: 3799 bytes Desc: not available URL: From msivak at redhat.com Tue Apr 24 08:22:28 2018 From: msivak at redhat.com (Martin Sivak) Date: Tue, 24 Apr 2018 10:22:28 +0200 Subject: [ovirt-devel] Invitation: oVirt STDCI v2 deep dive @ Thu Apr 26, 2018 11:00 - 12:00 (IDT) (devel@ovirt.org) In-Reply-To: References: <0000000000000bdd5f056a7fef2e@google.com> Message-ID: Hi, I did see it already and added it to the team calendar. I wonder if it forwards the confirmations back to you. Martin On Tue, Apr 24, 2018 at 9:27 AM, Sandro Bonazzola wrote: > > > 2018-04-24 6:11 GMT+02:00 Barak Korren : > >> >> >> On 23 April 2018 at 14:36, Sandro Bonazzola wrote: >> >>> From past experience, sending calendar invitation to ovirt mailing lists >>> doesn't work well. see https://lists.ovirt.org/pi >>> permail/users/2018-March/087616.html for reference. >>> >> >> I agree the experience is less then optional - I do seem to get >> confirmation emails, so some people manage to use it anyway. >> >> >>> I would recommend to track this on https://ovirt.org/events/ by adding >>> the event following https://github.com/OSAS/rh-events/wiki/Adding-and- >>> modifying-events >>> >>> >> Are you sure that is the right place for this? that repo seems to list >> events at the scale of conferences... >> > > Well, this is a kind of conference with just 1 talk :-) > > >> >> >>> I would also recommend to send personal invitation to oVirt team leads >>> to be sure they see it. >>> >> >> Sure of you can give me a list... 
>> > > Sandro Bonazzola - Integration / Release Engineering > Ryan Barry - Node > Michal Skrivanek - Virtualization > Shirly Radco - Metrics / Data Warehouse > Tal Nisan - Storage > Martin Sivak - SLA > Eyal Edri - Project infrastructure > Martin Perina - Infra > Dan Kenigsberg - Network > Sahina Bose - Gluster > Tomas Jelinek - UX > > I'm missing a reference person for other teams listed in > https://ovirt.org/develop/#ovirt-teams: > Docs > I18N > Marketing > Spice > > > > >> >> >>> >>> If you need to track who's going to join, I would recommend a ticketing >>> system like eventbrite. >>> >> >> Not sure I want to force people to use yet another 3rd party platform for >> the benefit of me having some tracking information. Do people prefer this? >> >> >>> >>> 2018-04-23 10:36 GMT+02:00 : >>> >>>> more details ? >>>> >>>> oVirt STDCI v2 deep dive >>>> >>>> *When* >>>> Thu Apr 26, 2018 11:00 ? 12:00 Jerusalem >>>> *Where* >>>> raanana-04-asia-8-p-vc; https://bluejeans.com/8705030462 (map >>>> >>>> ) >>>> *Calendar* >>>> devel at ovirt.org >>>> *Who* >>>> ? >>>> bkorren at redhat.com - organizer >>>> ? >>>> devel at ovirt.org >>>> >>>> Introduction to the 2nd version of oVirt's CI standard - What is it, >>>> what can it do, how to use it and how does it work. >>>> >>>> BJ link: >>>> https://bluejeans.com/8705030462 >>>> >>>> >>>> Going? *Yes >>>> >>>> - Maybe >>>> >>>> - No >>>> * >>>> more options ? >>>> >>>> >>>> Invitation from Google Calendar >>>> >>>> You are receiving this courtesy email at the account devel at ovirt.org >>>> because you are an attendee of this event. >>>> >>>> To stop receiving future updates for this event, decline this event. >>>> Alternatively you can sign up for a Google account at >>>> https://www.google.com/calendar/ and control your notification >>>> settings for your entire calendar. >>>> >>>> Forwarding this invitation could allow any recipient to modify your >>>> RSVP response. Learn More >>>> . 
>>>> >>>> _______________________________________________ >>>> Devel mailing list >>>> Devel at ovirt.org >>>> http://lists.ovirt.org/mailman/listinfo/devel >>>> >>> >>> >>> >>> -- >>> >>> SANDRO BONAZZOLA >>> >>> ASSOCIATE MANAGER, SOFTWARE ENGINEERING, EMEA ENG VIRTUALIZATION R&D >>> >>> Red Hat EMEA >>> >>> sbonazzo at redhat.com >>> >>> >>> >> >> >> >> -- >> Barak Korren >> RHV DevOps team , RHCE, RHCi >> Red Hat EMEA >> redhat.com | TRIED. TESTED. TRUSTED. | redhat.com/trusted >> > > > > -- > > SANDRO BONAZZOLA > > ASSOCIATE MANAGER, SOFTWARE ENGINEERING, EMEA ENG VIRTUALIZATION R&D > > Red Hat EMEA > > sbonazzo at redhat.com > > > > _______________________________________________ > Devel mailing list > Devel at ovirt.org > http://lists.ovirt.org/mailman/listinfo/devel > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bkorren at redhat.com Tue Apr 24 08:43:03 2018 From: bkorren at redhat.com (Barak Korren) Date: Tue, 24 Apr 2018 11:43:03 +0300 Subject: [ovirt-devel] Invitation: oVirt STDCI v2 deep dive @ Thu Apr 26, 2018 11:00 - 12:00 (IDT) (devel@ovirt.org) In-Reply-To: References: <0000000000000bdd5f056a7fef2e@google.com> Message-ID: On 24 April 2018 at 11:22, Martin Sivak wrote: > Hi, > > I did see it already and added it to the team calendar. I wonder if it > forwards the confirmations back to you. > It forwards some of them back as if they were made by devel at ovirt.org - so not very useful for actual RSVP tracking... Please note that I've just moved it. > > Martin > > On Tue, Apr 24, 2018 at 9:27 AM, Sandro Bonazzola > wrote: > >> >> >> 2018-04-24 6:11 GMT+02:00 Barak Korren : >> >>> >>> >>> On 23 April 2018 at 14:36, Sandro Bonazzola wrote: >>> >>>> From past experience, sending calendar invitation to ovirt mailing >>>> lists doesn't work well. see https://lists.ovirt.org/pi >>>> permail/users/2018-March/087616.html for reference. 
>>>> >>> >>> I agree the experience is less then optional - I do seem to get >>> confirmation emails, so some people manage to use it anyway. >>> >>> >>>> I would recommend to track this on https://ovirt.org/events/ by adding >>>> the event following https://github.com/OSAS/rh-events/wiki/Adding-and- >>>> modifying-events >>>> >>>> >>> Are you sure that is the right place for this? that repo seems to list >>> events at the scale of conferences... >>> >> >> Well, this is a kind of conference with just 1 talk :-) >> >> >>> >>> >>>> I would also recommend to send personal invitation to oVirt team leads >>>> to be sure they see it. >>>> >>> >>> Sure of you can give me a list... >>> >> >> Sandro Bonazzola - Integration / Release >> Engineering >> Ryan Barry - Node >> Michal Skrivanek - Virtualization >> Shirly Radco - Metrics / Data Warehouse >> Tal Nisan - Storage >> Martin Sivak - SLA >> Eyal Edri - Project infrastructure >> Martin Perina - Infra >> Dan Kenigsberg - Network >> Sahina Bose - Gluster >> Tomas Jelinek - UX >> >> I'm missing a reference person for other teams listed in >> https://ovirt.org/develop/#ovirt-teams: >> Docs >> I18N >> Marketing >> Spice >> >> >> >> >>> >>> >>>> >>>> If you need to track who's going to join, I would recommend a ticketing >>>> system like eventbrite. >>>> >>> >>> Not sure I want to force people to use yet another 3rd party platform >>> for the benefit of me having some tracking information. Do people prefer >>> this? >>> >>> >>>> >>>> 2018-04-23 10:36 GMT+02:00 : >>>> >>>>> more details ? >>>>> >>>>> oVirt STDCI v2 deep dive >>>>> >>>>> *When* >>>>> Thu Apr 26, 2018 11:00 ? 12:00 Jerusalem >>>>> *Where* >>>>> raanana-04-asia-8-p-vc; https://bluejeans.com/8705030462 (map >>>>> >>>>> ) >>>>> *Calendar* >>>>> devel at ovirt.org >>>>> *Who* >>>>> ? >>>>> bkorren at redhat.com - organizer >>>>> ? 
>>>>> devel at ovirt.org >>>>> >>>>> Introduction to the 2nd version of oVirt's CI standard - What is it, >>>>> what can it do, how to use it and how does it work. >>>>> >>>>> BJ link: >>>>> https://bluejeans.com/8705030462 >>>>> >>>>> >>>>> Going? *Yes >>>>> >>>>> - Maybe >>>>> >>>>> - No >>>>> * >>>>> more options ? >>>>> >>>>> >>>>> Invitation from Google Calendar >>>>> >>>>> You are receiving this courtesy email at the account devel at ovirt.org >>>>> because you are an attendee of this event. >>>>> >>>>> To stop receiving future updates for this event, decline this event. >>>>> Alternatively you can sign up for a Google account at >>>>> https://www.google.com/calendar/ and control your notification >>>>> settings for your entire calendar. >>>>> >>>>> Forwarding this invitation could allow any recipient to modify your >>>>> RSVP response. Learn More >>>>> . >>>>> >>>>> _______________________________________________ >>>>> Devel mailing list >>>>> Devel at ovirt.org >>>>> http://lists.ovirt.org/mailman/listinfo/devel >>>>> >>>> >>>> >>>> >>>> -- >>>> >>>> SANDRO BONAZZOLA >>>> >>>> ASSOCIATE MANAGER, SOFTWARE ENGINEERING, EMEA ENG VIRTUALIZATION R&D >>>> >>>> Red Hat EMEA >>>> >>>> sbonazzo at redhat.com >>>> >>>> >>>> >>> >>> >>> >>> -- >>> Barak Korren >>> RHV DevOps team , RHCE, RHCi >>> Red Hat EMEA >>> redhat.com | TRIED. TESTED. TRUSTED. | redhat.com/trusted >>> >> >> >> >> -- >> >> SANDRO BONAZZOLA >> >> ASSOCIATE MANAGER, SOFTWARE ENGINEERING, EMEA ENG VIRTUALIZATION R&D >> >> Red Hat EMEA >> >> sbonazzo at redhat.com >> >> >> >> _______________________________________________ >> Devel mailing list >> Devel at ovirt.org >> http://lists.ovirt.org/mailman/listinfo/devel >> > > -- Barak Korren RHV DevOps team , RHCE, RHCi Red Hat EMEA redhat.com | TRIED. TESTED. TRUSTED. | redhat.com/trusted -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From bkorren at redhat.com Tue Apr 24 08:44:55 2018 From: bkorren at redhat.com (Barak Korren) Date: Tue, 24 Apr 2018 11:44:55 +0300 Subject: [ovirt-devel] Invitation: oVirt STDCI v2 deep dive @ Thu Apr 26, 2018 11:00 - 12:00 (IDT) (devel@ovirt.org) In-Reply-To: References: <0000000000000bdd5f056a7fef2e@google.com> Message-ID: On 23 April 2018 at 14:43, Greg Sheremeta wrote: > Is this the only meeting planned? 4am US time, I'll have to get up a few > mins early :) > > Terribly sorry, but it's really hard to find a time that will be suited to a diverse group such as the oVirt developers. The talk will be recorded, and I will consider re-doing it at another time if there is demand. > > On Mon, Apr 23, 2018 at 7:36 AM, Sandro Bonazzola > wrote: > >> From past experience, sending calendar invitation to ovirt mailing lists >> doesn't work well. see https://lists.ovirt.org/pi >> permail/users/2018-March/087616.html for reference. >> >> I would recommend to track this on https://ovirt.org/events/ by adding >> the event following https://github.com/OSAS/rh-events/wiki/Adding-and- >> modifying-events >> >> I would also recommend to send personal invitation to oVirt team leads to >> be sure they see it. >> >> If you need to track who's going to join, I would recommend a ticketing >> system like eventbrite. >> >> 2018-04-23 10:36 GMT+02:00 : >> >>> more details ? >>> >>> oVirt STDCI v2 deep dive >>> >>> *When* >>> Thu Apr 26, 2018 11:00 ? 12:00 Jerusalem >>> *Where* >>> raanana-04-asia-8-p-vc; https://bluejeans.com/8705030462 (map >>> >>> ) >>> *Calendar* >>> devel at ovirt.org >>> *Who* >>> ? >>> bkorren at redhat.com - organizer >>> ? >>> devel at ovirt.org >>> >>> Introduction to the 2nd version of oVirt's CI standard - What is it, >>> what can it do, how to use it and how does it work. >>> >>> BJ link: >>> https://bluejeans.com/8705030462 >>> >>> >>> Going? *Yes >>> >>> - Maybe >>> >>> - No >>> * >>> more options ?
>>> >>> >>> Invitation from Google Calendar >>> >>> You are receiving this courtesy email at the account devel at ovirt.org >>> because you are an attendee of this event. >>> >>> To stop receiving future updates for this event, decline this event. >>> Alternatively you can sign up for a Google account at >>> https://www.google.com/calendar/ and control your notification settings >>> for your entire calendar. >>> >>> Forwarding this invitation could allow any recipient to modify your RSVP >>> response. Learn More >>> . >>> >>> _______________________________________________ >>> Devel mailing list >>> Devel at ovirt.org >>> http://lists.ovirt.org/mailman/listinfo/devel >>> >> >> >> >> -- >> >> SANDRO BONAZZOLA >> >> ASSOCIATE MANAGER, SOFTWARE ENGINEERING, EMEA ENG VIRTUALIZATION R&D >> >> Red Hat EMEA >> >> sbonazzo at redhat.com >> >> >> >> _______________________________________________ >> Devel mailing list >> Devel at ovirt.org >> http://lists.ovirt.org/mailman/listinfo/devel >> > > > -- Barak Korren RHV DevOps team , RHCE, RHCi Red Hat EMEA redhat.com | TRIED. TESTED. TRUSTED. | redhat.com/trusted -------------- next part -------------- An HTML attachment was scrubbed... URL: From danken at redhat.com Tue Apr 24 11:00:04 2018 From: danken at redhat.com (Dan Kenigsberg) Date: Tue, 24 Apr 2018 14:00:04 +0300 Subject: [ovirt-devel] [ OST Failure Report ] [ oVirt 4.2 ] [ 2018-04-04 ] [006_migrations.prepare_migration_attachments_ipv6] In-Reply-To: References: Message-ID: Ravi's patch is in, but a similar problem remains, and the test cannot be put back into its place. It seems that while Vdsm was taken down, a couple of getCapsAsync requests queued up. At one point, the host resumed its connection, before the requests have been cleared of the queue. After the host is up, the following tests resume, and at a pseudorandom point in time, an old getCapsAsync request times out and kills our connection. 
I believe that as long as ANY request is in flight, the monitoring lock should not be released, and the host should not be declared as up. On Wed, Apr 11, 2018 at 1:04 AM, Ravi Shankar Nori wrote: > This [1] should fix the multiple release lock issue > > [1] https://gerrit.ovirt.org/#/c/90077/ > > On Tue, Apr 10, 2018 at 3:53 PM, Ravi Shankar Nori wrote: >> >> Working on a patch will post a fix >> >> Thanks >> >> Ravi >> >> On Tue, Apr 10, 2018 at 9:14 AM, Alona Kaplan wrote: >>> >>> Hi all, >>> >>> Looking at the log it seems that the new GetCapabilitiesAsync is >>> responsible for the mess. >>> >>> - 08:29:47 - engine loses connectivity to host >>> 'lago-basic-suite-4-2-host-0'. >>> >>> - Every 3 seconds a getCapabilitiesAsync request is sent to the host >>> (unsuccessfully). >>> >>> * before each "getCapabilitiesAsync" the monitoring lock is taken >>> (VdsManager,refreshImpl) >>> >>> * "getCapabilitiesAsync" immediately fails and throws >>> 'VDSNetworkException: java.net.ConnectException: Connection refused'. The >>> exception is caught by >>> 'GetCapabilitiesAsyncVDSCommand.executeVdsBrokerCommand' which calls >>> 'onFailure' of the callback and re-throws the exception. >>> >>> catch (Throwable t) { >>> getParameters().getCallback().onFailure(t); >>> throw t; >>> } >>> >>> * The 'onFailure' of the callback releases the "monitoringLock" >>> ('postProcessRefresh()->afterRefreshTreatment()-> if (!succeeded) >>> lockManager.releaseLock(monitoringLock);') >>> >>> * 'VdsManager,refreshImpl' catches the network exception, marks >>> 'releaseLock = true' and tries to release the already released lock. >>> >>> The following warning is printed to the log - >>> >>> WARN [org.ovirt.engine.core.bll.lock.InMemoryLockManager] >>> (EE-ManagedThreadFactory-engineScheduled-Thread-53) [] Trying to release >>> exclusive lock which does not exist, lock key: >>> 'ecf53d69-eb68-4b11-8df2-c4aa4e19bd93VDS_INIT'
>>> >>> - 08:30:51 a successful getCapabilitiesAsync is sent. >>> >>> - 08:32:55 - The failing test starts (Setup Networks for setting ipv6). >>> >>> >>> * SetupNetworks takes the monitoring lock. >>> >>> - 08:33:00 - ResponseTracker cleans the getCapabilitiesAsync requests >>> from 4 minutes ago from its queue and prints a VDSNetworkException: Vds >>> timeout occurred. >>> >>> * When the first request is removed from the queue >>> ('ResponseTracker.remove()'), the 'Callback.onFailure' is invoked (for the >>> second time) -> monitoring lock is released (the lock taken by the >>> SetupNetworks!). >>> >>> * The other requests removed from the queue also try to release the >>> monitoring lock, but there is nothing to release. >>> >>> * The following warning log is printed - >>> WARN [org.ovirt.engine.core.bll.lock.InMemoryLockManager] >>> (EE-ManagedThreadFactory-engineScheduled-Thread-14) [] Trying to release >>> exclusive lock which does not exist, lock key: >>> 'ecf53d69-eb68-4b11-8df2-c4aa4e19bd93VDS_INIT' >>> >>> - 08:33:00 - SetupNetwork fails on Timeout ~4 seconds after it is started. >>> Why? I'm not 100% sure, but I guess the late processing of the >>> 'getCapabilitiesAsync' that causes the loss of the monitoring lock and the >>> late + multiple processing of failure is the root cause. >>> >>> >>> Ravi, 'getCapabilitiesAsync' failure is treated twice and the lock is >>> trying to be released three times. Please share your opinion regarding how >>> it should be fixed. >>> >>> >>> Thanks, >>> >>> Alona. >>> >>> >>> >>> >>> >>> >>> On Sun, Apr 8, 2018 at 1:21 PM, Dan Kenigsberg wrote: >>>> >>>> On Sun, Apr 8, 2018 at 9:21 AM, Edward Haas wrote: >>>>> >>>>> >>>>> >>>>> On Sun, Apr 8, 2018 at 9:15 AM, Eyal Edri wrote: >>>>>> >>>>>> Was already done by Yaniv - https://gerrit.ovirt.org/#/c/89851. >>>>>> Is it still failing?
>>>>>> >>>>>> On Sun, Apr 8, 2018 at 8:59 AM, Barak Korren >>>>>> wrote: >>>>>>> >>>>>>> On 7 April 2018 at 00:30, Dan Kenigsberg wrote: >>>>>>> > No, I am afraid that we have not managed to understand why setting >>>>>>> > an >>>>>>> > ipv6 address took the host off the grid. We shall continue >>>>>>> > researching >>>>>>> > this next week. >>>>>>> > >>>>>>> > Edy, https://gerrit.ovirt.org/#/c/88637/ is already 4 weeks old, >>>>>>> > but >>>>>>> > could it possibly be related (I really doubt that)? >>>>>>> > >>>>> >>>>> >>>>> Sorry, but I do not see how this problem is related to VDSM. >>>>> There is nothing that indicates that there is a VDSM problem. >>>>> >>>>> Has the RPC connection between Engine and VDSM failed? >>>>> >>>> >>>> Further up the thread, Piotr noticed (at least on one failure of >>>> this test) that the Vdsm host lost connectivity to its storage, and the Vdsm >>>> process was restarted. However, this does not seem to happen in all cases >>>> where this test fails. >>>> >>>> _______________________________________________ >>>> Devel mailing list >>>> Devel at ovirt.org >>>> http://lists.ovirt.org/mailman/listinfo/devel >>> >>> >> > From ykaul at redhat.com Tue Apr 24 12:14:19 2018 From: ykaul at redhat.com (Yaniv Kaul) Date: Tue, 24 Apr 2018 15:14:19 +0300 Subject: [ovirt-devel] [ OST Failure Report ] [ oVirt 4.2 ] [ 2018-04-04 ] [006_migrations.prepare_migration_attachments_ipv6] In-Reply-To: References: Message-ID: On Tue, Apr 24, 2018 at 2:00 PM, Dan Kenigsberg wrote: > Ravi's patch is in, but a similar problem remains, and the test cannot > be put back into its place. > > It seems that while Vdsm was taken down, a couple of getCapsAsync - Why is VDSM down? Y. > requests queued up. At one point, the host resumed its connection, > before the requests have been cleared of the queue. After the host is > up, the following tests resume, and at a pseudorandom point in time, > an old getCapsAsync request times out and kills our connection.
> > I believe that as long as ANY request is on flight, the monitoring > lock should not be released, and the host should not be declared as > up. > > > On Wed, Apr 11, 2018 at 1:04 AM, Ravi Shankar Nori > wrote: > > This [1] should fix the multiple release lock issue > > > > [1] https://gerrit.ovirt.org/#/c/90077/ > > > > On Tue, Apr 10, 2018 at 3:53 PM, Ravi Shankar Nori > wrote: > >> > >> Working on a patch will post a fix > >> > >> Thanks > >> > >> Ravi > >> > >> On Tue, Apr 10, 2018 at 9:14 AM, Alona Kaplan > wrote: > >>> > >>> Hi all, > >>> > >>> Looking at the log it seems that the new GetCapabilitiesAsync is > >>> responsible for the mess. > >>> > >>> - 08:29:47 - engine loses connectivity to host > >>> 'lago-basic-suite-4-2-host-0'. > >>> > >>> - Every 3 seconds a getCapabalititiesAsync request is sent to the host > >>> (unsuccessfully). > >>> > >>> * before each "getCapabilitiesAsync" the monitoring lock is taken > >>> (VdsManager,refreshImpl) > >>> > >>> * "getCapabilitiesAsync" immediately fails and throws > >>> 'VDSNetworkException: java.net.ConnectException: Connection refused'. > The > >>> exception is caught by > >>> 'GetCapabilitiesAsyncVDSCommand.executeVdsBrokerCommand' which calls > >>> 'onFailure' of the callback and re-throws the exception. > >>> > >>> catch (Throwable t) { > >>> getParameters().getCallback().onFailure(t); > >>> throw t; > >>> } > >>> > >>> * The 'onFailure' of the callback releases the "monitoringLock" > >>> ('postProcessRefresh()->afterRefreshTreatment()-> if (!succeeded) > >>> lockManager.releaseLock(monitoringLock);') > >>> > >>> * 'VdsManager,refreshImpl' catches the network exception, marks > >>> 'releaseLock = true' and tries to release the already released lock. 
> >>> > >>> The following warning is printed to the log - > >>> > >>> WARN [org.ovirt.engine.core.bll.lock.InMemoryLockManager] > >>> (EE-ManagedThreadFactory-engineScheduled-Thread-53) [] Trying to > release > >>> exclusive lock which does not exist, lock key: > >>> 'ecf53d69-eb68-4b11-8df2-c4aa4e19bd93VDS_INIT' > >>> > >>> > >>> - 08:30:51 a successful getCapabilitiesAsync is sent. > >>> > >>> - 08:32:55 - The failing test starts (Setup Networks for setting ipv6). > >>> > >>> > >>> * SetupNetworks takes the monitoring lock. > >>> > >>> - 08:33:00 - ResponseTracker cleans the getCapabilitiesAsync requests > >>> from 4 minutes ago from its queue and prints a VDSNetworkException: Vds > >>> timeout occured. > >>> > >>> * When the first request is removed from the queue > >>> ('ResponseTracker.remove()'), the 'Callback.onFailure' is invoked (for > the > >>> second time) -> monitoring lock is released (the lock taken by the > >>> SetupNetworks!). > >>> > >>> * The other requests removed from the queue also try to release > the > >>> monitoring lock, but there is nothing to release. > >>> > >>> * The following warning log is printed - > >>> WARN [org.ovirt.engine.core.bll.lock.InMemoryLockManager] > >>> (EE-ManagedThreadFactory-engineScheduled-Thread-14) [] Trying to > release > >>> exclusive lock which does not exist, lock key: > >>> 'ecf53d69-eb68-4b11-8df2-c4aa4e19bd93VDS_INIT' > >>> > >>> - 08:33:00 - SetupNetwork fails on Timeout ~4 seconds after is started. > >>> Why? I'm not 100% sure but I guess the late processing of the > >>> 'getCapabilitiesAsync' that causes losing of the monitoring lock and > the > >>> late + mupltiple processing of failure is root cause. > >>> > >>> > >>> Ravi, 'getCapabilitiesAsync' failure is treated twice and the lock is > >>> trying to be released three times. Please share your opinion regarding > how > >>> it should be fixed. > >>> > >>> > >>> Thanks, > >>> > >>> Alona. 
> >>> > >>> > >>> > >>> > >>> > >>> > >>> On Sun, Apr 8, 2018 at 1:21 PM, Dan Kenigsberg > wrote: > >>>> > >>>> On Sun, Apr 8, 2018 at 9:21 AM, Edward Haas wrote: > >>>>> > >>>>> > >>>>> > >>>>> On Sun, Apr 8, 2018 at 9:15 AM, Eyal Edri wrote: > >>>>>> > >>>>>> Was already done by Yaniv - https://gerrit.ovirt.org/#/c/89851. > >>>>>> Is it still failing? > >>>>>> > >>>>>> On Sun, Apr 8, 2018 at 8:59 AM, Barak Korren > >>>>>> wrote: > >>>>>>> > >>>>>>> On 7 April 2018 at 00:30, Dan Kenigsberg > wrote: > >>>>>>> > No, I am afraid that we have not managed to understand why > setting > >>>>>>> > and > >>>>>>> > ipv6 address too the host off the grid. We shall continue > >>>>>>> > researching > >>>>>>> > this next week. > >>>>>>> > > >>>>>>> > Edy, https://gerrit.ovirt.org/#/c/88637/ is already 4 weeks old, > >>>>>>> > but > >>>>>>> > could it possibly be related (I really doubt that)? > >>>>>>> > > >>>>> > >>>>> > >>>>> Sorry, but I do not see how this problem is related to VDSM. > >>>>> There is nothing that indicates that there is a VDSM problem. > >>>>> > >>>>> Has the RPC connection between Engine and VDSM failed? > >>>>> > >>>> > >>>> Further up the thread, Piotr noticed that (at least on one failure of > >>>> this test) that the Vdsm host lost connectivity to its storage, and > Vdsm > >>>> process was restarted. However, this does not seems to happen in all > cases > >>>> where this test fails. > >>>> > >>>> _______________________________________________ > >>>> Devel mailing list > >>>> Devel at ovirt.org > >>>> http://lists.ovirt.org/mailman/listinfo/devel > >>> > >>> > >> > > > _______________________________________________ > Devel mailing list > Devel at ovirt.org > http://lists.ovirt.org/mailman/listinfo/devel > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From danken at redhat.com Tue Apr 24 12:40:13 2018 From: danken at redhat.com (Dan Kenigsberg) Date: Tue, 24 Apr 2018 15:40:13 +0300 Subject: [ovirt-devel] [ OST Failure Report ] [ oVirt 4.2 ] [ 2018-04-04 ] [006_migrations.prepare_migration_attachments_ipv6] In-Reply-To: References: Message-ID: On Tue, Apr 24, 2018 at 3:14 PM, Yaniv Kaul wrote: > > > On Tue, Apr 24, 2018 at 2:00 PM, Dan Kenigsberg wrote: >> >> Ravi's patch is in, but a similar problem remains, and the test cannot >> be put back into its place. >> >> It seems that while Vdsm was taken down, a couple of getCapsAsync > > > - Why is VDSM down? vdsm_recovery test, I presume. From rnori at redhat.com Tue Apr 24 13:17:43 2018 From: rnori at redhat.com (Ravi Shankar Nori) Date: Tue, 24 Apr 2018 09:17:43 -0400 Subject: [ovirt-devel] [ OST Failure Report ] [ oVirt 4.2 ] [ 2018-04-04 ] [006_migrations.prepare_migration_attachments_ipv6] In-Reply-To: References: Message-ID: On Tue, Apr 24, 2018 at 7:00 AM, Dan Kenigsberg wrote: > Ravi's patch is in, but a similar problem remains, and the test cannot > be put back into its place. > > It seems that while Vdsm was taken down, a couple of getCapsAsync > requests queued up. At one point, the host resumed its connection, > before the requests have been cleared of the queue. After the host is > up, the following tests resume, and at a pseudorandom point in time, > an old getCapsAsync request times out and kills our connection. > > I believe that as long as ANY request is on flight, the monitoring > lock should not be released, and the host should not be declared as > up. 
> > > Hi Dan, Can I have the link to the job on jenkins so I can look at the logs > On Wed, Apr 11, 2018 at 1:04 AM, Ravi Shankar Nori > wrote: > > This [1] should fix the multiple release lock issue > > > > [1] https://gerrit.ovirt.org/#/c/90077/ > > > > On Tue, Apr 10, 2018 at 3:53 PM, Ravi Shankar Nori > wrote: > >> > >> Working on a patch will post a fix > >> > >> Thanks > >> > >> Ravi > >> > >> On Tue, Apr 10, 2018 at 9:14 AM, Alona Kaplan > wrote: > >>> > >>> Hi all, > >>> > >>> Looking at the log it seems that the new GetCapabilitiesAsync is > >>> responsible for the mess. > >>> > >>> - 08:29:47 - engine loses connectivity to host > >>> 'lago-basic-suite-4-2-host-0'. > >>> > >>> - Every 3 seconds a getCapabalititiesAsync request is sent to the host > >>> (unsuccessfully). > >>> > >>> * before each "getCapabilitiesAsync" the monitoring lock is taken > >>> (VdsManager,refreshImpl) > >>> > >>> * "getCapabilitiesAsync" immediately fails and throws > >>> 'VDSNetworkException: java.net.ConnectException: Connection refused'. > The > >>> exception is caught by > >>> 'GetCapabilitiesAsyncVDSCommand.executeVdsBrokerCommand' which calls > >>> 'onFailure' of the callback and re-throws the exception. > >>> > >>> catch (Throwable t) { > >>> getParameters().getCallback().onFailure(t); > >>> throw t; > >>> } > >>> > >>> * The 'onFailure' of the callback releases the "monitoringLock" > >>> ('postProcessRefresh()->afterRefreshTreatment()-> if (!succeeded) > >>> lockManager.releaseLock(monitoringLock);') > >>> > >>> * 'VdsManager,refreshImpl' catches the network exception, marks > >>> 'releaseLock = true' and tries to release the already released lock. 
> >>> > >>> The following warning is printed to the log - > >>> > >>> WARN [org.ovirt.engine.core.bll.lock.InMemoryLockManager] > >>> (EE-ManagedThreadFactory-engineScheduled-Thread-53) [] Trying to > release > >>> exclusive lock which does not exist, lock key: > >>> 'ecf53d69-eb68-4b11-8df2-c4aa4e19bd93VDS_INIT' > >>> > >>> > >>> - 08:30:51 a successful getCapabilitiesAsync is sent. > >>> > >>> - 08:32:55 - The failing test starts (Setup Networks for setting ipv6). > >>> > >>> > >>> * SetupNetworks takes the monitoring lock. > >>> > >>> - 08:33:00 - ResponseTracker cleans the getCapabilitiesAsync requests > >>> from 4 minutes ago from its queue and prints a VDSNetworkException: Vds > >>> timeout occured. > >>> > >>> * When the first request is removed from the queue > >>> ('ResponseTracker.remove()'), the 'Callback.onFailure' is invoked (for > the > >>> second time) -> monitoring lock is released (the lock taken by the > >>> SetupNetworks!). > >>> > >>> * The other requests removed from the queue also try to release > the > >>> monitoring lock, but there is nothing to release. > >>> > >>> * The following warning log is printed - > >>> WARN [org.ovirt.engine.core.bll.lock.InMemoryLockManager] > >>> (EE-ManagedThreadFactory-engineScheduled-Thread-14) [] Trying to > release > >>> exclusive lock which does not exist, lock key: > >>> 'ecf53d69-eb68-4b11-8df2-c4aa4e19bd93VDS_INIT' > >>> > >>> - 08:33:00 - SetupNetwork fails on Timeout ~4 seconds after is started. > >>> Why? I'm not 100% sure but I guess the late processing of the > >>> 'getCapabilitiesAsync' that causes losing of the monitoring lock and > the > >>> late + mupltiple processing of failure is root cause. > >>> > >>> > >>> Ravi, 'getCapabilitiesAsync' failure is treated twice and the lock is > >>> trying to be released three times. Please share your opinion regarding > how > >>> it should be fixed. > >>> > >>> > >>> Thanks, > >>> > >>> Alona. 
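Per the analysis above, `onFailure` can fire twice for the same request — once on the synchronous `ConnectException` and again when `ResponseTracker` later times the request out — and each invocation releases the monitoring lock. A minimal guard, assuming nothing about the actual engine classes, makes the failure callback idempotent:

```python
import threading

class OnceFailureCallback:
    """Wraps a failure handler so it runs at most once per request, no
    matter how many code paths (sync exception, tracker timeout) invoke
    it. Hypothetical sketch, not the engine's GetCapabilitiesAsync code."""

    def __init__(self, release_monitoring_lock):
        self._release = release_monitoring_lock
        self._guard = threading.Lock()
        self._fired = False

    def on_failure(self, error):
        with self._guard:
            if self._fired:
                # Late timeout for an already-failed request: ignore it
                # instead of releasing a lock someone else now holds.
                return
            self._fired = True
        self._release()  # monitoring lock released exactly once
```

Under such a guard, the second and third release attempts described above would be no-ops rather than freeing the lock that SetupNetworks had meanwhile taken.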
> >>> > >>> > >>> > >>> > >>> > >>> > >>> On Sun, Apr 8, 2018 at 1:21 PM, Dan Kenigsberg > wrote: > >>>> > >>>> On Sun, Apr 8, 2018 at 9:21 AM, Edward Haas wrote: > >>>>> > >>>>> > >>>>> > >>>>> On Sun, Apr 8, 2018 at 9:15 AM, Eyal Edri wrote: > >>>>>> > >>>>>> Was already done by Yaniv - https://gerrit.ovirt.org/#/c/89851. > >>>>>> Is it still failing? > >>>>>> > >>>>>> On Sun, Apr 8, 2018 at 8:59 AM, Barak Korren > >>>>>> wrote: > >>>>>>> > >>>>>>> On 7 April 2018 at 00:30, Dan Kenigsberg > wrote: > >>>>>>> > No, I am afraid that we have not managed to understand why > setting > >>>>>>> > and > >>>>>>> > ipv6 address too the host off the grid. We shall continue > >>>>>>> > researching > >>>>>>> > this next week. > >>>>>>> > > >>>>>>> > Edy, https://gerrit.ovirt.org/#/c/88637/ is already 4 weeks old, > >>>>>>> > but > >>>>>>> > could it possibly be related (I really doubt that)? > >>>>>>> > > >>>>> > >>>>> > >>>>> Sorry, but I do not see how this problem is related to VDSM. > >>>>> There is nothing that indicates that there is a VDSM problem. > >>>>> > >>>>> Has the RPC connection between Engine and VDSM failed? > >>>>> > >>>> > >>>> Further up the thread, Piotr noticed that (at least on one failure of > >>>> this test) that the Vdsm host lost connectivity to its storage, and > Vdsm > >>>> process was restarted. However, this does not seems to happen in all > cases > >>>> where this test fails. > >>>> > >>>> _______________________________________________ > >>>> Devel mailing list > >>>> Devel at ovirt.org > >>>> http://lists.ovirt.org/mailman/listinfo/devel > >>> > >>> > >> > > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From mperina at redhat.com Tue Apr 24 13:24:10 2018 From: mperina at redhat.com (Martin Perina) Date: Tue, 24 Apr 2018 15:24:10 +0200 Subject: [ovirt-devel] [ OST Failure Report ] [ oVirt 4.2 ] [ 2018-04-04 ] [006_migrations.prepare_migration_attachments_ipv6] In-Reply-To: References: Message-ID: On Tue, Apr 24, 2018 at 3:17 PM, Ravi Shankar Nori wrote: > > > On Tue, Apr 24, 2018 at 7:00 AM, Dan Kenigsberg wrote: > >> Ravi's patch is in, but a similar problem remains, and the test cannot >> be put back into its place. >> >> It seems that while Vdsm was taken down, a couple of getCapsAsync >> requests queued up. At one point, the host resumed its connection, >> before the requests have been cleared of the queue. After the host is >> up, the following tests resume, and at a pseudorandom point in time, >> an old getCapsAsync request times out and kills our connection. >> >> I believe that as long as ANY request is on flight, the monitoring >> lock should not be released, and the host should not be declared as >> up. >> >> >> > > Hi Dan, > > Can I have the link to the job on jenkins so I can look at the logs > ?http://jenkins.ovirt.org/job/ovirt-system-tests_standard-check-patch/346/ ? > > > >> On Wed, Apr 11, 2018 at 1:04 AM, Ravi Shankar Nori >> wrote: >> > This [1] should fix the multiple release lock issue >> > >> > [1] https://gerrit.ovirt.org/#/c/90077/ >> > >> > On Tue, Apr 10, 2018 at 3:53 PM, Ravi Shankar Nori >> wrote: >> >> >> >> Working on a patch will post a fix >> >> >> >> Thanks >> >> >> >> Ravi >> >> >> >> On Tue, Apr 10, 2018 at 9:14 AM, Alona Kaplan >> wrote: >> >>> >> >>> Hi all, >> >>> >> >>> Looking at the log it seems that the new GetCapabilitiesAsync is >> >>> responsible for the mess. >> >>> >> >>> - 08:29:47 - engine loses connectivity to host >> >>> 'lago-basic-suite-4-2-host-0'. >> >>> >> >>> - Every 3 seconds a getCapabalititiesAsync request is sent to the host >> >>> (unsuccessfully). 
>> >>> >> >>> * before each "getCapabilitiesAsync" the monitoring lock is taken >> >>> (VdsManager,refreshImpl) >> >>> >> >>> * "getCapabilitiesAsync" immediately fails and throws >> >>> 'VDSNetworkException: java.net.ConnectException: Connection refused'. >> The >> >>> exception is caught by >> >>> 'GetCapabilitiesAsyncVDSCommand.executeVdsBrokerCommand' which calls >> >>> 'onFailure' of the callback and re-throws the exception. >> >>> >> >>> catch (Throwable t) { >> >>> getParameters().getCallback().onFailure(t); >> >>> throw t; >> >>> } >> >>> >> >>> * The 'onFailure' of the callback releases the "monitoringLock" >> >>> ('postProcessRefresh()->afterRefreshTreatment()-> if (!succeeded) >> >>> lockManager.releaseLock(monitoringLock);') >> >>> >> >>> * 'VdsManager,refreshImpl' catches the network exception, marks >> >>> 'releaseLock = true' and tries to release the already released lock. >> >>> >> >>> The following warning is printed to the log - >> >>> >> >>> WARN [org.ovirt.engine.core.bll.lock.InMemoryLockManager] >> >>> (EE-ManagedThreadFactory-engineScheduled-Thread-53) [] Trying to >> release >> >>> exclusive lock which does not exist, lock key: >> >>> 'ecf53d69-eb68-4b11-8df2-c4aa4e19bd93VDS_INIT' >> >>> >> >>> >> >>> - 08:30:51 a successful getCapabilitiesAsync is sent. >> >>> >> >>> - 08:32:55 - The failing test starts (Setup Networks for setting >> ipv6). >> >>> >> >>> >> >>> * SetupNetworks takes the monitoring lock. >> >>> >> >>> - 08:33:00 - ResponseTracker cleans the getCapabilitiesAsync requests >> >>> from 4 minutes ago from its queue and prints a VDSNetworkException: >> Vds >> >>> timeout occured. >> >>> >> >>> * When the first request is removed from the queue >> >>> ('ResponseTracker.remove()'), the 'Callback.onFailure' is invoked >> (for the >> >>> second time) -> monitoring lock is released (the lock taken by the >> >>> SetupNetworks!). 
>> >>> >> >>> * The other requests removed from the queue also try to release >> the >> >>> monitoring lock, but there is nothing to release. >> >>> >> >>> * The following warning log is printed - >> >>> WARN [org.ovirt.engine.core.bll.lock.InMemoryLockManager] >> >>> (EE-ManagedThreadFactory-engineScheduled-Thread-14) [] Trying to >> release >> >>> exclusive lock which does not exist, lock key: >> >>> 'ecf53d69-eb68-4b11-8df2-c4aa4e19bd93VDS_INIT' >> >>> >> >>> - 08:33:00 - SetupNetwork fails on Timeout ~4 seconds after is >> started. >> >>> Why? I'm not 100% sure but I guess the late processing of the >> >>> 'getCapabilitiesAsync' that causes losing of the monitoring lock and >> the >> >>> late + mupltiple processing of failure is root cause. >> >>> >> >>> >> >>> Ravi, 'getCapabilitiesAsync' failure is treated twice and the lock is >> >>> trying to be released three times. Please share your opinion >> regarding how >> >>> it should be fixed. >> >>> >> >>> >> >>> Thanks, >> >>> >> >>> Alona. >> >>> >> >>> >> >>> >> >>> >> >>> >> >>> >> >>> On Sun, Apr 8, 2018 at 1:21 PM, Dan Kenigsberg >> wrote: >> >>>> >> >>>> On Sun, Apr 8, 2018 at 9:21 AM, Edward Haas >> wrote: >> >>>>> >> >>>>> >> >>>>> >> >>>>> On Sun, Apr 8, 2018 at 9:15 AM, Eyal Edri wrote: >> >>>>>> >> >>>>>> Was already done by Yaniv - https://gerrit.ovirt.org/#/c/89851. >> >>>>>> Is it still failing? >> >>>>>> >> >>>>>> On Sun, Apr 8, 2018 at 8:59 AM, Barak Korren >> >>>>>> wrote: >> >>>>>>> >> >>>>>>> On 7 April 2018 at 00:30, Dan Kenigsberg >> wrote: >> >>>>>>> > No, I am afraid that we have not managed to understand why >> setting >> >>>>>>> > and >> >>>>>>> > ipv6 address too the host off the grid. We shall continue >> >>>>>>> > researching >> >>>>>>> > this next week. >> >>>>>>> > >> >>>>>>> > Edy, https://gerrit.ovirt.org/#/c/88637/ is already 4 weeks >> old, >> >>>>>>> > but >> >>>>>>> > could it possibly be related (I really doubt that)? 
>> >>>>>>> > >> >>>>> >> >>>>> >> >>>>> Sorry, but I do not see how this problem is related to VDSM. >> >>>>> There is nothing that indicates that there is a VDSM problem. >> >>>>> >> >>>>> Has the RPC connection between Engine and VDSM failed? >> >>>>> >> >>>> >> >>>> Further up the thread, Piotr noticed that (at least on one failure of >> >>>> this test) that the Vdsm host lost connectivity to its storage, and >> Vdsm >> >>>> process was restarted. However, this does not seems to happen in all >> cases >> >>>> where this test fails. >> >>>> >> >>>> _______________________________________________ >> >>>> Devel mailing list >> >>>> Devel at ovirt.org >> >>>> http://lists.ovirt.org/mailman/listinfo/devel >> >>> >> >>> >> >> >> > >> > > -- Martin Perina Associate Manager, Software Engineering Red Hat Czech s.r.o. -------------- next part -------------- An HTML attachment was scrubbed... URL: From danken at redhat.com Tue Apr 24 13:28:53 2018 From: danken at redhat.com (Dan Kenigsberg) Date: Tue, 24 Apr 2018 16:28:53 +0300 Subject: [ovirt-devel] [ OST Failure Report ] [ oVirt 4.2 ] [ 2018-04-04 ] [006_migrations.prepare_migration_attachments_ipv6] In-Reply-To: References: Message-ID: On Tue, Apr 24, 2018 at 4:17 PM, Ravi Shankar Nori wrote: > > > On Tue, Apr 24, 2018 at 7:00 AM, Dan Kenigsberg wrote: >> >> Ravi's patch is in, but a similar problem remains, and the test cannot >> be put back into its place. >> >> It seems that while Vdsm was taken down, a couple of getCapsAsync >> requests queued up. At one point, the host resumed its connection, >> before the requests have been cleared of the queue. After the host is >> up, the following tests resume, and at a pseudorandom point in time, >> an old getCapsAsync request times out and kills our connection. >> >> I believe that as long as ANY request is on flight, the monitoring >> lock should not be released, and the host should not be declared as >> up. 
>> >> > > > Hi Dan, > > Can I have the link to the job on jenkins so I can look at the logs We disabled a network test that started failing after getCapsAsync was merged. Please own its re-introduction to OST: https://gerrit.ovirt.org/#/c/90264/ Its most recent failure http://jenkins.ovirt.org/job/ovirt-system-tests_standard-check-patch/346/ has been discussed by Alona and Piotr over IRC. From rnori at redhat.com Tue Apr 24 13:36:56 2018 From: rnori at redhat.com (Ravi Shankar Nori) Date: Tue, 24 Apr 2018 09:36:56 -0400 Subject: [ovirt-devel] [ OST Failure Report ] [ oVirt 4.2 ] [ 2018-04-04 ] [006_migrations.prepare_migration_attachments_ipv6] In-Reply-To: References: Message-ID: On Tue, Apr 24, 2018 at 9:24 AM, Martin Perina wrote: > > > On Tue, Apr 24, 2018 at 3:17 PM, Ravi Shankar Nori > wrote: > >> >> >> On Tue, Apr 24, 2018 at 7:00 AM, Dan Kenigsberg >> wrote: >> >>> Ravi's patch is in, but a similar problem remains, and the test cannot >>> be put back into its place. >>> >>> It seems that while Vdsm was taken down, a couple of getCapsAsync >>> requests queued up. At one point, the host resumed its connection, >>> before the requests have been cleared of the queue. After the host is >>> up, the following tests resume, and at a pseudorandom point in time, >>> an old getCapsAsync request times out and kills our connection. >>> >>> I believe that as long as ANY request is on flight, the monitoring >>> lock should not be released, and the host should not be declared as >>> up. >>> >>> >>> >> >> Hi Dan, >> >> Can I have the link to the job on jenkins so I can look at the logs >> > > ?http://jenkins.ovirt.org/job/ovirt-system-tests_standard-check-patch/346/ > ? > > >From the logs the only VDS lock that is being released twice is VDS_FENCE lock. Opened a BZ [1] for it. 
Will post a fix [1] https://bugzilla.redhat.com/show_bug.cgi?id=1571300 > >> >> >>> On Wed, Apr 11, 2018 at 1:04 AM, Ravi Shankar Nori >>> wrote: >>> > This [1] should fix the multiple release lock issue >>> > >>> > [1] https://gerrit.ovirt.org/#/c/90077/ >>> > >>> > On Tue, Apr 10, 2018 at 3:53 PM, Ravi Shankar Nori >>> wrote: >>> >> >>> >> Working on a patch will post a fix >>> >> >>> >> Thanks >>> >> >>> >> Ravi >>> >> >>> >> On Tue, Apr 10, 2018 at 9:14 AM, Alona Kaplan >>> wrote: >>> >>> >>> >>> Hi all, >>> >>> >>> >>> Looking at the log it seems that the new GetCapabilitiesAsync is >>> >>> responsible for the mess. >>> >>> >>> >>> - 08:29:47 - engine loses connectivity to host >>> >>> 'lago-basic-suite-4-2-host-0'. >>> >>> >>> >>> - Every 3 seconds a getCapabalititiesAsync request is sent to the >>> host >>> >>> (unsuccessfully). >>> >>> >>> >>> * before each "getCapabilitiesAsync" the monitoring lock is >>> taken >>> >>> (VdsManager,refreshImpl) >>> >>> >>> >>> * "getCapabilitiesAsync" immediately fails and throws >>> >>> 'VDSNetworkException: java.net.ConnectException: Connection >>> refused'. The >>> >>> exception is caught by >>> >>> 'GetCapabilitiesAsyncVDSCommand.executeVdsBrokerCommand' which calls >>> >>> 'onFailure' of the callback and re-throws the exception. >>> >>> >>> >>> catch (Throwable t) { >>> >>> getParameters().getCallback().onFailure(t); >>> >>> throw t; >>> >>> } >>> >>> >>> >>> * The 'onFailure' of the callback releases the "monitoringLock" >>> >>> ('postProcessRefresh()->afterRefreshTreatment()-> if (!succeeded) >>> >>> lockManager.releaseLock(monitoringLock);') >>> >>> >>> >>> * 'VdsManager,refreshImpl' catches the network exception, marks >>> >>> 'releaseLock = true' and tries to release the already released lock. 
>>> >>> >>> >>> The following warning is printed to the log - >>> >>> >>> >>> WARN [org.ovirt.engine.core.bll.lock.InMemoryLockManager] >>> >>> (EE-ManagedThreadFactory-engineScheduled-Thread-53) [] Trying to >>> release >>> >>> exclusive lock which does not exist, lock key: >>> >>> 'ecf53d69-eb68-4b11-8df2-c4aa4e19bd93VDS_INIT' >>> >>> >>> >>> >>> >>> - 08:30:51 a successful getCapabilitiesAsync is sent. >>> >>> >>> >>> - 08:32:55 - The failing test starts (Setup Networks for setting >>> ipv6). >>> >>> >>> >>> >>> >>> * SetupNetworks takes the monitoring lock. >>> >>> >>> >>> - 08:33:00 - ResponseTracker cleans the getCapabilitiesAsync requests >>> >>> from 4 minutes ago from its queue and prints a VDSNetworkException: >>> Vds >>> >>> timeout occured. >>> >>> >>> >>> * When the first request is removed from the queue >>> >>> ('ResponseTracker.remove()'), the 'Callback.onFailure' is invoked >>> (for the >>> >>> second time) -> monitoring lock is released (the lock taken by the >>> >>> SetupNetworks!). >>> >>> >>> >>> * The other requests removed from the queue also try to >>> release the >>> >>> monitoring lock, but there is nothing to release. >>> >>> >>> >>> * The following warning log is printed - >>> >>> WARN [org.ovirt.engine.core.bll.lock.InMemoryLockManager] >>> >>> (EE-ManagedThreadFactory-engineScheduled-Thread-14) [] Trying to >>> release >>> >>> exclusive lock which does not exist, lock key: >>> >>> 'ecf53d69-eb68-4b11-8df2-c4aa4e19bd93VDS_INIT' >>> >>> >>> >>> - 08:33:00 - SetupNetwork fails on Timeout ~4 seconds after is >>> started. >>> >>> Why? I'm not 100% sure but I guess the late processing of the >>> >>> 'getCapabilitiesAsync' that causes losing of the monitoring lock and >>> the >>> >>> late + mupltiple processing of failure is root cause. >>> >>> >>> >>> >>> >>> Ravi, 'getCapabilitiesAsync' failure is treated twice and the lock is >>> >>> trying to be released three times. Please share your opinion >>> regarding how >>> >>> it should be fixed. 
>>> >>> >>> >>> >>> >>> Thanks, >>> >>> >>> >>> Alona. >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> On Sun, Apr 8, 2018 at 1:21 PM, Dan Kenigsberg >>> wrote: >>> >>>> >>> >>>> On Sun, Apr 8, 2018 at 9:21 AM, Edward Haas >>> wrote: >>> >>>>> >>> >>>>> >>> >>>>> >>> >>>>> On Sun, Apr 8, 2018 at 9:15 AM, Eyal Edri >>> wrote: >>> >>>>>> >>> >>>>>> Was already done by Yaniv - https://gerrit.ovirt.org/#/c/89851. >>> >>>>>> Is it still failing? >>> >>>>>> >>> >>>>>> On Sun, Apr 8, 2018 at 8:59 AM, Barak Korren >>> >>>>>> wrote: >>> >>>>>>> >>> >>>>>>> On 7 April 2018 at 00:30, Dan Kenigsberg >>> wrote: >>> >>>>>>> > No, I am afraid that we have not managed to understand why >>> setting >>> >>>>>>> > and >>> >>>>>>> > ipv6 address too the host off the grid. We shall continue >>> >>>>>>> > researching >>> >>>>>>> > this next week. >>> >>>>>>> > >>> >>>>>>> > Edy, https://gerrit.ovirt.org/#/c/88637/ is already 4 weeks >>> old, >>> >>>>>>> > but >>> >>>>>>> > could it possibly be related (I really doubt that)? >>> >>>>>>> > >>> >>>>> >>> >>>>> >>> >>>>> Sorry, but I do not see how this problem is related to VDSM. >>> >>>>> There is nothing that indicates that there is a VDSM problem. >>> >>>>> >>> >>>>> Has the RPC connection between Engine and VDSM failed? >>> >>>>> >>> >>>> >>> >>>> Further up the thread, Piotr noticed that (at least on one failure >>> of >>> >>>> this test) that the Vdsm host lost connectivity to its storage, and >>> Vdsm >>> >>>> process was restarted. However, this does not seems to happen in >>> all cases >>> >>>> where this test fails. >>> >>>> >>> >>>> _______________________________________________ >>> >>>> Devel mailing list >>> >>>> Devel at ovirt.org >>> >>>> http://lists.ovirt.org/mailman/listinfo/devel >>> >>> >>> >>> >>> >> >>> > >>> >> >> > > > -- > Martin Perina > Associate Manager, Software Engineering > Red Hat Czech s.r.o. > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From danken at redhat.com Tue Apr 24 13:47:17 2018 From: danken at redhat.com (Dan Kenigsberg) Date: Tue, 24 Apr 2018 16:47:17 +0300 Subject: [ovirt-devel] [ OST Failure Report ] [ oVirt 4.2 ] [ 2018-04-04 ] [006_migrations.prepare_migration_attachments_ipv6] In-Reply-To: References: Message-ID: On Tue, Apr 24, 2018 at 4:36 PM, Ravi Shankar Nori wrote: > > > On Tue, Apr 24, 2018 at 9:24 AM, Martin Perina wrote: >> >> >> >> On Tue, Apr 24, 2018 at 3:17 PM, Ravi Shankar Nori >> wrote: >>> >>> >>> >>> On Tue, Apr 24, 2018 at 7:00 AM, Dan Kenigsberg >>> wrote: >>>> >>>> Ravi's patch is in, but a similar problem remains, and the test cannot >>>> be put back into its place. >>>> >>>> It seems that while Vdsm was taken down, a couple of getCapsAsync >>>> requests queued up. At one point, the host resumed its connection, >>>> before the requests have been cleared of the queue. After the host is >>>> up, the following tests resume, and at a pseudorandom point in time, >>>> an old getCapsAsync request times out and kills our connection. >>>> >>>> I believe that as long as ANY request is on flight, the monitoring >>>> lock should not be released, and the host should not be declared as >>>> up. Would you relate to this analysis ^^^ ? >>>> >>>> >>> >>> >>> Hi Dan, >>> >>> Can I have the link to the job on jenkins so I can look at the logs >> >> >> http://jenkins.ovirt.org/job/ovirt-system-tests_standard-check-patch/346/ >> > > > From the logs the only VDS lock that is being released twice is VDS_FENCE > lock. Opened a BZ [1] for it. Will post a fix > > [1] https://bugzilla.redhat.com/show_bug.cgi?id=1571300 Can this possibly cause a surprise termination of host connection? 
From rnori at redhat.com Tue Apr 24 14:09:12 2018 From: rnori at redhat.com (Ravi Shankar Nori) Date: Tue, 24 Apr 2018 10:09:12 -0400 Subject: [ovirt-devel] [ OST Failure Report ] [ oVirt 4.2 ] [ 2018-04-04 ] [006_migrations.prepare_migration_attachments_ipv6] In-Reply-To: References: Message-ID: On Tue, Apr 24, 2018 at 9:47 AM, Dan Kenigsberg wrote: > On Tue, Apr 24, 2018 at 4:36 PM, Ravi Shankar Nori > wrote: > > > > > > On Tue, Apr 24, 2018 at 9:24 AM, Martin Perina > wrote: > >> > >> > >> > >> On Tue, Apr 24, 2018 at 3:17 PM, Ravi Shankar Nori > >> wrote: > >>> > >>> > >>> > >>> On Tue, Apr 24, 2018 at 7:00 AM, Dan Kenigsberg > >>> wrote: > >>>> > >>>> Ravi's patch is in, but a similar problem remains, and the test cannot > >>>> be put back into its place. > >>>> > >>>> It seems that while Vdsm was taken down, a couple of getCapsAsync > >>>> requests queued up. At one point, the host resumed its connection, > >>>> before the requests have been cleared of the queue. After the host is > >>>> up, the following tests resume, and at a pseudorandom point in time, > >>>> an old getCapsAsync request times out and kills our connection. > >>>> > >>>> I believe that as long as ANY request is on flight, the monitoring > >>>> lock should not be released, and the host should not be declared as > >>>> up. > > Would you relate to this analysis ^^^ ? > > The HostMonitoring lock issue has been fixed by https://gerrit.ovirt.org/#/c/90189/ > >>>> > >>>> > >>> > >>> > >>> Hi Dan, > >>> > >>> Can I have the link to the job on jenkins so I can look at the logs > >> > >> > >> http://jenkins.ovirt.org/job/ovirt-system-tests_standard- > check-patch/346/ > >> > > > > > > From the logs the only VDS lock that is being released twice is VDS_FENCE > > lock. Opened a BZ [1] for it. Will post a fix > > > > [1] https://bugzilla.redhat.com/show_bug.cgi?id=1571300 > > Can this possibly cause a surprise termination of host connection? 
> Not sure, from the logs VDS_FENCE is the only other VDS lock that is being released -------------- next part -------------- An HTML attachment was scrubbed... URL: From danken at redhat.com Tue Apr 24 14:29:48 2018 From: danken at redhat.com (Dan Kenigsberg) Date: Tue, 24 Apr 2018 17:29:48 +0300 Subject: [ovirt-devel] [ OST Failure Report ] [ oVirt 4.2 ] [ 2018-04-04 ] [006_migrations.prepare_migration_attachments_ipv6] In-Reply-To: References: Message-ID: On Tue, Apr 24, 2018 at 5:09 PM, Ravi Shankar Nori wrote: > > > On Tue, Apr 24, 2018 at 9:47 AM, Dan Kenigsberg wrote: >> >> On Tue, Apr 24, 2018 at 4:36 PM, Ravi Shankar Nori >> wrote: >> > >> > >> > On Tue, Apr 24, 2018 at 9:24 AM, Martin Perina >> > wrote: >> >> >> >> >> >> >> >> On Tue, Apr 24, 2018 at 3:17 PM, Ravi Shankar Nori >> >> wrote: >> >>> >> >>> >> >>> >> >>> On Tue, Apr 24, 2018 at 7:00 AM, Dan Kenigsberg >> >>> wrote: >> >>>> >> >>>> Ravi's patch is in, but a similar problem remains, and the test >> >>>> cannot >> >>>> be put back into its place. >> >>>> >> >>>> It seems that while Vdsm was taken down, a couple of getCapsAsync >> >>>> requests queued up. At one point, the host resumed its connection, >> >>>> before the requests have been cleared of the queue. After the host is >> >>>> up, the following tests resume, and at a pseudorandom point in time, >> >>>> an old getCapsAsync request times out and kills our connection. >> >>>> >> >>>> I believe that as long as ANY request is on flight, the monitoring >> >>>> lock should not be released, and the host should not be declared as >> >>>> up. >> >> Would you relate to this analysis ^^^ ? >> > > The HostMonitoring lock issue has been fixed by > https://gerrit.ovirt.org/#/c/90189/ Is there still a chance that a host moves to Up while former getCapsAsync request are still in-flight? 
> >> >> >>>> >> >>>> >> >>> >> >>> >> >>> Hi Dan, >> >>> >> >>> Can I have the link to the job on jenkins so I can look at the logs >> >> >> >> >> >> >> >> http://jenkins.ovirt.org/job/ovirt-system-tests_standard-check-patch/346/ >> >> >> > >> > >> > From the logs the only VDS lock that is being released twice is >> > VDS_FENCE >> > lock. Opened a BZ [1] for it. Will post a fix >> > >> > [1] https://bugzilla.redhat.com/show_bug.cgi?id=1571300 >> >> Can this possibly cause a surprise termination of host connection? > > > Not sure, from the logs VDS_FENCE is the only other VDS lock that is being > released From rnori at redhat.com Tue Apr 24 14:46:46 2018 From: rnori at redhat.com (Ravi Shankar Nori) Date: Tue, 24 Apr 2018 10:46:46 -0400 Subject: [ovirt-devel] [ OST Failure Report ] [ oVirt 4.2 ] [ 2018-04-04 ] [006_migrations.prepare_migration_attachments_ipv6] In-Reply-To: References: Message-ID: On Tue, Apr 24, 2018 at 10:29 AM, Dan Kenigsberg wrote: > On Tue, Apr 24, 2018 at 5:09 PM, Ravi Shankar Nori > wrote: > > > > > > On Tue, Apr 24, 2018 at 9:47 AM, Dan Kenigsberg > wrote: > >> > >> On Tue, Apr 24, 2018 at 4:36 PM, Ravi Shankar Nori > >> wrote: > >> > > >> > > >> > On Tue, Apr 24, 2018 at 9:24 AM, Martin Perina > >> > wrote: > >> >> > >> >> > >> >> > >> >> On Tue, Apr 24, 2018 at 3:17 PM, Ravi Shankar Nori > > >> >> wrote: > >> >>> > >> >>> > >> >>> > >> >>> On Tue, Apr 24, 2018 at 7:00 AM, Dan Kenigsberg > >> >>> wrote: > >> >>>> > >> >>>> Ravi's patch is in, but a similar problem remains, and the test > >> >>>> cannot > >> >>>> be put back into its place. > >> >>>> > >> >>>> It seems that while Vdsm was taken down, a couple of getCapsAsync > >> >>>> requests queued up. At one point, the host resumed its connection, > >> >>>> before the requests have been cleared of the queue. After the host > is > >> >>>> up, the following tests resume, and at a pseudorandom point in > time, > >> >>>> an old getCapsAsync request times out and kills our connection. 
> >> >>>> > >> >>>> I believe that as long as ANY request is on flight, the monitoring > >> >>>> lock should not be released, and the host should not be declared as > >> >>>> up. > >> > >> Would you relate to this analysis ^^^ ? > >> > > > > The HostMonitoring lock issue has been fixed by > > https://gerrit.ovirt.org/#/c/90189/ > > Is there still a chance that a host moves to Up while former > getCapsAsync request are still in-flight? > > Should not happen. Is there a way to execute/reproduce the failing test on Dev env? > > > >> > >> >>>> > >> >>>> > >> >>> > >> >>> > >> >>> Hi Dan, > >> >>> > >> >>> Can I have the link to the job on jenkins so I can look at the logs > >> >> > >> >> > >> >> > >> >> http://jenkins.ovirt.org/job/ovirt-system-tests_standard- > check-patch/346/ > >> >> > >> > > >> > > >> > From the logs the only VDS lock that is being released twice is > >> > VDS_FENCE > >> > lock. Opened a BZ [1] for it. Will post a fix > >> > > >> > [1] https://bugzilla.redhat.com/show_bug.cgi?id=1571300 > >> > >> Can this possibly cause a surprise termination of host connection? > > > > > > Not sure, from the logs VDS_FENCE is the only other VDS lock that is > being > > released > -------------- next part -------------- An HTML attachment was scrubbed... URL: From rnori at redhat.com Tue Apr 24 16:00:21 2018 From: rnori at redhat.com (Ravi Shankar Nori) Date: Tue, 24 Apr 2018 12:00:21 -0400 Subject: [ovirt-devel] Issues on master with releasing a lock Message-ID: Hi All, We have some issues with a few flows on master where the lock acquired is being released more than once. This could be a real problem in cases where the lock is prematurely released for a second command that had acquired the lock which the first command has released. For instance 1. Command A has acquired the VM lock and executes a child command. 2. The child command releases the lock 3. Command B now acquires the VM lock 4. 
Command A continues execution and finally releases the lock a second time
5. Command B's lock has now been released prematurely by Command A.

The tell-tale sign that a flow is one of the offending flows is a warning
message in the logs indicating that a previously released lock is being
released again.

[org.ovirt.engine.core.bll.lock.InMemoryLockManager]
(EE-ManagedThreadFactory-engine-Thread-6) [1435f925] Trying to release
exclusive lock which does not exist, lock key:
'53cfa6c3-ecd6-4796-84cb-5c53d7c2a77fVDS_FENCE'

I have seen issues with Live Merge [1], Moving storage domain to
maintenance [2], VdsNotRespondingTreatmentCommand [3] and Creating a
snapshot [4]. I have submitted some patches to fix the issue with Live
Merge [5], Moving storage domain to maintenance [6] and
VdsNotRespondingTreatmentCommand [7].

The multiple lock release issue needs to be fixed, and if you are the
owner of a flow, please make sure the flow does not log this warning
message. There is no single solution; the issue has to be fixed flow by
flow. But below are a few pointers:

1. If a command has a callback and executes a child command, which in turn
executes further child commands, the child command should also have a
command callback so that the locks are released by the framework. Example
patch [6]

2. If a parent has acquired the lock and executes a child command with a
cloned context, pass the cloned parent context without locks to the child,
so that the child does not release the locks acquired by the parent.
Example patch [7]

Thanks

Ravi

[1] https://bugzilla.redhat.com/1568556
[2] https://bugzilla.redhat.com/1568447
[3] https://bugzilla.redhat.com/1571300
[4] https://bugzilla.redhat.com/1569625
[5] https://gerrit.ovirt.org/#/c/90482/
[6] https://gerrit.ovirt.org/#/c/90411/
[7] https://gerrit.ovirt.org/#/c/90568/
-------------- next part --------------
An HTML attachment was scrubbed...
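The five-step race above can be illustrated with an owner-aware lock table, where a release is honored only if the caller still owns the lock. This is a toy sketch, not the engine's InMemoryLockManager, and the actual fixes are per-flow as described (command callbacks, cloned contexts without locks):

```python
class OwnerAwareLockManager:
    """Toy lock table keyed like the engine's ('<vds-id>VDS_FENCE').
    A stale release from a command that no longer owns the lock is
    rejected instead of silently freeing a lock another command holds."""

    def __init__(self):
        self._owners = {}  # lock_key -> owning command id

    def acquire(self, key, command):
        if key in self._owners:
            return False  # lock is held by someone else
        self._owners[key] = command
        return True

    def release(self, key, command):
        if self._owners.get(key) != command:
            # Command A's second release, arriving after Command B has
            # re-acquired the lock, lands here and is ignored.
            return False
        del self._owners[key]
        return True
```

Replaying the steps: A acquires and (via its child) releases; B acquires; A's duplicate release is now rejected and B keeps its lock, instead of being released prematurely as in step 5.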
URL: From rnori at redhat.com Tue Apr 24 19:27:00 2018 From: rnori at redhat.com (Ravi Shankar Nori) Date: Tue, 24 Apr 2018 15:27:00 -0400 Subject: [ovirt-devel] [ OST Failure Report ] [ oVirt 4.2 ] [ 2018-04-04 ] [006_migrations.prepare_migration_attachments_ipv6] In-Reply-To: References: Message-ID: On Tue, Apr 24, 2018 at 10:46 AM, Ravi Shankar Nori wrote: > > > On Tue, Apr 24, 2018 at 10:29 AM, Dan Kenigsberg > wrote: > >> On Tue, Apr 24, 2018 at 5:09 PM, Ravi Shankar Nori >> wrote: >> > >> > >> > On Tue, Apr 24, 2018 at 9:47 AM, Dan Kenigsberg >> wrote: >> >> >> >> On Tue, Apr 24, 2018 at 4:36 PM, Ravi Shankar Nori >> >> wrote: >> >> > >> >> > >> >> > On Tue, Apr 24, 2018 at 9:24 AM, Martin Perina >> >> > wrote: >> >> >> >> >> >> >> >> >> >> >> >> On Tue, Apr 24, 2018 at 3:17 PM, Ravi Shankar Nori < >> rnori at redhat.com> >> >> >> wrote: >> >> >>> >> >> >>> >> >> >>> >> >> >>> On Tue, Apr 24, 2018 at 7:00 AM, Dan Kenigsberg > > >> >> >>> wrote: >> >> >>>> >> >> >>>> Ravi's patch is in, but a similar problem remains, and the test >> >> >>>> cannot >> >> >>>> be put back into its place. >> >> >>>> >> >> >>>> It seems that while Vdsm was taken down, a couple of getCapsAsync >> >> >>>> requests queued up. At one point, the host resumed its connection, >> >> >>>> before the requests have been cleared of the queue. After the >> host is >> >> >>>> up, the following tests resume, and at a pseudorandom point in >> time, >> >> >>>> an old getCapsAsync request times out and kills our connection. >> >> >>>> >> >> >>>> I believe that as long as ANY request is on flight, the monitoring >> >> >>>> lock should not be released, and the host should not be declared >> as >> >> >>>> up. >> >> >> >> Would you relate to this analysis ^^^ ? >> >> >> > >> > The HostMonitoring lock issue has been fixed by >> > https://gerrit.ovirt.org/#/c/90189/ >> >> Is there still a chance that a host moves to Up while former >> getCapsAsync request are still in-flight? >> >> > Should not happen. 
Is there a way to execute/reproduce the failing test on > Dev env? > > >> > >> >> >> >> >>>> >> >> >>>> >> >> >>> >> >> >>> >> >> >>> Hi Dan, >> >> >>> >> >> >>> Can I have the link to the job on jenkins so I can look at the logs >> >> >> >> >> >> >> >> >> >> >> >> http://jenkins.ovirt.org/job/ovirt-system-tests_standard-che >> ck-patch/346/ >> >> >> >> >> > >> >> > >> >> > From the logs the only VDS lock that is being released twice is >> >> > VDS_FENCE >> >> > lock. Opened a BZ [1] for it. Will post a fix >> >> > >> >> > [1] https://bugzilla.redhat.com/show_bug.cgi?id=1571300 >> >> >> >> Can this possibly cause a surprise termination of host connection? >> > >> > >> > Not sure, from the logs VDS_FENCE is the only other VDS lock that is >> being >> > released >> > > Would be helpful if I can get the exact flow that is failing and also the steps if any needed to reproduce the issue -------------- next part -------------- An HTML attachment was scrubbed... URL: From danken at redhat.com Wed Apr 25 06:06:39 2018 From: danken at redhat.com (Dan Kenigsberg) Date: Wed, 25 Apr 2018 09:06:39 +0300 Subject: [ovirt-devel] [ OST Failure Report ] [ oVirt 4.2 ] [ 2018-04-04 ] [006_migrations.prepare_migration_attachments_ipv6] In-Reply-To: References: Message-ID: On Tue, Apr 24, 2018 at 10:27 PM, Ravi Shankar Nori wrote: > > > On Tue, Apr 24, 2018 at 10:46 AM, Ravi Shankar Nori > wrote: >> >> >> >> On Tue, Apr 24, 2018 at 10:29 AM, Dan Kenigsberg >> wrote: >>> >>> On Tue, Apr 24, 2018 at 5:09 PM, Ravi Shankar Nori >>> wrote: >>> > >>> > >>> > On Tue, Apr 24, 2018 at 9:47 AM, Dan Kenigsberg >>> > wrote: >>> >> >>> >> On Tue, Apr 24, 2018 at 4:36 PM, Ravi Shankar Nori >>> >> wrote: >>> >> > >>> >> > >>> >> > On Tue, Apr 24, 2018 at 9:24 AM, Martin Perina >>> >> > wrote: >>> >> >> >>> >> >> >>> >> >> >>> >> >> On Tue, Apr 24, 2018 at 3:17 PM, Ravi Shankar Nori >>> >> >> >>> >> >> wrote: >>> >> >>> >>> >> >>> >>> >> >>> >>> >> >>> On Tue, Apr 24, 2018 at 7:00 AM, Dan Kenigsberg 
>>> >> >>> >>> >> >>> wrote: >>> >> >>>> >>> >> >>>> Ravi's patch is in, but a similar problem remains, and the test >>> >> >>>> cannot >>> >> >>>> be put back into its place. >>> >> >>>> >>> >> >>>> It seems that while Vdsm was taken down, a couple of getCapsAsync >>> >> >>>> requests queued up. At one point, the host resumed its >>> >> >>>> connection, >>> >> >>>> before the requests have been cleared of the queue. After the >>> >> >>>> host is >>> >> >>>> up, the following tests resume, and at a pseudorandom point in >>> >> >>>> time, >>> >> >>>> an old getCapsAsync request times out and kills our connection. >>> >> >>>> >>> >> >>>> I believe that as long as ANY request is on flight, the >>> >> >>>> monitoring >>> >> >>>> lock should not be released, and the host should not be declared >>> >> >>>> as >>> >> >>>> up. >>> >> >>> >> Would you relate to this analysis ^^^ ? >>> >> >>> > >>> > The HostMonitoring lock issue has been fixed by >>> > https://gerrit.ovirt.org/#/c/90189/ >>> >>> Is there still a chance that a host moves to Up while former >>> getCapsAsync request are still in-flight? >>> >> >> Should not happen. Is there a way to execute/reproduce the failing test on >> Dev env? >> >>> >>> > >>> >> >>> >> >>>> >>> >> >>>> >>> >> >>> >>> >> >>> >>> >> >>> Hi Dan, >>> >> >>> >>> >> >>> Can I have the link to the job on jenkins so I can look at the >>> >> >>> logs >>> >> >> >>> >> >> >>> >> >> >>> >> >> >>> >> >> http://jenkins.ovirt.org/job/ovirt-system-tests_standard-check-patch/346/ >>> >> >> >>> >> > >>> >> > >>> >> > From the logs the only VDS lock that is being released twice is >>> >> > VDS_FENCE >>> >> > lock. Opened a BZ [1] for it. Will post a fix >>> >> > >>> >> > [1] https://bugzilla.redhat.com/show_bug.cgi?id=1571300 >>> >> >>> >> Can this possibly cause a surprise termination of host connection? 
>>> > >>> > >>> > Not sure, from the logs VDS_FENCE is the only other VDS lock that is >>> > being >>> > released >> >> > > Would be helpful if I can get the exact flow that is failing and also the > steps if any needed to reproduce the issue By now the logs of http://jenkins.ovirt.org/job/ovirt-system-tests_standard-check-patch/346/ have been garbage-collected, so I cannot point you to the location in the logs. Maybe Alona has a local copy. According to her analysis the issue manifest itself when setupNetworks follows vdsm restart. Have you tried running OST with prepare_migration_attachments_ipv6 reintroduced? It should always pass. Regards, Dan. From ebenahar at redhat.com Wed Apr 25 07:12:29 2018 From: ebenahar at redhat.com (Elad Ben Aharon) Date: Wed, 25 Apr 2018 10:12:29 +0300 Subject: [ovirt-devel] dynamic ownership changes In-Reply-To: References: <20180418081628.GA1704@Alexandra.local> <20180418111733.GA4702@Alexandra.local> <20180419120709.GB9449@Alexandra.local> <20180423125653.GA19987@Alexandra.local> Message-ID: Here it is - https://rhv-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/ rhv-4.2-ge-runner-tier1/122/ This was executed over iscsi, nfs, gluster and fcp. It ended up with 98.43% success rate. 2 storage related test cases failed and their failures, from first glance, don't seem to be bugs. On Tue, Apr 24, 2018 at 12:37 AM, Elad Ben Aharon wrote: > I will update with the results of the next tier1 execution on latest > 4.2.3 > > On Mon, Apr 23, 2018 at 3:56 PM, Martin Polednik > wrote: > >> On 23/04/18 01:23 +0300, Elad Ben Aharon wrote: >> >>> Hi, I've triggered another execution [1] due to some issues I saw in the >>> first which are not related to the patch. >>> >>> The success rate is 78% which is low comparing to tier1 executions with >>> code from downstream builds (95-100% success rates) [2]. >>> >> >> Could you run the current master (without the dynamic_ownership patch) >> so that we have viable comparision? 
>> >> From what I could see so far, there is an issue with move and copy >>> operations to and from Gluster domains. For example [3]. >>> >>> The logs are attached. >>> >>> >>> [1] >>> *https://rhv-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/rhv >>> -4.2-ge-runner-tier1-after-upgrade/7/testReport/ >>> >> -4.2-ge-runner-tier1-after-upgrade/7/testReport/>* >>> >>> >>> >>> [2] >>> https://rhv-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/ >>> >>> rhv-4.2-ge-runner-tier1-after-upgrade/7/ >>> >>> >>> >>> [3] >>> 2018-04-22 13:06:28,316+0300 INFO (jsonrpc/7) [vdsm.api] FINISH >>> deleteImage error=Image does not exist in domain: >>> 'image=cabb8846-7a4b-4244-9835-5f603e682f33, >>> domain=e5fd29c8-52ba-467e-be09-ca40ff054dd4' >>> from=: >>> :ffff:10.35.161.182,40936, flow_id=disks_syncAction_ba6b2630-5976-4935, >>> task_id=3d5f2a8a-881c-409e-93e9-aaa643c10e42 (api:51) >>> 2018-04-22 13:06:28,317+0300 ERROR (jsonrpc/7) [storage.TaskManager.Task] >>> (Task='3d5f2a8a-881c-409e-93e9-aaa643c10e42') Unexpected error >>> (task:875) >>> Traceback (most recent call last): >>> File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 882, >>> in >>> _run >>> return fn(*args, **kargs) >>> File "", line 2, in deleteImage >>> File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 49, in >>> method >>> ret = func(*args, **kwargs) >>> File "/usr/lib/python2.7/site-packages/vdsm/storage/hsm.py", line 1503, >>> in >>> deleteImage >>> raise se.ImageDoesNotExistInSD(imgUUID, sdUUID) >>> ImageDoesNotExistInSD: Image does not exist in domain: >>> 'image=cabb8846-7a4b-4244-9835-5f603e682f33, >>> domain=e5fd29c8-52ba-467e-be09-ca40ff054dd4' >>> >>> 2018-04-22 13:06:28,317+0300 INFO (jsonrpc/7) [storage.TaskManager.Task] >>> (Task='3d5f2a8a-881c-409e-93e9-aaa643c10e42') aborting: Task is aborted: >>> "Image does not exist in domain: 'image=cabb8846-7a4b-4244-9835- >>> 5f603e682f33, domain=e5fd29c8-52ba-467e-be09-ca40ff054dd4'" - code 268 >>> (task:1181) >>> 2018-04-22 
13:06:28,318+0300 ERROR (jsonrpc/7) [storage.Dispatcher] >>> FINISH >>> deleteImage error=Image does not exist in domain: >>> 'image=cabb8846-7a4b-4244-9835-5f603e682f33, >>> domain=e5fd29c8-52ba-467e-be09 >>> -ca40ff054d >>> d4' (dispatcher:82) >>> >>> >>> >>> On Thu, Apr 19, 2018 at 5:34 PM, Elad Ben Aharon >>> wrote: >>> >>> Triggered a sanity tier1 execution [1] using [2], which covers all the >>>> requested areas, on iSCSI, NFS and Gluster. >>>> I'll update with the results. >>>> >>>> [1] >>>> https://rhv-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/view/4.2 >>>> _dev/job/rhv-4.2-ge-flow-storage/1161/ >>>> >>>> [2] >>>> https://gerrit.ovirt.org/#/c/89830/ >>>> vdsm-4.30.0-291.git77aef9a.el7.x86_64 >>>> >>>> >>>> >>>> On Thu, Apr 19, 2018 at 3:07 PM, Martin Polednik >>>> wrote: >>>> >>>> On 19/04/18 14:54 +0300, Elad Ben Aharon wrote: >>>>> >>>>> Hi Martin, >>>>>> >>>>>> I see [1] requires a rebase, can you please take care? >>>>>> >>>>>> >>>>> Should be rebased. >>>>> >>>>> At the moment, our automation is stable only on iSCSI, NFS, Gluster and >>>>> >>>>>> FC. >>>>>> Ceph is not supported and Cinder will be stabilized soon, AFAIR, it's >>>>>> not >>>>>> stable enough at the moment. >>>>>> >>>>>> >>>>> That is still pretty good. >>>>> >>>>> >>>>> [1] https://gerrit.ovirt.org/#/c/89830/ >>>>> >>>>>> >>>>>> >>>>>> Thanks >>>>>> >>>>>> On Wed, Apr 18, 2018 at 2:17 PM, Martin Polednik < >>>>>> mpolednik at redhat.com> >>>>>> wrote: >>>>>> >>>>>> On 18/04/18 11:37 +0300, Elad Ben Aharon wrote: >>>>>> >>>>>>> >>>>>>> Hi, sorry if I misunderstood, I waited for more input regarding what >>>>>>> >>>>>>>> areas >>>>>>>> have to be tested here. >>>>>>>> >>>>>>>> >>>>>>>> I'd say that you have quite a bit of freedom in this regard. 
>>>>>>> GlusterFS >>>>>>> should be covered by Dennis, so iSCSI/NFS/ceph/cinder with some suite >>>>>>> that covers basic operations (start & stop VM, migrate it), snapshots >>>>>>> and merging them, and whatever else would be important for storage >>>>>>> sanity. >>>>>>> >>>>>>> mpolednik >>>>>>> >>>>>>> >>>>>>> On Wed, Apr 18, 2018 at 11:16 AM, Martin Polednik < >>>>>>> mpolednik at redhat.com >>>>>>> > >>>>>>> >>>>>>> wrote: >>>>>>>> >>>>>>>> On 11/04/18 16:52 +0300, Elad Ben Aharon wrote: >>>>>>>> >>>>>>>> >>>>>>>>> We can test this on iSCSI, NFS and GlusterFS. As for ceph and >>>>>>>>> cinder, >>>>>>>>> >>>>>>>>> will >>>>>>>>>> have to check, since usually, we don't execute our automation on >>>>>>>>>> them. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Any update on this? I believe the gluster tests were successful, >>>>>>>>>> OST >>>>>>>>>> >>>>>>>>> passes fine and unit tests pass fine, that makes the storage >>>>>>>>> backends >>>>>>>>> test the last required piece. >>>>>>>>> >>>>>>>>> >>>>>>>>> On Wed, Apr 11, 2018 at 4:38 PM, Raz Tamir >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>> >>>>>>>>> +Elad >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Wed, Apr 11, 2018 at 4:28 PM, Dan Kenigsberg < >>>>>>>>>>> danken at redhat.com> >>>>>>>>>>> wrote: >>>>>>>>>>> >>>>>>>>>>> On Wed, Apr 11, 2018 at 12:34 PM, Nir Soffer >>>>>>>>>> > >>>>>>>>>>> wrote: >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Wed, Apr 11, 2018 at 12:31 PM Eyal Edri >>>>>>>>>>>> wrote: >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Please make sure to run as much OST suites on this patch as >>>>>>>>>>>>> possible >>>>>>>>>>>>> >>>>>>>>>>>>> before merging ( using 'ci please build' ) >>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> But note that OST is not a way to verify the patch. >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>> Such changes require testing with all storage types we support. 
>>>>>>>>>>>>> >>>>>>>>>>>>> Nir >>>>>>>>>>>>> >>>>>>>>>>>>> On Tue, Apr 10, 2018 at 4:09 PM, Martin Polednik < >>>>>>>>>>>>> mpolednik at redhat.com >>>>>>>>>>>>> > >>>>>>>>>>>>> >>>>>>>>>>>>> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>>> Hey, >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> I've created a patch[0] that is finally able to activate >>>>>>>>>>>>>>> libvirt's >>>>>>>>>>>>>>> dynamic_ownership for VDSM while not negatively affecting >>>>>>>>>>>>>>> functionality of our storage code. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> That of course comes with quite a bit of code removal, >>>>>>>>>>>>>>> mostly in >>>>>>>>>>>>>>> the >>>>>>>>>>>>>>> area of host devices, hwrng and anything that touches >>>>>>>>>>>>>>> devices; >>>>>>>>>>>>>>> bunch >>>>>>>>>>>>>>> of test changes and one XML generation caveat (storage is >>>>>>>>>>>>>>> handled >>>>>>>>>>>>>>> by >>>>>>>>>>>>>>> VDSM, therefore disk relabelling needs to be disabled on the >>>>>>>>>>>>>>> VDSM >>>>>>>>>>>>>>> level). >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Because of the scope of the patch, I welcome >>>>>>>>>>>>>>> storage/virt/network >>>>>>>>>>>>>>> people to review the code and consider the implication this >>>>>>>>>>>>>>> change >>>>>>>>>>>>>>> has >>>>>>>>>>>>>>> on current/future features. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> [0] https://gerrit.ovirt.org/#/c/89830/ >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> In particular: dynamic_ownership was set to 0 >>>>>>>>>>>>>>> prehistorically >>>>>>>>>>>>>>> (as >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>> part >>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> of https://bugzilla.redhat.com/show_bug.cgi?id=554961 ) >>>>>>>>>>>> because >>>>>>>>>>>> libvirt, >>>>>>>>>>>> running as root, was not able to play properly with root-squash >>>>>>>>>>>> nfs >>>>>>>>>>>> mounts. >>>>>>>>>>>> >>>>>>>>>>>> Have you attempted this use case? >>>>>>>>>>>> >>>>>>>>>>>> I join to Nir's request to run this with storage QE. 
>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> -- >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Raz Tamir >>>>>>>>>>> Manager, RHV QE >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>> >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bzlotnik at redhat.com Wed Apr 25 09:11:51 2018 From: bzlotnik at redhat.com (Benny Zlotnik) Date: Wed, 25 Apr 2018 12:11:51 +0300 Subject: [ovirt-devel] Issues on master with releasing a lock In-Reply-To: References: Message-ID: I've submitted a patch for LSM: https://gerrit.ovirt.org/#/c/90625/ On Tue, Apr 24, 2018 at 7:00 PM, Ravi Shankar Nori wrote: > Hi All, > > We have some issues with a few flows on master where the lock acquired is > being > released more than once. This could be a real problem in cases where the > lock is > prematurely released for a second command that had acquired the lock which > the first > command has released. > > For instance > > 1. Command A has acquired the VM lock and executes a child command. > 2. The child command releases the lock > 3. Command B now acquires the VM lock > 4. Command A continues execution and finally releases the lock the second > time > 5. The Command B's lock has now been released prematurely by Command A. > > The tell tale sign indicating that a flow is one of the offending flows is > the warning > message in the logs indicating that a previously released lock is being > released again. > > [org.ovirt.engine.core.bll.lock.InMemoryLockManager] (EE-ManagedThreadFactory-engine-Thread-6) [1435f925] Trying to release exclusive lock which does not exist, lock key: '53cfa6c3-ecd6-4796-84cb-5c53d7c2a77fVDS_FENCE' > > > I have seen issues with Live Merge [1], Moving storage domain to > maintenance[2] , > VdsNotRespondingTreatmentCommand [3] and Creating a snapshot [4]. 
> > I have submitted some patches to fix the issue with Live Merge [5], Moving > storage domain to maintenance [6] and VdsNotRespondingTreatmentCommand > [7]. > > The multiple lock release issue needs to fixed and if you are the owner of > a flow please > make sure the flow does not log a warning message to the logs. There is no > single > solution to fix the issue, this has be done flow by flow. But below are a > few pointers > > 1. If a command has a callback and the command is executing a child > command, which > in turn is executing a few child commands. The child command should also > have a > command callback so that the locks are released by the framework. Example > patch [6] > > 2. If a parent has acquired the lock and is executing a child command with > cloned > context, to make sure that the child does not release the locks acquired > by the parent > pass the cloned parent context with out locks to the child. Example patch > [7] > > > Thanks > > Ravi > > > [1] https://bugzilla.redhat.com/1568556 > [2] https://bugzilla.redhat.com/1568447 > [3] https://bugzilla.redhat.com/1571300 > [4] https://bugzilla.redhat.com/1569625 > > > [5] https://gerrit.ovirt.org/#/c/90482/ > [6] https://gerrit.ovirt.org/#/c/90411/ > [7] https://gerrit.ovirt.org/#/c/90568/ > > > > _______________________________________________ > Devel mailing list > Devel at ovirt.org > http://lists.ovirt.org/mailman/listinfo/devel > -------------- next part -------------- An HTML attachment was scrubbed... URL: From piotr.kliczewski at gmail.com Wed Apr 25 09:26:15 2018 From: piotr.kliczewski at gmail.com (Piotr Kliczewski) Date: Wed, 25 Apr 2018 11:26:15 +0200 Subject: [ovirt-devel] Name resolution not working after vdsm installation Message-ID: All, I am using latest engine and vdsm 4.20.23-1. I installed vdsm on centos7 vm and after the installation my vm stopped resolving names. 
Here is my nic configuration before installation: TYPE="Ethernet" BOOTPROTO="dhcp" DEFROUTE="yes" PEERDNS="yes" PEERROUTES="yes" IPV4_FAILURE_FATAL="no" IPV6INIT="yes" IPV6_AUTOCONF="yes" IPV6_DEFROUTE="yes" IPV6_PEERDNS="yes" IPV6_PEERROUTES="yes" IPV6_FAILURE_FATAL="no" NAME="eth0" UUID="0079668a-8708-46ad-8dbf-da418ae6f77e" DEVICE="eth0" ONBOOT="yes" here is after: # Generated by VDSM version 4.20.23-1.el7.centos DEVICE=eth0 BRIDGE=ovirtmgmt ONBOOT=yes MTU=1500 DEFROUTE=no NM_CONTROLLED=no IPV6INIT=no and the bridge: # Generated by VDSM version 4.20.23-1.el7.centos DEVICE=ovirtmgmt TYPE=Bridge DELAY=0 STP=off ONBOOT=yes BOOTPROTO=dhcp MTU=1500 DEFROUTE=yes NM_CONTROLLED=no IPV6INIT=yes DHCPV6C=yes IPV6_AUTOCONF=no Thanks, Piotr From piotr.kliczewski at gmail.com Wed Apr 25 09:31:13 2018 From: piotr.kliczewski at gmail.com (Piotr Kliczewski) Date: Wed, 25 Apr 2018 11:31:13 +0200 Subject: [ovirt-devel] Installing vdsm on f27 Message-ID: All, I attempted to install vdsm on my f27 machine but I failed due to missing dependency. I was unable to find rubygem-fluent-plugin-elasticsearch. I used upstream repos and was not able to find it. From where a user can get it? Thanks, Piotr From piotr.kliczewski at gmail.com Wed Apr 25 09:43:25 2018 From: piotr.kliczewski at gmail.com (Piotr Kliczewski) Date: Wed, 25 Apr 2018 11:43:25 +0200 Subject: [ovirt-devel] Name resolution not working after vdsm installation In-Reply-To: References: Message-ID: In addition here is my resolv.conf ; generated by /usr/sbin/dhclient-script nameserver 2001:730:3ed2::53 nameserver 2001:730:3ed2:1000::53 search localdomain On Wed, Apr 25, 2018 at 11:26 AM, Piotr Kliczewski wrote: > All, > > I am using latest engine and vdsm 4.20.23-1. I installed vdsm on > centos7 vm and after the installation my vm stopped resolving names. 
> > Here is my nic configuration before installation: > > TYPE="Ethernet" > BOOTPROTO="dhcp" > DEFROUTE="yes" > PEERDNS="yes" > PEERROUTES="yes" > IPV4_FAILURE_FATAL="no" > IPV6INIT="yes" > IPV6_AUTOCONF="yes" > IPV6_DEFROUTE="yes" > IPV6_PEERDNS="yes" > IPV6_PEERROUTES="yes" > IPV6_FAILURE_FATAL="no" > NAME="eth0" > UUID="0079668a-8708-46ad-8dbf-da418ae6f77e" > DEVICE="eth0" > ONBOOT="yes" > > here is after: > > # Generated by VDSM version 4.20.23-1.el7.centos > DEVICE=eth0 > BRIDGE=ovirtmgmt > ONBOOT=yes > MTU=1500 > DEFROUTE=no > NM_CONTROLLED=no > IPV6INIT=no > > and the bridge: > # Generated by VDSM version 4.20.23-1.el7.centos > DEVICE=ovirtmgmt > TYPE=Bridge > DELAY=0 > STP=off > ONBOOT=yes > BOOTPROTO=dhcp > MTU=1500 > DEFROUTE=yes > NM_CONTROLLED=no > IPV6INIT=yes > DHCPV6C=yes > IPV6_AUTOCONF=no > > Thanks, > Piotr From msivak at redhat.com Wed Apr 25 09:45:30 2018 From: msivak at redhat.com (Martin Sivak) Date: Wed, 25 Apr 2018 11:45:30 +0200 Subject: [ovirt-devel] Name resolution not working after vdsm installation In-Reply-To: References: Message-ID: IPv6 only DNS? Cool (if it works) :) Martin On Wed, Apr 25, 2018 at 11:43 AM, Piotr Kliczewski wrote: > In addition here is my resolv.conf > > ; generated by /usr/sbin/dhclient-script > nameserver 2001:730:3ed2::53 > nameserver 2001:730:3ed2:1000::53 > search localdomain > > On Wed, Apr 25, 2018 at 11:26 AM, Piotr Kliczewski > wrote: >> All, >> >> I am using latest engine and vdsm 4.20.23-1. I installed vdsm on >> centos7 vm and after the installation my vm stopped resolving names. 
>> >> Here is my nic configuration before installation: >> >> TYPE="Ethernet" >> BOOTPROTO="dhcp" >> DEFROUTE="yes" >> PEERDNS="yes" >> PEERROUTES="yes" >> IPV4_FAILURE_FATAL="no" >> IPV6INIT="yes" >> IPV6_AUTOCONF="yes" >> IPV6_DEFROUTE="yes" >> IPV6_PEERDNS="yes" >> IPV6_PEERROUTES="yes" >> IPV6_FAILURE_FATAL="no" >> NAME="eth0" >> UUID="0079668a-8708-46ad-8dbf-da418ae6f77e" >> DEVICE="eth0" >> ONBOOT="yes" >> >> here is after: >> >> # Generated by VDSM version 4.20.23-1.el7.centos >> DEVICE=eth0 >> BRIDGE=ovirtmgmt >> ONBOOT=yes >> MTU=1500 >> DEFROUTE=no >> NM_CONTROLLED=no >> IPV6INIT=no >> >> and the bridge: >> # Generated by VDSM version 4.20.23-1.el7.centos >> DEVICE=ovirtmgmt >> TYPE=Bridge >> DELAY=0 >> STP=off >> ONBOOT=yes >> BOOTPROTO=dhcp >> MTU=1500 >> DEFROUTE=yes >> NM_CONTROLLED=no >> IPV6INIT=yes >> DHCPV6C=yes >> IPV6_AUTOCONF=no >> >> Thanks, >> Piotr > _______________________________________________ > Devel mailing list > Devel at ovirt.org > http://lists.ovirt.org/mailman/listinfo/devel From piotr.kliczewski at gmail.com Wed Apr 25 10:07:21 2018 From: piotr.kliczewski at gmail.com (Piotr Kliczewski) Date: Wed, 25 Apr 2018 12:07:21 +0200 Subject: [ovirt-devel] Name resolution not working after vdsm installation In-Reply-To: References: Message-ID: On Wed, Apr 25, 2018 at 11:45 AM, Martin Sivak wrote: > IPv6 only DNS? Cool (if it works) :) It used to :) > > Martin > > On Wed, Apr 25, 2018 at 11:43 AM, Piotr Kliczewski > wrote: >> In addition here is my resolv.conf >> >> ; generated by /usr/sbin/dhclient-script >> nameserver 2001:730:3ed2::53 >> nameserver 2001:730:3ed2:1000::53 >> search localdomain >> >> On Wed, Apr 25, 2018 at 11:26 AM, Piotr Kliczewski >> wrote: >>> All, >>> >>> I am using latest engine and vdsm 4.20.23-1. I installed vdsm on >>> centos7 vm and after the installation my vm stopped resolving names. 
>>> >>> Here is my nic configuration before installation: >>> >>> TYPE="Ethernet" >>> BOOTPROTO="dhcp" >>> DEFROUTE="yes" >>> PEERDNS="yes" >>> PEERROUTES="yes" >>> IPV4_FAILURE_FATAL="no" >>> IPV6INIT="yes" >>> IPV6_AUTOCONF="yes" >>> IPV6_DEFROUTE="yes" >>> IPV6_PEERDNS="yes" >>> IPV6_PEERROUTES="yes" >>> IPV6_FAILURE_FATAL="no" >>> NAME="eth0" >>> UUID="0079668a-8708-46ad-8dbf-da418ae6f77e" >>> DEVICE="eth0" >>> ONBOOT="yes" >>> >>> here is after: >>> >>> # Generated by VDSM version 4.20.23-1.el7.centos >>> DEVICE=eth0 >>> BRIDGE=ovirtmgmt >>> ONBOOT=yes >>> MTU=1500 >>> DEFROUTE=no >>> NM_CONTROLLED=no >>> IPV6INIT=no >>> >>> and the bridge: >>> # Generated by VDSM version 4.20.23-1.el7.centos >>> DEVICE=ovirtmgmt >>> TYPE=Bridge >>> DELAY=0 >>> STP=off >>> ONBOOT=yes >>> BOOTPROTO=dhcp >>> MTU=1500 >>> DEFROUTE=yes >>> NM_CONTROLLED=no >>> IPV6INIT=yes >>> DHCPV6C=yes >>> IPV6_AUTOCONF=no >>> >>> Thanks, >>> Piotr >> _______________________________________________ >> Devel mailing list >> Devel at ovirt.org >> http://lists.ovirt.org/mailman/listinfo/devel From sabose at redhat.com Wed Apr 25 10:24:33 2018 From: sabose at redhat.com (Sahina Bose) Date: Wed, 25 Apr 2018 15:54:33 +0530 Subject: [ovirt-devel] Update: HC suites failing for 3 weeks ( was: [OST][HC] HE fails to deploy ) In-Reply-To: References: Message-ID: On Mon, Apr 23, 2018 at 6:28 PM, Sahina Bose wrote: > > On Mon, Apr 23, 2018 at 5:41 PM, Eyal Edri wrote: > >> Sahina, >> Any update on this? >> > > Sorry, haven't been able to spend any time on this. The last I checked > the HE install was failing at task - Get Local VM IP. > and there were no logs from HE VM to debug. > > Will spend sometime on this tomorrow > https://gerrit.ovirt.org/#/c/89953/ - fixes the issue, atleast when I tried this on my local setup. 
> >> On Wed, Apr 18, 2018 at 3:40 PM, Sandro Bonazzola >> wrote: >> >>> >>> >>> 2018-04-18 9:37 GMT+02:00 Eyal Edri : >>> >>>> FYI, >>>> >>>> I've disabled the 4.2 and master HC suites nightly run on CI as they >>>> are constantly failing for almost 3 weeks and spamming the mailing lists. >>>> >>> >>> >>> HC uses gdeploy 2.0.6 which was released in December and was based on >>> ansible 2.4. >>> ansible-2.5 landed 3 weeks ago in EPEL, my guess is that gdeploy is not >>> supporting ansible-2.5 properly. >>> I had no time to validate my guess with proof, so please Sahina cross >>> check this. >>> >>> >>> >>>> >>>> I think this should get higher priority for a fix if we want it to >>>> provide any value, >>>> Work can continue using the manual jobs or via check-patch. >>>> >>>> >>>> On Mon, Apr 16, 2018 at 10:56 AM, Gal Ben Haim >>>> wrote: >>>> >>>>> Any update on https://gerrit.ovirt.org/#/c/88887/ ? >>>>> The HC suites still failing and it's hard to understand why without >>>>> the logs from the engine VM. 
>>>>> >>>>> On Sat, Apr 7, 2018 at 7:19 AM, Sahina Bose wrote: >>>>> >>>>>> >>>>>> >>>>>> On Fri, Apr 6, 2018 at 1:10 PM, Simone Tiraboschi < >>>>>> stirabos at redhat.com> wrote: >>>>>> >>>>>>> >>>>>>> >>>>>>> On Fri, Apr 6, 2018 at 9:28 AM, Sahina Bose >>>>>>> wrote: >>>>>>> >>>>>>>> 2018-04-05 20:46:52,773-0400 INFO otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:100 TASK [Get local VM IP] >>>>>>>> 2018-04-05 20:55:28,217-0400 DEBUG otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:94 {u'_ansible_parsed': True, u'stderr_lines': [], u'cmd': u"virsh -r net-dhcp-leases default | grep -i 00:16:3e:24:d3:63 | awk '{ print $5 }' | cut -f1 -d'/'", u'end': u'2018-04-05 20:55:28.046320', u'_ansible_no_log': False, u'stdout': u'', u'changed': True, u'invocation': {u'module_args': {u'warn': True, u'executable': None, u'_uses_shell': True, u'_raw_params': u"virsh -r net-dhcp-leases default | grep -i 00:16:3e:24:d3:63 | awk '{ print $5 }' | cut -f1 -d'/'", u'removes': None, u'creates': None, u'chdir': None, u'stdin': None}}, u'start': u'2018-04-05 20:55:28.000470', u'attempts': 50, u'stderr': u'', u'rc': 0, u'delta': u'0:00:00.045850', u'stdout_lines': []} >>>>>>>> 2018-04-05 20:55:28,318-0400 ERROR otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:98 fatal: [localhost]: FAILED! => {"attempts": 50, "changed": true, "cmd": "virsh -r net-dhcp-leases default | grep -i 00:16:3e:24:d3:63 | awk '{ print $5 }' | cut -f1 -d'/'", "delta": "0:00:00.045850", "end": "2018-04-05 20:55:28.046320", "rc": 0, "start": "2018-04-05 20:55:28.000470", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []} >>>>>>>> >>>>>>>> Both the 4.2 and master suites are failing on getting local VM IP. >>>>>>>> Any idea what changed or if I have to change the test? >>>>>>>> >>>>>>>> thanks! >>>>>>>> >>>>>>> >>>>>>> Hi Sahina, >>>>>>> 4.2 and master suite non HC are correctly running this morning. 
>>>>>>> http://jenkins.ovirt.org/view/oVirt%20system%20tests/job/ovi >>>>>>> rt-system-tests_he-basic-ansible-suite-master/146/ >>>>>>> http://jenkins.ovirt.org/view/oVirt%20system%20tests/job/ovi >>>>>>> rt-system-tests_he-basic-ansible-suite-4.2/76/ >>>>>>> >>>>>>> I'll try to check the difference with HC suites. >>>>>>> >>>>>>> Are you using more than one subnet in the HC suites? >>>>>>> >>>>>> >>>>>> No, I'm not. And we havent's changed anything related to network in >>>>>> the test suite. >>>>>> >>>>>> >>>>>> >>>>> >>>>> >>>>> -- >>>>> *GAL bEN HAIM* >>>>> RHV DEVOPS >>>>> >>>> >>>> >>>> >>>> -- >>>> >>>> Eyal edri >>>> >>>> >>>> MANAGER >>>> >>>> RHV DevOps >>>> >>>> EMEA VIRTUALIZATION R&D >>>> >>>> >>>> Red Hat EMEA >>>> TRIED. TESTED. TRUSTED. >>>> >>>> phone: +972-9-7692018 >>>> irc: eedri (on #tlv #rhev-dev #rhev-integ) >>>> >>>> _______________________________________________ >>>> Devel mailing list >>>> Devel at ovirt.org >>>> http://lists.ovirt.org/mailman/listinfo/devel >>>> >>> >>> >>> >>> -- >>> >>> SANDRO BONAZZOLA >>> >>> ASSOCIATE MANAGER, SOFTWARE ENGINEERING, EMEA ENG VIRTUALIZATION R&D >>> >>> Red Hat EMEA >>> >>> sbonazzo at redhat.com >>> >>> >>> >> >> >> >> -- >> >> Eyal edri >> >> >> MANAGER >> >> RHV DevOps >> >> EMEA VIRTUALIZATION R&D >> >> >> Red Hat EMEA >> TRIED. TESTED. TRUSTED. >> phone: +972-9-7692018 >> irc: eedri (on #tlv #rhev-dev #rhev-integ) >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From tnisan at redhat.com Wed Apr 25 10:45:37 2018 From: tnisan at redhat.com (Tal Nisan) Date: Wed, 25 Apr 2018 13:45:37 +0300 Subject: [ovirt-devel] oVirt Engine 4.2.3 was branched Message-ID: We've started Engine's 4.2.3.3 build a couple of hours ago, as discussed in the build meeting since we are close to GA date with only blockers to make to 4.2.3 and a lot of patches queuing up for 4.2.4 I've branched the 4.2.3.z branch. 
4.2.4 patches should be backported to ovirt-engine-4.2 branch and will be merged upon receiving the necessary acks and will be included in the upcoming 4.2.4 build. 4.2.3 patches should be backported to ovirt-engine-4.2 branch as well as the ovirt-engine-4.2.3.z branch, the cherry-pick will most likely be cut and clear but in any case of differentiation in the cherry-pick make sure you verify by compiling as well as *no CI is configured for the 4.2.3.z branch.* Tal. -------------- next part -------------- An HTML attachment was scrubbed... URL: From sabose at redhat.com Wed Apr 25 10:53:31 2018 From: sabose at redhat.com (Sahina Bose) Date: Wed, 25 Apr 2018 16:23:31 +0530 Subject: [ovirt-devel] Update: HC suites failing for 3 weeks ( was: [OST][HC] HE fails to deploy ) In-Reply-To: References: Message-ID: On Wed, Apr 25, 2018 at 3:54 PM, Sahina Bose wrote: > > > On Mon, Apr 23, 2018 at 6:28 PM, Sahina Bose wrote: > >> >> On Mon, Apr 23, 2018 at 5:41 PM, Eyal Edri wrote: >> >>> Sahina, >>> Any update on this? >>> >> >> Sorry, haven't been able to spend any time on this. The last I checked >> the HE install was failing at task - Get Local VM IP. >> and there were no logs from HE VM to debug. >> >> Will spend sometime on this tomorrow >> > > https://gerrit.ovirt.org/#/c/89953/ - fixes the issue, atleast when I > tried this on my local setup. > The CI however still fails in the HE install with : TASK [Get local VM IP]", "[ ERROR ] fatal: [localhost]: FAILED! 
=> {\"attempts\": 50, \"changed\": true, \"cmd\": \"virsh -r net-dhcp-leases default | grep -i 00:16:3e:24:d3:63 | awk '{ print $5 }' | cut -f1 -d'/'\", \"delta\": \"0:00:00.043961\", \"end\": \"2018-04-25 05:51:34.226374\", \"rc\": 0, \"start\": \"2018-04-25 05:51:34.182413\", \"stderr\": \"\", \"stderr_lines\": [], \"stdout\": \"\", \"stdout_lines\": []}" FWIW, my local setup , ost repo was at I3fc2976ab2400e5908760aadc3258329c0ffdf4d > > >> >>> On Wed, Apr 18, 2018 at 3:40 PM, Sandro Bonazzola >>> wrote: >>> >>>> >>>> >>>> 2018-04-18 9:37 GMT+02:00 Eyal Edri : >>>> >>>>> FYI, >>>>> >>>>> I've disabled the 4.2 and master HC suites nightly run on CI as they >>>>> are constantly failing for almost 3 weeks and spamming the mailing lists. >>>>> >>>> >>>> >>>> HC uses gdeploy 2.0.6 which was released in December and was based on >>>> ansible 2.4. >>>> ansible-2.5 landed 3 weeks ago in EPEL, my guess is that gdeploy is not >>>> supporting ansible-2.5 properly. >>>> I had no time to validate my guess with proof, so please Sahina cross >>>> check this. >>>> >>>> >>>> >>>>> >>>>> I think this should get higher priority for a fix if we want it to >>>>> provide any value, >>>>> Work can continue using the manual jobs or via check-patch. >>>>> >>>>> >>>>> On Mon, Apr 16, 2018 at 10:56 AM, Gal Ben Haim >>>>> wrote: >>>>> >>>>>> Any update on https://gerrit.ovirt.org/#/c/88887/ ? >>>>>> The HC suites still failing and it's hard to understand why without >>>>>> the logs from the engine VM. 
>>>>>> >>>>>> On Sat, Apr 7, 2018 at 7:19 AM, Sahina Bose >>>>>> wrote: >>>>>> >>>>>>> >>>>>>> >>>>>>> On Fri, Apr 6, 2018 at 1:10 PM, Simone Tiraboschi < >>>>>>> stirabos at redhat.com> wrote: >>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On Fri, Apr 6, 2018 at 9:28 AM, Sahina Bose >>>>>>>> wrote: >>>>>>>> >>>>>>>>> 2018-04-05 20:46:52,773-0400 INFO otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:100 TASK [Get local VM IP] >>>>>>>>> 2018-04-05 20:55:28,217-0400 DEBUG otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:94 {u'_ansible_parsed': True, u'stderr_lines': [], u'cmd': u"virsh -r net-dhcp-leases default | grep -i 00:16:3e:24:d3:63 | awk '{ print $5 }' | cut -f1 -d'/'", u'end': u'2018-04-05 20:55:28.046320', u'_ansible_no_log': False, u'stdout': u'', u'changed': True, u'invocation': {u'module_args': {u'warn': True, u'executable': None, u'_uses_shell': True, u'_raw_params': u"virsh -r net-dhcp-leases default | grep -i 00:16:3e:24:d3:63 | awk '{ print $5 }' | cut -f1 -d'/'", u'removes': None, u'creates': None, u'chdir': None, u'stdin': None}}, u'start': u'2018-04-05 20:55:28.000470', u'attempts': 50, u'stderr': u'', u'rc': 0, u'delta': u'0:00:00.045850', u'stdout_lines': []} >>>>>>>>> 2018-04-05 20:55:28,318-0400 ERROR otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:98 fatal: [localhost]: FAILED! => {"attempts": 50, "changed": true, "cmd": "virsh -r net-dhcp-leases default | grep -i 00:16:3e:24:d3:63 | awk '{ print $5 }' | cut -f1 -d'/'", "delta": "0:00:00.045850", "end": "2018-04-05 20:55:28.046320", "rc": 0, "start": "2018-04-05 20:55:28.000470", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []} >>>>>>>>> >>>>>>>>> Both the 4.2 and master suites are failing on getting local VM IP. >>>>>>>>> Any idea what changed or if I have to change the test? >>>>>>>>> >>>>>>>>> thanks! 
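For context on the failing task above: its shell pipeline greps the leases table printed by `virsh -r net-dhcp-leases default` for the VM's MAC address and takes the fifth column; an empty stdout with rc=0 is exactly what the task retries 50 times against. A rough, hypothetical Python equivalent of that parsing (the helper name and the sample lease table are illustrative, not part of the suite):

```python
def lease_ip(net_dhcp_leases_output: str, mac: str) -> str:
    """Extract the IP leased to `mac` from `virsh -r net-dhcp-leases` output.

    Columns are: expiry date, expiry time, MAC, protocol, IP/prefix, hostname,
    so the MAC is field 3 and the IP is field 5 (awk's $5 in the pipeline).
    """
    for line in net_dhcp_leases_output.splitlines():
        fields = line.split()
        if len(fields) >= 5 and fields[2].lower() == mac.lower():
            return fields[4].split("/")[0]  # drop the /24 prefix, like cut -f1 -d'/'
    return ""  # no lease yet: empty stdout, rc=0 -- what the task loops on


# Hypothetical sample of `virsh -r net-dhcp-leases default` output:
sample = """\
 Expiry Time          MAC address        Protocol  IP address         Hostname
-------------------------------------------------------------------------------
 2018-04-25 06:51:34  00:16:3e:24:d3:63  ipv4      192.168.122.37/24  enginevm
"""
```

Until the HE VM actually acquires a lease, `lease_ip` returns an empty string, which is why the Ansible task keeps retrying rather than failing fast.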
>>>>>>>>> >>>>>>>> >>>>>>>> Hi Sahina, >>>>>>>> 4.2 and master suite non HC are correctly running this morning. >>>>>>>> http://jenkins.ovirt.org/view/oVirt%20system%20tests/job/ovi >>>>>>>> rt-system-tests_he-basic-ansible-suite-master/146/ >>>>>>>> http://jenkins.ovirt.org/view/oVirt%20system%20tests/job/ovi >>>>>>>> rt-system-tests_he-basic-ansible-suite-4.2/76/ >>>>>>>> >>>>>>>> I'll try to check the difference with HC suites. >>>>>>>> >>>>>>>> Are you using more than one subnet in the HC suites? >>>>>>>> >>>>>>> >>>>>>> No, I'm not. And we havent's changed anything related to network in >>>>>>> the test suite. >>>>>>> >>>>>>> >>>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> *GAL bEN HAIM* >>>>>> RHV DEVOPS >>>>>> >>>>> >>>>> >>>>> >>>>> -- >>>>> >>>>> Eyal edri >>>>> >>>>> >>>>> MANAGER >>>>> >>>>> RHV DevOps >>>>> >>>>> EMEA VIRTUALIZATION R&D >>>>> >>>>> >>>>> Red Hat EMEA >>>>> TRIED. TESTED. TRUSTED. >>>>> >>>>> phone: +972-9-7692018 >>>>> irc: eedri (on #tlv #rhev-dev #rhev-integ) >>>>> >>>>> _______________________________________________ >>>>> Devel mailing list >>>>> Devel at ovirt.org >>>>> http://lists.ovirt.org/mailman/listinfo/devel >>>>> >>>> >>>> >>>> >>>> -- >>>> >>>> SANDRO BONAZZOLA >>>> >>>> ASSOCIATE MANAGER, SOFTWARE ENGINEERING, EMEA ENG VIRTUALIZATION R&D >>>> >>>> Red Hat EMEA >>>> >>>> sbonazzo at redhat.com >>>> >>>> >>>> >>> >>> >>> >>> -- >>> >>> Eyal edri >>> >>> >>> MANAGER >>> >>> RHV DevOps >>> >>> EMEA VIRTUALIZATION R&D >>> >>> >>> Red Hat EMEA >>> TRIED. TESTED. TRUSTED. >>> >>> phone: +972-9-7692018 >>> irc: eedri (on #tlv #rhev-dev #rhev-integ) >>> >> >> > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From dron at redhat.com Wed Apr 25 12:18:22 2018 From: dron at redhat.com (Dafna Ron) Date: Wed, 25 Apr 2018 13:18:22 +0100 Subject: [ovirt-devel] [ OST Failure Report ] [ oVirt 4.2 (ovirt-engine-metrics) ] [ 25-04-2018 ] [005_network_by_label.assign_hosts_network_label ] Message-ID: hi, we have a failure for test 005_network_by_label.assign_hosts_network_label on basic suite. action seemed to have failed because we failed to acquire lock for host The failure does not seem to be related to the reported change *Link and headline of suspected patches: The change reported is not related to *this failure. *Link to Job:* *http://jenkins.ovirt.org/job/ovirt-4.2_change-queue-tester/1797/ Link to all logs:http://jenkins.ovirt.org/job/ovirt-4.2_change-queue-tester/1797/artifact/exported-artifacts/basic-suit-4.2-el7/test_logs/basic-suite-4.2/post-005_network_by_label.py/ (Relevant) error snippet from the log: 2018-04-25 07:54:13,518-04 WARN [org.ovirt.engine.core.bll.network.host.HostSetupNetworksCommand] (default task-19) [69ada7e2] Validation of action 'HostSetupNetworks' failed for user admin at internal-authz. Reasons: VAR__ACTION__SETUP,VAR__TYPE__NETWORKS,ACTION_TYPE_FAILED_SETUP_NETWORKS_OR_REFRESH_IN_PROGRESS2018-04-25 07:54:13,530-04 DEBUG [org.ovirt.engine.core.dal.dbbroker.PostgresDbEngineDialect$PostgresSimpleJdbcCall] (default task-19) [69ada7e2] Compiled stored procedure. 
Call string is [{call get_entity_snapshot_by_command_id(?)}]2018-04-25 07:54:13,530-04 DEBUG [org.ovirt.engine.core.dal.dbbroker.PostgresDbEngineDialect$PostgresSimpleJdbcCall] (default task-19) [69ada7e2] SqlCall for procedure [get_entity_snapshot_by_command_id] compiled2018-04-25 07:54:13,543-04 ERROR [org.ovirt.engine.core.bll.network.host.LabelNicCommand] (default task-19) [69ada7e2] Transaction rolled-back for command 'org.ovirt.engine.core.bll.network.host.LabelNicCommand'.2018-04-25 07:54:13,548-04 DEBUG [org.ovirt.engine.core.common.di.interceptor.DebugLoggingInterceptor] (default task-19) [69ada7e2] method: get, params: [d8c9cedc-c26c-45c9-a9fa-78569a96e5da], timeElapsed: 3ms2018-04-25 07:54:13,552-04 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-19) [69ada7e2] EVENT_ID: LABEL_NIC_FAILED(1,137), Failed to label network interface card eth0 with label NETWORK_LABEL on host lago-basic-suite-4-2-host-1.2018-04-25 07:54:13,552-04 DEBUG [org.ovirt.engine.core.common.di.interceptor.DebugLoggingInterceptor] (default task-19) [69ada7e2] method: runAction, params: [LabelNic, LabelNicParameters:{commandId='97cd3097-83e5-468b-80db-1db341fd6b20', user='null', commandType='Unknown'}], timeElapsed: 57ms2018-04-25 07:54:13,560-04 ERROR [org.ovirt.engine.api.restapi.resource.AbstractBackendResource] (default task-19) [] Operation Failed: [Cannot add Label. Another Setup Networks or Host Refresh process in progress on the host. Please try later.]* -------------- next part -------------- An HTML attachment was scrubbed... URL: From danken at redhat.com Wed Apr 25 13:43:25 2018 From: danken at redhat.com (Dan Kenigsberg) Date: Wed, 25 Apr 2018 16:43:25 +0300 Subject: [ovirt-devel] Name resolution not working after vdsm installation In-Reply-To: References: Message-ID: On Wed, Apr 25, 2018 at 1:07 PM, Piotr Kliczewski wrote: > On Wed, Apr 25, 2018 at 11:45 AM, Martin Sivak wrote: >> IPv6 only DNS? 
Cool (if it works) :) > > It used to :) > >> >> Martin >> >> On Wed, Apr 25, 2018 at 11:43 AM, Piotr Kliczewski >> wrote: >>> In addition here is my resolv.conf >>> >>> ; generated by /usr/sbin/dhclient-script >>> nameserver 2001:730:3ed2::53 >>> nameserver 2001:730:3ed2:1000::53 >>> search localdomain >>> >>> On Wed, Apr 25, 2018 at 11:26 AM, Piotr Kliczewski >>> wrote: >>>> All, >>>> >>>> I am using the latest engine and vdsm 4.20.23-1. I installed vdsm on >>>> centos7 vm and after the installation my vm stopped resolving names. >>>> >>>> Here is my nic configuration before installation: >>>> >>>> TYPE="Ethernet" >>>> BOOTPROTO="dhcp" >>>> DEFROUTE="yes" >>>> PEERDNS="yes" >>>> PEERROUTES="yes" >>>> IPV4_FAILURE_FATAL="no" >>>> IPV6INIT="yes" >>>> IPV6_AUTOCONF="yes" >>>> IPV6_DEFROUTE="yes" >>>> IPV6_PEERDNS="yes" >>>> IPV6_PEERROUTES="yes" >>>> IPV6_FAILURE_FATAL="no" Sorry for breaking your use case. Marcin can show you how to write an ifcfg hook which reintroduces IPV6_ entries to the ifcfg file. We never supported IPV6 default route; but I would have expected add host flow to at least maintain your autoconf. Could you file a bug with your *vdsm.log and engine.log from the time of add-host? From piotr.kliczewski at gmail.com Wed Apr 25 14:56:07 2018 From: piotr.kliczewski at gmail.com (Piotr Kliczewski) Date: Wed, 25 Apr 2018 16:56:07 +0200 Subject: [ovirt-devel] Name resolution not working after vdsm installation In-Reply-To: References: Message-ID: On Wed, Apr 25, 2018 at 3:43 PM, Dan Kenigsberg wrote: > > Sorry for breaking your use case. > Marcin can show you how to write an ifcfg hook which reintroduces IPV6_ > entries to the ifcfg file. It is enough to add an IPv4 nameserver to resolv.conf to work around the issue. > > We never supported IPV6 default route; but I would have expected add > host flow to at least maintain your autoconf. > Could you file a bug with your *vdsm.log and engine.log from the time > of add-host?
Here is the BZ [1] [1] https://bugzilla.redhat.com/1571869 From mperina at redhat.com Wed Apr 25 14:57:54 2018 From: mperina at redhat.com (Martin Perina) Date: Wed, 25 Apr 2018 16:57:54 +0200 Subject: [ovirt-devel] [ OST Failure Report ] [ oVirt 4.2 ] [ 2018-04-04 ] [006_migrations.prepare_migration_attachments_ipv6] In-Reply-To: References: Message-ID: On Tue, Apr 24, 2018 at 3:28 PM, Dan Kenigsberg wrote: > On Tue, Apr 24, 2018 at 4:17 PM, Ravi Shankar Nori > wrote: > > > > > > On Tue, Apr 24, 2018 at 7:00 AM, Dan Kenigsberg > wrote: > >> > >> Ravi's patch is in, but a similar problem remains, and the test cannot > >> be put back into its place. > >> > >> It seems that while Vdsm was taken down, a couple of getCapsAsync > >> requests queued up. At one point, the host resumed its connection, > >> before the requests have been cleared of the queue. After the host is > >> up, the following tests resume, and at a pseudorandom point in time, > >> an old getCapsAsync request times out and kills our connection. > >> > >> I believe that as long as ANY request is on flight, the monitoring > >> lock should not be released, and the host should not be declared as > >> up. > >> > >> > > > > > > Hi Dan, > > > > Can I have the link to the job on jenkins so I can look at the logs > > We disabled a network test that started failing after getCapsAsync was > merged. > Please own its re-introduction to OST: https://gerrit.ovirt.org/#/c/90264/ > > Its most recent failure > http://jenkins.ovirt.org/job/ovirt-system-tests_standard-check-patch/346/ > has been discussed by Alona and Piotr over IRC. So https://bugzilla.redhat.com/1571768 was created to cover this issue discovered during Alona's and Piotr's conversation. But after further discussion we have found out that this issue is not related to non-blocking thread changes in engine 4.2 and this behavior exists from the beginning of vdsm-jsonrpc-java.
Ravi will continue verify the fix for BZ1571768 along with other locking changes he already posted to see if they will help network OST to succeed. But the fix for BZ1571768 is too dangerous for 4.2.3, let's try to fix that on master and let's see if it doesn't introduce any regressions. If not, then we can backport to 4.2.4. -- Martin Perina Associate Manager, Software Engineering Red Hat Czech s.r.o. -------------- next part -------------- An HTML attachment was scrubbed... URL: From danken at redhat.com Wed Apr 25 15:34:40 2018 From: danken at redhat.com (Dan Kenigsberg) Date: Wed, 25 Apr 2018 18:34:40 +0300 Subject: [ovirt-devel] [ OST Failure Report ] [ oVirt 4.2 ] [ 2018-04-04 ] [006_migrations.prepare_migration_attachments_ipv6] In-Reply-To: References: Message-ID: On Wed, Apr 25, 2018 at 5:57 PM, Martin Perina wrote: > > > On Tue, Apr 24, 2018 at 3:28 PM, Dan Kenigsberg wrote: >> >> On Tue, Apr 24, 2018 at 4:17 PM, Ravi Shankar Nori >> wrote: >> > >> > >> > On Tue, Apr 24, 2018 at 7:00 AM, Dan Kenigsberg >> > wrote: >> >> >> >> Ravi's patch is in, but a similar problem remains, and the test cannot >> >> be put back into its place. >> >> >> >> It seems that while Vdsm was taken down, a couple of getCapsAsync >> >> requests queued up. At one point, the host resumed its connection, >> >> before the requests have been cleared of the queue. After the host is >> >> up, the following tests resume, and at a pseudorandom point in time, >> >> an old getCapsAsync request times out and kills our connection. >> >> >> >> I believe that as long as ANY request is on flight, the monitoring >> >> lock should not be released, and the host should not be declared as >> >> up. >> >> >> >> >> > >> > >> > Hi Dan, >> > >> > Can I have the link to the job on jenkins so I can look at the logs >> >> We disabled a network test that started failing after getCapsAsync was >> merged. 
>> Please own its re-introduction to OST: https://gerrit.ovirt.org/#/c/90264/ >> >> Its most recent failure >> http://jenkins.ovirt.org/job/ovirt-system-tests_standard-check-patch/346/ >> has been discussed by Alona and Piotr over IRC. > > > So https://bugzilla.redhat.com/1571768 was created to cover this issue > discovered during Alona's and Piotr's conversation. But after further > discussion we have found out that this issue is not related to non-blocking > thread changes in engine 4.2 and this behavior exists from beginning of > vdsm-jsonrpc-java. Ravi will continue verify the fix for BZ1571768 along > with other locking changes he already posted to see if they will help > network OST to succeed. > > But the fix for BZ1571768 is too dangerous for 4.2.3, let's try to fix that > on master and let's see if it doesn't introduce any regressions. If not, > then we can backport to 4.2.4. I sense as if there is a regression in connection management, that coincided with the introduction of async monitoring. I am not alone: Gal Ben Haim was reluctant to take our test back. Do you think that it is now safe to take it in https://gerrit.ovirt.org/#/c/90264/ ? I'd appreciate your support there. I don't want any test to be skipped without a very good reason. From rnori at redhat.com Wed Apr 25 16:20:50 2018 From: rnori at redhat.com (Ravi Shankar Nori) Date: Wed, 25 Apr 2018 12:20:50 -0400 Subject: [ovirt-devel] [ OST Failure Report ] [ oVirt 4.2 ] [ 2018-04-04 ] [006_migrations.prepare_migration_attachments_ipv6] In-Reply-To: References: Message-ID: On Wed, Apr 25, 2018 at 10:57 AM, Martin Perina wrote: > > > On Tue, Apr 24, 2018 at 3:28 PM, Dan Kenigsberg wrote: > >> On Tue, Apr 24, 2018 at 4:17 PM, Ravi Shankar Nori >> wrote: >> > >> > >> > On Tue, Apr 24, 2018 at 7:00 AM, Dan Kenigsberg >> wrote: >> >> >> >> Ravi's patch is in, but a similar problem remains, and the test cannot >> >> be put back into its place. 
>> >> >> >> It seems that while Vdsm was taken down, a couple of getCapsAsync >> >> requests queued up. At one point, the host resumed its connection, >> >> before the requests have been cleared of the queue. After the host is >> >> up, the following tests resume, and at a pseudorandom point in time, >> >> an old getCapsAsync request times out and kills our connection. >> >> >> >> I believe that as long as ANY request is on flight, the monitoring >> >> lock should not be released, and the host should not be declared as >> >> up. >> >> >> >> >> > >> > >> > Hi Dan, >> > >> > Can I have the link to the job on jenkins so I can look at the logs >> >> We disabled a network test that started failing after getCapsAsync was >> merged. >> Please own its re-introduction to OST: https://gerrit.ovirt.org/#/c/9 >> 0264/ >> >> Its most recent failure >> http://jenkins.ovirt.org/job/ovirt-system-tests_standard-check-patch/346/ >> has been discussed by Alona and Piotr over IRC. >> > > ?So https://bugzilla.redhat.com/1571768 was created to cover this issue? > discovered during Alona's and Piotr's conversation. But after further > discussion we have found out that this issue is not related to non-blocking > thread changes in engine 4.2 and this behavior exists from beginning of > vdsm-jsonrpc-java. Ravi will continue verify the fix for BZ1571768 along > with other locking changes he already posted to see if they will help > network OST to succeed. > > But the fix for BZ1571768 is too dangerous for 4.2.3, let's try to fix > that on master and let's see if it doesn't introduce any regressions. If > not, then we can backport to 4.2.4. > > > > -- > Martin Perina > Associate Manager, Software Engineering > Red Hat Czech s.r.o. > Posted a vdsm-jsonrpc-java patch [1] for BZ 1571768 [2] which fixes the OST issue with enabling 006_migrations.prepare_migration_attachments_ipv6. 
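Dan's argument quoted above — keep the monitoring lock held and the host not declared Up while ANY request is still in flight — amounts to gating the Up transition on an in-flight request counter. A toy sketch of that idea (all names are illustrative; this is not the actual vdsm-jsonrpc-java code):

```python
import threading


class HostMonitor:
    """Declare the host Up only once no requests remain in flight."""

    def __init__(self):
        self._lock = threading.Lock()
        self._in_flight = 0
        self.host_up = False

    def request_started(self):
        with self._lock:
            self._in_flight += 1

    def request_finished(self):
        with self._lock:
            self._in_flight -= 1

    def try_declare_up(self) -> bool:
        # Flip to Up only after the queue of old requests has drained;
        # otherwise a stale timeout could later kill the connection.
        with self._lock:
            if self._in_flight == 0:
                self.host_up = True
            return self.host_up
```

With this gating, a getCapsAsync request queued while the host was down keeps the host in the "coming up" state until it completes or is cleared, instead of timing out after the host is already marked Up.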
I ran OST with the vdsm-jsonrpc-java patch [1] and the patch to add back 006_migrations.prepare_migration_attachments_ipv6 [3] and the jobs succeeded thrice [4][5][6] [1] https://gerrit.ovirt.org/#/c/90646/ [2] https://bugzilla.redhat.com/show_bug.cgi?id=1571768 [3] https://gerrit.ovirt.org/#/c/90264/ [4] http://jenkins.ovirt.org/job/ovirt-system-tests_manual/2643/ [5] http://jenkins.ovirt.org/job/ovirt-system-tests_manual/2644/ [6] http://jenkins.ovirt.org/job/ovirt-system-tests_manual/2645/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From gshereme at redhat.com Wed Apr 25 16:48:47 2018 From: gshereme at redhat.com (Greg Sheremeta) Date: Wed, 25 Apr 2018 12:48:47 -0400 Subject: [ovirt-devel] Installing vdsm on f27 In-Reply-To: References: Message-ID: Is Fedora a supported host platform? On Wed, Apr 25, 2018 at 5:31 AM, Piotr Kliczewski < piotr.kliczewski at gmail.com> wrote: > All, > > I attempted to install vdsm on my f27 machine but I failed due to > missing dependency. I was unable to find > rubygem-fluent-plugin-elasticsearch. I used upstream repos and was not > able to find it. From where a user can get it? > > Thanks, > Piotr > _______________________________________________ > Devel mailing list > Devel at ovirt.org > http://lists.ovirt.org/mailman/listinfo/devel > -- GREG SHEREMETA SENIOR SOFTWARE ENGINEER - TEAM LEAD - RHV UX Red Hat NA gshereme at redhat.com IRC: gshereme -------------- next part -------------- An HTML attachment was scrubbed... URL: From mperina at redhat.com Wed Apr 25 17:06:47 2018 From: mperina at redhat.com (Martin Perina) Date: Wed, 25 Apr 2018 17:06:47 +0000 Subject: [ovirt-devel] [ OST Failure Report ] [ oVirt 4.2 ] [ 2018-04-04 ] [006_migrations.prepare_migration_attachments_ipv6] In-Reply-To: References: Message-ID: Ravi/Piotr, so what's the connection between non-blocking threads, jsonrpc-java connection closing and failing this network test? 
Does it mean that non-blocking threads change just revealed the jsonrpc-java issue which we haven't noticed before? And did the test really works with code prior to non-blocking threads changes and we are missing something else? On Wed, 25 Apr 2018, 18:21 Ravi Shankar Nori, wrote: > > > On Wed, Apr 25, 2018 at 10:57 AM, Martin Perina > wrote: > >> >> >> On Tue, Apr 24, 2018 at 3:28 PM, Dan Kenigsberg >> wrote: >> >>> On Tue, Apr 24, 2018 at 4:17 PM, Ravi Shankar Nori >>> wrote: >>> > >>> > >>> > On Tue, Apr 24, 2018 at 7:00 AM, Dan Kenigsberg >>> wrote: >>> >> >>> >> Ravi's patch is in, but a similar problem remains, and the test cannot >>> >> be put back into its place. >>> >> >>> >> It seems that while Vdsm was taken down, a couple of getCapsAsync >>> >> requests queued up. At one point, the host resumed its connection, >>> >> before the requests have been cleared of the queue. After the host is >>> >> up, the following tests resume, and at a pseudorandom point in time, >>> >> an old getCapsAsync request times out and kills our connection. >>> >> >>> >> I believe that as long as ANY request is on flight, the monitoring >>> >> lock should not be released, and the host should not be declared as >>> >> up. >>> >> >>> >> >>> > >>> > >>> > Hi Dan, >>> > >>> > Can I have the link to the job on jenkins so I can look at the logs >>> >>> We disabled a network test that started failing after getCapsAsync was >>> merged. >>> Please own its re-introduction to OST: >>> https://gerrit.ovirt.org/#/c/90264/ >>> >>> Its most recent failure >>> http://jenkins.ovirt.org/job/ovirt-system-tests_standard-check-patch/346/ >>> has been discussed by Alona and Piotr over IRC. >>> >> >> ?So https://bugzilla.redhat.com/1571768 was created to cover this issue? >> discovered during Alona's and Piotr's conversation. 
But after further >> discussion we have found out that this issue is not related to non-blocking >> thread changes in engine 4.2 and this behavior exists from beginning of >> vdsm-jsonrpc-java. Ravi will continue verify the fix for BZ1571768 along >> with other locking changes he already posted to see if they will help >> network OST to succeed. >> >> But the fix for BZ1571768 is too dangerous for 4.2.3, let's try to fix >> that on master and let's see if it doesn't introduce any regressions. If >> not, then we can backport to 4.2.4. >> >> >> >> -- >> Martin Perina >> Associate Manager, Software Engineering >> Red Hat Czech s.r.o. >> > > Posted a vdsm-jsonrpc-java patch [1] for BZ 1571768 [2] which fixes the > OST issue with enabling 006_migrations.prepare_migration_attachments_ipv6. > > I ran OST with the vdsm-jsonrpc-java patch [1] and the patch to add back > 006_migrations.prepare_migration_attachments_ipv6 [3] and the jobs > succeeded thrice [4][5][6] > > [1] https://gerrit.ovirt.org/#/c/90646/ > [2] https://bugzilla.redhat.com/show_bug.cgi?id=1571768 > [3] https://gerrit.ovirt.org/#/c/90264/ > [4] http://jenkins.ovirt.org/job/ovirt-system-tests_manual/2643/ > [5] http://jenkins.ovirt.org/job/ovirt-system-tests_manual/2644/ > [6] http://jenkins.ovirt.org/job/ovirt-system-tests_manual/2645/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From pkliczew at redhat.com Wed Apr 25 17:15:52 2018 From: pkliczew at redhat.com (Piotr Kliczewski) Date: Wed, 25 Apr 2018 17:15:52 +0000 Subject: [ovirt-devel] [ OST Failure Report ] [ oVirt 4.2 ] [ 2018-04-04 ] [006_migrations.prepare_migration_attachments_ipv6] In-Reply-To: References: Message-ID: ?r., 25 kwi 2018, 19:07 u?ytkownik Martin Perina napisa?: > Ravi/Piotr, so what's the connection between non-blocking threads, > jsonrpc-java connection closing and failing this network test? 
Does it mean > that non-blocking threads change just revealed the jsonrpc-java issue which > we haven't noticed before? > And did the test really works with code prior to non-blocking threads > changes and we are missing something else? > I think that the test found something not related to non-blocking threads. This behavior was in the code since the beginning. > > On Wed, 25 Apr 2018, 18:21 Ravi Shankar Nori, wrote: > >> >> >> On Wed, Apr 25, 2018 at 10:57 AM, Martin Perina >> wrote: >> >>> >>> >>> On Tue, Apr 24, 2018 at 3:28 PM, Dan Kenigsberg >>> wrote: >>> >>>> On Tue, Apr 24, 2018 at 4:17 PM, Ravi Shankar Nori >>>> wrote: >>>> > >>>> > >>>> > On Tue, Apr 24, 2018 at 7:00 AM, Dan Kenigsberg >>>> wrote: >>>> >> >>>> >> Ravi's patch is in, but a similar problem remains, and the test >>>> cannot >>>> >> be put back into its place. >>>> >> >>>> >> It seems that while Vdsm was taken down, a couple of getCapsAsync >>>> >> requests queued up. At one point, the host resumed its connection, >>>> >> before the requests have been cleared of the queue. After the host is >>>> >> up, the following tests resume, and at a pseudorandom point in time, >>>> >> an old getCapsAsync request times out and kills our connection. >>>> >> >>>> >> I believe that as long as ANY request is on flight, the monitoring >>>> >> lock should not be released, and the host should not be declared as >>>> >> up. >>>> >> >>>> >> >>>> > >>>> > >>>> > Hi Dan, >>>> > >>>> > Can I have the link to the job on jenkins so I can look at the logs >>>> >>>> We disabled a network test that started failing after getCapsAsync was >>>> merged. >>>> Please own its re-introduction to OST: >>>> https://gerrit.ovirt.org/#/c/90264/ >>>> >>>> Its most recent failure >>>> >>>> http://jenkins.ovirt.org/job/ovirt-system-tests_standard-check-patch/346/ >>>> has been discussed by Alona and Piotr over IRC. >>>> >>> >>> ?So https://bugzilla.redhat.com/1571768 was created to cover this >>> issue? 
discovered during Alona's and Piotr's conversation. But after >>> further discussion we have found out that this issue is not related to >>> non-blocking thread changes in engine 4.2 and this behavior exists from >>> beginning of vdsm-jsonrpc-java. Ravi will continue verify the fix for >>> BZ1571768 along with other locking changes he already posted to see if they >>> will help network OST to succeed. >>> >>> But the fix for BZ1571768 is too dangerous for 4.2.3, let's try to fix >>> that on master and let's see if it doesn't introduce any regressions. If >>> not, then we can backport to 4.2.4. >>> >>> >>> >>> -- >>> Martin Perina >>> Associate Manager, Software Engineering >>> Red Hat Czech s.r.o. >>> >> >> Posted a vdsm-jsonrpc-java patch [1] for BZ 1571768 [2] which fixes the >> OST issue with enabling 006_migrations.prepare_migration_attachments_ipv6. >> >> I ran OST with the vdsm-jsonrpc-java patch [1] and the patch to add back >> 006_migrations.prepare_migration_attachments_ipv6 [3] and the jobs >> succeeded thrice [4][5][6] >> >> [1] https://gerrit.ovirt.org/#/c/90646/ >> [2] https://bugzilla.redhat.com/show_bug.cgi?id=1571768 >> [3] https://gerrit.ovirt.org/#/c/90264/ >> [4] http://jenkins.ovirt.org/job/ovirt-system-tests_manual/2643/ >> [5] http://jenkins.ovirt.org/job/ovirt-system-tests_manual/2644/ >> [6] http://jenkins.ovirt.org/job/ovirt-system-tests_manual/2645/ >> > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From danken at redhat.com Thu Apr 26 06:50:44 2018 From: danken at redhat.com (Dan Kenigsberg) Date: Thu, 26 Apr 2018 09:50:44 +0300 Subject: [ovirt-devel] [ OST Failure Report ] [ oVirt 4.2 ] [ 2018-04-04 ] [006_migrations.prepare_migration_attachments_ipv6] In-Reply-To: References: Message-ID: On Wed, Apr 25, 2018 at 8:15 PM, Piotr Kliczewski wrote: > > > On Wed, 25 Apr 2018 at 19:07, Martin Perina > wrote: >> >> Ravi/Piotr, so what's the connection between non-blocking threads, >> jsonrpc-java connection closing and failing this network test? Does it mean >> that non-blocking threads change just revealed the jsonrpc-java issue which >> we haven't noticed before? >> And did the test really works with code prior to non-blocking threads >> changes and we are missing something else? > > > I think that the test found something not related to non-blocking threads. > This behavior was in the code since the beginning. ipv6 migration test is quite old too commit dea714ba85d5c0b77859f504bb80840275277e68 Author: Ondřej Svoboda Date: Tue Mar 14 11:21:53 2017 +0100 Re-enable IPv6 migration.
and vdsm_recovery test is not so young: commit 315cc100bc9bc0d81c01728a3576a10bfa576ac4 Author: Milan Zamazal Date: Fri Sep 23 10:08:47 2016 +0200 vdsm_recovery test From danken at redhat.com Thu Apr 26 06:51:58 2018 From: danken at redhat.com (Dan Kenigsberg) Date: Thu, 26 Apr 2018 09:51:58 +0300 Subject: [ovirt-devel] [ OST Failure Report ] [ oVirt 4.2 ] [ 2018-04-04 ] [006_migrations.prepare_migration_attachments_ipv6] In-Reply-To: References: Message-ID: On Wed, Apr 25, 2018 at 7:20 PM, Ravi Shankar Nori wrote: > > > On Wed, Apr 25, 2018 at 10:57 AM, Martin Perina wrote: >> >> >> >> On Tue, Apr 24, 2018 at 3:28 PM, Dan Kenigsberg wrote: >>> >>> On Tue, Apr 24, 2018 at 4:17 PM, Ravi Shankar Nori >>> wrote: >>> > >>> > >>> > On Tue, Apr 24, 2018 at 7:00 AM, Dan Kenigsberg >>> > wrote: >>> >> >>> >> Ravi's patch is in, but a similar problem remains, and the test cannot >>> >> be put back into its place. >>> >> >>> >> It seems that while Vdsm was taken down, a couple of getCapsAsync >>> >> requests queued up. At one point, the host resumed its connection, >>> >> before the requests have been cleared of the queue. After the host is >>> >> up, the following tests resume, and at a pseudorandom point in time, >>> >> an old getCapsAsync request times out and kills our connection. >>> >> >>> >> I believe that as long as ANY request is on flight, the monitoring >>> >> lock should not be released, and the host should not be declared as >>> >> up. >>> >> >>> >> >>> > >>> > >>> > Hi Dan, >>> > >>> > Can I have the link to the job on jenkins so I can look at the logs >>> >>> We disabled a network test that started failing after getCapsAsync was >>> merged. >>> Please own its re-introduction to OST: >>> https://gerrit.ovirt.org/#/c/90264/ >>> >>> Its most recent failure >>> http://jenkins.ovirt.org/job/ovirt-system-tests_standard-check-patch/346/ >>> has been discussed by Alona and Piotr over IRC. 
>> >> >> So https://bugzilla.redhat.com/1571768 was created to cover this issue >> discovered during Alona's and Piotr's conversation. But after further >> discussion we have found out that this issue is not related to non-blocking >> thread changes in engine 4.2 and this behavior exists from beginning of >> vdsm-jsonrpc-java. Ravi will continue verify the fix for BZ1571768 along >> with other locking changes he already posted to see if they will help >> network OST to succeed. >> >> But the fix for BZ1571768 is too dangerous for 4.2.3, let's try to fix >> that on master and let's see if it doesn't introduce any regressions. If >> not, then we can backport to 4.2.4. >> >> >> >> -- >> Martin Perina >> Associate Manager, Software Engineering >> Red Hat Czech s.r.o. > > > Posted a vdsm-jsonrpc-java patch [1] for BZ 1571768 [2] which fixes the OST > issue with enabling 006_migrations.prepare_migration_attachments_ipv6. > > I ran OST with the vdsm-jsonrpc-java patch [1] and the patch to add back > 006_migrations.prepare_migration_attachments_ipv6 [3] and the jobs > succeeded thrice [4][5][6] > > [1] https://gerrit.ovirt.org/#/c/90646/ > [2] https://bugzilla.redhat.com/show_bug.cgi?id=1571768 > [3] https://gerrit.ovirt.org/#/c/90264/ > [4] http://jenkins.ovirt.org/job/ovirt-system-tests_manual/2643/ > [5] http://jenkins.ovirt.org/job/ovirt-system-tests_manual/2644/ > [6] http://jenkins.ovirt.org/job/ovirt-system-tests_manual/2645/ Eyal, Gal: would you please take the test back? From danken at redhat.com Thu Apr 26 06:55:03 2018 From: danken at redhat.com (Dan Kenigsberg) Date: Thu, 26 Apr 2018 09:55:03 +0300 Subject: [ovirt-devel] Installing vdsm on f27 In-Reply-To: References: Message-ID: No, but it is a common development platform. I would like to reintroduce fc28 as a support ovirt-4.3 platform, so please report errors such as Piotr's (or sharon's https://gerrit.ovirt.org/#/c/90544/ ) and try to fix them. 
On Wed, Apr 25, 2018 at 7:48 PM, Greg Sheremeta wrote: > Is Fedora a supported host platform? > > On Wed, Apr 25, 2018 at 5:31 AM, Piotr Kliczewski < > piotr.kliczewski at gmail.com> wrote: > >> All, >> >> I attempted to install vdsm on my f27 machine but I failed due to >> missing dependency. I was unable to find >> rubygem-fluent-plugin-elasticsearch. I used upstream repos and was not >> able to find it. From where a user can get it? >> >> Thanks, >> Piotr >> _______________________________________________ >> Devel mailing list >> Devel at ovirt.org >> http://lists.ovirt.org/mailman/listinfo/devel >> > > > > -- > > GREG SHEREMETA > > SENIOR SOFTWARE ENGINEER - TEAM LEAD - RHV UX > > Red Hat NA > > > > gshereme at redhat.com IRC: gshereme > > > _______________________________________________ > Devel mailing list > Devel at ovirt.org > http://lists.ovirt.org/mailman/listinfo/devel > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sbonazzo at redhat.com Thu Apr 26 07:15:15 2018 From: sbonazzo at redhat.com (Sandro Bonazzola) Date: Thu, 26 Apr 2018 09:15:15 +0200 Subject: [ovirt-devel] Installing vdsm on f27 In-Reply-To: References: Message-ID: 2018-04-25 11:31 GMT+02:00 Piotr Kliczewski : > All, > > I attempted to install vdsm on my f27 machine but I failed due to > missing dependency. I was unable to find > rubygem-fluent-plugin-elasticsearch. I used upstream repos and was not > able to find it. From where a user can get it? > Opened https://bugzilla.redhat.com/show_bug.cgi?id=1572079 > > Thanks, > Piotr > -- SANDRO BONAZZOLA ASSOCIATE MANAGER, SOFTWARE ENGINEERING, EMEA ENG VIRTUALIZATION R&D Red Hat EMEA sbonazzo at redhat.com -------------- next part -------------- An HTML attachment was scrubbed... 
URL:

From bkorren at redhat.com Thu Apr 26 08:23:58 2018 From: bkorren at redhat.com (Barak Korren) Date: Thu, 26 Apr 2018 11:23:58 +0300 Subject: [ovirt-devel] [PLEASE NOTE]: STDCI V2 presentation moved to next week Message-ID: It will be on May 3rd at 11:00 IST/10:00 CEST/9:00 UTC. An updated calendar invite was already sent on Tuesday, and will be sent again shortly. Apologies to anyone who thought this was still happening today. -- Barak Korren RHV DevOps team , RHCE, RHCi Red Hat EMEA redhat.com | TRIED. TESTED. TRUSTED. | redhat.com/trusted

From bkorren at redhat.com Thu Apr 26 08:26:13 2018 From: bkorren at redhat.com (bkorren at redhat.com) Date: Thu, 26 Apr 2018 08:26:13 +0000 Subject: [ovirt-devel] oVirt STDCI v2 deep dive Message-ID: <001a11441f3af917ef056abc2393@google.com> Please note that the talk was re-scheduled after it was initially announced. You have been invited to the following event. Title: oVirt STDCI v2 deep dive Introduction to the 2nd version of oVirt's CI standard - What is it, what can it do, how to use it and how does it work. Note: Meeting was moved To join the Meeting: https://bluejeans.com/8705030462 To join via Room System: Video Conferencing System: redhat.bjn.vc -or- 199.48.152.18 Meeting ID : 8705030462 To join via phone : 1) Dial: 408-915-6466 (United States) (see all numbers - https://www.redhat.com/en/conference-numbers) 2) Enter Conference ID : 8705030462 RSVP at: https://www.eventbrite.com/e/ovirt-stdci-v2-deep-dive-tickets-45468120372 When: Thu May 3, 2018 11:00 -
12:00 Jerusalem Where: ; https://bluejeans.com/8705030462, raanana-04-asia-8-p-vc Who: * bkorren at redhat.com - organizer * devel at ovirt.org * sbonazzo at redhat.com * dkenigsb at redhat.com * rbarry at redhat.com * mskrivan at redhat.com * eedri at redhat.com * tjelinek at redhat.com * tnisan at redhat.com * mperina at redhat.com * sradco at redhat.com * msivak at redhat.com * sabose at redhat.com
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From sbonazzo at redhat.com Thu Apr 26 08:37:44 2018 From: sbonazzo at redhat.com (Sandro Bonazzola) Date: Thu, 26 Apr 2018 10:37:44 +0200 Subject: [ovirt-devel] oVirt STDCI v2 deep dive In-Reply-To: <001a11441f3af917ef056abc2393@google.com> References: <001a11441f3af917ef056abc2393@google.com> Message-ID: 2018-04-26 10:26 GMT+02:00 : > Please note that the talk was re-scheduled after it was initially > announced. > Not sure why, but I haven't received the following invitation. I've registered to the Eventbrite event. Please note its timing is set to Thursday, May 3, 2018 at 11:00 AM - Friday, May 4, 2018 at 2:00 AM (IDT), which looks wrong :-) > oVirt STDCI v2 deep dive > > Introduction to the 2nd version of oVirt's CI standard - What is it, what > can it do, how to use it and how does it work. > > Note: Meeting was moved > > To join the Meeting: > https://bluejeans.com/8705030462 > > > To join via Room System: > Video Conferencing System: redhat.bjn.vc > > -or- 199.48.152.18 > Meeting ID : 8705030462 > > To join via phone : > 1) Dial: > 408-915-6466 (United States) > (see all numbers - https://www.redhat.com/en/conference-numbers > > ) > 2) Enter Conference ID : 8705030462 > > > RSVP at: > https://www.eventbrite.com/e/ovirt-stdci-v2-deep-dive-tickets-45468120372 > > > *When* > Thu May 3, 2018 11:00 - 12:00 Jerusalem > > *Where* > ; https://bluejeans.com/8705030462, raanana-04-asia-8-p-vc (map > > ) > > *Who* > * > bkorren at redhat.com - organizer > * > devel at ovirt.org > *
> sbonazzo at redhat.com > * > dkenigsb at redhat.com > * > rbarry at redhat.com > * > mskrivan at redhat.com > * > eedri at redhat.com > * > tjelinek at redhat.com > * > tnisan at redhat.com > * > mperina at redhat.com > * > sradco at redhat.com > * > msivak at redhat.com > * > sabose at redhat.com > -- SANDRO BONAZZOLA ASSOCIATE MANAGER, SOFTWARE ENGINEERING, EMEA ENG VIRTUALIZATION R&D Red Hat EMEA sbonazzo at redhat.com
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From akrejcir at redhat.com Thu Apr 26 10:29:19 2018 From: akrejcir at redhat.com (Andrej Krejcir) Date: Thu, 26 Apr 2018 12:29:19 +0200 Subject: [ovirt-devel] Changing unit labels from GB, MB, KB to GiB, MiB, KiB Message-ID: Hi, In the ovirt-engine UI, we use the units KB, MB, GB with the meaning of powers of 1024 bytes, instead of powers of 1000. Some time ago, there was a bug[1] that changed some of the GB labels to GiB, but not in all places. This causes inconsistent messages in the UI, where one part of a sentence uses GB and the other GiB. To make it consistent, I have posted patches[2] that change unit labels in other places as well. This change touches strings from different parts of the project, so I'm sending this email to let you know about it. Feel free to check the patches or comment on whether the change is correct. Andrej [1] - https://bugzilla.redhat.com/1535714 [2] - https://gerrit.ovirt.org/#/q/project:ovirt-engine+topic:unit-relabel
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From ederevea at redhat.com Thu Apr 26 12:59:07 2018 From: ederevea at redhat.com (Evgheni Dereveanchin) Date: Thu, 26 Apr 2018 14:59:07 +0200 Subject: [ovirt-devel] gerrit service restart Message-ID: Hi everyone, As you may have noticed, Gerrit is slow to respond today. I will restart the service in a few minutes to address that, so if you get some failures while trying to fetch/push/review patches please re-try in a few minutes.
-- Regards, Evgheni Dereveanchin -------------- next part -------------- An HTML attachment was scrubbed... URL: From ederevea at redhat.com Thu Apr 26 13:35:37 2018 From: ederevea at redhat.com (Evgheni Dereveanchin) Date: Thu, 26 Apr 2018 15:35:37 +0200 Subject: [ovirt-devel] gerrit service restart In-Reply-To: References: Message-ID: Service restart complete but I still see that some operations are slow. I will monitor the situation and schedule a full system restart outside of working hours if needed. On Thu, Apr 26, 2018 at 2:59 PM, Evgheni Dereveanchin wrote: > Hi everyone, > > As you may have noticed, Gerrit is slow to respond today. I will restart > the service in a few minutes to address that so if you get some failures > while trying to fetch/push/review patches please re-try in a few minutes. > > -- > Regards, > Evgheni Dereveanchin > -- Regards, Evgheni Dereveanchin -------------- next part -------------- An HTML attachment was scrubbed... URL: From gshereme at redhat.com Thu Apr 26 16:31:28 2018 From: gshereme at redhat.com (Greg Sheremeta) Date: Thu, 26 Apr 2018 12:31:28 -0400 Subject: [ovirt-devel] Changing unit labels from GB, MB, KB to GiB, MiB, KiB In-Reply-To: References: Message-ID: Related, in the dashboard: Bug 1492397 ? consistent binary prefix units for sizes (MiB, GiB) https://bugzilla.redhat.com/show_bug.cgi?id=1492397 We probably only need one of the bugs. Best wishes, Greg On Thu, Apr 26, 2018 at 6:29 AM, Andrej Krejcir wrote: > Hi, > > In the ovirt-engine UI, we use units KB, MB, GB with the meaning of powers > of 1024 bytes, instead of powers of 1000. Some time ago, there was a > bug[1] that has changed some of the GB labels to GiB, but not in all > places. This causes inconsistent messages in the UI, where one part of a > sentence uses GB and the other GiB. > > To make it consistent, I have posted patches[2] that change unit labels in > other places as well. 
> > This change touches strings from different parts of the project, so I'm > sending this email to let you know about it. Feel free to check the > patches or comment if the change is correct. > > Andrej > > > [1] - https://bugzilla.redhat.com/1535714 > [2] - https://gerrit.ovirt.org/#/q/project:ovirt-engine+topic:unit-relabel > > _______________________________________________ > Devel mailing list > Devel at ovirt.org > http://lists.ovirt.org/mailman/listinfo/devel > -- GREG SHEREMETA SENIOR SOFTWARE ENGINEER - TEAM LEAD - RHV UX Red Hat NA gshereme at redhat.com IRC: gshereme -------------- next part -------------- An HTML attachment was scrubbed... URL: From mpolednik at redhat.com Fri Apr 27 07:23:24 2018 From: mpolednik at redhat.com (Martin Polednik) Date: Fri, 27 Apr 2018 09:23:24 +0200 Subject: [ovirt-devel] dynamic ownership changes In-Reply-To: References: <20180418081628.GA1704@Alexandra.local> <20180418111733.GA4702@Alexandra.local> <20180419120709.GB9449@Alexandra.local> <20180423125653.GA19987@Alexandra.local> Message-ID: <20180427072323.GA61744@Alexandra.local> On 24/04/18 00:37 +0300, Elad Ben Aharon wrote: >I will update with the results of the next tier1 execution on latest 4.2.3 That isn't master but old branch though. Could you run it against *current* VDSM master? >On Mon, Apr 23, 2018 at 3:56 PM, Martin Polednik >wrote: > >> On 23/04/18 01:23 +0300, Elad Ben Aharon wrote: >> >>> Hi, I've triggered another execution [1] due to some issues I saw in the >>> first which are not related to the patch. >>> >>> The success rate is 78% which is low comparing to tier1 executions with >>> code from downstream builds (95-100% success rates) [2]. >>> >> >> Could you run the current master (without the dynamic_ownership patch) >> so that we have viable comparision? >> >> From what I could see so far, there is an issue with move and copy >>> operations to and from Gluster domains. For example [3]. >>> >>> The logs are attached. 
>>> >>> >>> [1] >>> *https://rhv-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/rhv >>> -4.2-ge-runner-tier1-after-upgrade/7/testReport/ >>> >> -4.2-ge-runner-tier1-after-upgrade/7/testReport/>* >>> >>> >>> >>> [2] >>> https://rhv-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/ >>> >>> rhv-4.2-ge-runner-tier1-after-upgrade/7/ >>> >>> >>> >>> [3] >>> 2018-04-22 13:06:28,316+0300 INFO (jsonrpc/7) [vdsm.api] FINISH >>> deleteImage error=Image does not exist in domain: >>> 'image=cabb8846-7a4b-4244-9835-5f603e682f33, >>> domain=e5fd29c8-52ba-467e-be09-ca40ff054dd4' >>> from=: >>> :ffff:10.35.161.182,40936, flow_id=disks_syncAction_ba6b2630-5976-4935, >>> task_id=3d5f2a8a-881c-409e-93e9-aaa643c10e42 (api:51) >>> 2018-04-22 13:06:28,317+0300 ERROR (jsonrpc/7) [storage.TaskManager.Task] >>> (Task='3d5f2a8a-881c-409e-93e9-aaa643c10e42') Unexpected error (task:875) >>> Traceback (most recent call last): >>> File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 882, >>> in >>> _run >>> return fn(*args, **kargs) >>> File "", line 2, in deleteImage >>> File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 49, in >>> method >>> ret = func(*args, **kwargs) >>> File "/usr/lib/python2.7/site-packages/vdsm/storage/hsm.py", line 1503, >>> in >>> deleteImage >>> raise se.ImageDoesNotExistInSD(imgUUID, sdUUID) >>> ImageDoesNotExistInSD: Image does not exist in domain: >>> 'image=cabb8846-7a4b-4244-9835-5f603e682f33, >>> domain=e5fd29c8-52ba-467e-be09-ca40ff054dd4' >>> >>> 2018-04-22 13:06:28,317+0300 INFO (jsonrpc/7) [storage.TaskManager.Task] >>> (Task='3d5f2a8a-881c-409e-93e9-aaa643c10e42') aborting: Task is aborted: >>> "Image does not exist in domain: 'image=cabb8846-7a4b-4244-9835- >>> 5f603e682f33, domain=e5fd29c8-52ba-467e-be09-ca40ff054dd4'" - code 268 >>> (task:1181) >>> 2018-04-22 13:06:28,318+0300 ERROR (jsonrpc/7) [storage.Dispatcher] FINISH >>> deleteImage error=Image does not exist in domain: >>> 'image=cabb8846-7a4b-4244-9835-5f603e682f33, >>> 
domain=e5fd29c8-52ba-467e-be09 >>> -ca40ff054d >>> d4' (dispatcher:82) >>> >>> >>> >>> On Thu, Apr 19, 2018 at 5:34 PM, Elad Ben Aharon >>> wrote: >>> >>> Triggered a sanity tier1 execution [1] using [2], which covers all the >>>> requested areas, on iSCSI, NFS and Gluster. >>>> I'll update with the results. >>>> >>>> [1] >>>> https://rhv-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/view/4.2 >>>> _dev/job/rhv-4.2-ge-flow-storage/1161/ >>>> >>>> [2] >>>> https://gerrit.ovirt.org/#/c/89830/ >>>> vdsm-4.30.0-291.git77aef9a.el7.x86_64 >>>> >>>> >>>> >>>> On Thu, Apr 19, 2018 at 3:07 PM, Martin Polednik >>>> wrote: >>>> >>>> On 19/04/18 14:54 +0300, Elad Ben Aharon wrote: >>>>> >>>>> Hi Martin, >>>>>> >>>>>> I see [1] requires a rebase, can you please take care? >>>>>> >>>>>> >>>>> Should be rebased. >>>>> >>>>> At the moment, our automation is stable only on iSCSI, NFS, Gluster and >>>>> >>>>>> FC. >>>>>> Ceph is not supported and Cinder will be stabilized soon, AFAIR, it's >>>>>> not >>>>>> stable enough at the moment. >>>>>> >>>>>> >>>>> That is still pretty good. >>>>> >>>>> >>>>> [1] https://gerrit.ovirt.org/#/c/89830/ >>>>> >>>>>> >>>>>> >>>>>> Thanks >>>>>> >>>>>> On Wed, Apr 18, 2018 at 2:17 PM, Martin Polednik >>>>> > >>>>>> wrote: >>>>>> >>>>>> On 18/04/18 11:37 +0300, Elad Ben Aharon wrote: >>>>>> >>>>>>> >>>>>>> Hi, sorry if I misunderstood, I waited for more input regarding what >>>>>>> >>>>>>>> areas >>>>>>>> have to be tested here. >>>>>>>> >>>>>>>> >>>>>>>> I'd say that you have quite a bit of freedom in this regard. >>>>>>> GlusterFS >>>>>>> should be covered by Dennis, so iSCSI/NFS/ceph/cinder with some suite >>>>>>> that covers basic operations (start & stop VM, migrate it), snapshots >>>>>>> and merging them, and whatever else would be important for storage >>>>>>> sanity. 
>>>>>>> >>>>>>> mpolednik >>>>>>> >>>>>>> >>>>>>> On Wed, Apr 18, 2018 at 11:16 AM, Martin Polednik < >>>>>>> mpolednik at redhat.com >>>>>>> > >>>>>>> >>>>>>> wrote: >>>>>>>> >>>>>>>> On 11/04/18 16:52 +0300, Elad Ben Aharon wrote: >>>>>>>> >>>>>>>> >>>>>>>>> We can test this on iSCSI, NFS and GlusterFS. As for ceph and >>>>>>>>> cinder, >>>>>>>>> >>>>>>>>> will >>>>>>>>>> have to check, since usually, we don't execute our automation on >>>>>>>>>> them. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Any update on this? I believe the gluster tests were successful, >>>>>>>>>> OST >>>>>>>>>> >>>>>>>>> passes fine and unit tests pass fine, that makes the storage >>>>>>>>> backends >>>>>>>>> test the last required piece. >>>>>>>>> >>>>>>>>> >>>>>>>>> On Wed, Apr 11, 2018 at 4:38 PM, Raz Tamir >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>> >>>>>>>>> +Elad >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Wed, Apr 11, 2018 at 4:28 PM, Dan Kenigsberg >>>>>>>>>> > >>>>>>>>>>> wrote: >>>>>>>>>>> >>>>>>>>>>> On Wed, Apr 11, 2018 at 12:34 PM, Nir Soffer >>>>>>>>>>> wrote: >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Wed, Apr 11, 2018 at 12:31 PM Eyal Edri >>>>>>>>>>>> wrote: >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Please make sure to run as much OST suites on this patch as >>>>>>>>>>>>> possible >>>>>>>>>>>>> >>>>>>>>>>>>> before merging ( using 'ci please build' ) >>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> But note that OST is not a way to verify the patch. >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>> Such changes require testing with all storage types we support. 
>>>>>>>>>>>>> >>>>>>>>>>>>> Nir >>>>>>>>>>>>> >>>>>>>>>>>>> On Tue, Apr 10, 2018 at 4:09 PM, Martin Polednik < >>>>>>>>>>>>> mpolednik at redhat.com >>>>>>>>>>>>> > >>>>>>>>>>>>> >>>>>>>>>>>>> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>>> Hey, >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> I've created a patch[0] that is finally able to activate >>>>>>>>>>>>>>> libvirt's >>>>>>>>>>>>>>> dynamic_ownership for VDSM while not negatively affecting >>>>>>>>>>>>>>> functionality of our storage code. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> That of course comes with quite a bit of code removal, mostly >>>>>>>>>>>>>>> in >>>>>>>>>>>>>>> the >>>>>>>>>>>>>>> area of host devices, hwrng and anything that touches devices; >>>>>>>>>>>>>>> bunch >>>>>>>>>>>>>>> of test changes and one XML generation caveat (storage is >>>>>>>>>>>>>>> handled >>>>>>>>>>>>>>> by >>>>>>>>>>>>>>> VDSM, therefore disk relabelling needs to be disabled on the >>>>>>>>>>>>>>> VDSM >>>>>>>>>>>>>>> level). >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Because of the scope of the patch, I welcome >>>>>>>>>>>>>>> storage/virt/network >>>>>>>>>>>>>>> people to review the code and consider the implication this >>>>>>>>>>>>>>> change >>>>>>>>>>>>>>> has >>>>>>>>>>>>>>> on current/future features. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> [0] https://gerrit.ovirt.org/#/c/89830/ >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> In particular: dynamic_ownership was set to 0 prehistorically >>>>>>>>>>>>>>> (as >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>> part >>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> of https://bugzilla.redhat.com/show_bug.cgi?id=554961 ) because >>>>>>>>>>>> libvirt, >>>>>>>>>>>> running as root, was not able to play properly with root-squash >>>>>>>>>>>> nfs >>>>>>>>>>>> mounts. >>>>>>>>>>>> >>>>>>>>>>>> Have you attempted this use case? >>>>>>>>>>>> >>>>>>>>>>>> I join to Nir's request to run this with storage QE. 
>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> -- >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Raz Tamir >>>>>>>>>>> Manager, RHV QE >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>> >> >> From sbonazzo at redhat.com Fri Apr 27 11:43:26 2018 From: sbonazzo at redhat.com (Sandro Bonazzola) Date: Fri, 27 Apr 2018 13:43:26 +0200 Subject: [ovirt-devel] Fwd: [CentOS-devel] CentOS 7.4.1708 CR Available in CBS In-Reply-To: <20180426211604.otlgd3e4tmqk4tp7@ender.home.bstinson.com> References: <20180426211604.otlgd3e4tmqk4tp7@ender.home.bstinson.com> Message-ID: FYI, CentOS 7.5 beta is available in CR repos. I'm going to build qemu-kvm-ev for it in CBS but it will be published to release repo only when CentOS 7.5 will go GA. Please note that test repos will be updated with this build so in order to use test repo you'll need CR repo enabled. ---------- Forwarded message ---------- From: Brian Stinson Date: 2018-04-26 23:16 GMT+02:00 Subject: [CentOS-devel] CentOS 7.4.1708 CR Available in CBS To: centos-devel at centos.org Hi Folks, The CR (https://wiki.centos.org/AdditionalResources/Repositories/CR) repository was released earlier today. We have regenerated all the CentOS 7 buildroots in CBS so you can build against the content that will make it into the next point-release of CentOS Linux. Machines in ci.centos.org are available with CR as well so you may continue your testing. As we discussed earlier (https://lists.centos.org/pipermail/centos-devel/2018-April/016627.html) we are currently in Point-release Freeze which means anything tagged for release will *NOT* get pushed out to mirror.centos.org and we will be holding new push requests until we have a GA point-release. If there are any questions please find us here or in #centos-devel on Freenode. Cheers! 
-- Brian Stinson _______________________________________________ CentOS-devel mailing list CentOS-devel at centos.org https://lists.centos.org/mailman/listinfo/centos-devel -- SANDRO BONAZZOLA ASSOCIATE MANAGER, SOFTWARE ENGINEERING, EMEA ENG VIRTUALIZATION R&D Red Hat EMEA sbonazzo at redhat.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From dron at redhat.com Fri Apr 27 15:01:43 2018 From: dron at redhat.com (Dafna Ron) Date: Fri, 27 Apr 2018 16:01:43 +0100 Subject: [ovirt-devel] [ OST Failure Report ] [ oVirt Master (ovirt-engine-sdk) ] [ 27-04-2018 ] [ 004_basic_sanity.verify_vm_import ] Message-ID: Hi, CQ reported a failure for test 004_basic_sanity.verify_vm_import on basic suite. It seems to me that its related to the reported change. can you please have a look? *Link and headline of suspected patches: https://gerrit.ovirt.org/#/c/89852/ - examples: upload ova as a virtual machine template Link to Job:* * http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/7166/ Link to all logs:http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/7166/artifact/exported-artifacts/basic-suit-master-el7/test_logs/basic-suite-master/post-004_basic_sanity.py/ (Relevant) error snippet from the log: vdsm: * *2018-04-27 10:26:03,131-0400 INFO (jsonrpc/6) [jsonrpc.JsonRpcServer] RPC call Host.dumpxmls succeeded in 0.00 seconds (__init__:311)2018-04-27 10:26:04,825-0400 DEBUG (tasks/8) [root] FAILED: = "virt-sparsify: error: libguestfs error: guestfs_launch failed.\nThis usually means the libguestfs appliance failed to start or crashed.\nDo:\n export LIBGUESTFS_DEBUG=1 LIBGUESTFS_TRACE=1\nand run the command again. 
For further information, read:\n http://libguestfs.org/guestfs-faq.1.html#debugging-libguestfs\nYou can also run 'libguestfs-test-tool' and post the *complete* output\ninto a bug report or message to the libguestfs mailing list.\n\nIf reporting bugs, run virt-sparsify with debugging enabled and include the \ncomplete output:\n\n virt-sparsify -v -x [...]\n"; = 1 (commands:87)2018-04-27 10:26:04,829-0400 INFO (tasks/8) [storage.SANLock] Releasing Lease(name='ee91b001-53e4-42b0-9dc1-3a24e5e2b273', path=u'/rhev/data-center/mnt/192.168.200.4:_exports_nfs_share1/c4befad8-ac1e-4cf7-929e-8c26e3a12935/images/46fd673b-4e94-4bc5-aab3-f7af16f11e19/ee91b001-53e4-42b0-9dc1-3a24e5e2b273.lease', offset=0) (clusterlock:435)2018-04-27 10:26:04,835-0400 INFO (tasks/8) [storage.SANLock] Successfully released Lease(name='ee91b001-53e4-42b0-9dc1-3a24e5e2b273', path=u'/rhev/data-center/mnt/192.168.200.4:_exports_nfs_share1/c4befad8-ac1e-4cf7-929e-8c26e3a12935/images/46fd673b-4e94-4bc5-aab3-f7af16f11e19/ee91b001-53e4-42b0-9dc1-3a24e5e2b273.lease', offset=0) (clusterlock:444)2018-04-27 10:26:04,835-0400 ERROR (tasks/8) [root] Job u'f7cc21f1-26fa-430a-93d0-671b402613d1' failed (jobs:221)Traceback (most recent call last): File "/usr/lib/python2.7/site-packages/vdsm/jobs.py", line 157, in run self._run() File "/usr/lib/python2.7/site-packages/vdsm/storage/sdm/api/sparsify_volume.py", line 56, in _run virtsparsify.sparsify_inplace(self._vol_info.path) File "/usr/lib/python2.7/site-packages/vdsm/virtsparsify.py", line 71, in sparsify_inplace raise cmdutils.Error(cmd, rc, out, err)Error: Command ['/usr/bin/virt-sparsify', '--machine-readable', '--in-place', u'/rhev/data-center/mnt/192.168.200engine: 2018-04-27 10:16:40,016-04 ERROR [org.ovirt.engine.core.bll.storage.domain.AttachStorageDomainToPoolCommand] (default task-9) [] An error occurred while fetching unregistered disks from Storage Domain id 'f25de789-f7ce-4b7a-9e92-156bae627b9c'{"jsonrpc": "2.0", "id": 
"38e21a74-88e9-4f1b-a4bf-8ba61f6a0438", "error": {"message": "Image does not exist in domain: u'image=79a81745-1140-4ba8-aa6b-16ab7820df5e, domain=9729b818-49e6-4ac6-b7a6-5d634bb06a83'", "code": 268}}^@2018-04-27 10:28:00,286-04 DEBUG [org.ovirt.vdsm.jsonrpc.client.internal.ResponseWorker] (ResponseWorker) [] Message received: {"jsonrpc": "2.0", "id": "38e21a74-88e9-4f1b-a4bf-8ba61f6a0438", "error": {"message": "Image does not exist in domain: u'image=79a81745-1140-4ba8-aa6b-16ab7820df5e, domain=9729b818-49e6-4ac6-b7a6-5d634bb06a83'", "code": 268}}2018-04-27 10:28:00,296-04 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-12) [1d9c39cd] EVENT_ID: IRS_BROKER_COMMAND_FAILURE(10,803), VDSM command DeleteImageGroupVDS failed: Image does not exist in domain: u'image=79a81745-1140-4ba8-aa6b-16ab7820df5e, domain=9729b818-49e6-4ac6-b7a6-5d634bb06a83'2018-04-27 10:28:00,296-04 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.DeleteImageGroupVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-12) [1d9c39cd] Command 'DeleteImageGroupVDSCommand( DeleteImageGroupVDSCommandParameters:{storagePoolId='42724c3b-3d09-4c1d-a716-56758c9ba2e3', ignoreFailoverLimit='false', storageDomainId='9729b818-49e6-4ac6-b7a6-5d634bb06a83', imageGroupId='79a81745-1140-4ba8-aa6b-16ab7820df5e', postZeros='false', discard='true', forceDelete='false'})' execution failed: IRSGenericException: IRSErrorException: Image does not exist in domain: u'image=79a81745-1140-4ba8-aa6b-16ab7820df5e, domain=9729b818-49e6-4ac6-b7a6-5d634bb06a83'2018-04-27 10:28:00,296-04 DEBUG [org.ovirt.engine.core.vdsbroker.irsbroker.DeleteImageGroupVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-12) [1d9c39cd] Exception: org.ovirt.engine.core.vdsbroker.irsbroker.IrsOperationFailedNoFailoverException: IRSGenericException: IRSErrorException: Image does not exist in domain: u'image=79a81745-1140-4ba8-aa6b-16ab7820df5e, 
domain=9729b818-49e6-4ac6-b7a6-5d634bb06a83' at org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase.proceedProxyReturnValue(BrokerCommandBase.java:180) [vdsbroker.jar:] at org.ovirt.engine.core.vdsbroker.irsbroker.DeleteImageGroupVDSCommand.executeIrsBrokerCommand(DeleteImageGroupVDSCommand.java:30) [vdsbroker.jar:] at org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand.lambda$executeVDSCommand$0(IrsBrokerCommand.java:98) [vdsbroker.jar:] at org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxy.runInControlledConcurrency(IrsProxy.java:274) [vdsbroker.jar:] at org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand.executeVDSCommand(IrsBrokerCommand.java:95) [vdsbroker.jar:] at org.ovirt.engine.core.vdsbroker.VDSCommandBase.executeCommand(VDSCommandBase.java:65) [vdsbroker.jar:] at org.ovirt.engine.core.dal.VdcCommandBase.execute(VdcCommandBase.java:31) [dal.jar:] at org.ovirt.engine.core.vdsbroker.vdsbroker.DefaultVdsCommandExecutor.execute(DefaultVdsCommandExecutor.java:14) [vdsbroker.jar:] at org.ovirt.engine.core.vdsbroker.ResourceManager.runVdsCommand(ResourceManager.java:398) [vdsbroker.jar:] at org.ovirt.engine.core.vdsbroker.ResourceManager$Proxy$_$$_WeldSubclass.runVdsCommand$$super(Unknown Source) [vdsbroker.jar:] at sun.reflect.GeneratedMethodAccessor271.invoke(Unknown Source) [:1.8.0_161] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) [rt.jar:1.8.0_161] at java.lang.reflect.Method.invoke(Method.java:498) [rt.jar:1.8.0_161] at org.jboss.weld.interceptor.proxy.TerminalAroundInvokeInvocationContext.proceedInternal(TerminalAroundInvokeInvocationContext.java:49) [weld-core-impl-2.4.3.Final.jar:2.4.3.Final] at org.jboss.weld.interceptor.proxy.AroundInvokeInvocationContext.proceed(AroundInvokeInvocationContext.java:77) [weld-core-impl-2.4.3.Final.jar:2.4.3.Final] at org.ovirt.engine.core.common.di.interceptor.LoggingInterceptor.apply(LoggingInterceptor.java:12) [common.jar:] at 
sun.reflect.GeneratedMethodAccessor70.invoke(Unknown Source) [:1.8.0_161] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) [rt.jar:1.8.0_161] at java.lang.reflect.Method.invoke(Method.java:498) [rt.jar:1.8.0_161] at org.jboss.weld.interceptor.reader.SimpleInterceptorInvocation$SimpleMethodInvocation.invoke(SimpleInterceptorInvocation.java:73) [weld-core-impl-2.4.3.Final.jar:2.4.3.Final] at org.jboss.weld.interceptor.proxy.InterceptorMethodHandler.executeAroundInvoke(InterceptorMethodHandler.java:84) [weld-core-impl-2.4.3.Final.jar:2.4.3.Final] at org.jboss.weld.interceptor.proxy.InterceptorMethodHandler.executeInterception(InterceptorMethodHandler.java:72) [weld-core-impl-2.4.3.Final.jar:2.4.3.Final] at org.jboss.weld.interceptor.proxy.InterceptorMethodHandler.invoke(InterceptorMethodHandler.java:56) [weld-core-impl-2.4.3.Final.jar:2.4.3.Final] at org.jboss.weld.bean.proxy.CombinedInterceptorAndDecoratorStackMethodHandler.invoke(CombinedInterceptorAndDecoratorStackMethodHandler.java:79) [weld-core-impl-2.4.3.Final.jar:2.4.3.Final] at org.jboss.weld.bean.proxy.CombinedInterceptorAndDecoratorStackMethodHandler.invoke(CombinedInterceptorAndDecoratorStackMethodHandler.java:68) [weld-core-impl-2.4.3.Final.jar:2.4.3.Final] at org.ovirt.engine.core.vdsbroker.ResourceManager$Proxy$_$$_WeldSubclass.runVdsCommand(Unknown Source) [vdsbroker.jar:] at org.ovirt.engine.core.bll.VDSBrokerFrontendImpl.runVdsCommand(VDSBrokerFrontendImpl.java:33) [bll.jar:] at org.ovirt.engine.core.bll.CommandBase.runVdsCommand(CommandBase.java:2046) [bll.jar:] at org.ovirt.engine.core.bll.storage.disk.image.RemoveImageCommand.performDeleteImageVdsmOperation(RemoveImageCommand.java:310) [bll.jar:] at org.ovirt.engine.core.bll.storage.disk.image.RemoveImageCommand.executeCommand(RemoveImageCommand.java:112) [bll.jar:] at org.ovirt.engine.core.bll.CommandBase.executeWithoutTransaction(CommandBase.java:1133) [bll.jar:] at 
org.ovirt.engine.core.bll.CommandBase.executeActionInTransactionScope(CommandBase.java:1286) [bll.jar:] at org.ovirt.engine.core.bll.CommandBase.runInTransaction(CommandBase.java:1935) [bll.jar:] at org.ovirt.engine.core.utils.transaction.TransactionSupport.executeInSuppressed(TransactionSupport.java:164) [utils.jar:] at org.ovirt.engine.core.utils.transaction.TransactionSupport.executeInScope(TransactionSupport.java:103) [utils.jar:] at org.ovirt.engine.core.bll.CommandBase.execute(CommandBase.java:1346) [bll.jar:] at org.ovirt.engine.core.bll.CommandBase.executeAction(CommandBase.java:400) [bll.jar:] at org.ovirt.engine.core.bll.executor.DefaultBackendActionExecutor.execute(DefaultBackendActionExecutor.java:13) [bll.jar:] at org.ovirt.engine.core.bll.Backend.runAction(Backend.java:450) [bll.jar:] at org.ovirt.engine.core.bll.Backend.runActionImpl(Backend.java:432) [bll.jar:] at org.ovirt.engine.core.bll.Backend.runInternalAction(Backend.java:638) [bll.jar:] at sun.reflect.GeneratedMethodAccessor551.invoke(Unknown Source) [:1.8.0_161] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) [rt.jar:1.8.0_161] at java.lang.reflect.Method.invoke(Method.java:498) [rt.jar:1.8.0_161] at org.jboss.as.ee.component.ManagedReferenceMethodInterceptor.processInvocation(ManagedReferenceMethodInterceptor.java:52) at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:422) at org.jboss.invocation.InterceptorContext$Invocation.proceed(InterceptorContext.java:509):2018-04-27 10:28:00,358-04 INFO [org.ovirt.engine.core.bll.exportimport.ImportVmCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-12) [] Lock freed to object 'EngineLock:{exclusiveLocks='[2cf61dee-c89c-469a-8ab8-54aebd0ab875=VM, imported_vm=VM_NAME]', sharedLocks='[5e0b3417-5e0d-4f98-a3cb-1a78445de946=REMOTE_VM]'}'2018-04-27 10:28:00,372-04 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] 
(EE-ManagedThreadFactory-engineScheduled-Thread-12) [] EVENT_ID: IMPORTEXPORT_IMPORT_VM_FAILED(1,153), Failed to import Vm imported_vm to Data Center test-dc, Cluster test-cluster*
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From dron at redhat.com Fri Apr 27 16:34:53 2018 From: dron at redhat.com (Dafna Ron) Date: Fri, 27 Apr 2018 17:34:53 +0100 Subject: [ovirt-devel] OST Failure - Weekly update [/04/2018-20/04/2018] Message-ID: Hi, I wanted to give a short status on this week's failures and the current OST status. I am glad to report that the issue with CQ alerts was resolved thanks to Barak and Evgheni. You can read more about the issue and how it was resolved here: https://ovirt-jira.atlassian.net/browse/OVIRT-1974 Currently we have one ongoing possible regression, which was reported to the list and to Arik. The change reported: https://gerrit.ovirt.org/#/c/89852/ - examples: upload ova as a virtual machine template. You can view the details in this Jira: https://ovirt-jira.atlassian.net/browse/OFT-648 The majority of issues we had this week were failed-build artifacts for fc27. There were two different cases: one was reported to Francesco, who was already working on a fix for the issue, and the second started and was resolved during the evening/night of Apr 26-27. You can see the Jiras for these two issues here: https://ovirt-jira.atlassian.net/browse/OFT-605 https://ovirt-jira.atlassian.net/browse/OFT-612 There was an infra issue with mirrors not being available for a few minutes. The issue was momentary and was resolved on its own.
https://ovirt-jira.atlassian.net/browse/OFT-606 *Below you can see the chart for this week's resolved issues but cause of failure:**Code* = regression of working components/functionalities *Infra* = infrastructure/OST Infrastructure/Lago related issues/Power outages *OST Tests* = package related issues, failed build artifacts *Below is a chart showing failures by suite type: * *Below is a chart showing failures by version type: * *Below you can see the number of reported failures by resolution status:* Thanks, Dafna -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 6773 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 25509 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 6360 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 30895 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 5230 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 26666 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 30036 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: image.png Type: image/png Size: 5559 bytes Desc: not available URL: From ykaul at redhat.com Fri Apr 27 16:57:52 2018 From: ykaul at redhat.com (Yaniv Kaul) Date: Fri, 27 Apr 2018 19:57:52 +0300 Subject: [ovirt-devel] OST Failure - Weekly update [/04/2018-20/04/2018] In-Reply-To: References: Message-ID: On Fri, Apr 27, 2018 at 7:34 PM, Dafna Ron wrote: > Hi, > > I wanted to give a short status on this week's failures and OST current > status. > > I am glad to report that the issue with CQ alerts was resolved thanks to > Barak and Evgheni. > You can read more about the issue and how it was resolved here: > https://ovirt-jira.atlassian.net/browse/OVIRT-1974 > How was the VM2 high-performance issue, with vNUMA and pinning to more than one host (a race), solved? > > > Currently we have one on-going possible regression which was reported to > the list and to Arik. > the change reported: https://gerrit.ovirt.org/#/c/89852/ - examples: > upload ova as a virtual machine template. > you can view the details in this Jira: https://ovirt-jira.atlassian. > net/browse/OFT-648 > I don't understand how that *example* script could have caused this regression - in a completely different scenario (virt-sparsify fails because of a libguestfs issue). Y. > > > The majority of issues we had this week were failed-build artifacts for > fc27. There were two different cases, one was reported to Francesco who was > already working on a fix to the issue and the second started and resolved > during the evening/night of Apr 26-27. > You can see the Jira to these two issues here: > > https://ovirt-jira.atlassian.net/browse/OFT-605 > https://ovirt-jira.atlassian.net/browse/OFT-612 > > There was an infra issue with Mirrors not being available for a few > minutes. the issue was momentarily and was resolved on its own. 
> > https://ovirt-jira.atlassian.net/browse/OFT-606 > > > > *Below you can see the chart for this week's resolved issues but cause of > failure:**Code* = regression of working components/functionalities > *Infra* = infrastructure/OST Infrastructure/Lago related issues/Power > outages > *OST Tests* = package related issues, failed build artifacts > > > > > > > > > > > > > *Below is a chart showing failures by suite type: * > > > > > > *Below is a chart showing failures by version type: * > > > > > *Below you can see the number of reported failures by resolution status:* > Thanks, > Dafna > > > _______________________________________________ > Devel mailing list > Devel at ovirt.org > http://lists.ovirt.org/mailman/listinfo/devel > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 30895 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 5230 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 30036 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 6773 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 25509 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 5559 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 26666 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: image.png Type: image/png Size: 6360 bytes Desc: not available URL: From duck at redhat.com Sat Apr 28 00:34:55 2018 From: duck at redhat.com (=?UTF-8?B?TWFyYyBEZXF1w6huZXMgKER1Y2sp?=) Date: Sat, 28 Apr 2018 09:34:55 +0900 Subject: [ovirt-devel] Mailing-Lists upgrade In-Reply-To: References: Message-ID: <9d6d9e1f-798a-1a40-4ca5-f3d1463ac752@redhat.com> Quack, A few months ago we had to roll back the migration because of a nasty bug. This is fixed in recent versions of Mailman 3, so we're rescheduling it for Tuesday the 8th during the slot 11:00-12:00 JST. Please read the original announcement with full details below. \_o< On 09/07/2017 01:12 PM, Marc Dequènes (Duck) wrote: > Quack, > > On behalf of the oVirt infra team, I'd like to announce the current > Mailing-Lists system is going to be upgraded to a brand new Mailman 3 > installation on Monday during the slot 11:00-12:00 JST. > > It should not take a full hour to migrate as we already made incremental > synchronization with the current system but better keep some margin. The > system will then take over delivery of the mails but might be a bit slow > at first as it needs to reindex all the archived mails (which might take > a few hours). > > To manage your subscriptions and delivery settings you can do this > easily on the much nicer web interface (https://lists.ovirt.org). There > is a notion of account so you don't need to login separately for each ML. > > You can Sign In using Fedora, GitHub or Google or create a local account > if you prefer. Please keep in mind signing in with a different method > would create separate accounts (which cannot be merged at the moment). > But you can easily link your account to other authentication methods in > your settings (click on you name in the up-right corner -> Account -> > Account Connections). > > As for the original mail archives, because the previous system did not > have stable URLs, we cannot create mappings to the new pages. 
We decided > to keep the old archives around on the same URL (/pipermail), so the > Internet links would still work fine. > > Hope you'd be happy with the new system. > \_o< > > > > _______________________________________________ > Infra mailing list > Infra at ovirt.org > http://lists.ovirt.org/mailman/listinfo/infra > -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: OpenPGP digital signature URL: From eedri at redhat.com Mon Apr 30 12:15:03 2018 From: eedri at redhat.com (Eyal Edri) Date: Mon, 30 Apr 2018 15:15:03 +0300 Subject: [ovirt-devel] Update: HC suites failing for 3 weeks ( was: [OST][HC] HE fails to deploy ) In-Reply-To: References: Message-ID: On Wed, Apr 25, 2018 at 1:53 PM, Sahina Bose wrote: > > > On Wed, Apr 25, 2018 at 3:54 PM, Sahina Bose wrote: > >> >> >> On Mon, Apr 23, 2018 at 6:28 PM, Sahina Bose wrote: >> >>> >>> On Mon, Apr 23, 2018 at 5:41 PM, Eyal Edri wrote: >>> >>>> Sahina, >>>> Any update on this? >>>> >>> >>> Sorry, haven't been able to spend any time on this. The last I checked >>> the HE install was failing at task - Get Local VM IP. >>> and there were no logs from HE VM to debug. >>> >>> Will spend sometime on this tomorrow >>> >> >> https://gerrit.ovirt.org/#/c/89953/ - fixes the issue, atleast when I >> tried this on my local setup. >> > > > The CI however still fails in the HE install with : > > TASK [Get local VM IP]", "[ ERROR ] fatal: [localhost]: FAILED! => {\"attempts\": 50, \"changed\": true, \"cmd\": \"virsh -r net-dhcp-leases default | grep -i 00:16:3e:24:d3:63 | awk '{ print $5 }' | cut -f1 -d'/'\", \"delta\": \"0:00:00.043961\", \"end\": \"2018-04-25 05:51:34.226374\", \"rc\": 0, \"start\": \"2018-04-25 05:51:34.182413\", \"stderr\": \"\", \"stderr_lines\": [], \"stdout\": \"\", \"stdout_lines\": []}" > > > > FWIW, my local setup , ost repo was at I3fc2976ab2400e5908760aadc3258 > 329c0ffdf4d > Any update? 
suites are still failing. > > >> >> >>> >>>> On Wed, Apr 18, 2018 at 3:40 PM, Sandro Bonazzola >>>> wrote: >>>> >>>>> >>>>> >>>>> 2018-04-18 9:37 GMT+02:00 Eyal Edri : >>>>> >>>>>> FYI, >>>>>> >>>>>> I've disabled the 4.2 and master HC suites nightly run on CI as they >>>>>> are constantly failing for almost 3 weeks and spamming the mailing lists. >>>>>> >>>>> >>>>> >>>>> HC uses gdeploy 2.0.6 which was released in December and was based on >>>>> ansible 2.4. >>>>> ansible-2.5 landed 3 weeks ago in EPEL, my guess is that gdeploy is >>>>> not supporting ansible-2.5 properly. >>>>> I had no time to validate my guess with proof, so please Sahina cross >>>>> check this. >>>>> >>>>> >>>>> >>>>>> >>>>>> I think this should get higher priority for a fix if we want it to >>>>>> provide any value, >>>>>> Work can continue using the manual jobs or via check-patch. >>>>>> >>>>>> >>>>>> On Mon, Apr 16, 2018 at 10:56 AM, Gal Ben Haim >>>>>> wrote: >>>>>> >>>>>>> Any update on https://gerrit.ovirt.org/#/c/88887/ ? >>>>>>> The HC suites still failing and it's hard to understand why without >>>>>>> the logs from the engine VM. 
>>>>>>> >>>>>>> On Sat, Apr 7, 2018 at 7:19 AM, Sahina Bose >>>>>>> wrote: >>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On Fri, Apr 6, 2018 at 1:10 PM, Simone Tiraboschi < >>>>>>>> stirabos at redhat.com> wrote: >>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> On Fri, Apr 6, 2018 at 9:28 AM, Sahina Bose >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>>> 2018-04-05 20:46:52,773-0400 INFO otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:100 TASK [Get local VM IP] >>>>>>>>>> 2018-04-05 20:55:28,217-0400 DEBUG otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:94 {u'_ansible_parsed': True, u'stderr_lines': [], u'cmd': u"virsh -r net-dhcp-leases default | grep -i 00:16:3e:24:d3:63 | awk '{ print $5 }' | cut -f1 -d'/'", u'end': u'2018-04-05 20:55:28.046320', u'_ansible_no_log': False, u'stdout': u'', u'changed': True, u'invocation': {u'module_args': {u'warn': True, u'executable': None, u'_uses_shell': True, u'_raw_params': u"virsh -r net-dhcp-leases default | grep -i 00:16:3e:24:d3:63 | awk '{ print $5 }' | cut -f1 -d'/'", u'removes': None, u'creates': None, u'chdir': None, u'stdin': None}}, u'start': u'2018-04-05 20:55:28.000470', u'attempts': 50, u'stderr': u'', u'rc': 0, u'delta': u'0:00:00.045850', u'stdout_lines': []} >>>>>>>>>> 2018-04-05 20:55:28,318-0400 ERROR otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:98 fatal: [localhost]: FAILED! => {"attempts": 50, "changed": true, "cmd": "virsh -r net-dhcp-leases default | grep -i 00:16:3e:24:d3:63 | awk '{ print $5 }' | cut -f1 -d'/'", "delta": "0:00:00.045850", "end": "2018-04-05 20:55:28.046320", "rc": 0, "start": "2018-04-05 20:55:28.000470", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []} >>>>>>>>>> >>>>>>>>>> Both the 4.2 and master suites are failing on getting local VM IP. >>>>>>>>>> Any idea what changed or if I have to change the test? >>>>>>>>>> >>>>>>>>>> thanks! 
>>>>>>>>>> >>>>>>>>> >>>>>>>>> Hi Sahina, >>>>>>>>> 4.2 and master suite non HC are correctly running this morning. >>>>>>>>> http://jenkins.ovirt.org/view/oVirt%20system%20tests/job/ovi >>>>>>>>> rt-system-tests_he-basic-ansible-suite-master/146/ >>>>>>>>> http://jenkins.ovirt.org/view/oVirt%20system%20tests/job/ovi >>>>>>>>> rt-system-tests_he-basic-ansible-suite-4.2/76/ >>>>>>>>> >>>>>>>>> I'll try to check the difference with HC suites. >>>>>>>>> >>>>>>>>> Are you using more than one subnet in the HC suites? >>>>>>>>> >>>>>>>> >>>>>>>> No, I'm not. And we havent's changed anything related to network in >>>>>>>> the test suite. >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>> >>>>>>> >>>>>>> -- >>>>>>> *GAL bEN HAIM* >>>>>>> RHV DEVOPS >>>>>>> >>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> >>>>>> Eyal edri >>>>>> >>>>>> >>>>>> MANAGER >>>>>> >>>>>> RHV DevOps >>>>>> >>>>>> EMEA VIRTUALIZATION R&D >>>>>> >>>>>> >>>>>> Red Hat EMEA >>>>>> TRIED. TESTED. TRUSTED. >>>>>> >>>>>> phone: +972-9-7692018 >>>>>> irc: eedri (on #tlv #rhev-dev #rhev-integ) >>>>>> >>>>>> _______________________________________________ >>>>>> Devel mailing list >>>>>> Devel at ovirt.org >>>>>> http://lists.ovirt.org/mailman/listinfo/devel >>>>>> >>>>> >>>>> >>>>> >>>>> -- >>>>> >>>>> SANDRO BONAZZOLA >>>>> >>>>> ASSOCIATE MANAGER, SOFTWARE ENGINEERING, EMEA ENG VIRTUALIZATION R&D >>>>> >>>>> Red Hat EMEA >>>>> >>>>> sbonazzo at redhat.com >>>>> >>>>> >>>>> >>>> >>>> >>>> >>>> -- >>>> >>>> Eyal edri >>>> >>>> >>>> MANAGER >>>> >>>> RHV DevOps >>>> >>>> EMEA VIRTUALIZATION R&D >>>> >>>> >>>> Red Hat EMEA >>>> TRIED. TESTED. TRUSTED. >>>> >>>> phone: +972-9-7692018 >>>> irc: eedri (on #tlv #rhev-dev #rhev-integ) >>>> >>> >>> >> > -- Eyal edri MANAGER RHV DevOps EMEA VIRTUALIZATION R&D Red Hat EMEA TRIED. TESTED. TRUSTED. phone: +972-9-7692018 irc: eedri (on #tlv #rhev-dev #rhev-integ) -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From sabose at redhat.com Mon Apr 30 12:59:09 2018 From: sabose at redhat.com (Sahina Bose) Date: Mon, 30 Apr 2018 18:29:09 +0530 Subject: [ovirt-devel] Update: HC suites failing for 3 weeks ( was: [OST][HC] HE fails to deploy ) In-Reply-To: References: Message-ID: On Mon, Apr 30, 2018 at 5:45 PM, Eyal Edri wrote: > > > On Wed, Apr 25, 2018 at 1:53 PM, Sahina Bose wrote: > >> >> >> On Wed, Apr 25, 2018 at 3:54 PM, Sahina Bose wrote: >> >>> >>> >>> On Mon, Apr 23, 2018 at 6:28 PM, Sahina Bose wrote: >>> >>>> >>>> On Mon, Apr 23, 2018 at 5:41 PM, Eyal Edri wrote: >>>> >>>>> Sahina, >>>>> Any update on this? >>>>> >>>> >>>> Sorry, haven't been able to spend any time on this. The last I checked >>>> the HE install was failing at task - Get Local VM IP. >>>> and there were no logs from HE VM to debug. >>>> >>>> Will spend sometime on this tomorrow >>>> >>> >>> https://gerrit.ovirt.org/#/c/89953/ - fixes the issue, atleast when I >>> tried this on my local setup. >>> >> >> >> The CI however still fails in the HE install with : >> >> TASK [Get local VM IP]", "[ ERROR ] fatal: [localhost]: FAILED! => {\"attempts\": 50, \"changed\": true, \"cmd\": \"virsh -r net-dhcp-leases default | grep -i 00:16:3e:24:d3:63 | awk '{ print $5 }' | cut -f1 -d'/'\", \"delta\": \"0:00:00.043961\", \"end\": \"2018-04-25 05:51:34.226374\", \"rc\": 0, \"start\": \"2018-04-25 05:51:34.182413\", \"stderr\": \"\", \"stderr_lines\": [], \"stdout\": \"\", \"stdout_lines\": []}" >> >> >> >> FWIW, my local setup , ost repo was at I3fc2976ab2400e5908760aadc3258 >> 329c0ffdf4d >> > > > Any update? suites are still failing. > The suites work locally, on the CI systems they fail with above error. Unfortunately, no clue as to why this is so. With the current run, are we able to get logs from the engine VM? 
> >> >> >>> >>> >>>> >>>>> On Wed, Apr 18, 2018 at 3:40 PM, Sandro Bonazzola >>>> > wrote: >>>>> >>>>>> >>>>>> >>>>>> 2018-04-18 9:37 GMT+02:00 Eyal Edri : >>>>>> >>>>>>> FYI, >>>>>>> >>>>>>> I've disabled the 4.2 and master HC suites nightly run on CI as they >>>>>>> are constantly failing for almost 3 weeks and spamming the mailing lists. >>>>>>> >>>>>> >>>>>> >>>>>> HC uses gdeploy 2.0.6 which was released in December and was based on >>>>>> ansible 2.4. >>>>>> ansible-2.5 landed 3 weeks ago in EPEL, my guess is that gdeploy is >>>>>> not supporting ansible-2.5 properly. >>>>>> I had no time to validate my guess with proof, so please Sahina cross >>>>>> check this. >>>>>> >>>>>> >>>>>> >>>>>>> >>>>>>> I think this should get higher priority for a fix if we want it to >>>>>>> provide any value, >>>>>>> Work can continue using the manual jobs or via check-patch. >>>>>>> >>>>>>> >>>>>>> On Mon, Apr 16, 2018 at 10:56 AM, Gal Ben Haim >>>>>>> wrote: >>>>>>> >>>>>>>> Any update on https://gerrit.ovirt.org/#/c/88887/ ? >>>>>>>> The HC suites still failing and it's hard to understand why without >>>>>>>> the logs from the engine VM. 
>>>>>>>> >>>>>>>> On Sat, Apr 7, 2018 at 7:19 AM, Sahina Bose >>>>>>>> wrote: >>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> On Fri, Apr 6, 2018 at 1:10 PM, Simone Tiraboschi < >>>>>>>>> stirabos at redhat.com> wrote: >>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Fri, Apr 6, 2018 at 9:28 AM, Sahina Bose >>>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>>> 2018-04-05 20:46:52,773-0400 INFO otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:100 TASK [Get local VM IP] >>>>>>>>>>> 2018-04-05 20:55:28,217-0400 DEBUG otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:94 {u'_ansible_parsed': True, u'stderr_lines': [], u'cmd': u"virsh -r net-dhcp-leases default | grep -i 00:16:3e:24:d3:63 | awk '{ print $5 }' | cut -f1 -d'/'", u'end': u'2018-04-05 20:55:28.046320', u'_ansible_no_log': False, u'stdout': u'', u'changed': True, u'invocation': {u'module_args': {u'warn': True, u'executable': None, u'_uses_shell': True, u'_raw_params': u"virsh -r net-dhcp-leases default | grep -i 00:16:3e:24:d3:63 | awk '{ print $5 }' | cut -f1 -d'/'", u'removes': None, u'creates': None, u'chdir': None, u'stdin': None}}, u'start': u'2018-04-05 20:55:28.000470', u'attempts': 50, u'stderr': u'', u'rc': 0, u'delta': u'0:00:00.045850', u'stdout_lines': []} >>>>>>>>>>> 2018-04-05 20:55:28,318-0400 ERROR otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:98 fatal: [localhost]: FAILED! => {"attempts": 50, "changed": true, "cmd": "virsh -r net-dhcp-leases default | grep -i 00:16:3e:24:d3:63 | awk '{ print $5 }' | cut -f1 -d'/'", "delta": "0:00:00.045850", "end": "2018-04-05 20:55:28.046320", "rc": 0, "start": "2018-04-05 20:55:28.000470", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []} >>>>>>>>>>> >>>>>>>>>>> Both the 4.2 and master suites are failing on getting local VM >>>>>>>>>>> IP. >>>>>>>>>>> Any idea what changed or if I have to change the test? >>>>>>>>>>> >>>>>>>>>>> thanks! 
>>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Hi Sahina, >>>>>>>>>> 4.2 and master suite non HC are correctly running this morning. >>>>>>>>>> http://jenkins.ovirt.org/view/oVirt%20system%20tests/job/ovi >>>>>>>>>> rt-system-tests_he-basic-ansible-suite-master/146/ >>>>>>>>>> http://jenkins.ovirt.org/view/oVirt%20system%20tests/job/ovi >>>>>>>>>> rt-system-tests_he-basic-ansible-suite-4.2/76/ >>>>>>>>>> >>>>>>>>>> I'll try to check the difference with HC suites. >>>>>>>>>> >>>>>>>>>> Are you using more than one subnet in the HC suites? >>>>>>>>>> >>>>>>>>> >>>>>>>>> No, I'm not. And we havent's changed anything related to network >>>>>>>>> in the test suite. >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> -- >>>>>>>> *GAL bEN HAIM* >>>>>>>> RHV DEVOPS >>>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> -- >>>>>>> >>>>>>> Eyal edri >>>>>>> >>>>>>> >>>>>>> MANAGER >>>>>>> >>>>>>> RHV DevOps >>>>>>> >>>>>>> EMEA VIRTUALIZATION R&D >>>>>>> >>>>>>> >>>>>>> Red Hat EMEA >>>>>>> TRIED. TESTED. TRUSTED. >>>>>>> >>>>>>> phone: +972-9-7692018 >>>>>>> irc: eedri (on #tlv #rhev-dev #rhev-integ) >>>>>>> >>>>>>> _______________________________________________ >>>>>>> Devel mailing list >>>>>>> Devel at ovirt.org >>>>>>> http://lists.ovirt.org/mailman/listinfo/devel >>>>>>> >>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> >>>>>> SANDRO BONAZZOLA >>>>>> >>>>>> ASSOCIATE MANAGER, SOFTWARE ENGINEERING, EMEA ENG VIRTUALIZATION R&D >>>>>> >>>>>> Red Hat EMEA >>>>>> >>>>>> sbonazzo at redhat.com >>>>>> >>>>>> >>>>>> >>>>> >>>>> >>>>> >>>>> -- >>>>> >>>>> Eyal edri >>>>> >>>>> >>>>> MANAGER >>>>> >>>>> RHV DevOps >>>>> >>>>> EMEA VIRTUALIZATION R&D >>>>> >>>>> >>>>> Red Hat EMEA >>>>> TRIED. TESTED. TRUSTED. >>>>> >>>>> phone: +972-9-7692018 >>>>> irc: eedri (on #tlv #rhev-dev #rhev-integ) >>>>> >>>> >>>> >>> >> > > > -- > > Eyal edri > > > MANAGER > > RHV DevOps > > EMEA VIRTUALIZATION R&D > > > Red Hat EMEA > TRIED. TESTED. TRUSTED. 
> phone: +972-9-7692018 > irc: eedri (on #tlv #rhev-dev #rhev-integ) > -------------- next part -------------- An HTML attachment was scrubbed... URL: From eedri at redhat.com Mon Apr 30 13:13:27 2018 From: eedri at redhat.com (Eyal Edri) Date: Mon, 30 Apr 2018 16:13:27 +0300 Subject: [ovirt-devel] Update: HC suites failing for 3 weeks ( was: [OST][HC] HE fails to deploy ) In-Reply-To: References: Message-ID: On Mon, Apr 30, 2018 at 3:59 PM, Sahina Bose wrote: > > > On Mon, Apr 30, 2018 at 5:45 PM, Eyal Edri wrote: > >> >> >> On Wed, Apr 25, 2018 at 1:53 PM, Sahina Bose wrote: >> >>> >>> >>> On Wed, Apr 25, 2018 at 3:54 PM, Sahina Bose wrote: >>> >>>> >>>> >>>> On Mon, Apr 23, 2018 at 6:28 PM, Sahina Bose wrote: >>>> >>>>> >>>>> On Mon, Apr 23, 2018 at 5:41 PM, Eyal Edri wrote: >>>>> >>>>>> Sahina, >>>>>> Any update on this? >>>>>> >>>>> >>>>> Sorry, haven't been able to spend any time on this. The last I checked >>>>> the HE install was failing at task - Get Local VM IP. >>>>> and there were no logs from HE VM to debug. >>>>> >>>>> Will spend sometime on this tomorrow >>>>> >>>> >>>> https://gerrit.ovirt.org/#/c/89953/ - fixes the issue, atleast when I >>>> tried this on my local setup. >>>> >>> >>> >>> The CI however still fails in the HE install with : >>> >>> TASK [Get local VM IP]", "[ ERROR ] fatal: [localhost]: FAILED! => {\"attempts\": 50, \"changed\": true, \"cmd\": \"virsh -r net-dhcp-leases default | grep -i 00:16:3e:24:d3:63 | awk '{ print $5 }' | cut -f1 -d'/'\", \"delta\": \"0:00:00.043961\", \"end\": \"2018-04-25 05:51:34.226374\", \"rc\": 0, \"start\": \"2018-04-25 05:51:34.182413\", \"stderr\": \"\", \"stderr_lines\": [], \"stdout\": \"\", \"stdout_lines\": []}" >>> >>> >>> >>> FWIW, my local setup , ost repo was at I3fc2976ab2400e5908760aadc3258 >>> 329c0ffdf4d >>> >> >> >> Any update? suites are still failing. >> > > The suites work locally, on the CI systems they fail with above error. > Unfortunately, no clue as to why this is so. 
> With the current run, are we able to get logs from the engine VM? > The logs should be in the job build-artifacts [1] [1] http://jenkins.ovirt.org/job/ovirt-system-tests_hc-basic-suite-4.2/122/consoleFull > > >> >>> >>> >>>> >>>> >>>>> >>>>>> On Wed, Apr 18, 2018 at 3:40 PM, Sandro Bonazzola < >>>>>> sbonazzo at redhat.com> wrote: >>>>>> >>>>>>> >>>>>>> >>>>>>> 2018-04-18 9:37 GMT+02:00 Eyal Edri : >>>>>>> >>>>>>>> FYI, >>>>>>>> >>>>>>>> I've disabled the 4.2 and master HC suites nightly run on CI as >>>>>>>> they are constantly failing for almost 3 weeks and spamming the mailing >>>>>>>> lists. >>>>>>>> >>>>>>> >>>>>>> >>>>>>> HC uses gdeploy 2.0.6 which was released in December and was based >>>>>>> on ansible 2.4. >>>>>>> ansible-2.5 landed 3 weeks ago in EPEL, my guess is that gdeploy is >>>>>>> not supporting ansible-2.5 properly. >>>>>>> I had no time to validate my guess with proof, so please Sahina >>>>>>> cross check this. >>>>>>> >>>>>>> >>>>>>> >>>>>>>> >>>>>>>> I think this should get higher priority for a fix if we want it to >>>>>>>> provide any value, >>>>>>>> Work can continue using the manual jobs or via check-patch. >>>>>>>> >>>>>>>> >>>>>>>> On Mon, Apr 16, 2018 at 10:56 AM, Gal Ben Haim >>>>>>> > wrote: >>>>>>>> >>>>>>>>> Any update on https://gerrit.ovirt.org/#/c/88887/ ? >>>>>>>>> The HC suites still failing and it's hard to understand why >>>>>>>>> without the logs from the engine VM. 
>>>>>>>>> >>>>>>>>> On Sat, Apr 7, 2018 at 7:19 AM, Sahina Bose >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Fri, Apr 6, 2018 at 1:10 PM, Simone Tiraboschi < >>>>>>>>>> stirabos at redhat.com> wrote: >>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Fri, Apr 6, 2018 at 9:28 AM, Sahina Bose >>>>>>>>>>> wrote: >>>>>>>>>>> >>>>>>>>>>>> 2018-04-05 20:46:52,773-0400 INFO otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:100 TASK [Get local VM IP] >>>>>>>>>>>> 2018-04-05 20:55:28,217-0400 DEBUG otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:94 {u'_ansible_parsed': True, u'stderr_lines': [], u'cmd': u"virsh -r net-dhcp-leases default | grep -i 00:16:3e:24:d3:63 | awk '{ print $5 }' | cut -f1 -d'/'", u'end': u'2018-04-05 20:55:28.046320', u'_ansible_no_log': False, u'stdout': u'', u'changed': True, u'invocation': {u'module_args': {u'warn': True, u'executable': None, u'_uses_shell': True, u'_raw_params': u"virsh -r net-dhcp-leases default | grep -i 00:16:3e:24:d3:63 | awk '{ print $5 }' | cut -f1 -d'/'", u'removes': None, u'creates': None, u'chdir': None, u'stdin': None}}, u'start': u'2018-04-05 20:55:28.000470', u'attempts': 50, u'stderr': u'', u'rc': 0, u'delta': u'0:00:00.045850', u'stdout_lines': []} >>>>>>>>>>>> 2018-04-05 20:55:28,318-0400 ERROR otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:98 fatal: [localhost]: FAILED! => {"attempts": 50, "changed": true, "cmd": "virsh -r net-dhcp-leases default | grep -i 00:16:3e:24:d3:63 | awk '{ print $5 }' | cut -f1 -d'/'", "delta": "0:00:00.045850", "end": "2018-04-05 20:55:28.046320", "rc": 0, "start": "2018-04-05 20:55:28.000470", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []} >>>>>>>>>>>> >>>>>>>>>>>> Both the 4.2 and master suites are failing on getting local VM >>>>>>>>>>>> IP. >>>>>>>>>>>> Any idea what changed or if I have to change the test? >>>>>>>>>>>> >>>>>>>>>>>> thanks! 
>>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Hi Sahina, >>>>>>>>>>> 4.2 and master suite non HC are correctly running this morning. >>>>>>>>>>> http://jenkins.ovirt.org/view/oVirt%20system%20tests/job/ovi >>>>>>>>>>> rt-system-tests_he-basic-ansible-suite-master/146/ >>>>>>>>>>> http://jenkins.ovirt.org/view/oVirt%20system%20tests/job/ovi >>>>>>>>>>> rt-system-tests_he-basic-ansible-suite-4.2/76/ >>>>>>>>>>> >>>>>>>>>>> I'll try to check the difference with HC suites. >>>>>>>>>>> >>>>>>>>>>> Are you using more than one subnet in the HC suites? >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> No, I'm not. And we havent's changed anything related to network >>>>>>>>>> in the test suite. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> -- >>>>>>>>> *GAL bEN HAIM* >>>>>>>>> RHV DEVOPS >>>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> -- >>>>>>>> >>>>>>>> Eyal edri >>>>>>>> >>>>>>>> >>>>>>>> MANAGER >>>>>>>> >>>>>>>> RHV DevOps >>>>>>>> >>>>>>>> EMEA VIRTUALIZATION R&D >>>>>>>> >>>>>>>> >>>>>>>> Red Hat EMEA >>>>>>>> TRIED. TESTED. TRUSTED. >>>>>>>> >>>>>>>> phone: +972-9-7692018 >>>>>>>> irc: eedri (on #tlv #rhev-dev #rhev-integ) >>>>>>>> >>>>>>>> _______________________________________________ >>>>>>>> Devel mailing list >>>>>>>> Devel at ovirt.org >>>>>>>> http://lists.ovirt.org/mailman/listinfo/devel >>>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> -- >>>>>>> >>>>>>> SANDRO BONAZZOLA >>>>>>> >>>>>>> ASSOCIATE MANAGER, SOFTWARE ENGINEERING, EMEA ENG VIRTUALIZATION R&D >>>>>>> >>>>>>> Red Hat EMEA >>>>>>> >>>>>>> sbonazzo at redhat.com >>>>>>> >>>>>>> >>>>>>> >>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> >>>>>> Eyal edri >>>>>> >>>>>> >>>>>> MANAGER >>>>>> >>>>>> RHV DevOps >>>>>> >>>>>> EMEA VIRTUALIZATION R&D >>>>>> >>>>>> >>>>>> Red Hat EMEA >>>>>> TRIED. TESTED. TRUSTED. 
>>>>>> >>>>>> phone: +972-9-7692018 >>>>>> irc: eedri (on #tlv #rhev-dev #rhev-integ) >>>>>> >>>>> >>>>> >>>> >>> >> >> >> -- >> >> Eyal edri >> >> >> MANAGER >> >> RHV DevOps >> >> EMEA VIRTUALIZATION R&D >> >> >> Red Hat EMEA >> TRIED. TESTED. TRUSTED. >> phone: +972-9-7692018 >> irc: eedri (on #tlv #rhev-dev #rhev-integ) >> > > -- Eyal edri MANAGER RHV DevOps EMEA VIRTUALIZATION R&D Red Hat EMEA TRIED. TESTED. TRUSTED. phone: +972-9-7692018 irc: eedri (on #tlv #rhev-dev #rhev-integ) -------------- next part -------------- An HTML attachment was scrubbed... URL: From dron at redhat.com Mon Apr 30 13:15:11 2018 From: dron at redhat.com (Dafna Ron) Date: Mon, 30 Apr 2018 14:15:11 +0100 Subject: [ovirt-devel] OST Failure - Weekly update [/04/2018-20/04/2018] In-Reply-To: References: Message-ID: On Fri, Apr 27, 2018 at 5:57 PM, Yaniv Kaul wrote: > > > On Fri, Apr 27, 2018 at 7:34 PM, Dafna Ron wrote: > >> Hi, >> >> I wanted to give a short status on this week's failures and OST current >> status. >> >> I am glad to report that the issue with CQ alerts was resolved thanks to >> Barak and Evgheni. >> You can read more about the issue and how it was resolved here: >> https://ovirt-jira.atlassian.net/browse/OVIRT-1974 >> > > How was the VM2 high-performance with vNUMA and pinned to more than one > host (a race) was solved? > I was not seeing it all week, however, we had just had a failure for that today: http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/7169/ > > >> >> >> Currently we have one on-going possible regression which was reported to >> the list and to Arik. >> the change reported: https://gerrit.ovirt.org/#/c/89852/ - examples: >> upload ova as a virtual machine template. >> you can view the details in this Jira: https://ovirt-jira.atlassian.n >> et/browse/OFT-648 >> > > I don't understand how that *example* script could have caused this > regression - in a complete different scenario (virt-sparsify fails because > of libguestfs issue). > Y. 
> I noticed the "example" but since it was reported as a failure I wanted to make sure nothing in the "example" caused the failure which is why I sent to the list and asked Arik to have a look. > >> >> >> The majority of issues we had this week were failed-build artifacts for >> fc27. There were two different cases, one was reported to Francesco who was >> already working on a fix to the issue and the second started and resolved >> during the evening/night of Apr 26-27. >> You can see the Jira to these two issues here: >> >> https://ovirt-jira.atlassian.net/browse/OFT-605 >> https://ovirt-jira.atlassian.net/browse/OFT-612 >> >> There was an infra issue with Mirrors not being available for a few >> minutes. the issue was momentarily and was resolved on its own. >> >> https://ovirt-jira.atlassian.net/browse/OFT-606 >> >> >> >> *Below you can see the chart for this week's resolved issues but cause of >> failure:**Code* = regression of working components/functionalities >> *Infra* = infrastructure/OST Infrastructure/Lago related issues/Power >> outages >> *OST Tests* = package related issues, failed build artifacts >> >> >> >> >> >> >> >> >> >> >> >> >> *Below is a chart showing failures by suite type: * >> >> >> >> >> >> *Below is a chart showing failures by version type: * >> >> >> >> >> *Below you can see the number of reported failures by resolution status:* >> Thanks, >> Dafna >> >> >> _______________________________________________ >> Devel mailing list >> Devel at ovirt.org >> http://lists.ovirt.org/mailman/listinfo/devel >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 6773 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: image.png Type: image/png Size: 5230 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 5559 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 30895 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 30036 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 26666 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 25509 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 6360 bytes Desc: not available URL: From ahadas at redhat.com Mon Apr 30 13:29:40 2018 From: ahadas at redhat.com (Arik Hadas) Date: Mon, 30 Apr 2018 16:29:40 +0300 Subject: [ovirt-devel] OST Failure - Weekly update [/04/2018-20/04/2018] In-Reply-To: References: Message-ID: On Mon, Apr 30, 2018 at 4:15 PM, Dafna Ron wrote: > > > On Fri, Apr 27, 2018 at 5:57 PM, Yaniv Kaul wrote: > >> >> >> On Fri, Apr 27, 2018 at 7:34 PM, Dafna Ron wrote: >> >>> Hi, >>> >>> I wanted to give a short status on this week's failures and OST current >>> status. >>> >>> I am glad to report that the issue with CQ alerts was resolved thanks to >>> Barak and Evgheni. >>> You can read more about the issue and how it was resolved here: >>> https://ovirt-jira.atlassian.net/browse/OVIRT-1974 >>> >> >> How was the VM2 high-performance with vNUMA and pinned to more than one >> host (a race) was solved? 
> I was not seeing it all week; however, we just had a failure for it
> today: http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/7169/
>
>>> Currently we have one ongoing possible regression which was reported
>>> to the list and to Arik.
>>> the change reported: https://gerrit.ovirt.org/#/c/89852/ - examples:
>>> upload ova as a virtual machine template.
>>> you can view the details in this Jira:
>>> https://ovirt-jira.atlassian.net/browse/OFT-648
>>
>> I don't understand how that *example* script could have caused this
>> regression - in a completely different scenario (virt-sparsify fails
>> because of a libguestfs issue).
>> Y.
>
> I noticed the "example" but since it was reported as a failure I wanted to
> make sure nothing in the "example" caused the failure, which is why I sent
> it to the list and asked Arik to have a look.

No, that sdk-example could not have caused it.

>>> The majority of issues we had this week were failed-build artifacts for
>>> fc27. There were two different cases: one was reported to Francesco, who
>>> was already working on a fix, and the second started and resolved itself
>>> during the evening/night of Apr 26-27.
>>> You can see the Jira tickets for these two issues here:
>>>
>>> https://ovirt-jira.atlassian.net/browse/OFT-605
>>> https://ovirt-jira.atlassian.net/browse/OFT-612
>>>
>>> There was an infra issue with mirrors not being available for a few
>>> minutes; the issue was momentary and resolved on its own.
>>> https://ovirt-jira.atlassian.net/browse/OFT-606
>>>
>>> *Below you can see the chart for this week's resolved issues by cause
>>> of failure:*
>>> *Code* = regression of working components/functionalities
>>> *Infra* = infrastructure/OST Infrastructure/Lago related issues/Power
>>> outages
>>> *OST Tests* = package related issues, failed build artifacts
>>>
>>> *Below is a chart showing failures by suite type:*
>>>
>>> *Below is a chart showing failures by version type:*
>>>
>>> *Below you can see the number of reported failures by resolution status:*
>>>
>>> Thanks,
>>> Dafna
From michal.skrivanek at redhat.com Mon Apr 30 14:55:34 2018
From: michal.skrivanek at redhat.com (Michal Skrivanek)
Date: Mon, 30 Apr 2018 16:55:34 +0200
Subject: [ovirt-devel] OST Failure - Weekly update [/04/2018-20/04/2018]
In-Reply-To:
References:
Message-ID: <7FD50304-0F8B-4BAC-B5E5-FF1E7C8C2D22@redhat.com>

> On 30 Apr 2018, at 15:29, Arik Hadas wrote:
>
> On Mon, Apr 30, 2018 at 4:15 PM, Dafna Ron wrote:
>
> On Fri, Apr 27, 2018 at 5:57 PM, Yaniv Kaul wrote:
>
> On Fri, Apr 27, 2018 at 7:34 PM, Dafna Ron wrote:
> Hi,
>
> I wanted to give a short status on this week's failures and OST current status.
>
> I am glad to report that the issue with CQ alerts was resolved thanks to Barak and Evgheni.
> You can read more about the issue and how it was resolved here:
> https://ovirt-jira.atlassian.net/browse/OVIRT-1974
>
> How was the VM2 high-performance with vNUMA and pinned to more than one host (a race) solved?
>
> I was not seeing it all week; however, we just had a failure for it today: http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/7169/
>
> Currently we have one ongoing possible regression which was reported to the list and to Arik.

Dafna,
how can we see that the error is consistent and triggered by this patch?
Are there other builds passing, after this failure?
I see some green builds afterwards, with "(ovirt-engine)", does it mean the error is not happening all the time?

Thanks,
michal

> the change reported: https://gerrit.ovirt.org/#/c/89852/ - examples: upload ova as a virtual machine template.
> you can view the details in this Jira: https://ovirt-jira.atlassian.net/browse/OFT-648
>
> I don't understand how that *example* script could have caused this regression - in a completely different scenario (virt-sparsify fails because of a libguestfs issue).
> Y.
>
> I noticed the "example" but since it was reported as a failure I wanted to make sure nothing in the "example" caused the failure, which is why I sent it to the list and asked Arik to have a look.
>
> No, that sdk-example could not have caused it.
>
> The majority of issues we had this week were failed-build artifacts for fc27. There were two different cases: one was reported to Francesco, who was already working on a fix, and the second started and resolved itself during the evening/night of Apr 26-27.
> You can see the Jira tickets for these two issues here:
>
> https://ovirt-jira.atlassian.net/browse/OFT-605
> https://ovirt-jira.atlassian.net/browse/OFT-612
>
> There was an infra issue with mirrors not being available for a few minutes; the issue was momentary and resolved on its own.
> https://ovirt-jira.atlassian.net/browse/OFT-606
>
> Below you can see the chart for this week's resolved issues by cause of failure:
>
> Code = regression of working components/functionalities
> Infra = infrastructure/OST Infrastructure/Lago related issues/Power outages
> OST Tests = package related issues, failed build artifacts
>
> Below is a chart showing failures by suite type:
>
> Below is a chart showing failures by version type:
>
> Below you can see the number of reported failures by resolution status:
>
> Thanks,
> Dafna

From dron at redhat.com Mon Apr 30 16:59:07 2018
From: dron at redhat.com (Dafna Ron)
Date: Mon, 30 Apr 2018 17:59:07 +0100
Subject: [ovirt-devel] OST Failure - Weekly update [/04/2018-20/04/2018]
In-Reply-To: <7FD50304-0F8B-4BAC-B5E5-FF1E7C8C2D22@redhat.com>
References: <7FD50304-0F8B-4BAC-B5E5-FF1E7C8C2D22@redhat.com>
Message-ID:

On Mon, Apr 30, 2018 at 3:55 PM, Michal Skrivanek <michal.skrivanek at redhat.com> wrote:

> On 30 Apr 2018, at 15:29, Arik Hadas wrote:
>
> On Mon, Apr 30, 2018 at 4:15 PM, Dafna Ron wrote:
>
>> On Fri, Apr 27, 2018 at 5:57 PM, Yaniv Kaul wrote:
>>
>>> On Fri, Apr 27, 2018 at 7:34 PM, Dafna Ron wrote:
>>>
>>>> Hi,
>>>>
>>>> I wanted to give a short status on this week's failures and OST current
>>>> status.
>>>>
>>>> I am glad to report that the issue with CQ alerts was resolved thanks
>>>> to Barak and Evgheni.
>>>> You can read more about the issue and how it was resolved here:
>>>> https://ovirt-jira.atlassian.net/browse/OVIRT-1974
>>>
>>> How was the VM2 high-performance with vNUMA and pinned to more than one
>>> host (a race) solved?
>>
>> I was not seeing it all week; however, we just had a failure for it
>> today: http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/7169/
>>
>>>> Currently we have one ongoing possible regression which was reported
>>>> to the list and to Arik.
>
> Dafna,
> how can we see that the error is consistent and triggered by this patch?
> Are there other builds passing, after this failure?
> I see some green builds afterwards, with "(ovirt-engine)", does it mean
> the error is not happening all the time?
>
> Thanks,
> michal

Hi Michal,
If I think a change may be related and have reported it to the developer,
yet the developer thinks the change did not cause the failure, one way to
debug it is to re-add the change to the CQ; if it fails again on the same
test, the developer should take a closer look.
The CQ tries to isolate failures so that they do not break the other
tests, which is why you are seeing green builds after the failure.
I re-added this change today at 11:30 and indeed it passed:
http://jenkins.ovirt.org/view/Change%20queue%20jobs/job/ovirt-master_change-queue-tester/7190/

Thanks,
Dafna

>>>> the change reported: https://gerrit.ovirt.org/#/c/89852/ - examples:
>>>> upload ova as a virtual machine template.
>>>> you can view the details in this Jira:
>>>> https://ovirt-jira.atlassian.net/browse/OFT-648
>>>
>>> I don't understand how that *example* script could have caused this
>>> regression - in a completely different scenario (virt-sparsify fails
>>> because of a libguestfs issue).
>>> Y.
>> I noticed the "example" but since it was reported as a failure I wanted
>> to make sure nothing in the "example" caused the failure, which is why I
>> sent it to the list and asked Arik to have a look.
>
> No, that sdk-example could not have caused it.
>
>>>> The majority of issues we had this week were failed-build artifacts for
>>>> fc27. There were two different cases: one was reported to Francesco, who
>>>> was already working on a fix, and the second started and resolved itself
>>>> during the evening/night of Apr 26-27.
>>>> You can see the Jira tickets for these two issues here:
>>>>
>>>> https://ovirt-jira.atlassian.net/browse/OFT-605
>>>> https://ovirt-jira.atlassian.net/browse/OFT-612
>>>>
>>>> There was an infra issue with mirrors not being available for a few
>>>> minutes; the issue was momentary and resolved on its own.
>>>>
>>>> https://ovirt-jira.atlassian.net/browse/OFT-606
>>>>
>>>> *Below you can see the chart for this week's resolved issues by cause
>>>> of failure:*
>>>> *Code* = regression of working components/functionalities
>>>> *Infra* = infrastructure/OST Infrastructure/Lago related issues/Power
>>>> outages
>>>> *OST Tests* = package related issues, failed build artifacts
>>>>
>>>> *Below is a chart showing failures by suite type:*
>>>>
>>>> *Below is a chart showing failures by version type:*
>>>>
>>>> *Below you can see the number of reported failures by resolution
>>>> status:*
>>>>
>>>> Thanks,
>>>> Dafna
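Dafna's explanation above - re-adding a suspected change to the change queue (CQ) on its own, while other changes keep producing green builds - can be sketched as a bisection over the queued changes. This is a hypothetical illustration, not the actual oVirt CI code: it assumes a deterministic test suite and at most one culprit change, and the change IDs and the `test_passes` callback are made up for the example.

```python
def find_culprit(changes, test_passes):
    """Bisect a batch of queued changes to the one that breaks the test.

    `changes` is the ordered list of queued change IDs; `test_passes(subset)`
    runs the suite against a build containing only `subset` and returns True
    on a green run. Assumes a deterministic test and exactly one bad change.
    """
    candidates = list(changes)
    while len(candidates) > 1:
        half = len(candidates) // 2
        left, right = candidates[:half], candidates[half:]
        # If the first half alone builds green, the culprit is in the rest;
        # either way, the discarded half can re-enter the queue and keep
        # producing green builds while the search continues.
        candidates = right if test_passes(left) else left
    return candidates[0]


# Hypothetical usage: "change_c" is the one that breaks the suite.
queue = ["change_a", "change_b", "change_c", "change_d"]
culprit = find_culprit(queue, lambda subset: "change_c" not in subset)
print(culprit)  # -> change_c
```

In roughly this spirit, the CQ keeps unrelated changes flowing (the green builds Michal noticed) while the suspected change is re-tested in isolation, which is what re-adding the change to the queue accomplishes.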