
Triggered a sanity automation execution using [1], which covers all the requested areas, on iSCSI, NFS and Gluster. I'll update with the results.

[1] https://gerrit.ovirt.org/#/c/90906/
vdsm-4.20.28-6.gitc23aef6.el7.x86_64

On Tue, May 29, 2018 at 4:26 PM, Martin Polednik <mpolednik@redhat.com> wrote:
On 29/05/18 15:30 +0300, Elad Ben Aharon wrote:
Hi Martin,
Can you please create a cherry-pick patch that is based on 4.2?
See https://gerrit.ovirt.org/#/c/90906/. The CI failure is unrelated (the storage tests need a real environment).
mpolednik
Thanks
On Tue, May 29, 2018 at 1:34 PM, Dan Kenigsberg <danken@redhat.com> wrote:
On Tue, May 29, 2018 at 1:21 PM, Elad Ben Aharon <ebenahar@redhat.com> wrote:
Hi Dan,
In the last execution, the success rate was very low due to a large number of start VM failures caused, according to Michal, by the vdsm-hook-allocate_net hook that was installed on the host.
That is the latest status here; would you like me to re-execute?
Yes, of course, but you should rebase Polednik's code on top of the *current* ovirt-4.2.3 branch.
If so, with or W/O vdsm-hook-allocate_net installed?
There was NO reason to have that installed. Please keep it (and any other needless code) out of the test environment.
On Tue, May 29, 2018 at 1:14 PM, Dan Kenigsberg <danken@redhat.com> wrote:
On Mon, May 7, 2018 at 3:53 PM, Michal Skrivanek <michal.skrivanek@redhat.com> wrote:
Hi Elad, why did you install vdsm-hook-allocate_net?
adding Dan, as I think the hook is not supposed to fail this badly in any case
Yep, this looks bad and deserves a bug report. Installing this little hook should not block VM startup.
But more importantly - what is the conclusion of this thread? Do we have a green light from QE to take this in?
Thanks, michal
On 5 May 2018, at 19:22, Elad Ben Aharon <ebenahar@redhat.com> wrote:
Start VM fails on:
2018-05-05 17:53:27,399+0300 INFO (vm/e6ce66ce) [virt.vm] (vmId='e6ce66ce-852f-48c5-9997-5d2959432a27') drive 'vda' path:
'dev=/rhev/data-center/mnt/blockSD/db5a6696-d907-4938-9a78-bdd13a843c62/images/6cdabfe5-d1ca-40af-ae63-9834f235d1c8/7ef97445-30e6-4435-8425-f35a01928211' ->
u'dev=/rhev/data-center/mnt/blockSD/db5a6696-d907-4938-9a78-bdd13a843c62/images/6cdabfe5-d1ca-40af-ae63-9834f235d1c8/7ef97445-30e6-4435-8425-f35a01928211' (storagexml:334)
2018-05-05 17:53:27,888+0300 INFO (jsonrpc/1) [vdsm.api] START getSpmStatus(spUUID='940fe6f3-b0c6-4d0c-a921-198e7819c1cc', options=None) from=::ffff:10.35.161.127,53512, task_id=c70ace39-dbfe-4f5c-ae49-a1e3a82c2758 (api:46)
2018-05-05 17:53:27,909+0300 INFO (vm/e6ce66ce) [root] /usr/libexec/vdsm/hooks/before_device_create/10_allocate_net: rc=2 err=vm net allocation hook: [unexpected error]: Traceback (most recent call last):
  File "/usr/libexec/vdsm/hooks/before_device_create/10_allocate_net", line 105, in <module>
    main()
  File "/usr/libexec/vdsm/hooks/before_device_create/10_allocate_net", line 93, in main
    allocate_random_network(device_xml)
  File "/usr/libexec/vdsm/hooks/before_device_create/10_allocate_net", line 62, in allocate_random_network
    net = _get_random_network()
  File "/usr/libexec/vdsm/hooks/before_device_create/10_allocate_net", line 50, in _get_random_network
    available_nets = _parse_nets()
  File "/usr/libexec/vdsm/hooks/before_device_create/10_allocate_net", line 46, in _parse_nets
    return [net for net in os.environ[AVAIL_NETS_KEY].split()]
  File "/usr/lib64/python2.7/UserDict.py", line 23, in __getitem__
    raise KeyError(key)
KeyError: 'equivnets'
(hooks:110)
2018-05-05 17:53:27,915+0300 ERROR (vm/e6ce66ce) [virt.vm] (vmId='e6ce66ce-852f-48c5-9997-5d2959432a27') The vm start process failed (vm:943)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 872, in _startUnderlyingVm
    self._run()
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 2861, in _run
    domxml = hooks.before_vm_start(self._buildDomainXML(),
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 2254, in _buildDomainXML
    dom, self.id, self._custom['custom'])
  File "/usr/lib/python2.7/site-packages/vdsm/virt/domxml_preprocess.py", line 240, in replace_device_xml_with_hooks_xml
    dev_custom)
  File "/usr/lib/python2.7/site-packages/vdsm/common/hooks.py", line 134, in before_device_create
    params=customProperties)
  File "/usr/lib/python2.7/site-packages/vdsm/common/hooks.py", line 120, in _runHooksDir
    raise exception.HookError(err)
HookError: Hook Error: ('vm net allocation hook: [unexpected error]: Traceback (most recent call last):\n  File "/usr/libexec/vdsm/hooks/before_device_create/10_allocate_net", line 105, in <module>\n    main()\n  File "/usr/libexec/vdsm/hooks/before_device_create/10_allocate_net", line 93, in main\n    allocate_random_network(device_xml)\n  File "/usr/libexec/vdsm/hooks/before_device_create/10_allocate_net", line 62, in allocate_random_network\n    net = _get_random_network()\n  File "/usr/libexec/vdsm/hooks/before_device_create/10_allocate_net", line 50, in _get_random_network\n    available_nets = _parse_nets()\n  File "/usr/libexec/vdsm/hooks/before_device_create/10_allocate_net", line 46, in _parse_nets\n    return [net for net in os.environ[AVAIL_NETS_KEY].split()]\n  File "/usr/lib64/python2.7/UserDict.py", line 23, in __getitem__\n    raise KeyError(key)\nKeyError: \'equivnets\'\n\n\n',)
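(For reference: the KeyError above comes from the hook reading the 'equivnets' custom property straight out of os.environ, so any VM without that property fails to start. A minimal, hypothetical sketch of a more defensive lookup, not the shipped hook code, would be something like:

    # Hypothetical defensive variant of the hook's _parse_nets(); the real code
    # lives in /usr/libexec/vdsm/hooks/before_device_create/10_allocate_net.
    import os

    AVAIL_NETS_KEY = 'equivnets'  # custom property the hook expects on the VM

    def _parse_nets():
        # os.environ[AVAIL_NETS_KEY] raises KeyError when the VM has no
        # 'equivnets' custom property; falling back to an empty string keeps
        # the hook from aborting VM startup and simply yields no networks.
        return os.environ.get(AVAIL_NETS_KEY, '').split()

Either way, a hook that was installed by mistake arguably should not take VM startup down with it, which matches the bug report suggested above.)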
Hence, the success rate was 28%, against 100% when running with d/s. If needed, I'll compare against the latest master as well, but I think you get the picture with d/s.
vdsm-4.20.27-3.gitfee7810.el7.centos.x86_64
libvirt-3.9.0-14.el7_5.3.x86_64
qemu-kvm-rhev-2.10.0-21.el7_5.2.x86_64
kernel 3.10.0-862.el7.x86_64
RHEL 7.5
Logs attached
On Sat, May 5, 2018 at 1:26 PM, Elad Ben Aharon <ebenahar@redhat.com> wrote:
>
> nvm, found the gluster 3.12 repo, managed to install vdsm
>
> On Sat, May 5, 2018 at 1:12 PM, Elad Ben Aharon <ebenahar@redhat.com>
> wrote:
>>
>> No, vdsm requires it:
>>
>> Error: Package: vdsm-4.20.27-3.gitfee7810.el7.centos.x86_64
>>        (/vdsm-4.20.27-3.gitfee7810.el7.centos.x86_64)
>>        Requires: glusterfs-fuse >= 3.12
>>        Installed: glusterfs-fuse-3.8.4-54.8.el7.x86_64 (@rhv-4.2.3)
>>
>> Therefore, vdsm package installation is skipped upon force install.
>>
>> On Sat, May 5, 2018 at 11:42 AM, Michal Skrivanek
>> <michal.skrivanek@redhat.com> wrote:
>>>
>>> On 5 May 2018, at 00:38, Elad Ben Aharon <ebenahar@redhat.com> wrote:
>>>
>>> Hi guys,
>>>
>>> The vdsm build from the patch requires glusterfs-fuse >= 3.12, while
>>> the latest 4.2.3-5 d/s build requires 3.8.4 (3.4.0.59rhs-1.el7).
>>>
>>> because it is still oVirt, not a downstream build. We can't really do
>>> downstream builds with unmerged changes :/
>>>
>>> Trying to get this glusterfs-fuse build, so far no luck.
>>> Is this requirement intentional?
>>>
>>> it should work regardless, I guess you can force install it without
>>> the dependency
>>>
>>> On Fri, May 4, 2018 at 2:38 PM, Michal Skrivanek
>>> <michal.skrivanek@redhat.com> wrote:
>>>>
>>>> Hi Elad,
>>>> to make it easier to compare, Martin backported the change to 4.2, so
>>>> it is actually comparable with a run without that patch. Would you
>>>> please try that out?
>>>> It would be best to have 4.2 upstream and this [1] run to really
>>>> minimize the noise.
>>>>
>>>> Thanks,
>>>> michal
>>>>
>>>> [1]
>>>> http://jenkins.ovirt.org/job/vdsm_4.2_build-artifacts-on-demand-el7-x86_64/28/
>>>>
>>>> On 27 Apr 2018, at 09:23, Martin Polednik <mpolednik@redhat.com>
>>>> wrote:
>>>>
>>>> On 24/04/18 00:37 +0300, Elad Ben Aharon wrote:
>>>>
>>>> I will update with the results of the next tier1 execution on latest
>>>> 4.2.3
>>>>
>>>> That isn't master but an old branch though. Could you run it against
>>>> *current* VDSM master?
>>>>
>>>> On Mon, Apr 23, 2018 at 3:56 PM, Martin Polednik <mpolednik@redhat.com>
>>>> wrote:
>>>>
>>>> On 23/04/18 01:23 +0300, Elad Ben Aharon wrote:
>>>>
>>>> Hi, I've triggered another execution [1] due to some issues I saw in
>>>> the first which are not related to the patch.
>>>>
>>>> The success rate is 78%, which is low compared to tier1 executions with
>>>> code from downstream builds (95-100% success rates) [2].
>>>>
>>>> Could you run the current master (without the dynamic_ownership patch)
>>>> so that we have a viable comparison?
>>>>
>>>> From what I could see so far, there is an issue with move and copy
>>>> operations to and from Gluster domains. For example [3].
>>>>
>>>> The logs are attached.
>>>>
>>>> [1]
>>>> https://rhv-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/rhv-4.2-ge-runner-tier1-after-upgrade/7/testReport/
>>>>
>>>> [2]
>>>> https://rhv-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/rhv-4.2-ge-runner-tier1-after-upgrade/7/
>>>>
>>>> [3]
>>>> 2018-04-22 13:06:28,316+0300 INFO (jsonrpc/7) [vdsm.api] FINISH
>>>> deleteImage error=Image does not exist in domain:
>>>> 'image=cabb8846-7a4b-4244-9835-5f603e682f33,
>>>> domain=e5fd29c8-52ba-467e-be09-ca40ff054dd4'
>>>> from=::ffff:10.35.161.182,40936,
>>>> flow_id=disks_syncAction_ba6b2630-5976-4935,
>>>> task_id=3d5f2a8a-881c-409e-93e9-aaa643c10e42 (api:51)
>>>> 2018-04-22 13:06:28,317+0300 ERROR (jsonrpc/7) [storage.TaskManager.Task]
>>>> (Task='3d5f2a8a-881c-409e-93e9-aaa643c10e42') Unexpected error (task:875)
>>>> Traceback (most recent call last):
>>>>   File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 882, in _run
>>>>     return fn(*args, **kargs)
>>>>   File "<string>", line 2, in deleteImage
>>>>   File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 49, in method
>>>>     ret = func(*args, **kwargs)
>>>>   File "/usr/lib/python2.7/site-packages/vdsm/storage/hsm.py", line 1503, in deleteImage
>>>>     raise se.ImageDoesNotExistInSD(imgUUID, sdUUID)
>>>> ImageDoesNotExistInSD: Image does not exist in domain:
>>>> 'image=cabb8846-7a4b-4244-9835-5f603e682f33,
>>>> domain=e5fd29c8-52ba-467e-be09-ca40ff054dd4'
>>>>
>>>> 2018-04-22 13:06:28,317+0300 INFO (jsonrpc/7) [storage.TaskManager.Task]
>>>> (Task='3d5f2a8a-881c-409e-93e9-aaa643c10e42') aborting: Task is aborted:
>>>> "Image does not exist in domain: 'image=cabb8846-7a4b-4244-9835-
>>>> 5f603e682f33, domain=e5fd29c8-52ba-467e-be09-ca40ff054dd4'" - code 268
>>>> (task:1181)
>>>> 2018-04-22 13:06:28,318+0300 ERROR (jsonrpc/7) [storage.Dispatcher] FINISH
>>>> deleteImage error=Image does not exist in domain:
>>>> 'image=cabb8846-7a4b-4244-9835-5f603e682f33,
>>>> domain=e5fd29c8-52ba-467e-be09-ca40ff054dd4' (dispatcher:82)
>>>>
>>>> On Thu, Apr 19, 2018 at 5:34 PM, Elad Ben Aharon <ebenahar@redhat.com>
>>>> wrote:
>>>>
>>>> Triggered a sanity tier1 execution [1] using [2], which covers all the
>>>> requested areas, on iSCSI, NFS and Gluster.
>>>> I'll update with the results.
>>>>
>>>> [1]
>>>> https://rhv-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/view/4.2_dev/job/rhv-4.2-ge-flow-storage/1161/
>>>>
>>>> [2]
>>>> https://gerrit.ovirt.org/#/c/89830/
>>>> vdsm-4.30.0-291.git77aef9a.el7.x86_64
>>>>
>>>> On Thu, Apr 19, 2018 at 3:07 PM, Martin Polednik <mpolednik@redhat.com>
>>>> wrote:
>>>>
>>>> On 19/04/18 14:54 +0300, Elad Ben Aharon wrote:
>>>>
>>>> Hi Martin,
>>>>
>>>> I see [1] requires a rebase, can you please take care?
>>>>
>>>> Should be rebased.
>>>>
>>>> At the moment, our automation is stable only on iSCSI, NFS, Gluster and
>>>> FC.
>>>> Ceph is not supported and Cinder will be stabilized soon, AFAIR, it's
>>>> not stable enough at the moment.
>>>>
>>>> That is still pretty good.
>>>>
>>>> [1] https://gerrit.ovirt.org/#/c/89830/
>>>>
>>>> Thanks
>>>>
>>>> On Wed, Apr 18, 2018 at 2:17 PM, Martin Polednik <mpolednik@redhat.com>
>>>> wrote:
>>>>
>>>> On 18/04/18 11:37 +0300, Elad Ben Aharon wrote:
>>>>
>>>> Hi, sorry if I misunderstood, I waited for more input regarding what
>>>> areas have to be tested here.
>>>>
>>>> I'd say that you have quite a bit of freedom in this regard. GlusterFS
>>>> should be covered by Dennis, so iSCSI/NFS/ceph/cinder with some suite
>>>> that covers basic operations (start & stop VM, migrate it), snapshots
>>>> and merging them, and whatever else would be important for storage
>>>> sanity.
>>>>
>>>> mpolednik
>>>>
>>>> On Wed, Apr 18, 2018 at 11:16 AM, Martin Polednik <mpolednik@redhat.com>
>>>> wrote:
>>>>
>>>> On 11/04/18 16:52 +0300, Elad Ben Aharon wrote:
>>>>
>>>> We can test this on iSCSI, NFS and GlusterFS. As for ceph and cinder,
>>>> will have to check, since usually, we don't execute our automation on
>>>> them.
>>>>
>>>> Any update on this? I believe the gluster tests were successful, OST
>>>> passes fine and unit tests pass fine, that makes the storage backends
>>>> test the last required piece.
>>>>
>>>> On Wed, Apr 11, 2018 at 4:38 PM, Raz Tamir <ratamir@redhat.com> wrote:
>>>>
>>>> +Elad
>>>>
>>>> On Wed, Apr 11, 2018 at 4:28 PM, Dan Kenigsberg <danken@redhat.com>
>>>> wrote:
>>>>
>>>> On Wed, Apr 11, 2018 at 12:34 PM, Nir Soffer <nsoffer@redhat.com> wrote:
>>>>
>>>> On Wed, Apr 11, 2018 at 12:31 PM Eyal Edri <eedri@redhat.com> wrote:
>>>>
>>>> Please make sure to run as much OST suites on this patch as possible
>>>> before merging (using 'ci please build')
>>>>
>>>> But note that OST is not a way to verify the patch.
>>>>
>>>> Such changes require testing with all storage types we support.
>>>>
>>>> Nir
>>>>
>>>> On Tue, Apr 10, 2018 at 4:09 PM, Martin Polednik <mpolednik@redhat.com>
>>>> wrote:
>>>>
>>>> Hey,
>>>>
>>>> I've created a patch[0] that is finally able to activate libvirt's
>>>> dynamic_ownership for VDSM while not negatively affecting
>>>> functionality of our storage code.
>>>>
>>>> That of course comes with quite a bit of code removal, mostly in the
>>>> area of host devices, hwrng and anything that touches devices; bunch
>>>> of test changes and one XML generation caveat (storage is handled by
>>>> VDSM, therefore disk relabelling needs to be disabled on the VDSM
>>>> level).
>>>>
>>>> Because of the scope of the patch, I welcome storage/virt/network
>>>> people to review the code and consider the implication this change has
>>>> on current/future features.
>>>>
>>>> [0] https://gerrit.ovirt.org/#/c/89830/
>>>>
>>>> In particular: dynamic_ownership was set to 0 prehistorically (as part
>>>> of https://bugzilla.redhat.com/show_bug.cgi?id=554961 ) because
>>>> libvirt, running as root, was not able to play properly with
>>>> root-squash nfs mounts.
>>>>
>>>> Have you attempted this use case?
>>>>
>>>> I join to Nir's request to run this with storage QE.
>>>>
>>>> --
>>>> Raz Tamir
>>>> Manager, RHV QE
>>>>
>>>> _______________________________________________
>>>> Devel mailing list
>>>> Devel@ovirt.org
>>>> http://lists.ovirt.org/mailman/listinfo/devel
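(A side note on the "disk relabelling needs to be disabled on the VDSM level" caveat quoted above: with libvirt's dynamic_ownership enabled, libvirt would normally chown/relabel disk images at VM start, while VDSM keeps managing ownership of its own storage, so the generated device XML has to tell libvirt to leave the image alone. A rough sketch of the idea, an illustration only and not the actual code from https://gerrit.ovirt.org/#/c/89830/:

    # Illustrative only: attach <seclabel model='dac' relabel='no'/> to a disk
    # <source> element so libvirt does not chown/relabel that image even with
    # dynamic_ownership enabled. The function name and sample path are made up
    # for this sketch, not taken from the VDSM patch.
    import xml.etree.ElementTree as ET

    def disable_relabel(disk_xml):
        disk = ET.fromstring(disk_xml)
        source = disk.find('source')
        seclabel = ET.SubElement(source, 'seclabel')
        seclabel.set('model', 'dac')
        seclabel.set('relabel', 'no')
        return ET.tostring(disk).decode()

    print(disable_relabel(
        "<disk type='file' device='disk'>"
        "<source file='/rhev/data-center/mnt/example/disk.img'/>"
        "</disk>"))

That is the XML generation caveat Martin mentions: storage ownership stays with VDSM, everything else is left to libvirt's dynamic ownership handling.)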
<logs.tar.gz>