Hi Martin, 

Can you please create a cherry-pick patch based on 4.2?


Thanks

On Tue, May 29, 2018 at 1:34 PM, Dan Kenigsberg <danken@redhat.com> wrote:
On Tue, May 29, 2018 at 1:21 PM, Elad Ben Aharon <ebenahar@redhat.com> wrote:
> Hi Dan,
>
> In the last execution, the success rate was very low due to a large number
> of failures on start VM caused, according to Michal, by the
> vdsm-hook-allocate_net that was installed on the host.
>
> This is the latest status here, would you like me to re-execute?

Yes, of course. But you should rebase Polednik's code on top of the
*current* ovirt-4.2.3 branch.

> If so, with
> or W/O vdsm-hook-allocate_net installed?

There was NO reason to have that installed. Please keep it (and any
other needless code) out of the test environment.

>
> On Tue, May 29, 2018 at 1:14 PM, Dan Kenigsberg <danken@redhat.com> wrote:
>>
>> On Mon, May 7, 2018 at 3:53 PM, Michal Skrivanek
>> <michal.skrivanek@redhat.com> wrote:
>> > Hi Elad,
>> > why did you install vdsm-hook-allocate_net?
>> >
>> > adding Dan as I think the hook is not supposed to fail this badly in any
>> > case
>>
>> yep, this looks bad and deserves a little bug report. Installing this
>> little hook should not block VM startup.
>>
>> But more importantly - what is the conclusion of this thread? Do we
>> have a green light from QE to take this in?
>>
>>
>> >
>> > Thanks,
>> > michal
>> >
>> > On 5 May 2018, at 19:22, Elad Ben Aharon <ebenahar@redhat.com> wrote:
>> >
>> > Start VM fails on:
>> >
>> > 2018-05-05 17:53:27,399+0300 INFO  (vm/e6ce66ce) [virt.vm] (vmId='e6ce66ce-852f-48c5-9997-5d2959432a27') drive 'vda' path: 'dev=/rhev/data-center/mnt/blockSD/db5a6696-d907-4938-9a78-bdd13a843c62/images/6cdabfe5-d1ca-40af-ae63-9834f235d1c8/7ef97445-30e6-4435-8425-f35a01928211' -> u'*dev=/rhev/data-center/mnt/blockSD/db5a6696-d907-4938-9a78-bdd13a843c62/images/6cdabfe5-d1ca-40af-ae63-9834f235d1c8/7ef97445-30e6-4435-8425-f35a01928211' (storagexml:334)
>> > 2018-05-05 17:53:27,888+0300 INFO  (jsonrpc/1) [vdsm.api] START getSpmStatus(spUUID='940fe6f3-b0c6-4d0c-a921-198e7819c1cc', options=None) from=::ffff:10.35.161.127,53512, task_id=c70ace39-dbfe-4f5c-ae49-a1e3a82c2758 (api:46)
>> > 2018-05-05 17:53:27,909+0300 INFO  (vm/e6ce66ce) [root] /usr/libexec/vdsm/hooks/before_device_create/10_allocate_net: rc=2 err=vm net allocation hook: [unexpected error]: Traceback (most recent call last):
>> >   File "/usr/libexec/vdsm/hooks/before_device_create/10_allocate_net", line 105, in <module>
>> >     main()
>> >   File "/usr/libexec/vdsm/hooks/before_device_create/10_allocate_net", line 93, in main
>> >     allocate_random_network(device_xml)
>> >   File "/usr/libexec/vdsm/hooks/before_device_create/10_allocate_net", line 62, in allocate_random_network
>> >     net = _get_random_network()
>> >   File "/usr/libexec/vdsm/hooks/before_device_create/10_allocate_net", line 50, in _get_random_network
>> >     available_nets = _parse_nets()
>> >   File "/usr/libexec/vdsm/hooks/before_device_create/10_allocate_net", line 46, in _parse_nets
>> >     return [net for net in os.environ[AVAIL_NETS_KEY].split()]
>> >   File "/usr/lib64/python2.7/UserDict.py", line 23, in __getitem__
>> >     raise KeyError(key)
>> > KeyError: 'equivnets'
>> > (hooks:110)
>> > 2018-05-05 17:53:27,915+0300 ERROR (vm/e6ce66ce) [virt.vm] (vmId='e6ce66ce-852f-48c5-9997-5d2959432a27') The vm start process failed (vm:943)
>> > Traceback (most recent call last):
>> >   File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 872, in _startUnderlyingVm
>> >     self._run()
>> >   File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 2861, in _run
>> >     domxml = hooks.before_vm_start(self._buildDomainXML(),
>> >   File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 2254, in _buildDomainXML
>> >     dom, self.id, self._custom['custom'])
>> >   File "/usr/lib/python2.7/site-packages/vdsm/virt/domxml_preprocess.py", line 240, in replace_device_xml_with_hooks_xml
>> >     dev_custom)
>> >   File "/usr/lib/python2.7/site-packages/vdsm/common/hooks.py", line 134, in before_device_create
>> >     params=customProperties)
>> >   File "/usr/lib/python2.7/site-packages/vdsm/common/hooks.py", line 120, in _runHooksDir
>> >     raise exception.HookError(err)
>> > HookError: Hook Error: ('vm net allocation hook: [unexpected error]: Traceback (most recent call last):\n  File "/usr/libexec/vdsm/hooks/before_device_create/10_allocate_net", line 105, in <module>\n    main()\n  File "/usr/libexec/vdsm/hooks/before_device_create/10_allocate_net", line 93, in main\n    allocate_random_network(device_xml)\n  File "/usr/libexec/vdsm/hooks/before_device_create/10_allocate_net", line 62, in allocate_random_network\n    net = _get_random_network()\n  File "/usr/libexec/vdsm/hooks/before_device_create/10_allocate_net", line 50, in _get_random_network\n    available_nets = _parse_nets()\n  File "/usr/libexec/vdsm/hooks/before_device_create/10_allocate_net", line 46, in _parse_nets\n    return [net for net in os.environ[AVAIL_NETS_KEY].split()]\n  File "/usr/lib64/python2.7/UserDict.py", line 23, in __getitem__\n    raise KeyError(key)\nKeyError: \'equivnets\'\n\n\n',)
>> >
>> >
>> >
>> > Hence, the success rate was 28%, against 100% when running with the
>> > downstream (d/s) build. If needed, I'll compare against the latest
>> > master, but I think you get the picture with d/s.
>> >
>> > vdsm-4.20.27-3.gitfee7810.el7.centos.x86_64
>> > libvirt-3.9.0-14.el7_5.3.x86_64
>> > qemu-kvm-rhev-2.10.0-21.el7_5.2.x86_64
>> > kernel 3.10.0-862.el7.x86_64
>> > rhel7.5
>> >
>> >
>> > Logs attached
>> >
>> > On Sat, May 5, 2018 at 1:26 PM, Elad Ben Aharon <ebenahar@redhat.com>
>> > wrote:
>> >>
>> >> nvm, found gluster 3.12 repo, managed to install vdsm
>> >>
>> >> On Sat, May 5, 2018 at 1:12 PM, Elad Ben Aharon <ebenahar@redhat.com>
>> >> wrote:
>> >>>
>> >>> No, vdsm requires it:
>> >>>
>> >>> Error: Package: vdsm-4.20.27-3.gitfee7810.el7.centos.x86_64
>> >>> (/vdsm-4.20.27-3.gitfee7810.el7.centos.x86_64)
>> >>>           Requires: glusterfs-fuse >= 3.12
>> >>>           Installed: glusterfs-fuse-3.8.4-54.8.el7.x86_64 (@rhv-4.2.3)
>> >>>
>> >>> Therefore, vdsm package installation is skipped upon force install.
>> >>>
>> >>> On Sat, May 5, 2018 at 11:42 AM, Michal Skrivanek
>> >>> <michal.skrivanek@redhat.com> wrote:
>> >>>>
>> >>>>
>> >>>>
>> >>>> On 5 May 2018, at 00:38, Elad Ben Aharon <ebenahar@redhat.com> wrote:
>> >>>>
>> >>>> Hi guys,
>> >>>>
>> >>>> The vdsm build from the patch requires glusterfs-fuse >= 3.12, while
>> >>>> the latest 4.2.3-5 d/s build requires only 3.8.4 (3.4.0.59rhs-1.el7)
>> >>>>
>> >>>>
>> >>>> because it is still oVirt, not a downstream build. We can’t really do
>> >>>> downstream builds with unmerged changes:/
>> >>>>
>> >>>> Trying to get this gluster-fuse build, so far no luck.
>> >>>> Is this requirement intentional?
>> >>>>
>> >>>>
>> >>>> it should work regardless, I guess you can force install it without
>> >>>> the
>> >>>> dependency
>> >>>>
>> >>>>
>> >>>> On Fri, May 4, 2018 at 2:38 PM, Michal Skrivanek
>> >>>> <michal.skrivanek@redhat.com> wrote:
>> >>>>>
>> >>>>> Hi Elad,
>> >>>>> to make it easier to compare, Martin backported the change to 4.2, so
>> >>>>> it is actually comparable with a run without that patch. Would you
>> >>>>> please try that out?
>> >>>>> It would be best to have 4.2 upstream and this[1] run to really
>> >>>>> minimize the noise.
>> >>>>>
>> >>>>> Thanks,
>> >>>>> michal
>> >>>>>
>> >>>>> [1]
>> >>>>>
>> >>>>> http://jenkins.ovirt.org/job/vdsm_4.2_build-artifacts-on-demand-el7-x86_64/28/
>> >>>>>
>> >>>>> On 27 Apr 2018, at 09:23, Martin Polednik <mpolednik@redhat.com>
>> >>>>> wrote:
>> >>>>>
>> >>>>> On 24/04/18 00:37 +0300, Elad Ben Aharon wrote:
>> >>>>>
>> >>>>> I will update with the results of the next tier1 execution on latest
>> >>>>> 4.2.3
>> >>>>>
>> >>>>>
>> >>>>> That isn't master but an old branch, though. Could you run it against
>> >>>>> *current* VDSM master?
>> >>>>>
>> >>>>> On Mon, Apr 23, 2018 at 3:56 PM, Martin Polednik
>> >>>>> <mpolednik@redhat.com>
>> >>>>> wrote:
>> >>>>>
>> >>>>> On 23/04/18 01:23 +0300, Elad Ben Aharon wrote:
>> >>>>>
>> >>>>> Hi, I've triggered another execution [1] due to some issues I saw in
>> >>>>> the first, which are not related to the patch.
>> >>>>>
>> >>>>> The success rate is 78%, which is low compared to tier1 executions
>> >>>>> with code from downstream builds (95-100% success rates) [2].
>> >>>>>
>> >>>>>
>> >>>>> Could you run the current master (without the dynamic_ownership
>> >>>>> patch) so that we have a viable comparison?
>> >>>>>
>> >>>>> From what I could see so far, there is an issue with move and copy
>> >>>>> operations to and from Gluster domains. For example [3].
>> >>>>>
>> >>>>> The logs are attached.
>> >>>>>
>> >>>>>
>> >>>>> [1]
>> >>>>> https://rhv-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/rhv-4.2-ge-runner-tier1-after-upgrade/7/testReport/
>> >>>>>
>> >>>>> [2]
>> >>>>> https://rhv-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/rhv-4.2-ge-runner-tier1-after-upgrade/7/
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>> [3]
>> >>>>> 2018-04-22 13:06:28,316+0300 INFO  (jsonrpc/7) [vdsm.api] FINISH deleteImage error=Image does not exist in domain: 'image=cabb8846-7a4b-4244-9835-5f603e682f33, domain=e5fd29c8-52ba-467e-be09-ca40ff054dd4' from=::ffff:10.35.161.182,40936, flow_id=disks_syncAction_ba6b2630-5976-4935, task_id=3d5f2a8a-881c-409e-93e9-aaa643c10e42 (api:51)
>> >>>>> 2018-04-22 13:06:28,317+0300 ERROR (jsonrpc/7) [storage.TaskManager.Task] (Task='3d5f2a8a-881c-409e-93e9-aaa643c10e42') Unexpected error (task:875)
>> >>>>> Traceback (most recent call last):
>> >>>>>   File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 882, in _run
>> >>>>>     return fn(*args, **kargs)
>> >>>>>   File "<string>", line 2, in deleteImage
>> >>>>>   File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 49, in method
>> >>>>>     ret = func(*args, **kwargs)
>> >>>>>   File "/usr/lib/python2.7/site-packages/vdsm/storage/hsm.py", line 1503, in deleteImage
>> >>>>>     raise se.ImageDoesNotExistInSD(imgUUID, sdUUID)
>> >>>>> ImageDoesNotExistInSD: Image does not exist in domain: 'image=cabb8846-7a4b-4244-9835-5f603e682f33, domain=e5fd29c8-52ba-467e-be09-ca40ff054dd4'
>> >>>>>
>> >>>>> 2018-04-22 13:06:28,317+0300 INFO  (jsonrpc/7) [storage.TaskManager.Task] (Task='3d5f2a8a-881c-409e-93e9-aaa643c10e42') aborting: Task is aborted: "Image does not exist in domain: 'image=cabb8846-7a4b-4244-9835-5f603e682f33, domain=e5fd29c8-52ba-467e-be09-ca40ff054dd4'" - code 268 (task:1181)
>> >>>>> 2018-04-22 13:06:28,318+0300 ERROR (jsonrpc/7) [storage.Dispatcher] FINISH deleteImage error=Image does not exist in domain: 'image=cabb8846-7a4b-4244-9835-5f603e682f33, domain=e5fd29c8-52ba-467e-be09-ca40ff054dd4' (dispatcher:82)
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>> On Thu, Apr 19, 2018 at 5:34 PM, Elad Ben Aharon
>> >>>>> <ebenahar@redhat.com>
>> >>>>> wrote:
>> >>>>>
>> >>>>> Triggered a sanity tier1 execution [1] using [2], which covers all
>> >>>>> the requested areas, on iSCSI, NFS and Gluster.
>> >>>>> I'll update with the results.
>> >>>>>
>> >>>>> [1]
>> >>>>> https://rhv-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/view/4.2_dev/job/rhv-4.2-ge-flow-storage/1161/
>> >>>>>
>> >>>>> [2]
>> >>>>> https://gerrit.ovirt.org/#/c/89830/
>> >>>>> vdsm-4.30.0-291.git77aef9a.el7.x86_64
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>> On Thu, Apr 19, 2018 at 3:07 PM, Martin Polednik
>> >>>>> <mpolednik@redhat.com>
>> >>>>> wrote:
>> >>>>>
>> >>>>> On 19/04/18 14:54 +0300, Elad Ben Aharon wrote:
>> >>>>>
>> >>>>>
>> >>>>> Hi Martin,
>> >>>>>
>> >>>>>
>> >>>>> I see [1] requires a rebase, can you please take care?
>> >>>>>
>> >>>>>
>> >>>>> Should be rebased.
>> >>>>>
>> >>>>> At the moment, our automation is stable only on iSCSI, NFS, Gluster
>> >>>>> and FC.
>> >>>>> Ceph is not supported, and Cinder will be stabilized soon; AFAIR it's
>> >>>>> not stable enough at the moment.
>> >>>>>
>> >>>>>
>> >>>>> That is still pretty good.
>> >>>>>
>> >>>>>
>> >>>>> [1] https://gerrit.ovirt.org/#/c/89830/
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>> Thanks
>> >>>>>
>> >>>>> On Wed, Apr 18, 2018 at 2:17 PM, Martin Polednik
>> >>>>> <mpolednik@redhat.com
>> >>>>> >
>> >>>>> wrote:
>> >>>>>
>> >>>>> On 18/04/18 11:37 +0300, Elad Ben Aharon wrote:
>> >>>>>
>> >>>>>
>> >>>>> Hi, sorry if I misunderstood; I waited for more input regarding what
>> >>>>> areas have to be tested here.
>> >>>>>
>> >>>>>
>> >>>>> I'd say that you have quite a bit of freedom in this regard. GlusterFS
>> >>>>> should be covered by Dennis, so iSCSI/NFS/ceph/cinder with some
>> >>>>> suite
>> >>>>> that covers basic operations (start & stop VM, migrate it),
>> >>>>> snapshots
>> >>>>> and merging them, and whatever else would be important for storage
>> >>>>> sanity.
>> >>>>>
>> >>>>> mpolednik
>> >>>>>
>> >>>>>
>> >>>>> On Wed, Apr 18, 2018 at 11:16 AM, Martin Polednik <
>> >>>>> mpolednik@redhat.com
>> >>>>> >
>> >>>>>
>> >>>>> wrote:
>> >>>>>
>> >>>>>
>> >>>>> On 11/04/18 16:52 +0300, Elad Ben Aharon wrote:
>> >>>>>
>> >>>>>
>> >>>>> We can test this on iSCSI, NFS and GlusterFS. As for ceph and cinder,
>> >>>>> we will have to check, since usually we don't execute our automation
>> >>>>> on them.
>> >>>>>
>> >>>>>
>> >>>>> Any update on this? I believe the gluster tests were successful, OST
>> >>>>> passes fine and unit tests pass fine; that makes the storage backends
>> >>>>> test the last required piece.
>> >>>>>
>> >>>>>
>> >>>>> On Wed, Apr 11, 2018 at 4:38 PM, Raz Tamir <ratamir@redhat.com>
>> >>>>> wrote:
>> >>>>>
>> >>>>>
>> >>>>> +Elad
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>> On Wed, Apr 11, 2018 at 4:28 PM, Dan Kenigsberg <danken@redhat.com
>> >>>>>
>> >>>>> >
>> >>>>> wrote:
>> >>>>>
>> >>>>> On Wed, Apr 11, 2018 at 12:34 PM, Nir Soffer <nsoffer@redhat.com>
>> >>>>> wrote:
>> >>>>>
>> >>>>>
>> >>>>> On Wed, Apr 11, 2018 at 12:31 PM Eyal Edri <eedri@redhat.com>
>> >>>>>
>> >>>>> wrote:
>> >>>>>
>> >>>>>
>> >>>>> Please make sure to run as many OST suites on this patch as possible
>> >>>>> before merging (using 'ci please build').
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>> But note that OST is not a way to verify the patch.
>> >>>>>
>> >>>>>
>> >>>>> Such changes require testing with all storage types we support.
>> >>>>>
>> >>>>> Nir
>> >>>>>
>> >>>>> On Tue, Apr 10, 2018 at 4:09 PM, Martin Polednik <
>> >>>>> mpolednik@redhat.com
>> >>>>> >
>> >>>>>
>> >>>>> wrote:
>> >>>>>
>> >>>>>
>> >>>>> Hey,
>> >>>>>
>> >>>>>
>> >>>>> I've created a patch[0] that is finally able to activate libvirt's
>> >>>>> dynamic_ownership for VDSM while not negatively affecting
>> >>>>> functionality of our storage code.
>> >>>>>
>> >>>>> That of course comes with quite a bit of code removal, mostly in the
>> >>>>> area of host devices, hwrng and anything that touches devices; a
>> >>>>> bunch of test changes and one XML generation caveat (storage is
>> >>>>> handled by VDSM, therefore disk relabelling needs to be disabled on
>> >>>>> the VDSM level).
>> >>>>>
>> >>>>> Because of the scope of the patch, I welcome storage/virt/network
>> >>>>> people to review the code and consider the implications this change
>> >>>>> has on current/future features.
>> >>>>>
>> >>>>> [0] https://gerrit.ovirt.org/#/c/89830/
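The XML caveat described above amounts to telling libvirt to leave VDSM-managed disk sources alone even with dynamic_ownership on. Libvirt supports this via a per-device `<seclabel>` override; the sketch below illustrates the idea under assumed names (`DISK_XML` and `disable_relabel` are hypothetical, not the actual patch code):

```python
import xml.etree.ElementTree as ET

# Hypothetical disk XML, shaped like the <disk> elements vdsm generates.
DISK_XML = """\
<disk type='file' device='disk'>
  <source file='/rhev/data-center/mnt/example/disk.img'/>
  <target dev='vda' bus='virtio'/>
</disk>
"""


def disable_relabel(disk_xml):
    """Opt a disk source out of libvirt's ownership handling.

    With dynamic_ownership enabled, libvirt chowns/relabels device
    paths itself; for storage that vdsm manages, ownership must stay
    with vdsm, so the generated XML marks the source with
    <seclabel model='dac' relabel='no'/>.
    """
    disk = ET.fromstring(disk_xml)
    source = disk.find('source')
    seclabel = ET.SubElement(source, 'seclabel')
    seclabel.set('model', 'dac')
    seclabel.set('relabel', 'no')
    return ET.tostring(disk, encoding='unicode')
```

Host devices, hwrng and the like then need no such override, which is where the code removal mentioned above comes from: libvirt takes over their ownership handling entirely.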
>> >>>>>
>> >>>>>
>> >>>>> In particular: dynamic_ownership was set to 0 prehistorically (as
>> >>>>> part of https://bugzilla.redhat.com/show_bug.cgi?id=554961) because
>> >>>>> libvirt, running as root, was not able to play properly with
>> >>>>> root-squash nfs mounts.
>> >>>>>
>> >>>>> Have you attempted this use case?
>> >>>>>
>> >>>>> I join to Nir's request to run this with storage QE.
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>> --
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>> Raz Tamir
>> >>>>> Manager, RHV QE
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>> _______________________________________________
>> >>>>> Devel mailing list
>> >>>>> Devel@ovirt.org
>> >>>>> http://lists.ovirt.org/mailman/listinfo/devel
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>
>> >>>>
>> >>>
>> >>
>> >
>> > <logs.tar.gz>
>> >
>> >
>
>