[ovirt-devel] dynamic ownership changes
Michal Skrivanek
michal.skrivanek at redhat.com
Mon May 7 12:53:14 UTC 2018
Hi Elad,
why did you install vdsm-hook-allocate_net?
adding Dan as I think the hook is not supposed to fail this badly in any case
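For illustration only, here is a minimal sketch of how the lookup that fails in the traceback below could tolerate a missing custom property. I have not checked the actual hook source: AVAIL_NETS_KEY and its value 'equivnets' are taken from the traceback, while parse_nets() and should_allocate() are hypothetical names.

```python
import os

# Custom property name, taken from the KeyError in the quoted traceback.
AVAIL_NETS_KEY = 'equivnets'


def parse_nets(environ=None):
    """Return candidate network names, or [] when the property is unset."""
    if environ is None:
        environ = os.environ
    # .get() avoids the KeyError raised by os.environ[AVAIL_NETS_KEY]
    # when the VM has no such custom property defined.
    return environ.get(AVAIL_NETS_KEY, '').split()


def should_allocate(environ=None):
    """Hypothetical guard: only touch the device XML if networks were given."""
    return bool(parse_nets(environ))
```

With a guard like this, a hook could return early and leave the device XML untouched for VMs that do not define the property, instead of aborting the VM start.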
Thanks,
michal
> On 5 May 2018, at 19:22, Elad Ben Aharon <ebenahar at redhat.com> wrote:
>
> Start VM fails on:
>
> 2018-05-05 17:53:27,399+0300 INFO (vm/e6ce66ce) [virt.vm] (vmId='e6ce66ce-852f-48c5-9997-5d2959432a27') drive 'vda' path: 'dev=/rhev/data-center/mnt/blockSD/db5a6696-d907-4938-9a78-bdd13a843c62/images/6cdabfe5-d1ca-40af-ae63-9834f235d1c8/7ef97445-30e6-4435-8425-f35a01928211' -> u'*dev=/rhev/data-center/mnt/blockSD/db5a6696-d907-4938-9a78-bdd13a843c62/images/6cdabfe5-d1ca-40af-ae63-9834f235d1c8/7ef97445-30e6-4435-8425-f35a01928211' (storagexml:334)
> 2018-05-05 17:53:27,888+0300 INFO (jsonrpc/1) [vdsm.api] START getSpmStatus(spUUID='940fe6f3-b0c6-4d0c-a921-198e7819c1cc', options=None) from=::ffff:10.35.161.127,53512, task_id=c70ace39-dbfe-4f5c-ae49-a1e3a82c2758 (api:46)
> 2018-05-05 17:53:27,909+0300 INFO (vm/e6ce66ce) [root] /usr/libexec/vdsm/hooks/before_device_create/10_allocate_net: rc=2 err=vm net allocation hook: [unexpected error]: Traceback (most recent call last):
> File "/usr/libexec/vdsm/hooks/before_device_create/10_allocate_net", line 105, in <module>
> main()
> File "/usr/libexec/vdsm/hooks/before_device_create/10_allocate_net", line 93, in main
> allocate_random_network(device_xml)
> File "/usr/libexec/vdsm/hooks/before_device_create/10_allocate_net", line 62, in allocate_random_network
> net = _get_random_network()
> File "/usr/libexec/vdsm/hooks/before_device_create/10_allocate_net", line 50, in _get_random_network
> available_nets = _parse_nets()
> File "/usr/libexec/vdsm/hooks/before_device_create/10_allocate_net", line 46, in _parse_nets
> return [net for net in os.environ[AVAIL_NETS_KEY].split()]
> File "/usr/lib64/python2.7/UserDict.py", line 23, in __getitem__
> raise KeyError(key)
> KeyError: 'equivnets'
>
>
> (hooks:110)
> 2018-05-05 17:53:27,915+0300 ERROR (vm/e6ce66ce) [virt.vm] (vmId='e6ce66ce-852f-48c5-9997-5d2959432a27') The vm start process failed (vm:943)
> Traceback (most recent call last):
> File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 872, in _startUnderlyingVm
> self._run()
> File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 2861, in _run
> domxml = hooks.before_vm_start(self._buildDomainXML(),
> File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 2254, in _buildDomainXML
> dom, self.id, self._custom['custom'])
> File "/usr/lib/python2.7/site-packages/vdsm/virt/domxml_preprocess.py", line 240, in replace_device_xml_with_hooks_xml
> dev_custom)
> File "/usr/lib/python2.7/site-packages/vdsm/common/hooks.py", line 134, in before_device_create
> params=customProperties)
> File "/usr/lib/python2.7/site-packages/vdsm/common/hooks.py", line 120, in _runHooksDir
> raise exception.HookError(err)
> HookError: Hook Error: ('vm net allocation hook: [unexpected error]: Traceback (most recent call last):\n File "/usr/libexec/vdsm/hooks/before_device_create/10_allocate_net", line 105, in <module>\n main()\n File "/usr/libexec/vdsm/hooks/before_device_create/10_allocate_net", line 93, in main\n allocate_random_network(device_xml)\n File "/usr/libexec/vdsm/hooks/before_device_create/10_allocate_net", line 62, in allocate_random_network\n net = _get_random_network()\n File "/usr/libexec/vdsm/hooks/before_device_create/10_allocate_net", line 50, in _get_random_network\n available_nets = _parse_nets()\n File "/usr/libexec/vdsm/hooks/before_device_create/10_allocate_net", line 46, in _parse_nets\n return [net for net in os.environ[AVAIL_NETS_KEY].split()]\n File "/usr/lib64/python2.7/UserDict.py", line 23, in __getitem__\n raise KeyError(key)\nKeyError: \'equivnets\'\n\n\n',)
>
>
>
> Hence, the success rate was 28%, against 100% when running with the d/s build. If needed, I'll compare against the latest master, but I think you get the picture with d/s.
>
> vdsm-4.20.27-3.gitfee7810.el7.centos.x86_64
> libvirt-3.9.0-14.el7_5.3.x86_64
> qemu-kvm-rhev-2.10.0-21.el7_5.2.x86_64
> kernel 3.10.0-862.el7.x86_64
> rhel7.5
>
>
> Logs attached
>
> On Sat, May 5, 2018 at 1:26 PM, Elad Ben Aharon <ebenahar at redhat.com> wrote:
> nvm, found gluster 3.12 repo, managed to install vdsm
>
> On Sat, May 5, 2018 at 1:12 PM, Elad Ben Aharon <ebenahar at redhat.com> wrote:
> No, vdsm requires it:
>
> Error: Package: vdsm-4.20.27-3.gitfee7810.el7.centos.x86_64 (/vdsm-4.20.27-3.gitfee7810.el7.centos.x86_64)
> Requires: glusterfs-fuse >= 3.12
> Installed: glusterfs-fuse-3.8.4-54.8.el7.x86_64 (@rhv-4.2.3)
>
> Therefore, vdsm package installation is skipped upon force install.
>
> On Sat, May 5, 2018 at 11:42 AM, Michal Skrivanek <michal.skrivanek at redhat.com> wrote:
>
>
>> On 5 May 2018, at 00:38, Elad Ben Aharon <ebenahar at redhat.com> wrote:
>>
>> Hi guys,
>>
>> The vdsm build from the patch requires glusterfs-fuse > 3.12. This is while the latest 4.2.3-5 d/s build requires 3.8.4 (3.4.0.59rhs-1.el7)
>
> because it is still oVirt, not a downstream build. We can’t really do downstream builds with unmerged changes:/
>
>> Trying to get this gluster-fuse build, so far no luck.
>> Is this requirement intentional?
>
> it should work regardless, I guess you can force install it without the dependency
>
>>
>> On Fri, May 4, 2018 at 2:38 PM, Michal Skrivanek <michal.skrivanek at redhat.com> wrote:
>> Hi Elad,
>> to make it easier to compare, Martin backported the change to 4.2 so it is actually comparable with a run without that patch. Would you please try that out?
>> It would be best to have 4.2 upstream and this[1] run to really minimize the noise.
>>
>> Thanks,
>> michal
>>
>> [1] http://jenkins.ovirt.org/job/vdsm_4.2_build-artifacts-on-demand-el7-x86_64/28/
>>
>>> On 27 Apr 2018, at 09:23, Martin Polednik <mpolednik at redhat.com> wrote:
>>>
>>> On 24/04/18 00:37 +0300, Elad Ben Aharon wrote:
>>>> I will update with the results of the next tier1 execution on latest 4.2.3
>>>
>>> That isn't master but old branch though. Could you run it against
>>> *current* VDSM master?
>>>
>>>> On Mon, Apr 23, 2018 at 3:56 PM, Martin Polednik <mpolednik at redhat.com> wrote:
>>>>
>>>>> On 23/04/18 01:23 +0300, Elad Ben Aharon wrote:
>>>>>
>>>>>> Hi, I've triggered another execution [1] due to some issues I saw in the
>>>>>> first which are not related to the patch.
>>>>>>
>>>>>> The success rate is 78%, which is low compared to tier1 executions with
>>>>>> code from downstream builds (95-100% success rates) [2].
>>>>>>
>>>>>
>>>>> Could you run the current master (without the dynamic_ownership patch)
>>>>> so that we have a viable comparison?
>>>>>
>>>>> From what I could see so far, there is an issue with move and copy
>>>>>> operations to and from Gluster domains. For example [3].
>>>>>>
>>>>>> The logs are attached.
>>>>>>
>>>>>>
>>>>>> [1]
>>>>>> https://rhv-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/rhv-4.2-ge-runner-tier1-after-upgrade/7/testReport/
>>>>>>
>>>>>>
>>>>>>
>>>>>> [2]
>>>>>> https://rhv-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/rhv-4.2-ge-runner-tier1-after-upgrade/7/
>>>>>>
>>>>>>
>>>>>>
>>>>>> [3]
>>>>>> 2018-04-22 13:06:28,316+0300 INFO (jsonrpc/7) [vdsm.api] FINISH deleteImage error=Image does not exist in domain: 'image=cabb8846-7a4b-4244-9835-5f603e682f33, domain=e5fd29c8-52ba-467e-be09-ca40ff054dd4' from=::ffff:10.35.161.182,40936, flow_id=disks_syncAction_ba6b2630-5976-4935, task_id=3d5f2a8a-881c-409e-93e9-aaa643c10e42 (api:51)
>>>>>> 2018-04-22 13:06:28,317+0300 ERROR (jsonrpc/7) [storage.TaskManager.Task]
>>>>>> (Task='3d5f2a8a-881c-409e-93e9-aaa643c10e42') Unexpected error (task:875)
>>>>>> Traceback (most recent call last):
>>>>>> File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 882,
>>>>>> in
>>>>>> _run
>>>>>> return fn(*args, **kargs)
>>>>>> File "<string>", line 2, in deleteImage
>>>>>> File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 49, in
>>>>>> method
>>>>>> ret = func(*args, **kwargs)
>>>>>> File "/usr/lib/python2.7/site-packages/vdsm/storage/hsm.py", line 1503,
>>>>>> in
>>>>>> deleteImage
>>>>>> raise se.ImageDoesNotExistInSD(imgUUID, sdUUID)
>>>>>> ImageDoesNotExistInSD: Image does not exist in domain:
>>>>>> 'image=cabb8846-7a4b-4244-9835-5f603e682f33,
>>>>>> domain=e5fd29c8-52ba-467e-be09-ca40ff054dd4'
>>>>>>
>>>>>> 2018-04-22 13:06:28,317+0300 INFO (jsonrpc/7) [storage.TaskManager.Task] (Task='3d5f2a8a-881c-409e-93e9-aaa643c10e42') aborting: Task is aborted: "Image does not exist in domain: 'image=cabb8846-7a4b-4244-9835-5f603e682f33, domain=e5fd29c8-52ba-467e-be09-ca40ff054dd4'" - code 268 (task:1181)
>>>>>> 2018-04-22 13:06:28,318+0300 ERROR (jsonrpc/7) [storage.Dispatcher] FINISH deleteImage error=Image does not exist in domain: 'image=cabb8846-7a4b-4244-9835-5f603e682f33, domain=e5fd29c8-52ba-467e-be09-ca40ff054dd4' (dispatcher:82)
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Thu, Apr 19, 2018 at 5:34 PM, Elad Ben Aharon <ebenahar at redhat.com> wrote:
>>>>>>
>>>>>> Triggered a sanity tier1 execution [1] using [2], which covers all the
>>>>>>> requested areas, on iSCSI, NFS and Gluster.
>>>>>>> I'll update with the results.
>>>>>>>
>>>>>>> [1]
>>>>>>> https://rhv-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/view/4.2_dev/job/rhv-4.2-ge-flow-storage/1161/
>>>>>>>
>>>>>>> [2]
>>>>>>> https://gerrit.ovirt.org/#/c/89830/
>>>>>>> vdsm-4.30.0-291.git77aef9a.el7.x86_64
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Apr 19, 2018 at 3:07 PM, Martin Polednik <mpolednik at redhat.com> wrote:
>>>>>>>
>>>>>>> On 19/04/18 14:54 +0300, Elad Ben Aharon wrote:
>>>>>>>>
>>>>>>>> Hi Martin,
>>>>>>>>>
>>>>>>>>> I see [1] requires a rebase, can you please take care?
>>>>>>>>>
>>>>>>>>>
>>>>>>>> Should be rebased.
>>>>>>>>
>>>>>>>> At the moment, our automation is stable only on iSCSI, NFS, Gluster and
>>>>>>>>
>>>>>>>>> FC.
>>>>>>>>> Ceph is not supported and Cinder will be stabilized soon, AFAIR, it's
>>>>>>>>> not
>>>>>>>>> stable enough at the moment.
>>>>>>>>>
>>>>>>>>>
>>>>>>>> That is still pretty good.
>>>>>>>>
>>>>>>>>
>>>>>>>> [1] https://gerrit.ovirt.org/#/c/89830/
>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thanks
>>>>>>>>>
>>>>>>>>> On Wed, Apr 18, 2018 at 2:17 PM, Martin Polednik <mpolednik at redhat.com> wrote:
>>>>>>>>>
>>>>>>>>> On 18/04/18 11:37 +0300, Elad Ben Aharon wrote:
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Hi, sorry if I misunderstood, I waited for more input regarding what
>>>>>>>>>>
>>>>>>>>>>> areas
>>>>>>>>>>> have to be tested here.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> I'd say that you have quite a bit of freedom in this regard.
>>>>>>>>>> GlusterFS
>>>>>>>>>> should be covered by Dennis, so iSCSI/NFS/ceph/cinder with some suite
>>>>>>>>>> that covers basic operations (start & stop VM, migrate it), snapshots
>>>>>>>>>> and merging them, and whatever else would be important for storage
>>>>>>>>>> sanity.
>>>>>>>>>>
>>>>>>>>>> mpolednik
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Wed, Apr 18, 2018 at 11:16 AM, Martin Polednik <mpolednik at redhat.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>> On 11/04/18 16:52 +0300, Elad Ben Aharon wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> We can test this on iSCSI, NFS and GlusterFS. As for ceph and
>>>>>>>>>>>> cinder,
>>>>>>>>>>>>
>>>>>>>>>>>> will
>>>>>>>>>>>>> have to check, since usually, we don't execute our automation on
>>>>>>>>>>>>> them.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Any update on this? I believe the gluster tests were successful,
>>>>>>>>>>>>> OST
>>>>>>>>>>>>>
>>>>>>>>>>>> passes fine and unit tests pass fine, that makes the storage
>>>>>>>>>>>> backends
>>>>>>>>>>>> test the last required piece.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, Apr 11, 2018 at 4:38 PM, Raz Tamir <ratamir at redhat.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> +Elad
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Wed, Apr 11, 2018 at 4:28 PM, Dan Kenigsberg <danken at redhat.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Wed, Apr 11, 2018 at 12:34 PM, Nir Soffer <nsoffer at redhat.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Wed, Apr 11, 2018 at 12:31 PM Eyal Edri <eedri at redhat.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Please make sure to run as many OST suites on this patch as
>>>>>>>>>>>>>>>> possible
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> before merging ( using 'ci please build' )
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> But note that OST is not a way to verify the patch.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Such changes require testing with all storage types we support.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Nir
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Tue, Apr 10, 2018 at 4:09 PM, Martin Polednik <mpolednik at redhat.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hey,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I've created a patch[0] that is finally able to activate
>>>>>>>>>>>>>>>>>> libvirt's
>>>>>>>>>>>>>>>>>> dynamic_ownership for VDSM while not negatively affecting
>>>>>>>>>>>>>>>>>> functionality of our storage code.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> That of course comes with quite a bit of code removal, mostly
>>>>>>>>>>>>>>>>>> in
>>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>> area of host devices, hwrng and anything that touches devices;
>>>>>>>>>>>>>>>>>> bunch
>>>>>>>>>>>>>>>>>> of test changes and one XML generation caveat (storage is
>>>>>>>>>>>>>>>>>> handled
>>>>>>>>>>>>>>>>>> by
>>>>>>>>>>>>>>>>>> VDSM, therefore disk relabelling needs to be disabled on the
>>>>>>>>>>>>>>>>>> VDSM
>>>>>>>>>>>>>>>>>> level).
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Because of the scope of the patch, I welcome
>>>>>>>>>>>>>>>>>> storage/virt/network
>>>>>>>>>>>>>>>>>> people to review the code and consider the implication this
>>>>>>>>>>>>>>>>>> change
>>>>>>>>>>>>>>>>>> has
>>>>>>>>>>>>>>>>>> on current/future features.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> [0] https://gerrit.ovirt.org/#/c/89830/
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> In particular: dynamic_ownership was set to 0 prehistorically
>>>>>>>>>>>>>>>>>> (as
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> part
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> of https://bugzilla.redhat.com/show_bug.cgi?id=554961 ) because
>>>>>>>>>>>>>>> libvirt,
>>>>>>>>>>>>>>> running as root, was not able to play properly with root-squash
>>>>>>>>>>>>>>> nfs
>>>>>>>>>>>>>>> mounts.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Have you attempted this use case?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I join Nir's request to run this with storage QE.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Raz Tamir
>>>>>>>>>>>>>> Manager, RHV QE
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>
>>>>>
>>>>>
>>> _______________________________________________
>>> Devel mailing list
>>> Devel at ovirt.org
>>> http://lists.ovirt.org/mailman/listinfo/devel
>>>
>>>
>>
>>
>
>
>
>
> <logs.tar.gz>