[ovirt-devel] dynamic ownership changes

Michal Skrivanek michal.skrivanek at redhat.com
Fri May 4 11:38:50 UTC 2018


Hi Elad,
to make it easier to compare, Martin backported the change to 4.2, so it is directly comparable with a run without that patch. Would you please try that out?
It would be best to have a plain 4.2 upstream run and this [1] run, to really minimize the noise.

Thanks,
michal

[1] http://jenkins.ovirt.org/job/vdsm_4.2_build-artifacts-on-demand-el7-x86_64/28/

> On 27 Apr 2018, at 09:23, Martin Polednik <mpolednik at redhat.com> wrote:
> 
> On 24/04/18 00:37 +0300, Elad Ben Aharon wrote:
>> I will update with the results of the next tier1 execution on latest 4.2.3
> 
> That isn't master but an old branch, though. Could you run it against
> *current* VDSM master?
> 
>> On Mon, Apr 23, 2018 at 3:56 PM, Martin Polednik <mpolednik at redhat.com>
>> wrote:
>> 
>>> On 23/04/18 01:23 +0300, Elad Ben Aharon wrote:
>>> 
>>>> Hi, I've triggered another execution [1] due to some issues I saw in the
>>>> first one, which are not related to the patch.
>>>> 
>>>> The success rate is 78%, which is low compared to tier1 executions with
>>>> code from downstream builds (95-100% success rates) [2].
>>>> 
>>> 
>>> Could you run the current master (without the dynamic_ownership patch)
>>> so that we have a viable comparison?
>>> 
>>>> From what I could see so far, there is an issue with move and copy
>>>> operations to and from Gluster domains. For example [3].
>>>> 
>>>> The logs are attached.
>>>> 
>>>> 
>>>> [1]
>>>> https://rhv-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/rhv-4.2-ge-runner-tier1-after-upgrade/7/testReport/
>>>> 
>>>> 
>>>> 
>>>> [2]
>>>> https://rhv-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/rhv-4.2-ge-runner-tier1-after-upgrade/7/
>>>> 
>>>> 
>>>> 
>>>> [3]
>>>> 2018-04-22 13:06:28,316+0300 INFO  (jsonrpc/7) [vdsm.api] FINISH
>>>> deleteImage error=Image does not exist in domain:
>>>> 'image=cabb8846-7a4b-4244-9835-5f603e682f33,
>>>> domain=e5fd29c8-52ba-467e-be09-ca40ff054dd4'
>>>> from=::ffff:10.35.161.182,40936, flow_id=disks_syncAction_ba6b2630-5976-4935,
>>>> task_id=3d5f2a8a-881c-409e-93e9-aaa643c10e42 (api:51)
>>>> 2018-04-22 13:06:28,317+0300 ERROR (jsonrpc/7) [storage.TaskManager.Task]
>>>> (Task='3d5f2a8a-881c-409e-93e9-aaa643c10e42') Unexpected error (task:875)
>>>> Traceback (most recent call last):
>>>>   File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 882, in _run
>>>>     return fn(*args, **kargs)
>>>>   File "<string>", line 2, in deleteImage
>>>>   File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 49, in method
>>>>     ret = func(*args, **kwargs)
>>>>   File "/usr/lib/python2.7/site-packages/vdsm/storage/hsm.py", line 1503, in deleteImage
>>>>     raise se.ImageDoesNotExistInSD(imgUUID, sdUUID)
>>>> ImageDoesNotExistInSD: Image does not exist in domain:
>>>> 'image=cabb8846-7a4b-4244-9835-5f603e682f33,
>>>> domain=e5fd29c8-52ba-467e-be09-ca40ff054dd4'
>>>> 
>>>> 2018-04-22 13:06:28,317+0300 INFO  (jsonrpc/7) [storage.TaskManager.Task]
>>>> (Task='3d5f2a8a-881c-409e-93e9-aaa643c10e42') aborting: Task is aborted:
>>>> "Image does not exist in domain: 'image=cabb8846-7a4b-4244-9835-5f603e682f33,
>>>> domain=e5fd29c8-52ba-467e-be09-ca40ff054dd4'" - code 268 (task:1181)
>>>> 2018-04-22 13:06:28,318+0300 ERROR (jsonrpc/7) [storage.Dispatcher] FINISH
>>>> deleteImage error=Image does not exist in domain:
>>>> 'image=cabb8846-7a4b-4244-9835-5f603e682f33,
>>>> domain=e5fd29c8-52ba-467e-be09-ca40ff054dd4' (dispatcher:82)
>>>> 
>>>> 
>>>> 
>>>> On Thu, Apr 19, 2018 at 5:34 PM, Elad Ben Aharon <ebenahar at redhat.com>
>>>> wrote:
>>>> 
>>>>> Triggered a sanity tier1 execution [1] using [2], which covers all the
>>>>> requested areas, on iSCSI, NFS and Gluster.
>>>>> I'll update with the results.
>>>>> 
>>>>> [1]
>>>>> https://rhv-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/view/4.2_dev/job/rhv-4.2-ge-flow-storage/1161/
>>>>> 
>>>>> [2]
>>>>> https://gerrit.ovirt.org/#/c/89830/
>>>>> vdsm-4.30.0-291.git77aef9a.el7.x86_64
>>>>> 
>>>>> 
>>>>> 
>>>>> On Thu, Apr 19, 2018 at 3:07 PM, Martin Polednik <mpolednik at redhat.com>
>>>>> wrote:
>>>>> 
>>>>> On 19/04/18 14:54 +0300, Elad Ben Aharon wrote:
>>>>>> 
>>>>>>> Hi Martin,
>>>>>>> 
>>>>>>> I see [1] requires a rebase, can you please take care?
>>>>>> 
>>>>>> Should be rebased.
>>>>>> 
>>>>>>> At the moment, our automation is stable only on iSCSI, NFS, Gluster and
>>>>>>> FC. Ceph is not supported, and Cinder will be stabilized soon; AFAIR,
>>>>>>> it's not stable enough at the moment.
>>>>>> 
>>>>>> That is still pretty good.
>>>>>> 
>>>>>>> [1] https://gerrit.ovirt.org/#/c/89830/
>>>>>>> 
>>>>>>> Thanks
>>>>>>> 
>>>>>>> On Wed, Apr 18, 2018 at 2:17 PM, Martin Polednik <mpolednik at redhat.com>
>>>>>>> wrote:
>>>>>>> 
>>>>>>>> On 18/04/18 11:37 +0300, Elad Ben Aharon wrote:
>>>>>>>> 
>>>>>>>>> Hi, sorry if I misunderstood, I waited for more input regarding what
>>>>>>>>> areas have to be tested here.
>>>>>>>> 
>>>>>>>> I'd say that you have quite a bit of freedom in this regard. GlusterFS
>>>>>>>> should be covered by Dennis, so iSCSI/NFS/ceph/cinder with some suite
>>>>>>>> that covers basic operations (start & stop VM, migrate it), snapshots
>>>>>>>> and merging them, and whatever else would be important for storage
>>>>>>>> sanity.
>>>>>>>> 
>>>>>>>> mpolednik
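
(Context: a minimal sketch of the "basic operations" pass described above,
using the Python ovirtsdk4 bindings. The engine URL, credentials and VM name
are placeholder assumptions, not details from this thread, and the
snapshot/merge steps are left out:)

import time

import ovirtsdk4 as sdk
import ovirtsdk4.types as types

# Connect to the engine API; all connection details are placeholders.
connection = sdk.Connection(
    url='https://engine.example.com/ovirt-engine/api',
    username='admin@internal',
    password='secret',
    insecure=True,
)
try:
    # Look up an existing VM by name (assumed to exist) and drive it
    # through start -> migrate -> stop.
    vms_service = connection.system_service().vms_service()
    vm = vms_service.list(search='name=storage-sanity-vm')[0]
    vm_service = vms_service.vm_service(vm.id)

    vm_service.start()
    while vm_service.get().status != types.VmStatus.UP:
        time.sleep(5)

    vm_service.migrate()  # the engine picks the destination host
    vm_service.stop()
finally:
    connection.close()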
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> On Wed, Apr 18, 2018 at 11:16 AM, Martin Polednik <mpolednik at redhat.com>
>>>>>>>>> wrote:
>>>>>>>>> 
>>>>>>>>>> On 11/04/18 16:52 +0300, Elad Ben Aharon wrote:
>>>>>>>>>> 
>>>>>>>>>>> We can test this on iSCSI, NFS and GlusterFS. As for ceph and cinder,
>>>>>>>>>>> will have to check, since usually, we don't execute our automation on
>>>>>>>>>>> them.
>>>>>>>>>> 
>>>>>>>>>> Any update on this? I believe the gluster tests were successful, OST
>>>>>>>>>> passes fine and unit tests pass fine, which makes the storage backend
>>>>>>>>>> tests the last required piece.
>>>>>>>>>> 
>>>>>>>>>>> On Wed, Apr 11, 2018 at 4:38 PM, Raz Tamir <ratamir at redhat.com>
>>>>>>>>>>> wrote:
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>>>> +Elad
>>>>>>>>>>>> 
>>>>>>>>>>>> On Wed, Apr 11, 2018 at 4:28 PM, Dan Kenigsberg <danken at redhat.com>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>>> On Wed, Apr 11, 2018 at 12:34 PM, Nir Soffer <nsoffer at redhat.com>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On Wed, Apr 11, 2018 at 12:31 PM Eyal Edri <eedri at redhat.com>
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Please make sure to run as many OST suites on this patch as
>>>>>>>>>>>>>>> possible before merging ( using 'ci please build' )
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> But note that OST is not a way to verify the patch.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Such changes require testing with all storage types we support.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Nir
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> On Tue, Apr 10, 2018 at 4:09 PM, Martin Polednik <mpolednik at redhat.com>
>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Hey,
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> I've created a patch[0] that is finally able to activate libvirt's
>>>>>>>>>>>>>>>> dynamic_ownership for VDSM while not negatively affecting
>>>>>>>>>>>>>>>> functionality of our storage code.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> That of course comes with quite a bit of code removal, mostly in the
>>>>>>>>>>>>>>>> area of host devices, hwrng and anything that touches devices; a bunch
>>>>>>>>>>>>>>>> of test changes and one XML generation caveat (storage is handled by
>>>>>>>>>>>>>>>> VDSM, therefore disk relabelling needs to be disabled on the VDSM
>>>>>>>>>>>>>>>> level).
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Because of the scope of the patch, I welcome storage/virt/network
>>>>>>>>>>>>>>>> people to review the code and consider the implications this change
>>>>>>>>>>>>>>>> has on current/future features.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> [0] https://gerrit.ovirt.org/#/c/89830/
>>>>>>>>>>>>>>>> 
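
(The XML generation caveat above can be pictured with a small, hypothetical
Python sketch; the image path is a placeholder and the exact XML the patch
emits may differ. With dynamic_ownership enabled, libvirt's DAC driver would
normally chown disk images on VM start, so VDSM-managed disks need a
per-device seclabel telling libvirt not to relabel them:)

import xml.etree.ElementTree as ET

# Build a disk <source> that opts out of libvirt's ownership handling;
# VDSM keeps managing the ownership of storage itself.
disk = ET.Element('disk', type='file', device='disk')
source = ET.SubElement(disk, 'source',
                       file='/rhev/data-center/mnt/example/disk-image')
ET.SubElement(source, 'seclabel', model='dac', relabel='no')

print(ET.tostring(disk).decode())
# Roughly (attribute order may vary by Python version):
# <disk type="file" device="disk">
#   <source file="/rhev/data-center/mnt/example/disk-image">
#     <seclabel model="dac" relabel="no" />
#   </source>
# </disk>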
>>>>>>>>>>>>> In particular: dynamic_ownership was set to 0 prehistorically (as part
>>>>>>>>>>>>> of https://bugzilla.redhat.com/show_bug.cgi?id=554961 ) because libvirt,
>>>>>>>>>>>>> running as root, was not able to play properly with root-squash nfs
>>>>>>>>>>>>> mounts.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Have you attempted this use case?
>>>>>>>>>>>>> 
>>>>>>>>>>>>> I second Nir's request to run this with storage QE.
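
(Context: the knob in question is libvirt's qemu.conf setting; my
understanding, sketched below rather than quoted from the patch, is that VDSM
has historically pinned it off when configuring libvirt:)

# /etc/libvirt/qemu.conf, as historically configured by VDSM:
# libvirt must not chown image files, because as root it could not
# access root-squashed NFS mounts (the BZ above).
dynamic_ownership = 0

# The patch under discussion flips this to 1, with storage opted out
# per-device instead (see the seclabel sketch earlier in the thread).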
>>>>>>>>>>>> 
>>>>>>>>>>>> --
>>>>>>>>>>>> Raz Tamir
>>>>>>>>>>>> Manager, RHV QE
> 
> _______________________________________________
> Devel mailing list
> Devel at ovirt.org
> http://lists.ovirt.org/mailman/listinfo/devel
> 
> 
