[ovirt-devel] dynamic ownership changes

Elad Ben Aharon ebenahar at redhat.com
Sun Apr 22 22:34:03 UTC 2018


Also, snapshot preview failed (2nd snapshot):

2018-04-22 18:01:06,253+0300 INFO  (jsonrpc/0) [jsonrpc.JsonRpcServer] RPC
call Volume.create succeeded in 0.84 seconds (__init__:311)
2018-04-22 18:01:06,261+0300 INFO  (tasks/6)
[storage.ThreadPool.WorkerThread] START task
6823d724-cb1b-4706-a58a-83428363cce5 (cmd=<bound method Task.commit of
<vdsm.storage.task.Task instance at 0x7f1aac54fc68>>, args=None)
(threadPool:208)
2018-04-22 18:01:06,906+0300 WARN  (check/loop) [storage.asyncutils] Call
<bound method DirectioChecker._check of <DirectioChecker
/rhev/data-center/mnt/yellow-vdsb.qa.lab.tlv.redhat.com:_Storage__NFS_storage__local__ge2__nfs__0/46d2fd2b-bdd0-40f5-be4c-0aaf2a629f1b/dom_md/metadata
running next_check=4920812.91 at 0x7f1aac3ed790>> delayed by 0.51 seconds
(asyncutils:138)
2018-04-22 18:01:07,082+0300 WARN  (tasks/6) [storage.ResourceManager]
Resource factory failed to create resource
'01_img_7df9d2b2-52b5-4ac2-a9f0-a1d1e93eb6d2.095ad9d6-3154-449c-868c-f975dcdcb729'.
Canceling request. (resourceManager:543)
Traceback (most recent call last):
 File "/usr/lib/python2.7/site-packages/vdsm/storage/resourceManager.py",
line 539, in registerResource
   obj = namespaceObj.factory.createResource(name, lockType)
 File "/usr/lib/python2.7/site-packages/vdsm/storage/resourceFactories.py",
line 193, in createResource
   lockType)
 File "/usr/lib/python2.7/site-packages/vdsm/storage/resourceFactories.py",
line 122, in __getResourceCandidatesList
   imgUUID=resourceName)
 File "/usr/lib/python2.7/site-packages/vdsm/storage/image.py", line 198,
in getChain
   uuidlist = volclass.getImageVolumes(sdUUID, imgUUID)
 File "/usr/lib/python2.7/site-packages/vdsm/storage/volume.py", line 1537,
in getImageVolumes
   return cls.manifestClass.getImageVolumes(sdUUID, imgUUID)
 File "/usr/lib/python2.7/site-packages/vdsm/storage/fileVolume.py", line
337, in getImageVolumes
   if (sd.produceVolume(imgUUID, volid).getImage() == imgUUID):
 File "/usr/lib/python2.7/site-packages/vdsm/storage/sd.py", line 438, in
produceVolume
   volUUID)
 File "/usr/lib/python2.7/site-packages/vdsm/storage/fileVolume.py", line
69, in __init__
   volUUID)
 File "/usr/lib/python2.7/site-packages/vdsm/storage/volume.py", line 86,
in __init__
   self.validate()
 File "/usr/lib/python2.7/site-packages/vdsm/storage/volume.py", line 112,
in validate
   self.validateVolumePath()
 File "/usr/lib/python2.7/site-packages/vdsm/storage/fileVolume.py", line
129, in validateVolumePath
   raise se.VolumeDoesNotExist(self.volUUID)
VolumeDoesNotExist: Volume does not exist:
(u'a404bfc9-57ef-4dcc-9f1b-458dfb08ad74',)
2018-04-22 18:01:07,083+0300 WARN  (tasks/6)
[storage.ResourceManager.Request]
(ResName='01_img_7df9d2b2-52b5-4ac2-a9f0-a1d1e93eb6d2.095ad9d6-3154-449c-868c-f975dcdcb729',
ReqID='79c96e70-7334-4402-a390-dc87f939b7d2') Tried to cancel a processed
request (resourceManager:187)
2018-04-22 18:01:07,084+0300 ERROR (tasks/6) [storage.TaskManager.Task]
(Task='6823d724-cb1b-4706-a58a-83428363cce5') Unexpected error (task:875)
Traceback (most recent call last):
 File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 882, in
_run
   return fn(*args, **kargs)
 File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 336, in
run
   return self.cmd(*self.argslist, **self.argsdict)
 File "/usr/lib/python2.7/site-packages/vdsm/storage/securable.py", line
79, in wrapper
   return method(self, *args, **kwargs)
 File "/usr/lib/python2.7/site-packages/vdsm/storage/sp.py", line 1939, in
createVolume
   with rm.acquireResource(img_ns, imgUUID, rm.EXCLUSIVE):
 File "/usr/lib/python2.7/site-packages/vdsm/storage/resourceManager.py",
line 1025, in acquireResource
   return _manager.acquireResource(namespace, name, lockType,
timeout=timeout)
 File "/usr/lib/python2.7/site-packages/vdsm/storage/resourceManager.py",
line 475, in acquireResource
   raise se.ResourceAcqusitionFailed()
ResourceAcqusitionFailed: Could not acquire resource. Probably resource
factory threw an exception.: ()
2018-04-22 18:01:07,735+0300 INFO  (tasks/6)
[storage.ThreadPool.WorkerThread] FINISH task
6823d724-cb1b-4706-a58a-83428363cce5 (threadPool:210)
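To make the chain above easier to follow: the resource factory fails while
resolving the image chain because one volume is missing, the request is then
cancelled, and the caller only ever sees the generic acquisition error. Below
is a minimal, self-contained Python sketch of that propagation pattern; it is
not the actual vdsm code, and every name in it is illustrative.

# Minimal sketch of the failure pattern in the log above -- NOT vdsm code.
class VolumeDoesNotExist(Exception):
    pass

class ResourceAcqusitionFailed(Exception):
    # spelled as in the vdsm log above
    pass

def fake_factory(volume_uuid):
    # stands in for the factory walking the image chain; one volume of the
    # image is missing on the NFS domain, so chain resolution blows up
    raise VolumeDoesNotExist(volume_uuid)

def register_resource(factory, name):
    try:
        return factory(name)
    except Exception:
        # "Resource factory failed to create resource ... Canceling request."
        return None

def acquire_resource(factory, name):
    obj = register_resource(factory, name)
    if obj is None:
        # "Could not acquire resource. Probably resource factory threw an
        # exception."
        raise ResourceAcqusitionFailed()
    return obj

if __name__ == "__main__":
    try:
        acquire_resource(fake_factory, "a404bfc9-57ef-4dcc-9f1b-458dfb08ad74")
    except ResourceAcqusitionFailed as exc:
        print("acquisition failed: %r" % exc)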



Steps from [1]:

2018-04-22 17:54:41,574 INFO  Test Setup 2: Creating VM vm_TestCase11660_2217544157
2018-04-22 17:54:55,593 INFO  049: storage/rhevmtests.storage.storage_snapshots.test_live_snapshot.TestCase11660.test_live_snapshot[glusterfs]
2018-04-22 17:54:55,593 INFO  Create a snapshot while VM is running
2018-04-22 17:54:55,593 INFO  STORAGE: GLUSTERFS
2018-04-22 17:58:04,761 INFO  Test Step 3: Start writing continuously on VM vm_TestCase11660_2217544157 via dd
2018-04-22 17:58:35,334 INFO  Test Step 4: Creating live snapshot on a VM vm_TestCase11660_2217544157
2018-04-22 17:58:35,334 INFO  Test Step 5: Adding new snapshot to VM vm_TestCase11660_2217544157 with all disks
2018-04-22 17:58:35,337 INFO  Test Step 6: Add snapshot to VM vm_TestCase11660_2217544157 with {'description': 'snap_TestCase11660_2217545559', 'wait': True}
2018-04-22 17:59:26,179 INFO  Test Step 7: Writing files to VM's vm_TestCase11660_2217544157 disk
2018-04-22 18:00:33,117 INFO  Test Step 8: Shutdown vm vm_TestCase11660_2217544157 with {'async': 'false'}
2018-04-22 18:01:04,038 INFO  Test Step 9: Previewing snapshot snap_TestCase11660_2217545559 on VM vm_TestCase11660_2217544157
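The "writing continuously ... via dd" steps above amount to keeping a dd
stream running inside the guest while the snapshot is taken. The snippet
below is only an illustrative guess at that kind of load generator, not the
automation's actual command or code; the path and sizes are made up.

import subprocess

def write_continuously(path, megabytes=1024):
    # Keep dd writing with direct I/O so the live snapshot is created while
    # blocks are still in flight.
    cmd = ['dd', 'if=/dev/urandom', 'of=%s' % path,
           'bs=1M', 'count=%d' % megabytes, 'oflag=direct']
    return subprocess.Popen(cmd)

if __name__ == '__main__':
    proc = write_continuously('/tmp/dd_load_file')
    # ... create the live snapshot while proc is running, then proc.wait()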




[1]

https://rhv-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/rhv-4.2-ge-runner-storage/1048/consoleFull



On Mon, Apr 23, 2018 at 1:29 AM, Elad Ben Aharon <ebenahar at redhat.com>
wrote:

> Sorry, this is the new execution link:
> https://rhv-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/rhv-4.2-ge-runner-storage/1048/testReport/
>
> On Mon, Apr 23, 2018 at 1:23 AM, Elad Ben Aharon <ebenahar at redhat.com>
> wrote:
>
>> Hi, I've triggered another execution [1] due to some issues I saw in the
>> first one, which are not related to the patch.
>>
>> The success rate is 78%, which is low compared with tier1 executions with
>> code from downstream builds (95-100% success rates) [2].
>>
>> From what I could see so far, there is an issue with move and copy
>> operations to and from Gluster domains. For example [3].
>>
>> The logs are attached.
>>
>>
>> [1]
>> https://rhv-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/rhv-4.2-ge-runner-tier1-after-upgrade/7/testReport/
>>
>>
>>
>> [2]
>> https://rhv-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/rhv-4.2-ge-runner-tier1-after-upgrade/7/
>>
>>
>>
>> [3]
>> 2018-04-22 13:06:28,316+0300 INFO  (jsonrpc/7) [vdsm.api] FINISH
>> deleteImage error=Image does not exist in domain:
>> 'image=cabb8846-7a4b-4244-9835-5f603e682f33,
>> domain=e5fd29c8-52ba-467e-be09-ca40ff054dd4' from=::ffff:10.35.161.182,40936,
>> flow_id=disks_syncAction_ba6b2630-5976-4935,
>> task_id=3d5f2a8a-881c-409e-93e9-aaa643c10e42 (api:51)
>> 2018-04-22 13:06:28,317+0300 ERROR (jsonrpc/7) [storage.TaskManager.Task]
>> (Task='3d5f2a8a-881c-409e-93e9-aaa643c10e42') Unexpected error
>> (task:875)
>> Traceback (most recent call last):
>>  File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 882,
>> in _run
>>    return fn(*args, **kargs)
>>  File "<string>", line 2, in deleteImage
>>  File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 49, in
>> method
>>    ret = func(*args, **kwargs)
>>  File "/usr/lib/python2.7/site-packages/vdsm/storage/hsm.py", line 1503,
>> in deleteImage
>>    raise se.ImageDoesNotExistInSD(imgUUID, sdUUID)
>> ImageDoesNotExistInSD: Image does not exist in domain:
>> 'image=cabb8846-7a4b-4244-9835-5f603e682f33,
>> domain=e5fd29c8-52ba-467e-be09-ca40ff054dd4'
>> 2018-04-22 13:06:28,317+0300 INFO  (jsonrpc/7) [storage.TaskManager.Task]
>> (Task='3d5f2a8a-881c-409e-93e9-aaa643c10e42') aborting: Task is aborted:
>> "Image does not exist in domain: 'image=cabb8846-7a4b-4244-9835-
>> 5f603e682f33, domain=e5fd29c8-52ba-467e-be09-ca40ff054dd4'" - code 268
>> (task:1181)
>> 2018-04-22 13:06:28,318+0300 ERROR (jsonrpc/7) [storage.Dispatcher]
>> FINISH deleteImage error=Image does not exist in domain:
>> 'image=cabb8846-7a4b-4244-9835-5f603e682f33,
>> domain=e5fd29c8-52ba-467e-be09-ca40ff054dd4' (dispatcher:82)
>>
>>
>>
>> On Thu, Apr 19, 2018 at 5:34 PM, Elad Ben Aharon <ebenahar at redhat.com>
>> wrote:
>>
>>> Triggered a sanity tier1 execution [1] using [2], which covers all the
>>> requested areas, on iSCSI, NFS and Gluster.
>>> I'll update with the results.
>>>
>>> [1]
>>> https://rhv-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/view/4.2_dev/job/rhv-4.2-ge-flow-storage/1161/
>>>
>>> [2]
>>> https://gerrit.ovirt.org/#/c/89830/
>>> vdsm-4.30.0-291.git77aef9a.el7.x86_64
>>>
>>>
>>>
>>> On Thu, Apr 19, 2018 at 3:07 PM, Martin Polednik <mpolednik at redhat.com>
>>> wrote:
>>>
>>>> On 19/04/18 14:54 +0300, Elad Ben Aharon wrote:
>>>>
>>>>> Hi Martin,
>>>>>
>>>>> I see [1] requires a rebase; can you please take care of it?
>>>>>
>>>>
>>>> Should be rebased.
>>>>
>>>>> At the moment, our automation is stable only on iSCSI, NFS, Gluster and
>>>>> FC. Ceph is not supported, and Cinder will be stabilized soon; AFAIR,
>>>>> it's not stable enough at the moment.
>>>>>
>>>>
>>>> That is still pretty good.
>>>>
>>>>
>>>>> [1] https://gerrit.ovirt.org/#/c/89830/
>>>>>
>>>>>
>>>>> Thanks
>>>>>
>>>>> On Wed, Apr 18, 2018 at 2:17 PM, Martin Polednik <mpolednik at redhat.com>
>>>>> wrote:
>>>>>
>>>>> On 18/04/18 11:37 +0300, Elad Ben Aharon wrote:
>>>>>>
>>>>>>> Hi, sorry if I misunderstood, I waited for more input regarding what
>>>>>>> areas have to be tested here.
>>>>>>>
>>>>>>>
>>>>>> I'd say that you have quite a bit of freedom in this regard. GlusterFS
>>>>>> should be covered by Dennis, so iSCSI/NFS/ceph/cinder with some suite
>>>>>> that covers basic operations (start & stop VM, migrate it), snapshots
>>>>>> and merging them, and whatever else would be important for storage
>>>>>> sanity.
>>>>>>
>>>>>> mpolednik
>>>>>>
>>>>>>
>>>>>>> On Wed, Apr 18, 2018 at 11:16 AM, Martin Polednik <mpolednik at redhat.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>> On 11/04/18 16:52 +0300, Elad Ben Aharon wrote:
>>>>>>>
>>>>>>>>
>>>>>>>>> We can test this on iSCSI, NFS and GlusterFS. As for ceph and
>>>>>>>>> cinder, we will have to check, since usually we don't execute our
>>>>>>>>> automation on them.
>>>>>>>>
>>>>>>>> Any update on this? I believe the gluster tests were successful, OST
>>>>>>>> passes fine and unit tests pass fine; that makes the storage backends
>>>>>>>> test the last required piece.
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, Apr 11, 2018 at 4:38 PM, Raz Tamir <ratamir at redhat.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>> +Elad
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> On Wed, Apr 11, 2018 at 4:28 PM, Dan Kenigsberg <danken at redhat.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>> On Wed, Apr 11, 2018 at 12:34 PM, Nir Soffer <nsoffer at redhat.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> On Wed, Apr 11, 2018 at 12:31 PM Eyal Edri <eedri at redhat.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>> Please make sure to run as many OST suites on this patch as
>>>>>>>>>>>>> possible before merging (using 'ci please build').
>>>>>>>>>>>>
>>>>>>>>>>>> But note that OST is not a way to verify the patch.
>>>>>>>>>>>>
>>>>>>>>>>>> Such changes require testing with all storage types we support.
>>>>>>>>>>>>
>>>>>>>>>>>> Nir
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Apr 10, 2018 at 4:09 PM, Martin Polednik <mpolednik at redhat.com>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Hey,
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>> I've created a patch[0] that is finally able to activate
>>>>>>>>>>>>>> libvirt's dynamic_ownership for VDSM while not negatively
>>>>>>>>>>>>>> affecting functionality of our storage code.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> That of course comes with quite a bit of code removal, mostly in
>>>>>>>>>>>>>> the area of host devices, hwrng and anything that touches
>>>>>>>>>>>>>> devices; bunch of test changes and one XML generation caveat
>>>>>>>>>>>>>> (storage is handled by VDSM, therefore disk relabelling needs to
>>>>>>>>>>>>>> be disabled on the VDSM level).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Because of the scope of the patch, I welcome storage/virt/network
>>>>>>>>>>>>>> people to review the code and consider the implication this
>>>>>>>>>>>>>> change has on current/future features.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> [0] https://gerrit.ovirt.org/#/c/89830/
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
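A side note on the XML generation caveat mentioned above: with
dynamic_ownership active, libvirt wants to chown/relabel device paths itself,
so disks that VDSM prepares need a per-device override telling libvirt not to
relabel them. The sketch below only illustrates what such an override looks
like in domain XML terms; it is not vdsm's code, and the helper name is made
up.

import xml.etree.ElementTree as ET

def disk_source_without_relabel(path):
    # Build a <source> element carrying a DAC seclabel override so libvirt
    # leaves ownership/labels of this particular disk alone (relabel='no').
    source = ET.Element('source', {'file': path})
    ET.SubElement(source, 'seclabel', {'model': 'dac', 'relabel': 'no'})
    return source

if __name__ == '__main__':
    elem = disk_source_without_relabel(
        '/rhev/data-center/mnt/example/sd-uuid/images/img-uuid/vol-uuid')
    print(ET.tostring(elem).decode('ascii'))
    # -> <source file="..."><seclabel model="dac" relabel="no" /></source>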
>>>>>>>>>>> In particular: dynamic_ownership was set to 0 prehistorically
>>>>>>>>>>> (as part of https://bugzilla.redhat.com/show_bug.cgi?id=554961 )
>>>>>>>>>>> because libvirt, running as root, was not able to play properly
>>>>>>>>>>> with root-squash nfs mounts.
>>>>>>>>>>>
>>>>>>>>>>> Have you attempted this use case?
>>>>>>>>>>>
>>>>>>>>>>> I second Nir's request to run this with storage QE.
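For readers who have not touched this area: the setting in question is
dynamic_ownership in libvirt's qemu.conf; with it enabled, libvirt (running
as root) chowns image files to the qemu user before starting the domain. On a
root-squash NFS export the client's root is mapped to an anonymous uid, so
that chown fails. A rough Python illustration of the failure mode follows,
with made-up paths and uids; it is not vdsm or libvirt code.

import os

QEMU_UID = QEMU_GID = 107  # common qemu uid/gid; an assumption for this example

def relabel_like_libvirt_would(path):
    # dynamic_ownership boils down to a chown of the image path; on a
    # root-squashed NFS mount this raises EPERM even for root.
    try:
        os.chown(path, QEMU_UID, QEMU_GID)
    except OSError as err:
        print("chown failed (EPERM on a real root-squash export): %s" % err)

if __name__ == '__main__':
    # Placeholder path -- point this at a file on a root-squash NFS mount
    # to reproduce the EPERM described above.
    relabel_like_libvirt_would('/rhev/data-center/mnt/example-nfs/disk-image')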
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Raz Tamir
>>>>>>>>>> Manager, RHV QE
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>
>>
>