
Also, snapshot preview failed (2nd snapshot):

2018-04-22 18:01:06,253+0300 INFO (jsonrpc/0) [jsonrpc.JsonRpcServer] RPC call Volume.create succeeded in 0.84 seconds (__init__:311)
2018-04-22 18:01:06,261+0300 INFO (tasks/6) [storage.ThreadPool.WorkerThread] START task 6823d724-cb1b-4706-a58a-83428363cce5 (cmd=<bound method Task.commit of <vdsm.storage.task.Task instance at 0x7f1aac54fc68>>, args=None) (threadPool:208)
2018-04-22 18:01:06,906+0300 WARN (check/loop) [storage.asyncutils] Call <bound method DirectioChecker._check of <DirectioChecker /rhev/data-center/mnt/yellow-vdsb.qa.lab.tlv.redhat.com:_Storage__NFS_storage__local__ge2__nfs__0/46d2fd2b-bdd0-40f5-be4c-0aaf2a629f1b/dom_md/metadata running next_check=4920812.91 at 0x7f1aac3ed790>> delayed by 0.51 seconds (asyncutils:138)
2018-04-22 18:01:07,082+0300 WARN (tasks/6) [storage.ResourceManager] Resource factory failed to create resource '01_img_7df9d2b2-52b5-4ac2-a9f0-a1d1e93eb6d2.095ad9d6-3154-449c-868c-f975dcdcb729'. Canceling request. (resourceManager:543)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/resourceManager.py", line 539, in registerResource
    obj = namespaceObj.factory.createResource(name, lockType)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/resourceFactories.py", line 193, in createResource
    lockType)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/resourceFactories.py", line 122, in __getResourceCandidatesList
    imgUUID=resourceName)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/image.py", line 198, in getChain
    uuidlist = volclass.getImageVolumes(sdUUID, imgUUID)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/volume.py", line 1537, in getImageVolumes
    return cls.manifestClass.getImageVolumes(sdUUID, imgUUID)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/fileVolume.py", line 337, in getImageVolumes
    if (sd.produceVolume(imgUUID, volid).getImage() == imgUUID):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sd.py", line 438, in produceVolume
    volUUID)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/fileVolume.py", line 69, in __init__
    volUUID)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/volume.py", line 86, in __init__
    self.validate()
  File "/usr/lib/python2.7/site-packages/vdsm/storage/volume.py", line 112, in validate
    self.validateVolumePath()
  File "/usr/lib/python2.7/site-packages/vdsm/storage/fileVolume.py", line 129, in validateVolumePath
    raise se.VolumeDoesNotExist(self.volUUID)
VolumeDoesNotExist: Volume does not exist: (u'a404bfc9-57ef-4dcc-9f1b-458dfb08ad74',)
2018-04-22 18:01:07,083+0300 WARN (tasks/6) [storage.ResourceManager.Request] (ResName='01_img_7df9d2b2-52b5-4ac2-a9f0-a1d1e93eb6d2.095ad9d6-3154-449c-868c-f975dcdcb729', ReqID='79c96e70-7334-4402-a390-dc87f939b7d2') Tried to cancel a processed request (resourceManager:187)
2018-04-22 18:01:07,084+0300 ERROR (tasks/6) [storage.TaskManager.Task] (Task='6823d724-cb1b-4706-a58a-83428363cce5') Unexpected error (task:875)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 882, in _run
    return fn(*args, **kargs)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 336, in run
    return self.cmd(*self.argslist, **self.argsdict)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/securable.py", line 79, in wrapper
    return method(self, *args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sp.py", line 1939, in createVolume
    with rm.acquireResource(img_ns, imgUUID, rm.EXCLUSIVE):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/resourceManager.py", line 1025, in acquireResource
    return _manager.acquireResource(namespace, name, lockType, timeout=timeout)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/resourceManager.py", line 475, in acquireResource
    raise se.ResourceAcqusitionFailed()
ResourceAcqusitionFailed: Could not acquire resource. Probably resource factory threw an exception.: ()
2018-04-22 18:01:07,735+0300 INFO (tasks/6) [storage.ThreadPool.WorkerThread] FINISH task 6823d724-cb1b-4706-a58a-83428363cce5 (threadPool:210)

Steps from [1]:

2018-04-22 17:54:41,574 INFO Test Setup 2: Creating VM vm_TestCase11660_2217544157
2018-04-22 17:54:55,593 INFO 049: storage/rhevmtests.storage.storage_snapshots.test_live_snapshot.TestCase11660.test_live_snapshot[glusterfs]
2018-04-22 17:54:55,593 INFO Create a snapshot while VM is running
2018-04-22 17:54:55,593 INFO STORAGE: GLUSTERFS
2018-04-22 17:58:04,761 INFO Test Step 3: Start writing continuously on VM vm_TestCase11660_2217544157 via dd
2018-04-22 17:58:35,334 INFO Test Step 4: Creating live snapshot on a VM vm_TestCase11660_2217544157
2018-04-22 17:58:35,334 INFO Test Step 5: Adding new snapshot to VM vm_TestCase11660_2217544157 with all disks
2018-04-22 17:58:35,337 INFO Test Step 6: Add snapshot to VM vm_TestCase11660_2217544157 with {'description': 'snap_TestCase11660_2217545559', 'wait': True}
2018-04-22 17:59:26,179 INFO Test Step 7: Writing files to VM's vm_TestCase11660_2217544157 disk
2018-04-22 18:00:33,117 INFO Test Step 8: Shutdown vm vm_TestCase11660_2217544157 with {'async': 'false'}
2018-04-22 18:01:04,038 INFO Test Step 9: Previewing snapshot snap_TestCase11660_2217545559 on VM vm_TestCase11660_2217544157

[1] https://rhv-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/rhv-4.2-ge-runner-st...

On Mon, Apr 23, 2018 at 1:29 AM, Elad Ben Aharon <ebenahar@redhat.com> wrote:
Sorry, this is the new execution link: https://rhv-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/rhv-4.2-ge-runner-storage/1048/testReport/
On Mon, Apr 23, 2018 at 1:23 AM, Elad Ben Aharon <ebenahar@redhat.com> wrote:
Hi, I've triggered another execution [1] due to some issues I saw in the first run that are not related to the patch.
The success rate is 78%, which is low compared to tier1 executions with code from downstream builds (95-100% success rates) [2].
From what I could see so far, there is an issue with move and copy operations to and from Gluster domains. For example [3].
The logs are attached.
[1] https://rhv-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/rhv-4.2-ge-runner-tier1-after-upgrade/7/testReport/
[2] https://rhv-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/rhv-4.2-ge-runner-tier1-after-upgrade/7/
[3] 2018-04-22 13:06:28,316+0300 INFO (jsonrpc/7) [vdsm.api] FINISH deleteImage error=Image does not exist in domain: 'image=cabb8846-7a4b-4244-9835-5f603e682f33, domain=e5fd29c8-52ba-467e-be09-ca40ff054dd4' from=::ffff:10.35.161.182,40936, flow_id=disks_syncAction_ba6b2630-5976-4935, task_id=3d5f2a8a-881c-409e-93e9-aaa643c10e42 (api:51)
2018-04-22 13:06:28,317+0300 ERROR (jsonrpc/7) [storage.TaskManager.Task] (Task='3d5f2a8a-881c-409e-93e9-aaa643c10e42') Unexpected error (task:875)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 882, in _run
    return fn(*args, **kargs)
  File "<string>", line 2, in deleteImage
  File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 49, in method
    ret = func(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/hsm.py", line 1503, in deleteImage
    raise se.ImageDoesNotExistInSD(imgUUID, sdUUID)
ImageDoesNotExistInSD: Image does not exist in domain: 'image=cabb8846-7a4b-4244-9835-5f603e682f33, domain=e5fd29c8-52ba-467e-be09-ca40ff054dd4'
2018-04-22 13:06:28,317+0300 INFO (jsonrpc/7) [storage.TaskManager.Task] (Task='3d5f2a8a-881c-409e-93e9-aaa643c10e42') aborting: Task is aborted: "Image does not exist in domain: 'image=cabb8846-7a4b-4244-9835-5f603e682f33, domain=e5fd29c8-52ba-467e-be09-ca40ff054dd4'" - code 268 (task:1181)
2018-04-22 13:06:28,318+0300 ERROR (jsonrpc/7) [storage.Dispatcher] FINISH deleteImage error=Image does not exist in domain: 'image=cabb8846-7a4b-4244-9835-5f603e682f33, domain=e5fd29c8-52ba-467e-be09-ca40ff054dd4' (dispatcher:82)
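To make the failing flow concrete, here is a minimal sketch of the kind of disk move these cases exercise, written against the public ovirtsdk4 API rather than the internal rhevmtests suites. The engine URL, credentials, disk name and target domain are placeholders, and the move action is assumed to behave as in the oVirt 4.2 REST API.

# Hedged sketch: drive a disk move between storage domains with ovirtsdk4,
# roughly the operation that fails against the Gluster domains above.
# Engine URL, credentials, disk and domain names are hypothetical.
import time

import ovirtsdk4 as sdk
import ovirtsdk4.types as types

connection = sdk.Connection(
    url='https://engine.example.com/ovirt-engine/api',
    username='admin@internal',
    password='password',
    insecure=True,  # lab setup; use ca_file in a real environment
)

disks_service = connection.system_service().disks_service()
disk = disks_service.list(search='name=disk_example')[0]
disk_service = disks_service.disk_service(disk.id)

# POST /disks/{id}/move with the target storage domain
disk_service.move(storage_domain=types.StorageDomain(name='gluster_domain_1'))

# Wait until the disk leaves the LOCKED state, i.e. the move finished
while disk_service.get().status != types.DiskStatus.OK:
    time.sleep(5)

connection.close()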
On Thu, Apr 19, 2018 at 5:34 PM, Elad Ben Aharon <ebenahar@redhat.com> wrote:
Triggered a sanity tier1 execution [1] using [2], which covers all the requested areas, on iSCSI, NFS and Gluster. I'll update with the results.
[1] https://rhv-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/view/4.2_dev/job/rhv-4.2-ge-flow-storage/1161/
[2] https://gerrit.ovirt.org/#/c/89830/ vdsm-4.30.0-291.git77aef9a.el7.x86_64
On Thu, Apr 19, 2018 at 3:07 PM, Martin Polednik <mpolednik@redhat.com> wrote:
On 19/04/18 14:54 +0300, Elad Ben Aharon wrote:
Hi Martin,
I see [1] requires a rebase; can you please take care of it?
Should be rebased.
At the moment, our automation is stable only on iSCSI, NFS, Gluster and FC. Ceph is not supported, and Cinder will be stabilized soon; AFAIR, it's not stable enough at the moment.
That is still pretty good.
[1] https://gerrit.ovirt.org/#/c/89830/
Thanks
On Wed, Apr 18, 2018 at 2:17 PM, Martin Polednik <mpolednik@redhat.com> wrote:
On 18/04/18 11:37 +0300, Elad Ben Aharon wrote:
Hi, sorry if I misunderstood, I waited for more input regarding what areas have to be tested here.

I'd say that you have quite a bit of freedom in this regard. GlusterFS should be covered by Dennis, so iSCSI/NFS/ceph/cinder with some suite that covers basic operations (start & stop VM, migrate it), snapshots and merging them, and whatever else would be important for storage sanity.
mpolednik
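To make the suggested coverage concrete, here is a minimal sketch of those basic operations (start, live snapshot, migrate, stop) against the public ovirtsdk4 API rather than the internal rhevmtests suites. The engine URL, credentials and VM name are placeholders.

# Hedged sketch of the storage-sanity operations mentioned above:
# start a VM, take a live snapshot, migrate it, and stop it.
# All names and credentials below are hypothetical placeholders.
import time

import ovirtsdk4 as sdk
import ovirtsdk4.types as types

connection = sdk.Connection(
    url='https://engine.example.com/ovirt-engine/api',
    username='admin@internal',
    password='password',
    insecure=True,
)

vms_service = connection.system_service().vms_service()
vm = vms_service.list(search='name=vm_sanity')[0]
vm_service = vms_service.vm_service(vm.id)

vm_service.start()
while vm_service.get().status != types.VmStatus.UP:
    time.sleep(5)

# Live snapshot while the VM is running (disks only, no memory state)
vm_service.snapshots_service().add(
    types.Snapshot(description='sanity_snapshot', persist_memorystate=False)
)

vm_service.migrate()  # let the engine pick a target host
vm_service.stop()
connection.close()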
On Wed, Apr 18, 2018 at 11:16 AM, Martin Polednik <mpolednik@redhat.com> wrote:

On 11/04/18 16:52 +0300, Elad Ben Aharon wrote:

We can test this on iSCSI, NFS and GlusterFS. As for ceph and cinder, will have to check, since usually, we don't execute our automation on them.

Any update on this? I believe the gluster tests were successful, OST passes fine and unit tests pass fine, that makes the storage backends test the last required piece.

On Wed, Apr 11, 2018 at 4:38 PM, Raz Tamir <ratamir@redhat.com> wrote:

+Elad

On Wed, Apr 11, 2018 at 4:28 PM, Dan Kenigsberg <danken@redhat.com> wrote:

On Wed, Apr 11, 2018 at 12:34 PM, Nir Soffer <nsoffer@redhat.com> wrote:

On Wed, Apr 11, 2018 at 12:31 PM Eyal Edri <eedri@redhat.com> wrote:

Please make sure to run as much OST suites on this patch as possible before merging (using 'ci please build').

But note that OST is not a way to verify the patch. Such changes require testing with all storage types we support.

Nir

On Tue, Apr 10, 2018 at 4:09 PM, Martin Polednik <mpolednik@redhat.com> wrote:

Hey,

I've created a patch[0] that is finally able to activate libvirt's dynamic_ownership for VDSM while not negatively affecting functionality of our storage code.

That of course comes with quite a bit of code removal, mostly in the area of host devices, hwrng and anything that touches devices; a bunch of test changes and one XML generation caveat (storage is handled by VDSM, therefore disk relabelling needs to be disabled on the VDSM level).

Because of the scope of the patch, I welcome storage/virt/network people to review the code and consider the implication this change has on current/future features.

[0] https://gerrit.ovirt.org/#/c/89830/

In particular: dynamic_ownership was set to 0 prehistorically (as part of https://bugzilla.redhat.com/show_bug.cgi?id=554961 ) because libvirt, running as root, was not able to play properly with root-squash nfs mounts.

Have you attempted this use case?

I join to Nir's request to run this with storage QE.

--
Raz Tamir
Manager, RHV QE
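To illustrate the XML generation caveat mentioned above (disk relabelling disabled on the VDSM side while libvirt's dynamic_ownership is active), here is a minimal sketch of a per-disk seclabel override. With dynamic_ownership = 1 in /etc/libvirt/qemu.conf, libvirt normally chowns/relabels device paths itself; a per-device <seclabel relabel='no'/> inside the disk <source> is how a management layer can keep ownership of storage paths. The element layout follows the libvirt domain XML documentation and is not taken from the patch itself; the image path is a placeholder.

# Hedged sketch: build a <disk> element whose source carries a per-device
# seclabel override so libvirt (with dynamic_ownership enabled in qemu.conf)
# leaves ownership/labels of the storage path to the management layer.
# The path and exact attribute set are illustrative, not copied from the patch.
import xml.etree.ElementTree as ET

disk = ET.Element('disk', type='file', device='disk')
ET.SubElement(disk, 'driver', name='qemu', type='qcow2', cache='none')
source = ET.SubElement(
    disk, 'source',
    file='/rhev/data-center/mnt/server:_export/sd_uuid/images/img_uuid/vol_uuid',
)
# Per-device override: do not chown/relabel this path on VM start/stop.
ET.SubElement(source, 'seclabel', model='dac', relabel='no')
ET.SubElement(disk, 'target', dev='vda', bus='virtio')

print(ET.tostring(disk, encoding='unicode'))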