[ovirt-users] oVirt gluster sanlock issue

Abi Askushi rightkicktech at gmail.com
Mon Jun 5 19:48:45 UTC 2017


Also, when testing with dd I get the following:

*Testing on the gluster mount:*
dd if=/dev/zero
of=/rhev/data-center/mnt/glusterSD/10.100.100.1:_engine/test2.img
oflag=direct bs=512 count=1
dd: error writing
‘/rhev/data-center/mnt/glusterSD/10.100.100.1:_engine/test2.img’:
*Transport endpoint is not connected*
1+0 records in
0+0 records out
0 bytes (0 B) copied, 0.00336755 s, 0.0 kB/s

*Testing on the /root directory (XFS):*
dd if=/dev/zero of=/test2.img oflag=direct bs=512 count=1
dd: error writing ‘/test2.img’: *Invalid argument*
1+0 records in
0+0 records out
0 bytes (0 B) copied, 0.000321239 s, 0.0 kB/s

It seems that gluster is trying to do the same kind of direct write and failing.
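
If it is purely an I/O size issue, a 4096-byte direct write should go through
on the local XFS mount (O_DIRECT needs the I/O size aligned to the logical
sector size), and trying the same on the gluster mount would show whether the
brick side fails for the same reason. A quick check, same dd invocations with
bs=4096 (test file names are just examples):

dd if=/dev/zero of=/test4k.img oflag=direct bs=4096 count=1
dd if=/dev/zero of=/rhev/data-center/mnt/glusterSD/10.100.100.1:_engine/test4k.img oflag=direct bs=4096 count=1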



On Mon, Jun 5, 2017 at 10:10 PM, Abi Askushi <rightkicktech at gmail.com>
wrote:

> The question that arises is what is needed to make gluster aware of the 4K
> physical sectors presented to it (the logical sector is also 4K). The
> offset (127488) at the log does not seem aligned at 4K.
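>
> (Indeed, 127488 = 249 x 512, so it is 512-byte aligned, but 127488 / 4096 is
> not a whole number, so it is not 4K-aligned.) For reference, the sector sizes
> the kernel reports for the underlying device can be checked with something
> like the following (device paths as used elsewhere in this thread; adjust as
> needed):
>
> blockdev --getss --getpbsz /dev/gluster/engine
> cat /sys/block/sda/queue/logical_block_size
> cat /sys/block/sda/queue/physical_block_size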
>
> Alex
>
> On Mon, Jun 5, 2017 at 2:47 PM, Abi Askushi <rightkicktech at gmail.com>
> wrote:
>
>> Hi Krutika,
>>
>> I am saying that I am facing this issue with 4K drives. I never
>> encountered this issue with 512-byte sector drives.
>>
>> Alex
>>
>> On Jun 5, 2017 14:26, "Krutika Dhananjay" <kdhananj at redhat.com> wrote:
>>
>>> This seems like a case of O_DIRECT reads and writes gone wrong, judging
>>> by the 'Invalid argument' errors.
>>>
>>> The two operations that have failed on gluster bricks are:
>>>
>>> [2017-06-05 09:40:39.428979] E [MSGID: 113072]
>>> [posix.c:3453:posix_writev] 0-engine-posix: write failed: offset 0,
>>> [Invalid argument]
>>> [2017-06-05 09:41:00.865760] E [MSGID: 113040]
>>> [posix.c:3178:posix_readv] 0-engine-posix: read failed on
>>> gfid=8c94f658-ac3c-4e3a-b368-8c038513a914, fd=0x7f408584c06c,
>>> offset=127488 size=512, buf=0x7f4083c0b000 [Invalid argument]
>>>
>>> But then, both the write and the read have 512-byte-aligned offset, size
>>> and buf address (which is correct).
>>>
>>> Are you saying you don't see this issue with 4K block-size?
>>>
>>> -Krutika
>>>
>>> On Mon, Jun 5, 2017 at 3:21 PM, Abi Askushi <rightkicktech at gmail.com>
>>> wrote:
>>>
>>>> Hi Sahina,
>>>>
>>>> Attached are the logs. Let me know if anything else is needed.
>>>>
>>>> I have 5 disks (with 4K physical sector) in RAID5. The RAID has 64K
>>>> stripe size at the moment.
>>>> I have prepared the storage as below:
>>>>
>>>> pvcreate --dataalignment 256K /dev/sda4
>>>> vgcreate --physicalextentsize 256K gluster /dev/sda4
>>>>
>>>> lvcreate -n engine --size 120G gluster
>>>> mkfs.xfs -f -i size=512 /dev/gluster/engine
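>>>>
>>>> For reference: with a 64K stripe unit across the 4 data disks of a 5-disk
>>>> RAID5, the full stripe is 64K x 4 = 256K, which is where the 256K alignment
>>>> above comes from. The matching XFS stripe hints would be roughly the
>>>> following (a sketch only, to be adjusted to the actual RAID geometry):
>>>>
>>>> mkfs.xfs -f -i size=512 -d su=64k,sw=4 /dev/gluster/engine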
>>>>
>>>> Thanx,
>>>> Alex
>>>>
>>>> On Mon, Jun 5, 2017 at 12:14 PM, Sahina Bose <sabose at redhat.com> wrote:
>>>>
>>>>> Can we have the gluster mount logs and brick logs to check if it's the
>>>>> same issue?
>>>>>
>>>>> On Sun, Jun 4, 2017 at 11:21 PM, Abi Askushi <rightkicktech at gmail.com>
>>>>> wrote:
>>>>>
>>>>>> I did a clean install of everything and ran into the same issue.
>>>>>> I then ran gdeploy and encountered the same issue when deploying the
>>>>>> engine.
>>>>>> It seems that gluster (?) doesn't like 4K sector drives. I am not sure
>>>>>> if it has to do with alignment. The weird thing is that the gluster
>>>>>> volumes are all ok, replicating normally, and no split brain is reported.
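>>>>>>
>>>>>> For anyone wanting to double-check the replication state, the usual
>>>>>> commands would be something like the following (assuming the volume is
>>>>>> named "engine"):
>>>>>>
>>>>>> gluster volume status engine
>>>>>> gluster volume heal engine info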
>>>>>>
>>>>>> The solution to the mentioned bug (1386443
>>>>>> <https://bugzilla.redhat.com/show_bug.cgi?id=1386443>) was to format
>>>>>> with a 512-byte sector size, which in my case is not an option:
>>>>>>
>>>>>> mkfs.xfs -f -i size=512 -s size=512 /dev/gluster/engine
>>>>>> illegal sector size 512; hw sector is 4096
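>>>>>>
>>>>>> (A note for clarity: mkfs.xfs will not accept a sector size smaller than
>>>>>> the device's logical sector size, so on these disks the only value it
>>>>>> takes is the 4K default, i.e. something like:
>>>>>>
>>>>>> mkfs.xfs -f -i size=512 -s size=4096 /dev/gluster/engine
>>>>>>
>>>>>> which is effectively the same as leaving -s out.)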
>>>>>>
>>>>>> Is there any workaround to address this?
>>>>>>
>>>>>> Thanx,
>>>>>> Alex
>>>>>>
>>>>>>
>>>>>> On Sun, Jun 4, 2017 at 5:48 PM, Abi Askushi <rightkicktech at gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi Maor,
>>>>>>>
>>>>>>> My disks are of 4K block size, and from this bug it seems that gluster
>>>>>>> replica needs a 512B block size.
>>>>>>> Is there a way to make gluster function with 4K drives?
>>>>>>>
>>>>>>> Thank you!
>>>>>>>
>>>>>>> On Sun, Jun 4, 2017 at 2:34 PM, Maor Lipchuk <mlipchuk at redhat.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi Alex,
>>>>>>>>
>>>>>>>> I saw a bug that might be related to the issue you encountered at
>>>>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1386443
>>>>>>>>
>>>>>>>> Sahina, maybe you have any advice? Do you think that BZ1386443 is
>>>>>>>> related?
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Maor
>>>>>>>>
>>>>>>>> On Sat, Jun 3, 2017 at 8:45 PM, Abi Askushi <rightkicktech at gmail.com> wrote:
>>>>>>>> > Hi All,
>>>>>>>> >
>>>>>>>> > I have successfully installed oVirt (version 4.1) with 3 nodes on
>>>>>>>> > top of glusterfs several times.
>>>>>>>> >
>>>>>>>> > This time, when trying to configure the same setup, I am facing the
>>>>>>>> > following issue, which doesn't seem to go away. During installation
>>>>>>>> > I get the error:
>>>>>>>> >
>>>>>>>> > Failed to execute stage 'Misc configuration': Cannot acquire host id:
>>>>>>>> > (u'a5a6b0e7-fc3f-4838-8e26-c8b4d5e5e922', SanlockException(22,
>>>>>>>> > 'Sanlock lockspace add failure', 'Invalid argument'))
>>>>>>>> >
>>>>>>>> > The only difference in this setup is that instead of standard
>>>>>>>> > partitioning I have GPT partitioning and the disks have a 4K block
>>>>>>>> > size instead of 512.
>>>>>>>> >
>>>>>>>> > The /var/log/sanlock.log has the following lines:
>>>>>>>> >
>>>>>>>> > 2017-06-03 19:21:15+0200 23450 [943]: s9 lockspace
>>>>>>>> > ba6bd862-c2b8-46e7-b2c8-91e4a5bb2047:250:/rhev/data-center/mnt/_var_lib_ovirt-hosted-engine-setup_tmptjkIDI/ba6bd862-c2b8-46e7-b2c8-91e4a5bb2047/dom_md/ids:0
>>>>>>>> > 2017-06-03 19:21:36+0200 23471 [944]: s9:r5 resource
>>>>>>>> > ba6bd862-c2b8-46e7-b2c8-91e4a5bb2047:SDM:/rhev/data-center/mnt/_var_lib_ovirt-hosted-engine-setup_tmptjkIDI/ba6bd862-c2b8-46e7-b2c8-91e4a5bb2047/dom_md/leases:1048576
>>>>>>>> > for 2,9,23040
>>>>>>>> > 2017-06-03 19:21:36+0200 23471 [943]: s10 lockspace
>>>>>>>> > a5a6b0e7-fc3f-4838-8e26-c8b4d5e5e922:250:/rhev/data-center/mnt/glusterSD/10.100.100.1:_engine/a5a6b0e7-fc3f-4838-8e26-c8b4d5e5e922/dom_md/ids:0
>>>>>>>> > 2017-06-03 19:21:36+0200 23471 [23522]: a5a6b0e7 aio collect RD
>>>>>>>> > 0x7f59b00008c0:0x7f59b00008d0:0x7f59b0101000 result -22:0 match res
>>>>>>>> > 2017-06-03 19:21:36+0200 23471 [23522]: read_sectors delta_leader
>>>>>>>> > offset 127488 rv -22
>>>>>>>> > /rhev/data-center/mnt/glusterSD/10.100.100.1:_engine/a5a6b0e7-fc3f-4838-8e26-c8b4d5e5e922/dom_md/ids
>>>>>>>> > 2017-06-03 19:21:37+0200 23472 [930]: s9 host 250 1 23450
>>>>>>>> > 88c2244c-a782-40ed-9560-6cfa4d46f853.v0.neptune
>>>>>>>> > 2017-06-03 19:21:37+0200 23472 [943]: s10 add_lockspace fail result -22
>>>>>>>> >
>>>>>>>> > And /var/log/vdsm/vdsm.log says:
>>>>>>>> >
>>>>>>>> > 2017-06-03 19:19:38,176+0200 WARN  (jsonrpc/3)
>>>>>>>> > [storage.StorageServer.MountConnection] Using user specified
>>>>>>>> > backup-volfile-servers option (storageServer:253)
>>>>>>>> > 2017-06-03 19:21:12,379+0200 WARN  (periodic/1) [throttled] MOM not
>>>>>>>> > available. (throttledlog:105)
>>>>>>>> > 2017-06-03 19:21:12,380+0200 WARN  (periodic/1) [throttled] MOM not
>>>>>>>> > available, KSM stats will be missing. (throttledlog:105)
>>>>>>>> > 2017-06-03 19:21:14,714+0200 WARN  (jsonrpc/1)
>>>>>>>> > [storage.StorageServer.MountConnection] Using user specified
>>>>>>>> > backup-volfile-servers option (storageServer:253)
>>>>>>>> > 2017-06-03 19:21:15,515+0200 ERROR (jsonrpc/4) [storage.initSANLock]
>>>>>>>> > Cannot initialize SANLock for domain
>>>>>>>> > a5a6b0e7-fc3f-4838-8e26-c8b4d5e5e922 (clusterlock:238)
>>>>>>>> > Traceback (most recent call last):
>>>>>>>> >   File "/usr/lib/python2.7/site-packages/vdsm/storage/clusterlock.py",
>>>>>>>> > line 234, in initSANLock
>>>>>>>> >     sanlock.init_lockspace(sdUUID, idsPath)
>>>>>>>> > SanlockException: (107, 'Sanlock lockspace init failure', 'Transport
>>>>>>>> > endpoint is not connected')
>>>>>>>> > 2017-06-03 19:21:15,515+0200 WARN  (jsonrpc/4)
>>>>>>>> > [storage.StorageDomainManifest] lease did not initialize successfully
>>>>>>>> > (sd:557)
>>>>>>>> > Traceback (most recent call last):
>>>>>>>> >   File "/usr/share/vdsm/storage/sd.py", line 552, in initDomainLock
>>>>>>>> >     self._domainLock.initLock(self.getDomainLease())
>>>>>>>> >   File "/usr/lib/python2.7/site-packages/vdsm/storage/clusterlock.py",
>>>>>>>> > line 271, in initLock
>>>>>>>> >     initSANLock(self._sdUUID, self._idsPath, lease)
>>>>>>>> >   File "/usr/lib/python2.7/site-packages/vdsm/storage/clusterlock.py",
>>>>>>>> > line 239, in initSANLock
>>>>>>>> >     raise se.ClusterLockInitError()
>>>>>>>> > ClusterLockInitError: Could not initialize cluster lock: ()
>>>>>>>> > 2017-06-03 19:21:37,867+0200 ERROR (jsonrpc/2) [storage.StoragePool]
>>>>>>>> > Create pool hosted_datacenter canceled  (sp:655)
>>>>>>>> > Traceback (most recent call last):
>>>>>>>> >   File "/usr/share/vdsm/storage/sp.py", line 652, in create
>>>>>>>> >     self.attachSD(sdUUID)
>>>>>>>> >   File "/usr/lib/python2.7/site-packages/vdsm/storage/securable.py",
>>>>>>>> > line 79, in wrapper
>>>>>>>> >     return method(self, *args, **kwargs)
>>>>>>>> >   File "/usr/share/vdsm/storage/sp.py", line 971, in attachSD
>>>>>>>> >     dom.acquireHostId(self.id)
>>>>>>>> >   File "/usr/share/vdsm/storage/sd.py", line 790, in acquireHostId
>>>>>>>> >     self._manifest.acquireHostId(hostId, async)
>>>>>>>> >   File "/usr/share/vdsm/storage/sd.py", line 449, in acquireHostId
>>>>>>>> >     self._domainLock.acquireHostId(hostId, async)
>>>>>>>> >   File "/usr/lib/python2.7/site-packages/vdsm/storage/clusterlock.py",
>>>>>>>> > line 297, in acquireHostId
>>>>>>>> >     raise se.AcquireHostIdFailure(self._sdUUID, e)
>>>>>>>> > AcquireHostIdFailure: Cannot acquire host id:
>>>>>>>> > (u'a5a6b0e7-fc3f-4838-8e26-c8b4d5e5e922', SanlockException(22,
>>>>>>>> > 'Sanlock lockspace add failure', 'Invalid argument'))
>>>>>>>> > 2017-06-03 19:21:37,870+0200 ERROR (jsonrpc/2) [storage.StoragePool]
>>>>>>>> > Domain ba6bd862-c2b8-46e7-b2c8-91e4a5bb2047 detach from MSD
>>>>>>>> > ba6bd862-c2b8-46e7-b2c8-91e4a5bb2047 Ver 1 failed. (sp:528)
>>>>>>>> > Traceback (most recent call last):
>>>>>>>> >   File "/usr/share/vdsm/storage/sp.py", line 525, in __cleanupDomains
>>>>>>>> >     self.detachSD(sdUUID)
>>>>>>>> >   File "/usr/lib/python2.7/site-packages/vdsm/storage/securable.py",
>>>>>>>> > line 79, in wrapper
>>>>>>>> >     return method(self, *args, **kwargs)
>>>>>>>> >   File "/usr/share/vdsm/storage/sp.py", line 1046, in detachSD
>>>>>>>> >     raise se.CannotDetachMasterStorageDomain(sdUUID)
>>>>>>>> > CannotDetachMasterStorageDomain: Illegal action:
>>>>>>>> > (u'ba6bd862-c2b8-46e7-b2c8-91e4a5bb2047',)
>>>>>>>> > 2017-06-03 19:21:37,872+0200 ERROR (jsonrpc/2) [storage.StoragePool]
>>>>>>>> > Domain a5a6b0e7-fc3f-4838-8e26-c8b4d5e5e922 detach from MSD
>>>>>>>> > ba6bd862-c2b8-46e7-b2c8-91e4a5bb2047 Ver 1 failed. (sp:528)
>>>>>>>> > Traceback (most recent call last):
>>>>>>>> >   File "/usr/share/vdsm/storage/sp.py", line 525, in __cleanupDomains
>>>>>>>> >     self.detachSD(sdUUID)
>>>>>>>> >   File "/usr/lib/python2.7/site-packages/vdsm/storage/securable.py",
>>>>>>>> > line 79, in wrapper
>>>>>>>> >     return method(self, *args, **kwargs)
>>>>>>>> >   File "/usr/share/vdsm/storage/sp.py", line 1043, in detachSD
>>>>>>>> >     self.validateAttachedDomain(dom)
>>>>>>>> >   File "/usr/lib/python2.7/site-packages/vdsm/storage/securable.py",
>>>>>>>> > line 79, in wrapper
>>>>>>>> >     return method(self, *args, **kwargs)
>>>>>>>> >   File "/usr/share/vdsm/storage/sp.py", line 542, in
>>>>>>>> > validateAttachedDomain
>>>>>>>> >     self.validatePoolSD(dom.sdUUID)
>>>>>>>> >   File "/usr/lib/python2.7/site-packages/vdsm/storage/securable.py",
>>>>>>>> > line 79, in wrapper
>>>>>>>> >     return method(self, *args, **kwargs)
>>>>>>>> >   File "/usr/share/vdsm/storage/sp.py", line 535, in validatePoolSD
>>>>>>>> >     raise se.StorageDomainNotMemberOfPool(self.spUUID, sdUUID)
>>>>>>>> > StorageDomainNotMemberOfPool: Domain is not member in pool:
>>>>>>>> > u'pool=a1e7e9dd-0cf4-41ae-ba13-36297ed66309,
>>>>>>>> > domain=a5a6b0e7-fc3f-4838-8e26-c8b4d5e5e922'
>>>>>>>> > 2017-06-03 19:21:40,063+0200 ERROR (jsonrpc/2) [storage.TaskManager.Task]
>>>>>>>> > (Task='a2476a33-26f8-4ebd-876d-02fe5d13ef78') Unexpected error (task:870)
>>>>>>>> > Traceback (most recent call last):
>>>>>>>> >   File "/usr/share/vdsm/storage/task.py", line 877, in _run
>>>>>>>> >     return fn(*args, **kargs)
>>>>>>>> >   File "/usr/lib/python2.7/site-packages/vdsm/logUtils.py", line 52,
>>>>>>>> > in wrapper
>>>>>>>> >     res = f(*args, **kwargs)
>>>>>>>> >   File "/usr/share/vdsm/storage/hsm.py", line 959, in createStoragePool
>>>>>>>> >     leaseParams)
>>>>>>>> >   File "/usr/share/vdsm/storage/sp.py", line 652, in create
>>>>>>>> >     self.attachSD(sdUUID)
>>>>>>>> >   File "/usr/lib/python2.7/site-packages/vdsm/storage/securable.py",
>>>>>>>> > line 79, in wrapper
>>>>>>>> >     return method(self, *args, **kwargs)
>>>>>>>> >   File "/usr/share/vdsm/storage/sp.py", line 971, in attachSD
>>>>>>>> >     dom.acquireHostId(self.id)
>>>>>>>> >   File "/usr/share/vdsm/storage/sd.py", line 790, in acquireHostId
>>>>>>>> >     self._manifest.acquireHostId(hostId, async)
>>>>>>>> >   File "/usr/share/vdsm/storage/sd.py", line 449, in acquireHostId
>>>>>>>> >     self._domainLock.acquireHostId(hostId, async)
>>>>>>>> >   File "/usr/lib/python2.7/site-packages/vdsm/storage/clusterlock.py",
>>>>>>>> > line 297, in acquireHostId
>>>>>>>> >     raise se.AcquireHostIdFailure(self._sdUUID, e)
>>>>>>>> > AcquireHostIdFailure: Cannot acquire host id:
>>>>>>>> > (u'a5a6b0e7-fc3f-4838-8e26-c8b4d5e5e922', SanlockException(22,
>>>>>>>> > 'Sanlock lockspace add failure', 'Invalid argument'))
>>>>>>>> > 2017-06-03 19:21:40,067+0200 ERROR (jsonrpc/2) [storage.Dispatcher]
>>>>>>>> > {'status': {'message': "Cannot acquire host id:
>>>>>>>> > (u'a5a6b0e7-fc3f-4838-8e26-c8b4d5e5e922', SanlockException(22,
>>>>>>>> > 'Sanlock lockspace add failure', 'Invalid argument'))", 'code': 661}}
>>>>>>>> > (dispatcher:77)
>>>>>>>> >
>>>>>>>> > The gluster volume prepared for the engine storage is online and no
>>>>>>>> > split brain is reported. I don't understand what needs to be done to
>>>>>>>> > overcome this. Any ideas will be appreciated.
>>>>>>>> >
>>>>>>>> > Thank you,
>>>>>>>> > Alex
>>>>>>>> >
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>
>