I stand corrected.
Just realised the strace command I gave was wrong.
Here's what you would actually need to execute:
strace -y -ff -o <path-where-you-want-your-output-saved> <dd command here>
-Krutika
On Tue, Jun 6, 2017 at 3:20 PM, Krutika Dhananjay <kdhananj(a)redhat.com>
wrote:
OK.
So for the 'Transport endpoint is not connected' issue, could you share
the mount and brick logs?
Hmmm.. 'Invalid argument' error even on the root partition. What if you
change bs to 4096 and run?
The logs I showed in my earlier mail show that gluster is merely
returning the error it got from the disk file system where the
brick is hosted. But you're right that the offset 127488 is
not 4K-aligned.
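To make the alignment point concrete, here is a minimal sketch that does the arithmetic on the values from the failed posix_readv in the brick log quoted further down (offset=127488, size=512, buf=0x7f4083c0b000). The function names are made up for illustration; nothing here is gluster or sanlock code.

```python
# Values copied from the posix_readv failure in the brick log quoted in this thread.
OFFSET = 127488
SIZE = 512
BUF_ADDR = 0x7f4083c0b000


def misalignment(value, block_size):
    """Return value modulo block_size; 0 means the value is aligned."""
    return value % block_size


def check_request(offset, size, buf_addr, block_size):
    """Map each field of an O_DIRECT request to its remainder against block_size."""
    return {
        "offset": misalignment(offset, block_size),
        "size": misalignment(size, block_size),
        "buf": misalignment(buf_addr, block_size),
    }


if __name__ == "__main__":
    # Compare the same request against 512-byte and 4096-byte sectors.
    for block_size in (512, 4096):
        print(block_size, check_request(OFFSET, SIZE, BUF_ADDR, block_size))
```

Against 512 bytes every remainder is 0, so the request is fine for a 512-byte-sector device. Against 4096 bytes the buffer address is still aligned, but the offset and the size each leave a remainder of 512, which would be consistent with the 'Invalid argument' coming back from a 4K-sector disk.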
If the dd on /root worked for you with bs=4096, could you try the same
directly on gluster mount point on a dummy file and capture the strace
output of dd?
You can perhaps reuse your existing gluster volume by mounting it at
another location and doing the dd.
Here's what you need to execute:
strace -ff -T -p <pid-of-mount-process> -o <path-to-the-file-where-you-want-the-output-saved>
FWIW, here's something I found in the open(2) man page:
*Under Linux 2.4, transfer sizes, and the alignment of the user
buffer and the file offset must all be multiples of the logical block size
of the filesystem. Since Linux 2.6.0, alignment to the logical block size
of the underlying storage (typically 512 bytes) suffices. The
logical block size can be determined using the ioctl(2) BLKSSZGET operation
or from the shell using the command: blockdev --getss*
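For anyone who wants to do the BLKSSZGET query from a script rather than with blockdev, a rough sketch follows. It assumes Linux and read access to the block device (usually root). The request number 0x1268 is the value of BLKSSZGET from <linux/fs.h>; the helper names logical_block_size and direct_io_ok are hypothetical, chosen only for this example.

```python
import fcntl
import os
import struct

# Request number for BLKSSZGET from <linux/fs.h>: _IO(0x12, 104).
BLKSSZGET = 0x1268


def logical_block_size(device_path):
    """Return the logical sector size of a block device, like `blockdev --getss`."""
    fd = os.open(device_path, os.O_RDONLY)
    try:
        # The kernel writes a C int into the buffer; fcntl.ioctl returns it as bytes.
        raw = fcntl.ioctl(fd, BLKSSZGET, struct.pack("i", 0))
    finally:
        os.close(fd)
    return struct.unpack("i", raw)[0]


def direct_io_ok(offset, size, buf_addr, block_size):
    """True if offset, size and buffer address are all multiples of block_size."""
    return all(value % block_size == 0 for value in (offset, size, buf_addr))
```

Calling logical_block_size() on the device backing the brick (for example the /dev/gluster/engine LV mentioned later in this thread) and passing the result to direct_io_ok() together with the offset, size and buffer address from a failing request shows whether that request meets the rule quoted from the man page.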
-Krutika
On Tue, Jun 6, 2017 at 1:18 AM, Abi Askushi <rightkicktech(a)gmail.com>
wrote:
> Also when testing with dd i get the following:
>
> *Testing on the gluster mount: *
> dd if=/dev/zero of=/rhev/data-center/mnt/glusterSD/10.100.100.1:_engine/test2.img oflag=direct bs=512 count=1
> dd: error writing ‘/rhev/data-center/mnt/glusterSD/10.100.100.1:_engine/test2.img’: *Transport endpoint is not connected*
> 1+0 records in
> 0+0 records out
> 0 bytes (0 B) copied, 0.00336755 s, 0.0 kB/s
>
> *Testing on the /root directory (XFS): *
> dd if=/dev/zero of=/test2.img oflag=direct bs=512 count=1
> dd: error writing ‘/test2.img’: *Invalid argument*
> 1+0 records in
> 0+0 records out
> 0 bytes (0 B) copied, 0.000321239 s, 0.0 kB/s
>
> Seems that the gluster is trying to do the same and fails.
>
>
>
> On Mon, Jun 5, 2017 at 10:10 PM, Abi Askushi <rightkicktech(a)gmail.com>
> wrote:
>
>> The question that arises is what is needed to make gluster aware of the
>> 4K physical sectors presented to it (the logical sector is also 4K). The
>> offset (127488) in the log does not seem to be aligned to 4K.
>>
>> Alex
>>
>> On Mon, Jun 5, 2017 at 2:47 PM, Abi Askushi <rightkicktech(a)gmail.com>
>> wrote:
>>
>>> Hi Krutika,
>>>
>>> I am saying that I am facing this issue with 4k drives. I never
>>> encountered this issue with 512 drives.
>>>
>>> Alex
>>>
>>> On Jun 5, 2017 14:26, "Krutika Dhananjay" <kdhananj(a)redhat.com> wrote:
>>>
>>>> This seems like a case of O_DIRECT reads and writes gone wrong,
>>>> judging by the 'Invalid argument' errors.
>>>>
>>>> The two operations that have failed on gluster bricks are:
>>>>
>>>> [2017-06-05 09:40:39.428979] E [MSGID: 113072]
>>>> [posix.c:3453:posix_writev] 0-engine-posix: write failed: offset 0,
>>>> [Invalid argument]
>>>> [2017-06-05 09:41:00.865760] E [MSGID: 113040]
>>>> [posix.c:3178:posix_readv] 0-engine-posix: read failed on
>>>> gfid=8c94f658-ac3c-4e3a-b368-8c038513a914, fd=0x7f408584c06c,
>>>> offset=127488 size=512, buf=0x7f4083c0b000 [Invalid argument]
>>>>
>>>> But then, both the write and the read have 512byte-aligned offset,
>>>> size and buf address (which is correct).
>>>>
>>>> Are you saying you don't see this issue with 4K block-size?
>>>>
>>>> -Krutika
>>>>
>>>> On Mon, Jun 5, 2017 at 3:21 PM, Abi Askushi <rightkicktech(a)gmail.com>
>>>> wrote:
>>>>
>>>>> Hi Sahina,
>>>>>
>>>>> Attached are the logs. Let me know if something else is needed.
>>>>>
>>>>> I have 5 disks (with 4K physical sector) in RAID5. The RAID has 64K
>>>>> stripe size at the moment.
>>>>> I have prepared the storage as below:
>>>>>
>>>>> pvcreate --dataalignment 256K /dev/sda4
>>>>> vgcreate --physicalextentsize 256K gluster /dev/sda4
>>>>>
>>>>> lvcreate -n engine --size 120G gluster
>>>>> mkfs.xfs -f -i size=512 /dev/gluster/engine
>>>>>
>>>>> Thanx,
>>>>> Alex
>>>>>
>>>>> On Mon, Jun 5, 2017 at 12:14 PM, Sahina Bose <sabose(a)redhat.com>
>>>>> wrote:
>>>>>
>>>>>> Can we have the gluster mount logs and brick logs to check if it's
>>>>>> the same issue?
>>>>>>
>>>>>> On Sun, Jun 4, 2017 at 11:21 PM, Abi Askushi <
>>>>>> rightkicktech(a)gmail.com> wrote:
>>>>>>
>>>>>>> I clean installed everything and ran into the same.
>>>>>>> I then ran gdeploy and encountered the same issue when deploying
>>>>>>> engine.
>>>>>>> Seems that gluster (?) doesn't like 4K sector drives. I am not sure
>>>>>>> if it has to do with alignment. The weird thing is that gluster
>>>>>>> volumes are all ok, replicating normally and no split brain is reported.
>>>>>>>
>>>>>>> The solution to the mentioned bug (1386443
>>>>>>> <https://bugzilla.redhat.com/show_bug.cgi?id=1386443>) was to
>>>>>>> format with 512 sector size, which for my case is not an option:
>>>>>>>
>>>>>>> mkfs.xfs -f -i size=512 -s size=512 /dev/gluster/engine
>>>>>>> illegal sector size 512; hw sector is 4096
>>>>>>>
>>>>>>> Is there any workaround to address this?
>>>>>>>
>>>>>>> Thanx,
>>>>>>> Alex
>>>>>>>
>>>>>>>
>>>>>>> On Sun, Jun 4, 2017 at 5:48 PM, Abi Askushi <
>>>>>>> rightkicktech(a)gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi Maor,
>>>>>>>>
>>>>>>>> My disks have a 4K block size, and from this bug it seems that
>>>>>>>> gluster replica needs a 512B block size.
>>>>>>>> Is there a way to make gluster function with 4K drives?
>>>>>>>>
>>>>>>>> Thank you!
>>>>>>>>
>>>>>>>> On Sun, Jun 4, 2017 at 2:34 PM, Maor Lipchuk <mlipchuk(a)redhat.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi Alex,
>>>>>>>>>
>>>>>>>>> I saw a bug that might be related to the issue you encountered at
>>>>>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1386443
>>>>>>>>>
>>>>>>>>> Sahina, maybe you have any advice? Do you think that BZ1386443 is
>>>>>>>>> related?
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> Maor
>>>>>>>>>
>>>>>>>>> On Sat, Jun 3, 2017 at 8:45 PM, Abi Askushi <
>>>>>>>>> rightkicktech(a)gmail.com> wrote:
>>>>>>>>> > Hi All,
>>>>>>>>> >
>>>>>>>>> > I have successfully installed oVirt (version 4.1) several times
>>>>>>>>> > with 3 nodes on top of glusterfs.
>>>>>>>>> >
>>>>>>>>> > This time, when trying to configure the same setup, I am facing the
>>>>>>>>> > following issue which doesn't seem to go away. During installation
>>>>>>>>> > I get the error:
>>>>>>>>> >
>>>>>>>>> > Failed to execute stage 'Misc configuration': Cannot acquire host id:
>>>>>>>>> > (u'a5a6b0e7-fc3f-4838-8e26-c8b4d5e5e922', SanlockException(22, 'Sanlock
>>>>>>>>> > lockspace add failure', 'Invalid argument'))
>>>>>>>>> >
>>>>>>>>> > The only difference in this setup is that instead of standard
>>>>>>>>> > partitioning I have GPT partitioning and the disks have 4K block size
>>>>>>>>> > instead of 512.
>>>>>>>>> >
>>>>>>>>> > The /var/log/sanlock.log has the following lines:
>>>>>>>>> >
>>>>>>>>> > 2017-06-03 19:21:15+0200 23450 [943]: s9 lockspace ba6bd862-c2b8-46e7-b2c8-91e4a5bb2047:250:/rhev/data-center/mnt/_var_lib_ovirt-hosted-engin-setup_tmptjkIDI/ba6bd862-c2b8-46e7-b2c8-91e4a5bb2047/dom_md/ids:0
>>>>>>>>> > 2017-06-03 19:21:36+0200 23471 [944]: s9:r5 resource ba6bd862-c2b8-46e7-b2c8-91e4a5bb2047:SDM:/rhev/data-center/mnt/_var_lib_ovirt-hosted-engine-setup_tmptjkIDI/ba6bd862-c2b8-46e7-b2c8-91e4a5bb2047/dom_md/leases:1048576 for 2,9,23040
>>>>>>>>> > 2017-06-03 19:21:36+0200 23471 [943]: s10 lockspace a5a6b0e7-fc3f-4838-8e26-c8b4d5e5e922:250:/rhev/data-center/mnt/glusterSD/10.100.100.1:_engine/a5a6b0e7-fc3f-4838-8e26-c8b4d5e5e922/dom_md/ids:0
>>>>>>>>> > 2017-06-03 19:21:36+0200 23471 [23522]: a5a6b0e7 aio collect RD 0x7f59b00008c0:0x7f59b00008d0:0x7f59b0101000 result -22:0 match res
>>>>>>>>> > 2017-06-03 19:21:36+0200 23471 [23522]: read_sectors delta_leader offset 127488 rv -22 /rhev/data-center/mnt/glusterSD/10.100.100.1:_engine/a5a6b0e7-fc3f-4838-8e26-c8b4d5e5e922/dom_md/ids
>>>>>>>>> > 2017-06-03 19:21:37+0200 23472 [930]: s9 host 250 1 23450 88c2244c-a782-40ed-9560-6cfa4d46f853.v0.neptune
>>>>>>>>> > 2017-06-03 19:21:37+0200 23472 [943]: s10 add_lockspace fail result -22
>>>>>>>>> >
>>>>>>>>> > And /var/log/vdsm/vdsm.log says:
>>>>>>>>> >
>>>>>>>>> > 2017-06-03 19:19:38,176+0200 WARN (jsonrpc/3) [storage.StorageServer.MountConnection] Using user specified backup-volfile-servers option (storageServer:253)
>>>>>>>>> > 2017-06-03 19:21:12,379+0200 WARN (periodic/1) [throttled] MOM not available. (throttledlog:105)
>>>>>>>>> > 2017-06-03 19:21:12,380+0200 WARN (periodic/1) [throttled] MOM not available, KSM stats will be missing. (throttledlog:105)
>>>>>>>>> > 2017-06-03 19:21:14,714+0200 WARN (jsonrpc/1) [storage.StorageServer.MountConnection] Using user specified backup-volfile-servers option (storageServer:253)
>>>>>>>>> > 2017-06-03 19:21:15,515+0200 ERROR (jsonrpc/4) [storage.initSANLock] Cannot initialize SANLock for domain a5a6b0e7-fc3f-4838-8e26-c8b4d5e5e922 (clusterlock:238)
>>>>>>>>> > Traceback (most recent call last):
>>>>>>>>> >   File "/usr/lib/python2.7/site-packages/vdsm/storage/clusterlock.py", line 234, in initSANLock
>>>>>>>>> >     sanlock.init_lockspace(sdUUID, idsPath)
>>>>>>>>> > SanlockException: (107, 'Sanlock lockspace init failure', 'Transport endpoint is not connected')
>>>>>>>>> > 2017-06-03 19:21:15,515+0200 WARN (jsonrpc/4) [storage.StorageDomainManifest] lease did not initialize successfully (sd:557)
>>>>>>>>> > Traceback (most recent call last):
>>>>>>>>> >   File "/usr/share/vdsm/storage/sd.py", line 552, in initDomainLock
>>>>>>>>> >     self._domainLock.initLock(self.getDomainLease())
>>>>>>>>> >   File "/usr/lib/python2.7/site-packages/vdsm/storage/clusterlock.py", line 271, in initLock
>>>>>>>>> >     initSANLock(self._sdUUID, self._idsPath, lease)
>>>>>>>>> >   File "/usr/lib/python2.7/site-packages/vdsm/storage/clusterlock.py", line 239, in initSANLock
>>>>>>>>> >     raise se.ClusterLockInitError()
>>>>>>>>> > ClusterLockInitError: Could not initialize cluster lock: ()
>>>>>>>>> > 2017-06-03 19:21:37,867+0200 ERROR (jsonrpc/2) [storage.StoragePool] Create pool hosted_datacenter canceled (sp:655)
>>>>>>>>> > Traceback (most recent call last):
>>>>>>>>> >   File "/usr/share/vdsm/storage/sp.py", line 652, in create
>>>>>>>>> >     self.attachSD(sdUUID)
>>>>>>>>> >   File "/usr/lib/python2.7/site-packages/vdsm/storage/securable.py", line 79, in wrapper
>>>>>>>>> >     return method(self, *args, **kwargs)
>>>>>>>>> >   File "/usr/share/vdsm/storage/sp.py", line 971, in attachSD
>>>>>>>>> >     dom.acquireHostId(self.id)
>>>>>>>>> >   File "/usr/share/vdsm/storage/sd.py", line 790, in acquireHostId
>>>>>>>>> >     self._manifest.acquireHostId(hostId, async)
>>>>>>>>> >   File "/usr/share/vdsm/storage/sd.py", line 449, in acquireHostId
>>>>>>>>> >     self._domainLock.acquireHostId(hostId, async)
>>>>>>>>> >   File "/usr/lib/python2.7/site-packages/vdsm/storage/clusterlock.py", line 297, in acquireHostId
>>>>>>>>> >     raise se.AcquireHostIdFailure(self._sdUUID, e)
>>>>>>>>> > AcquireHostIdFailure: Cannot acquire host id: (u'a5a6b0e7-fc3f-4838-8e26-c8b4d5e5e922', SanlockException(22, 'Sanlock lockspace add failure', 'Invalid argument'))
>>>>>>>>> > 2017-06-03 19:21:37,870+0200 ERROR (jsonrpc/2) [storage.StoragePool] Domain ba6bd862-c2b8-46e7-b2c8-91e4a5bb2047 detach from MSD ba6bd862-c2b8-46e7-b2c8-91e4a5bb2047 Ver 1 failed. (sp:528)
>>>>>>>>> > Traceback (most recent call last):
>>>>>>>>> >   File "/usr/share/vdsm/storage/sp.py", line 525, in __cleanupDomains
>>>>>>>>> >     self.detachSD(sdUUID)
>>>>>>>>> >   File "/usr/lib/python2.7/site-packages/vdsm/storage/securable.py", line 79, in wrapper
>>>>>>>>> >     return method(self, *args, **kwargs)
>>>>>>>>> >   File "/usr/share/vdsm/storage/sp.py", line 1046, in detachSD
>>>>>>>>> >     raise se.CannotDetachMasterStorageDomain(sdUUID)
>>>>>>>>> > CannotDetachMasterStorageDomain: Illegal action: (u'ba6bd862-c2b8-46e7-b2c8-91e4a5bb2047',)
>>>>>>>>> > 2017-06-03 19:21:37,872+0200 ERROR (jsonrpc/2) [storage.StoragePool] Domain a5a6b0e7-fc3f-4838-8e26-c8b4d5e5e922 detach from MSD ba6bd862-c2b8-46e7-b2c8-91e4a5bb2047 Ver 1 failed. (sp:528)
>>>>>>>>> > Traceback (most recent call last):
>>>>>>>>> >   File "/usr/share/vdsm/storage/sp.py", line 525, in __cleanupDomains
>>>>>>>>> >     self.detachSD(sdUUID)
>>>>>>>>> >   File "/usr/lib/python2.7/site-packages/vdsm/storage/securable.py", line 79, in wrapper
>>>>>>>>> >     return method(self, *args, **kwargs)
>>>>>>>>> >   File "/usr/share/vdsm/storage/sp.py", line 1043, in detachSD
>>>>>>>>> >     self.validateAttachedDomain(dom)
>>>>>>>>> >   File "/usr/lib/python2.7/site-packages/vdsm/storage/securable.py", line 79, in wrapper
>>>>>>>>> >     return method(self, *args, **kwargs)
>>>>>>>>> >   File "/usr/share/vdsm/storage/sp.py", line 542, in validateAttachedDomain
>>>>>>>>> >     self.validatePoolSD(dom.sdUUID)
>>>>>>>>> >   File "/usr/lib/python2.7/site-packages/vdsm/storage/securable.py", line 79, in wrapper
>>>>>>>>> >     return method(self, *args, **kwargs)
>>>>>>>>> >   File "/usr/share/vdsm/storage/sp.py", line 535, in validatePoolSD
>>>>>>>>> >     raise se.StorageDomainNotMemberOfPool(self.spUUID, sdUUID)
>>>>>>>>> > StorageDomainNotMemberOfPool: Domain is not member in pool: u'pool=a1e7e9dd-0cf4-41ae-ba13-36297ed66309, domain=a5a6b0e7-fc3f-4838-8e26-c8b4d5e5e922'
>>>>>>>>> > 2017-06-03 19:21:40,063+0200 ERROR (jsonrpc/2) [storage.TaskManager.Task] (Task='a2476a33-26f8-4ebd-876d-02fe5d13ef78') Unexpected error (task:870)
>>>>>>>>> > Traceback (most recent call last):
>>>>>>>>> >   File "/usr/share/vdsm/storage/task.py", line 877, in _run
>>>>>>>>> >     return fn(*args, **kargs)
>>>>>>>>> >   File "/usr/lib/python2.7/site-packages/vdsm/logUtils.py", line 52, in wrapper
>>>>>>>>> >     res = f(*args, **kwargs)
>>>>>>>>> >   File "/usr/share/vdsm/storage/hsm.py", line 959, in createStoragePool
>>>>>>>>> >     leaseParams)
>>>>>>>>> >   File "/usr/share/vdsm/storage/sp.py", line 652, in create
>>>>>>>>> >     self.attachSD(sdUUID)
>>>>>>>>> >   File "/usr/lib/python2.7/site-packages/vdsm/storage/securable.py", line 79, in wrapper
>>>>>>>>> >     return method(self, *args, **kwargs)
>>>>>>>>> >   File "/usr/share/vdsm/storage/sp.py", line 971, in attachSD
>>>>>>>>> >     dom.acquireHostId(self.id)
>>>>>>>>> >   File "/usr/share/vdsm/storage/sd.py", line 790, in acquireHostId
>>>>>>>>> >     self._manifest.acquireHostId(hostId, async)
>>>>>>>>> >   File "/usr/share/vdsm/storage/sd.py", line 449, in acquireHostId
>>>>>>>>> >     self._domainLock.acquireHostId(hostId, async)
>>>>>>>>> >   File "/usr/lib/python2.7/site-packages/vdsm/storage/clusterlock.py", line 297, in acquireHostId
>>>>>>>>> >     raise se.AcquireHostIdFailure(self._sdUUID, e)
>>>>>>>>> > AcquireHostIdFailure: Cannot acquire host id: (u'a5a6b0e7-fc3f-4838-8e26-c8b4d5e5e922', SanlockException(22, 'Sanlock lockspace add failure', 'Invalid argument'))
>>>>>>>>> > 2017-06-03 19:21:40,067+0200 ERROR (jsonrpc/2) [storage.Dispatcher] {'status': {'message': "Cannot acquire host id: (u'a5a6b0e7-fc3f-4838-8e26-c8b4d5e5e922', SanlockException(22, 'Sanlock lockspace add failure', 'Invalid argument'))", 'code': 661}} (dispatcher:77)
>>>>>>>>> >
>>>>>>>>> > The gluster volume prepared for engine storage is online and no
>>>>>>>>> > split brain is reported. I don't understand what needs to be done
>>>>>>>>> > to overcome this. Any idea will be appreciated.
>>>>>>>>> >
>>>>>>>>> > Thank you,
>>>>>>>>> > Alex
>>>>>>>>> >
>>>>>>>>> > _______________________________________________
>>>>>>>>> > Users mailing list
>>>>>>>>> > Users(a)ovirt.org
>>>>>>>>> > http://lists.ovirt.org/mailman/listinfo/users
>>>>>>>>> >