will resolve the issue of running multiple brick instances for the same brick.
As we can see in the logs below, glusterd is trying to start the same brick
instance twice at the same time:
[2019-04-01 10:23:21.752401] I [glusterd-utils.c:6301:glusterd_brick_start] 0-management: starting a fresh brick process for brick /data/gfs/bricks/brick1/ovirt-engine
[2019-04-01 10:23:30.348091] I [glusterd-utils.c:6301:glusterd_brick_start] 0-management: starting a fresh brick process for brick /data/gfs/bricks/brick1/ovirt-engine
[2019-04-01 10:24:13.353396] I [glusterd-utils.c:6301:glusterd_brick_start] 0-management: starting a fresh brick process for brick /data/gfs/bricks/brick1/ovirt-engine
[2019-04-01 10:24:24.253764] I [glusterd-utils.c:6301:glusterd_brick_start] 0-management: starting a fresh brick process for brick /data/gfs/bricks/brick1/ovirt-engine
We are seeing the below message between the start of the two instances:
The message "E [MSGID: 101012] [common-utils.c:4075:gf_is_service_running] 0-: Unable to read pidfile: /var/run/gluster/vols/ovirt-engine/10.32.9.5-data-gfs-bricks-brick1-ovirt-engine.pid" repeated 2 times between [2019-04-01 10:23:21.748492] and [2019-04-01 10:23:21.752432]
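As a quick check on an affected node, the duplicate instances can be confirmed
with something like the following (brick and pidfile paths are taken from the
logs above; adjust them for other bricks):

ps -ef | grep '[g]lusterfsd' | grep data-gfs-bricks-brick1-ovirt-engine
cat /var/run/gluster/vols/ovirt-engine/10.32.9.5-data-gfs-bricks-brick1-ovirt-engine.pid

If more than one glusterfsd line shows up for the same --brick-name, glusterd
has hit this race.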
I will backport the same.
Thanks,
Mohit Agrawal
On Wed, Apr 3, 2019 at 3:58 PM Olaf Buitelaar <olaf.buitelaar(a)gmail.com>
wrote:
Dear Mohit,
Sorry, I thought Krutika was referring to the ovirt-kube brick logs. Due to
the large size (18 MB compressed), I've placed the files here:
https://edgecastcdn.net/0004FA/files/bricklogs.tar.bz2
Also, I see I've attached the wrong files; I intended to
attach profile_data4.txt | profile_data3.txt.
Sorry for the confusion.
Thanks Olaf
On Wed, Apr 3, 2019 at 04:56, Mohit Agrawal <moagrawa(a)redhat.com> wrote:
> Hi Olaf,
>
> As per the currently attached "multi-glusterfsd-vol3.txt |
> multi-glusterfsd-vol4.txt", multiple processes are shown running for the
> "ovirt-core" and "ovirt-engine" brick names, but there are no logs
> available in bricklogs.zip specific to these bricks; bricklogs.zip
> has a dump of the ovirt-kube logs only.
>
> Kindly share the brick logs specific to the bricks "ovirt-core" and
> "ovirt-engine", and share the glusterd logs as well.
>
> Regards
> Mohit Agrawal
>
> On Tue, Apr 2, 2019 at 9:18 PM Olaf Buitelaar <olaf.buitelaar(a)gmail.com>
> wrote:
>
>> Dear Krutika,
>>
>> 1.
>> I've changed the volume settings; write performance seems to have increased
>> somewhat, however the profile doesn't really support that, since latencies
>> increased. Read performance, on the other hand, has diminished, which does
>> seem to be supported by the profile runs (attached).
>> Also, the IO does seem to behave more consistently than before.
>> I don't really understand the idea behind these settings; maybe you can
>> explain why these suggestions are good?
>> These settings seem to avoid as much local caching and access as possible
>> and push everything to the gluster processes, while I would expect local
>> access and local caches to be a good thing, since they would lead to less
>> network or disk access.
>> I tried to investigate these settings a bit more, and this is what I
>> understood of them:
>> - network.remote-dio: when on, it seems to ignore the O_DIRECT flag in
>> the client, thus causing the files to be cached and buffered in the page
>> cache on the client. I would expect this to be a good thing, especially if
>> the server process would access the same page cache?
>> At least that is what I grasp from this commit:
>> https://review.gluster.org/#/c/glusterfs/+/4206/2/xlators/protocol/client...
>> line 867.
>> I also found this commit:
>> https://github.com/gluster/glusterfs/commit/06c4ba589102bf92c58cd9fba5c60...
>> suggesting remote-dio actually improves performance; I'm not sure whether
>> it's a write or read benchmark.
>> When a file is opened with O_DIRECT, it will also disable the
>> write-behind functionality.
>>
>> - performance.strict-o-direct: when on, the AFR will not ignore the
>> O_DIRECT flag and will invoke fop_writev_stub with the wb_writev_helper,
>> which seems to stack the operation; no idea why that is. But generally I
>> suppose not ignoring the O_DIRECT flag in the AFR is a good thing when a
>> process requests O_DIRECT. So this makes sense to me.
>>
>> - cluster.choose-local: when off, it doesn't prefer the local node, but
>> would always choose a brick. Since it's a 9-node cluster with 3
>> subvolumes, only 1/3 could end up local, and the other 2/3 would be
>> pushed to external nodes anyway. Or am I making a totally wrong assumption
>> here?
>>
>> It seems this config is moving toward the gluster-block config side of
>> things, which does make sense.
>> Since we're running quite a few mysql instances, which open their files
>> with O_DIRECT I believe, it would mean the only layer of cache is within
>> mysql itself, which you could argue is a good thing. But I would expect that
>> a little write-behind buffering, and maybe some of the data cached within
>> gluster, would alleviate things a bit on gluster's side. But I wouldn't know
>> if that's the correct mindset, so I might be totally off here.
>> Also, I would expect these gluster v set <VOL> commands to be online
>> operations, but somehow the bricks went down after applying these changes.
>> What appears to have happened is that after the update the brick process
>> was restarted, but due to the multiple-brick-process-start issue, multiple
>> processes were started, and the brick didn't come online again.
>> However, I'll try to reproduce this, since I would like to test with
>> cluster.choose-local: on and see how performance compares, and hopefully
>> collect some useful info when it occurs.
>> Question: are network.remote-dio and performance.strict-o-direct
>> mutually exclusive settings, or can they both be on?
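>> (For reference, the values currently in effect for both options can be
>> checked with something like
>> gluster volume get <VOL> network.remote-dio
>> gluster volume get <VOL> performance.strict-o-direct
>> assuming "gluster volume get" is available on this gluster version.)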
>>
>> 2. I've attached all brick logs; the only relevant thing I found was:
>> [2019-03-28 20:20:07.170452] I [MSGID: 113030] [posix-entry-ops.c:1146:posix_unlink] 0-ovirt-kube-posix: open-fd-key-status: 0 for /data/gfs/bricks/brick1/ovirt-kube/.shard/a38d64bc-a28b-4ee1-a0bb-f919e7a1022c.109886
>> [2019-03-28 20:20:07.170491] I [MSGID: 113031] [posix-entry-ops.c:1053:posix_skip_non_linkto_unlink] 0-posix: linkto_xattr status: 0 for /data/gfs/bricks/brick1/ovirt-kube/.shard/a38d64bc-a28b-4ee1-a0bb-f919e7a1022c.109886
>> [2019-03-28 20:20:07.248480] I [MSGID: 113030] [posix-entry-ops.c:1146:posix_unlink] 0-ovirt-kube-posix: open-fd-key-status: 0 for /data/gfs/bricks/brick1/ovirt-kube/.shard/a38d64bc-a28b-4ee1-a0bb-f919e7a1022c.109886
>> [2019-03-28 20:20:07.248491] I [MSGID: 113031] [posix-entry-ops.c:1053:posix_skip_non_linkto_unlink] 0-posix: linkto_xattr status: 0 for /data/gfs/bricks/brick1/ovirt-kube/.shard/a38d64bc-a28b-4ee1-a0bb-f919e7a1022c.109886
>>
>> Thanks Olaf
>>
>> P.S. Sorry, I needed to resend since it exceeded the file limit.
>>
>> On Mon, Apr 1, 2019 at 07:56, Krutika Dhananjay <kdhananj(a)redhat.com> wrote:
>>
>>> Adding back gluster-users
>>> Comments inline ...
>>>
>>> On Fri, Mar 29, 2019 at 8:11 PM Olaf Buitelaar <olaf.buitelaar(a)gmail.com> wrote:
>>>
>>>> Dear Krutika,
>>>>
>>>>
>>>>
>>>> 1. I’ve made 2 profile runs of around 10 minutes (see files
>>>> profile_data.txt and profile_data2.txt). Looking at them, most time seems
>>>> to be spent in the fops fsync and readdirp.
>>>>
>>>> Unfortunately I don’t have the profile info for the 3.12.15 version, so
>>>> it’s a bit hard to compare.
>>>>
>>>> One additional thing I do notice: on 1 machine (10.32.9.5) the iowait
>>>> time increased a lot; from an average below 1% it’s now around 12%
>>>> after the upgrade.
>>>>
>>>> So the first suspicion would be that lightning strikes twice and I now
>>>> also have a bad disk, but that doesn’t appear to be the case, since all
>>>> SMART statuses report ok.
>>>>
>>>> Also, dd shows performance I would more or less expect:
>>>>
>>>> dd if=/dev/zero of=/data/test_file bs=100M count=1 oflag=dsync
>>>>
>>>> 1+0 records in
>>>>
>>>> 1+0 records out
>>>>
>>>> 104857600 bytes (105 MB) copied, 0.686088 s, 153 MB/s
>>>>
>>>> dd if=/dev/zero of=/data/test_file bs=1G count=1 oflag=dsync
>>>>
>>>> 1+0 records in
>>>>
>>>> 1+0 records out
>>>>
>>>> 1073741824 bytes (1.1 GB) copied, 7.61138 s, 141 MB/s
>>>>
>>>> dd if=/dev/urandom of=/data/test_file bs=1024 count=1000000
>>>>
>>>> 1000000+0 records in
>>>>
>>>> 1000000+0 records out
>>>>
>>>> 1024000000 bytes (1.0 GB) copied, 6.35051 s, 161 MB/s
>>>>
>>>> dd if=/dev/zero of=/data/test_file bs=1024 count=1000000
>>>>
>>>> 1000000+0 records in
>>>>
>>>> 1000000+0 records out
>>>>
>>>> 1024000000 bytes (1.0 GB) copied, 1.6899 s, 606 MB/s
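>>>> For comparison, an O_DIRECT variant of the same test (bypassing the page
>>>> cache entirely) would be something like:
>>>> dd if=/dev/zero of=/data/test_file bs=100M count=1 oflag=direct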
>>>>
>>>> When I disable this brick (service glusterd stop; pkill glusterfsd),
>>>> performance in gluster is better, but not on par with what it was. Also,
>>>> the cpu usage on the “neighbor” nodes which host the other bricks in the
>>>> same subvolume increases quite a lot in this case, which I wouldn’t expect
>>>> actually, since they shouldn't handle much more work except flagging
>>>> shards to heal. Iowait also goes to idle once gluster is stopped, so it’s
>>>> for sure gluster which is waiting on io.
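>>>> As a side note: to double-check that the iowait really comes from gluster,
>>>> the per-process io could be compared with and without glusterfsd running,
>>>> for example with (assuming sysstat and iotop are installed on the node):
>>>> iostat -x 2 5
>>>> pidstat -d 2 5
>>>> iotop -o -b -n 3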
>>>>
>>>>
>>>>
>>>
>>> So I see that FSYNC %-latency is on the higher side. And I also noticed
>>> you don't have direct-io options enabled on the volume.
>>> Could you set the following options on the volume -
>>> # gluster volume set <VOLNAME> network.remote-dio off
>>> # gluster volume set <VOLNAME> performance.strict-o-direct on
>>> and also disable choose-local
>>> # gluster volume set <VOLNAME> cluster.choose-local off
>>>
>>> Let me know if this helps.
>>>
>>>> 2. I’ve attached the mnt log and volume info, but I couldn’t find
>>>> anything relevant in those logs. I think this is because we run the VM’s
>>>> with libgfapi:
>>>>
>>>> [root@ovirt-host-01 ~]# engine-config -g LibgfApiSupported
>>>>
>>>> LibgfApiSupported: true version: 4.2
>>>>
>>>> LibgfApiSupported: true version: 4.1
>>>>
>>>> LibgfApiSupported: true version: 4.3
>>>>
>>>> And I can confirm the qemu process is invoked with the gluster://
>>>> address for the images.
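>>>> (For example, a quick check like
>>>> ps -ef | grep qemu | grep -o 'gluster://[^ ,]*' | sort -u
>>>> lists the gluster:// image URLs on the running qemu command lines; the
>>>> exact process name and drive syntax may differ per ovirt version.)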
>>>>
>>>> The message is logged in the /var/lib/libvirt/qemu/<machine> file,
>>>> which I’ve also included. For a sample case, see around 2019-03-28
>>>> 20:20:07, which has the error: E [MSGID: 133010]
>>>> [shard.c:2294:shard_common_lookup_shards_cbk] 0-ovirt-kube-shard: Lookup
>>>> on shard 109886 failed. Base file gfid = a38d64bc-a28b-4ee1-a0bb-f919e7a1022c
>>>> [Stale file handle]
>>>>
>>>
>>> Could you also attach the brick logs for this volume?
>>>
>>>
>>>>
>>>> 3. Yes, I see multiple instances for the same brick directory, like:
>>>>
>>>> /usr/sbin/glusterfsd -s 10.32.9.6 --volfile-id ovirt-core.10.32.9.6.data-gfs-bricks-brick1-ovirt-core -p /var/run/gluster/vols/ovirt-core/10.32.9.6-data-gfs-bricks-brick1-ovirt-core.pid -S /var/run/gluster/452591c9165945d9.socket --brick-name /data/gfs/bricks/brick1/ovirt-core -l /var/log/glusterfs/bricks/data-gfs-bricks-brick1-ovirt-core.log --xlator-option *-posix.glusterd-uuid=fb513da6-f3bd-4571-b8a2-db5efaf60cc1 --process-name brick --brick-port 49154 --xlator-option ovirt-core-server.listen-port=49154
>>>>
>>>>
>>>>
>>>> I’ve made an export of the output of ps from the time I observed these
>>>> multiple processes.
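>>>> A quick way to group that ps output by brick, counting glusterfsd
>>>> processes per --brick-name argument, would be something like
>>>> ps -eo args | grep '[g]lusterfsd' | grep -o 'brick-name [^ ]*' | sort | uniq -c
>>>> where any count above 1 indicates a duplicated brick process.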
>>>>
>>>> In addition to the brick_mux bug as noted by Atin, I might also have
>>>> another possible cause: as ovirt moves nodes from non-operational state or
>>>> maintenance state to active/activating, it also seems to restart gluster;
>>>> however, I don’t have direct proof for this theory.
>>>>
>>>>
>>>>
>>>
>>> +Atin Mukherjee <amukherj(a)redhat.com> ^^
>>> +Mohit Agrawal <moagrawa(a)redhat.com> ^^
>>>
>>> -Krutika
>>>
>>> Thanks Olaf
>>>>
>>>> On Fri, Mar 29, 2019 at 10:03, Sandro Bonazzola <sbonazzo(a)redhat.com> wrote:
>>>>
>>>>>
>>>>>
>>>>> On Thu, Mar 28, 2019 at 17:48, <olaf.buitelaar(a)gmail.com> wrote:
>>>>>
>>>>>> Dear All,
>>>>>>
>>>>>> I wanted to share my experience upgrading from 4.2.8 to 4.3.1. While
>>>>>> previous upgrades from 4.1 to 4.2 etc. went rather smoothly, this one was
>>>>>> a different experience. After first trying a test upgrade on a 3-node
>>>>>> setup, which went fine, I headed to upgrade the 9-node production
>>>>>> platform, unaware of the backward compatibility issues between gluster
>>>>>> 3.12.15 -> 5.3. After upgrading 2 nodes, the HA engine stopped and
>>>>>> wouldn't start. Vdsm wasn't able to mount the engine storage domain,
>>>>>> since /dom_md/metadata was missing or couldn't be accessed. I restored
>>>>>> this file by getting a good copy from the underlying bricks, removing the
>>>>>> file (and the corresponding gfid's) from the underlying bricks where it
>>>>>> was 0 bytes and marked with the sticky bit, removing the file from the
>>>>>> mount point, and copying the file back onto the mount point. After
>>>>>> manually mounting the engine domain, manually creating the corresponding
>>>>>> symbolic links in /rhev/data-center and /var/run/vdsm/storage, and fixing
>>>>>> the ownership back to vdsm.kvm (which was root.root), I was able to start
>>>>>> the HA engine again. Since the engine was up again, and things seemed
>>>>>> rather unstable, I decided to continue the upgrade on the other nodes;
>>>>>> suspecting an incompatibility in gluster versions, I thought it would be
>>>>>> best to have them all on the same version rather soonish. However, things
>>>>>> went from bad to worse: the engine stopped again, and all vm’s stopped
>>>>>> working as well. So on a machine outside the setup I restored a backup of
>>>>>> the engine taken from version 4.2.8 just before the upgrade. With this
>>>>>> engine I was at least able to start some vm’s again, and finalize the
>>>>>> upgrade. Once upgraded, things didn’t stabilize, and I also lost 2 vm’s
>>>>>> during the process due to image corruption. After figuring out gluster
>>>>>> 5.3 had quite some issues, I was lucky to see that gluster 5.5 was about
>>>>>> to be released; the moment the RPM’s were available I installed those.
>>>>>> This helped a lot in terms of stability, for which I’m very grateful!
>>>>>> However, the performance is unfortunately terrible; it’s about 15% of
>>>>>> what the performance was running gluster 3.12.15. It’s strange since a
>>>>>> simple dd shows ok performance, but our actual workload doesn’t, while I
>>>>>> would expect the performance to be better due to all the improvements
>>>>>> made since gluster version 3.12. Does anybody share the same experience?
>>>>>> I really hope gluster 6 will soon be tested with ovirt and released,
>>>>>> and things start to perform and stabilize again... like the good old
>>>>>> days. Of course, if I can do anything, I’m happy to help.
>>>>>>
>>>>>
>>>>> Opened https://bugzilla.redhat.com/show_bug.cgi?id=1693998 to track
>>>>> the rebase on Gluster 6.
>>>>>
>>>>>
>>>>>
>>>>>>
>>>>>> I think this is the short list of issues we have after the
>>>>>> migration:
>>>>>> Gluster 5.5:
>>>>>> - Poor performance for our workload (mostly write dependent)
>>>>>> - VM’s randomly pause on unknown storage errors, which are "stale
>>>>>> files". Corresponding log: Lookup on shard 797 failed. Base file
>>>>>> gfid = 8a27b91a-ff02-42dc-bd4c-caa019424de8 [Stale file handle]
>>>>>> - Some files are listed twice in a directory (probably related to
>>>>>> the stale file issue?)
>>>>>> Example:
>>>>>> ls -la /rhev/data-center/59cd53a9-0003-02d7-00eb-0000000001e3/313f5d25-76af-4ecd-9a20-82a2fe815a3c/images/4add6751-3731-4bbd-ae94-aaeed12ea450/
>>>>>> total 3081
>>>>>> drwxr-x---. 2 vdsm kvm 4096 Mar 18 11:34 .
>>>>>> drwxr-xr-x. 13 vdsm kvm 4096 Mar 19 09:42 ..
>>>>>> -rw-rw----. 1 vdsm kvm 1048576 Mar 28 12:55 1a7cf259-6b29-421d-9688-b25dfaafb13c
>>>>>> -rw-rw----. 1 vdsm kvm 1048576 Mar 28 12:55 1a7cf259-6b29-421d-9688-b25dfaafb13c
>>>>>> -rw-rw----. 1 vdsm kvm 1048576 Jan 27 2018 1a7cf259-6b29-421d-9688-b25dfaafb13c.lease
>>>>>> -rw-r--r--. 1 vdsm kvm 290 Jan 27 2018 1a7cf259-6b29-421d-9688-b25dfaafb13c.meta
>>>>>> -rw-r--r--. 1 vdsm kvm 290 Jan 27 2018 1a7cf259-6b29-421d-9688-b25dfaafb13c.meta
>>>>>>
>>>>>> - Brick processes sometimes start multiple times; sometimes I have 5
>>>>>> brick processes for a single volume. Killing all glusterfsd’s for the
>>>>>> volume on the machine and running gluster v start <vol> force usually
>>>>>> just starts one after that, and from then on things look all right.
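>>>>>> Roughly, the recovery sequence looks like this (the volume name is just
>>>>>> an example from above):
>>>>>> pkill -f 'glusterfsd.*ovirt-core'        # kill all brick processes of this volume on the node
>>>>>> gluster volume start ovirt-core force    # respawn a single brick process
>>>>>> gluster volume status ovirt-core         # verify only one pid per brick remains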
>>>>>>
>>>>>>
>>>>> May I kindly ask you to open bugs on Gluster for the above issues at
>>>>> https://bugzilla.redhat.com/enter_bug.cgi?product=GlusterFS ?
>>>>> Sahina?
>>>>>
>>>>>
>>>>>> Ovirt 4.3.2.1-1.el7
>>>>>> - The ownership of all vm images is changed to root.root after the
>>>>>> vm is shut down, probably related to
>>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1666795 but not only
>>>>>> scoped to the HA engine. I’m still in compatibility mode 4.2 for the
>>>>>> cluster and for the vm’s, but upgraded to ovirt version 4.3.2
>>>>>>
>>>>>
>>>>> Ryan?
>>>>>
>>>>>
>>>>>> - The network provider is set to ovn, which is fine... actually
>>>>>> cool, only the “ovs-vswitchd” process is a CPU hog and utilizes 100%
>>>>>>
>>>>>
>>>>> Miguel? Dominik?
>>>>>
>>>>>
>>>>>> - It seems on all nodes vdsm tries to get the stats for the HA
>>>>>> engine, which is filling the logs with (not sure if this is new):
>>>>>> [api.virt] FINISH getStats return={'status': {'message': "Virtual
>>>>>> machine does not exist: {'vmId': u'20d69acd-edfd-4aeb-a2ae-49e9c121b7e9'}",
>>>>>> 'code': 1}} from=::1,59290, vmId=20d69acd-edfd-4aeb-a2ae-49e9c121b7e9
>>>>>> (api:54)
>>>>>>
>>>>>
>>>>> Simone?
>>>>>
>>>>>
>>>>>> - It seems the package os_brick is missing; "[root] managedvolume not
>>>>>> supported: Managed Volume Not Supported. Missing package os-brick.:
>>>>>> ('Cannot import os_brick',) (caps:149)" fills the vdsm.log, but for
>>>>>> this I also saw another message, so I suspect this will already be
>>>>>> resolved shortly
>>>>>> - The machine I used to run the backup HA engine doesn’t want
>>>>>> to get removed from the hosted-engine --vm-status output, not even after
>>>>>> running hosted-engine --clean-metadata --host-id=10 --force-clean or
>>>>>> hosted-engine --clean-metadata --force-clean from the machine itself.
>>>>>>
>>>>>
>>>>> Simone?
>>>>>
>>>>>
>>>>>>
>>>>>> Think that's about it.
>>>>>>
>>>>>> Don’t get me wrong, I don’t want to rant; I just wanted to share my
>>>>>> experience and see where things can be made better.
>>>>>>
>>>>>
>>>>> If not already done, can you please open bugs for the above issues at
>>>>> https://bugzilla.redhat.com/enter_bug.cgi?classification=oVirt ?
>>>>>
>>>>>
>>>>>>
>>>>>>
>>>>>> Best Olaf
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>>
>>>>> SANDRO BONAZZOLA
>>>>>
>>>>> MANAGER, SOFTWARE ENGINEERING, EMEA R&D RHV
>>>>>
>>>>> Red Hat EMEA <https://www.redhat.com/>
>>>>>
>>>>> sbonazzo(a)redhat.com <https://red.ht/sig>
>>>>>
>>>>
>>>