Yes.

And at one time it was fine. I did a graceful shutdown, and after booting it now always seems to have an issue with the one server... of course, the one hosting the oVirt engine :P

# Three nodes in cluster
[screenshot: image.png]

# Error when you hover over the node
[screenshot: image.png]

# When I select the node and choose "Activate"
[screenshot: image.png]


# Gluster is working fine... it is oVirt that is confused.
[root@medusa vmstore]# mount |grep media/vmstore
medusast.penguinpages.local:/vmstore on /media/vmstore type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072,_netdev)
[root@medusa vmstore]# echo > /media/vmstore/test.out
[root@medusa vmstore]# ssh -f thor 'echo $HOSTNAME >> /media/vmstore/test.out'
[root@medusa vmstore]# ssh -f odin 'echo $HOSTNAME >> /media/vmstore/test.out'
[root@medusa vmstore]# ssh -f medusa 'echo $HOSTNAME >> /media/vmstore/test.out'
[root@medusa vmstore]# cat /media/vmstore/test.out

thor.penguinpages.local
odin.penguinpages.local
medusa.penguinpages.local
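
A quick extra sanity check (per Strahil's client-list hint further down the thread) to confirm every FUSE client is attached to all three bricks and nothing is pending heal -- just a sketch, run from any node:

# each brick should show client connections from all three hosts
gluster volume status vmstore clients
gluster volume status all client-list
# should report zero entries for every brick
gluster volume heal vmstore info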


Ideas to fix oVirt?
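
What I plan to try next, based on Strahil's earlier suggestions (a sketch only -- the engine-service restart is my own assumption, not something confirmed in this thread):

# CLI equivalent of the UI "force start": only starts bricks that are down, does not touch running ones
gluster volume start vmstore force
# if the UI still shows stale brick status, restart the engine service on the hosted-engine VM so it re-polls the hosts
systemctl restart ovirt-engine
# then watch the hosted-engine score recover (it normally climbs back to 3400)
hosted-engine --vm-status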



On Tue, Sep 22, 2020 at 10:42 AM Strahil Nikolov <hunter86_bg@yahoo.com> wrote:
By the way, did you add the third host to oVirt?

If not, maybe that is the real problem :)


Best Regards,
Strahil Nikolov






On Tuesday, September 22, 2020, 17:23:28 GMT+3, Jeremey Wise <jeremey.wise@gmail.com> wrote:

It's like oVirt thinks there are only two nodes in the Gluster replication.

# Yet it is clear the CLI shows three bricks.
[root@medusa vms]# gluster volume status vmstore
Status of volume: vmstore
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick thorst.penguinpages.local:/gluster_br
icks/vmstore/vmstore                        49154     0          Y       9444
Brick odinst.penguinpages.local:/gluster_br
icks/vmstore/vmstore                        49154     0          Y       3269
Brick medusast.penguinpages.local:/gluster_
bricks/vmstore/vmstore                      49154     0          Y       7841
Self-heal Daemon on localhost               N/A       N/A        Y       80152
Self-heal Daemon on odinst.penguinpages.loc
al                                          N/A       N/A        Y       141750
Self-heal Daemon on thorst.penguinpages.loc
al                                          N/A       N/A        Y       245870

Task Status of Volume vmstore
------------------------------------------------------------------------------
There are no active volume tasks



How do I get oVirt to re-sync its view with what Gluster actually sees?



On Tue, Sep 22, 2020 at 8:59 AM Strahil Nikolov <hunter86_bg@yahoo.com> wrote:
> Also, in some rare cases I have seen oVirt showing Gluster as 2 out of 3 bricks up, but usually it was a UI issue: go to the UI and mark a "force start", which will try to start any bricks that were down (it won't affect Gluster) and will wake up the UI task to verify brick status again.
>
>
> https://github.com/gluster/gstatus is a good tool to verify your cluster health, yet a human's touch is priceless in any kind of technology.
>
> Best Regards,
> Strahil Nikolov
>
>
>
>
>
>
> On Tuesday, September 22, 2020, 15:50:35 GMT+3, Jeremey Wise <jeremey.wise@gmail.com> wrote:
>
> When I posted last, in that thread I pasted a rolling restart. And... now it is replicating.
>
> oVirt is still showing it wrong, BUT... I did my normal test from each of the three nodes.
>
> 1) Mount the Gluster file system locally, with localhost as the primary volfile server and the other two nodes as backups (like a client would do)
> 2) Run a test file create, e.g.:   echo $HOSTNAME >> /media/glustervolume/test.out
> 3) Repeat from each node, then read the file back to confirm all nodes are in sync (rough commands sketched below)
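> 
> Roughly, as commands (a sketch of my test, not gospel -- the mount point and the backup-volfile-servers option are just how I happen to do it):
> 
> # 1) FUSE-mount via localhost, with the other two nodes as backup volfile servers
> mount -t glusterfs -o backup-volfile-servers=odinst.penguinpages.local:medusast.penguinpages.local localhost:/vmstore /media/glustervolume
> # 2) write this node's marker
> echo $HOSTNAME >> /media/glustervolume/test.out
> # 3) repeat on each node, then confirm all three hostnames show up
> cat /media/glustervolume/test.out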
>
> I REALLY hate reboot (restart) as a fix. I need to get better at root-causing Gluster issues if I am going to trust it. Before, when I manually made the volumes and it was simply (VDO + Gluster), the worst case was that Gluster would break... but I could always go into the "brick" path and copy data out.
>
> Now with oVirt... and LVM and thin provisioning etc., I am abstracted away from simple file recovery. Without Gluster AND the oVirt engine up, all my environment and data is lost. This means the nodes have moved more toward "pets" than cattle.
>
> And with three nodes... I can't afford to lose any pets.
>
> I will post more when I get the cluster settled and work on those weird notes about volume quorum that show up on two nodes when glusterd is restarted (settings to check sketched below).
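> 
> For the quorum part, the volume options I want to look at (a sketch; these are the standard gluster quorum settings):
> 
> gluster volume get vmstore cluster.server-quorum-type
> gluster volume get vmstore cluster.quorum-type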
>
> Thanks,
>
> On Tue, Sep 22, 2020 at 8:44 AM Strahil Nikolov <hunter86_bg@yahoo.com> wrote:
>> A replication issue could mean that one of the clients (FUSE mounts) is not attached to all bricks.
>>
>> You can check the number of connected clients via:
>> gluster volume status all client-list
>>
>>
>> As a preventive measure, just do a rolling restart:
>> - set a host to maintenance and mark it to stop the glusterd service (I'm referring to the UI)
>> - activate the host once it has been moved to maintenance
>>
>> Wait for the host's HE score to recover (silver/gold crown in UI) and then proceed with the next one.
>>
>> Best Regards,
>> Strahil Nikolov
>>
>>
>>
>>
>> On Tuesday, September 22, 2020, 14:55:35 GMT+3, Jeremey Wise <jeremey.wise@gmail.com> wrote:
>>
>> I did.
>>
>> Here are all three nodes with the restart. I find it odd... there has been a set of messages at the end (see below), and I don't know enough about what oVirt laid out to tell whether it is bad.
>>
>> #######
>> [root@thor vmstore]# systemctl status glusterd
>> ● glusterd.service - GlusterFS, a clustered file-system server
>>    Loaded: loaded (/usr/lib/systemd/system/glusterd.service; enabled; vendor preset: disabled)
>>   Drop-In: /etc/systemd/system/glusterd.service.d
>>            └─99-cpu.conf
>>    Active: active (running) since Mon 2020-09-21 20:32:26 EDT; 10h ago
>>      Docs: man:glusterd(8)
>>   Process: 2001 ExecStart=/usr/sbin/glusterd -p /var/run/glusterd.pid --log-level $LOG_LEVEL $GLUSTERD_OPTIONS (code=exited, status=0/SUCCESS)
>>  Main PID: 2113 (glusterd)
>>     Tasks: 151 (limit: 1235410)
>>    Memory: 3.8G
>>       CPU: 6min 46.050s
>>    CGroup: /glusterfs.slice/glusterd.service
>>            ├─ 2113 /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO
>>            ├─ 2914 /usr/sbin/glusterfs -s localhost --volfile-id shd/data -p /var/run/gluster/shd/data/data-shd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/gluster/2f41374c2e36bf4d.socket --xlator-option *replicate*.node-uu>
>>            ├─ 9342 /usr/sbin/glusterfsd -s thorst.penguinpages.local --volfile-id data.thorst.penguinpages.local.gluster_bricks-data-data -p /var/run/gluster/vols/data/thorst.penguinpages.local-gluster_bricks-data-data.pid -S /var/r>
>>            ├─ 9433 /usr/sbin/glusterfsd -s thorst.penguinpages.local --volfile-id engine.thorst.penguinpages.local.gluster_bricks-engine-engine -p /var/run/gluster/vols/engine/thorst.penguinpages.local-gluster_bricks-engine-engine.p>
>>            ├─ 9444 /usr/sbin/glusterfsd -s thorst.penguinpages.local --volfile-id vmstore.thorst.penguinpages.local.gluster_bricks-vmstore-vmstore -p /var/run/gluster/vols/vmstore/thorst.penguinpages.local-gluster_bricks-vmstore-vms>
>>            └─35639 /usr/sbin/glusterfsd -s thorst.penguinpages.local --volfile-id iso.thorst.penguinpages.local.gluster_bricks-iso-iso -p /var/run/gluster/vols/iso/thorst.penguinpages.local-gluster_bricks-iso-iso.pid -S /var/run/glu>
>>
>> Sep 21 20:32:24 thor.penguinpages.local systemd[1]: Starting GlusterFS, a clustered file-system server...
>> Sep 21 20:32:26 thor.penguinpages.local systemd[1]: Started GlusterFS, a clustered file-system server.
>> Sep 21 20:32:28 thor.penguinpages.local glusterd[2113]: [2020-09-22 00:32:28.605674] C [MSGID: 106003] [glusterd-server-quorum.c:348:glusterd_do_volume_quorum_action] 0-management: Server quorum regained for volume data. Starting lo>
>> Sep 21 20:32:28 thor.penguinpages.local glusterd[2113]: [2020-09-22 00:32:28.639490] C [MSGID: 106003] [glusterd-server-quorum.c:348:glusterd_do_volume_quorum_action] 0-management: Server quorum regained for volume engine. Starting >
>> Sep 21 20:32:28 thor.penguinpages.local glusterd[2113]: [2020-09-22 00:32:28.680665] C [MSGID: 106003] [glusterd-server-quorum.c:348:glusterd_do_volume_quorum_action] 0-management: Server quorum regained for volume vmstore. Starting>
>> Sep 21 20:33:24 thor.penguinpages.local glustershd[2914]: [2020-09-22 00:33:24.813409] C [rpc-clnt-ping.c:155:rpc_clnt_ping_timer_expired] 0-data-client-0: server 172.16.101.101:24007 has not responded in the last 30 seconds, discon>
>> Sep 21 20:33:24 thor.penguinpages.local glustershd[2914]: [2020-09-22 00:33:24.815147] C [rpc-clnt-ping.c:155:rpc_clnt_ping_timer_expired] 2-engine-client-0: server 172.16.101.101:24007 has not responded in the last 30 seconds, disc>
>> Sep 21 20:33:24 thor.penguinpages.local glustershd[2914]: [2020-09-22 00:33:24.818735] C [rpc-clnt-ping.c:155:rpc_clnt_ping_timer_expired] 4-vmstore-client-0: server 172.16.101.101:24007 has not responded in the last 30 seconds, dis>
>> Sep 21 20:33:36 thor.penguinpages.local glustershd[2914]: [2020-09-22 00:33:36.816978] C [rpc-clnt-ping.c:155:rpc_clnt_ping_timer_expired] 3-iso-client-0: server 172.16.101.101:24007 has not responded in the last 42 seconds, disconn>
>> [root@thor vmstore]#
>> [root@thor vmstore]# systemctl restart glusterd
>> [root@thor vmstore]# systemctl status glusterd
>> ● glusterd.service - GlusterFS, a clustered file-system server
>>    Loaded: loaded (/usr/lib/systemd/system/glusterd.service; enabled; vendor preset: disabled)
>>   Drop-In: /etc/systemd/system/glusterd.service.d
>>            └─99-cpu.conf
>>    Active: active (running) since Tue 2020-09-22 07:24:34 EDT; 2s ago
>>      Docs: man:glusterd(8)
>>   Process: 245831 ExecStart=/usr/sbin/glusterd -p /var/run/glusterd.pid --log-level $LOG_LEVEL $GLUSTERD_OPTIONS (code=exited, status=0/SUCCESS)
>>  Main PID: 245832 (glusterd)
>>     Tasks: 151 (limit: 1235410)
>>    Memory: 3.8G
>>       CPU: 132ms
>>    CGroup: /glusterfs.slice/glusterd.service
>>            ├─  2914 /usr/sbin/glusterfs -s localhost --volfile-id shd/data -p /var/run/gluster/shd/data/data-shd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/gluster/2f41374c2e36bf4d.socket --xlator-option *replicate*.node-u>
>>            ├─  9342 /usr/sbin/glusterfsd -s thorst.penguinpages.local --volfile-id data.thorst.penguinpages.local.gluster_bricks-data-data -p /var/run/gluster/vols/data/thorst.penguinpages.local-gluster_bricks-data-data.pid -S /var/>
>>            ├─  9433 /usr/sbin/glusterfsd -s thorst.penguinpages.local --volfile-id engine.thorst.penguinpages.local.gluster_bricks-engine-engine -p /var/run/gluster/vols/engine/thorst.penguinpages.local-gluster_bricks-engine-engine.>
>>            ├─  9444 /usr/sbin/glusterfsd -s thorst.penguinpages.local --volfile-id vmstore.thorst.penguinpages.local.gluster_bricks-vmstore-vmstore -p /var/run/gluster/vols/vmstore/thorst.penguinpages.local-gluster_bricks-vmstore-vm>
>>            ├─ 35639 /usr/sbin/glusterfsd -s thorst.penguinpages.local --volfile-id iso.thorst.penguinpages.local.gluster_bricks-iso-iso -p /var/run/gluster/vols/iso/thorst.penguinpages.local-gluster_bricks-iso-iso.pid -S /var/run/gl>
>>            └─245832 /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO
>>
>> Sep 22 07:24:34 thor.penguinpages.local systemd[1]: Starting GlusterFS, a clustered file-system server...
>> Sep 22 07:24:34 thor.penguinpages.local systemd[1]: Started GlusterFS, a clustered file-system server.
>> [root@thor vmstore]# gluster volume status
>> Status of volume: data
>> Gluster process                             TCP Port  RDMA Port  Online  Pid
>> ------------------------------------------------------------------------------
>> Brick thorst.penguinpages.local:/gluster_br
>> icks/data/data                              49152     0          Y       9342
>> Brick odinst.penguinpages.local:/gluster_br
>> icks/data/data                              49152     0          Y       3231
>> Brick medusast.penguinpages.local:/gluster_
>> bricks/data/data                            49152     0          Y       7819
>> Self-heal Daemon on localhost               N/A       N/A        Y       245870
>> Self-heal Daemon on odinst.penguinpages.loc
>> al                                          N/A       N/A        Y       2693
>> Self-heal Daemon on medusast.penguinpages.l
>> ocal                                        N/A       N/A        Y       7863
>>
>> Task Status of Volume data
>> ------------------------------------------------------------------------------
>> There are no active volume tasks
>>
>> Status of volume: engine
>> Gluster process                             TCP Port  RDMA Port  Online  Pid
>> ------------------------------------------------------------------------------
>> Brick thorst.penguinpages.local:/gluster_br
>> icks/engine/engine                          49153     0          Y       9433
>> Brick odinst.penguinpages.local:/gluster_br
>> icks/engine/engine                          49153     0          Y       3249
>> Brick medusast.penguinpages.local:/gluster_
>> bricks/engine/engine                        49153     0          Y       7830
>> Self-heal Daemon on localhost               N/A       N/A        Y       245870
>> Self-heal Daemon on odinst.penguinpages.loc
>> al                                          N/A       N/A        Y       2693
>> Self-heal Daemon on medusast.penguinpages.l
>> ocal                                        N/A       N/A        Y       7863
>>
>> Task Status of Volume engine
>> ------------------------------------------------------------------------------
>> There are no active volume tasks
>>
>> Status of volume: iso
>> Gluster process                             TCP Port  RDMA Port  Online  Pid
>> ------------------------------------------------------------------------------
>> Brick thorst.penguinpages.local:/gluster_br
>> icks/iso/iso                                49155     49156      Y       35639
>> Brick odinst.penguinpages.local:/gluster_br
>> icks/iso/iso                                49155     49156      Y       21735
>> Brick medusast.penguinpages.local:/gluster_
>> bricks/iso/iso                              49155     49156      Y       21228
>> Self-heal Daemon on localhost               N/A       N/A        Y       245870
>> Self-heal Daemon on odinst.penguinpages.loc
>> al                                          N/A       N/A        Y       2693
>> Self-heal Daemon on medusast.penguinpages.l
>> ocal                                        N/A       N/A        Y       7863
>>
>> Task Status of Volume iso
>> ------------------------------------------------------------------------------
>> There are no active volume tasks
>>
>> Status of volume: vmstore
>> Gluster process                             TCP Port  RDMA Port  Online  Pid
>> ------------------------------------------------------------------------------
>> Brick thorst.penguinpages.local:/gluster_br
>> icks/vmstore/vmstore                        49154     0          Y       9444
>> Brick odinst.penguinpages.local:/gluster_br
>> icks/vmstore/vmstore                        49154     0          Y       3269
>> Brick medusast.penguinpages.local:/gluster_
>> bricks/vmstore/vmstore                      49154     0          Y       7841
>> Self-heal Daemon on localhost               N/A       N/A        Y       245870
>> Self-heal Daemon on odinst.penguinpages.loc
>> al                                          N/A       N/A        Y       2693
>> Self-heal Daemon on medusast.penguinpages.l
>> ocal                                        N/A       N/A        Y       7863
>>
>> Task Status of Volume vmstore
>> ------------------------------------------------------------------------------
>> There are no active volume tasks
>>
>> [root@thor vmstore]# ls /gluster_bricks/vmstore/vmstore/
>> example.log  f118dcae-6162-4e9a-89e4-f30ffcfb9ccf  ns02_20200910.tgz
>> [root@thor vmstore]#
>>
>>
>> ###  VS nodes that are listed as "ok"
>>
>> [root@odin vmstore]# systemctl status glusterd
>> ● glusterd.service - GlusterFS, a clustered file-system server
>>    Loaded: loaded (/usr/lib/systemd/system/glusterd.service; enabled; vendor preset: disabled)
>>   Drop-In: /etc/systemd/system/glusterd.service.d
>>            └─99-cpu.conf
>>    Active: active (running) since Mon 2020-09-21 20:41:31 EDT; 11h ago
>>      Docs: man:glusterd(8)
>>   Process: 1792 ExecStart=/usr/sbin/glusterd -p /var/run/glusterd.pid --log-level $LOG_LEVEL $GLUSTERD_OPTIONS (code=exited, status=0/SUCCESS)
>>  Main PID: 1818 (glusterd)
>>     Tasks: 149 (limit: 409666)
>>    Memory: 1.0G
>>       CPU: 7min 13.719s
>>    CGroup: /glusterfs.slice/glusterd.service
>>            ├─ 1818 /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO
>>            ├─ 2693 /usr/sbin/glusterfs -s localhost --volfile-id shd/data -p /var/run/gluster/shd/data/data-shd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/gluster/3971f0a4d5e2fd53.socket --xlator-option *replicate*.node-uu>
>>            ├─ 3231 /usr/sbin/glusterfsd -s odinst.penguinpages.local --volfile-id data.odinst.penguinpages.local.gluster_bricks-data-data -p /var/run/gluster/vols/data/odinst.penguinpages.local-gluster_bricks-data-data.pid -S /var/r>
>>            ├─ 3249 /usr/sbin/glusterfsd -s odinst.penguinpages.local --volfile-id engine.odinst.penguinpages.local.gluster_bricks-engine-engine -p /var/run/gluster/vols/engine/odinst.penguinpages.local-gluster_bricks-engine-engine.p>
>>            ├─ 3269 /usr/sbin/glusterfsd -s odinst.penguinpages.local --volfile-id vmstore.odinst.penguinpages.local.gluster_bricks-vmstore-vmstore -p /var/run/gluster/vols/vmstore/odinst.penguinpages.local-gluster_bricks-vmstore-vms>
>>            └─21735 /usr/sbin/glusterfsd -s odinst.penguinpages.local --volfile-id iso.odinst.penguinpages.local.gluster_bricks-iso-iso -p /var/run/gluster/vols/iso/odinst.penguinpages.local-gluster_bricks-iso-iso.pid -S /var/run/glu>
>>
>> Sep 21 20:41:28 odin.penguinpages.local systemd[1]: Starting GlusterFS, a clustered file-system server...
>> Sep 21 20:41:31 odin.penguinpages.local systemd[1]: Started GlusterFS, a clustered file-system server.
>> Sep 21 20:41:34 odin.penguinpages.local glusterd[1818]: [2020-09-22 00:41:34.478890] C [MSGID: 106003] [glusterd-server-quorum.c:348:glusterd_do_volume_quorum_action] 0-management: Server quorum regained for volume data. Starting lo>
>> Sep 21 20:41:34 odin.penguinpages.local glusterd[1818]: [2020-09-22 00:41:34.483375] C [MSGID: 106003] [glusterd-server-quorum.c:348:glusterd_do_volume_quorum_action] 0-management: Server quorum regained for volume engine. Starting >
>> Sep 21 20:41:34 odin.penguinpages.local glusterd[1818]: [2020-09-22 00:41:34.487583] C [MSGID: 106003] [glusterd-server-quorum.c:348:glusterd_do_volume_quorum_action] 0-management: Server quorum regained for volume vmstore. Starting>
>> [root@odin vmstore]# systemctl restart glusterd
>> [root@odin vmstore]# systemctl status glusterd
>> ● glusterd.service - GlusterFS, a clustered file-system server
>>    Loaded: loaded (/usr/lib/systemd/system/glusterd.service; enabled; vendor preset: disabled)
>>   Drop-In: /etc/systemd/system/glusterd.service.d
>>            └─99-cpu.conf
>>    Active: active (running) since Tue 2020-09-22 07:50:52 EDT; 1s ago
>>      Docs: man:glusterd(8)
>>   Process: 141691 ExecStart=/usr/sbin/glusterd -p /var/run/glusterd.pid --log-level $LOG_LEVEL $GLUSTERD_OPTIONS (code=exited, status=0/SUCCESS)
>>  Main PID: 141692 (glusterd)
>>     Tasks: 134 (limit: 409666)
>>    Memory: 1.0G
>>       CPU: 2.265s
>>    CGroup: /glusterfs.slice/glusterd.service
>>            ├─  3231 /usr/sbin/glusterfsd -s odinst.penguinpages.local --volfile-id data.odinst.penguinpages.local.gluster_bricks-data-data -p /var/run/gluster/vols/data/odinst.penguinpages.local-gluster_bricks-data-data.pid -S /var/>
>>            ├─  3249 /usr/sbin/glusterfsd -s odinst.penguinpages.local --volfile-id engine.odinst.penguinpages.local.gluster_bricks-engine-engine -p /var/run/gluster/vols/engine/odinst.penguinpages.local-gluster_bricks-engine-engine.>
>>            ├─  3269 /usr/sbin/glusterfsd -s odinst.penguinpages.local --volfile-id vmstore.odinst.penguinpages.local.gluster_bricks-vmstore-vmstore -p /var/run/gluster/vols/vmstore/odinst.penguinpages.local-gluster_bricks-vmstore-vm>
>>            ├─ 21735 /usr/sbin/glusterfsd -s odinst.penguinpages.local --volfile-id iso.odinst.penguinpages.local.gluster_bricks-iso-iso -p /var/run/gluster/vols/iso/odinst.penguinpages.local-gluster_bricks-iso-iso.pid -S /var/run/gl>
>>            └─141692 /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO
>>
>> Sep 22 07:50:49 odin.penguinpages.local systemd[1]: Starting GlusterFS, a clustered file-system server...
>> Sep 22 07:50:52 odin.penguinpages.local systemd[1]: Started GlusterFS, a clustered file-system server.
>> Sep 22 07:50:52 odin.penguinpages.local glusterd[141692]: [2020-09-22 11:50:52.964585] C [MSGID: 106003] [glusterd-server-quorum.c:348:glusterd_do_volume_quorum_action] 0-management: Server quorum regained for volume data. Starting >
>> Sep 22 07:50:52 odin.penguinpages.local glusterd[141692]: [2020-09-22 11:50:52.969084] C [MSGID: 106003] [glusterd-server-quorum.c:348:glusterd_do_volume_quorum_action] 0-management: Server quorum regained for volume engine. Startin>
>> Sep 22 07:50:52 odin.penguinpages.local glusterd[141692]: [2020-09-22 11:50:52.973197] C [MSGID: 106003] [glusterd-server-quorum.c:348:glusterd_do_volume_quorum_action] 0-management: Server quorum regained for volume vmstore. Starti>
>> [root@odin vmstore]# ls  /gluster_bricks/vmstore/vmstore/
>> example.log  f118dcae-6162-4e9a-89e4-f30ffcfb9ccf  ns02_20200910.tgz  ns02.qcow2  ns02_var.qcow2
>> [root@odin vmstore]#
>> ##################
>> [root@medusa sw2_usb_A2]#
>> [root@medusa sw2_usb_A2]# systemctl status glusterd
>> ● glusterd.service - GlusterFS, a clustered file-system server
>>    Loaded: loaded (/usr/lib/systemd/system/glusterd.service; enabled; vendor preset: disabled)
>>   Drop-In: /etc/systemd/system/glusterd.service.d
>>            └─99-cpu.conf
>>    Active: active (running) since Mon 2020-09-21 20:31:29 EDT; 11h ago
>>      Docs: man:glusterd(8)
>>   Process: 1713 ExecStart=/usr/sbin/glusterd -p /var/run/glusterd.pid --log-level $LOG_LEVEL $GLUSTERD_OPTIONS (code=exited, status=0/SUCCESS)
>>  Main PID: 1718 (glusterd)
>>     Tasks: 153 (limit: 409064)
>>    Memory: 265.5M
>>       CPU: 10min 10.739s
>>    CGroup: /glusterfs.slice/glusterd.service
>>            ├─ 1718 /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO
>>            ├─ 7819 /usr/sbin/glusterfsd -s medusast.penguinpages.local --volfile-id data.medusast.penguinpages.local.gluster_bricks-data-data -p /var/run/gluster/vols/data/medusast.penguinpages.local-gluster_bricks-data-data.pid -S >
>>            ├─ 7830 /usr/sbin/glusterfsd -s medusast.penguinpages.local --volfile-id engine.medusast.penguinpages.local.gluster_bricks-engine-engine -p /var/run/gluster/vols/engine/medusast.penguinpages.local-gluster_bricks-engine-en>
>>            ├─ 7841 /usr/sbin/glusterfsd -s medusast.penguinpages.local --volfile-id vmstore.medusast.penguinpages.local.gluster_bricks-vmstore-vmstore -p /var/run/gluster/vols/vmstore/medusast.penguinpages.local-gluster_bricks-vmsto>
>>            ├─ 7863 /usr/sbin/glusterfs -s localhost --volfile-id shd/data -p /var/run/gluster/shd/data/data-shd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/gluster/709d753e1e04185a.socket --xlator-option *replicate*.node-uu>
>>            └─21228 /usr/sbin/glusterfsd -s medusast.penguinpages.local --volfile-id iso.medusast.penguinpages.local.gluster_bricks-iso-iso -p /var/run/gluster/vols/iso/medusast.penguinpages.local-gluster_bricks-iso-iso.pid -S /var/r>
>>
>> Sep 21 20:31:29 medusa.penguinpages.local glusterd[1718]: [2020-09-22 00:31:29.352090] C [MSGID: 106002] [glusterd-server-quorum.c:355:glusterd_do_volume_quorum_action] 0-management: Server quorum lost for volume engine. Stopping lo>
>> Sep 21 20:31:29 medusa.penguinpages.local systemd[1]: Started GlusterFS, a clustered file-system server.
>> Sep 21 20:31:29 medusa.penguinpages.local glusterd[1718]: [2020-09-22 00:31:29.352297] C [MSGID: 106002] [glusterd-server-quorum.c:355:glusterd_do_volume_quorum_action] 0-management: Server quorum lost for volume vmstore. Stopping l>
>> Sep 21 20:32:29 medusa.penguinpages.local glusterd[1718]: [2020-09-22 00:32:29.104708] C [MSGID: 106003] [glusterd-server-quorum.c:348:glusterd_do_volume_quorum_action] 0-management: Server quorum regained for volume data. Starting >
>> Sep 21 20:32:29 medusa.penguinpages.local glusterd[1718]: [2020-09-22 00:32:29.125119] C [MSGID: 106003] [glusterd-server-quorum.c:348:glusterd_do_volume_quorum_action] 0-management: Server quorum regained for volume engine. Startin>
>> Sep 21 20:32:29 medusa.penguinpages.local glusterd[1718]: [2020-09-22 00:32:29.145341] C [MSGID: 106003] [glusterd-server-quorum.c:348:glusterd_do_volume_quorum_action] 0-management: Server quorum regained for volume vmstore. Starti>
>> Sep 21 20:33:24 medusa.penguinpages.local glustershd[7863]: [2020-09-22 00:33:24.815657] C [rpc-clnt-ping.c:155:rpc_clnt_ping_timer_expired] 0-data-client-0: server 172.16.101.101:24007 has not responded in the last 30 seconds, disc>
>> Sep 21 20:33:24 medusa.penguinpages.local glustershd[7863]: [2020-09-22 00:33:24.817641] C [rpc-clnt-ping.c:155:rpc_clnt_ping_timer_expired] 2-engine-client-0: server 172.16.101.101:24007 has not responded in the last 30 seconds, di>
>> Sep 21 20:33:24 medusa.penguinpages.local glustershd[7863]: [2020-09-22 00:33:24.821774] C [rpc-clnt-ping.c:155:rpc_clnt_ping_timer_expired] 4-vmstore-client-0: server 172.16.101.101:24007 has not responded in the last 30 seconds, d>
>> Sep 21 20:33:36 medusa.penguinpages.local glustershd[7863]: [2020-09-22 00:33:36.819762] C [rpc-clnt-ping.c:155:rpc_clnt_ping_timer_expired] 3-iso-client-0: server 172.16.101.101:24007 has not responded in the last 42 seconds, disco>
>> [root@medusa sw2_usb_A2]# systemctl restart glusterd
>> [root@medusa sw2_usb_A2]# systemctl status glusterd
>> ● glusterd.service - GlusterFS, a clustered file-system server
>>    Loaded: loaded (/usr/lib/systemd/system/glusterd.service; enabled; vendor preset: disabled)
>>   Drop-In: /etc/systemd/system/glusterd.service.d
>>            └─99-cpu.conf
>>    Active: active (running) since Tue 2020-09-22 07:51:46 EDT; 2s ago
>>      Docs: man:glusterd(8)
>>   Process: 80099 ExecStart=/usr/sbin/glusterd -p /var/run/glusterd.pid --log-level $LOG_LEVEL $GLUSTERD_OPTIONS (code=exited, status=0/SUCCESS)
>>  Main PID: 80100 (glusterd)
>>     Tasks: 146 (limit: 409064)
>>    Memory: 207.7M
>>       CPU: 2.705s
>>    CGroup: /glusterfs.slice/glusterd.service
>>            ├─ 7819 /usr/sbin/glusterfsd -s medusast.penguinpages.local --volfile-id data.medusast.penguinpages.local.gluster_bricks-data-data -p /var/run/gluster/vols/data/medusast.penguinpages.local-gluster_bricks-data-data.pid -S >
>>            ├─ 7830 /usr/sbin/glusterfsd -s medusast.penguinpages.local --volfile-id engine.medusast.penguinpages.local.gluster_bricks-engine-engine -p /var/run/gluster/vols/engine/medusast.penguinpages.local-gluster_bricks-engine-en>
>>            ├─ 7841 /usr/sbin/glusterfsd -s medusast.penguinpages.local --volfile-id vmstore.medusast.penguinpages.local.gluster_bricks-vmstore-vmstore -p /var/run/gluster/vols/vmstore/medusast.penguinpages.local-gluster_bricks-vmsto>
>>            ├─21228 /usr/sbin/glusterfsd -s medusast.penguinpages.local --volfile-id iso.medusast.penguinpages.local.gluster_bricks-iso-iso -p /var/run/gluster/vols/iso/medusast.penguinpages.local-gluster_bricks-iso-iso.pid -S /var/r>
>>            ├─80100 /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO
>>            └─80152 /usr/sbin/glusterfs -s localhost --volfile-id shd/data -p /var/run/gluster/shd/data/data-shd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/gluster/709d753e1e04185a.socket --xlator-option *replicate*.node-uu>
>>
>> Sep 22 07:51:43 medusa.penguinpages.local systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
>> Sep 22 07:51:43 medusa.penguinpages.local systemd[1]: glusterd.service: Found left-over process 7863 (glusterfs) in control group while starting unit. Ignoring.
>> Sep 22 07:51:43 medusa.penguinpages.local systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
>> Sep 22 07:51:43 medusa.penguinpages.local systemd[1]: glusterd.service: Found left-over process 21228 (glusterfsd) in control group while starting unit. Ignoring.
>> Sep 22 07:51:43 medusa.penguinpages.local systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
>> Sep 22 07:51:43 medusa.penguinpages.local systemd[1]: Starting GlusterFS, a clustered file-system server...
>> Sep 22 07:51:46 medusa.penguinpages.local systemd[1]: Started GlusterFS, a clustered file-system server.
>> Sep 22 07:51:46 medusa.penguinpages.local glusterd[80100]: [2020-09-22 11:51:46.789628] C [MSGID: 106003] [glusterd-server-quorum.c:348:glusterd_do_volume_quorum_action] 0-management: Server quorum regained for volume data. Starting>
>> Sep 22 07:51:46 medusa.penguinpages.local glusterd[80100]: [2020-09-22 11:51:46.807618] C [MSGID: 106003] [glusterd-server-quorum.c:348:glusterd_do_volume_quorum_action] 0-management: Server quorum regained for volume engine. Starti>
>> Sep 22 07:51:46 medusa.penguinpages.local glusterd[80100]: [2020-09-22 11:51:46.825589] C [MSGID: 106003] [glusterd-server-quorum.c:348:glusterd_do_volume_quorum_action] 0-management: Server quorum regained for volume vmstore. Start>
>> [root@medusa sw2_usb_A2]# ls /gluster_bricks/vmstore/vmstore/
>> example.log  f118dcae-6162-4e9a-89e4-f30ffcfb9ccf  isos  media  ns01_20200910.tgz  ns02_20200910.tgz  ns02.qcow2  ns02_var.qcow2  qemu
>>
>>
>> As for files... there are replication issues. I am not really sure how the bricks can show OK while the volume is not replicating (checks I plan to run are sketched below).
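>> 
>> What I intend to run to chase that down (a sketch -- heal info and split-brain are the standard gluster checks, the brick listing is just a raw comparison across thor/odin/medusa):
>> 
>> gluster volume heal vmstore info
>> gluster volume heal vmstore info split-brain
>> # compare raw brick contents on each node
>> ls -l /gluster_bricks/vmstore/vmstore/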
>>
>>
>>
>> On Tue, Sep 22, 2020 at 2:38 AM Strahil Nikolov <hunter86_bg@yahoo.com> wrote:
>>> Have you restarted glusterd.service on the affected node?
>>> glusterd is just the management layer, and restarting it won't affect the brick processes.
>>>
>>> Best Regards,
>>> Strahil Nikolov
>>>
>>>
>>>
>>>
>>>
>>>
>>> On Tuesday, September 22, 2020, 01:43:36 GMT+3, Jeremey Wise <jeremey.wise@gmail.com> wrote:
>>>
>>> Start is not an option.
>>>
>>> It notes two bricks, but the command line shows three bricks, all present.
>>>
>>> [root@odin thorst.penguinpages.local:_vmstore]# gluster volume status data
>>> Status of volume: data
>>> Gluster process                             TCP Port  RDMA Port  Online  Pid
>>> ------------------------------------------------------------------------------
>>> Brick thorst.penguinpages.local:/gluster_br
>>> icks/data/data                              49152     0          Y       33123
>>> Brick odinst.penguinpages.local:/gluster_br
>>> icks/data/data                              49152     0          Y       2970
>>> Brick medusast.penguinpages.local:/gluster_
>>> bricks/data/data                            49152     0          Y       2646
>>> Self-heal Daemon on localhost               N/A       N/A        Y       3004
>>> Self-heal Daemon on thorst.penguinpages.loc
>>> al                                          N/A       N/A        Y       33230
>>> Self-heal Daemon on medusast.penguinpages.l
>>> ocal                                        N/A       N/A        Y       2475
>>>
>>> Task Status of Volume data
>>> ------------------------------------------------------------------------------
>>> There are no active volume tasks
>>>
>>> [root@odin thorst.penguinpages.local:_vmstore]# gluster peer status
>>> Number of Peers: 2
>>>
>>> Hostname: thorst.penguinpages.local
>>> Uuid: 7726b514-e7c3-4705-bbc9-5a90c8a966c9
>>> State: Peer in Cluster (Connected)
>>>
>>> Hostname: medusast.penguinpages.local
>>> Uuid: 977b2c1d-36a8-4852-b953-f75850ac5031
>>> State: Peer in Cluster (Connected)
>>> [root@odin thorst.penguinpages.local:_vmstore]#
>>>
>>>
>>>
>>>
>>> On Mon, Sep 21, 2020 at 4:32 PM Strahil Nikolov <hunter86_bg@yahoo.com> wrote:
>>>> Just select the volume and press "start" . It will automatically mark "force start" and will fix itself.
>>>>
>>>> Best Regards,
>>>> Strahil Nikolov
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Monday, September 21, 2020, 20:53:15 GMT+3, Jeremey Wise <jeremey.wise@gmail.com> wrote:
>>>>
>>>> The oVirt engine shows one of the gluster servers as having an issue. I did a graceful shutdown of all three nodes over the weekend, as I had to move around some power connections in prep for a UPS.
>>>>
>>>> Came back up.. but....
>>>>
>>>>
>>>>
>>>> And this is reflected in 2 bricks online (should be three for each volume)
>>>>
>>>>
>>>> Command line shows gluster should be happy.
>>>>
>>>> [root@thor engine]# gluster peer status
>>>> Number of Peers: 2
>>>>
>>>> Hostname: odinst.penguinpages.local
>>>> Uuid: 83c772aa-33cd-430f-9614-30a99534d10e
>>>> State: Peer in Cluster (Connected)
>>>>
>>>> Hostname: medusast.penguinpages.local
>>>> Uuid: 977b2c1d-36a8-4852-b953-f75850ac5031
>>>> State: Peer in Cluster (Connected)
>>>> [root@thor engine]#
>>>>
>>>> # All bricks showing online
>>>> [root@thor engine]# gluster volume status
>>>> Status of volume: data
>>>> Gluster process                             TCP Port  RDMA Port  Online  Pid
>>>> ------------------------------------------------------------------------------
>>>> Brick thorst.penguinpages.local:/gluster_br
>>>> icks/data/data                              49152     0          Y       11001
>>>> Brick odinst.penguinpages.local:/gluster_br
>>>> icks/data/data                              49152     0          Y       2970
>>>> Brick medusast.penguinpages.local:/gluster_
>>>> bricks/data/data                            49152     0          Y       2646
>>>> Self-heal Daemon on localhost               N/A       N/A        Y       50560
>>>> Self-heal Daemon on odinst.penguinpages.loc
>>>> al                                          N/A       N/A        Y       3004
>>>> Self-heal Daemon on medusast.penguinpages.l
>>>> ocal                                        N/A       N/A        Y       2475
>>>>
>>>> Task Status of Volume data
>>>> ------------------------------------------------------------------------------
>>>> There are no active volume tasks
>>>>
>>>> Status of volume: engine
>>>> Gluster process                             TCP Port  RDMA Port  Online  Pid
>>>> ------------------------------------------------------------------------------
>>>> Brick thorst.penguinpages.local:/gluster_br
>>>> icks/engine/engine                          49153     0          Y       11012
>>>> Brick odinst.penguinpages.local:/gluster_br
>>>> icks/engine/engine                          49153     0          Y       2982
>>>> Brick medusast.penguinpages.local:/gluster_
>>>> bricks/engine/engine                        49153     0          Y       2657
>>>> Self-heal Daemon on localhost               N/A       N/A        Y       50560
>>>> Self-heal Daemon on odinst.penguinpages.loc
>>>> al                                          N/A       N/A        Y       3004
>>>> Self-heal Daemon on medusast.penguinpages.l
>>>> ocal                                        N/A       N/A        Y       2475
>>>>
>>>> Task Status of Volume engine
>>>> ------------------------------------------------------------------------------
>>>> There are no active volume tasks
>>>>
>>>> Status of volume: iso
>>>> Gluster process                             TCP Port  RDMA Port  Online  Pid
>>>> ------------------------------------------------------------------------------
>>>> Brick thorst.penguinpages.local:/gluster_br
>>>> icks/iso/iso                                49156     49157      Y       151426
>>>> Brick odinst.penguinpages.local:/gluster_br
>>>> icks/iso/iso                                49156     49157      Y       69225
>>>> Brick medusast.penguinpages.local:/gluster_
>>>> bricks/iso/iso                              49156     49157      Y       45018
>>>> Self-heal Daemon on localhost               N/A       N/A        Y       50560
>>>> Self-heal Daemon on odinst.penguinpages.loc
>>>> al                                          N/A       N/A        Y       3004
>>>> Self-heal Daemon on medusast.penguinpages.l
>>>> ocal                                        N/A       N/A        Y       2475
>>>>
>>>> Task Status of Volume iso
>>>> ------------------------------------------------------------------------------
>>>> There are no active volume tasks
>>>>
>>>> Status of volume: vmstore
>>>> Gluster process                             TCP Port  RDMA Port  Online  Pid
>>>> ------------------------------------------------------------------------------
>>>> Brick thorst.penguinpages.local:/gluster_br
>>>> icks/vmstore/vmstore                        49154     0          Y       11023
>>>> Brick odinst.penguinpages.local:/gluster_br
>>>> icks/vmstore/vmstore                        49154     0          Y       2993
>>>> Brick medusast.penguinpages.local:/gluster_
>>>> bricks/vmstore/vmstore                      49154     0          Y       2668
>>>> Self-heal Daemon on localhost               N/A       N/A        Y       50560
>>>> Self-heal Daemon on medusast.penguinpages.l
>>>> ocal                                        N/A       N/A        Y       2475
>>>> Self-heal Daemon on odinst.penguinpages.loc
>>>> al                                          N/A       N/A        Y       3004
>>>>
>>>> Task Status of Volume vmstore
>>>> ------------------------------------------------------------------------------
>>>> There are no active volume tasks
>>>>
>>>> [root@thor engine]# gluster volume heal
>>>> data     engine   iso      vmstore
>>>> [root@thor engine]# gluster volume heal data info
>>>> Brick thorst.penguinpages.local:/gluster_bricks/data/data
>>>> Status: Connected
>>>> Number of entries: 0
>>>>
>>>> Brick odinst.penguinpages.local:/gluster_bricks/data/data
>>>> Status: Connected
>>>> Number of entries: 0
>>>>
>>>> Brick medusast.penguinpages.local:/gluster_bricks/data/data
>>>> Status: Connected
>>>> Number of entries: 0
>>>>
>>>> [root@thor engine]# gluster volume heal engine
>>>> Launching heal operation to perform index self heal on volume engine has been successful
>>>> Use heal info commands to check status.
>>>> [root@thor engine]# gluster volume heal engine info
>>>> Brick thorst.penguinpages.local:/gluster_bricks/engine/engine
>>>> Status: Connected
>>>> Number of entries: 0
>>>>
>>>> Brick odinst.penguinpages.local:/gluster_bricks/engine/engine
>>>> Status: Connected
>>>> Number of entries: 0
>>>>
>>>> Brick medusast.penguinpages.local:/gluster_bricks/engine/engine
>>>> Status: Connected
>>>> Number of entries: 0
>>>>
>>>> [root@thor engine]# gluster volume heal vmwatore info
>>>> Volume vmwatore does not exist
>>>> Volume heal failed.
>>>> [root@thor engine]#
>>>>
>>>> So I am not sure what to do with the oVirt engine to make it happy again.
>>>>
>>>>
>>>>
>>>> --
>>>> penguinpages
>>>>
>>>
>>>
>>> --
>>> jeremey.wise@gmail.com
>>>
>>
>>
>> --
>> jeremey.wise@gmail.com
>>
>
>
> --
> jeremey.wise@gmail.com
>


--
jeremey.wise@gmail.com
_______________________________________________
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-leave@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/FGEETDX4S5GSF75DLBBKIYW2PDGZPGQU/

