Hi,
I have a six-node stretched HCI cluster with a 2 x 3 = 6 distributed-replicate brick layout and want
to achieve Active-Active disaster recovery: if the three hosts in the primary DC power down or
fail, the three nodes at my DR site should take over and run the virtual machines.
During a split-brain scenario the oVirt Hosted Engine VM goes into PAUSED state and does not
restart on the DR hosts even after a long wait; because of this, none of my production VMs
restart at the DR site either.
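
Is this related to Gluster quorum? For reference, these are the quorum-related checks I understand
are relevant for a stretched 2 x 3 replica layout (standard GlusterFS commands; "data" is one of my
volumes, the same applies to engine and vmstore):

# show the replica layout of the volume
gluster volume info data

# client-side quorum (controls whether writes are still allowed when bricks are down)
gluster volume get data cluster.quorum-type
gluster volume get data cluster.quorum-count

# server-side quorum (controls whether bricks are stopped on a partitioned node)
gluster volume get data cluster.server-quorum-type
gluster volume get all cluster.server-quorum-ratio
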
Please guide me on achieving Active-Active disaster recovery with a six-node stretched cluster. I
followed the document on the official site (linked below) but did not succeed; it would be a great
help if anybody could guide me on this.
https://access.redhat.com/documentation/en-us/red_hat_virtualization/4.4/...
My setup details:
[root@phnode06 /]# gluster volume status
Status of volume: data
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick 172.16.1.1:/gluster_bricks/data/data 58744 0 Y 1746524
Brick 172.16.1.2:/gluster_bricks/data/data 58732 0 Y 2933385
Brick 172.16.1.3:/gluster_bricks/data/data 60442 0 Y 6158
Brick phnode04..local:/gluster_bricks
/data/data1 51661 0 Y 5905
Brick phnode05..local:/gluster_bricks
/data/data1 52177 0 Y 6178
Brick phnode06..local:/gluster_bricks
/data/data1 59910 0 Y 6208
Self-heal Daemon on localhost N/A N/A Y 5878
Self-heal Daemon on phnode03..local N/A N/A Y 6133
Self-heal Daemon on gluster01..local N/A N/A Y 6072
Self-heal Daemon on phnode05..local N/A N/A Y 5751
Self-heal Daemon on gluster02..local N/A N/A Y 5746
Self-heal Daemon on phnode04..local N/A N/A Y 4734
Task Status of Volume data
------------------------------------------------------------------------------
Task : Rebalance
ID : 1c30fbfa-2707-457e-b844-ccaf2e07aa7f
Status : completed
Status of volume: engine
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick 172.16.1.1:/gluster_bricks/engine/eng
ine 53723 0 Y 1746535
Brick 172.16.1.2:/gluster_bricks/engine/eng
ine 49937 0 Y 2933395
Brick 172.16.1.3:/gluster_bricks/engine/eng
ine 53370 0 Y 6171
Brick phnode04..local:/gluster_bricks
/engine/engine1 52024 0 Y 5916
Brick phnode05..local:/gluster_bricks
/engine/engine1 51785 0 Y 6189
Brick phnode06..local:/gluster_bricks
/engine/engine1 55443 0 Y 6221
Self-heal Daemon on localhost N/A N/A Y 5878
Self-heal Daemon on gluster02..local N/A N/A Y 5746
Self-heal Daemon on gluster01..local N/A N/A Y 6072
Self-heal Daemon on phnode05..local N/A N/A Y 5751
Self-heal Daemon on phnode03..local N/A N/A Y 6133
Self-heal Daemon on phnode04..local N/A N/A Y 4734
Task Status of Volume engine
------------------------------------------------------------------------------
Task : Rebalance
ID : 21304de4-b100-4860-b408-5e58103080a2
Status : completed
Status of volume: vmstore
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick 172.16.1.1:/gluster_bricks/vmstore/vm
store 50244 0 Y 1746546
Brick 172.16.1.2:/gluster_bricks/vmstore/vm
store 51607 0 Y 2933405
Brick 172.16.1.3:/gluster_bricks/vmstore/vm
store 49835 0 Y 6182
Brick phnode04..local:/gluster_bricks
/vmstore/vmstore1 54098 0 Y 5927
Brick phnode05..local:/gluster_bricks
/vmstore/vmstore1 56565 0 Y 6200
Brick phnode06..local:/gluster_bricks
/vmstore/vmstore1 58653 0 Y 1853456
Self-heal Daemon on localhost N/A N/A Y 5878
Self-heal Daemon on gluster02..local N/A N/A Y 5746
Self-heal Daemon on gluster01..local N/A N/A Y 6072
Self-heal Daemon on phnode05..local N/A N/A Y 5751
Self-heal Daemon on phnode03..local N/A N/A Y 6133
Self-heal Daemon on phnode04..local N/A N/A Y 4734
Task Status of Volume vmstore
------------------------------------------------------------------------------
Task : Rebalance
ID : 7d1849f8-433a-43b1-93bc-d674a7b0aef3
Status : completed
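
Should I also be checking for pending heals and split-brain entries on the volumes after the
outage? For clarity, these are the standard GlusterFS commands I mean (shown for the data volume):

# list entries pending self-heal
gluster volume heal data info

# list only entries that are actually in split-brain
gluster volume heal data info split-brain
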
[root@phnode06 /]# hosted-engine --vm-status
--== Host phnode01..local (id: 1) status ==--
Host ID : 1
Host timestamp : 9113939
Score : 3400
Engine status : {"vm": "up", "health":
"good", "detail": "Up"}
Hostname : phnode01..local
Local maintenance : False
stopped : False
crc32 : 00979124
conf_on_shared_storage : True
local_conf_timestamp : 9113939
Status up-to-date : True
Extra metadata (valid at timestamp):
metadata_parse_version=1
metadata_feature_version=1
timestamp=9113939 (Sat Aug 19 14:39:41 2023)
host-id=1
score=3400
vm_conf_refresh_time=9113939 (Sat Aug 19 14:39:41 2023)
conf_on_shared_storage=True
maintenance=False
state=EngineUp
stopped=False
--== Host phnode03..local (id: 2) status ==--
Host ID : 2
Host timestamp : 405629
Score : 3400
Engine status : {"vm": "down",
"health": "bad", "detail": "unknown",
"reason": "vm not running on this host"}
Hostname : phnode03..local
Local maintenance : False
stopped : False
crc32 : 0b1c7489
conf_on_shared_storage : True
local_conf_timestamp : 405630
Status up-to-date : True
Extra metadata (valid at timestamp):
metadata_parse_version=1
metadata_feature_version=1
timestamp=405629 (Sat Aug 19 14:39:43 2023)
host-id=2
score=3400
vm_conf_refresh_time=405630 (Sat Aug 19 14:39:43 2023)
conf_on_shared_storage=True
maintenance=False
state=EngineDown
stopped=False
--== Host phnode02..local (id: 3) status ==--
Host ID : 3
Host timestamp : 9069311
Score : 3400
Engine status : {"vm": "down",
"health": "bad", "detail": "unknown",
"reason": "vm not running on this host"}
Hostname : phnode02..local
Local maintenance : False
stopped : False
crc32 : b84baa31
conf_on_shared_storage : True
local_conf_timestamp : 9069311
Status up-to-date : True
Extra metadata (valid at timestamp):
metadata_parse_version=1
metadata_feature_version=1
timestamp=9069311 (Sat Aug 19 14:39:41 2023)
host-id=3
score=3400
vm_conf_refresh_time=9069311 (Sat Aug 19 14:39:41 2023)
conf_on_shared_storage=True
maintenance=False
state=EngineDown
stopped=False
--== Host phnode04..local (id: 4) status ==--
Host ID : 4
Host timestamp : 1055893
Score : 3400
Engine status : {"vm": "down",
"health": "bad", "detail": "unknown",
"reason": "vm not running on this host"}
Hostname : phnode04..local
Local maintenance : False
stopped : False
crc32 : 58a9b84c
conf_on_shared_storage : True
local_conf_timestamp : 1055893
Status up-to-date : True
Extra metadata (valid at timestamp):
metadata_parse_version=1
metadata_feature_version=1
timestamp=1055893 (Sat Aug 19 14:39:38 2023)
host-id=4
score=3400
vm_conf_refresh_time=1055893 (Sat Aug 19 14:39:38 2023)
conf_on_shared_storage=True
maintenance=False
state=EngineDown
stopped=False
--== Host phnode05..local (id: 5) status ==--
Host ID : 5
Host timestamp : 404187
Score : 3400
Engine status : {"vm": "down",
"health": "bad", "detail": "unknown",
"reason": "vm not running on this host"}
Hostname : phnode05..local
Local maintenance : False
stopped : False
crc32 : 48cf9186
conf_on_shared_storage : True
local_conf_timestamp : 404187
Status up-to-date : True
Extra metadata (valid at timestamp):
metadata_parse_version=1
metadata_feature_version=1
timestamp=404187 (Sat Aug 19 14:39:40 2023)
host-id=5
score=3400
vm_conf_refresh_time=404187 (Sat Aug 19 14:39:40 2023)
conf_on_shared_storage=True
maintenance=False
state=EngineDown
stopped=False
--== Host phnode06..local (id: 6) status ==--
Host ID : 6
Host timestamp : 1055902
Score : 3400
Engine status : {"vm": "down",
"health": "bad", "detail": "unknown",
"reason": "vm not running on this host"}
Hostname : phnode06..local
Local maintenance : False
stopped : False
crc32 : 913ff6c2
conf_on_shared_storage : True
local_conf_timestamp : 1055902
Status up-to-date : True
Extra metadata (valid at timestamp):
metadata_parse_version=1
metadata_feature_version=1
timestamp=1055902 (Sat Aug 19 14:39:48 2023)
host-id=6
score=3400
vm_conf_refresh_time=1055902 (Sat Aug 19 14:39:48 2023)
conf_on_shared_storage=True
maintenance=False
state=EngineDown
stopped=False
[root@phnode06 /]#
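
Also, is manually starting the engine VM on one of the DR hosts the expected recovery step when it
stays paused? For clarity, these are the standard hosted-engine commands I mean (run on a surviving
DR host):

# check HA agent state on all hosted-engine hosts
hosted-engine --vm-status

# make sure global maintenance is not set
hosted-engine --set-maintenance --mode=none

# attempt to start the engine VM on this host
hosted-engine --vm-start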