Reinstalled Host 4.4.4 bricks failing to start

Environment: 3-node HCI, local storage, replica 2+1.

After changing the cluster 'Default Network Provider' from ovirt-provider-ovn to 'No Default Provider', a notification to reinstall was flagged against each host, as expected. Host 3 (the arbiter) was reinstalled but failed to activate because its Gluster bricks did not start. No GlusterFS process for the bricks is listed when checking the status of glusterd. Further attempts to reinstall the host also fail.

Can someone please help identify the issue and a resolution?

Kind regards,
Shimme

Hi everyone, I would really appreciate it if someone could look over this. At 12:12:26 both bricks (engine and isos) are added, but at 12:29:50 glusterd is unable to open their pid files. This followed a reinstallation via the Compute > Hosts window in oVirt Manager.

[2021-02-26 12:12:24.285936] W [MSGID: 106061] [glusterd-handler.c:3315:glusterd_transport_inet_options_build] 0-glusterd: Failed to get tcp-user-timeout
[2021-02-26 12:12:24.287513] I [MSGID: 101190] [event-epoll.c:682:event_dispatch_epoll_worker] 0-epoll: Started thread with index 0
[2021-02-26 12:12:26.040796] I [MSGID: 106496] [glusterd-handshake.c:935:__server_getspec] 0-management: Received mount request for volume engine.bdtovirtprod03-strg.domain.com.gluster_bricks-engine-engine
[2021-02-26 12:12:26.042570] I [MSGID: 106142] [glusterd-pmap.c:290:pmap_registry_bind] 0-pmap: adding brick /gluster_bricks/engine/engine on port 49152
[2021-02-26 12:12:26.042737] I [MSGID: 106496] [glusterd-handshake.c:935:__server_getspec] 0-management: Received mount request for volume isos.bdtovirtprod03-strg.domain.com.gluster_bricks-isos-isos
[2021-02-26 12:12:26.043978] I [MSGID: 106496] [glusterd-handshake.c:935:__server_getspec] 0-management: Received mount request for volume shd/engine
[2021-02-26 12:12:26.044199] I [MSGID: 106142] [glusterd-pmap.c:290:pmap_registry_bind] 0-pmap: adding brick /gluster_bricks/isos/isos on port 49153
[2021-02-26 12:12:26.044283] I [MSGID: 106496] [glusterd-handshake.c:935:__server_getspec] 0-management: Received mount request for volume shd/isos
[2021-02-26 12:12:26.045114] I [MSGID: 106163] [glusterd-handshake.c:1433:__glusterd_mgmt_hndsk_versions_ack] 0-management: using the op-version 70200
[2021-02-26 12:12:26.046995] I [MSGID: 106163] [glusterd-handshake.c:1433:__glusterd_mgmt_hndsk_versions_ack] 0-management: using the op-version 70200
[2021-02-26 12:13:24.990790] I [MSGID: 106487] [glusterd-handler.c:1339:__glusterd_handle_cli_list_friends] 0-glusterd: Received cli list req
[2021-02-26 12:15:07.088792] I [MSGID: 106487] [glusterd-handler.c:1339:__glusterd_handle_cli_list_friends] 0-glusterd: Received cli list req
[2021-02-26 12:15:08.254477] I [MSGID: 106505] [glusterd-replace-brick.c:70:__glusterd_handle_replace_brick] 0-management: Received replace brick req
[2021-02-26 12:15:08.254549] I [MSGID: 106587] [glusterd-replace-brick.c:146:__glusterd_handle_replace_brick] 0-management: Received reset-brick start request.
[2021-02-26 12:18:08.254818] I [glusterd-locks.c:729:gd_mgmt_v3_unlock_timer_cbk] 0-management: unlock timer is cancelled for volume_type isos_vol
[2021-02-26 12:20:07.618217] I [MSGID: 106487] [glusterd-handler.c:1339:__glusterd_handle_cli_list_friends] 0-glusterd: Received cli list req
[2021-02-26 12:22:24.298000] E [rpc-clnt.c:183:call_bail] 0-management: bailing out frame type(Peer mgmt), op(--(2)), xid = 0x5, unique = 5, sent = 2021-02-26 12:12:24.294207, timeout = 600 for 10.237.8.30:24007
[2021-02-26 12:22:24.298060] E [rpc-clnt.c:183:call_bail] 0-management: bailing out frame type(Peer mgmt), op(--(2)), xid = 0x5, unique = 4, sent = 2021-02-26 12:12:24.292298, timeout = 600 for 10.237.8.31:24007
[2021-02-26 12:23:57.348599] W [glusterfsd.c:1596:cleanup_and_exit] (-->/lib64/libpthread.so.0(+0x814a) [0x7fac7879814a] -->/usr/sbin/glusterd(glusterfs_sigwaiter+0xfd) [0x55d2b9ebfc1d] -->/usr/sbin/glusterd(cleanup_and_exit+0x58) [0x55d2b9ebfa68] ) 0-: received signum (15), shutting down
[2021-02-26 12:29:44.658217] I [MSGID: 100030] [glusterfsd.c:2867:main] 0-/usr/sbin/glusterd: Started running /usr/sbin/glusterd version 7.9 (args: /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO)
[2021-02-26 12:29:44.659077] I [glusterfsd.c:2594:daemonize] 0-glusterfs: Pid of current running process is 4198
[2021-02-26 12:29:44.670056] I [MSGID: 106478] [glusterd.c:1426:init] 0-management: Maximum allowed open file descriptors set to 65536
[2021-02-26 12:29:44.670093] I [MSGID: 106479] [glusterd.c:1482:init] 0-management: Using /var/lib/glusterd as working directory
[2021-02-26 12:29:44.670098] I [MSGID: 106479] [glusterd.c:1488:init] 0-management: Using /var/run/gluster as pid file working directory
[2021-02-26 12:29:44.674080] I [socket.c:1015:__socket_server_bind] 0-socket.management: process started listening on port (24007)
[2021-02-26 12:29:44.677700] W [MSGID: 103071] [rdma.c:4472:__gf_rdma_ctx_create] 0-rpc-transport/rdma: rdma_cm event channel creation failed [No such device]
[2021-02-26 12:29:44.677717] W [MSGID: 103055] [rdma.c:4782:init] 0-rdma.management: Failed to initialize IB Device
[2021-02-26 12:29:44.677724] W [rpc-transport.c:366:rpc_transport_load] 0-rpc-transport: 'rdma' initialization failed
[2021-02-26 12:29:44.677792] W [rpcsvc.c:1981:rpcsvc_create_listener] 0-rpc-service: cannot create listener, initing the transport failed
[2021-02-26 12:29:44.677798] E [MSGID: 106244] [glusterd.c:1781:init] 0-management: creation of 1 listeners failed, continuing with succeeded transport
[2021-02-26 12:29:44.679532] I [socket.c:958:__socket_server_bind] 0-socket.management: closing (AF_UNIX) reuse check socket 12
[2021-02-26 12:29:44.680102] I [MSGID: 106059] [glusterd.c:1865:init] 0-management: max-port override: 60999
[2021-02-26 12:29:45.975798] I [MSGID: 106513] [glusterd-store.c:2257:glusterd_restore_op_version] 0-glusterd: retrieved op-version: 70200
[2021-02-26 12:29:45.995205] W [MSGID: 106204] [glusterd-store.c:3275:glusterd_store_update_volinfo] 0-management: Unknown key: tier-enabled
[2021-02-26 12:29:45.995735] W [MSGID: 106204] [glusterd-store.c:3275:glusterd_store_update_volinfo] 0-management: Unknown key: brick-0
[2021-02-26 12:29:45.995771] W [MSGID: 106204] [glusterd-store.c:3275:glusterd_store_update_volinfo] 0-management: Unknown key: brick-1
[2021-02-26 12:29:45.995792] W [MSGID: 106204] [glusterd-store.c:3275:glusterd_store_update_volinfo] 0-management: Unknown key: brick-2
[2021-02-26 12:29:46.003176] I [MSGID: 106544] [glusterd.c:152:glusterd_uuid_init] 0-management: retrieved UUID: 769e5b75-d500-4897-8c8e-a1c0afe5bd58
[2021-02-26 12:29:46.086953] W [MSGID: 106204] [glusterd-store.c:3275:glusterd_store_update_volinfo] 0-management: Unknown key: tier-enabled
[2021-02-26 12:29:46.087071] W [MSGID: 106204] [glusterd-store.c:3275:glusterd_store_update_volinfo] 0-management: Unknown key: brick-0
[2021-02-26 12:29:46.087078] W [MSGID: 106204] [glusterd-store.c:3275:glusterd_store_update_volinfo] 0-management: Unknown key: brick-1
[2021-02-26 12:29:46.087084] W [MSGID: 106204] [glusterd-store.c:3275:glusterd_store_update_volinfo] 0-management: Unknown key: brick-2
[2021-02-26 12:29:46.091410] I [MSGID: 106498] [glusterd-handler.c:3519:glusterd_friend_add_from_peerinfo] 0-management: connect returned 0
[2021-02-26 12:29:46.091635] I [MSGID: 106498] [glusterd-handler.c:3519:glusterd_friend_add_from_peerinfo] 0-management: connect returned 0
[2021-02-26 12:29:46.091668] W [MSGID: 106061] [glusterd-handler.c:3315:glusterd_transport_inet_options_build] 0-glusterd: Failed to get tcp-user-timeout
[2021-02-26 12:29:46.091685] I [rpc-clnt.c:1014:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
[2021-02-26 12:29:46.093819] I [rpc-clnt.c:1014:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
[2021-02-26 12:29:46.093814] W [MSGID: 106061] [glusterd-handler.c:3315:glusterd_transport_inet_options_build] 0-glusterd: Failed to get tcp-user-timeout
[2021-02-26 12:29:46.097011] I [MSGID: 101190] [event-epoll.c:682:event_dispatch_epoll_worker] 0-epoll: Started thread with index 0
[2021-02-26 12:29:46.097399] I [MSGID: 106495] [glusterd-handler.c:2978:__glusterd_handle_getwd] 0-glusterd: Received getwd req
[2021-02-26 12:29:46.179627] I [MSGID: 106495] [glusterd-handler.c:2978:__glusterd_handle_getwd] 0-glusterd: Received getwd req
[2021-02-26 12:29:50.225765] I [MSGID: 106004] [glusterd-handler.c:6204:__glusterd_peer_rpc_notify] 0-management: Peer <bdtovirtprod01-strg> (<67b5345f-dd3c-4781-8ee3-1f68e37a1e7f>), in state <Peer in Cluster>, has disconnected from glusterd.
[2021-02-26 12:29:50.226371] C [MSGID: 106002] [glusterd-server-quorum.c:355:glusterd_do_volume_quorum_action] 0-management: Server quorum lost for volume engine. Stopping local bricks.
[2021-02-26 12:29:50.227042] E [MSGID: 106028] [glusterd-utils.c:8665:glusterd_brick_signal] 0-glusterd: Unable to open pidfile: /var/run/gluster/vols/engine/bdtovirtprod03-strg.domain.com-gluster_bricks-engine-engine.pid [No such file or directory]
[2021-02-26 12:29:50.227103] C [MSGID: 106002] [glusterd-server-quorum.c:355:glusterd_do_volume_quorum_action] 0-management: Server quorum lost for volume isos. Stopping local bricks.
[2021-02-26 12:29:50.227322] E [MSGID: 106028] [glusterd-utils.c:8665:glusterd_brick_signal] 0-glusterd: Unable to open pidfile: /var/run/gluster/vols/isos/bdtovirtprod03-strg.domain.com-gluster_bricks-isos-isos.pid [No such file or directory]
[2021-02-26 12:29:57.606348] I [MSGID: 106163] [glusterd-handshake.c:1433:__glusterd_mgmt_hndsk_versions_ack] 0-management: using the op-version 70200
[2021-02-26 12:30:02.364287] I [MSGID: 106163] [glusterd-handshake.c:1433:__glusterd_mgmt_hndsk_versions_ack] 0-management: using the op-version 70200
[2021-02-26 12:30:03.956673] I [MSGID: 106487] [glusterd-handler.c:1339:__glusterd_handle_cli_list_friends] 0-glusterd: Received cli list req
[2021-02-26 12:31:00.325661] I [MSGID: 106487] [glusterd-handler.c:1339:__glusterd_handle_cli_list_friends] 0-glusterd: Received cli list req
[2021-02-26 12:33:19.901681] I [MSGID: 106488] [glusterd-handler.c:1400:__glusterd_handle_cli_get_volume] 0-management: Received get vol req
[2021-02-26 12:33:51.556143] I [MSGID: 106487] [glusterd-handler.c:1339:__glusterd_handle_cli_list_friends] 0-glusterd: Received cli list req
[2021-02-26 12:34:05.896966] I [MSGID: 106499] [glusterd-handler.c:4264:__glusterd_handle_status_volume] 0-management: Received status volume req for volume isos
[2021-02-26 12:34:05.897209] W [glusterd-locks.c:579:glusterd_mgmt_v3_lock] (-->/usr/lib64/glusterfs/7.9/xlator/mgmt/glusterd.so(+0xdfd24) [0x7f588c535d24] -->/usr/lib64/glusterfs/7.9/xlator/mgmt/glusterd.so(+0xdf82b) [0x7f588c53582b] -->/usr/lib64/glusterfs/7.9/xlator/mgmt/glusterd.so(+0xe51d2) [0x7f588c53b1d2] ) 0-management: Lock for isos held by 769e5b75-d500-4897-8c8e-a1c0afe5bd58
[2021-02-26 12:34:05.897234] E [MSGID: 106118] [glusterd-syncop.c:1883:gd_sync_task_begin] 0-management: Unable to acquire lock for isos
The message "I [MSGID: 106487] [glusterd-handler.c:1339:__glusterd_handle_cli_list_friends] 0-glusterd: Received cli list req" repeated 2 times between [2021-02-26 12:33:51.556143] and [2021-02-26 12:35:09.074298]
[2021-02-26 12:36:31.857113] I [glusterd-locks.c:729:gd_mgmt_v3_unlock_timer_cbk] 0-management: unlock timer is cancelled for volume_type isos_vol
[2021-02-26 12:39:56.312206] E [rpc-clnt.c:183:call_bail] 0-management: bailing out frame type(Peer mgmt), op(--(2)), xid = 0x5, unique = 4, sent = 2021-02-26 12:29:56.307354, timeout = 600 for 10.237.8.30:24007
[2021-02-26 12:40:01.498272] E [rpc-clnt.c:183:call_bail] 0-management: bailing out frame type(Peer mgmt), op(--(2)), xid = 0x5, unique = 7, sent = 2021-02-26 12:30:01.493213, timeout = 600 for 10.237.8.31:24007
[2021-02-26 12:40:09.087003] I [MSGID: 106487] [glusterd-handler.c:1339:__glusterd_handle_cli_list_friends] 0-glusterd: Received cli list req
[2021-02-26 12:41:01.499083] E [rpc-clnt.c:183:call_bail] 0-management: bailing out frame type(glusterd mgmt), op(--(3)), xid = 0x6, unique = 12, sent = 2021-02-26 12:30:58.209581, timeout = 600 for 10.237.8.31:24007
[2021-02-26 12:41:01.499141] E [MSGID: 106152] [glusterd-syncop.c:104:gd_collate_errors] 0-glusterd: Staging failed on bdtovirtprod02-strg.domain.com. Please check log file for details.
[2021-02-26 12:41:06.313130] E [rpc-clnt.c:183:call_bail] 0-management: bailing out frame type(glusterd mgmt), op(--(3)), xid = 0x6, unique = 11, sent = 2021-02-26 12:30:58.209559, timeout = 600 for 10.237.8.30:24007
[2021-02-26 12:41:06.313173] E [MSGID: 106152] [glusterd-syncop.c:104:gd_collate_errors] 0-glusterd: Staging failed on bdtovirtprod01-strg. Please check log file for details.
[2021-02-26 12:41:06.313379] I [socket.c:3892:socket_submit_outgoing_msg] 0-socket.management: not connected (priv->connected = -1)
[2021-02-26 12:41:06.313399] E [rpcsvc.c:1573:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x2, Program: GlusterD svc cli, ProgVers: 2, Proc: 27) to rpc-transport (socket.management)
[2021-02-26 12:41:06.313452] E [MSGID: 106430] [glusterd-utils.c:553:glusterd_submit_reply] 0-glusterd: Reply submission failed
[2021-02-26 12:41:11.499217] E [rpc-clnt.c:183:call_bail] 0-management: bailing out frame type(glusterd mgmt), op(--(3)), xid = 0x7, unique = 16, sent = 2021-02-26 12:31:09.951183, timeout = 600 for 10.237.8.31:24007
[2021-02-26 12:41:16.313265] E [rpc-clnt.c:183:call_bail] 0-management: bailing out frame type(glusterd mgmt), op(--(3)), xid = 0x7, unique = 15, sent = 2021-02-26 12:31:09.951163, timeout = 600 for 10.237.8.30:24007
[2021-02-26 12:41:16.313505] I [socket.c:3892:socket_submit_outgoing_msg] 0-socket.management: not connected (priv->connected = -1)
[2021-02-26 12:41:16.313527] E [rpcsvc.c:1573:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x2, Program: GlusterD svc cli, ProgVers: 2, Proc: 27) to rpc-transport (socket.management)
[2021-02-26 12:41:11.499265] E [MSGID: 106152] [glusterd-syncop.c:104:gd_collate_errors] 0-glusterd: Staging failed on bdtovirtprod02-strg.domain.com. Please check log file for details.
[2021-02-26 12:41:16.313308] E [MSGID: 106152] [glusterd-syncop.c:104:gd_collate_errors] 0-glusterd: Staging failed on bdtovirtprod01-strg. Please check log file for details.
[2021-02-26 12:41:16.313546] E [MSGID: 106430] [glusterd-utils.c:553:glusterd_submit_reply] 0-glusterd: Reply submission failed
[2021-02-26 12:42:42.423584] W [glusterfsd.c:1596:cleanup_and_exit] (-->/lib64/libpthread.so.0(+0x814a) [0x7f589239e14a] -->/usr/sbin/glusterd(glusterfs_sigwaiter+0xfd) [0x564c16673c1d] -->/usr/sbin/glusterd(cleanup_and_exit+0x58) [0x564c16673a68] ) 0-: received signum (15), shutting down
[2021-02-26 13:04:03.568312] I [MSGID: 100030] [glusterfsd.c:2867:main] 0-/usr/sbin/glusterd: Started running /usr/sbin/glusterd version 7.9 (args: /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO)
[2021-02-26 13:04:03.569859] I [glusterfsd.c:2594:daemonize] 0-glusterfs: Pid of current running process is 34532
[2021-02-26 13:04:03.573369] I [MSGID: 106478] [glusterd.c:1426:init] 0-management: Maximum allowed open file descriptors set to 65536
[2021-02-26 13:04:03.573489] I [MSGID: 106479] [glusterd.c:1482:init] 0-management: Using /var/lib/glusterd as working directory
[2021-02-26 13:04:03.573508] I [MSGID: 106479] [glusterd.c:1488:init] 0-management: Using /var/run/gluster as pid file working directory
[2021-02-26 13:04:03.579026] I [socket.c:1015:__socket_server_bind] 0-socket.management: process started listening on port (24007)
[2021-02-26 13:04:03.580716] W [MSGID: 103071] [rdma.c:4472:__gf_rdma_ctx_create] 0-rpc-transport/rdma: rdma_cm event channel creation failed [No such device]
[2021-02-26 13:04:03.580736] W [MSGID: 103055] [rdma.c:4782:init] 0-rdma.management: Failed to initialize IB Device
[2021-02-26 13:04:03.580743] W [rpc-transport.c:366:rpc_transport_load] 0-rpc-transport: 'rdma' initialization failed
[2021-02-26 13:04:03.580826] W [rpcsvc.c:1981:rpcsvc_create_listener] 0-rpc-service: cannot create listener, initing the transport failed
[2021-02-26 13:04:03.580833] E [MSGID: 106244] [glusterd.c:1781:init] 0-management: creation of 1 listeners failed, continuing with succeeded transport
[2021-02-26 13:04:03.581928] I [socket.c:958:__socket_server_bind] 0-socket.management: closing (AF_UNIX) reuse check socket 12
[2021-02-26 13:04:03.582266] I [MSGID: 106059] [glusterd.c:1865:init] 0-management: max-port override: 60999
[2021-02-26 13:04:05.129916] I [MSGID: 106513] [glusterd-store.c:2257:glusterd_restore_op_version] 0-glusterd: retrieved op-version: 70200
[2021-02-26 13:04:05.130323] W [MSGID: 106204] [glusterd-store.c:3275:glusterd_store_update_volinfo] 0-management: Unknown key: tier-enabled
[2021-02-26 13:04:05.130687] W [MSGID: 106204] [glusterd-store.c:3275:glusterd_store_update_volinfo] 0-management: Unknown key: brick-0
[2021-02-26 13:04:05.130714] W [MSGID: 106204] [glusterd-store.c:3275:glusterd_store_update_volinfo] 0-management: Unknown key: brick-1
[2021-02-26 13:04:05.130734] W [MSGID: 106204] [glusterd-store.c:3275:glusterd_store_update_volinfo] 0-management: Unknown key: brick-2
[2021-02-26 13:04:05.131161] I [MSGID: 106544] [glusterd.c:152:glusterd_uuid_init] 0-management: retrieved UUID: 769e5b75-d500-4897-8c8e-a1c0afe5bd58
[2021-02-26 13:04:05.200090] W [MSGID: 106204] [glusterd-store.c:3275:glusterd_store_update_volinfo] 0-management: Unknown key: tier-enabled
[2021-02-26 13:04:05.200190] W [MSGID: 106204] [glusterd-store.c:3275:glusterd_store_update_volinfo] 0-management: Unknown key: brick-0
[2021-02-26 13:04:05.200197] W [MSGID: 106204] [glusterd-store.c:3275:glusterd_store_update_volinfo] 0-management: Unknown key: brick-1
[2021-02-26 13:04:05.200202] W [MSGID: 106204] [glusterd-store.c:3275:glusterd_store_update_volinfo] 0-management: Unknown key: brick-2
[2021-02-26 13:04:05.201636] I [MSGID: 106498] [glusterd-handler.c:3519:glusterd_friend_add_from_peerinfo] 0-management: connect returned 0
[2021-02-26 13:04:05.201736] I [MSGID: 106498] [glusterd-handler.c:3519:glusterd_friend_add_from_peerinfo] 0-management: connect returned 0
[2021-02-26 13:04:05.201793] W [MSGID: 106061] [glusterd-handler.c:3315:glusterd_transport_inet_options_build] 0-glusterd: Failed to get tcp-user-timeout

The brick logs for engine and isos have not changed since this happened. Can anyone point me in the right direction? I have tried creating new bricks and selecting replace brick, but it is as if the host is no longer part of the Gluster cluster. Peer status shows connected and Gluster starts, although the host no longer appears to be configured as part of the original 3 nodes. Any help ASAP would be greatly appreciated.

Regards,
Shimme
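
Note for anyone triaging the glusterd log quoted above: the dump is long and mostly informational, so it can help to pull out only the error (E) and critical (C) entries. Below is a minimal Python sketch of one way to do that. It is not part of any Gluster tooling; the function names and the SAMPLE string are illustrative assumptions, and the three sample entries are copied from the log in this thread. It assumes each entry starts with a bracketed timestamp followed by a one-letter severity, which is how the entries above are laid out.

```python
import re

# Start of an entry as seen in the log above:
# "[YYYY-MM-DD HH:MM:SS.ffffff] <one-letter severity> "
ENTRY_START = re.compile(
    r"\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\] ([A-Z]) "
)

# Three entries copied from the log in this thread, flattened onto one line
# the way they arrive when pasted into a mail client.
SAMPLE = (
    "[2021-02-26 12:29:50.226371] C [MSGID: 106002] "
    "[glusterd-server-quorum.c:355:glusterd_do_volume_quorum_action] "
    "0-management: Server quorum lost for volume engine. Stopping local bricks. "
    "[2021-02-26 12:29:50.227042] E [MSGID: 106028] "
    "[glusterd-utils.c:8665:glusterd_brick_signal] 0-glusterd: Unable to open "
    "pidfile: /var/run/gluster/vols/engine/"
    "bdtovirtprod03-strg.domain.com-gluster_bricks-engine-engine.pid "
    "[No such file or directory] "
    "[2021-02-26 12:29:57.606348] I [MSGID: 106163] "
    "[glusterd-handshake.c:1433:__glusterd_mgmt_hndsk_versions_ack] "
    "0-management: using the op-version 70200"
)


def split_entries(text):
    """Split log text into (timestamp, severity, message) tuples.

    Text that does not begin with a timestamp/severity header (for example
    a 'The message ... repeated N times' summary) stays attached to the
    entry that precedes it.
    """
    matches = list(ENTRY_START.finditer(text))
    entries = []
    for i, match in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        # Collapse runs of whitespace, including stray line breaks.
        message = " ".join(text[match.end():end].split())
        entries.append((match.group(1), match.group(2), message))
    return entries


def filter_severity(entries, levels=("E", "C")):
    """Keep only the entries whose severity letter is in `levels`."""
    return [entry for entry in entries if entry[1] in levels]


if __name__ == "__main__":
    for timestamp, severity, message in filter_severity(split_entries(SAMPLE)):
        print(timestamp, severity, message)
```

Run against the full log, this kind of filter puts the "Server quorum lost ... Stopping local bricks" entries at 12:29:50 directly next to the pidfile errors that follow them, which may be a useful place to start reading.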
participants (1)
-
simon@justconnect.ie