glusterfs backup-volfile-servers lost

I have a 4-node setup (CentOS 7) with the hosted engine on GlusterFS (replica 3 arbiter 1). The Gluster layout is:

- ohost01: 104 G brick (real data)
- ohost02: 104 G brick (real data)
- ohost04: 104 G brick (arbiter)
- ohost05: 104 G partition used as NFS storage

The hosted engine is on Gluster, and I also have an FC domain of 3.6 TB. The Gluster mount is configured like this:

storage=172.16.224.10:/engine
mnt_options=backup-volfile-servers=172.16.224.11:172.16.224.13

172.16.224.10 is the ohost01 storage network, 172.16.224.12 is the ohost02 storage network, and 172.16.224.13 is the ohost04 storage network.

Today I upgraded all nodes. The hosted engine was running on ohost05 at the time, and I proceeded like this:

- Put ohost04 (arbiter) into maintenance and upgraded it (OK).
- Did the same with ohost02.
- ohost01 was SPM, so I made ohost05 the SPM, then put ohost01 into maintenance and upgraded it.

I noticed that the engine VM paused during the process, which usually does not happen since I have the backup-volfile-servers mount option. But today I noticed that this option is being ignored. On the hosts the mount looks like this:

172.16.224.10:/engine on /rhev/data-center/mnt/glusterSD/172.16.224.10:_engine type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)

So the redundancy is gone from Gluster and I cannot figure out why. If I restart ohost01 (after maintenance), the hosted engine gets paused until ohost01 comes back up. How can I solve this issue?
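For reference, the storage= and mnt_options= values above come from the hosted-engine configuration, and the volume layout can be cross-checked with the standard Gluster commands. A minimal sketch, assuming the default config location and the volume name used in this setup:

  # hosted-engine storage settings (source of the storage= / mnt_options= lines above)
  grep -E '^(storage|mnt_options)=' /etc/ovirt-hosted-engine/hosted-engine.conf

  # confirm the volume really is replica 3 arbiter 1 and list its bricks
  gluster volume info engine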

It seems that the GlusterFS redundancy is actually working. It doesn't show in the mount options, but it is there in the processes. It must be something else that caused the engine to pause, so please ignore that part. Is there a way to debug why the hosted engine paused?
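For anyone checking the same thing: backup-volfile-servers is consumed by the mount helper rather than the kernel, so it does not normally appear in the mount options, but the backup servers typically show up as extra --volfile-server arguments on the fuse client process. A quick sketch of how to look for them, using only standard tools:

  # each configured volfile server should appear as its own --volfile-server argument
  ps -ef | grep '[g]lusterfs' | grep -- '--volfile-server'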

Can you provide the engine mount logs under /var/log/glusterfs/rhev_data-center*engine.log and also the ovirt-ha/agent.log?
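For reference, on a stock oVirt host these logs usually live at the locations below; the Gluster client log file is named after the mount point, so the glob is only a sketch and default log paths are assumed:

  # fuse client log for the engine storage domain mount
  ls /var/log/glusterfs/rhev*engine.log

  # hosted-engine HA agent log
  less /var/log/ovirt-hosted-engine-ha/agent.log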

From glusterfs/rhev_datacenter:

[2018-06-18 12:32:50.854668] W [socket.c:593:__socket_rwv] 0-glusterfs: readv on 172.16.224.10:24007 failed (No data available)
[2018-06-18 12:33:38.194322] C [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-engine-client-0: server 172.16.224.10:49152 has not responded in the last 30 seconds, disconnecting.
[2018-06-18 12:33:38.195797] E [rpc-clnt.c:350:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7fb89f2c0efb] (--> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7fb89f085e6e] (--> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7fb89f085f8e] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x90)[0x7fb89f087710] (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x2a0)[0x7fb89f088200] ))))) 0-engine-client-0: forced unwinding frame type(GlusterFS 3.3) op(STATFS(14)) called at 2018-06-18 12:33:07.800145 (xid=0x21be)
[2018-06-18 12:33:38.195824] W [MSGID: 114031] [client-rpc-fops.c:777:client3_3_statfs_cbk] 0-engine-client-0: remote operation failed [Transport endpoint is not connected]
[2018-06-18 12:33:38.196036] E [rpc-clnt.c:350:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7fb89f2c0efb] (--> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7fb89f085e6e] (--> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7fb89f085f8e] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x90)[0x7fb89f087710] (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x2a0)[0x7fb89f088200] ))))) 0-engine-client-0: forced unwinding frame type(GF-DUMP) op(NULL(2)) called at 2018-06-18 12:33:07.800158 (xid=0x21bf)
[2018-06-18 12:33:38.196057] W [rpc-clnt-ping.c:223:rpc_clnt_ping_cbk] 0-engine-client-0: socket disconnected
[2018-06-18 12:33:38.196224] E [rpc-clnt.c:350:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7fb89f2c0efb] (--> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7fb89f085e6e] (--> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7fb89f085f8e] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x90)[0x7fb89f087710] (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x2a0)[0x7fb89f088200] ))))) 0-engine-client-0: forced unwinding frame type(GlusterFS 3.3) op(READ(12)) called at 2018-06-18 12:33:08.307804 (xid=0x21c0)
[2018-06-18 12:33:38.196243] W [MSGID: 114031] [client-rpc-fops.c:2922:client3_3_readv_cbk] 0-engine-client-0: remote operation failed [Transport endpoint is not connected]
[2018-06-18 12:33:38.196279] W [MSGID: 114061] [client-common.c:704:client_pre_fstat] 0-engine-client-0: (6f81ab66-8991-45a2-9576-5c7defdec302) remote_fd is -1. EBADFD [File descriptor in bad state]
[2018-06-18 12:33:38.196545] E [rpc-clnt.c:350:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7fb89f2c0efb] (--> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7fb89f085e6e] (--> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7fb89f085f8e] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x90)[0x7fb89f087710] (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x2a0)[0x7fb89f088200] ))))) 0-engine-client-0: forced unwinding frame type(GlusterFS 3.3) op(READ(12)) called at 2018-06-18 12:33:08.307812 (xid=0x21c4)

This last part gets repeated a lot of times:

[2018-06-18 12:33:38.199480] W [MSGID: 114061] [client-common.c:704:client_pre_fstat] 0-engine-client-0: (2bddf734-3baa-439f-90e9-93b483260e01) remote_fd is -1. EBADFD [File descriptor in bad state]
[2018-06-18 12:33:38.199721] E [rpc-clnt.c:350:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7fb89f2c0efb] (--> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7fb89f085e6e] (--> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7fb89f085f8e] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x90)[0x7fb89f087710] (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x2a0)[0x7fb89f088200] ))))) 0-engine-client-0: forced unwinding frame type(GlusterFS 3.3) op(READ(12)) called at 2018-06-18 12:33:16.836613 (xid=0x21cb)

This part also gets repeated sometimes:

/lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7fb89f085f8e] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x90)[0x7fb89f087710] (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x2a0)[0x7fb89f088200] ))))) 0-engine-client-0: forced unwinding frame type(GlusterFS 3.3) op(READ(12)) called at 2018-06-18 12:33:37.337567 (xid=0x21db)
The message "W [MSGID: 114031] [client-rpc-fops.c:2922:client3_3_readv_cbk] 0-engine-client-0: remote operation failed [Transport endpoint is not connected]" repeated 8 times between [2018-06-18 12:33:38.202531] and [2018-06-18 12:33:38.205070]
[2018-06-18 12:33:38.205097] W [MSGID: 114061] [client-common.c:704:client_pre_fstat] 0-engine-client-0: (2bddf734-3baa-439f-90e9-93b483260e01) remote_fd is -1. EBADFD [File descriptor in bad state]
[2018-06-18 12:33:38.205461] E [rpc-clnt.c:350:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7fb89f2c0efb] (--> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7fb89f085e6e] (--> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7fb89f085f8e] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x90)[0x7fb89f087710] (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x2a0)[0x7fb89f088200] ))))) 0-engine-client-0: forced unwinding frame type(GlusterFS 3.3) op(READ(12)) called at 2018-06-18 12:33:37.337596 (xid=0x21dc)
[2018-06-18 12:33:38.208554] W [MSGID: 114031] [client-rpc-fops.c:2922:client3_3_readv_cbk] 0-engine-client-0: remote operation failed [Transport endpoint is not connected]
[2018-06-18 12:33:38.208590] W [MSGID: 114061] [client-common.c:704:client_pre_fstat] 0-engine-client-0: (2bddf734-3baa-439f-90e9-93b483260e01) remote_fd is -1. EBADFD [File descriptor in bad state]
[2018-06-18 12:34:07.234501] E [socket.c:2369:socket_connect_finish] 0-engine-client-0: connection to 172.16.224.10:24007 failed (No route to host); disconnecting socket
[2018-06-18 12:39:25.760451] E [MSGID: 114058] [client-handshake.c:1565:client_query_portmap_cbk] 0-engine-client-0: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running.
----------------------------------------

That's what's in the gluster log on the host the engine was running on. (Do you use some external site to embed logs? Sorry, I'm new on the list.)

agent.log is pretty big, so I'll try to find a pastebin-like site to link it.
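For cross-checking the state those last entries point at (the brick on 172.16.224.10 unreachable on port 24007, the portmap query failing, and the 30-second client ping timer expiring), these are the usual Gluster commands on any peer. A sketch only, with the volume name from this setup and the output omitted since it is environment-specific:

  # brick processes and the ports they listen on (is the ohost01 brick up?)
  gluster volume status engine

  # entries still pending self-heal after a node comes back
  gluster volume heal engine info

  # the 30 seconds in the ping-timer message come from this volume option
  gluster volume get engine network.ping-timeout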

Adding Krutika to look at the gluster logs.
participants (2)

- g.vasilopoulos@uoc.gr
- Sahina Bose