On Tue, Sep 11, 2018 at 2:13 AM, <g.vasilopoulos(a)uoc.gr> wrote:
> It seems that a VM with 3 disks (the boot disk in domain engine, another disk in
> domain vol1 and a third in domain v3) became unresponsive when one Gluster
> host went down.
> To explain the situation a bit: I have 3 GlusterFS hosts (g1, g2, g3) with
> 3 volumes, and each host has 3 bricks:
> g1 has vol1, vol2 and the vol3 arbiter
> g2 has vol1, the vol2 arbiter and vol3
> g3 has the vol1 arbiter, vol2 and vol3
> libgfapi is enabled. I put a host in maintenance to update the BIOS, and
> the VM that had disks in two domains became unresponsive.
> Is this normal? The qemu logs below show what it tries. The domain configuration
> shows host1 as primary for vol1 and host2 as primary for vol3, with the
> other two as backup-volfile servers.
> It seems it always tries to connect to the server that is down and not to
> one of the alternative hosts.
> Is this a libgfapi/libvirt problem?
>
Yes, this is a gfapi + libvirt issue. See
https://bugzilla.redhat.com/show_bug.cgi?id=1484660 for details.
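
For what it's worth, the brick layout described above should survive the loss of any single host, which is why the hang points at the client side rather than at the volumes. A minimal Python sketch of that check, assuming each volume is a 3-brick arbiter volume with a simple majority quorum (the quorum rule and the helper name volumes_with_quorum are my assumptions, not taken from the poster's configuration):

```python
# Brick placement as described in the original message.
# The majority-quorum rule used below is an assumption for illustration.
LAYOUT = {
    "vol1": {"g1": "data", "g2": "data", "g3": "arbiter"},
    "vol2": {"g1": "data", "g2": "arbiter", "g3": "data"},
    "vol3": {"g1": "arbiter", "g2": "data", "g3": "data"},
}


def volumes_with_quorum(down_host):
    """Return the volumes that keep a majority of bricks online."""
    ok = []
    for volume, bricks in LAYOUT.items():
        online = [host for host in bricks if host != down_host]
        # Majority: strictly more than half of the bricks are reachable.
        if len(online) * 2 > len(bricks):
            ok.append(volume)
    return ok


for host in ("g1", "g2", "g3"):
    print(host, "down ->", volumes_with_quorum(host))
```

Under that assumption every volume keeps two of its three bricks whichever host is taken down, so a client that fails over to a surviving host should still be able to reach all three volumes.
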
> Here are some libvirt logs showing what it tries to do:
> [2018-09-10 19:43:42.876114] T [socket.c:3133:socket_connect]
> 0-vol1-client-2: connecting 0x55ed673525c0, state=2 gen=0 sock=-1
> [2018-09-10 19:43:42.876124] T [name.c:243:af_inet_client_get_remote_sockaddr]
> 0-vol1-client-2: option remote-port missing in volume vol1-client-2.
> Defaulting to 24007
> [2018-09-10 19:43:42.878566] D [socket.c:3051:socket_fix_ssl_opts]
> 0-vol1-client-2: disabling SSL for portmapper connection
> [2018-09-10 19:43:42.878770] T [socket.c:834:__socket_nodelay]
> 0-vol1-client-2: NODELAY enabled for socket 30
> [2018-09-10 19:43:42.878780] T [socket.c:920:__socket_keepalive]
> 0-vol1-client-2: Keep-alive enabled for socket: 30, (idle: 20, interval: 2,
> max-probes: 9, timeout: 0)
> [2018-09-10 19:43:42.878830] T [rpc-clnt.c:406:rpc_clnt_reconnect]
> 0-vol3-client-1: attempting reconnect
> [2018-09-10 19:43:42.878846] T [socket.c:3133:socket_connect]
> 0-vol3-client-1: connecting 0x55ed673546c0, state=2 gen=0 sock=-1
> [2018-09-10 19:43:42.878856] T [name.c:243:af_inet_client_get_remote_sockaddr]
> 0-vol3-client-1: option remote-port missing in volume vol3-client-1.
> Defaulting to 24007
> [2018-09-10 19:43:42.881229] D [socket.c:3051:socket_fix_ssl_opts]
> 0-vol3-client-1: disabling SSL for portmapper connection
> [2018-09-10 19:43:42.881255] T [socket.c:834:__socket_nodelay]
> 0-vol3-client-1: NODELAY enabled for socket 38
> [2018-09-10 19:43:42.881264] T [socket.c:920:__socket_keepalive]
> 0-vol3-client-1: Keep-alive enabled for socket: 38, (idle: 20, interval: 2,
> max-probes: 9, timeout: 0)
> [2018-09-10 19:43:45.569298] T [socket.c:724:__socket_disconnect]
> 0-vol3-client-1: disconnecting 0x55ed673546c0, state=2 gen=0 sock=38
> [2018-09-10 19:43:45.569308] T [socket.c:724:__socket_disconnect]
> 0-vol1-client-2: disconnecting 0x55ed673525c0, state=2 gen=0 sock=30
> [2018-09-10 19:43:45.570000] T [socket.c:728:__socket_disconnect] (-->
> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7fddbadade9b] (-->
> /usr/lib64/glusterfs/3.12.13/rpc-transport/socket.so(+0x4ea0)[0x7fdda7bbfea0]
> (--> /usr/lib64/glusterfs/3.12.13/rpc-transport/socket.so(+0x530a)[0x7fdda7bc030a]
> (--> /usr/lib64/glusterfs/3.12.13/rpc-transport/socket.so(+0x9a08)[0x7fdda7bc4a08]
> (--> /lib64/libglusterfs.so.0(+0x883c4)[0x7fddbae093c4] )))))
> 0-vol3-client-1: tearing down socket connection
> [2018-09-10 19:43:45.570020] D [socket.c:686:__socket_shutdown]
> 0-vol3-client-1: shutdown() returned -1. Transport endpoint is not connected
> [2018-09-10 19:43:45.570038] D [socket.c:733:__socket_disconnect]
> 0-vol3-client-1: __socket_teardown_connection () failed: Transport endpoint
> is not connected
> [2018-09-10 19:43:45.570043] D [socket.c:2474:socket_event_handler]
> 0-transport: EPOLLERR - disconnecting now
> [2018-09-10 19:43:45.570907] T [socket.c:728:__socket_disconnect] (-->
> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7fddbadade9b] (-->
> /usr/lib64/glusterfs/3.12.13/rpc-transport/socket.so(+0x4ea0)[0x7fdda7bbfea0]
> (--> /usr/lib64/glusterfs/3.12.13/rpc-transport/socket.so(+0x530a)[0x7fdda7bc030a]
> (--> /usr/lib64/glusterfs/3.12.13/rpc-transport/socket.so(+0x9a08)[0x7fdda7bc4a08]
> (--> /lib64/libglusterfs.so.0(+0x883c4)[0x7fddbae093c4] )))))
> 0-vol1-client-2: tearing down socket connection
> [2018-09-10 19:43:45.570928] D [socket.c:686:__socket_shutdown]
> 0-vol1-client-2: shutdown() returned -1. Transport endpoint is not connected
> [2018-09-10 19:43:45.570936] D [socket.c:733:__socket_disconnect]
> 0-vol1-client-2: __socket_teardown_connection () failed: Transport endpoint
> is not connected
> [2018-09-10 19:43:45.570940] D [socket.c:2474:socket_event_handler]
> 0-transport: EPOLLERR - disconnecting now
> [2018-09-10 19:43:45.570960] D [rpc-clnt-ping.c:99:rpc_clnt_remove_ping_timer_locked]
> (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7fddbadade9b]
> (--> /lib64/libgfrpc.so.0(rpc_clnt_remove_ping_timer_locked+0x8b)[0x7fddbab7828b]
> (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x5f)[0x7fddbab7460f]
> (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x2a0)[0x7fddbab75130] (-->
> /lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7fddbab70ea3] )))))
> 0-: 10.xxx.xxx.130:24007: ping timer event already removed
> [2018-09-10 19:43:45.571098] D [rpc-clnt-ping.c:99:rpc_clnt_remove_ping_timer_locked]
> (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7fddbadade9b]
> (--> /lib64/libgfrpc.so.0(rpc_clnt_remove_ping_timer_locked+0x8b)[0x7fddbab7828b]
> (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x5f)[0x7fddbab7460f]
> (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x2a0)[0x7fddbab75130] (-->
> /lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7fddbab70ea3] )))))
> 0-: 10.xxx.xxx.130:24007: ping timer event already removed
> [2018-09-10 19:43:45.878885] T [rpc-clnt.c:406:rpc_clnt_reconnect]
> 0-vol1-client-2: attempting reconnect
> [2018-09-10 19:43:45.881546] T [socket.c:834:__socket_nodelay]
> 0-vol1-client-2: NODELAY enabled for socket 38
> [2018-09-10 19:43:45.881555] T [socket.c:920:__socket_keepalive]
> 0-vol1-client-2: Keep-alive enabled for socket: 38, (idle: 20, interval: 2,
> max-probes: 9, timeout: 0)
> [2018-09-10 19:43:45.883839] D [socket.c:3051:socket_fix_ssl_opts]
> 0-vol3-client-1: disabling SSL for portmapper connection
> [2018-09-10 19:43:45.883878] T [socket.c:834:__socket_nodelay]
> 0-vol3-client-1: NODELAY enabled for socket 30
> [2018-09-10 19:43:45.883886] T [socket.c:920:__socket_keepalive]
> 0-vol3-client-1: Keep-alive enabled for socket: 30, (idle: 20, interval: 2,
> max-probes: 9, timeout: 0)
> [2018-09-10 19:43:48.575316] T [socket.c:724:__socket_disconnect]
> 0-vol3-client-1: disconnecting 0x55ed673546c0, state=2 gen=0 sock=30
> [2018-09-10 19:43:48.575329] T [socket.c:724:__socket_disconnect]
> 0-vol1-client-2: disconnecting 0x55ed673525c0, state=2 gen=0 sock=38
> [2018-09-10 19:43:48.576022] T [socket.c:728:__socket_disconnect] (-->
> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7fddbadade9b] (-->
> /usr/lib64/glusterfs/3.12.13/rpc-transport/socket.so(+0x4ea0)[0x7fdda7bbfea0]
> (--> /usr/lib64/glusterfs/3.12.13/rpc-transport/socket.so(+0x530a)[0x7fdda7bc030a]
> (--> /usr/lib64/glusterfs/3.12.13/rpc-transport/socket.so(+0x9a08)[0x7fdda7bc4a08]
> (--> /lib64/libglusterfs.so.0(+0x883c4)[0x7fddbae093c4] )))))
> 0-vol3-client-1: tearing down socket connection
> [2018-09-10 19:43:48.576045] D [socket.c:686:__socket_shutdown]
> 0-vol3-client-1: shutdown() returned -1. Transport endpoint is not connected
> [2018-09-10 19:43:48.576054] D [socket.c:733:__socket_disconnect]
> 0-vol3-client-1: __socket_teardown_connection () failed: Transport endpoint
> is not connected
> [2018-09-10 19:43:48.576059] D [socket.c:2474:socket_event_handler]
> 0-transport: EPOLLERR - disconnecting now
> [2018-09-10 19:43:48.576079] T [socket.c:728:__socket_disconnect] (-->
> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7fddbadade9b] (-->
> /usr/lib64/glusterfs/3.12.13/rpc-transport/socket.so(+0x4ea0)[0x7fdda7bbfea0]
> (--> /usr/lib64/glusterfs/3.12.13/rpc-transport/socket.so(+0x530a)[0x7fdda7bc030a]
> (--> /usr/lib64/glusterfs/3.12.13/rpc-transport/socket.so(+0x9a08)[0x7fdda7bc4a08]
> (--> /lib64/libglusterfs.so.0(+0x883c4)[0x7fddbae093c4] )))))
> 0-vol1-client-2: tearing down socket connection
> [2018-09-10 19:43:48.576099] D [socket.c:686:__socket_shutdown]
> 0-vol1-client-2: shutdown() returned -1. Transport endpoint is not connected
> [2018-09-10 19:43:48.576106] D [socket.c:733:__socket_disconnect]
> 0-vol1-client-2: __socket_teardown_connection () failed: Transport endpoint
> is not connected
> [2018-09-10 19:43:48.576111] D [socket.c:2474:socket_event_handler]
> 0-transport: EPOLLERR - disconnecting now
> [2018-09-10 19:43:48.576879] D [rpc-clnt-ping.c:99:rpc_clnt_remove_ping_timer_locked]
> (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7fddbadade9b]
> (--> /lib64/libgfrpc.so.0(rpc_clnt_remove_ping_timer_locked+0x8b)[0x7fddbab7828b]
> (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x5f)[0x7fddbab7460f]
> (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x2a0)[0x7fddbab75130] (-->
> /lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7fddbab70ea3] )))))
> 0-: 10.xxx.xxx.130:24007: ping timer event already removed
> [2018-09-10 19:43:48.576958] D [rpc-clnt-ping.c:99:rpc_clnt_remove_ping_timer_locked]
> (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7fddbadade9b]
> (--> /lib64/libgfrpc.so.0(rpc_clnt_remove_ping_timer_locked+0x8b)[0x7fddbab7828b]
> (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x5f)[0x7fddbab7460f]
> (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x2a0)[0x7fddbab75130] (-->
> /lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7fddbab70ea3] )))))
> 0-: 10.xxx.xxx.130:24007: ping timer event already removed
> [2018-09-10 19:43:48.881651] T [rpc-clnt.c:406:rpc_clnt_reconnect]
> 0-vol1-client-2: attempting reconnect
> [2018-09-10 19:43:48.881667] T [socket.c:3133:socket_connect]
> 0-vol1-client-2: connecting 0x55ed673525c0, state=2 gen=0 sock=-1
> [2018-09-10 19:43:48.881689] T [name.c:243:af_inet_client_get_remote_sockaddr]
> 0-vol1-client-2: option remote-port missing in volume vol1-client-2.
> Defaulting to 24007
> [2018-09-10 19:43:48.884056] T [rpc-clnt.c:406:rpc_clnt_reconnect]
> 0-vol3-client-1: attempting reconnect
> [2018-09-10 19:43:48.884072] T [socket.c:3133:socket_connect]
> 0-vol3-client-1: connecting 0x55ed673546c0, state=2 gen=0 sock=-1
> [2018-09-10 19:43:48.884084] T [name.c:243:af_inet_client_get_remote_sockaddr]
> 0-vol3-client-1: option remote-port missing in volume vol3-client-1.
> Defaulting to 24007
> [2018-09-10 19:43:48.884190] D [socket.c:3051:socket_fix_ssl_opts]
> 0-vol1-client-2: disabling SSL for portmapper connection
> [2018-09-10 19:43:48.886524] T [socket.c:834:__socket_nodelay]
> 0-vol3-client-1: NODELAY enabled for socket 30
> [2018-09-10 19:43:48.886532] T [socket.c:920:__socket_keepalive]
> 0-vol3-client-1: Keep-alive enabled for socket: 30, (idle: 20, interval: 2,
> max-probes: 9, timeout: 0)
> [2018-09-10 19:43:51.581293] T [socket.c:724:__socket_disconnect]
> 0-vol3-client-1: disconnecting 0x55ed673546c0, state=2 gen=0 sock=30
> [2018-09-10 19:43:51.581293] T [socket.c:724:__socket_disconnect]
> 0-vol1-client-2: disconnecting 0x55ed673525c0, state=2 gen=0 sock=38
> [2018-09-10 19:43:51.582009] T [socket.c:728:__socket_disconnect] (-->
> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7fddbadade9b] (-->
> /usr/lib64/glusterfs/3.12.13/rpc-transport/socket.so(+0x4ea0)[0x7fdda7bbfea0]
> (--> /usr/lib64/glusterfs/3.12.13/rpc-transport/socket.so(+0x530a)[0x7fdda7bc030a]
> (--> /usr/lib64/glusterfs/3.12.13/rpc-transport/socket.so(+0x9a08)[0x7fdda7bc4a08]
> (--> /lib64/libglusterfs.so.0(+0x883c4)[0x7fddbae093c4] )))))
> 0-vol1-client-2: tearing down socket connection
> [2018-09-10 19:43:51.582030] D [socket.c:686:__socket_shutdown]
> 0-vol1-client-2: shutdown() returned -1. Transport endpoint is not connected
> [2018-09-10 19:43:51.582036] D [socket.c:733:__socket_disconnect]
> 0-vol1-client-2: __socket_teardown_connection () failed: Transport endpoint
> is not connected
> [2018-09-10 19:43:51.582040] D [socket.c:2474:socket_event_handler]
> 0-transport: EPOLLERR - disconnecting now
> [2018-09-10 19:43:51.582084] T [socket.c:728:__socket_disconnect] (-->
> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7fddbadade9b] (-->
> /usr/lib64/glusterfs/3.12.13/rpc-transport/socket.so(+0x4ea0)[0x7fdda7bbfea0]
> (--> /usr/lib64/glusterfs/3.12.13/rpc-transport/socket.so(+0x530a)[0x7fdda7bc030a]
> (--> /usr/lib64/glusterfs/3.12.13/rpc-transport/socket.so(+0x9a08)[0x7fdda7bc4a08]
> (--> /lib64/libglusterfs.so.0(+0x883c4)[0x7fddbae093c4] )))))
> 0-vol3-client-1: tearing down socket connection
> [2018-09-10 19:43:51.582105] D [socket.c:686:__socket_shutdown]
> 0-vol3-client-1: shutdown() returned -1. Transport endpoint is not connected
> [2018-09-10 19:43:51.582111] D [socket.c:733:__socket_disconnect]
> 0-vol3-client-1: __socket_teardown_connection () failed: Transport endpoint
> is not connected
> [2018-09-10 19:43:51.582116] D [socket.c:2474:socket_event_handler]
> 0-transport: EPOLLERR - disconnecting now
> [2018-09-10 19:43:51.582812] D [rpc-clnt-ping.c:99:rpc_clnt_remove_ping_timer_locked]
> (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7fddbadade9b]
> (--> /lib64/libgfrpc.so.0(rpc_clnt_remove_ping_timer_locked+0x8b)[0x7fddbab7828b]
> (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x5f)[0x7fddbab7460f]
> (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x2a0)[0x7fddbab75130] (-->
> /lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7fddbab70ea3] )))))
> 0-: 10.xxx.xxx.130:24007: ping timer event already removed
> [2018-09-10 19:43:51.582865] D [rpc-clnt-ping.c:99:rpc_clnt_remove_ping_timer_locked]
> (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7fddbadade9b]
> (--> /lib64/libgfrpc.so.0(rpc_clnt_remove_ping_timer_locked+0x8b)[0x7fddbab7828b]
> (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x5f)[0x7fddbab7460f]
> (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x2a0)[0x7fddbab75130] (-->
> /lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7fddbab70ea3] )))))
> 0-: 10.xxx.xxx.130:24007: ping timer event already removed
> [2018-09-10 19:43:51.884349] T [rpc-clnt.c:406:rpc_clnt_reconnect]
> 0-vol1-client-2: attempting reconnect
> [2018-09-10 19:43:51.884367] T [socket.c:3133:socket_connect]
> 0-vol1-client-2: connecting 0x55ed673525c0, state=2 gen=0 sock=-1
> [2018-09-10 19:43:51.884376] T [name.c:243:af_inet_client_get_remote_sockaddr]
> 0-vol1-client-2: option remote-port missing in volume vol1-client-2.
> Defaulting to 24007
> [2018-09-10 19:43:51.886644] T [rpc-clnt.c:406:rpc_clnt_reconnect]
> 0-vol3-client-1: attempting reconnect
> [2018-09-10 19:43:51.886659] T [socket.c:3133:socket_connect]
> 0-vol3-client-1: connecting 0x55ed673546c0, state=2 gen=0 sock=-1
> [2018-09-10 19:43:51.886669] T [name.c:243:af_inet_client_get_remote_sockaddr]
> 0-vol3-client-1: option remote-port missing in volume vol3-client-1.
> Defaulting to 24007
> [2018-09-10 19:43:51.887251] D [socket.c:3051:socket_fix_ssl_opts]
> 0-vol1-client-2: disabling SSL for portmapper connection
> [2018-09-10 19:43:51.887281] T [socket.c:834:__socket_nodelay]
> 0-vol1-client-2: NODELAY enabled for socket 38
> [2018-09-10 19:43:51.887290] T [socket.c:920:__socket_keepalive]
> 0-vol1-client-2: Keep-alive enabled for socket: 38, (idle: 20, interval: 2,
> max-probes: 9, timeout: 0)
> [2018-09-10 19:43:51.889141] D [socket.c:3051:socket_fix_ssl_opts]
> 0-vol3-client-1: disabling SSL for portmapper connection
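
To see at a glance which clients are stuck in the retry loop, the reconnect attempts can be counted per client. A rough Python sketch, using a few entries copied from the log above with the mail wrapping undone; the regular expression and the function name count_reconnects are only illustrative and are based on nothing more than the lines shown here:

```python
import re
from collections import Counter

# A few entries copied from the log above, each rejoined onto one line.
SAMPLE = """\
[2018-09-10 19:43:42.878830] T [rpc-clnt.c:406:rpc_clnt_reconnect] 0-vol3-client-1: attempting reconnect
[2018-09-10 19:43:45.878885] T [rpc-clnt.c:406:rpc_clnt_reconnect] 0-vol1-client-2: attempting reconnect
[2018-09-10 19:43:48.881651] T [rpc-clnt.c:406:rpc_clnt_reconnect] 0-vol1-client-2: attempting reconnect
[2018-09-10 19:43:48.884056] T [rpc-clnt.c:406:rpc_clnt_reconnect] 0-vol3-client-1: attempting reconnect
"""

# Matches the client name between "0-" and ": attempting reconnect".
RECONNECT = re.compile(r"\] 0-(?P<client>\S+): attempting reconnect")


def count_reconnects(text):
    """Count 'attempting reconnect' entries per client name."""
    counts = Counter()
    for line in text.splitlines():
        match = RECONNECT.search(line)
        if match:
            counts[match.group("client")] += 1
    return counts


for client, attempts in sorted(count_reconnects(SAMPLE).items()):
    print(client, attempts)
```

In this excerpt only vol1-client-2 and vol3-client-1 show up, each retrying roughly every three seconds against port 24007.
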