On Tue, Sep 11, 2018 at 2:13 AM, <g.vasilopoulos@uoc.gr> wrote:
It seems that a VM with 3 disks (the boot disk in domain engine, another disk in domain vol1, and a third in domain v3) became unresponsive when one Gluster host went down.
To explain the situation a bit: I have 3 GlusterFS hosts with 3 volumes.
The hosts are g1, g2, and g3; each has 3 bricks:
g1 has vol1, vol2, and the vol3 arbiter
g2 has vol1, the vol2 arbiter, and vol3
g3 has the vol1 arbiter, vol2, and vol3
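
For reference, the layout above can be written down and checked with a small sketch. It assumes each volume is a replica 3 volume with one arbiter brick, and that a volume stays writable as long as 2 of its 3 bricks are reachable. That is a simplification of Gluster's client quorum, not a statement of how the cluster in question is configured. The names `LAYOUT`, `bricks_up`, and `has_quorum` are made up for this illustration.

```python
# Brick layout as described in the message; "arbiter" marks the arbiter brick.
LAYOUT = {
    "g1": {"vol1": "data", "vol2": "data", "vol3": "arbiter"},
    "g2": {"vol1": "data", "vol2": "arbiter", "vol3": "data"},
    "g3": {"vol1": "arbiter", "vol2": "data", "vol3": "data"},
}


def bricks_up(volume, down_host):
    """List (host, role) pairs for the bricks of `volume` that are still up."""
    return sorted(
        (host, roles[volume])
        for host, roles in LAYOUT.items()
        if host != down_host
    )


def has_quorum(volume, down_host):
    """Assumption: quorum is a simple majority, i.e. at least 2 of 3 bricks."""
    return len(bricks_up(volume, down_host)) >= 2


if __name__ == "__main__":
    for down in LAYOUT:
        for vol in ("vol1", "vol2", "vol3"):
            print(down, "down:", vol, bricks_up(vol, down), has_quorum(vol, down))
```

Under these assumptions every volume keeps two bricks when any single host is down, so the storage side alone should tolerate the maintenance; the question is whether the client reaches the surviving hosts.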
libgfapi is enabled. I put a host into maintenance to update the BIOS, and the VM that had disks in two domains became unresponsive.
Is this normal? The domain configuration shows host1 as primary for vol1 and host2 as primary for vol3, with the other two as backup-volfile servers.
From the qemu logs, it seems it always tries to connect to the server that is down and not to one of the alternative hosts.
Is this a libgfapi/libvirt problem?

Yes, this is a gfapi + libvirt issue. See https://bugzilla.redhat.com/show_bug.cgi?id=1484660 for details.
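
To make the expected behaviour concrete, here is a minimal sketch of what "falling back to a backup volfile server" would mean. It is not how libgfapi, QEMU, or libvirt implement it; `pick_volfile_server` and the `is_reachable` callback are hypothetical names used only for illustration, and reachability is simulated so that no network access is needed.

```python
def pick_volfile_server(servers, is_reachable):
    """Return the first server that is_reachable() accepts, or None.

    `servers` is ordered: the primary first, then the backups.
    """
    for server in servers:
        if is_reachable(server):
            return server
    return None


if __name__ == "__main__":
    # Simulated state: the primary for vol1 is in maintenance.
    down = {"g1"}
    chosen = pick_volfile_server(["g1", "g2", "g3"], lambda s: s not in down)
    print("volfile server used:", chosen)
```

The logs further down show the opposite pattern: the same client keeps reconnecting to the same unreachable address instead of moving on.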


Here are some libvirt logs showing what it tries to do:
[2018-09-10 19:43:42.876114] T [socket.c:3133:socket_connect] 0-vol1-client-2: connecting 0x55ed673525c0, state=2 gen=0 sock=-1
[2018-09-10 19:43:42.876124] T [name.c:243:af_inet_client_get_remote_sockaddr] 0-vol1-client-2: option remote-port missing in volume vol1-client-2. Defaulting to 24007
[2018-09-10 19:43:42.878566] D [socket.c:3051:socket_fix_ssl_opts] 0-vol1-client-2: disabling SSL for portmapper connection
[2018-09-10 19:43:42.878770] T [socket.c:834:__socket_nodelay] 0-vol1-client-2: NODELAY enabled for socket 30
[2018-09-10 19:43:42.878780] T [socket.c:920:__socket_keepalive] 0-vol1-client-2: Keep-alive enabled for socket: 30, (idle: 20, interval: 2, max-probes: 9, timeout: 0)
[2018-09-10 19:43:42.878830] T [rpc-clnt.c:406:rpc_clnt_reconnect] 0-vol3-client-1: attempting reconnect
[2018-09-10 19:43:42.878846] T [socket.c:3133:socket_connect] 0-vol3-client-1: connecting 0x55ed673546c0, state=2 gen=0 sock=-1
[2018-09-10 19:43:42.878856] T [name.c:243:af_inet_client_get_remote_sockaddr] 0-vol3-client-1: option remote-port missing in volume vol3-client-1. Defaulting to 24007
[2018-09-10 19:43:42.881229] D [socket.c:3051:socket_fix_ssl_opts] 0-vol3-client-1: disabling SSL for portmapper connection
[2018-09-10 19:43:42.881255] T [socket.c:834:__socket_nodelay] 0-vol3-client-1: NODELAY enabled for socket 38
[2018-09-10 19:43:42.881264] T [socket.c:920:__socket_keepalive] 0-vol3-client-1: Keep-alive enabled for socket: 38, (idle: 20, interval: 2, max-probes: 9, timeout: 0)
[2018-09-10 19:43:45.569298] T [socket.c:724:__socket_disconnect] 0-vol3-client-1: disconnecting 0x55ed673546c0, state=2 gen=0 sock=38
[2018-09-10 19:43:45.569308] T [socket.c:724:__socket_disconnect] 0-vol1-client-2: disconnecting 0x55ed673525c0, state=2 gen=0 sock=30
[2018-09-10 19:43:45.570000] T [socket.c:728:__socket_disconnect] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7fddbadade9b] (--> /usr/lib64/glusterfs/3.12.13/rpc-transport/socket.so(+0x4ea0)[0x7fdda7bbfea0] (--> /usr/lib64/glusterfs/3.12.13/rpc-transport/socket.so(+0x530a)[0x7fdda7bc030a] (--> /usr/lib64/glusterfs/3.12.13/rpc-transport/socket.so(+0x9a08)[0x7fdda7bc4a08] (--> /lib64/libglusterfs.so.0(+0x883c4)[0x7fddbae093c4] ))))) 0-vol3-client-1: tearing down socket connection
[2018-09-10 19:43:45.570020] D [socket.c:686:__socket_shutdown] 0-vol3-client-1: shutdown() returned -1. Transport endpoint is not connected
[2018-09-10 19:43:45.570038] D [socket.c:733:__socket_disconnect] 0-vol3-client-1: __socket_teardown_connection () failed: Transport endpoint is not connected
[2018-09-10 19:43:45.570043] D [socket.c:2474:socket_event_handler] 0-transport: EPOLLERR - disconnecting now
[2018-09-10 19:43:45.570907] T [socket.c:728:__socket_disconnect] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7fddbadade9b] (--> /usr/lib64/glusterfs/3.12.13/rpc-transport/socket.so(+0x4ea0)[0x7fdda7bbfea0] (--> /usr/lib64/glusterfs/3.12.13/rpc-transport/socket.so(+0x530a)[0x7fdda7bc030a] (--> /usr/lib64/glusterfs/3.12.13/rpc-transport/socket.so(+0x9a08)[0x7fdda7bc4a08] (--> /lib64/libglusterfs.so.0(+0x883c4)[0x7fddbae093c4] ))))) 0-vol1-client-2: tearing down socket connection
[2018-09-10 19:43:45.570928] D [socket.c:686:__socket_shutdown] 0-vol1-client-2: shutdown() returned -1. Transport endpoint is not connected
[2018-09-10 19:43:45.570936] D [socket.c:733:__socket_disconnect] 0-vol1-client-2: __socket_teardown_connection () failed: Transport endpoint is not connected
[2018-09-10 19:43:45.570940] D [socket.c:2474:socket_event_handler] 0-transport: EPOLLERR - disconnecting now
[2018-09-10 19:43:45.570960] D [rpc-clnt-ping.c:99:rpc_clnt_remove_ping_timer_locked] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7fddbadade9b] (--> /lib64/libgfrpc.so.0(rpc_clnt_remove_ping_timer_locked+0x8b)[0x7fddbab7828b] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x5f)[0x7fddbab7460f] (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x2a0)[0x7fddbab75130] (--> /lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7fddbab70ea3] ))))) 0-: 10.xxx.xxx.130:24007: ping timer event already removed
[2018-09-10 19:43:45.571098] D [rpc-clnt-ping.c:99:rpc_clnt_remove_ping_timer_locked] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7fddbadade9b] (--> /lib64/libgfrpc.so.0(rpc_clnt_remove_ping_timer_locked+0x8b)[0x7fddbab7828b] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x5f)[0x7fddbab7460f] (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x2a0)[0x7fddbab75130] (--> /lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7fddbab70ea3] ))))) 0-: 10.xxx.xxx.130:24007: ping timer event already removed
[2018-09-10 19:43:45.878885] T [rpc-clnt.c:406:rpc_clnt_reconnect] 0-vol1-client-2: attempting reconnect
[2018-09-10 19:43:45.881546] T [socket.c:834:__socket_nodelay] 0-vol1-client-2: NODELAY enabled for socket 38
[2018-09-10 19:43:45.881555] T [socket.c:920:__socket_keepalive] 0-vol1-client-2: Keep-alive enabled for socket: 38, (idle: 20, interval: 2, max-probes: 9, timeout: 0)
[2018-09-10 19:43:45.883839] D [socket.c:3051:socket_fix_ssl_opts] 0-vol3-client-1: disabling SSL for portmapper connection
[2018-09-10 19:43:45.883878] T [socket.c:834:__socket_nodelay] 0-vol3-client-1: NODELAY enabled for socket 30
[2018-09-10 19:43:45.883886] T [socket.c:920:__socket_keepalive] 0-vol3-client-1: Keep-alive enabled for socket: 30, (idle: 20, interval: 2, max-probes: 9, timeout: 0)
[2018-09-10 19:43:48.575316] T [socket.c:724:__socket_disconnect] 0-vol3-client-1: disconnecting 0x55ed673546c0, state=2 gen=0 sock=30
[2018-09-10 19:43:48.575329] T [socket.c:724:__socket_disconnect] 0-vol1-client-2: disconnecting 0x55ed673525c0, state=2 gen=0 sock=38
[2018-09-10 19:43:48.576022] T [socket.c:728:__socket_disconnect] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7fddbadade9b] (--> /usr/lib64/glusterfs/3.12.13/rpc-transport/socket.so(+0x4ea0)[0x7fdda7bbfea0] (--> /usr/lib64/glusterfs/3.12.13/rpc-transport/socket.so(+0x530a)[0x7fdda7bc030a] (--> /usr/lib64/glusterfs/3.12.13/rpc-transport/socket.so(+0x9a08)[0x7fdda7bc4a08] (--> /lib64/libglusterfs.so.0(+0x883c4)[0x7fddbae093c4] ))))) 0-vol3-client-1: tearing down socket connection
[2018-09-10 19:43:48.576045] D [socket.c:686:__socket_shutdown] 0-vol3-client-1: shutdown() returned -1. Transport endpoint is not connected
[2018-09-10 19:43:48.576054] D [socket.c:733:__socket_disconnect] 0-vol3-client-1: __socket_teardown_connection () failed: Transport endpoint is not connected
[2018-09-10 19:43:48.576059] D [socket.c:2474:socket_event_handler] 0-transport: EPOLLERR - disconnecting now
[2018-09-10 19:43:48.576079] T [socket.c:728:__socket_disconnect] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7fddbadade9b] (--> /usr/lib64/glusterfs/3.12.13/rpc-transport/socket.so(+0x4ea0)[0x7fdda7bbfea0] (--> /usr/lib64/glusterfs/3.12.13/rpc-transport/socket.so(+0x530a)[0x7fdda7bc030a] (--> /usr/lib64/glusterfs/3.12.13/rpc-transport/socket.so(+0x9a08)[0x7fdda7bc4a08] (--> /lib64/libglusterfs.so.0(+0x883c4)[0x7fddbae093c4] ))))) 0-vol1-client-2: tearing down socket connection
[2018-09-10 19:43:48.576099] D [socket.c:686:__socket_shutdown] 0-vol1-client-2: shutdown() returned -1. Transport endpoint is not connected
[2018-09-10 19:43:48.576106] D [socket.c:733:__socket_disconnect] 0-vol1-client-2: __socket_teardown_connection () failed: Transport endpoint is not connected
[2018-09-10 19:43:48.576111] D [socket.c:2474:socket_event_handler] 0-transport: EPOLLERR - disconnecting now
[2018-09-10 19:43:48.576879] D [rpc-clnt-ping.c:99:rpc_clnt_remove_ping_timer_locked] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7fddbadade9b] (--> /lib64/libgfrpc.so.0(rpc_clnt_remove_ping_timer_locked+0x8b)[0x7fddbab7828b] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x5f)[0x7fddbab7460f] (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x2a0)[0x7fddbab75130] (--> /lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7fddbab70ea3] ))))) 0-: 10.xxx.xxx.130:24007: ping timer event already removed
[2018-09-10 19:43:48.576958] D [rpc-clnt-ping.c:99:rpc_clnt_remove_ping_timer_locked] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7fddbadade9b] (--> /lib64/libgfrpc.so.0(rpc_clnt_remove_ping_timer_locked+0x8b)[0x7fddbab7828b] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x5f)[0x7fddbab7460f] (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x2a0)[0x7fddbab75130] (--> /lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7fddbab70ea3] ))))) 0-: 10.xxx.xxx.130:24007: ping timer event already removed
[2018-09-10 19:43:48.881651] T [rpc-clnt.c:406:rpc_clnt_reconnect] 0-vol1-client-2: attempting reconnect
[2018-09-10 19:43:48.881667] T [socket.c:3133:socket_connect] 0-vol1-client-2: connecting 0x55ed673525c0, state=2 gen=0 sock=-1
[2018-09-10 19:43:48.881689] T [name.c:243:af_inet_client_get_remote_sockaddr] 0-vol1-client-2: option remote-port missing in volume vol1-client-2. Defaulting to 24007
[2018-09-10 19:43:48.884056] T [rpc-clnt.c:406:rpc_clnt_reconnect] 0-vol3-client-1: attempting reconnect
[2018-09-10 19:43:48.884072] T [socket.c:3133:socket_connect] 0-vol3-client-1: connecting 0x55ed673546c0, state=2 gen=0 sock=-1
[2018-09-10 19:43:48.884084] T [name.c:243:af_inet_client_get_remote_sockaddr] 0-vol3-client-1: option remote-port missing in volume vol3-client-1. Defaulting to 24007
[2018-09-10 19:43:48.884190] D [socket.c:3051:socket_fix_ssl_opts] 0-vol1-client-2: disabling SSL for portmapper connection
[2018-09-10 19:43:48.886524] T [socket.c:834:__socket_nodelay] 0-vol3-client-1: NODELAY enabled for socket 30
[2018-09-10 19:43:48.886532] T [socket.c:920:__socket_keepalive] 0-vol3-client-1: Keep-alive enabled for socket: 30, (idle: 20, interval: 2, max-probes: 9, timeout: 0)
[2018-09-10 19:43:51.581293] T [socket.c:724:__socket_disconnect] 0-vol3-client-1: disconnecting 0x55ed673546c0, state=2 gen=0 sock=30
[2018-09-10 19:43:51.581293] T [socket.c:724:__socket_disconnect] 0-vol1-client-2: disconnecting 0x55ed673525c0, state=2 gen=0 sock=38
[2018-09-10 19:43:51.582009] T [socket.c:728:__socket_disconnect] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7fddbadade9b] (--> /usr/lib64/glusterfs/3.12.13/rpc-transport/socket.so(+0x4ea0)[0x7fdda7bbfea0] (--> /usr/lib64/glusterfs/3.12.13/rpc-transport/socket.so(+0x530a)[0x7fdda7bc030a] (--> /usr/lib64/glusterfs/3.12.13/rpc-transport/socket.so(+0x9a08)[0x7fdda7bc4a08] (--> /lib64/libglusterfs.so.0(+0x883c4)[0x7fddbae093c4] ))))) 0-vol1-client-2: tearing down socket connection
[2018-09-10 19:43:51.582030] D [socket.c:686:__socket_shutdown] 0-vol1-client-2: shutdown() returned -1. Transport endpoint is not connected
[2018-09-10 19:43:51.582036] D [socket.c:733:__socket_disconnect] 0-vol1-client-2: __socket_teardown_connection () failed: Transport endpoint is not connected
[2018-09-10 19:43:51.582040] D [socket.c:2474:socket_event_handler] 0-transport: EPOLLERR - disconnecting now
[2018-09-10 19:43:51.582084] T [socket.c:728:__socket_disconnect] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7fddbadade9b] (--> /usr/lib64/glusterfs/3.12.13/rpc-transport/socket.so(+0x4ea0)[0x7fdda7bbfea0] (--> /usr/lib64/glusterfs/3.12.13/rpc-transport/socket.so(+0x530a)[0x7fdda7bc030a] (--> /usr/lib64/glusterfs/3.12.13/rpc-transport/socket.so(+0x9a08)[0x7fdda7bc4a08] (--> /lib64/libglusterfs.so.0(+0x883c4)[0x7fddbae093c4] ))))) 0-vol3-client-1: tearing down socket connection
[2018-09-10 19:43:51.582105] D [socket.c:686:__socket_shutdown] 0-vol3-client-1: shutdown() returned -1. Transport endpoint is not connected
[2018-09-10 19:43:51.582111] D [socket.c:733:__socket_disconnect] 0-vol3-client-1: __socket_teardown_connection () failed: Transport endpoint is not connected
[2018-09-10 19:43:51.582116] D [socket.c:2474:socket_event_handler] 0-transport: EPOLLERR - disconnecting now
[2018-09-10 19:43:51.582812] D [rpc-clnt-ping.c:99:rpc_clnt_remove_ping_timer_locked] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7fddbadade9b] (--> /lib64/libgfrpc.so.0(rpc_clnt_remove_ping_timer_locked+0x8b)[0x7fddbab7828b] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x5f)[0x7fddbab7460f] (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x2a0)[0x7fddbab75130] (--> /lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7fddbab70ea3] ))))) 0-: 10.xxx.xxx.130:24007: ping timer event already removed
[2018-09-10 19:43:51.582865] D [rpc-clnt-ping.c:99:rpc_clnt_remove_ping_timer_locked] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7fddbadade9b] (--> /lib64/libgfrpc.so.0(rpc_clnt_remove_ping_timer_locked+0x8b)[0x7fddbab7828b] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x5f)[0x7fddbab7460f] (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x2a0)[0x7fddbab75130] (--> /lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7fddbab70ea3] ))))) 0-: 10.xxx.xxx.130:24007: ping timer event already removed
[2018-09-10 19:43:51.884349] T [rpc-clnt.c:406:rpc_clnt_reconnect] 0-vol1-client-2: attempting reconnect
[2018-09-10 19:43:51.884367] T [socket.c:3133:socket_connect] 0-vol1-client-2: connecting 0x55ed673525c0, state=2 gen=0 sock=-1
[2018-09-10 19:43:51.884376] T [name.c:243:af_inet_client_get_remote_sockaddr] 0-vol1-client-2: option remote-port missing in volume vol1-client-2. Defaulting to 24007
[2018-09-10 19:43:51.886644] T [rpc-clnt.c:406:rpc_clnt_reconnect] 0-vol3-client-1: attempting reconnect
[2018-09-10 19:43:51.886659] T [socket.c:3133:socket_connect] 0-vol3-client-1: connecting 0x55ed673546c0, state=2 gen=0 sock=-1
[2018-09-10 19:43:51.886669] T [name.c:243:af_inet_client_get_remote_sockaddr] 0-vol3-client-1: option remote-port missing in volume vol3-client-1. Defaulting to 24007
[2018-09-10 19:43:51.887251] D [socket.c:3051:socket_fix_ssl_opts] 0-vol1-client-2: disabling SSL for portmapper connection
[2018-09-10 19:43:51.887281] T [socket.c:834:__socket_nodelay] 0-vol1-client-2: NODELAY enabled for socket 38
[2018-09-10 19:43:51.887290] T [socket.c:920:__socket_keepalive] 0-vol1-client-2: Keep-alive enabled for socket: 38, (idle: 20, interval: 2, max-probes: 9, timeout: 0)
[2018-09-10 19:43:51.889141] D [socket.c:3051:socket_fix_ssl_opts] 0-vol3-client-1: disabling SSL for portmapper connection
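
A log like the one above is easier to read once the reconnect attempts are counted per client. The following is a rough sketch, assuming the line format seen in this excerpt (`rpc_clnt_reconnect` entries ending in "attempting reconnect"); the function name `count_reconnects` and the regular expression are illustrative and may need adjusting for other Gluster versions.

```python
import re
from collections import Counter

# Matches e.g.:
# [2018-09-10 19:43:42.878830] T [rpc-clnt.c:406:rpc_clnt_reconnect] 0-vol3-client-1: attempting reconnect
RECONNECT = re.compile(
    r"\] [A-Z] \[rpc-clnt\.c:\d+:rpc_clnt_reconnect\] (\S+): attempting reconnect"
)


def count_reconnects(lines):
    """Count 'attempting reconnect' entries per client translator name."""
    counts = Counter()
    for line in lines:
        match = RECONNECT.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "[2018-09-10 19:43:42.878830] T [rpc-clnt.c:406:rpc_clnt_reconnect] 0-vol3-client-1: attempting reconnect",
        "[2018-09-10 19:43:42.878846] T [socket.c:3133:socket_connect] 0-vol3-client-1: connecting 0x55ed673546c0, state=2 gen=0 sock=-1",
        "[2018-09-10 19:43:45.878885] T [rpc-clnt.c:406:rpc_clnt_reconnect] 0-vol1-client-2: attempting reconnect",
    ]
    for client, n in sorted(count_reconnects(sample).items()):
        print(client, n)
```

A steadily growing count for the same client names, with no successful connection in between, is consistent with the client retrying the host that is down.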
List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/LZ6HYGEQOPARSLOE64MJUZBML4XOLB5L/