Simone, these logs indicate that the first host is not able to connect to the brick on the third host, since it was powered down. The log reads "remote operation failed" on engine-client-2.
On 11/24/2016 07:27 PM, Simone Tiraboschi wrote:
On Thu, Nov 24, 2016 at 2:39 PM, knarra <knarra@redhat.com> wrote:
On 11/24/2016 06:56 PM, Simone Tiraboschi wrote:
It is on the first host.
On Thu, Nov 24, 2016 at 2:08 PM, knarra <knarra@redhat.com> wrote:
On 11/24/2016 06:15 PM, Simone Tiraboschi wrote:
On Thu, Nov 24, 2016 at 1:26 PM, knarra <knarra@redhat.com> wrote:
Hi,
I have three nodes with glusterfs as the storage domain. For some reason I see that vm.conf is missing from /var/run/ovirt-hosted-engine-ha, and because of this one of my hosts shows Hosted Engine HA: Not Active. Once I copy the file from another node and restart the ovirt-ha-broker and ovirt-ha-agent services, everything works fine. But then this happens again. Can someone please help me identify why this happens? Below is the log I see in the ovirt-ha-agent logs.
https://paste.fedoraproject.org/489120/79990345/
Once the engine has correctly imported the hosted-engine storage domain, a couple of OVF_STORE volumes will appear there.
Every modification to the engine VM configuration will be written by the engine into that OVF_STORE, so that every ovirt-ha-agent running on the hosted-engine hosts can restart the engine VM with a coherent configuration.
Until the engine imports the hosted-engine storage domain, ovirt-ha-agent will fall back to the initial vm.conf.
In your case the OVF_STORE volume is there,
but the agent fails to extract the engine VM configuration:
MainThread::INFO::2016-11-24 17:55:04,914::ovf_store::112::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF) Extracting Engine VM OVF from the OVF_STORE
MainThread::INFO::2016-11-24 17:55:04,919::ovf_store::119::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF) OVF_STORE volume path: /rhev/data-center/mnt/glusterSD/10.70.36.79:_engine/27f054c3-c245-4039-b42a-c28b37043016/images/fdf49778-9a06-49c6-bf7a-a0f12425911c/8c954add-6bcf-47f8-ac2e-4c85fc3f8699
MainThread::ERROR::2016-11-24 17:55:04,928::ovf_store::124::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF) Unable to extract HEVM OVF
So it tries to roll back to the initial vm.conf, but that one also seems to be missing some values, and so the agent fails:
MainThread::ERROR::2016-11-24 17:55:04,974::agent::205::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Error: ''Configuration value not found: file=/var/run/ovirt-hosted-engine-ha/vm.conf, key=memSize'' - trying to restart agent
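To see which values the fallback vm.conf is actually missing on the affected host, something like the following could help. It is only a rough sketch: it assumes vm.conf is made of plain key=value lines, and the REQUIRED_KEYS set is hypothetical apart from memSize, which is the key named in the agent error above.

```python
from pathlib import Path

# Only memSize comes from the agent error above; any other key added
# here is an assumption about what the agent needs.
REQUIRED_KEYS = {"memSize"}


def parse_vm_conf(text):
    """Parse key=value lines, skipping blanks and comments (assumed format)."""
    conf = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first "=" only, so values may contain "=" themselves.
        key, _, value = line.partition("=")
        conf[key.strip()] = value.strip()
    return conf


def missing_keys(text, required=REQUIRED_KEYS):
    """Return the required keys that are absent or empty, sorted."""
    conf = parse_vm_conf(text)
    return sorted(k for k in required if not conf.get(k))


if __name__ == "__main__":
    path = Path("/var/run/ovirt-hosted-engine-ha/vm.conf")
    if path.exists():
        print("missing keys:", missing_keys(path.read_text()))
    else:
        print(path, "does not exist")
```

Running it on a healthy host and on the failing one, then comparing the output, should show whether the file is truncated or absent altogether.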
Both issues seem storage related; could you please share your gluster logs?
Thanks
kasturi
Hi Simone,
Below [1] is the link for the sosreports from the first two hosts. The third host has some issue. Once it is up, I will provide the sosreport from there as well.
And the host where you see the initial issue was the third one?
It seems that host1 is failing to read from the hosted-engine storage domain:
[2016-11-24 12:33:43.678467] W [MSGID: 114031] [client-rpc-fops.c:2938:client3_3_lookup_cbk] 0-engine-client-2: remote operation failed. Path: / (00000000-0000-0000-0000-000000000001) [Transport endpoint is not connected]
[2016-11-24 12:33:43.678747] E [rpc-clnt.c:365:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x192)[0x7f077eba1642] (--> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7f077e96775e] (--> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f077e96786e] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x84)[0x7f077e968fc4] (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x120)[0x7f077e9698a0] ))))) 0-engine-client-2: forced unwinding frame type(GlusterFS 3.3) op(LOOKUP(27)) called at 2016-11-24 12:33:07.495178 (xid=0x82a1c)
[2016-11-24 12:33:43.678982] E [rpc-clnt.c:365:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x192)[0x7f077eba1642] (--> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7f077e96775e] (--> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f077e96786e] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x84)[0x7f077e968fc4] (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x120)[0x7f077e9698a0] ))))) 0-engine-client-2: forced unwinding frame type(GlusterFS 3.3) op(LOOKUP(27)) called at 2016-11-24 12:33:08.770637 (xid=0x82a1d)
[2016-11-24 12:33:43.679001] W [MSGID: 114031] [client-rpc-fops.c:2938:client3_3_lookup_cbk] 0-engine-client-2: remote operation failed. Path: /27f054c3-c245-4039-b42a-c28b37043016/images/39960f40-4aae-4714-ba73-1637785fae7c/38fa3519-f21e-4671-8c69-d1497ff8a490 (1090c25b-9c90-434e-a133-faf9647cc992) [Transport endpoint is not connected]
[2016-11-24 12:33:43.679303] E [rpc-clnt.c:365:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x192)[0x7f077eba1642] (--> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7f077e96775e] (--> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f077e96786e] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x84)[0x7f077e968fc4] (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x120)[0x7f077e9698a0] ))))) 0-engine-client-2: forced unwinding frame type(GlusterFS 3.3) op(LOOKUP(27)) called at 2016-11-24 12:33:11.096856 (xid=0x82a1e)
[2016-11-24 12:33:43.679596] E [rpc-clnt.c:365:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x192)[0x7f077eba1642] (--> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7f077e96775e] (--> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f077e96786e] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x84)[0x7f077e968fc4] (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x120)[0x7f077e9698a0] ))))) 0-engine-client-2: forced unwinding frame type(GF-DUMP) op(NULL(2)) called at 2016-11-24 12:33:13.673743 (xid=0x82a1f)
[2016-11-24 12:33:43.682310] I [socket.c:3401:socket_submit_request] 0-engine-client-2: not connected (priv->connected = 0)
[2016-11-24 12:33:43.682328] W [rpc-clnt.c:1640:rpc_clnt_submit] 0-engine-client-2: failed to submit rpc-request (XID: 0x82a20 Program: GlusterFS 3.3, ProgVers: 330, Proc: 30) to rpc-transport (engine-client-2)
[2016-11-24 12:33:43.682391] W [rpc-clnt.c:1640:rpc_clnt_submit] 0-engine-client-2: failed to submit rpc-request (XID: 0x82a21 Program: GlusterFS 3.3, ProgVers: 330, Proc: 27) to rpc-transport (engine-client-2)
[2016-11-24 12:33:43.682441] W [rpc-clnt.c:1640:rpc_clnt_submit] 0-engine-client-2: failed to submit rpc-request (XID: 0x82a22 Program: GlusterFS 3.3, ProgVers: 330, Proc: 27) to rpc-transport (engine-client-2)
[2016-11-24 12:33:43.682441] W [MSGID: 114031] [client-rpc-fops.c:2938:client3_3_lookup_cbk] 0-engine-client-2: remote operation failed. Path: /27f054c3-c245-4039-b42a-c28b37043016 (a64398f5-3fa3-48fe-9d40-d3860876cc2c) [Transport endpoint is not connected]
[2016-11-24 12:33:43.682492] W [rpc-clnt-ping.c:203:rpc_clnt_ping_cbk] 0-engine-client-2: socket disconnected
[2016-11-24 12:33:43.682536] I [MSGID: 114018] [client.c:2280:client_rpc_notify] 0-engine-client-2: disconnected from engine-client-2. Client process will keep trying to connect to glusterd until brick's port is available
[2016-11-24 12:33:43.682562] W [rpc-clnt.c:1640:rpc_clnt_submit] 0-engine-client-2: failed to submit rpc-request (XID: 0x82a23 Program: GlusterFS 3.3, ProgVers: 330, Proc: 27) to rpc-transport (engine-client-2)
The message "W [MSGID: 114031] [client-rpc-fops.c:2938:client3_3_lookup_cbk] 0-engine-client-2: remote operation failed. Path: /27f054c3-c245-4039-b42a-c28b37043016 (a64398f5-3fa3-48fe-9d40-d3860876cc2c) [Transport endpoint is not connected]" repeated 2 times between [2016-11-24 12:33:43.682441] and [2016-11-24 12:33:43.682599]
[2016-11-24 12:33:43.688324] W [MSGID: 114031] [client-rpc-fops.c:2938:client3_3_lookup_cbk] 0-engine-client-2: remote operation failed. Path: (null) (00000000-0000-0000-0000-000000000000) [Transport endpoint is not connected]
Before that there was a lot of self-healing activity.
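
To check quickly whether only one brick connection is affected, the client log can be summarized per client translator. This is just a sketch that matches the message formats shown in the excerpt above; the summarize helper is hypothetical, and the log file path has to be passed in because its name depends on the mount point. As far as I know, client translators are numbered from zero in brick order, so engine-client-2 would be the third brick, but please verify that against the volume info.

```python
import re
import sys
from collections import Counter

# Both patterns are modelled on the lines quoted above, e.g.
# "0-engine-client-2: remote operation failed."
FAILED = re.compile(r"\s(\d+-\S+-client-\d+): remote operation failed")
DISCONNECTED = re.compile(r"\s(\d+-\S+-client-\d+): disconnected from")


def summarize(lines):
    """Count ENOTCONN lookup failures and collect disconnected clients."""
    failed = Counter()
    disconnected = set()
    for line in lines:
        m = FAILED.search(line)
        # A 'The message "..." repeated N times' summary line is counted once.
        if m and "Transport endpoint is not connected" in line:
            failed[m.group(1)] += 1
        m = DISCONNECTED.search(line)
        if m:
            disconnected.add(m.group(1))
    return failed, disconnected


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], errors="replace") as log:
        failed, disconnected = summarize(log)
    for client, count in failed.most_common():
        print(f"{client}: {count} failed operations")
    print("disconnected:", ", ".join(sorted(disconnected)) or "none")
```

If every failure is reported against the same client, that points to a single unreachable brick rather than a problem with the volume as a whole.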