[ovirt-users] vm.conf on one of the node is missing

knarra knarra at redhat.com
Thu Nov 24 14:06:37 UTC 2016


On 11/24/2016 07:27 PM, Simone Tiraboschi wrote:
>
>
> On Thu, Nov 24, 2016 at 2:39 PM, knarra <knarra at redhat.com> wrote:
>
>     On 11/24/2016 06:56 PM, Simone Tiraboschi wrote:
>>
>>
>>     On Thu, Nov 24, 2016 at 2:08 PM, knarra <knarra at redhat.com> wrote:
>>
>>         On 11/24/2016 06:15 PM, Simone Tiraboschi wrote:
>>>
>>>
>>>         On Thu, Nov 24, 2016 at 1:26 PM, knarra <knarra at redhat.com> wrote:
>>>
>>>             Hi,
>>>
>>>                 I have three nodes with glusterfs as storage domain.
>>>             For some reason I see that vm.conf from
>>>             /var/run/ovirt-hosted-engine-ha is missing, and due to
>>>             this on one of my hosts I see that Hosted Engine HA : Not
>>>             Active. Once I copy the file from some other node and
>>>             restart the ovirt-ha-broker and ovirt-ha-agent services,
>>>             everything works fine, but then this happens again. Can
>>>             someone please help me identify why this happens? Below
>>>             is the log I see in ovirt-ha-agent.log.
>>>
>>>
>>>             https://paste.fedoraproject.org/489120/79990345/
>>>
>>>
>>>         Once the engine has correctly imported the hosted-engine
>>>         storage domain, a couple of OVF_STORE volumes will appear
>>>         there. Every modification to the engine VM configuration
>>>         will be written by the engine into that OVF_STORE, so all
>>>         the ovirt-ha-agent instances running on the hosted-engine
>>>         hosts will be able to restart the engine VM with a coherent
>>>         configuration.
>>>
>>>         Until the engine imports the hosted-engine storage domain,
>>>         ovirt-ha-agent will fall back to the initial vm.conf.
>>>
>>>         In your case the OVF_STORE volume is there,
>>>         but the agent fails to extract the engine VM configuration:
>>>         MainThread::INFO::2016-11-24
>>>         17:55:04,914::ovf_store::112::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF)
>>>         Extracting Engine VM OVF from the OVF_STORE
>>>         MainThread::INFO::2016-11-24
>>>         17:55:04,919::ovf_store::119::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF)
>>>         OVF_STORE volume path:
>>>         /rhev/data-center/mnt/glusterSD/10.70.36.79:_engine/27f054c3-c245-4039-b42a-c28b37043016/images/fdf49778-9a06-49c6-bf7a-a0f12425911c/8c954add-6bcf-47f8-ac2e-4c85fc3f8699
>>>         MainThread::ERROR::2016-11-24
>>>         17:55:04,928::ovf_store::124::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF)
>>>         Unable to extract HEVM OVF
>>>
>>>         So it tries to roll back to the initial vm.conf, but that
>>>         one also seems to be missing some values, so the agent is failing:
>>>         MainThread::ERROR::2016-11-24
>>>         17:55:04,974::agent::205::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent)
>>>         Error: ''Configuration value not found:
>>>         file=/var/run/ovirt-hosted-engine-ha/vm.conf, key=memSize''
>>>         - trying to restart agent
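
The fallback vm.conf is, as far as I can tell, a plain "key=value, one
per line" file, so one way to catch this before restarting the services
is to check the copy for the keys the agent needs. A rough sketch,
assuming that format; memSize is the key from the error above, the
other entries in REQUIRED_KEYS are only illustrative:

#!/usr/bin/env python
# Rough sketch: sanity-check a vm.conf copy before restarting
# ovirt-ha-broker / ovirt-ha-agent. Assumes the "key=value, one per
# line" format; memSize comes from the agent error, the other required
# keys are illustrative and may differ between setups.
import sys

REQUIRED_KEYS = {'vmId', 'memSize', 'display'}

def missing_keys(path):
    found = set()
    with open(path) as conf:
        for line in conf:
            line = line.strip()
            if line and not line.startswith('#') and '=' in line:
                found.add(line.split('=', 1)[0])
    return REQUIRED_KEYS - found

if __name__ == '__main__':
    path = sys.argv[1] if len(sys.argv) > 1 else \
        '/var/run/ovirt-hosted-engine-ha/vm.conf'
    missing = missing_keys(path)
    if missing:
        sys.exit('%s is missing: %s' % (path, ', '.join(sorted(missing))))
    print('%s has all of the checked keys' % path)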
>>>
>>>         Both of these issues seem storage related; could you please
>>>         share your gluster logs?
>>>
>>>
>>>             Thanks
>>>
>>>             kasturi
>>>
>>>
>>         Hi Simone,
>>
>>             Below [1] is the link to the sosreports from the first two
>>         hosts. The third host has some issue; once it is up I will
>>         share the sosreport from there as well.
>>
>>
>>     And the host where you see the initial issue was the third one?
>     It is on the first host.
>
>
> It seems that host1 is failing to read from the hosted-engine
> storage domain:
>
> [2016-11-24 12:33:43.678467] W [MSGID: 114031] 
> [client-rpc-fops.c:2938:client3_3_lookup_cbk] 0-engine-client-2: 
> remote operation failed. Path: / 
> (00000000-0000-0000-0000-000000000001) [Transport endpoint is not 
> connected]
> [2016-11-24 12:33:43.678747] E [rpc-clnt.c:365:saved_frames_unwind] 
> (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x192)[0x7f077eba1642] 
> (--> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7f077e96775e] 
> (--> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f077e96786e] 
> (--> 
> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x84)[0x7f077e968fc4] 
> (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x120)[0x7f077e9698a0] ))))) 
> 0-engine-client-2: forced unwinding frame type(GlusterFS 3.3) 
> op(LOOKUP(27)) called at 2016-11-24 12:33:07.495178 (xid=0x82a1c)
> [2016-11-24 12:33:43.678982] E [rpc-clnt.c:365:saved_frames_unwind] 
> (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x192)[0x7f077eba1642] 
> (--> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7f077e96775e] 
> (--> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f077e96786e] 
> (--> 
> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x84)[0x7f077e968fc4] 
> (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x120)[0x7f077e9698a0] ))))) 
> 0-engine-client-2: forced unwinding frame type(GlusterFS 3.3) 
> op(LOOKUP(27)) called at 2016-11-24 12:33:08.770637 (xid=0x82a1d)
> [2016-11-24 12:33:43.679001] W [MSGID: 114031] 
> [client-rpc-fops.c:2938:client3_3_lookup_cbk] 0-engine-client-2: 
> remote operation failed. Path: 
> /27f054c3-c245-4039-b42a-c28b37043016/images/39960f40-4aae-4714-ba73-1637785fae7c/38fa3519-f21e-4671-8c69-d1497ff8a490 
> (1090c25b-9c90-434e-a133-faf9647cc992) [Transport endpoint is not 
> connected]
> [2016-11-24 12:33:43.679303] E [rpc-clnt.c:365:saved_frames_unwind] 
> (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x192)[0x7f077eba1642] 
> (--> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7f077e96775e] 
> (--> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f077e96786e] 
> (--> 
> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x84)[0x7f077e968fc4] 
> (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x120)[0x7f077e9698a0] ))))) 
> 0-engine-client-2: forced unwinding frame type(GlusterFS 3.3) 
> op(LOOKUP(27)) called at 2016-11-24 12:33:11.096856 (xid=0x82a1e)
> [2016-11-24 12:33:43.679596] E [rpc-clnt.c:365:saved_frames_unwind] 
> (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x192)[0x7f077eba1642] 
> (--> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7f077e96775e] 
> (--> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f077e96786e] 
> (--> 
> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x84)[0x7f077e968fc4] 
> (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x120)[0x7f077e9698a0] ))))) 
> 0-engine-client-2: forced unwinding frame type(GF-DUMP) op(NULL(2)) 
> called at 2016-11-24 12:33:13.673743 (xid=0x82a1f)
> [2016-11-24 12:33:43.682310] I [socket.c:3401:socket_submit_request] 
> 0-engine-client-2: not connected (priv->connected = 0)
> [2016-11-24 12:33:43.682328] W [rpc-clnt.c:1640:rpc_clnt_submit] 
> 0-engine-client-2: failed to submit rpc-request (XID: 0x82a20 Program: 
> GlusterFS 3.3, ProgVers: 330, Proc: 30) to rpc-transport (engine-client-2)
> [2016-11-24 12:33:43.682391] W [rpc-clnt.c:1640:rpc_clnt_submit] 
> 0-engine-client-2: failed to submit rpc-request (XID: 0x82a21 Program: 
> GlusterFS 3.3, ProgVers: 330, Proc: 27) to rpc-transport (engine-client-2)
> [2016-11-24 12:33:43.682441] W [rpc-clnt.c:1640:rpc_clnt_submit] 
> 0-engine-client-2: failed to submit rpc-request (XID: 0x82a22 Program: 
> GlusterFS 3.3, ProgVers: 330, Proc: 27) to rpc-transport (engine-client-2)
> [2016-11-24 12:33:43.682441] W [MSGID: 114031] 
> [client-rpc-fops.c:2938:client3_3_lookup_cbk] 0-engine-client-2: 
> remote operation failed. Path: /27f054c3-c245-4039-b42a-c28b37043016 
> (a64398f5-3fa3-48fe-9d40-d3860876cc2c) [Transport endpoint is not 
> connected]
> [2016-11-24 12:33:43.682492] W [rpc-clnt-ping.c:203:rpc_clnt_ping_cbk] 
> 0-engine-client-2: socket disconnected
> [2016-11-24 12:33:43.682536] I [MSGID: 114018] 
> [client.c:2280:client_rpc_notify] 0-engine-client-2: disconnected from 
> engine-client-2. Client process will keep trying to connect to 
> glusterd until brick's port is available
> [2016-11-24 12:33:43.682562] W [rpc-clnt.c:1640:rpc_clnt_submit] 
> 0-engine-client-2: failed to submit rpc-request (XID: 0x82a23 Program: 
> GlusterFS 3.3, ProgVers: 330, Proc: 27) to rpc-transport (engine-client-2)
> The message "W [MSGID: 114031] 
> [client-rpc-fops.c:2938:client3_3_lookup_cbk] 0-engine-client-2: 
> remote operation failed. Path: /27f054c3-c245-4039-b42a-c28b37043016 
> (a64398f5-3fa3-48fe-9d40-d3860876cc2c) [Transport endpoint is not 
> connected]" repeated 2 times between [2016-11-24 12:33:43.682441] and 
> [2016-11-24 12:33:43.682599]
> [2016-11-24 12:33:43.688324] W [MSGID: 114031] 
> [client-rpc-fops.c:2938:client3_3_lookup_cbk] 0-engine-client-2: 
> remote operation failed. Path: (null) 
> (00000000-0000-0000-0000-000000000000) [Transport endpoint is not 
> connected]
>
> Before that there was a lot of self-healing activity.
>
Simone, these logs indicate that the first host is not able to connect to
the brick on the third host, since that host was powered down. It reads
"remote operation failed" on engine-client-2.
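
To double check that it is really engine-client-2 (i.e. the brick on the
third host) that the mount keeps losing, you can count the disconnect
messages per client in the fuse mount log of the engine volume on host1.
A small sketch; point it at the mount log under /var/log/glusterfs/:

#!/usr/bin/env python
# Small sketch: count gluster client disconnect messages per brick
# client in a fuse mount log, to confirm which brick the mount keeps
# losing (engine-client-2 here, i.e. the third one).
import re
import sys
from collections import Counter

PATTERN = re.compile(r'(\d+-engine-client-\d+).*'
                     r'(Transport endpoint is not connected'
                     r'|disconnected from)')

def disconnect_counts(path):
    counts = Counter()
    with open(path) as log:
        for line in log:
            match = PATTERN.search(line)
            if match:
                counts[match.group(1)] += 1
    return counts

if __name__ == '__main__':
    for client, count in disconnect_counts(sys.argv[1]).most_common():
        print('%s: %d' % (client, count))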
>
>>         [1]
>>         http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/HC/vm_conf/
>>
>>         Thanks
>>
>>         kasturi
>>
>>
>
>


