<html>
<head>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<div class="moz-cite-prefix">On 11/24/2016 07:27 PM, Simone
Tiraboschi wrote:<br>
</div>
<blockquote
cite="mid:CAN8-ONrtn9nx-m7-mBxx0=4=uKXB=mjo3shAWoscmOyXDvMiZA@mail.gmail.com"
type="cite">
<div dir="ltr"><br>
<div class="gmail_extra"><br>
<div class="gmail_quote">On Thu, Nov 24, 2016 at 2:39 PM,
knarra <span dir="ltr"><<a moz-do-not-send="true"
href="mailto:knarra@redhat.com" target="_blank">knarra@redhat.com</a>></span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div bgcolor="#FFFFFF"><span class="gmail-">
<div
class="gmail-m_-2672125017527102966moz-cite-prefix">On
11/24/2016 06:56 PM, Simone Tiraboschi wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr"><br>
<div class="gmail_extra"><br>
<div class="gmail_quote">On Thu, Nov 24, 2016 at
2:08 PM, knarra <span dir="ltr"><<a
moz-do-not-send="true"
href="mailto:knarra@redhat.com"
target="_blank">knarra@redhat.com</a>></span>
wrote:<br>
<blockquote class="gmail_quote"
style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div bgcolor="#FFFFFF">
<div>
<div
class="gmail-m_-2672125017527102966h5">
<div
class="gmail-m_-2672125017527102966m_-6170493833456119351moz-cite-prefix">On
11/24/2016 06:15 PM, Simone
Tiraboschi wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr"><br>
<div class="gmail_extra"><br>
<div class="gmail_quote">On Thu,
Nov 24, 2016 at 1:26 PM,
knarra <span dir="ltr"><<a
moz-do-not-send="true"
href="mailto:knarra@redhat.com"
target="_blank">knarra@redhat.com</a>></span>
wrote:<br>
<blockquote
class="gmail_quote"
style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">Hi,<br>
<br>
I have three nodes with
GlusterFS as the storage domain.
For some reason, vm.conf is
missing from
/var/run/ovirt-hosted-engine-ha,
and because of this one of my
hosts shows "Hosted Engine HA:
Not Active". If I copy the file
from another node and restart
the ovirt-ha-broker and
ovirt-ha-agent services,
everything works fine, but then
the problem happens again. Can
someone please help me identify
why this happens? Below is the
log I see in
ovirt-ha-agent.log.<br>
<br>
<br>
<a moz-do-not-send="true"
href="https://paste.fedoraproject.org/489120/79990345/"
rel="noreferrer"
target="_blank">https://paste.fedoraproject.or<wbr>g/489120/79990345/</a><br>
<br>
</blockquote>
<div><br>
</div>
Once the engine has correctly
imported the hosted-engine
storage domain, a couple of
OVF_STORE volumes will appear
there.<br>
The engine writes every
modification to the engine VM
configuration into that
OVF_STORE, so every
ovirt-ha-agent running on the
hosted-engine hosts can restart
the engine VM with a consistent
configuration.<br>
<br>
Until the engine imports the
hosted-engine storage domain,
ovirt-ha-agent falls back
to the initial vm.conf.<br>
<br>
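<p>As a rough illustration of the fallback order described above (OVF_STORE first, the initial vm.conf only as a fallback), here is a minimal Python sketch. It is a simplified model under stated assumptions, not the ovirt-hosted-engine-ha implementation: the function name <code>pick_engine_vm_config</code> and the two reader callables are hypothetical stand-ins.</p>
<pre>
```python
from typing import Callable, Optional


# Hypothetical model of the config-source fallback; none of these names
# come from ovirt-hosted-engine-ha. It only illustrates the ordering:
# OVF_STORE first, initial vm.conf as the fallback.
def pick_engine_vm_config(
    read_ovf_store: Callable[[], Optional[str]],
    read_initial_vm_conf: Callable[[], Optional[str]],
) -> tuple[str, str]:
    """Return (source, config), preferring the OVF_STORE copy."""
    ovf = read_ovf_store()  # None models "Unable to extract HEVM OVF"
    if ovf is not None:
        return "OVF_STORE", ovf
    conf = read_initial_vm_conf()  # None models a missing vm.conf
    if conf is None:
        raise RuntimeError("no usable engine VM configuration")
    return "vm.conf", conf
```
</pre>
<p>For example, when the OVF_STORE reader returns None, the sketch returns the vm.conf contents tagged with the source "vm.conf"; when both readers fail it raises, which loosely corresponds to the agent error and restart seen in this thread.</p>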
In your case the OVF_STORE
volume is there,<br>
but the agent fails to extract
the engine VM configuration:<br>
MainThread::INFO::2016-11-24
17:55:04,914::ovf_store::112::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF)
Extracting Engine VM OVF from
the OVF_STORE<br>
MainThread::INFO::2016-11-24
17:55:04,919::ovf_store::119::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF)
OVF_STORE volume path:
/rhev/data-center/mnt/glusterSD/10.70.36.79:_engine/27f054c3-c245-4039-b42a-c28b37043016/images/fdf49778-9a06-49c6-bf7a-a0f12425911c/8c954add-6bcf-47f8-ac2e-4c85fc3f8699<br>
MainThread::ERROR::2016-11-24
17:55:04,928::ovf_store::124::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF)
Unable to extract HEVM OVF<br>
<br>
So it tries to fall back to the
initial vm.conf, but that one
also seems to be missing some
values, so the agent fails:<br>
MainThread::ERROR::2016-11-24
17:55:04,974::agent::205::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent)
Error: ''Configuration value
not found:
file=/var/run/ovirt-hosted-engine-ha/vm.conf,
key=memSize'' - trying to
restart agent<br>
<br>
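<p>Because the fallback vm.conf was either missing entirely or lacking memSize when the agent read it, a quick check of the file can tell which case a host is in. A minimal Python sketch, assuming vm.conf holds one key=value pair per line (the log's key=memSize suggests this). The function names and the REQUIRED_KEYS set (vmId, vmName, memSize) are illustrative assumptions; only memSize is confirmed by the log above.</p>
<pre>
```python
from pathlib import Path

# Illustrative assumption: keys the fallback vm.conf is expected to carry.
# Only memSize is confirmed by the agent log in this thread.
REQUIRED_KEYS = frozenset({"vmId", "vmName", "memSize"})


def parse_vm_conf(text: str) -> dict[str, str]:
    """Parse key=value lines, skipping blank lines, '#' comments and junk."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_keys(text: str, required=REQUIRED_KEYS) -> list[str]:
    """Return the required keys absent from the text, sorted for stable output."""
    present = parse_vm_conf(text)
    return sorted(key for key in required if key not in present)


def check_vm_conf(path: str, required=REQUIRED_KEYS) -> list[str]:
    """A missing file (the first symptom in this thread) lacks every key."""
    conf = Path(path)
    if not conf.is_file():
        return sorted(required)
    return missing_keys(conf.read_text(), required)


if __name__ == "__main__":
    # An empty list means every expected key is present.
    print(check_vm_conf("/var/run/ovirt-hosted-engine-ha/vm.conf"))
```
</pre>
<p>For instance, a file containing only vmId and vmName lines would report ['memSize'], matching the agent error quoted above.</p>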
Both issues seem
storage-related; could you
please share your Gluster
logs?<br>
<br>
<blockquote
class="gmail_quote"
style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex"> <br>
Thanks<br>
<br>
kasturi<br>
<br>
</blockquote>
</div>
<br>
</div>
</div>
</blockquote>
</div>
</div>
<p>Hi Simone,</p>
<p> Below [1] is the link to the
sosreports from the first two hosts. The
third host has an issue; once it is up,
I will share its sosreport as
well.</p>
</div>
</blockquote>
<div><br>
</div>
<div>And was the host where you saw the initial
issue the third one? <br>
</div>
</div>
</div>
</div>
</blockquote>
</span> It is on the first host.<span class="gmail-"><br>
<blockquote type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<div> </div>
</div>
</div>
</div>
</blockquote>
</span></div>
</blockquote>
<div><br>
</div>
<div>It seems that host1 is failing to read from the
hosted-engine storage domain:</div>
<div><br>
</div>
<div>
<div>[2016-11-24 12:33:43.678467] W [MSGID: 114031]
[client-rpc-fops.c:2938:client3_3_lookup_cbk]
0-engine-client-2: remote operation failed. Path: /
(00000000-0000-0000-0000-000000000001) [Transport
endpoint is not connected]</div>
<div>[2016-11-24 12:33:43.678747] E
[rpc-clnt.c:365:saved_frames_unwind] (-->
/lib64/libglusterfs.so.0(_gf_log_callingfn+0x192)[0x7f077eba1642]
(-->
/lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7f077e96775e]
(-->
/lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f077e96786e]
(-->
/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x84)[0x7f077e968fc4]
(-->
/lib64/libgfrpc.so.0(rpc_clnt_notify+0x120)[0x7f077e9698a0]
))))) 0-engine-client-2: forced unwinding frame
type(GlusterFS 3.3) op(LOOKUP(27)) called at 2016-11-24
12:33:07.495178 (xid=0x82a1c)</div>
<div>[2016-11-24 12:33:43.678982] E
[rpc-clnt.c:365:saved_frames_unwind] (-->
/lib64/libglusterfs.so.0(_gf_log_callingfn+0x192)[0x7f077eba1642]
(-->
/lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7f077e96775e]
(-->
/lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f077e96786e]
(-->
/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x84)[0x7f077e968fc4]
(-->
/lib64/libgfrpc.so.0(rpc_clnt_notify+0x120)[0x7f077e9698a0]
))))) 0-engine-client-2: forced unwinding frame
type(GlusterFS 3.3) op(LOOKUP(27)) called at 2016-11-24
12:33:08.770637 (xid=0x82a1d)</div>
<div>[2016-11-24 12:33:43.679001] W [MSGID: 114031]
[client-rpc-fops.c:2938:client3_3_lookup_cbk]
0-engine-client-2: remote operation failed. Path:
/27f054c3-c245-4039-b42a-c28b37043016/images/39960f40-4aae-4714-ba73-1637785fae7c/38fa3519-f21e-4671-8c69-d1497ff8a490
(1090c25b-9c90-434e-a133-faf9647cc992) [Transport
endpoint is not connected]</div>
<div>[2016-11-24 12:33:43.679303] E
[rpc-clnt.c:365:saved_frames_unwind] (-->
/lib64/libglusterfs.so.0(_gf_log_callingfn+0x192)[0x7f077eba1642]
(-->
/lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7f077e96775e]
(-->
/lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f077e96786e]
(-->
/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x84)[0x7f077e968fc4]
(-->
/lib64/libgfrpc.so.0(rpc_clnt_notify+0x120)[0x7f077e9698a0]
))))) 0-engine-client-2: forced unwinding frame
type(GlusterFS 3.3) op(LOOKUP(27)) called at 2016-11-24
12:33:11.096856 (xid=0x82a1e)</div>
<div>[2016-11-24 12:33:43.679596] E
[rpc-clnt.c:365:saved_frames_unwind] (-->
/lib64/libglusterfs.so.0(_gf_log_callingfn+0x192)[0x7f077eba1642]
(-->
/lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7f077e96775e]
(-->
/lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f077e96786e]
(-->
/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x84)[0x7f077e968fc4]
(-->
/lib64/libgfrpc.so.0(rpc_clnt_notify+0x120)[0x7f077e9698a0]
))))) 0-engine-client-2: forced unwinding frame
type(GF-DUMP) op(NULL(2)) called at 2016-11-24
12:33:13.673743 (xid=0x82a1f)</div>
<div>[2016-11-24 12:33:43.682310] I
[socket.c:3401:socket_submit_request] 0-engine-client-2:
not connected (priv->connected = 0)</div>
<div>[2016-11-24 12:33:43.682328] W
[rpc-clnt.c:1640:rpc_clnt_submit] 0-engine-client-2:
failed to submit rpc-request (XID: 0x82a20 Program:
GlusterFS 3.3, ProgVers: 330, Proc: 30) to rpc-transport
(engine-client-2)</div>
<div>[2016-11-24 12:33:43.682391] W
[rpc-clnt.c:1640:rpc_clnt_submit] 0-engine-client-2:
failed to submit rpc-request (XID: 0x82a21 Program:
GlusterFS 3.3, ProgVers: 330, Proc: 27) to rpc-transport
(engine-client-2)</div>
<div>[2016-11-24 12:33:43.682441] W
[rpc-clnt.c:1640:rpc_clnt_submit] 0-engine-client-2:
failed to submit rpc-request (XID: 0x82a22 Program:
GlusterFS 3.3, ProgVers: 330, Proc: 27) to rpc-transport
(engine-client-2)</div>
<div>[2016-11-24 12:33:43.682441] W [MSGID: 114031]
[client-rpc-fops.c:2938:client3_3_lookup_cbk]
0-engine-client-2: remote operation failed. Path:
/27f054c3-c245-4039-b42a-c28b37043016
(a64398f5-3fa3-48fe-9d40-d3860876cc2c) [Transport
endpoint is not connected]</div>
<div>[2016-11-24 12:33:43.682492] W
[rpc-clnt-ping.c:203:rpc_clnt_ping_cbk]
0-engine-client-2: socket disconnected</div>
<div>[2016-11-24 12:33:43.682536] I [MSGID: 114018]
[client.c:2280:client_rpc_notify] 0-engine-client-2:
disconnected from engine-client-2. Client process will
keep trying to connect to glusterd until brick's port is
available</div>
<div>[2016-11-24 12:33:43.682562] W
[rpc-clnt.c:1640:rpc_clnt_submit] 0-engine-client-2:
failed to submit rpc-request (XID: 0x82a23 Program:
GlusterFS 3.3, ProgVers: 330, Proc: 27) to rpc-transport
(engine-client-2)</div>
<div>The message "W [MSGID: 114031]
[client-rpc-fops.c:2938:client3_3_lookup_cbk]
0-engine-client-2: remote operation failed. Path:
/27f054c3-c245-4039-b42a-c28b37043016
(a64398f5-3fa3-48fe-9d40-d3860876cc2c) [Transport
endpoint is not connected]" repeated 2 times between
[2016-11-24 12:33:43.682441] and [2016-11-24
12:33:43.682599]</div>
<div>[2016-11-24 12:33:43.688324] W [MSGID: 114031]
[client-rpc-fops.c:2938:client3_3_lookup_cbk]
0-engine-client-2: remote operation failed. Path: (null)
(00000000-0000-0000-0000-000000000000) [Transport
endpoint is not connected]</div>
</div>
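<p>To see which brick connection is failing without reading every line, one can tally the "Transport endpoint is not connected" messages per client translator. A minimal Python sketch under two assumptions: each log entry sits on a single line (as in the raw log file, unlike the wrapped excerpt above), and in a "repeated N times" summary, N counts repeats beyond the first entry, which was already logged on its own. GlusterFS normally names client translators after the volume and numbers them by brick order starting at 0, so engine-client-2 would be the third brick of the engine volume.</p>
<pre>
```python
import re
from collections import Counter
from typing import Iterable

# Client translator name, e.g. "engine-client-2" in "0-engine-client-2: ...".
CLIENT_RE = re.compile(r"\b\d+-([\w.-]+-client-\d+):")
# GlusterFS log suppression summary: 'The message "..." repeated N times ...'.
REPEAT_RE = re.compile(r"repeated (\d+) times")
DISCONNECT_MARK = "Transport endpoint is not connected"


def count_disconnects(lines: Iterable[str]) -> Counter:
    """Count 'Transport endpoint is not connected' messages per client translator."""
    counts: Counter = Counter()
    for line in lines:
        if DISCONNECT_MARK not in line:
            continue
        match = CLIENT_RE.search(line)
        if match is None:
            continue
        # Assumption: a repeat summary adds N occurrences on top of the first
        # entry, which was already counted from its own line.
        repeat = REPEAT_RE.search(line)
        counts[match.group(1)] += int(repeat.group(1)) if repeat else 1
    return counts
```
</pre>
<p>Usage is a matter of passing an open log file, for example the engine volume's FUSE mount log on host1 under /var/log/glusterfs/ (the exact file name depends on the mount point); a count concentrated on a single client translator points at one unreachable brick rather than a general storage fault.</p>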
<div><br>
</div>
<div>Before that, there was a lot of self-heal activity.</div>
<div><br>
</div>
</div>
</div>
</div>
</blockquote>
Simone, these logs indicate that the first host is not able to connect
to the brick on the third host because that host was powered down. The
errors read "remote operation failed" on engine-client-2.<br>
<blockquote
cite="mid:CAN8-ONrtn9nx-m7-mBxx0=4=uKXB=mjo3shAWoscmOyXDvMiZA@mail.gmail.com"
type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<div><br>
</div>
<div> </div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div bgcolor="#FFFFFF"><span class="gmail-">
<blockquote type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<blockquote class="gmail_quote"
style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div bgcolor="#FFFFFF">
<p>[1] <a moz-do-not-send="true"
class="gmail-m_-2672125017527102966m_-6170493833456119351moz-txt-link-freetext"
href="http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/HC/vm_conf/"
target="_blank">http://rhsqe-repo.lab.eng.blr.<wbr>redhat.com/sosreports/HC/vm_co<wbr>nf/</a></p>
<p>Thanks</p>
<p>kasturi<br>
</p>
</div>
</blockquote>
</div>
<br>
</div>
</div>
</blockquote>
<p><br>
</p>
</span></div>
</blockquote>
</div>
<br>
</div>
</div>
</blockquote>
<p><br>
</p>
</body>
</html>