On 11/25/2016 03:35 AM, Simone Tiraboschi wrote:
On Thu, Nov 24, 2016 at 3:35 PM, Simone Tiraboschi <stirabos@redhat.com> wrote:
On Thu, Nov 24, 2016 at 3:23 PM, knarra <knarra@redhat.com> wrote:
On 11/24/2016 07:47 PM, Simone Tiraboschi wrote:
>
>
> On Thu, Nov 24, 2016 at 3:06 PM, knarra <knarra@redhat.com> wrote:
>
> On 11/24/2016 07:27 PM, Simone Tiraboschi wrote:
>>
>>
>> On Thu, Nov 24, 2016 at 2:39 PM, knarra <knarra@redhat.com> wrote:
>>
>> On 11/24/2016 06:56 PM, Simone Tiraboschi wrote:
>>>
>>>
>>> On Thu, Nov 24, 2016 at 2:08 PM, knarra <knarra@redhat.com> wrote:
>>>
>>> On 11/24/2016 06:15 PM, Simone Tiraboschi wrote:
>>>>
>>>>
>>>> On Thu, Nov 24, 2016 at 1:26 PM, knarra <knarra@redhat.com> wrote:
>>>>
>>>> Hi,
>>>>
>>>> I have three nodes with glusterfs as the storage domain. For some
>>>> reason I see that vm.conf from /var/run/ovirt-hosted-engine-ha is
>>>> missing, and because of this one of my hosts shows Hosted Engine
>>>> HA : Not Active. Once I copy the file from another node and
>>>> restart the ovirt-ha-broker and ovirt-ha-agent services,
>>>> everything works fine. But then this happens again. Can someone
>>>> please help me identify why this happens? Below is the log I see
>>>> in ovirt-ha-agent.logs:
>>>>
>>>> https://paste.fedoraproject.org/489120/79990345/
>>>>
>>>> Once the engine has correctly imported the hosted-engine storage
>>>> domain, a couple of OVF_STORE volumes will appear there.
>>>> Every modification to the engine VM configuration will be written
>>>> by the engine into that OVF_STORE, so all the ovirt-ha-agent
>>>> instances running on the hosted-engine hosts will be able to
>>>> restart the engine VM with a coherent configuration.
>>>>
>>>> Until the engine imports the hosted-engine storage domain,
>>>> ovirt-ha-agent will fall back to the initial vm.conf.
>>>>
>>>> In your case the OVF_STORE volume is there, but the agent fails
>>>> extracting the engine VM configuration:
>>>> MainThread::INFO::2016-11-24 17:55:04,914::ovf_store::112::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF) Extracting Engine VM OVF from the OVF_STORE
>>>> MainThread::INFO::2016-11-24 17:55:04,919::ovf_store::119::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF) OVF_STORE volume path: /rhev/data-center/mnt/glusterSD/10.70.36.79:_engine/27f054c3-c245-4039-b42a-c28b37043016/images/fdf49778-9a06-49c6-bf7a-a0f12425911c/8c954add-6bcf-47f8-ac2e-4c85fc3f8699
>>>> MainThread::ERROR::2016-11-24 17:55:04,928::ovf_store::124::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF) Unable to extract HEVM OVF
>>>>
>>>> So it tries to roll back to the initial vm.conf, but that one
>>>> also seems to be missing some values, so the agent fails:
>>>> MainThread::ERROR::2016-11-24 17:55:04,974::agent::205::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Error: ''Configuration value not found: file=/var/run/ovirt-hosted-engine-ha/vm.conf, key=memSize'' - trying to restart agent
>>>>
>>>> Both of these issues seem storage related; could you please share
>>>> your gluster logs?
>>>>
>>>>
>>>> Thanks
>>>>
>>>> kasturi
>>>>
>>>>
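A quick check at this point (the file and key names come straight from
the agent error quoted above) is whether the fallback vm.conf actually
carries the value the agent is asking for:

    grep memSize /var/run/ovirt-hosted-engine-ha/vm.conf
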
>>> Hi Simone,
>>>
>>> Below [1] is the link to the sosreports from the first two hosts.
>>> The third host has some issue; once it is up, I will share the
>>> sosreport from there as well.
>>>
>>>
>>> And the host where you see the initial issue was
>>> the third one?
>> It is on the first host.
>>
>>
>> It seems that host1 is failing to read from the hosted-engine
>> storage domain:
>>
>> [2016-11-24 12:33:43.678467] W [MSGID: 114031] [client-rpc-fops.c:2938:client3_3_lookup_cbk] 0-engine-client-2: remote operation failed. Path: / (00000000-0000-0000-0000-000000000001) [Transport endpoint is not connected]
>> [2016-11-24 12:33:43.678747] E [rpc-clnt.c:365:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x192)[0x7f077eba1642] (--> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7f077e96775e] (--> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f077e96786e] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x84)[0x7f077e968fc4] (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x120)[0x7f077e9698a0] ))))) 0-engine-client-2: forced unwinding frame type(GlusterFS 3.3) op(LOOKUP(27)) called at 2016-11-24 12:33:07.495178 (xid=0x82a1c)
>> [2016-11-24 12:33:43.678982] E [rpc-clnt.c:365:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x192)[0x7f077eba1642] (--> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7f077e96775e] (--> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f077e96786e] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x84)[0x7f077e968fc4] (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x120)[0x7f077e9698a0] ))))) 0-engine-client-2: forced unwinding frame type(GlusterFS 3.3) op(LOOKUP(27)) called at 2016-11-24 12:33:08.770637 (xid=0x82a1d)
>> [2016-11-24 12:33:43.679001] W [MSGID: 114031] [client-rpc-fops.c:2938:client3_3_lookup_cbk] 0-engine-client-2: remote operation failed. Path: /27f054c3-c245-4039-b42a-c28b37043016/images/39960f40-4aae-4714-ba73-1637785fae7c/38fa3519-f21e-4671-8c69-d1497ff8a490 (1090c25b-9c90-434e-a133-faf9647cc992) [Transport endpoint is not connected]
>> [2016-11-24 12:33:43.679303] E [rpc-clnt.c:365:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x192)[0x7f077eba1642] (--> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7f077e96775e] (--> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f077e96786e] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x84)[0x7f077e968fc4] (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x120)[0x7f077e9698a0] ))))) 0-engine-client-2: forced unwinding frame type(GlusterFS 3.3) op(LOOKUP(27)) called at 2016-11-24 12:33:11.096856 (xid=0x82a1e)
>> [2016-11-24 12:33:43.679596] E [rpc-clnt.c:365:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x192)[0x7f077eba1642] (--> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7f077e96775e] (--> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f077e96786e] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x84)[0x7f077e968fc4] (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x120)[0x7f077e9698a0] ))))) 0-engine-client-2: forced unwinding frame type(GF-DUMP) op(NULL(2)) called at 2016-11-24 12:33:13.673743 (xid=0x82a1f)
>> [2016-11-24 12:33:43.682310] I [socket.c:3401:socket_submit_request] 0-engine-client-2: not connected (priv->connected = 0)
>> [2016-11-24 12:33:43.682328] W [rpc-clnt.c:1640:rpc_clnt_submit] 0-engine-client-2: failed to submit rpc-request (XID: 0x82a20 Program: GlusterFS 3.3, ProgVers: 330, Proc: 30) to rpc-transport (engine-client-2)
>> [2016-11-24 12:33:43.682391] W [rpc-clnt.c:1640:rpc_clnt_submit] 0-engine-client-2: failed to submit rpc-request (XID: 0x82a21 Program: GlusterFS 3.3, ProgVers: 330, Proc: 27) to rpc-transport (engine-client-2)
>> [2016-11-24 12:33:43.682441] W [rpc-clnt.c:1640:rpc_clnt_submit] 0-engine-client-2: failed to submit rpc-request (XID: 0x82a22 Program: GlusterFS 3.3, ProgVers: 330, Proc: 27) to rpc-transport (engine-client-2)
>> [2016-11-24 12:33:43.682441] W [MSGID: 114031] [client-rpc-fops.c:2938:client3_3_lookup_cbk] 0-engine-client-2: remote operation failed. Path: /27f054c3-c245-4039-b42a-c28b37043016 (a64398f5-3fa3-48fe-9d40-d3860876cc2c) [Transport endpoint is not connected]
>> [2016-11-24 12:33:43.682492] W [rpc-clnt-ping.c:203:rpc_clnt_ping_cbk] 0-engine-client-2: socket disconnected
>> [2016-11-24 12:33:43.682536] I [MSGID: 114018] [client.c:2280:client_rpc_notify] 0-engine-client-2: disconnected from engine-client-2. Client process will keep trying to connect to glusterd until brick's port is available
>> [2016-11-24 12:33:43.682562] W [rpc-clnt.c:1640:rpc_clnt_submit] 0-engine-client-2: failed to submit rpc-request (XID: 0x82a23 Program: GlusterFS 3.3, ProgVers: 330, Proc: 27) to rpc-transport (engine-client-2)
>> The message "W [MSGID: 114031] [client-rpc-fops.c:2938:client3_3_lookup_cbk] 0-engine-client-2: remote operation failed. Path: /27f054c3-c245-4039-b42a-c28b37043016 (a64398f5-3fa3-48fe-9d40-d3860876cc2c) [Transport endpoint is not connected]" repeated 2 times between [2016-11-24 12:33:43.682441] and [2016-11-24 12:33:43.682599]
>> [2016-11-24 12:33:43.688324] W [MSGID: 114031] [client-rpc-fops.c:2938:client3_3_lookup_cbk] 0-engine-client-2: remote operation failed. Path: (null) (00000000-0000-0000-0000-000000000000) [Transport endpoint is not connected]
>>
>> Before that, there was a lot of self-healing activity.
>>
> Simone, these logs indicate that the first host is not able to
> connect to the brick on the third host, since it was powered down.
> It reads "remote operation failed" on engine-client-2.
>
>
> Can you please share the output of
>
>     source /etc/ovirt-hosted-engine/hosted-engine.conf
>     find /rhev/data-center/ -path "*/${sdUUID}/images/${conf_image_UUID}/${conf_volume_UUID}" -type f \
>         -exec sh -c 'sudo -u vdsm dd if=$1 2>/dev/null | tar -xOvf - vm.conf 2>/dev/null' {} {} \;
>
> executed on your first host?
Output of the above command:
https://paste.fedoraproject.org/489159/97176147/
This one looks fine; now the point is why the OVF_STORE doesn't seem
to be. Could you please try executing:

    sudo -u vdsm dd if=/rhev/data-center/mnt/glusterSD/10.70.36.79:_engine/27f054c3-c245-4039-b42a-c28b37043016/images/fdf49778-9a06-49c6-bf7a-a0f12425911c/8c954add-6bcf-47f8-ac2e-4c85fc3f8699 | tar -xOvf - 21e0e248-19bf-47b3-b72f-6a3740d9ff43.ovf
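For context: the conf and OVF_STORE volumes are plain tar archives
sitting on the shared storage, which is why a raw dd piped into tar
can pull out a single entry such as the .ovf above.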
OK, found it: it's definitely a regression introduced (just on master)
with https://gerrit.ovirt.org/#/c/66103/

The code that reads and writes the HE configuration and the OVF_STORE
from/to the volume on shared storage (with dd) is shared between
ovirt-hosted-engine-setup (which runs as root) and ovirt-ha-agent
(which runs as the vdsm user).
In order to behave correctly (against root squash on NFS, for
instance), it always ran 'sudo -u vdsm dd ...' instead of plain 'dd',
regardless of the active user.
But since https://gerrit.ovirt.org/#/c/66103/ , the vdsm user can no
longer run 'sudo -u vdsm dd ...', hence this issue.
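
As a rough illustration of the pattern described above (a minimal
sketch only, not the actual ovirt-ha-agent code; the function and
variable names here are made up):

    # Always prefixing dd with 'sudo -u vdsm' works when the caller is
    # root (ovirt-hosted-engine-setup), but breaks for ovirt-ha-agent
    # once the vdsm user is no longer allowed to sudo, even to itself.
    read_conf_volume() {
        vol="$1"     # path to the conf/OVF_STORE volume on shared storage
        entry="$2"   # tar entry to extract, e.g. vm.conf
        if [ "$(id -un)" = "vdsm" ]; then
            # already vdsm: run dd directly, no sudo needed
            dd if="$vol" 2>/dev/null | tar -xOf - "$entry"
        else
            # running as root: drop to vdsm so root squash can't bite
            sudo -u vdsm dd if="$vol" 2>/dev/null | tar -xOf - "$entry"
        fi
    }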
I opened a bug:
https://bugzilla.redhat.com/show_bug.cgi?id=1398443

Is it the same reason why the vm.conf file is missing from the
/var/run/ovirt-hosted-engine-ha folder?
Kasturi, you got it just on one of your hosts since the other two run
an older VDSM build.
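
To confirm which hosts carry the regressed build, one can simply
compare the package builds on each host:

    rpm -q vdsm ovirt-hosted-engine-ha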
Simone, I ran the following steps and I see that the vm.conf file is
missing.

1) Installed three nodes with glusterfs as the storage domain.

2) Configured power management on all the nodes and ran the command
'poweroff -f' on one of the nodes.

3) The engine tried to fence the host and logged the message 'Host is
rebooting', but the host never came up.

4) After some time I manually powered the host on, and once the host
was up I saw Hosted Engine HA : Not Active.

5) I see the following error message in broker.log, and the vm.conf
file is missing.
Thread-1::ERROR::2016-11-24 19:32:11,651::listener::192::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle) Error handling request, data: 'set-storage-domain FilesystemBackend dom_type=glusterfs sd_uuid=27f054c3-c245-4039-b42a-c28b37043016'
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 166, in handle
    data)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 299, in _dispatch
    .set_storage_domain(client, sd_type, **options)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 66, in set_storage_domain
    self._backends[client].connect()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/storage_backends.py", line 461, in connect
    self._dom_type)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/storage_backends.py", line 108, in get_domain_path
    " in {1}".format(sd_uuid, parent))
BackendFailureException: path to storage domain 27f054c3-c245-4039-b42a-c28b37043016 not found in /rhev/data-center/mnt/glusterSD
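
A couple of quick checks for this failure mode (the parent path comes
from the exception above; the server/volume directory name is taken
from the earlier logs):

    ls /rhev/data-center/mnt/glusterSD/
    ls /rhev/data-center/mnt/glusterSD/10.70.36.79:_engine/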
Thanks
kasturi
>
>
>>
>>> [1] http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/HC/vm_conf/
>>>
>>> Thanks
>>>
>>> kasturi
>>>
>>>
>>
>>
>
>