<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Thu, Nov 24, 2016 at 3:06 PM, knarra <span dir="ltr">&lt;<a href="mailto:knarra@redhat.com" target="_blank">knarra@redhat.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
  
    
  
  <div bgcolor="#FFFFFF"><div><div class="gmail-h5">
    <div class="gmail-m_-1038946535864355601moz-cite-prefix">On 11/24/2016 07:27 PM, Simone
      Tiraboschi wrote:<br>
    </div>
    <blockquote type="cite">
      <div dir="ltr"><br>
        <div class="gmail_extra"><br>
          <div class="gmail_quote">On Thu, Nov 24, 2016 at 2:39 PM,
            knarra <span dir="ltr">&lt;<a href="mailto:knarra@redhat.com" target="_blank">knarra@redhat.com</a>&gt;</span>
            wrote:<br>
            <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
              <div bgcolor="#FFFFFF"><span class="gmail-m_-1038946535864355601gmail-">
                  <div class="gmail-m_-1038946535864355601gmail-m_-2672125017527102966moz-cite-prefix">On
                    11/24/2016 06:56 PM, Simone Tiraboschi wrote:<br>
                  </div>
                  <blockquote type="cite">
                    <div dir="ltr"><br>
                      <div class="gmail_extra"><br>
                        <div class="gmail_quote">On Thu, Nov 24, 2016 at
                          2:08 PM, knarra <span dir="ltr">&lt;<a href="mailto:knarra@redhat.com" target="_blank">knarra@redhat.com</a>&gt;</span>
                          wrote:<br>
                          <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
                            <div bgcolor="#FFFFFF">
                              <div>
                                <div class="gmail-m_-1038946535864355601gmail-m_-2672125017527102966h5">
                                  <div class="gmail-m_-1038946535864355601gmail-m_-2672125017527102966m_-6170493833456119351moz-cite-prefix">On
                                    11/24/2016 06:15 PM, Simone
                                    Tiraboschi wrote:<br>
                                  </div>
                                  <blockquote type="cite">
                                    <div dir="ltr"><br>
                                      <div class="gmail_extra"><br>
                                        <div class="gmail_quote">On Thu,
                                          Nov 24, 2016 at 1:26 PM,
                                          knarra <span dir="ltr">&lt;<a href="mailto:knarra@redhat.com" target="_blank">knarra@redhat.com</a>&gt;</span>
                                          wrote:<br>
                                          <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Hi,<br>
                                            <br>
                                                I have three nodes with
                                            glusterfs as the storage domain.
                                            For some reason I see that
                                            vm.conf is missing from
                                            /var/run/ovirt-hosted-engine-h<wbr>a,
                                            and because of this one of
                                            my hosts shows
                                            Hosted Engine HA: Not
                                            Active. Once I copy the file
                                            from another node and
                                            restart the ovirt-ha-broker and
                                            ovirt-ha-agent services,
                                            everything works fine. But
                                            then this happens again. Can
                                            someone please help me identify
                                            why this happens? Below is
                                            the log I see in the
                                            ovirt-ha-agent logs.<br>
                                            <br>
                                            <br>
                                            <a href="https://paste.fedoraproject.org/489120/79990345/" rel="noreferrer" target="_blank">https://paste.fedoraproject.or<wbr>g/489120/79990345/</a><br>
                                            <br>
                                          </blockquote>
                                          <div><br>
                                          </div>
                                          Once the engine has correctly
                                          imported the hosted-engine
                                          storage domain, a couple of
                                          OVF_STORE volumes will appear
                                          there.<br>
                                          Every modification to the
                                          engine VM configuration will
                                          be written by the engine into
                                          that OVF_STORE, so every
                                          ovirt-ha-agent running on the
                                          hosted-engine hosts will be
                                          able to restart the engine VM
                                          with a coherent configuration.<br>
                                          <br>
                                          Until the engine imports the
                                          hosted-engine storage domain,
                                          ovirt-ha-agent will fall back
                                          to the initial vm.conf.<br>
                                          <br>
                                          In your case the OVF_STORE
                                          volume is there,<br>
                                          but the agent fails to extract
                                          the engine VM configuration:<br>
                                          MainThread::INFO::2016-11-24
                                          17:55:04,914::ovf_store::112::<wbr>ovirt_hosted_engine_ha.lib.ovf<wbr>.ovf_store.OVFStore::(getEngin<wbr>eVMOVF)
                                          Extracting Engine VM OVF from
                                          the OVF_STORE<br>
                                          MainThread::INFO::2016-11-24
                                          17:55:04,919::ovf_store::119::<wbr>ovirt_hosted_engine_ha.lib.ovf<wbr>.ovf_store.OVFStore::(getEngin<wbr>eVMOVF)
                                          OVF_STORE volume path:
                                          /rhev/data-center/mnt/glusterS<wbr>D/10.70.36.79:_engine/27f054c3<wbr>-c245-4039-b42a-c28b37043016/i<wbr>mages/fdf49778-9a06-49c6-bf7a-<wbr>a0f12425911c/8c954add-6bcf-<wbr>47f8-ac2e-4c85fc3f8699<br>
                                          MainThread::ERROR::2016-11-24
                                          17:55:04,928::ovf_store::124::<wbr>ovirt_hosted_engine_ha.lib.ovf<wbr>.ovf_store.OVFStore::(getEngin<wbr>eVMOVF)
                                          Unable to extract HEVM OVF<br>
                                          <br>
                                          So it tries to fall back to the
                                          initial vm.conf, but that file
                                          also seems to be missing some values,
                                          so the agent fails:<br>
                                          MainThread::ERROR::2016-11-24
                                          17:55:04,974::agent::205::ovir<wbr>t_hosted_engine_ha.agent.agent<wbr>.Agent::(_run_agent)
                                          Error: &#39;&#39;Configuration value
                                          not found:
                                          file=/var/run/ovirt-hosted-eng<wbr>ine-ha/vm.conf,
                                          key=memSize&#39;&#39; - trying to
                                          restart agent<br>
                                          <br>
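                                          <div>To confirm which values are missing, a small check like the one below may help. It is only a sketch: it assumes vm.conf is a plain key=value text file, the function names are made up for illustration, and memSize is the only required key taken from the error above (add others as needed).</div>
<pre>
```python
def parse_vm_conf(text):
    """Parse key=value lines into a dict, skipping blanks and comments."""
    conf = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first "=" only, so values may contain "=" themselves.
        key, value = line.split("=", 1)
        conf[key.strip()] = value.strip()
    return conf


def missing_keys(text, required=("memSize",)):
    """Return the required keys that are absent from the given vm.conf text."""
    conf = parse_vm_conf(text)
    return [key for key in required if key not in conf]
```
</pre>
                                          <div>Feeding it the content of /var/run/ovirt-hosted-engine-ha/vm.conf on the failing host should return an empty list if the file is complete.</div>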
                                          Both issues seem
                                          storage related. Could you
                                          please share your gluster
                                          logs?<br>
                                          <br>
                                           
                                          <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"> <br>
                                            Thanks<br>
                                            <br>
                                            kasturi<br>
                                            <br>
                                          </blockquote>
                                        </div>
                                        <br>
                                      </div>
                                    </div>
                                  </blockquote>
                                </div>
                              </div>
                              <p>Hi Simone,</p>
                              <p>Below [1] is the link for the
                                sosreports from the first two hosts. The
                                third host has some issue. Once it is up,
                                I will provide the sosreport from there as
                                well.</p>
                            </div>
                          </blockquote>
                          <div><br>
                          </div>
                          <div>And was the host where you saw the initial
                            issue the third one? <br>
                          </div>
                        </div>
                      </div>
                    </div>
                  </blockquote>
                </span> It is on the first host.<span class="gmail-m_-1038946535864355601gmail-"><br>
                  <blockquote type="cite">
                    <div dir="ltr">
                      <div class="gmail_extra">
                        <div class="gmail_quote">
                          <div> </div>
                        </div>
                      </div>
                    </div>
                  </blockquote>
                </span></div>
            </blockquote>
            <div><br>
            </div>
            <div>It seems that host1 is failing to read from the
              hosted-engine storage domain:</div>
            <div><br>
            </div>
            <div>
              <div>[2016-11-24 12:33:43.678467] W [MSGID: 114031]
                [client-rpc-fops.c:2938:<wbr>client3_3_lookup_cbk]
                0-engine-client-2: remote operation failed. Path: /
                (00000000-0000-0000-0000-<wbr>000000000001) [Transport
                endpoint is not connected]</div>
              <div>[2016-11-24 12:33:43.678747] E
                [rpc-clnt.c:365:saved_frames_<wbr>unwind] (--&gt;
                /lib64/libglusterfs.so.0(_gf_<wbr>log_callingfn+0x192)[<wbr>0x7f077eba1642]
                (--&gt;
                /lib64/libgfrpc.so.0(saved_<wbr>frames_unwind+0x1de)[<wbr>0x7f077e96775e]
                (--&gt;
                /lib64/libgfrpc.so.0(saved_<wbr>frames_destroy+0xe)[<wbr>0x7f077e96786e]
                (--&gt;
                /lib64/libgfrpc.so.0(rpc_clnt_<wbr>connection_cleanup+0x84)[<wbr>0x7f077e968fc4]
                (--&gt;
                /lib64/libgfrpc.so.0(rpc_clnt_<wbr>notify+0x120)[0x7f077e9698a0]
                ))))) 0-engine-client-2: forced unwinding frame
                type(GlusterFS 3.3) op(LOOKUP(27)) called at 2016-11-24
                12:33:07.495178 (xid=0x82a1c)</div>
              <div>[2016-11-24 12:33:43.678982] E
                [rpc-clnt.c:365:saved_frames_<wbr>unwind] (--&gt;
                /lib64/libglusterfs.so.0(_gf_<wbr>log_callingfn+0x192)[<wbr>0x7f077eba1642]
                (--&gt;
                /lib64/libgfrpc.so.0(saved_<wbr>frames_unwind+0x1de)[<wbr>0x7f077e96775e]
                (--&gt;
                /lib64/libgfrpc.so.0(saved_<wbr>frames_destroy+0xe)[<wbr>0x7f077e96786e]
                (--&gt;
                /lib64/libgfrpc.so.0(rpc_clnt_<wbr>connection_cleanup+0x84)[<wbr>0x7f077e968fc4]
                (--&gt;
                /lib64/libgfrpc.so.0(rpc_clnt_<wbr>notify+0x120)[0x7f077e9698a0]
                ))))) 0-engine-client-2: forced unwinding frame
                type(GlusterFS 3.3) op(LOOKUP(27)) called at 2016-11-24
                12:33:08.770637 (xid=0x82a1d)</div>
              <div>[2016-11-24 12:33:43.679001] W [MSGID: 114031]
                [client-rpc-fops.c:2938:<wbr>client3_3_lookup_cbk]
                0-engine-client-2: remote operation failed. Path:
/27f054c3-c245-4039-b42a-<wbr>c28b37043016/images/39960f40-<wbr>4aae-4714-ba73-1637785fae7c/<wbr>38fa3519-f21e-4671-8c69-<wbr>d1497ff8a490
                (1090c25b-9c90-434e-a133-<wbr>faf9647cc992) [Transport
                endpoint is not connected]</div>
              <div>[2016-11-24 12:33:43.679303] E
                [rpc-clnt.c:365:saved_frames_<wbr>unwind] (--&gt;
                /lib64/libglusterfs.so.0(_gf_<wbr>log_callingfn+0x192)[<wbr>0x7f077eba1642]
                (--&gt;
                /lib64/libgfrpc.so.0(saved_<wbr>frames_unwind+0x1de)[<wbr>0x7f077e96775e]
                (--&gt;
                /lib64/libgfrpc.so.0(saved_<wbr>frames_destroy+0xe)[<wbr>0x7f077e96786e]
                (--&gt;
                /lib64/libgfrpc.so.0(rpc_clnt_<wbr>connection_cleanup+0x84)[<wbr>0x7f077e968fc4]
                (--&gt;
                /lib64/libgfrpc.so.0(rpc_clnt_<wbr>notify+0x120)[0x7f077e9698a0]
                ))))) 0-engine-client-2: forced unwinding frame
                type(GlusterFS 3.3) op(LOOKUP(27)) called at 2016-11-24
                12:33:11.096856 (xid=0x82a1e)</div>
              <div>[2016-11-24 12:33:43.679596] E
                [rpc-clnt.c:365:saved_frames_<wbr>unwind] (--&gt;
                /lib64/libglusterfs.so.0(_gf_<wbr>log_callingfn+0x192)[<wbr>0x7f077eba1642]
                (--&gt;
                /lib64/libgfrpc.so.0(saved_<wbr>frames_unwind+0x1de)[<wbr>0x7f077e96775e]
                (--&gt;
                /lib64/libgfrpc.so.0(saved_<wbr>frames_destroy+0xe)[<wbr>0x7f077e96786e]
                (--&gt;
                /lib64/libgfrpc.so.0(rpc_clnt_<wbr>connection_cleanup+0x84)[<wbr>0x7f077e968fc4]
                (--&gt;
                /lib64/libgfrpc.so.0(rpc_clnt_<wbr>notify+0x120)[0x7f077e9698a0]
                ))))) 0-engine-client-2: forced unwinding frame
                type(GF-DUMP) op(NULL(2)) called at 2016-11-24
                12:33:13.673743 (xid=0x82a1f)</div>
              <div>[2016-11-24 12:33:43.682310] I
                [socket.c:3401:socket_submit_<wbr>request] 0-engine-client-2:
                not connected (priv-&gt;connected = 0)</div>
              <div>[2016-11-24 12:33:43.682328] W
                [rpc-clnt.c:1640:rpc_clnt_<wbr>submit] 0-engine-client-2:
                failed to submit rpc-request (XID: 0x82a20 Program:
                GlusterFS 3.3, ProgVers: 330, Proc: 30) to rpc-transport
                (engine-client-2)</div>
              <div>[2016-11-24 12:33:43.682391] W
                [rpc-clnt.c:1640:rpc_clnt_<wbr>submit] 0-engine-client-2:
                failed to submit rpc-request (XID: 0x82a21 Program:
                GlusterFS 3.3, ProgVers: 330, Proc: 27) to rpc-transport
                (engine-client-2)</div>
              <div>[2016-11-24 12:33:43.682441] W
                [rpc-clnt.c:1640:rpc_clnt_<wbr>submit] 0-engine-client-2:
                failed to submit rpc-request (XID: 0x82a22 Program:
                GlusterFS 3.3, ProgVers: 330, Proc: 27) to rpc-transport
                (engine-client-2)</div>
              <div>[2016-11-24 12:33:43.682441] W [MSGID: 114031]
                [client-rpc-fops.c:2938:<wbr>client3_3_lookup_cbk]
                0-engine-client-2: remote operation failed. Path:
                /27f054c3-c245-4039-b42a-<wbr>c28b37043016
                (a64398f5-3fa3-48fe-9d40-<wbr>d3860876cc2c) [Transport
                endpoint is not connected]</div>
              <div>[2016-11-24 12:33:43.682492] W
                [rpc-clnt-ping.c:203:rpc_clnt_<wbr>ping_cbk]
                0-engine-client-2: socket disconnected</div>
              <div>[2016-11-24 12:33:43.682536] I [MSGID: 114018]
                [client.c:2280:client_rpc_<wbr>notify] 0-engine-client-2:
                disconnected from engine-client-2. Client process will
                keep trying to connect to glusterd until brick&#39;s port is
                available</div>
              <div>[2016-11-24 12:33:43.682562] W
                [rpc-clnt.c:1640:rpc_clnt_<wbr>submit] 0-engine-client-2:
                failed to submit rpc-request (XID: 0x82a23 Program:
                GlusterFS 3.3, ProgVers: 330, Proc: 27) to rpc-transport
                (engine-client-2)</div>
              <div>The message &quot;W [MSGID: 114031]
                [client-rpc-fops.c:2938:<wbr>client3_3_lookup_cbk]
                0-engine-client-2: remote operation failed. Path:
                /27f054c3-c245-4039-b42a-<wbr>c28b37043016
                (a64398f5-3fa3-48fe-9d40-<wbr>d3860876cc2c) [Transport
                endpoint is not connected]&quot; repeated 2 times between
                [2016-11-24 12:33:43.682441] and [2016-11-24
                12:33:43.682599]</div>
              <div>[2016-11-24 12:33:43.688324] W [MSGID: 114031]
                [client-rpc-fops.c:2938:<wbr>client3_3_lookup_cbk]
                0-engine-client-2: remote operation failed. Path: (null)
                (00000000-0000-0000-0000-<wbr>000000000000) [Transport
                endpoint is not connected]</div>
            </div>
            <div><br>
            </div>
            <div>Before that there was a lot of self-healing activity.</div>
            <div><br>
            </div>
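            <div>To see at a glance which brick connection is dropping, the disconnect messages can be counted per client translator. The following is a rough sketch, not a gluster tool: it assumes each log entry sits on a single line in the mount log file (unlike the wrapped excerpt above), and the function name is hypothetical.</div>
<pre>
```python
import re
from collections import Counter

# Matches the translator name in front of the message, e.g. "0-engine-client-2:".
CLIENT_RE = re.compile(r"\s\d+-([\w-]+-client-\d+):")
DISCONNECT_MARKER = "Transport endpoint is not connected"


def count_disconnects(log_text):
    """Count 'Transport endpoint is not connected' entries per client translator."""
    counts = Counter()
    for entry in log_text.splitlines():
        if DISCONNECT_MARKER not in entry:
            continue
        match = CLIENT_RE.search(entry)
        if match:
            counts[match.group(1)] += 1
    return counts
```
</pre>
            <div>If only one client (here engine-client-2) shows up in the result, that points at a single brick being unreachable rather than at the whole volume.</div>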
          </div>
        </div>
      </div>
    </blockquote></div></div>
    Simone, these logs indicate that the first host is not able to connect
    to the brick on the third host, since it was powered down. They show
    remote operation failed on engine_client_2 .</div></blockquote><div><br></div><div>Can you please share the output of</div> source /etc/ovirt-hosted-engine/hosted-engine.conf<br> find /rhev/data-center/ -path &quot;*/${sdUUID}/images/${conf_image_UUID}/${conf_volume_UUID}&quot; -type f -exec sh -c &#39;sudo -u vdsm dd if=$1 2&gt;/dev/null | tar -xOvf - vm.conf  2&gt;/dev/null&#39; {} {} \;<div>executed on your first host?</div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div bgcolor="#FFFFFF"><span class="gmail-"><br>
    <blockquote type="cite">
      <div dir="ltr">
        <div class="gmail_extra">
          <div class="gmail_quote">
            <div><br>
            </div>
            <div> </div>
            <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
              <div bgcolor="#FFFFFF"><span class="gmail-m_-1038946535864355601gmail-">
                  <blockquote type="cite">
                    <div dir="ltr">
                      <div class="gmail_extra">
                        <div class="gmail_quote">
                          <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
                            <div bgcolor="#FFFFFF">
                              <p>[1] <a class="gmail-m_-1038946535864355601gmail-m_-2672125017527102966m_-6170493833456119351moz-txt-link-freetext" href="http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/HC/vm_conf/" target="_blank">http://rhsqe-repo.lab.eng.blr.<wbr>redhat.com/sosreports/HC/vm_co<wbr>nf/</a></p>
                              <p>Thanks</p>
                              <p>kasturi<br>
                              </p>
                            </div>
                          </blockquote>
                        </div>
                        <br>
                      </div>
                    </div>
                  </blockquote>
                  <p><br>
                  </p>
                </span></div>
            </blockquote>
          </div>
          <br>
        </div>
      </div>
    </blockquote>
    <p><br>
    </p>
  </span></div>

</blockquote></div><br></div></div>
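<div>For reference, the tar step of the command above (tar -xOvf - vm.conf) can also be sketched in Python. This assumes the volume content has already been read into memory as bytes (for example by the vdsm user) and that it is a tar archive, as the command implies. The function name is hypothetical.</div>
<pre>
```python
import io
import tarfile


def read_member(archive_bytes, member_name="vm.conf"):
    """Return the named member's content as text, or None if it is absent."""
    with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r:*") as archive:
        try:
            member = archive.extractfile(member_name)
        except KeyError:
            # The archive has no member with that name.
            return None
        if member is None:
            # The member exists but is not a regular file.
            return None
        return member.read().decode("utf-8")
```
</pre>
<div>A None result would mean the archive could be read but holds no vm.conf. An exception from tarfile.open would instead suggest that the volume content itself is unreadable or not a tar archive.</div>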