<html>
  <head>
    <meta content="text/html; charset=utf-8" http-equiv="Content-Type">
  </head>
  <body bgcolor="#FFFFFF" text="#000000">
    <p>Hello,</p>
    <p>I also put the host into maintenance and restarted vdsm while
      ovirt-ha-agent was running. I can mount the Gluster volume "engine"
      manually on the host.<br>
    </p>
    <p>I get this repeatedly in /var/log/vdsm.log:</p>
    <p><tt>2017-02-03 15:29:28,891 INFO  (MainThread) [vds] Exiting
        (vdsm:167)</tt><tt><br>
      </tt><tt>2017-02-03 15:29:30,974 INFO  (MainThread) [vds] (PID:
        11456) I am the actual vdsm 4.19.4-1.el7.centos microcloud27
        (3.10.0-514.6.1.el7.x86_64) (vdsm:145)</tt><tt><br>
      </tt><tt>2017-02-03 15:29:30,974 INFO  (MainThread) [vds] VDSM
        will run with cpu affinity: frozenset([1]) (vdsm:251)</tt><tt><br>
      </tt><tt>2017-02-03 15:29:31,013 INFO  (MainThread)
        [storage.check] Starting check service (check:91)</tt><tt><br>
      </tt><tt>2017-02-03 15:29:31,017 INFO  (MainThread)
        [storage.Dispatcher] Starting StorageDispatcher...
        (dispatcher:47)</tt><tt><br>
      </tt><tt>2017-02-03 15:29:31,017 INFO  (check/loop)
        [storage.asyncevent] Starting &lt;EventLoop running=True
        closed=False at 0x37480464&gt; (asyncevent:122)</tt><tt><br>
      </tt><tt>2017-02-03 15:29:31,156 INFO  (MainThread) [dispatcher]
        Run and protect:
        registerDomainStateChangeCallback(callbackFunc=&lt;functools.partial
        object at 0x2881fc8&gt;) (logUtils:49)</tt><tt><br>
      </tt><tt>2017-02-03 15:29:31,156 INFO  (MainThread) [dispatcher]
        Run and protect: registerDomainStateChangeCallback, Return
        response: None (logUtils:52)</tt><tt><br>
      </tt><tt>2017-02-03 15:29:31,160 INFO  (MainThread) [MOM]
        Preparing MOM interface (momIF:49)</tt><tt><br>
      </tt><tt>2017-02-03 15:29:31,161 INFO  (MainThread) [MOM] Using
        named unix socket /var/run/vdsm/mom-vdsm.sock (momIF:58)</tt><tt><br>
      </tt><tt>2017-02-03 15:29:31,162 INFO  (MainThread) [root]
        Unregistering all secrets (secret:91)</tt><tt><br>
      </tt><tt>2017-02-03 15:29:31,164 INFO  (MainThread) [vds] Setting
        channels' timeout to 30 seconds. (vmchannels:223)</tt><tt><br>
      </tt><tt>2017-02-03 15:29:31,165 INFO  (MainThread)
        [vds.MultiProtocolAcceptor] Listening at :::54321
        (protocoldetector:185)</tt><tt><br>
      </tt><tt>2017-02-03 15:29:31,354 INFO  (vmrecovery) [vds]
        recovery: completed in 0s (clientIF:495)</tt><tt><br>
      </tt><tt>2017-02-03 15:29:31,371 INFO  (BindingXMLRPC) [vds]
        XMLRPC server running (bindingxmlrpc:63)</tt><tt><br>
      </tt><tt>2017-02-03 15:29:31,471 INFO  (periodic/1) [dispatcher]
        Run and protect: repoStats(options=None) (logUtils:49)</tt><tt><br>
      </tt><tt>2017-02-03 15:29:31,472 INFO  (periodic/1) [dispatcher]
        Run and protect: repoStats, Return response: {} (logUtils:52)</tt><tt><br>
      </tt><tt>2017-02-03 15:29:31,472 WARN  (periodic/1) [MOM] MOM not
        available. (momIF:116)</tt><tt><br>
      </tt><tt>2017-02-03 15:29:31,473 WARN  (periodic/1) [MOM] MOM not
        available, KSM stats will be missing. (momIF:79)</tt><tt><br>
      </tt><tt>2017-02-03 15:29:31,474 ERROR (periodic/1) [root] failed
        to retrieve Hosted Engine HA info (api:252)</tt><tt><br>
      </tt><tt>Traceback (most recent call last):</tt><tt><br>
      </tt><tt>  File
        "/usr/lib/python2.7/site-packages/vdsm/host/api.py", line 231,
        in _getHaInfo</tt><tt><br>
      </tt><tt>    stats = instance.get_all_stats()</tt><tt><br>
      </tt><tt>  File
"/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py",
        line 103, in get_all_stats</tt><tt><br>
      </tt><tt>    self._configure_broker_conn(broker)</tt><tt><br>
      </tt><tt>  File
"/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py",
        line 180, in _configure_broker_conn</tt><tt><br>
      </tt><tt>    dom_type=dom_type)</tt><tt><br>
      </tt><tt>  File
"/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py",
        line 177, in set_storage_domain</tt><tt><br>
      </tt><tt>    .format(sd_type, options, e))</tt><tt><br>
      </tt><tt>RequestError: Failed to set storage domain
        FilesystemBackend, options {'dom_type': 'glusterfs', 'sd_uuid':
        '7c8deaa8-be02-4aaf-b9b4-ddc8da99ad96'}: Request failed:
        &lt;class 'ovirt_hosted_engine_ha.lib.storage_backends.BackendFailureException'&gt;</tt><tt><br>
      </tt><tt>2017-02-03 15:29:35,920 INFO  (Reactor thread)
        [ProtocolDetector.AcceptorImpl] Accepted connection from
        ::1:49506 (protocoldetector:72)</tt><tt><br>
      </tt><tt>2017-02-03 15:29:35,929 INFO  (Reactor thread)
        [ProtocolDetector.Detector] Detected protocol stomp from
        ::1:49506 (protocoldetector:127)</tt><tt><br>
      </tt><tt>2017-02-03 15:29:35,930 INFO  (Reactor thread)
        [Broker.StompAdapter] Processing CONNECT request
        (stompreactor:102)</tt><tt><br>
      </tt><tt>2017-02-03 15:29:35,930 INFO  (JsonRpc (StompReactor))
        [Broker.StompAdapter] Subscribe command received
        (stompreactor:129)</tt><tt><br>
      </tt><tt>2017-02-03 15:29:36,067 INFO  (jsonrpc/0)
        [jsonrpc.JsonRpcServer] RPC call Host.ping succeeded in 0.00
        seconds (__init__:515)</tt><tt><br>
      </tt><tt>2017-02-03 15:29:36,071 INFO  (jsonrpc/1) [throttled]
        Current getAllVmStats: {} (throttledlog:105)</tt><tt><br>
      </tt><tt>2017-02-03 15:29:36,071 INFO  (jsonrpc/1)
        [jsonrpc.JsonRpcServer] RPC call Host.getAllVmStats succeeded in
        0.00 seconds (__init__:515)</tt><tt><br>
      </tt><tt>2017-02-03 15:29:46,435 INFO  (periodic/0) [dispatcher]
        Run and protect: repoStats(options=None) (logUtils:49)</tt><tt><br>
      </tt><tt>2017-02-03 15:29:46,435 INFO  (periodic/0) [dispatcher]
        Run and protect: repoStats, Return response: {} (logUtils:52)</tt><tt><br>
      </tt><tt>2017-02-03 15:29:46,439 ERROR (periodic/0) [root] failed
        to retrieve Hosted Engine HA info (api:252)</tt><tt><br>
      </tt><tt>Traceback (most recent call last):</tt><tt><br>
      </tt><tt>  File
        "/usr/lib/python2.7/site-packages/vdsm/host/api.py", line 231,
        in _getHaInfo</tt><tt><br>
      </tt><tt>    stats = instance.get_all_stats()</tt><tt><br>
      </tt><tt>  File
"/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py",
        line 103, in get_all_stats</tt><tt><br>
      </tt><tt>    self._configure_broker_conn(broker)</tt><tt><br>
      </tt><tt>  File
"/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py",
        line 180, in _configure_broker_conn</tt><tt><br>
      </tt><tt>    dom_type=dom_type)</tt><tt><br>
      </tt><tt>  File
"/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py",
        line 177, in set_storage_domain</tt><tt><br>
      </tt><tt>    .format(sd_type, options, e))</tt><tt><br>
      </tt><tt>RequestError: Failed to set storage domain
        FilesystemBackend, options {'dom_type': 'glusterfs', 'sd_uuid':
        '7c8deaa8-be02-4aaf-b9b4-ddc8da99ad96'}: Request failed:
        &lt;class 'ovirt_hosted_engine_ha.lib.storage_backends.BackendFailureException'&gt;</tt><tt><br>
      </tt><tt>2017-02-03 15:29:51,095 INFO  (jsonrpc/2)
        [jsonrpc.JsonRpcServer] RPC call Host.getAllVmStats succeeded in
        0.00 seconds (__init__:515)</tt><tt><br>
      </tt><tt>2017-02-03 15:29:51,219 INFO  (jsonrpc/3)
        [jsonrpc.JsonRpcServer] RPC call Host.setKsmTune succeeded in
        0.00 seconds (__init__:515)</tt><tt><br>
      </tt><tt>2017-02-03 15:30:01,444 INFO  (periodic/1) [dispatcher]
        Run and protect: repoStats(options=None) (logUtils:49)</tt><tt><br>
      </tt><tt>2017-02-03 15:30:01,444 INFO  (periodic/1) [dispatcher]
        Run and protect: repoStats, Return response: {} (logUtils:52)</tt><tt><br>
      </tt><tt>2017-02-03 15:30:01,448 ERROR (periodic/1) [root] failed
        to retrieve Hosted Engine HA info (api:252)</tt><br>
      <br>
    </p>
    <p><br>
    </p>
    <br>
    <div class="moz-cite-prefix">On 03.02.2017 at 13:39, Simone
      Tiraboschi wrote:<br>
    </div>
    <blockquote
cite="mid:CAN8-ONrThxOsyRJRkPXVK8=Tot0OVW+bbN7pY2gJD4SihDxzHw@mail.gmail.com"
      type="cite">
      <div dir="ltr">I see an ERROR there on stopMonitoringDomain, but I
        cannot see the corresponding startMonitoringDomain; could you
        please look for it?</div>
      <div class="gmail_extra"><br>
        <div class="gmail_quote">On Fri, Feb 3, 2017 at 1:16 PM, Ralf
          Schenk <span dir="ltr">&lt;<a moz-do-not-send="true"
              href="mailto:rs@databay.de" target="_blank">rs@databay.de</a>&gt;</span>
          wrote:<br>
          <blockquote class="gmail_quote" style="margin:0 0 0
            .8ex;border-left:1px #ccc solid;padding-left:1ex">
            <div bgcolor="#FFFFFF" text="#000000">
              <p>Hello,</p>
              <p>Attached is the vdsm.log from the host whose
                hosted-engine HA is no longer working, covering the
                time frame of the agent timeout (the host itself works in
                oVirt and is active). It simply isn't working for
                engine HA anymore after the update.</p>
              <p>At 2017-02-02 19:25:34,248 you'll find an error
                corresponding to the agent timeout error.</p>
              <p>Bye<br>
              </p>
              <div>
                <div class="h5">
                  <p><br>
                  </p>
                  <br>
                  <div class="m_-5371711976759655950moz-cite-prefix">On
                    03.02.2017 at 11:28, Simone Tiraboschi wrote:<br>
                  </div>
                  <blockquote type="cite">
                    <blockquote class="gmail_quote" style="margin:0 0 0
                      .8ex;border-left:1px #ccc solid;padding-left:1ex">
                      <div dir="ltr">
                        <div class="gmail_extra">
                          <div class="gmail_quote"><span>
                              <blockquote class="gmail_quote"
                                style="margin:0 0 0 .8ex;border-left:1px
                                #ccc solid;padding-left:1ex">
                                <div bgcolor="#FFFFFF" text="#000000">
                                  <p>3. Three of my hosts have the
                                    hosted engine deployed for HA. At first
                                    all three were marked by a crown
                                    (the running one was gold and the others
                                    were silver). After upgrading, the
                                    hosted engine HA deployed on host 3 is
                                    not active anymore.</p>
                                  <p>I can't get this host back with a
                                    working ovirt-ha-agent/broker. I have
                                    already rebooted and manually restarted
                                    the services, but it isn't able to
                                    get the cluster state according to <br>
                                    "hosted-engine --vm-status". The
                                    other hosts report this host's status as
                                    "unknown stale-data".</p>
                                  <p>I already shut down all agents on
                                    all hosts and issued a
                                    "hosted-engine
                                    --reinitialize-lockspace" but that
                                    didn't help.<br>
                                  </p>
                                  <p>The agent stops working after a
                                    timeout error, according to the log:</p>
                                  <p><tt>MainThread::INFO::2017-02-02
                                      19:24:52,040::hosted_engine::841::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_domain_monitor_status)
                                      VDSM domain monitor status:
                                      PENDING</tt><tt><br>
                                    </tt><tt>MainThread::INFO::2017-02-02
                                      19:24:59,185::hosted_engine::841::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_domain_monitor_status)
                                      VDSM domain monitor status:
                                      PENDING</tt><tt><br>
                                    </tt><tt>MainThread::INFO::2017-02-02
                                      19:25:06,333::hosted_engine::841::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_domain_monitor_status)
                                      VDSM domain monitor status:
                                      PENDING</tt><tt><br>
                                    </tt><tt>MainThread::INFO::2017-02-02
                                      19:25:13,554::hosted_engine::841::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_domain_monitor_status)
                                      VDSM domain monitor status:
                                      PENDING</tt><tt><br>
                                    </tt><tt>MainThread::INFO::2017-02-02
                                      19:25:20,710::hosted_engine::841::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_domain_monitor_status)
                                      VDSM domain monitor status:
                                      PENDING</tt><tt><br>
                                    </tt><tt>MainThread::INFO::2017-02-02
                                      19:25:27,865::hosted_engine::841::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_domain_monitor_status)
                                      VDSM domain monitor status:
                                      PENDING</tt><tt><br>
                                    </tt><tt>MainThread::ERROR::2017-02-02
                                      19:25:27,866::hosted_engine::815::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_domain_monitor)
                                      Failed to start monitoring domain
                                      (sd_uuid=7c8deaa8-be02-4aaf-b9b4-ddc8da99ad96,
                                      host_id=3): timeout during domain
                                      acquisition</tt><tt><br>
                                    </tt><tt>MainThread::WARNING::2017-02-02
                                      19:25:27,866::hosted_engine::469::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
                                      Error while monitoring engine:
                                      Failed to start monitoring domain
                                      (sd_uuid=7c8deaa8-be02-4aaf-b9b4-ddc8da99ad96,
                                      host_id=3): timeout during domain
                                      acquisition</tt><tt><br>
                                    </tt><tt>MainThread::WARNING::2017-02-02
                                      19:25:27,866::hosted_engine::472::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
                                      Unexpected error</tt><tt><br>
                                    </tt><tt>Traceback (most recent call
                                      last):</tt><tt><br>
                                    </tt><tt>  File
                                      "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py",
                                      line 443, in start_monitoring</tt><tt><br>
                                    </tt><tt>   
                                      self._initialize_domain_monitor()</tt><tt><br>
                                    </tt><tt>  File
                                      "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py",
                                      line 816, in
                                      _initialize_domain_monitor</tt><tt><br>
                                    </tt><tt>    raise Exception(msg)</tt><tt><br>
                                    </tt><tt>Exception: Failed to start
                                      monitoring domain
                                      (sd_uuid=7c8deaa8-be02-4aaf-b9b4-ddc8da99ad96,
                                      host_id=3): timeout during domain
                                      acquisition</tt><tt><br>
                                    </tt><tt>MainThread::ERROR::2017-02-02
                                      19:25:27,866::hosted_engine::485::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
                                      Shutting down the agent because of
                                      3 failures in a row!</tt><tt><br>
                                    </tt><tt>MainThread::INFO::2017-02-02
                                      19:25:32,087::hosted_engine::841::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_domain_monitor_status)
                                      VDSM domain monitor status:
                                      PENDING</tt><tt><br>
                                    </tt><tt>MainThread::INFO::2017-02-02
                                      19:25:34,250::hosted_engine::769::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_stop_domain_monitor)
                                      Failed to stop monitoring domain
                                      (sd_uuid=7c8deaa8-be02-4aaf-b9b4-ddc8da99ad96):
                                      Storage domain is member of pool:
                                      u'domain=7c8deaa8-be02-4aaf-b9b4-ddc8da99ad96'</tt><tt><br>
                                    </tt><tt>MainThread::INFO::2017-02-02
                                      19:25:34,254::agent::143::ovirt_hosted_engine_ha.agent.agent.Agent::(run)
                                      Agent shutting down</tt></p>
                                </div>
                              </blockquote>
                            </span>
                            <div>Simone, Martin, can you please follow
                              up on this?</div>
                          </div>
                        </div>
                      </div>
                    </blockquote>
                    <div><br>
                    </div>
                    <div>Ralf, could you please attach the vdsm logs from
                      one of your hosts for the relevant time frame?</div>
                  </blockquote>
                  <br>
                </div>
              </div>
              <span class="">
              </span></div>
          </blockquote>
        </div>
        <br>
      </div>
    </blockquote>
    <br>
    <div class="moz-signature">-- <br>
      <p>
      </p>
      <table border="0" cellpadding="0" cellspacing="0">
        <tbody>
          <tr>
            <td colspan="3"><img
                src="cid:part7.257995E9.651557B2@databay.de" height="30"
                border="0" width="151"></td>
          </tr>
          <tr>
            <td valign="top"> <font face="Verdana, Arial, sans-serif"
                size="-1"><br>
                <b>Ralf Schenk</b><br>
                fon +49 (0) 24 05 / 40 83 70<br>
                fax +49 (0) 24 05 / 40 83 759<br>
                mail <a href="mailto:rs@databay.de"><font
                    color="#FF0000"><b>rs@databay.de</b></font></a><br>
              </font> </td>
            <td width="30"> </td>
            <td valign="top"> <font face="Verdana, Arial, sans-serif"
                size="-1"><br>
                <b>Databay AG</b><br>
                Jens-Otto-Krag-Straße 11<br>
                D-52146 Würselen<br>
                <a href="http://www.databay.de"><font color="#FF0000"><b>www.databay.de</b></font></a>
              </font> </td>
          </tr>
          <tr>
            <td colspan="3" valign="top"> <font face="Verdana, Arial,
                sans-serif" size="1"><br>
                Registered office/District Court: Aachen • HRB:8437 • VAT ID: DE
                210844202<br>
                Executive Board: Ralf Schenk, Dipl.-Ing. Jens Conze, Aresch
                Yavari, Dipl.-Kfm. Philipp Hermanns<br>
                Chairman of the Supervisory Board: Wilhelm Dohmen </font> </td>
          </tr>
        </tbody>
      </table>
      <hr color="#000000" noshade="noshade" size="1" width="100%">
    </div>
  </body>
</html>