<div dir="ltr"><div><div><div><div>Hi all, <br><br></div>I had put a specif email alert during the deploy and then I wanted to change it. <br></div>I did the following: <br><br></div>At one of the hosts ra: <br><br>

<span style="font-family: courier\ new, courier, monospace;">hosted-engine --set-shared-config destination-emails <a href="mailto:alerts@domain.com">alerts@domain.com</a> --type=broker<br><br></span></div><div><span style="font-family: courier\ new, courier, monospace;">systemctl restart ovirt-ha-broker.service<br><span style="font-family:arial,helvetica,sans-serif"><br></span></span></div><div><span style="font-family: courier\ new, courier, monospace;"><span style="font-family:arial,helvetica,sans-serif">I had to do the above since changing the email from GUI did not have any effect. <br><br>After the above the emails are received at the new email address but the cluster seems to have some issue recognizing the state of engine. i am flooded with emails that &quot;
<font size="2"><span style="font-size:11pt">EngineMaybeAway-EngineUnexpectedlyDown</span></font>

In the agent log on one of the hosts I have:

MainThread::ERROR::2018-02-18 11:12:20,751::hosted_engine::720::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_sanlock) cannot get lock on host id 1: host already holds lock on a different host id

Another host logs:

MainThread::INFO::2018-02-18 11:20:23,692::states::682::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(score) Score is 0 due to unexpected vm shutdown at Sun Feb 18 11:15:13 2018
MainThread::INFO::2018-02-18 11:20:23,692::hosted_engine::453::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Current state EngineUnexpectedlyDown (score: 0)
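In case it helps with diagnosing the lock error above: as far as I know, the lockspace and the host id each host is actually holding can be inspected directly with sanlock, assuming the sanlock client tooling is installed on the hosts:

sanlock client status

This should list the hosted-engine lockspace entry together with the host id held by the local sanlock daemon.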
The engine status on the 3 hosts is:

hosted-engine --vm-status


--== Host 1 status ==--

conf_on_shared_storage             : True
Status up-to-date                  : True
Hostname                           : v0
Host ID                            : 1
Engine status                      : {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"}
Score                              : 0
stopped                            : False
Local maintenance                  : False
crc32                              : cfd15dac
local_conf_timestamp               : 4721144
Host timestamp                     : 4721144
Extra metadata (valid at timestamp):
        metadata_parse_version=1
        metadata_feature_version=1
        timestamp=4721144 (Sun Feb 18 11:20:33 2018)
        host-id=1
        score=0
        vm_conf_refresh_time=4721144 (Sun Feb 18 11:20:33 2018)
        conf_on_shared_storage=True
        maintenance=False
        state=EngineUnexpectedlyDown
        stopped=False
        timeout=Tue Feb 24 15:29:44 1970


--== Host 2 status ==--

conf_on_shared_storage             : True
Status up-to-date                  : True
Hostname                           : v1
Host ID                            : 2
Engine status                      : {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"}
Score                              : 0
stopped                            : False
Local maintenance                  : False
crc32                              : 5cbcef4c
local_conf_timestamp               : 2499416
Host timestamp                     : 2499416
Extra metadata (valid at timestamp):
        metadata_parse_version=1
        metadata_feature_version=1
        timestamp=2499416 (Sun Feb 18 11:20:46 2018)
        host-id=2
        score=0
        vm_conf_refresh_time=2499416 (Sun Feb 18 11:20:46 2018)
        conf_on_shared_storage=True
        maintenance=False
        state=EngineUnexpectedlyDown
        stopped=False
        timeout=Thu Jan 29 22:18:42 1970


--== Host 3 status ==--

conf_on_shared_storage             : True
Status up-to-date                  : False
Hostname                           : v2
Host ID                            : 3
Engine status                      : unknown stale-data
Score                              : 3400
stopped                            : False
Local maintenance                  : False
crc32                              : f064d529
local_conf_timestamp               : 2920612
Host timestamp                     : 2920611
Extra metadata (valid at timestamp):
        metadata_parse_version=1
        metadata_feature_version=1
        timestamp=2920611 (Sun Feb 18 10:47:31 2018)
        host-id=3
        score=3400
        vm_conf_refresh_time=2920612 (Sun Feb 18 10:47:32 2018)
        conf_on_shared_storage=True
        maintenance=False
        state=GlobalMaintenance
        stopped=False


Putting each host into maintenance and then activating it again does not resolve the issue. It seems I should have avoided defining an email address during deploy and only set it later from the GUI.

How can one recover from this situation?


Thanx,
Alex